  1. Oct 2025
    1. Abstract

      Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline.

      Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed.

      Conclusions: tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf126), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jérôme Salignon

      This manuscript presents tcga-data-nf, a Nextflow-based pipeline for downloading, preprocessing, and analyzing TCGA multi-omic data, with a focus on gene regulatory network (GRN) inference. The workflow integrates established bioinformatics tools (PANDA, DRAGON, and LIONESS) and adheres to best practices for reproducibility through containerization (Docker, Conda, and Nextflow profiles). The authors demonstrate the utility of their pipeline by applying it to colorectal cancer subtypes, identifying potential regulatory interactions in TGF-β signaling. The manuscript is well-written and well-structured and provides sufficient methodological details, as well as Jupyter notebooks, for reproducibility. However, there are some areas that require clarification and improvement for acceptance in GigaScience, particularly regarding the scope of the tool, the quality of the inferred regulatory networks, the case study figure, benchmarking, statistical validation, and parameters.

      Major comments:

      • While the pipeline is well designed and executed, the overall impact of the tool feels somewhat limited, especially for a journal like GigaScience, due to its rather specific application to building GRNs from TCGA data, the relatively small number of parameters, the support of only two omics types, and the lack of novel algorithms. To increase the impact of this tool, I would recommend adding functionalities, such as:

      o Supporting additional tools. A great strength of the pipeline is the integration with the Network Zoo (NetZoo) ecosystem. However, only three tools are included from NetZoo. Including additional tools would likely increase the scope of users interested in using the pipeline. In particular, an important weakness of the current pipeline is that it is not possible to conduct differential analysis between different networks, which prevents users from identifying the most significant differences between two networks of interest (e.g., CMS2 vs CMS4). The NetZoo contains different tools to conduct such analyses, such as Alpaca 1 or Crane 2, thus this may be implemented to make the pipeline more useful to a broader user base.

      o Adding parameters. A strength of the pipeline is the ability to customize it using various parameters. However, in its current form the pipeline does not offer many of them, and it would be beneficial to make it more customizable. For example, new parameters could include options for excluding selected samples, using different batch correction methods, using different methods to map CpGs to genes, additional normalization methods, and additional quality controls (e.g., PCA for methylation samples, md5sum checks). These are just examples and do not all need to be implemented, but adding some extra parameters would help make the pipeline more appealing and customizable for various users.

      • The quality of the inferred regulatory networks is hard to judge. There are no direct comparisons with any other tools.

      o For instance, it is mentioned in the text that GRAND networks were derived using a fixed set of parameters, but it could be helpful to show a direct comparison between GRNs built from your tools with those from GRAND. This could reveal how the ability to customize GRNs using the pipeline's parameters helps in getting better biological insights.

      o Alternatively, or in addition, one could compare how networks built by your method fare in comparison to networks built from other methods, like RegEnrich 3 or NetSeekR 4, in terms of biological insights, accuracy, scalability, speed, functionalities and/or memory usage.

      o Another angle to judge the regulatory networks would be to check in a case study if the predicted gene interactions between disease and control networks are enriched in disease and gene-gene interactions databases, such as DisGeNet 5.

      • Figure 2 needs re-work:

      o Panel A and C: text is too small. "tf" should be written TF. "oi" should have another name. These panels might be moved to the supplements.

      o Panel D is confusing. Without significance it is hard to understand what the point of this panel is. I can see that certain TFs are cited in the main text but, without information about significance, these may seem like cherry-picking. The legend states: Annotation of all TFs in cluster D (columns) to the Reactome parent term. "Immune system" and "Cellular respondes to stimuli" are more consistenly involved in cluster D, in comparison to cluster A. However, this is a key result which should be shown in a main figure, not in Figure S6. I would also recommend using a -log scale when displaying the p-values to highlight the most significant entries.

      o Panel E is quite confusing. First, the color coding is unclear: what do the blue, purple and red colors represent? Second, what do the edge widths represent? I would recommend using different shapes for the methylation and expression nodes to reduce the number of colors, and adding a color legend. I would also consider merging the two graphs and representing the difference in the edge values in color so the reader can directly see the key differences.

      • Benchmarking analysis could be included to show the runtime and memory requirement for each pipeline step. It would also be beneficial to analyze a larger dataset than colon cancer to assess the scalability.

      • Statistical analysis: If computationally feasible, permutation testing could be implemented to quantify the robustness of inferred regulatory interactions. Also, in the method section, it should be clarified that FDR correction was applied for pathway enrichment analysis.
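
      To illustrate the FDR correction and the -log scale for p-values mentioned in the comments above, a minimal Python sketch of the Benjamini-Hochberg adjustment follows. It is not taken from tcga-data-nf; the pathway names and p-values are hypothetical placeholders, and an established implementation would normally be preferred in practice.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum of itself and larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Transform a p-value for display on a -log10 scale."""
    return -math.log10(p)


# Hypothetical enrichment p-values, for illustration only.
pathway_pvalues = {
    "pathway_a": 0.001,
    "pathway_b": 0.02,
    "pathway_c": 0.04,
    "pathway_d": 0.5,
}
names = list(pathway_pvalues)
adjusted = benjamini_hochberg([pathway_pvalues[n] for n in names])

for name, q in zip(names, adjusted):
    print(f"{name}\tFDR={q:.4f}\t-log10(FDR)={neg_log10(q):.2f}")
```

      Plotting the last column rather than the raw values makes the most significant entries stand out, which is the effect requested for Panel D.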

      Minor comments:

      • I am not sure why duplicate samples are discarded in the pipeline. Why not sum the counts for RNA-Seq and average the beta values for methylation? I would expect that to yield more robust results.

      • It is a bit unclear in what context the NetworkDataCompanion tool could be used outside the workflow. It is also unclear how it helps with quality controls. Please clarify these aspects.

      • The manuscript is well-written, but words are sometimes missing or misspelled; it needs a careful re-read.

      • The expression '"same-same"' is unclear to me.

      • In this sentence: "Some of "same-same" genes (STAT5A, CREB3L1"…, I am not sure in which table or figure I can find this result?

      • Text is too small in the Directed Acyclic Graph, especially in Figure S4. Also, I would recommend adding the Directed Acyclic Graphs from Figure S1-S4 to the online documentation.

      • Regarding the code, I was puzzled to see a copyConfigFiles process. Also, there are files in bin/r/local_assets, these should be located in assets. And the container for the singularity and docker profile is likely the same, this should be clarified in the code.

      • It is recommended to remove the "defaults" channel from the list of channels declared in the containers/conda_envs/analysis.yml file. Please see information about that here https://www.anaconda.com/blog/is-conda-free and here https://www.theregister.com/2024/08/08/anaconda_puts_the_squeeze_on/.
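
      The alternative to discarding duplicates raised in the first minor comment (summing RNA-Seq counts and averaging methylation beta values) can be sketched as follows. This is an illustrative Python example under stated assumptions, not code from the pipeline: the function names, aliquot identifiers, gene names, and probe names are all hypothetical.

```python
from collections import defaultdict


def merge_rnaseq_duplicates(counts_by_aliquot, sample_of):
    """Sum raw counts gene by gene across aliquots of the same sample."""
    merged = defaultdict(lambda: defaultdict(int))
    for aliquot, gene_counts in counts_by_aliquot.items():
        sample = sample_of[aliquot]
        for gene, count in gene_counts.items():
            merged[sample][gene] += count
    return {sample: dict(genes) for sample, genes in merged.items()}


def merge_methylation_duplicates(betas_by_aliquot, sample_of):
    """Average beta values probe by probe, skipping missing values (None)."""
    collected = defaultdict(lambda: defaultdict(list))
    for aliquot, probe_betas in betas_by_aliquot.items():
        sample = sample_of[aliquot]
        for probe, beta in probe_betas.items():
            if beta is not None:
                collected[sample][probe].append(beta)
    return {
        sample: {probe: sum(vals) / len(vals) for probe, vals in probes.items()}
        for sample, probes in collected.items()
    }


# Hypothetical data: two aliquots (S1-a, S1-b) from sample S1, one from S2.
sample_of = {"S1-a": "S1", "S1-b": "S1", "S2-a": "S2"}
counts = {
    "S1-a": {"GENE1": 10, "GENE2": 0},
    "S1-b": {"GENE1": 14, "GENE2": 3},
    "S2-a": {"GENE1": 7, "GENE2": 1},
}
betas = {
    "S1-a": {"probe1": 0.2, "probe2": None},
    "S1-b": {"probe1": 0.4, "probe2": 0.9},
    "S2-a": {"probe1": 0.5, "probe2": 0.1},
}

merged_counts = merge_rnaseq_duplicates(counts, sample_of)
merged_betas = merge_methylation_duplicates(betas, sample_of)
print(merged_counts["S1"])
print(merged_betas["S1"])
```

      Summing is applied to raw counts only; whether it is appropriate after normalization is a separate question that this sketch does not address.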

      Additional comments (which do not need to be addressed):

      • Future work may consider enabling the use of the pipeline to build GRNs from other data sources than TCGA (i.e., nf-netzoo). Recount3 data is already being parsed for GTEx and TCGA samples, so it might be relatively easy to adapt the pipeline so that it can be used on any arbitrary recount3 dataset. Similarly, it could be useful if one could specify a dataset on the recountmethylation database 6 to build GRNs. While these unimodal datasets could not be used with the DRAGON method they would still benefit from all other features of the pipeline.

      • Using an nf-core template would enable better structure of the code and increase the visibility of the tool. Also, using multiple containers is usually easier to maintain and update than a single large container, especially when a single tool needs to be updated or when modifying part of the pipeline. Another comment is that the code contains many comments that do not explain the code but read more like quick draft notes, which makes the code harder for others to read.

      References

      1. Padi, M., and Quackenbush, J. (2018). Detecting phenotype-driven transitions in regulatory network structure. npj Syst Biol Appl 4, 1-12. https://doi.org/10.1038/s41540-018-0052-5.
      2. Lim, J.T., Chen, C., Grant, A.D., and Padi, M. (2021). Generating Ensembles of Gene Regulatory Networks to Assess Robustness of Disease Modules. Front. Genet. 11. https://doi.org/10.3389/fgene.2020.603264.
      3. Tao, W., Radstake, T.R.D.J., and Pandit, A. (2022). RegEnrich gene regulator enrichment analysis reveals a key role of the ETS transcription factor family in interferon signaling. Commun Biol 5, 1-12. https://doi.org/10.1038/s42003-021-02991-5.
      4. Srivastava, H., Ferrell, D., and Popescu, G.V. (2022). NetSeekR: a network analysis pipeline for RNA-Seq time series data. BMC Bioinformatics 23, 54. https://doi.org/10.1186/s12859-021-04554-1.
      5. Hu, Y., Guo, X., Yun, Y., Lu, L., Huang, X., and Jia, S. (2025). DisGeNet: a disease-centric interaction database among diseases and various associated genes. Database 2025, baae122. https://doi.org/10.1093/database/baae122.
      6. Maden, S.K., Walsh, B., Ellrott, K., Hansen, K.D., Thompson, R.F., and Nellore, A. (2023). recountmethylation enables flexible analysis of public blood DNA methylation array data. Bioinformatics Advances 3, vbad020. https://doi.org/10.1093/bioadv/vbad020.

    1. Abstract

      Nanopore sequencing is a widespread and important method in genomics science. The raw electrical current signal data from a typical nanopore sequencing experiment is large and complex. This can be stored in two alternative file formats that are presently supported: POD5 is a signal data file format used by default on instruments from Oxford Nanopore Technologies (ONT); SLOW5 is an open-source file format originally developed as an alternative to ONT’s previous file format, which was known as FAST5. The choice of format may have important implications for the cost, speed and simplicity of nanopore signal data analysis, management and storage. To inform this choice, we present a comparative evaluation of POD5 vs SLOW5. We conducted benchmarking experiments assessing file size, analysis performance and usability on a variety of different computer architectures. SLOW5 showed superior performance during sequential and non-sequential (random access) file reading on most systems, manifesting in faster, cheaper basecalling and other analysis, and we could find no instance in which POD5 file reading was significantly faster than SLOW5. We demonstrate that SLOW5 file writing is highly parallelisable, thereby meeting the demands of data acquisition on ONT instruments. Our analysis also identified differences in the complexity and stability of the software libraries for SLOW5 (slow5lib) and POD5 (pod5), including a large discrepancy in the number of underlying software dependencies, which may complicate the pod5 compilation process. In summary, many of the advantages originally conceived for SLOW5 remain relevant today, despite the replacement of FAST5 with POD5 as ONT’s core file format.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf118), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jan Voges

      Comments to Author:

      Synopsis: The manuscript builds on the authors' previous work introducing the SLOW5 format for Oxford Nanopore signal data as an improvement over the FAST5 format. Since then, Oxford Nanopore Technologies (ONT) has introduced its own new format, POD5. This paper directly compares SLOW5 and POD5. The authors claim that SLOW5 provides higher reading speeds for both sequential and random access, writing speeds sufficient to keep pace with data acquisition in sequencing machines, comparable file sizes with no significant storage penalty, and a simpler implementation with fewer dependencies. The paper is clearly written, includes extensive supplementary information, and references the source code for all tools used in the experiments.

      Comments:

      - Sequential access performance: To me it is unclear whether SLOW5's advantage in sequential access originates from its file layout or from the use of mmap I/O versus traditional I/O. A small ablation study, forcing both SLOW5 and POD5 tools to use the same I/O method on platforms with currently large performance differences, would clarify where the performance gain originates.
      - Figure 4: While POD5's dependency structure is indeed more complex than that of slow5lib, the current tree representation exaggerates this complexity. Many common packages (e.g., Python, zlib) appear multiple times as dependencies of multiple other packages. A dependency graph where each package appears only once would be a more informative representation.
      - Figure 5: POD5 versions prior to 0.1.0 appear to be preview releases (and are even marked as such on GitHub). Breaking changes during early previews are normal, so including them in the same visual space as stable versions risks being misleading.
      - Figure 5, breaking change at version 0.1.12: The timeline indicates a breaking change at POD5 version 0.1.12, which seems particularly relevant as the latest breaking change after version 0.1.0. However, this change is not reflected in the POD5 compatibility matrix on the right. An explanation of what type of breaking change occurred would clarify its impact and help readers assess compatibility risk.
      - Random access "walker strategy": A brief explanation comparing it to SLOW5's index-file approach would improve accessibility without requiring readers to consult external documentation.
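
      The point about Figure 4 can be made concrete with a small Python sketch that compares the number of nodes drawn in a dependency tree (where shared packages are repeated under every parent) with the number of distinct packages in the corresponding graph. The dependency map below is entirely hypothetical and does not describe the real dependencies of pod5 or slow5lib; the sketch also assumes the dependencies contain no cycles.

```python
def count_tree_nodes(deps, root):
    """Count nodes when each dependency is redrawn under every parent."""
    return 1 + sum(count_tree_nodes(deps, child) for child in deps.get(root, []))


def unique_packages(deps, root):
    """Collect every package reachable from root exactly once."""
    seen = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in seen:
            continue
        seen.add(package)
        stack.extend(deps.get(package, []))
    return seen


# Hypothetical dependency map, for illustration only.
deps = {
    "app": ["lib_a", "lib_b"],
    "lib_a": ["zlib", "python"],
    "lib_b": ["zlib", "python"],
}

tree_nodes = count_tree_nodes(deps, "app")
unique = unique_packages(deps, "app")
print(f"nodes drawn in tree: {tree_nodes}, distinct packages: {len(unique)}")
```

      In this toy case the tree draws seven nodes for five distinct packages; the gap grows quickly when widely shared packages sit deep in the hierarchy.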

    1. Reviewer #1 (Public review):

      The authors have implemented several clarifications in the text and improved the connection between their findings and previous work. As stated in my initial review, I had no major criticisms of the previous version of the manuscript, and I continue to consider this a solid and well-written study. However, the revised manuscript still largely reiterates existing findings and does not offer novel conceptual or experimental advances. It supports previous conclusions suggesting a likely conserved sex determination locus in aculeate hymenopterans, but does so without functional validation (i.e., via experimental manipulation) of the candidate locus in O. biroi. I also wish to clarify that I did not intend to imply that functional assessments in the Pan et al. study were conducted in more than one focal species; my previous review explicitly states that the locus's functional role was validated in the Argentine ant.

    2. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
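
      The two statistical approaches described in this response can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis: the counts at the candidate locus (11 of 16 diploid males homozygous, 19 females all heterozygous) come from the text above, but the colony assignments are hypothetical, and the test statistic used here (difference in homozygous fraction between males and females) is a simple stand-in for the CSD index, whose definition is not given in this excerpt. Multiple-testing correction across loci is omitted.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


def homozygosity_gap(hom, sex):
    """Homozygous fraction among males minus that among females."""
    males = [h for h, s in zip(hom, sex) if s == "M"]
    females = [h for h, s in zip(hom, sex) if s == "F"]
    return sum(males) / len(males) - sum(females) / len(females)


def permutation_pvalue(records, n_perm=2000, seed=0):
    """Shuffle sex labels within each colony to build a null distribution."""
    rng = random.Random(seed)
    hom = [r["hom"] for r in records]
    sex = [r["sex"] for r in records]
    observed = homozygosity_gap(hom, sex)
    colonies = {}
    for i, r in enumerate(records):
        colonies.setdefault(r["colony"], []).append(i)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(sex)
        for members in colonies.values():
            labels = [sex[i] for i in members]
            rng.shuffle(labels)
            for i, label in zip(members, labels):
                shuffled[i] = label
        if homozygosity_gap(hom, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical colony layout: (homozygous males, heterozygous males, females).
layout = {"c1": (3, 1, 5), "c2": (3, 1, 5), "c3": (3, 1, 5), "c4": (2, 2, 4)}
records = []
for colony, (hom_m, het_m, fem) in layout.items():
    records += [{"colony": colony, "sex": "M", "hom": 1}] * hom_m
    records += [{"colony": colony, "sex": "M", "hom": 0}] * het_m
    records += [{"colony": colony, "sex": "F", "hom": 0}] * fem

p_fisher = fisher_exact_two_sided(11, 5, 0, 19)
p_perm = permutation_pvalue(records)
print(f"Fisher p = {p_fisher:.2e}, permutation p = {p_perm:.4f}")
```

      Shuffling within colonies keeps the number of males and females per colony fixed, so the null distribution reflects only the association between sex and genotype. The permutation p-value cannot fall below 1/(n_perm + 1), which is why its resolution depends on the number of permutations.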

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include (in what will be lines 123-126) the highlighted portion of the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that the existence of additional, undetected CSD loci is one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion (in what will be lines 372-374): “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results, in what will be lines 172-174: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the highlighted portion of the following sentence (in what will be lines 268-270) to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene was inferred to have evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below (in what will be lines 287-295), with the additions highlighted:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      L307-308 should state homozygous for either allele in THE MAJORITY of diploid males.

      This will be fixed in the revised manuscript, in what will be line 321.

      Reviewer #3 (Recommendations for the authors):

      The association between heterozygosity in the CSD candidate region and female development in O. biroi, along with the high sequence homology of this region to CSD loci identified in two distantly related ant species, is not sufficient to fully address the evolution of the CSD locus and the mechanisms of sex determination.

      Given that functional genetic tools, such as genome editing, have already been established in O. biroi, I strongly recommend that the authors investigate the role of the lncRNA through knockout or knockdown experiments and assess its impact on the sex-specific splicing pattern of the downstream tra gene.

      Although knockout experiments of the lncRNA would be illuminating, the primary signal of complementary sex determination is heterozygosity. As is clearly stated in our manuscript and that of (Pan et al. 2024), it does not appear to be heterozygosity within the lncRNA that induces female development, but rather heterozygosity in non-transcribed regions linked to the lncRNA. Therefore, future mechanistic studies of sex determination in O. biroi, L. humile, and other ants should explore how homozygosity or heterozygosity of this region impacts the sex determination cascade, rather than focusing (exclusively) on the lncRNA.

      With this in mind, we developed three sets of guide RNAs that cut only one allele within the mapped CSD locus, with the goal of producing deletions within the highly variable region within the mapped locus. This would lead to functional hemizygosity or homozygosity within this region, depending on how the cuts were repaired. We also developed several sets of PCR primers to assess the heterozygosity of the resultant animals. After injecting 1,162 eggs over several weeks and genotyping the hundreds of resultant animals with PCR, we confirmed that we could induce hemizygosity or homozygosity within this region, at least in ~1/20 of the injected embryos. Although it is possible to assess the sex-specificity of the splice isoform of tra as a proxy for sex determination phenotypes (as done by (Pan et al. 2024)), the ideal experiment would assess male phenotypic development at the pupal stage. Therefore, over several more weeks, we injected hundreds more eggs with these reagents and reared the injected embryos to the pupal stage. However, substantial mortality was observed, with only 12 injected eggs developing to the pupal stage. All of these were female, and none of them had been successfully mutated.

      In conclusion, we agree with the reviewer that functional experiments would be useful, and we made extensive attempts to conduct such experiments. However, these experiments turned out to be extremely challenging with the currently available protocols. Ultimately, we therefore decided to abandon these attempts.  

      We opted not to include these experiments in the paper itself because we cannot meaningfully interpret their results. However, we are pleased that, in this response letter, we can include a brief description for readers interested in attempting similar experiments.

      Since O. biroi reproduces parthenogenetically and most offspring develop into females, observing a shift from female- to male-specific splicing of tra upon early embryonic knockout of the lncRNA would provide much stronger evidence that this lncRNA is essential for female development. Without such functional validation, the authors' claim (lines 36-38) seems to reiterate findings already proposed by Pan et al. (2024) and, as such, lacks sufficient novelty.

      We have responded to the issue of “lack of novelty” above. But again, the actual CSD locus in both O. biroi and L. humile appears to be distinct from (but genetically linked to) the lncRNA, and we have no experimental evidence that the putative lncRNA in O. biroi is involved in sex determination at all. Because of this, and given the experimental challenges described above, we do not currently intend to pursue functional studies of the lncRNA.

      References

      Hasselmann M, Gempe T, Schiøtt M, Nunes-Silva CG, Otte M, Beye M. 2008. Evidence for the evolutionary nascence of a novel sex determination pathway in honeybees. Nature 454:519–522.

      Koch V, Nissen I, Schmitt BD, Beye M. 2014. Independent Evolutionary Origin of fem Paralogous Genes and Complementary Sex Determination in Hymenopteran Insects. PLOS ONE 9:e91883.

      Matthey-Doret C, van der Kooi CJ, Jeffries DL, Bast J, Dennis AB, Vorburger C, Schwander T. 2019. Mapping of multiple complementary sex determination loci in a parasitoid wasp. Genome Biology and Evolution 11:2954–2962.

      Miyakawa MO, Mikheyev AS. 2015. QTL mapping of sex determination loci supports an ancient pathway in ants and honey bees. PLOS Genetics 11:e1005656.

      Pan Q, Darras H, Keller L. 2024. LncRNA gene ANTSR coordinates complementary sex determination in the Argentine ant. Science Advances 10:eadp1532.

      Privman E, Wurm Y, Keller L. 2013. Duplication and concerted evolution in a master sex determiner under balancing selection. Proceedings of the Royal Society B: Biological Sciences 280:20122968.

      Rabeling C, Kronauer DJC. 2012. Thelytokous parthenogenesis in eusocial Hymenoptera. Annual Review of Entomology 58:273–292.

      Schmieder S, Colinet D, Poirié M. 2012. Tracing back the nascence of a new sex-determination pathway to the ancestor of bees and ants. Nature Communications 3:1–7.

      Vorburger C. 2013. Thelytoky and Sex Determination in the Hymenoptera: Mutual Constraints. Sexual Development 8:50–58.

    1. expenditures

      It would be better to start here with an overview of which expenditures are involved, how they relate to housing, and which ministry is responsible for them. The reader would then find it easier to make sense of the chart. (Among other things, I am struggling with the link between ministry and type of expenditure: it would be worth explaining this further in a more detailed introductory description.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. While there is a consensus about the specific involvement of the Shell part of the Nucleus Accumbens (NAc) in specific stimulus-based actions in choice settings (and not in General Pavlovian instrumental transfer - gPIT, as opposed to the Core part of the NAc), mechanisms at the cellular and circuitry levels remain to be explored. In the present work, using sophisticated methods (rat Cre-transgenic lines from both sexes, optogenetics, and the well-established behavioral paradigm outcome-specific PIT-sPIT), Octavia Soegyono and colleagues decipher the differential contribution of dopamine receptors D1 and D2 expressing spiny projection neurons (SPNs).

      After validating the viral strategy and the specificity of the targeting (immunochemistry and electrophysiology), the authors demonstrate that while both NAc Shell D1- and D2-SPNs participate in mediating sPIT, NAc Shell D1-SPNs projections to the Ventral Pallidum (VP, previously demonstrated as crucial for sPIT), but not D2-SPNs, mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study, and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

      We thank the Reviewer for their positive assessment. 

      Reviewer 2 (Public Review):

      Summary: 

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. Separate experiments revealed that selective inhibition of NAc shell D1 but not D2 inputs to ventral pallidum was required for osPIT expression, thereby advancing understanding of the basal ganglia circuitry underpinning this important aspect of decision making.

      Strengths: 

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      We thank the Reviewer for their positive assessment. 

      Weaknesses: 

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration of D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to the ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      We acknowledge the reviewer's valuable suggestion that demonstrating NAc-S D1- and D2-SPNs engagement in outcome-specific PIT through another technique would strengthen our optogenetic findings. Several approaches could provide this validation. Chemogenetic manipulation, as the reviewer suggested, represents one compelling option. Alternatively, immunohistochemical assessment of phosphorylated histone H3 at serine 10 (P-H3) offers another promising avenue, given its established utility in reporting striatal SPNs plasticity in the dorsal striatum (Matamales et al., 2020). We hope to complete such an assessment in future work since it would address the limitations of previous work that relied solely on ERK1/2 phosphorylation measures in NAc-S SPNs (Laurent et al., 2014). The manuscript was modified to report these future avenues of research (page 12).

      Regarding the null result from optical silencing of D2 terminals in the ventral pallidum, we agree with the reviewer's assessment. While we acknowledge this limitation in the current manuscript (page 13), we aim to address this gap in future studies to provide a more complete mechanistic understanding of the circuit.

      Reviewer 3 (Public Review):

      Summary:

      The authors present data demonstrating that optogenetic inhibition of either D1- or D2-MSNs in the NAc Shell attenuates expression of sensory-specific PIT while largely sparing value-based decision on an instrumental task. They also provide evidence that SS-PIT depends on D1-MSN projections from the NAc-Shell to the VP, whereas projections from D2-MSNs to the VP do not contribute to SS-PIT.

      Strengths:

      This is clearly written. The evidence largely supports the authors' interpretations, and these effects are somewhat novel, so they help advance our understanding of PIT and NAc-Shell function.

      We thank the Reviewer for their positive assessment. 

      Weaknesses:

      I think the interpretation of some of the effects (specifically the claim that D1-MSNs do not contribute to value-based decision making) is not fully supported by the data presented.

      We appreciate the reviewer's comment regarding the marginal attenuation of value-based choice observed following NAc-S D1-SPN silencing. While this manipulation did produce a slight reduction in choice performance, the behavior remained largely intact. We are hesitant to interpret this marginal effect as evidence for a direct role of NAc-S D1-SPNs in value-based decision-making, particularly given the substantial literature demonstrating that NAc-S manipulations typically preserve such choice behavior (Corbit et al., 2001; Corbit & Balleine, 2011; Laurent et al., 2012). Furthermore, previous work has shown that NAc-S D1 receptor blockade impairs outcome-specific PIT while leaving value-based choice unaffected (Laurent et al., 2014). We favor an alternative explanation for our observed marginal reduction. As documented in Supplemental Figure 1, viral transduction extended slightly into the nucleus accumbens core (NAc-C), a region established as critical for value-based decision-making (Corbit et al., 2001; Corbit & Balleine, 2011; Laurent et al., 2012; Parkes et al., 2015). The marginal impairment may therefore reflect inadvertent silencing of a small number of NAc-C D1-SPNs rather than a functional contribution from NAc-S D1-SPNs. Future studies specifically targeting larger NAc-C D1-SPN populations would help clarify this possibility and provide definitive resolution of this question.

      Reviewer 1 (Recommendations for the Author):

      My main concerns and comments are listed below.

      (1) Could the authors provide the "raw" data of the PIT tests, such as PreSame vs Same vs PreDifferent vs Different? Could the authors clarify how the Net responding was calculated? Was it Same minus PreSame & Different minus PreDifferent, or was the average of PreSame and PreDifferent used in this calculation?

      The raw data for PIT testing across all experiments are now included in the Supplemental Figures (Supplemental Figures S1E, S2E, S3E, and S4E). Baseline responding was quantified as the average number of lever presses per minute for both actions during the two-minute period (i.e., average of PreSame and PreDifferent) preceding each stimulus presentation. This methodology has been clarified in the revised manuscript (page 7).
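
      The baseline correction described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name and the example rates are hypothetical, and it assumes that net responding is the stimulus-period rate minus the shared baseline (the average of PreSame and PreDifferent).

```python
def net_responding(pre_same, pre_different, same, different):
    """Return (net_same, net_different) in lever presses per minute.

    Assumption: the baseline is the mean of the two pre-stimulus rates,
    and the same baseline is subtracted from both stimulus-period rates.
    """
    baseline = (pre_same + pre_different) / 2.0
    return same - baseline, different - baseline


# Hypothetical rates (presses per minute), for illustration only.
net_same, net_different = net_responding(
    pre_same=4.0, pre_different=6.0, same=12.0, different=5.5
)
print(net_same, net_different)
```

      Using a shared baseline means that the difference between net same and net different responding equals the difference between the raw Same and Different rates, so the choice of baseline shifts both measures without altering their contrast.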

      (2) While both sexes are utilized in the current study, no statistical analysis is provided. Can the authors please comment on this point and provide these analyses (for both training and tests)?

      As noted in the original manuscript, the final sample sizes for female and male rats were insufficient to provide adequate statistical power for sex-based analyses (page 15). To address this limitation, we have now cited a previous study from our laboratory (Burton et al., 2014) that conducted such analyses with sufficient power in identical behavioural tasks. That study identified only marginal sex differences in performance, with female rats exhibiting slightly higher magazine entry rates during Pavlovian conditioning. Importantly, no differences were observed in outcome-specific PIT or value-based choice performance between sexes.

      (3) Regarding Figure 1 - Anterograde tracing in D1-Cre and A2a-Cre rats (from line 976), I have one major and one minor question:

      (3.1) I do not understand the rationale of showing anterograde tracing from the Dorsal Striatum (DS) as this region is not studied in the current work. Moreover, sagittal micrographs of D1-Cre and A2a-Cre would be relevant here. Could the authors please provide these micrographs and explain the rationale for doing tracing in DS?

      We included dorsal striatum (DS) tracing data as a reference because the projection patterns of D1 and D2 SPNs in this region are well-established and extensively characterized, in contrast to the more limited literature on these cell types in the NAc-S. Regarding the comment about sagittal micrographs, we are uncertain of the specific concern as these images are presented in Figure 1B.

      If the reviewer is requesting sagittal micrographs for NAc-S anterograde tracing, we did not employ this approach because: (1) the NAc-S and ventral pallidum are anatomically adjacent regions and (2) the medial-lateral coordinates of the ventral pallidum and lateral hypothalamus do not align optimally with those of the NAc-S, limiting the utility of sagittal analysis for these projections.

      (3.2) There is no description about how the quantifications were done: manually? Automatically? What script or plugin was used? If automated, what were the thresholding conditions? How many brain sections along the anteroposterior axis? What was the density of these subpopulations? Can the authors include a methodological section to address this point?

      We apologize for the omission of quantification methods used to assess viral transduction specificity. This methodological description has now been added to the revised manuscript (page 22). Briefly, we employed a manual procedure in two sections per rat, and cell counts were completed in a defined region of interest located around the viral infusion site.

      (4) Lex A & Hauber (2008) Dopamine D1 and D2 receptors in the nucleus accumbens core and shell mediate Pavlovian-instrumental transfer. Learning & memory 15:483- 491, should be cited and discussed. It also seems that the contribution of the main dopaminergic source of the brain, the ventral tegmental area, is not cited, while it has been investigated in PIT in at least 3 studies regarding sPIT only, notably the VP-VTA pathway (Leung & Balleine 2015, accurately cited already).

      We did not include the Lex & Hauber (2008) study because its experimental design (single lever and single outcome) prevents differentiation between the effects of Pavlovian stimuli on action performance (general PIT) versus action selection (outcome-specific PIT, as examined in the present study). Drawing connections between their findings and our results would require speculative interpretations regarding whether their observed effects reflect general or outcome-specific PIT mechanisms, which could distract from the core findings reported in the article.

      Several studies examining the role of the VTA in outcome-specific PIT were referenced in the manuscript's introduction. Following the reviewer's recommendation, these references have also been incorporated into the discussion section (page 13). 

      (5) While not directly the focus of this study, it would be interesting to highlight the accumbens dissociation between General vs Specific PIT, and how the dopaminergic system (differentially?) influences both forms of PIT.

      We agree with the reviewer that the double dissociation between nucleus accumbens core/shell function and general/specific PIT is an interesting topic. However, the present manuscript does not examine this dissociation, the nucleus accumbens core, or general PIT. Similarly, our study does not directly investigate the dopaminergic system per se. We believe that discussing these topics would distract from our core findings and substantially increase manuscript length without contributing novel data directly relevant to these areas. 

      (6) While the authors indicate that conditioned responses to auditory stimuli (magazine visits) are preserved in all groups, suggesting intact sensitivity to the general motivational properties of reward-predictive stimuli (lines 344, 360), the authors can't conclude about the specificity of this behavior, i.e., does the subject use a mental representation of O1 when experiencing S1, leading to a magazine visit to retrieve O1 (and the same for S2-O2), or not? Two food ports would be needed to address this question; also, the authors should comment on the fact that competition between instrumental and Pavlovian responses does not explain the deficits observed.

      We agree with the Reviewer that magazine entry data cannot be used to draw conclusions about specificity, and we do not make such claims in our manuscript. We are therefore unclear about the specific concern being raised. Following the Reviewer’s recommendation, we have commented on the fact that response competition could not explain the results obtained (page 11, see also supplemental discussion). 

      The minor comments are listed below.

      (7) A high number of rats were excluded (> 32 total), and the number of rats excluded for NAc-S D1-SPNs-VP is not indicated.

      We apologize for omitting the number of rats excluded from the experiment examining NAc-S D1-SPN projections to the ventral pallidum. This information has been added to the revised manuscript (page 22).

      (7.1) Can authors please comment on the elevated number of exclusions?

      A total of 133 rats were used across the reported experiments, with 40 rats excluded based on post-mortem analyses. This represents an attrition rate of approximately 30%, which we consider reasonable given that most animals received two separate viral infusions and two separate fiber-optic cannula implantations, and that the inclusion of both female and male rats contributed to some variability in coordinates and so targeting. 

      (7.2) Can authors please present the performance of these animals during the tasks (OFF conditions, and for control ones, both ON & OFF conditions)?

      Rats were excluded after assessing the spread of viral infusions, placement of fibre-optic cannulas and potential damage due to the surgical procedures (page 21). The requested data are presented below and plotted in the same manner as in Figures 3-6. The pattern of performance in excluded animals was highly variable. 

      Author response image 1.

       

      (8) For tracing, only males were used, and for electrophysiology, only females were used.

      (8.1) Can authors please comment on not using both sexes in these experiments? 

      We agree that equal allocation of female and male rats in the experiments presented in Figures 1-2 would have been preferable. Animal availability was the sole factor determining these allocations. Importantly, both female and male D1-Cre and A2A-Cre rats were used for the NAc-S tracing studies, and no sex differences were observed in the projection patterns. The article describing the two transgenic lines of rats did not report any sex difference (Pettibone et al., 2019).

      (8.2) Is there evidence in the literature that the electrophysiological properties of female versus male SPNs could differ?

      The literature indicates that there is no sex difference in the electrophysiological properties of NAc-S SPNs (Cao et al., 2018; Willett et al., 2016).

      (8.3) It seems like there is a discrepancy between the number of animals used as presented in the Figure 2 legend versus what is described in the main text. In the Figure legend, I understand that 5 animals were used for D1-Cre/DIO-eNpHR3.0 validation, and 7 animals for A2a-Cre/DIO-eNpHR3.0; however, the main text indicates the use of a total of 8 animals instead of the 12 presented in the Figure legend. Can authors please address this mismatch or clarify?

      The number of rats reported in the main text and Figure 2 legend was correct. However, recordings sometimes involved multiple cells from the same animal, and this aspect of the data was incorrectly reported and generated confusion. We have clarified the numbers in both the main text and Figure 2 legend to distinguish between animal counts and cell counts. 

      (9) Overall, in the study, have the authors checked for outliers?

      Performance across all training and testing stages was inspected to identify potential behavioral outliers in each experiment. Abnormal performance during a single session within a multi-session stage was not considered sufficient grounds for outlier designation. Based on these criteria, no subjects remaining after post-mortem analyses exhibited performance patterns warranting exclusion through statistical outlier analysis. However, we have conducted the specific analyses requested by the Reviewer, as described below.

      (9.1) In Figure 3, it seems that one female in the eYFP group, in the OFF situation, for the diNerent condition, has a higher level of responding than the others. Can authors please confirm or refute this visual observation with the appropriate statistical analysis?

      Statistical analysis (z-score) confirmed the reviewer's observation regarding responding of the diMerent action in the OFF condition for this subject (|z| = 2.58). Similar extreme responding was observed in the ON condition (|z| = 2.03). Analyzing responding on the diMerent action in isolation is not informative in the context of outcome-specific PIT. Additional analyses revealed |z| < 2 when examining the magnitude of choice discrimination in outcome-specific PIT (i.e., net same versus net diMerent responding) in both ON and OFF conditions. Furthermore, this subject showed |z| < 2 across all other experimental stages. Based on these analyses, we conclude that the subject should be kept in all analyses. 
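
      As a rough illustration of the z-score screening described above, the following minimal sketch flags subjects whose responding lies more than two standard deviations from the group mean. The response rates are invented for the example, the function names are hypothetical, and both the use of the sample standard deviation and the |z| > 2 cutoff are assumptions inferred from the thresholds quoted in the response, not the authors' documented procedure.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the z-score of each value relative to the group.

    Assumption: the sample standard deviation (n - 1) is used; the
    response does not say which estimator the authors applied.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, threshold=2.0):
    """Return indices of values whose |z| exceeds the threshold.

    Assumption: |z| > 2 is treated as outlier-level responding.
    """
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical lever-press rates (responses per minute) for ten subjects.
rates = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0, 15.0]
flagged = flag_outliers(rates)
```

      A subject flagged on one measure would then be re-examined on the other measures and stages, as the authors describe, before any decision about exclusion.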

      (9.2) In Figure 5, it seems that one male, in the ON situation, in the different condition, has a quite higher level of responding - is this subject an outlier? If so, how does it affect the statistical analysis after being removed? And who is this subject in the OFF condition?

      The reviewer has identified two different male rats infused with the eNpHR3.0 virus and has asked for closer examination of their performance.

      The first rat showed outlier-level responding on the different action in the ON condition (|z| = 2.89) but normal responding for all other measures across LED conditions (|z| < 2). Additional analyses revealed |z| = 2.55 when examining choice discrimination magnitude in outcome-specific PIT during the ON condition but not during the OFF condition (|z| = 0.62). This subject exhibited |z| < 2 across all other experimental stages.

      The second rat showed outlier-level responding on the same action in the OFF condition (|z| = 2.02) but normal responding for all other measures across LED conditions (|z| < 2). Additional analyses revealed |z| = 2.12 when examining choice discrimination magnitude in outcome-specific PIT during the OFF condition but not during the ON condition (|z| = 0.67). This subject also exhibited |z| < 2 across all other experimental stages.

      We excluded these two subjects and conducted the same analyses as described in the original manuscript. Baseline responding did not differ between groups (p = 0.14), allowing us to examine the net effect of the stimuli. Overall lever presses were greater in the eYFP rats (Group: F(1,16) = 6.08, p < 0.05; η<sup>2</sup> = 0.28) and were reduced by LED activation (LED: F(1,16) = 9.52, p < 0.01; η<sup>2</sup> = 0.44) and this reduction depended on the group considered (Group x LED: F(1,16) = 12.125, p < 0.001; η<sup>2</sup> = 0.43). Lever press rates were higher on the action earning the same outcome as the stimuli compared to the action earning the different outcome (Lever: F(1,16) = 49.32; η<sup>2</sup> = 0.76; p < 0.001), regardless of group (Group x Lever: p = 0.14). There was a Lever by LED light condition interaction (Lever x LED: F(1,16) = 5.25; η<sup>2</sup> = 0.24; p < 0.05) but no interaction between group, LED light condition, and Lever during the presentation of the predictive stimuli (p = 0.10). Given the significant Group x LED and Lever x LED interactions, additional analyses were conducted to determine the source of these interactions. In eYFP rats, LED activation had no effect (LED: p = 0.70) and lever presses were greater on the same action (Lever: F(1,9) = 23.94, p < 0.001; η<sup>2</sup> = 0.79) regardless of LED condition (LED x Lever: p = 0.72). By contrast, in eNpHR3.0 rats, lever presses were reduced by LED activation (LED: F(1,9) = 23.97, p < 0.001; η<sup>2</sup> = 0.73), were greater on the same action (Lever: F(1,9) = 16.920, p < 0.001; η<sup>2</sup> = 0.65) and the two factors interacted (LED x Lever: F(1,9) = 9.12, p < 0.01; η<sup>2</sup> = 0.50). These rats demonstrated outcome-specific PIT in the OFF condition (F(1,9) = 27.26, p < 0.001; η<sup>2</sup> = 0.75) but not in the ON condition (p = 0.08).

      Overall, excluding these two rats altered the statistical analyses, but both the original and revised analyses yielded the same outcome: silencing the NAc-S D1-SPN to VP pathway disrupted PIT. More importantly, we do not believe there are sufficient grounds to exclude the two rats identified by the reviewer. These animals did not display outlier-level responding across training stages or during the choice test. Their potential classification as outliers would be based on responding during only one LED condition and not the other, with notably opposite patterns between the two rats despite belonging to the same experimental group.

      (10) I think it would be appreciable if in the cartoons from Figure 5.A and 6.A, the SPNs neurons were color-coded as in the results (test plots) and the supplementary figures (histological color-coding), such as D1- in blue & D2-SPNs in red.

      Our current color-coding system uses blue for D1-SPNs transduced with eNpHR3.0 and red for D2-SPNs transduced with eNpHR3.0. The D1-SPNs and D2-SPNs shown in Figures 5A and 6A represent cells transduced with either eYFP (control) or eNpHR3.0 virus and therefore cannot be assigned the blue or red color, which is reserved for eNpHR3.0-transduced cells specifically. The micrographs in the Supplemental Figures maintain consistency with the color-coding established in the main figures.

      (11) As there are (relatively small) variations in the control performance in terms of Net responding (from ~3 to ~7 responses per min), I wonder what would be the result of pooling eYFP groups from the two first experiments (Figures 3 & 4) and from the two last ones (Figures 5 & 6) - would the same statistical results stand or vary (as eYFP vs D1-Cre vs A2a-Cre rats)? In particular for Figures 3 & 4, with and without the potential outlier, if it's indeed an outlier.

      We considered the Reviewer’s recommendation but do not believe the requested analysis is appropriate. The Reviewer is requesting the pooling of data from subjects of distinct transgenic strains (D1-Cre and A2A-Cre rats) that underwent surgical and behavioral procedures at different time points, sometimes months apart. Each experiment was designed with necessary controls to enable adequate statistical analyses for testing our specific hypotheses.

      (12) Presence of cameras in operant cages is mentioned in methods, but no data is presented regarding recordings, though authors mention that they allow for real-time observations of behavior. I suggest removing "to record" or adding a statement about the fact that no videos were recorded or used in the present study.

      We have removed “to record” from the manuscript (page 18). 

      (13) In all supplementary Figures, "F" is wrongly indicated as "E".

      We thank the Reviewer for reporting these errors, which have been corrected. 

      (14) While the authors acknowledge that the efficacy of optogenetic inhibition of terminals is questionable, I think that more details are required to address this point in the discussion (existing literature?). Maybe, the combination of an anterograde tracer from SPNs to VP, to label VP neurons (to facilitate patching these neurons), and the Cre-dependent inhibitory opsin in the NAc Shell, with optogenetic illumination at the level of the VP, along with electrophysiological recordings of VP neurons, could help address this question but may, reasonably, seem challenging technically.

      Our manuscript does not state that optogenetic inhibition of terminals is questionable. It acknowledges that we do not provide any evidence about the efficacy of the approach. Regardless, we have provided additional details and suggestions to address this lack of evidence (page 13).

      (15) A nice addition could be an illustration of the proposed model (from line 374), but it may be unnecessary.

      We have carefully considered the reviewer's recommendation. The proposed model is detailed in three published articles, including one that is freely accessible, which we have cited when presenting the model in our manuscript (page 14). This reference should provide interested readers with easy access to a comprehensive illustration of the model.

      Reviewer 2 (Recommendations for the Author):

      As noted in my public comments, this is a truly excellent and compelling study. I have only a few minor comments.

      (1) I could not find the coordinates/parameters for the dorsal striatal AAV injections for that component of the tract tracing experiment.

      We apologize for this omission, which has now been corrected (page 16). 

      (2) Please add the final group sizes to the figure captions.

      We followed the Reviewer’s recommendation and added group sizes in the main figure captions. 

      (3) The discussion of group exclusions (p 21 line 637) seems to accidentally omit (n = X) the number of NAc-S D1-SPNs-VP mice excluded.

      We apologize for this omission, which has now been corrected (page 22). 

      (4) There were some labeling issues in the supplementary figures (perhaps elsewhere, too). Specifically, panel E was listed twice (once for F) in captions.

      We apologize for this error, which has now been corrected.  

      (5) Inspection of the magazine entry data from PIT tests suggests that the optogenetic manipulations may have had some effects on this behavior and would encourage the authors to probe further. There was a significant group difference for D1-SPN inhibition and a marginal group effect for D2-SPNs. The fact that these effects were in opposite directions is intriguing, although not easily interpreted based on the canonical D1/D2 model. Of course, the effects are not specific to the light-on trials, but this could be due to carryover into light-off trials. An analysis of trial-order effects seems crucial for interpreting these effects. One might also consider normalizing for pre-test baseline performance. Response rates during Pavlovian conditioning seem to suggest that D2eNpHR mice showed slightly higher conditioned responding during training, which contrasts with their low entry rates at test. I don't see any of this as problematic -- but more should be done to interpret these findings.

      We thank the reviewer for raising this interesting point regarding magazine entry rates. Since these data are presented in the Supplemental Figures, we have added a section in the Supplemental Material file that elaborates on these findings. This section does not address trial order effects, as trial order was fully counterbalanced in our experiments and the relevant statistical analyses would lack adequate power. Baseline normalization was not conducted because the reviewer's suggestion was based on their assumption that eNpHR3.0 rats in the D2-SPNs experiment showed slightly higher magazine entries during Pavlovian training. However, this was not the case. In fact, like the eNpHR3.0 rats in the D1-SPNs experiment, they tended to display lower magazine entries during training. The added section therefore focuses on the potential role of response competition during outcome-specific PIT tests. Although we concluded that response competition cannot explain our findings, we believe it may complicate interpretation of magazine entry behavior. Thus, we recommend that future studies examine the role of NAc-S SPNs using purely Pavlovian tasks. It is worth noting that we have recently completed experiments (unpublished) examining NAc-S D1- and D2-SPN silencing during stimulus presentation in a Pavlovian task identical to the one used here. Silencing of either SPN population had no effect on magazine entry behavior.

      Reviewer 3 (Recommendations for the Author):

      Broad comments:

      Throughout the manuscript, the authors draw parallels between the effect established via pharmacological manipulations and those shown here with optogenetic manipulation. I understand using the pharmacological data to launch this investigation, but these two procedures address very different physiological questions. In the case of a pharmacological manipulation, the targets are receptors, wherever they are expressed, and in the case of D2 receptors, this means altering function in both pre-synaptically expressed autoreceptors and post-synaptically expressed D2 MSN receptors. In the case of an optogenetic approach, the target is a specific cell population with a high degree of temporal control. So I would just caution against comparing results from these types of studies too closely.

      Related to this point is the consideration of the physiological relevance of the manipulation. Under normal conditions, dopamine acts at D1-like receptors to increase the probability of cell firing via Ga signaling. In contrast, dopamine binding of D2-like receptors decreases the cell's firing probability (signaling via Gi/o). Thus, shunting D1-MSN activation provides a clear impression of the role of these cells and, putatively, the role of dopamine acting on these cells. However, inhibiting D2-MSNs more closely mimics these cells' response to dopamine (though optogenetic manipulations are likely far more impactful than Gi signaling). All this is to say that when we consider the results presented here in Experiment 2, it might suggest that during PIT testing, normal performance may require a halting of DA release onto D2-MSNs. This is highly speculative, of course, just a thought worth considering.

      We agree with the comments made by the Reviewer, and the original manuscript included statements acknowledging that pharmacological approaches are limited in the capacity to inform about the function of NAc-S SPNs (pages 4 and 9). As noted by the Reviewer, these limitations are especially salient when considering NAc-S D2-SPNs. Based on the Reviewer’s comment, we have modified our discussion to further underscore these limitations (page 12). Finally, we agree with the suggestion that PIT may require a halting of DA release onto D2-SPNs. This is consistent with the model presented, whereby D2-SPN function is required to trigger enkephalin release (page 13).

      Section-Specific Comments and Questions:

      Results:

      Anterograde tracing and ex vivo cell recordings in D1 Cre and A2a Cre rats: Why are there no statistics reported for the e-phys data in this section? Was this merely a qualitative demonstration? I realize that the A2a-Cre condition only shows 3 recordings, so I appreciate the limitations in analyzing the data presented.

      The reviewer is correct that we initially intended to provide a qualitative demonstration. However, we have now included statistical analyses for the ex vivo recordings. It is important to note that there were at least 5 recordings per condition, though overlapping data points may give the impression of fewer recordings in certain conditions. We have provided the exact number of recordings in both the main text (page 5) and figure legend. 

      What does trial by trial analysis look like, because in addition to the effects of extinction, do you know if the responsiveness of the opsin to light stimulation is altered after repeated exposures, or whether the cells themselves become compromised in any way with repeated light-inhibition, particularly given the relatively long 2m duration of the trial.

      The Reviewer raises an interesting point, and we provide complete trial-by-trial data for each experiment below. As identified by the Reviewer, there is some evidence for extinction, although it remained modest. Importantly, the data suggest that light stimulation did not affect the physiology of the targeted cells. In eNpHR3.0 rats, performance across OFF trials remained stable (both for Same and Different) even though they were preceded by ON trials, indicating no carryover effects from optical stimulation.

      Author response image 2.

       

      The statistics for the choice test are not reported for eNpHR-D1-Cre rats, but do show a weakening of the instrumental devaluation effect "Group x Lever x LED: F1,18 = 10.04, p < 0.01, η<sup>2</sup> = 0.36". The post hoc comparisons showed that all groups showed devaluation, but it is evident that there is a weakening of this effect when the LED was on (η<sup>2</sup> = 0.41) vs off (η<sup>2</sup> = 0.78), so I think the authors should soften the claim that NAcS-D1s are not involved in value-based decision-making. (Also, there is a typo in the legend in Figure S1, where the caption for panel "F" is listed as "E".) I also think that this could be potentially interesting in light of the fact that with circuit manipulation, this same weakening of the instrumental devaluation effect was not observed. To me, this suggests that D1-NAcS that project to a different region (not VP) contribute to value-based decision making.

      This comment overlaps with one made in the Public Review, for which we have already provided a response. Given its importance, we have added a section addressing this point in the supplemental discussion of the Supplementary Material file, which aligns with the location of the relevant data. The caption labelling error has been corrected.

      Materials and Methods:

      Subjects:

      Were these heterozygous or homozygous rats? If hetero, what rats were used for crossbreeding (sex, strain, and vendor)? Was genotyping done by the lab or outsourced to commercial services? If genotyping was done within the lab, please provide a brief description of the protocol used. How was food restriction established and maintained (i.e., how many days to bring weights down, and was maintenance achieved by rationing or by limiting ad lib access to food for some period in the day)?

      The information requested by the Reviewer has been added to the subjects section (pages 15-16).

      Were rats pair/group housed after implantation of optic fibers?

      We have clarified that rats were group housed throughout (see subjects section; pages 15-16).

      Behavioral Procedures:

      How long did each 0.2ml sucrose infusion take? For pellets, for each US delivery, was it a single pellet or two in quick succession?

      We have modified the method section to indicate that the sucrose was delivered across 2 seconds and that a single pellet was provided (page 17). 

      The CS to ITI duration ratio is quite low. Is there a reason such a short ratio was used in training?

      These parameters are those used in all our previous experiments on outcome-specific PIT. There is no specific reason for using such a ratio, except that it shortens the length of the training session. 

      Relative to the end of training, when were the optical implantation surgeries conducted, and how much recovery time was given before initiating reminder training and testing?

      Fibre-optic implantation was conducted 3-4 days after training and another 3-4 days were given for recovery. This has been clarified in the Materials and methods section (pages 15-16).

      I think a diagram or schematic showing the timeline for surgeries, training, and testing would be helpful to the audience.

      We opted for a text-based experimental timeline rather than a diagram due to slight temporal variations across experiments (page 15).

      On trials, when the LED was on, was light delivered continuously or pulsed? Do these opto-receptors 'bleach' within such a long window?

      We apologize for the lack of clarity; the light was delivered continuously. We have modified the manuscript (pages 6 and 19) and figure legend accordingly. The postmortem analysis did not provide evidence for photobleaching (Supplemental Figures) and as noted above, the behavioural results do not indicate any negative physiological impact on cell function.  

      Immunofluorescence: The blocking solution used during IHC is described as "NHS"; is this normal horse serum?

      The Reviewer is correct; NHS stands for normal horse serum. This has been added (page 21). 

      Microscopy and imaging:

      For the description of rats excluded due to placement or viral spread problems, an n=X is listed for the NAc S D1 SPNs --> VP silencing group. Is this a typo, or was that meant to read as n=0? Also, was there a major sex difference in the attrition rate? If so, I think reporting the sex of the lost subjects might be beneficial to the scientific community, as it might reflect a need for better guidance on sex-specific coordinates for targeting small nuclei.

      We apologize for the error regarding the number of excluded animals. This error has been corrected (page 23). There were no major sex differences in the attrition rate. The manuscript has been updated to provide information about the sex of excluded animals (page 23).

      References

      Cao, J., Willett, J. A., Dorris, D. M., & Meitzen, J. (2018). Sex Differences in Medium Spiny Neuron Excitability and Glutamatergic Synaptic Input: Heterogeneity Across Striatal Regions and Evidence for Estradiol-Dependent Sexual Differentiation. Front Endocrinol (Lausanne), 9, 173. https://doi.org/10.3389/fendo.2018.00173

      Corbit, L. H., Muir, J. L., & Balleine, B. W. (2001). The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. J Neurosci, 21(9), 3251-3260. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=11312310&retmode=ref&cmd=prlinks

      Corbit, L. H., & Balleine, B. W. (2011). The general and outcome-specific forms of Pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. J Neurosci, 31(33), 11786-11794. https://doi.org/10.1523/JNEUROSCI.2711-11.2011

      Laurent, V., Bertran-Gonzalez, J., Chieng, B. C., & Balleine, B. W. (2014). δ-Opioid and Dopaminergic Processes in Accumbens Shell Modulate the Cholinergic Control of Predictive Learning and Choice. J Neurosci, 34(4), 1358-1369. https://doi.org/10.1523/JNEUROSCI.4592-13.2014

      Laurent, V., Leung, B., Maidment, N., & Balleine, B. W. (2012). μ- and δ-opioid-related processes in the accumbens core and shell differentially mediate the influence of reward-guided and stimulus-guided decisions on choice. J Neurosci, 32(5), 1875-1883. https://doi.org/10.1523/JNEUROSCI.4688-11.2012

      Matamales, M., McGovern, A. E., Mi, J. D., Mazzone, S. B., Balleine, B. W., & Bertran-Gonzalez, J. (2020). Local D2- to D1-neuron transmodulation updates goal-directed learning in the striatum. Science, 367(6477), 549-555. https://doi.org/10.1126/science.aaz5751

      Parkes, S. L., Bradfield, L. A., & Balleine, B. W. (2015). Interaction of insular cortex and ventral striatum mediates the effect of incentive memory on choice between goal-directed actions. J Neurosci, 35(16), 6464-6471. https://doi.org/10.1523/JNEUROSCI.4153-14.2015

      Pettibone, J. R., Yu, J. Y., Derman, R. C., Faust, T. W., Hughes, E. D., Filipiak, W. E., Saunders, T. L., Ferrario, C. R., & Berke, J. D. (2019). Knock-In Rat Lines with Cre Recombinase at the Dopamine D1 and Adenosine 2a Receptor Loci. eNeuro, 6(5). https://doi.org/10.1523/ENEURO.0163-19.2019

      Willett, J. A., Will, T., Hauser, C. A., Dorris, D. M., Cao, J., & Meitzen, J. (2016). No Evidence for Sex Differences in the Electrophysiological Properties and Excitatory Synaptic Input onto Nucleus Accumbens Shell Medium Spiny Neurons. eNeuro, 3(1), ENEURO.0147-15.2016. https://doi.org/10.1523/ENEURO.0147-15.2016

    1. Frontal Bone: Forms the forehead and the roof of the orbits (eye sockets). Parietal Bones (2): Form the sides and roof of the cranium. Temporal Bones (2): Form the sides of the cranium, housing the ears. Occipital Bone: Forms the back and base of the cranium.

      “O PEST F” is a good way to remember the structure of the skull

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      This study presents evidence that remote memory in the APP/PS1 mouse model of Alzheimer's disease (AD) is associated with PV interneuron hyperexcitability and increased inhibition of cortical engram cells. Its strength lies in the fact that it explores a neglected aspect of memory research - remote memory impairments related to AD (for which the primary research focus is usually on recent memory impairments) -which has received minimal attention to date. While the findings are intriguing, the weakness of the paper hovers around purely correlational types of evidence and superficial data analyses, which require substantial revisions as outlined below. 

      We thank the reviewer for their feedback, and we appreciate the recognition of the study’s novelty in addressing remote memory impairments in AD. We acknowledge the reviewer’s concerns and have implemented revisions to strengthen the manuscript.

      Major concerns: 

      (1) In light of previous work, including that by the authors themselves, the data in Figure 1 should be implemented by measurements of recent memory recall in order to assess whether remote memories are exclusively impaired or whether remote memory recall merely represents a continuation of recent memory impairments.

      We agree with the reviewer that this is an important point. In line with their suggestion in minor comment 1, we have now omitted the statement on recent memory in the results (previously on lines 109-111 and 117). Nonetheless, previous independent experiments from our group have repeatedly shown recent memory deficits in APP/PS1 mice at 12 weeks of age, including a recent article published in 2023. We refer the reviewer to figure 2c in Végh et al. (2014) and figure 2i in Kater et al. (2023). We have added a reference to the latter paper in our discussion section (line 458-459). Therefore, we are confident that the recent memory deficit at 12 weeks of age is a stable phenotype in our APP/PS1 mice.

      With these data in mind, we argue that the remote memory recall impairment is not a continuation of recent memory impairments. Recent memory deficits emerge already at 12 weeks of age, and when remote memory is assessed at 16 weeks (4 weeks after training at 12 weeks of age), APP/PS1 mice are still capable of forming and retrieving a remote memory. This suggests that remote memory retrieval can occur even when recent memory is compromised, arguing against the idea that the remote memory deficit observed at 20 weeks is a continuation of earlier recent memory impairments. We have clarified this point in the revised manuscript by adding the following sentence to the discussion section (line 462-465): 

      ‘This suggests that a remote memory can be formed even when recent memory expression is already compromised, indicating that the remote memory deficit in 20-week-old APP/PS1 mice is not a continuation of earlier recent memory impairments.’

      (2) Figure 2 shows electrophysiological properties of PV cells in the mPFC that correlate with the behavior shown in Figure 1. However, the mice used in Figure 2 are different than the mice used in Figure 1. Thus, the data are correlative at best, and the authors need to confirm that behavioral impairments in the APP/PS1 mice crossed to PV-Cre (and SST-Cre mice) used in Figure 2 are similar to those of the APP/PS1 mice used in Figure 1. Without that, no conclusions between behavioral impairments and electrophysiological as well as engram reactivation properties can be made, and the central claims of the paper cannot be upheld. 

      We thank the reviewer for raising this concern. Indeed, the remote memory impairment and PV hyperexcitability are correlative data, and therefore we do not make causal claims based on these data. However, please note that most of our key findings, including behavioural impairments, characterization of the engram ensemble and reactivation thereof, as well as inhibitory input measurements, were acquired using the same mouse line (APP/PS1), strengthening the coherence of our conclusions. Also, our electrophysiological findings in APP/PS1 (enhanced sIPSC frequency) and APP/PS1-PV-Cre-tdTomato (enhanced PV cell excitability) mice align well. Direct comparisons between the transgenic mouse lines APP/PS1 and APP/PS1 Parv-Cre were performed in our previous studies, confirming that these lines are similar in terms of behaviour and pathology. Specifically, we demonstrated that APP/PS1 mice display spatial memory impairments at 16 weeks of age, Fig 4a-d, consistent with the deficits observed in APP/PS1 Parv-Cre mice at 16 weeks of age, Fig 5a-c (Hijazi et al., 2020a). Additionally, Hijazi et al. (2020a) showed that soluble and insoluble Aβ levels do not differ between APP/PS1 Parv-Cre and APP/PS1 mice (sFig. 1), indicating comparable levels of pathology between these lines. While we do not have a similar characterization of the APP/PS1 SST-Cre line, we should mention that we also did not observe excitability differences in SST cells. We now acknowledge the limitation in the revised discussion section (line 480-487), and stress that our electrophysiology and behavioural findings are correlative in nature:

      ‘Although the excitability measurements were performed in APP/PS1-PV-Cre-tdTomato mice, and not in the APP/PS1 parental line, we previously found that these transgenic mouse lines exhibit comparable amyloid pathology (both soluble and insoluble amyloid beta levels) as well as similar spatial memory deficits (Hijazi et al., 2020a; Kater et al., 2023). Thus, our observations indicate that the APP/PS1 PV-Cre-tdTomato and APP/PS1 lines are similar in terms of pathology and behaviour. Nonetheless, further work is needed to identify a causal link between PV cell hyperexcitability and remote memory impairment.’ 

      (3) The reactivation data starting in Figure 3 should be analysed in much more depth: 

      a) The authors restrict their analysis to intra-animal comparisons, but additional ones should be performed, such as inter-animal (WT vs APP/PS1) as well as inter-age (12-16w vs 16-20w). In doing so, reactivation data should be normalized to chance levels per animal, to account for differences in labelling efficiency - this is standard in the field (see original Tonegawa papers and for a reference). This could highlight differences in total reactivation that are already apparent, such as for instance in WT vs APP/PS1 at 20w (Figure 3o) and highlight a decrease in reactivation in AD mice at this age, contrary to what is stated in lines 213-214. 

      We would like to thank the reviewer for the valuable input on the reactivation data in Figure 3. 

      We agree with the reviewer and now depict the data as normalized to chance levels (Figure 3). The original figures are now supplemental (sFig. 5). The reactivation data normalized to chance are similar to the original results, i.e. no difference was observed in the reactivation of the mPFC engram ensemble between genotypes. The reviewer may have overlooked that we did perform inter-animal (WT vs. APP/PS1) comparisons, however these were not significantly different. We have made this clearer in the main text, lines 277, 288-289, 294-295 and 303-304. Moreover, the reviewer recommended including inter-age group comparisons, which have now been added to the supplemental figures (sFig. 6). No genotype-dependent differences were observed. While a main effect of age group did emerge, indicating that there is a potential increased overlap between Fos+ and mCherry+ in animals aged 16-20 weeks, we caution against overinterpreting this finding. These experimental groups were processed in separate cohorts, with viral injection and 4TM-induced tagging performed at different moments in time, which may have contributed to the observed differences in overlap. We have addressed this point in the revised discussion (line 612-617):

      ‘Furthermore, we also observed an increase in the amount of overlap between Fos+ and mCherry+ engram cells when comparing the 12-16w and 16-20w age groups. This finding should be interpreted with caution, as the experimental groups were processed in separate cohorts, with viral injections and 4TM-induced tagging performed at different moments in time. This may have contributed to the observed differences between ages.’
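
      For readers unfamiliar with chance-level normalization, the following is a minimal sketch of one common definition, not necessarily the exact procedure used in the manuscript. It assumes that chance overlap is the product of the tagged (mCherry+) and active (Fos+) fractions among all counted neurons (here Nissl+ cells), and that reactivation is reported as observed overlap divided by chance. The function names and the cell counts are hypothetical.

```python
def chance_overlap(n_tagged, n_fos, n_total):
    """Expected double-positive fraction if tagging and Fos expression were independent."""
    return (n_tagged / n_total) * (n_fos / n_total)


def normalized_reactivation(n_double, n_tagged, n_fos, n_total):
    """Observed overlap divided by chance overlap (1.0 means chance level)."""
    observed = n_double / n_total
    chance = chance_overlap(n_tagged, n_fos, n_total)
    if chance == 0:
        raise ValueError("chance level is zero: no tagged or no Fos+ cells counted")
    return observed / chance


# Hypothetical counts for one animal: 1000 Nissl+ cells, 100 mCherry+,
# 200 Fos+, 40 double-positive. Observed = 0.04, chance = 0.1 * 0.2 = 0.02,
# so the overlap is roughly twice the chance level.
ratio = normalized_reactivation(n_double=40, n_tagged=100, n_fos=200, n_total=1000)
print(round(ratio, 3))
```

      Because the ratio is computed per animal, an animal with a low labelling efficiency also has a low chance level, which is why this normalization makes animals with different labelling efficiencies comparable.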

      b) Comparing the proportion of mcherry+ cells in PV- and PV+ is problematic, considering that the PV- population is not "pure" like the PV+, but rather likely to represent a mix of different pyramidal neurons (probably from several layers), other inhibitory neurons like SST and maybe even glial cells. Considering this, the statement on line 218 is misleading in saying that PVs are overrepresented. If anything, the same populations should be compared across ages or groups.  

      We thank the reviewer for their insightful comment and agree that the PV- population of cells is likely more heterogeneous than the PV+ population. However, we would like to clarify that all quantified cells were selected based on Nissl immunoreactivity, and to exclude non-neuronal cells, stringent thresholding was applied in the script that was used to identify Nissl+ cells. The threshold information has now been added to the methods section (lines 758-760). Thus, although heterogeneous, the analysed PV- population reflects a neuronal subset. In response to the reviewer’s suggestion, we have now included overlap measurements relative to chance levels (Figure 3). These analyses did not reveal differences with the original analyses, i.e., there are no genotype-specific differences. We have also incorporated the suggested inter-age group comparisons (sFig. 6) and found no differences between age groups. In light of the raised concerns, we have removed the statement that PV cells were overrepresented in the engram ensemble.

      c) A similar concern applies to the mcherry- population in Figure 4, which could represent different types of neurons that were never active, compared to the relatively homogeneous engram mcherry+ population. This could be elegantly fixed by restricting the comparison to mCherry+Fos+ vs mCherry+Fos- ensembles and could indicate engram reactivation-specific differences in perisomatic inhibition by PV cells. 

      The comparison the reviewer suggests, comparing mCherry+Fos+ to mCherry+Fos-, is indeed conceptually interesting and could provide more insight into engram reactivation and PV input. However, there are practical limitations to performing this analysis, as neurons in close proximity need to be compared in a pairwise manner to account for local variability in staining intensity. As shown in Figure 3c+k and Figure 4a+b, d+e, PV immunostaining intensity varies to a certain extent within a given image. While pairwise comparisons of neighbouring neurons were feasible when analysing mCherry+ and mCherry- cells, they are unfortunately not feasible for the mCherry+Fos+ vs. mCherry+Fos- comparison. The occurrence of spatially adjacent mCherry+Fos+ and mCherry+Fos- neurons is too sparse for a pairwise comparison. This analysis would therefore result in substantial under-sampling and limit the reliability of the analysis. Nonetheless, we agree with the reviewer that the mCherry- population may be more heterogeneous than the mCherry+ population, despite the fact that PV+ neurons and non-neuronal cells were excluded from both populations in the analyses. We therefore added a statement to the discussion to acknowledge this limitation (lines 536-539): 

      ‘Although PV+ cells were not included in this analysis and we excluded non-neuronal cells based on the area of the Nissl stain, the mCherry- population was potentially more heterogeneous than the mCherry+ population, which may have contributed to the differences we observed.’

      (4) At several instances, there are some doubts about the statistical measures having been employed: 

      a) In Figure 4f, it is unclear why a repeated measurement ANOVA was used as opposed to a regular ANOVA. 

      b) In Supplementary Figure 2b, a Mann-Whitney test was used, supposedly because the data were not normally distributed. However, when looking at the individual data points, the data does seem to be normally distributed. Thus, the authors need to provide the test details as to how they measured the normalcy of distribution. 

      a) Because neighbouring neurons were compared pairwise within animals, the data in Figure 4f were analysed with a repeated-measures ANOVA. 

      b) We thank the reviewer for their comment on Supplementary Figure 2b. The data are indeed normally distributed, as assessed with a D’Agostino & Pearson test. We have corrected this in the supplemental figure. 
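
      To illustrate why a paired (repeated-measures) design suits neighbouring-cell comparisons, here is a minimal sketch with made-up intensity values; it is not the analysis from the manuscript. It relies on the standard result that, for a within-subject factor with only two levels, the repeated-measures ANOVA F statistic equals the square of the paired t statistic. Any additional factor in the actual analysis (such as genotype) is ignored here, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic on within-pair differences (F = t**2 for a two-level repeated-measures design)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(a, b):
    """Welch's t statistic, which ignores the pairing."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical perisomatic PV intensities for five neighbouring cell pairs.
# Staining intensity varies strongly between image regions (1 to 5),
# while the within-pair difference is small but consistent.
mcherry_pos = [1.10, 2.05, 3.12, 4.08, 5.15]
mcherry_neg = [1.00, 2.00, 3.00, 4.00, 5.00]

print("paired t:  ", round(paired_t(mcherry_pos, mcherry_neg), 2))
print("unpaired t:", round(unpaired_t(mcherry_pos, mcherry_neg), 2))
```

      With these values the paired statistic is far larger than the unpaired one, because pairing removes the region-to-region variability in staining from the error term.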

      Minor concerns: 

      (1) Line 117: The authors cite a recent memory impairment here, as shown by another paper. However, given the notorious difficulty in replicating behavioral findings, in particular in APP/PS1 mice (number of backcrossings, housing conditions, etc., might differ between laboratories), such a statement cannot be made. The authors should either show in their own hands that recent memory is indeed affected at 12 weeks of age, or they should omit this statement. 

      We thank the reviewer for this thoughtful comment. As noted in our response to major concern (1), we have addressed this concern by providing additional information and clarification in the discussion (lines 462-465) regarding the possibility that remote memory impairments are a continuation of recent memory impairments. As mentioned in our response, we have added a reference to a more recent study from our lab (Kater et al., 2023). These findings are consistent with the earlier report from our lab (Végh et al., 2014), underscoring the reproducibility of this phenotype across independent cohorts and time. Notably, the experiments in the 2023 and present study were performed using the same housing and experimental conditions. Nevertheless, in light of the reviewer’s suggestion, and to avoid overstatement or speculation, we have now omitted the sentence referring to recent memory impairments at 12 weeks of age from the results section.

      (2) Pertaining to Figure 3, low-resolution images of the mPFC should be provided to assess the spread of injection and the overall degree of double-positive cells.  

      We agree with the reviewer and have added images of the mPFC as a supplemental figure (sFig. 3) that show the spread of the injection. Unfortunately, it is not possible to visualize the overall degree of double-positive cells at a lower magnification (or low-resolution). Representative examples of colocalization are presented in Figure 3.

      Reviewer #2 (Public review): 

      This study presents a comprehensive investigation of remote memory deficits in the APP/PS1 mouse model of Alzheimer's disease. The authors convincingly show that these deficits emerge progressively and are paralleled by selective hyperexcitability of PV interneurons in the mPFC. Using viral-TRAP labeling and patch-clamp electrophysiology, they demonstrate that inhibitory input onto labeled engram cells is selectively increased in APP/PS1 mice, despite unaltered engram size or reactivation. These findings support the idea that alterations in inhibitory microcircuits may contribute to cognitive decline in AD. 

      However, several aspects of the study merit further clarification. Most critically, the central paradox, i.e., increased inhibitory input without an apparent change in engram reactivation, remains unresolved. The authors propose possible mechanisms involving altered synchrony or impaired output of engram cells, but these hypotheses require further empirical support. Additionally, the study employs multiple crossed transgenic lines without reporting the progression of amyloid pathology in the mPFC, which is important for interpreting the relationship between circuit dysfunction and disease stage. Finally, the potential contribution of broader network dysfunction, such as spontaneous epileptiform activity reported in APP/PS1 mice, is also not addressed. 

      We thank the reviewer for their evaluation and appreciate the positive assessment of our study’s contribution to understanding remote memory deficits and the dysfunction of inhibitory microcircuits in AD. We also acknowledge the relevant points raised and have revised the manuscript to clarify our interpretations. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Line 68: What are "APP23xPS45" mice? This is most likely a typo.

      This line is a previously reported double transgenic amyloid beta mouse model that was obtained by crossing APP23 (overexpressing human amyloid precursor protein with the Swedish double mutation at position 670/671) with PS45 (carrying a transgene for mutant Presenilin 1, G384A mutation) (Busche et al., 2008; Grienberger et al., 2012). 

      (2) Line 148: The authors should also briefly describe in the main text that APP/PS1 x SST-Cre mice were generated and used here.  

      We thank the reviewer for their comment and have added their suggestion to the main text (lines 166-168):

      ‘To do this, APP/PS1 mice were crossed with SST-Cre mice to generate APP/PS1 SST-Cre mice. Following microinjection of AAV-hSyn::DIO-mCherry into the mPFC, recordings were obtained from SST neurons.’

      (3) The discussion should be condensed because of redundancies on several occasions. For example, memory allocation is discussed starting on line 371, then again on line 392. This should be combined. Likewise, how the correlative nature of the findings about PV interneurons could be further functionally addressed is discussed on lines 413 and 454, and should be condensed into one paragraph. 

      We thank the reviewer for this suggestion and have revised the discussion to remove the redundancies as proposed.  

      Reviewer #2 (Recommendations for the authors): 

      To strengthen the manuscript, the following points should be addressed: 

      (1) Quantify amyloid pathology: It is essential to assess amyloid-β levels (soluble and insoluble) in the mPFC of APP/PS1-PV-Cre-tdTomato mice at the studied ages. This would help determine whether the observed circuit-level changes track with disease progression as seen in canonical APP/PS1 models. 

      We thank the reviewer for this valuable suggestion and agree that assessing Aβ levels in the mPFC is important to determine whether the observed circuit level alterations in APP/PS1 mice coincide with the progression of amyloid pathology. Therefore, we assessed the amyloid plaque load in the mPFC of APP/PS1 mice at 16 and 20 weeks of age (new supplemental figure sFig. 1) and observed no difference in plaque load between these two time points. This suggests that the increased excitability in the mPFC cannot be attributed to differences in plaque load (insoluble amyloid beta).

      In line with this, we previously studied both soluble and insoluble Aβ levels in the CA1 and reported that there are no differences between 12 and 16 weeks of age (Kater et al., 2023), while PV cell hyperexcitability is present at 16 weeks of age (Hijazi et al., 2020a). From 24 weeks onwards, the level of amyloid beta increases. Similarly, Végh et al. (2014) showed using immunoblotting that monomeric and low molecular weight oligomeric forms of soluble Aβ are already present as early as 6 weeks of age and become more prominent at 24 weeks of age. Although the soluble Aβ measurements were performed in the hippocampus, we think these findings can be extrapolated to cortical regions, as the APP and PS1 mutations in APP/PS1 mice are driven by a prion promoter, which should induce consistent expression across brain regions. Data from other research groups support this hypothesis (Kim et al., 2015; Zhang et al., 2011). Thus, large regional differences in soluble Aβ are not expected. The temporal progression suggests that increasing levels of soluble amyloid beta might contribute to the emergence of PV cell hyperexcitability. We have added this point to the manuscript (lines 585-591):

      ‘Since amyloid beta plaque load in the mPFC remains comparable between 16- and 20-week-old APP/PS1 mice, the observed increased excitability is unlikely the result of changes in insoluble amyloid beta levels. Previous data from our lab show that soluble amyloid beta is already present as early as 6 weeks of age and becomes more prominent at 24 weeks of age (Kater et al., 2023; Végh et al., 2014). The progressive increase in soluble amyloid beta levels may contribute to the emergence of PV cell hyperexcitability.’

      Finally, we previously compared soluble and insoluble amyloid beta levels in APP/PS1 and APP/PS1 Parv Cre mice and show that these are similar (Hijazi et al., 2020a). While our current study shows the progression of amyloid beta accumulation in APP/PS1 mice, these mice also exhibit altered microcircuitry (enhanced sIPSC frequency on engram cells) at 20 weeks of age, the same age at which we observed PV cell hyperexcitability in APP/PS1 Parv Cre tdTomato mice. This further supports the generalizability of our findings across genotypes, between APP/PS1 and APP/PS1 Parv Cre tdTomato mice. 

      (2) Examine later disease stages: Since the current effects are modest, assessing memory performance, PV cell excitability, and engram inhibition at more advanced stages could clarify whether these alterations become more pronounced with disease progression. 

      We thank the reviewer for this thoughtful suggestion. Investigating advanced disease stages could indeed provide valuable insights into whether the observed alterations in memory performance, PV cell hyperexcitability and engram inhibition become more pronounced over time. Our previous work has shown that changes in pyramidal cell excitability emerge at a later stage than in PV cells, supporting the idea of progressive circuit dysfunction (Hijazi et al., 2020a). However, at these more advanced stages, additional pathological processes, such as increased gliosis (Janota, Brites, Lemere, & Brito, 2015; Kater et al., 2023) and synaptic loss (Alonso-Nanclares, Merino-Serrais, Gonzalez, & DeFelipe, 2013; Bittner et al., 2012), will likely contribute to both electrophysiological and behavioural measurements. Furthermore, we would like to point out that the current changes observed in memory performance, PV hyperexcitability and increased inhibitory input on engram cells at 16-20 weeks of age are not modest, but already quite substantial. Our focus on these early time points in APP/PS1 mice was intentional, as it helps us understand the initial changes in Alzheimer’s disease at a circuit level and identify therapeutic targets for early intervention. What happens at later stages is certainly of interest, but beyond the scope of this study and should therefore be addressed in future studies. We have incorporated a discussion related to this point into the revised manuscript (lines 602-606):

      ‘Moreover, it is relevant to investigate whether changes in PV and PYR cell excitability, as well as input onto engram cells in the mPFC, become more pronounced at later disease stages. Nonetheless, by focussing on early disease timepoints in the present study, we aimed to understand the initial circuit-level changes in AD and identify targets for early therapeutic intervention.’

      (3) Address network hyperexcitability: Spontaneous epileptiform activity has been reported in APP/PS1 mice from 4 months of age (Reyes-Marin & Nuñez, 2017). Including EEG data or discussing this point in relation to your findings would help contextualize the observed inhibitory remodeling within broader network dysfunction. 

      We thank the reviewer for this valuable input and for highlighting the study by Reyes-Marin and Nuñez (2017). In line with this, we recently reported longitudinal local field potential (LFP) recordings in freely behaving APP/PS1 Parv-Cre mice and wild type control animals between the ages of 3 and 12 months (van Heusden et al., 2023). Weekly recordings were performed in the home cage under awake mobile conditions. These data showed no indications of epileptiform activity during wakefulness, consistent with previous findings that epileptic discharges in APP/PS1 mice predominantly occur during sleep (Gureviciene et al., 2019). Recordings were obtained from the prefrontal cortex (PFC), parietal cortex and the hippocampus. In contrast, the study by Reyes-Marin and Nuñez (2017) recorded from the somatosensory cortex in anesthetized animals. Here, during spontaneous recordings, no differences were observed in delta, theta or alpha frequency bands between APP/PS1 and WT mice. Interestingly, we observed an early increase in absolute power, particularly in the hippocampus and parietal cortex from 12 to 24 weeks of age in APP/PS1 mice. In the PFC we found a shift in relative power from lower to higher frequencies and a reduction in theta power. Connectivity analyses revealed a progressive, age-dependent decline in theta/alpha coherence between the PFC and both the parietal cortex and hippocampus. Given the well-established role of PV interneurons in network synchrony and in coordinating theta and gamma oscillations critical for cognitive function (Sohal, Zhang, Yizhar, & Deisseroth, 2009; Xia et al., 2017), these findings support the idea of early circuit dysfunction in APP/PS1 mice. Our findings, i.e. hyperexcitability of PV cells, align with these LFP-based network-level observations. These data suggest an early shift in the E/I balance, contributing to altered oscillatory dynamics and impaired inter-regional connectivity, possibly leading to alterations in memory. 
      However, whether the observed PV hyperexcitability in our study directly contributes to alterations in power and synchrony remains to be elucidated. Furthermore, it would be interesting to determine the individual contribution of PV cell hyperexcitability in the hippocampus versus the mPFC to network changes and concurrent memory deficits. We have added a statement on network hyperexcitability to the discussion (lines 561-565): 

      ‘Interestingly, we recently found a progressive disruption of oscillatory network synchrony between the mPFC and hippocampus in APP/PS1 Parv-Cre mice (van Heusden et al., 2023). However, whether the observed PV cell hyperexcitability directly contributes to changes in inter-regional synchrony, and whether this leads to alterations at a network level, i.e. increased inhibitory input on engram cells, and consequently to memory deficits, remains to be elucidated in future studies.’ 

      (4) Mechanisms responsible for PV hyperexcitability: Related to the previous point, a discussion of the possible underlying mechanisms, e.g., direct effects of amyloid-β, inflammatory processes, or compensatory mechanisms, would strengthen the discussion. 

      We agree with the reviewer that this will strengthen the discussion. We have now added a comprehensive discussion in the revised manuscript to address potential mechanisms responsible for PV cell hyperexcitability (lines 579-594):

      ‘Prior studies have shown that neurons in the vicinity of amyloid beta plaques show increased excitability (Busche et al., 2008). We demonstrated that PV neurons in the CA1 are hyperexcitable and that treatment with a BACE1 inhibitor, i.e. reducing amyloid beta levels, rescues PV excitability (Hijazi et al., 2020a). In line with this, we also reported that addition of amyloid beta to hippocampal slices increases PV excitability, without altering pyramidal cell excitability (Hijazi et al., 2020a). Finally, applying amyloid beta to an induced mouse model of PV hyperexcitability further impairs PV function (Hijazi et al., 2020b). Since amyloid beta plaque load in the mPFC remains comparable between 16- and 20-week-old APP/PS1 mice, the observed increased excitability is unlikely the result of changes in insoluble amyloid beta levels. Previous data from our lab show that soluble amyloid beta is already present as early as 6 weeks of age and becomes more prominent at 24 weeks of age (Kater et al., 2023; Végh et al., 2014). The progressive increase in soluble amyloid beta levels may contribute to the emergence of PV cell hyperexcitability. We hypothesize that the hyperexcitability induced by amyloid beta may result from disrupted ion channel function, as PV neuron dysfunction can result from altered potassium (Olah et al., 2022) and sodium channel activity (Verret et al., 2012).’

      (5) Excitatory-inhibitory balance: While the main focus is on increased inhibition onto engram cells, the reported increase in sEPSC frequency (Figure 5g) across genotypes suggests the presence of excitatory remodelling as well. A brief discussion of how this may interact with increased inhibition would be valuable.  

      We thank the reviewer for this comment regarding the interaction between excitatory and inhibitory remodelling. We have now incorporated this discussion point into the revised manuscript (lines 528-534):

      ‘Interestingly, both WT and APP/PS1 mice showed an increase in sEPSC frequency onto engram cells, suggesting that increased excitatory input is a consequence of memory retrieval and not affected by genotype. However, only in APP/PS1 mice, the augmented excitatory input coincided with an elevation of inhibitory input onto engram cells. The resulting imbalance between excitation and inhibition could therefore potentially disrupt the precise control of engram reactivation and contribute to the observed remote memory impairment.’

      References

      Alonso-Nanclares, L., Merino-Serrais, P., Gonzalez, S., & DeFelipe, J. (2013). Synaptic changes in the dentate gyrus of APP/PS1 transgenic mice revealed by electron microscopy. J Neuropathol Exp Neurol, 72(5), 386-395. doi:10.1097/NEN.0b013e31828d41ec

      Bittner, T., Burgold, S., Dorostkar, M. M., Fuhrmann, M., Wegenast-Braun, B. M., Schmidt, B., . . . Herms, J. (2012). Amyloid plaque formation precedes dendritic spine loss. Acta Neuropathologica, 124(6), 797-807. doi:10.1007/s00401-012-1047-8

      Busche, M. A., Eichhoff, G., Adelsberger, H., Abramowski, D., Wiederhold, K. H., Haass, C., . . . Garaschuk, O. (2008). Clusters of hyperactive neurons near amyloid plaques in a mouse model of Alzheimer's disease. Science, 321(5896), 1686-1689. doi:10.1126/science.1162844

      Grienberger, C., Rochefort, N. L., Adelsberger, H., Henning, H. A., Hill, D. N., Reichwald, J., . . . Konnerth, A. (2012). Staged decline of neuronal function in vivo in an animal model of Alzheimer's disease. Nat Commun, 3, 774. doi:10.1038/ncomms1783

      Gureviciene, I., Ishchenko, I., Ziyatdinova, S., Jin, N., Lipponen, A., Gurevicius, K., & Tanila, H. (2019). Characterization of Epileptic Spiking Associated With Brain Amyloidosis in APP/PS1 Mice. Front Neurol, 10, 1151. doi:10.3389/fneur.2019.01151

      Hijazi, S., Heistek, T. S., Scheltens, P., Neumann, U., Shimshek, D. R., Mansvelder, H. D., . . . van Kesteren, R. E. (2020a). Early restoration of parvalbumin interneuron activity prevents memory loss and network hyperexcitability in a mouse model of Alzheimer's disease. Mol Psychiatry, 25(12), 3380-3398. doi:10.1038/s41380-019-0483-4

      Hijazi, S., Heistek, T. S., van der Loo, R., Mansvelder, H. D., Smit, A. B., & van Kesteren, R. E. (2020b). Hyperexcitable Parvalbumin Interneurons Render Hippocampal Circuitry Vulnerable to Amyloid Beta. iScience, 23(7), 101271. doi:10.1016/j.isci.2020.101271

      Janota, C. S., Brites, D., Lemere, C. A., & Brito, M. A. (2015). Glio-vascular changes during ageing in wild-type and Alzheimer's disease-like APP/PS1 mice. Brain Res, 1620, 153-168. doi:10.1016/j.brainres.2015.04.056

      Kater, M. S. J., Huffels, C. F. M., Oshima, T., Renckens, N. S., Middeldorp, J., Boddeke, E., . . . Verheijen, M. H. G. (2023). Prevention of microgliosis halts early memory loss in a mouse model of Alzheimer's disease. Brain Behav Immun, 107, 225-241. doi:10.1016/j.bbi.2022.10.009

      Kim, H. Y., Kim, H. V., Jo, S., Lee, C. J., Choi, S. Y., Kim, D. J., & Kim, Y. (2015). EPPS rescues hippocampus-dependent cognitive deficits in APP/PS1 mice by disaggregation of amyloid-β oligomers and plaques. Nature Communications, 6(1), 8997. doi:10.1038/ncomms9997

      Olah, V. J., Goettemoeller, A. M., Rayaprolu, S., Dammer, E. B., Seyfried, N. T., Rangaraju, S., . . . Rowan, M. J. M. (2022). Biophysical Kv3 channel alterations dampen excitability of cortical PV interneurons and contribute to network hyperexcitability in early Alzheimer’s. Elife, 11, e75316. doi:10.7554/eLife.75316

      Reyes-Marin, K. E., & Nuñez, A. (2017). Seizure susceptibility in the APP/PS1 mouse model of Alzheimer's disease and relationship with amyloid β plaques. Brain Res, 1677, 93-100. doi:10.1016/j.brainres.2017.09.026

      Sohal, V. S., Zhang, F., Yizhar, O., & Deisseroth, K. (2009). Parvalbumin neurons and gamma rhythms enhance cortical circuit performance. Nature, 459(7247), 698-702. doi:10.1038/nature07991

      van Heusden, F. C., van Nifterick, A. M., Souza, B. C., França, A. S. C., Nauta, I. M., Stam, C. J., . . . van Kesteren, R. E. (2023). Neurophysiological alterations in mice and humans carrying mutations in APP and PSEN1 genes. Alzheimers Res Ther, 15(1), 142. doi:10.1186/s13195-023-01287-6

      Végh, M. J., Heldring, C. M., Kamphuis, W., Hijazi, S., Timmerman, A. J., Li, K. W., . . . van Kesteren, R. E. (2014). Reducing hippocampal extracellular matrix reverses early memory deficits in a mouse model of Alzheimer's disease. Acta Neuropathol Commun, 2, 76. doi:10.1186/s40478-014-0076-z

      Verret, L., Mann, E. O., Hang, G. B., Barth, A. M., Cobos, I., Ho, K., . . . Palop, J. J. (2012). Inhibitory interneuron deficit links altered network activity and cognitive dysfunction in Alzheimer model. Cell, 149(3), 708-721. doi:10.1016/j.cell.2012.02.046

      Xia, F., Richards, B. A., Tran, M. M., Josselyn, S. A., Takehara-Nishiuchi, K., & Frankland, P. W. (2017). Parvalbumin-positive interneurons mediate neocortical-hippocampal interactions that are necessary for memory consolidation. Elife, 6. doi:10.7554/eLife.27868

      Zhang, W., Hao, J., Liu, R., Zhang, Z., Lei, G., Su, C., . . . Li, Z. (2011). Soluble Aβ levels correlate with cognitive deficits in the 12-month-old APPswe/PS1dE9 mouse model of Alzheimer's disease. Behavioural Brain Research, 222(2), 342-350. doi:10.1016/j.bbr.2011.03.072

  2. minio.la.utexas.edu
    1. To put it in the terms of St. Thomas Aquinas: An unjust law is a human law that is not rooted in eternal law and natural law.

      My favorite law is LAW, love always wins.

    1. La ceguera de género en las encuestas mexicanas sobre discriminación hacia a las personas inmigrantes
      1. “by not incorporating a gender perspective into the questionnaires, they portray a mistaken reality.” p. 181 The author points out that the failure to incorporate a gender perspective into measurement instruments leads the questionnaires of surveys on migration and discrimination to produce incomplete data. This means that the results do not reflect systematic differences between migrant men and women, which in turn limits the ability to fully understand the dynamics of gender-based discrimination within the migration phenomenon. In short, if the surveys do not ask “Are you a migrant man or woman?” or “How does your gender affect your migration experience?”, they overlook that migrant women may be experiencing different or additional forms of discrimination. It is as if we looked only at the visible part of the iceberg and believed that everything that matters is up there.

      2. “the way in which gender intersects with other forms of inequality, such as nationality or social class, is not considered.” p. 180 The text stresses that gender blindness not only ignores the differences between men and women but also disregards intersectionality. That is, gender intersects with other structural factors (such as class, ethnicity, or migration status) that aggravate conditions of vulnerability. Omitting it prevents us from understanding how migrant women face multiple, simultaneous forms of discrimination. This seems key to me, because not all migrant women live the same experience: a poor Central American woman is not in the same position as a foreign woman with more resources. Ignoring those differences is like saying that discrimination affects everyone equally, and that erases the inequalities that weigh most heavily in real life.

      3. “that of a country to which only a male migration arrives and through which only a male migration passes.” p. 180 In describing the effect of “gender blindness”, the author observes that the surveys reproduce a masculinized migratory imaginary, since it is implicitly assumed that migration is mostly male. This biased view prevents recognition of the presence, situation, and specific needs of migrant women, reproducing gender invisibility in the data and, consequently, in the public policies that could derive from those data. It makes me think that often, when we hear the words “the migrants”, we may automatically think of young men. But what about the women who migrate alone or with their families? If the instruments do not capture them, it is as if they were not even considered. And that has consequences, because what is not measured is not made visible, and what is not made visible is rarely addressed.

    2. Of the many myths that exist about Mexico and its society, perhaps two of the most perverse are that racism does not exist and that we are a country of open doors to foreign immigration.

      I agree that it is a myth that racism does not exist. Although many people believe this or are skeptical of the idea, it is important to look at it more critically, since in Mexico there is indeed racism and discrimination.

    1. Art. 735

      REsp 1747637 / SP

      CIVIL LAW. SPECIAL APPEAL. ACTION FOR COMPENSATION FOR MORAL DAMAGES. LEWD ACT COMMITTED AGAINST A FEMALE PASSENGER INSIDE A METRO TRAIN IN THE CITY OF SÃO PAULO/SP ("SEXUAL HARASSMENT"). LIABILITY OF THE CARRIER. CAUSAL LINK. BREAK. EXCLUSIVE ACT OF A THIRD PARTY. CONNECTION WITH THE TRANSPORT ACTIVITY. LIABILITY OF CPTM. 1. Action filed on 02/07/2014. Special appeal lodged on 28/10/2015 and assigned to the Chambers on 31/03/2017. 2. The purpose of the appeal is to determine whether the metro concessionaire of the city of São Paulo/SP must answer for the moral damages suffered by a passenger who was the victim of a lewd act or sexual harassment committed by another user inside a carriage. 3. The safety clause (cláusula de incolumidade) is inherent in the contract of carriage and entails an <u>obligation of result</u> on the carrier, consisting in taking the passenger comfortably and safely to the destination, unless a cause excluding the causal link is shown, notably a fortuitous event, force majeure, or the exclusive fault of the victim or of a third party. 4. An act of a third party, depending on how it presents itself, may or may not break the causal link. The carrier's liability is excluded when the conduct of the third party, being the sole cause of the harmful event, bears no relation to the organization of the business and the risks of the transport activity, and is thus equated with an external fortuitous event (fortuito externo). Conversely, the fault of a third party is not capable of breaking the causal link when it is connected to the economic activity and to the risks inherent in its operation, characterizing an internal fortuitous event (fortuito interno). 5. In this case, as stated in the judgment under appeal, the appellant was the victim of a lewd act committed by another train passenger during the journey, that is, a set of acts referred to as sexual harassment. 6. It is evident that being exposed to sexual harassment violates the clause guaranteeing the physical and psychological safety of a passenger of a passenger transport service. 7. In the case under judgment, the occurrence of the sexual harassment is connected with the services provided by the respondent CPTM and, because it is an internal fortuitous event, the passenger carrier remains strictly liable for the damages caused to the appellant. Precedent. 8. Special appeal denied.


      AgInt no AgInt no AREsp 2152026 / CE

      CIVIL. INTERNAL APPEAL IN INTERNAL APPEAL IN INTERLOCUTORY APPEAL IN SPECIAL APPEAL. NO VIOLATION OF ARTS. 489 AND 1.022 OF THE CPC. NO OMISSIONS. COMPENSATION FOR MORAL AND AESTHETIC DAMAGES. TRAFFIC ACCIDENT. PUBLIC TRANSPORT. STRICT CONTRACTUAL LIABILITY. PASSENGER SAFETY CLAUSE. NO GROUND EXCLUDING LIABILITY IN THE CASE AT HAND. THIRD-PARTY FAULT. INTERNAL FORTUITOUS EVENT. RISK OF THE ACTIVITY. AMOUNT OF COMPENSATION. EXCESS NOT SHOWN. RE-EXAMINATION OF FACTS AND EVIDENCE. IMPOSSIBILITY. SÚMULA NO. 7/STJ. 1. Omission and denial of adjudication are not recognized when all questions submitted to the court have been examined, with reasons, to the extent necessary to resolve the dispute, even if the outcome is contrary to the party's claim. No violation of arts. 489 and 1.022 of the CPC. 2. Under this Court's case law, the carrier's liability toward passengers is contractual and strict, and can be set aside only by an external fortuitous event, force majeure, an act exclusively attributable to the victim, or an intentional act exclusively attributable to a third party, when that act bears no connection with the transport activity. Precedents. 3. A negligent act of a third party that is connected with the carrier's activity and related to the risks proper to the business characterizes an internal fortuitous event (fortuito interno), which cannot exclude the carrier's liability. 4. A case in which traffic accidents are a risk inherent in carrying on the economic activity, so that, even when caused exclusively by the negligent act of a third party, they are treated as internal fortuitous events, incapable of excluding the carrier's civil liability for the safety of its passengers. 5. The amount awarded as civil reparation observed the criteria of proportionality and reasonableness and is compatible with the circumstances described in the judgment; any reduction would therefore require a re-examination of facts and evidence, which is barred in a special appeal by Súmula No. 7/STJ. Internal appeal dismissed.


      1. "In line with this Court's precedents, accidents occurring on highways, even through the exclusive fault of third parties, are considered internal fortuitous events and are therefore incapable of removing the carrier's civil liability." (AgRg nos EDcl no REsp 1318095/MG, Rel. Ministro SIDNEI BENETI, TERCEIRA TURMA, decided 19/06/2012, DJe 27/06/2012).
    2. authorization

      Note that the transfer of a business establishment (trespasse), by itself, automatically bars the seller from competing.

      It is not the contract that prohibits competition, but the law itself. The parties may, however, authorize the seller to compete.

    3. consent shall be deemed given

      In the case of a mortgage, there is an exception to the rule of art. 299, sole paragraph, since silence in that situation is interpreted as consent.

    4. silence

      In the context of transfer of obligations, as regards assumption of debt, silence protects the creditor. That is, if the creditor does not expressly agree to the assumption of debt, the silence is interpreted as refusal.

    5. the risks shall be for his account

      If the buyer orders the thing to be transported, it is at the buyer's own expense and risk. The exception is when the seller fails to follow the buyer's instructions and causes loss.

    6. place where it was located

      If a contract of sale does not stipulate the place of delivery (tradição), that place is presumed to be where the thing is located.

    7. unforeseeable

      CIVIL PROCEDURE. SPECIAL APPEAL. ACTION FOR REVISION OF A LEASE BETWEEN A SHOPPING CENTER AND A TENANT RETAILER. SUPERVENING COVID-19 PANDEMIC. CONTRACTS BETWEEN PARTIES ON EQUAL FOOTING. GENERAL RULE. PRINCIPLE OF PACTA SUNT SERVANDA. POSSIBILITY OF REVISION. EXCEPTIONAL CASES. PROVISION OF ART. 317 OF THE CIVIL CODE. THEORY OF UNFORESEEABILITY (TEORIA DA IMPREVISÃO). ART. 478 OF THE CIVIL CODE. THEORY OF EXCESSIVE ONEROUSNESS. TERMINATION. SYSTEMATIC AND TELEOLOGICAL INTERPRETATION OF THE PROVISION, WHICH ALSO AUTHORIZES REVISION. COVID-19 PANDEMIC THAT CONSTITUTES, IN THEORY, AN UNFORESEEABLE AND EXTRAORDINARY EVENT CAPABLE OF ALLOWING REVISION OF THE LEASE, PROVIDED THE OTHER LEGAL REQUIREMENTS ARE MET. CASE AT HAND. LACK OF PROOF. DECISION UNDER APPEAL UPHELD.

      • 1. Action for revision of a lease between a shopping center and a tenant, filed on 20/4/2020, from which this special appeal was taken; the appeal was lodged on 30/8/2022 and sent to chambers on 20/10/2022.

      • 2. The purpose of the appeal is to decide whether a lease between a shopping center and a tenant may be revised on the basis of the theories of unforeseeability (art. 317 of the CC) and excessive onerousness (art. 478 of the CC), because of the supervening coronavirus pandemic.

      • 3. In business contracts, special weight must be given to the principles of freedom of contract and pacta sunt servanda, guidelines enshrined in art. 421, caput, and art. 421-A of the Civil Code, as introduced by Law No. 13.874/2019.

      • 4. Nevertheless, the same statute sets out grounds for revision and termination of contracts (arts. 317, 478, 479 and 480 of the CC). With support in legal scholarship, art. 317 is a general clause for <u>revision</u> of the contractual obligation, and a systematic and teleological interpretation of arts. 478, 479 and 480 also authorizes judicial revision of what was agreed.

      • 5. The <u>Theory of Unforeseeability</u> (art. 317 of the CC), of French origin, requires proof of the following: (I) an obligation to be performed at a time later than that of its origin; (II) a supervening unforeseeable event; (III) that causes a manifest disproportion between the value of the obligation owed and its value at the time of performance. At the party's request, the judge may correct the value of the obligation so as to ensure, as far as possible, its real value.

      • 6. The <u>Theory of Excessive Onerousness</u> (art. 478 of the CC), of Italian origin, presupposes (I) contracts of continuing or deferred performance; (II) a supervening extraordinary and unforeseeable event; (III) that renders the obligation excessively onerous for one of the parties; (IV) extreme advantage for the other; and (V) that the excessive onerousness is not attributable to the injured party.

      Possibility of relaxing the "extreme advantage" requirement.

      • 7. The Covid-19 pandemic is an unprecedented health crisis, which not only put countless lives at risk but also, regrettably, resulted in their loss. In that emergency, public authorities were empowered, within their respective competences, to adopt the measures necessary to try to preserve, as far as possible, people's health and lives (Law No. 13.979/2020). In that context, federative entities decreed the suspension of activities and of the operation of commercial and industrial establishments (lockdown), including, for example, in-person service to the public in shopping centers, with the supermarkets, laboratories, health clinics and pharmacies located in them often being excepted.

      • 8. The pandemic does not, by itself, justify non-performance of the obligation, but it is a circumstance that, because of its unforeseeability, its extraordinary nature and its serious impact on the world socioeconomic situation, cannot be disregarded by the contracting parties or by the Judiciary. Accordingly, the revision of contracts between parties on equal footing on the basis of events arising from the pandemic cannot be conceived in the abstract; it always depends on an analysis of the contractual relationship established between the parties, and it is essential that the pandemic has interfered substantially and detrimentally in the business relationship.

      • 9. The supervening of a disease spread worldwide, which, in the attempt to contain it, caused a genuine economic lockdown and social isolation, qualifies as an unforeseeable event, since it was not foreseen, known or examined by the parties when the legal transaction was concluded, and as an extraordinary one, since it lies far from the risk (álea) and the consequences inherent in and objectively linked to the contract.

      • 10. It follows that the pandemic caused by Covid-19 can be classified as an unforeseeable and extraordinary event capable of authorizing the revision of rents in contracts between the shopping center and its tenants, provided the other legal requirements of art. 317 or art. 478 of the Civil Code are met.

      • 11. Along the same lines, this Court allowed a proportional revision of rent because of the particular consequences of the Covid-19 pandemic for a coworking company whose revenue was drastically reduced during the pandemic (REsp 1.984.277/DF, Fourth Panel, DJe 9/9/2022).

      • 12. A case in which the factual context set out by the court of origin, which is sovereign in examining the facts and evidence, shows that no imbalance in the lease relationship between the shopping center (respondent) and the tenant (appellant) has been characterized, since neither disproportion (art. 317) nor excessive onerousness (art. 478) of the obligation was found in concreto. On the contrary, the state court's judgment states that the respondent granted a substantial discount on the rent because of the pandemic-related suspension of economic activities. In the absence of the legal requirements, the contract cannot be revised. The decision must be upheld.

      • 13. Special appeal heard and dismissed.

      (REsp No. 2.032.878/GO, Reporting Justice Nancy Andrighi, Third Panel, decided 18/4/2023, DJe 20/4/2023.)

    8. impossible conditions

      IMPOSSIBLE CONDITIONS

      • If suspensi<u>V</u>e: they in<u>V</u>alidate the legal transaction;
      • If resolu<u>T</u>ive: they are nonexis<u>T</u>en<u>T</u> in the legal transaction.

      This logic makes complete sense, considering that:

      • A resolutive clause does not prevent the exercise or acquisition of the right. Once the resolutive condition is fulfilled, the transaction is extinguished. Thus, if the resolutive clause is impossible, it is simply struck from the legal transaction, since it will not affect it and will never actually occur.

      • The situation is different for a suspensive clause, which prevents acquisition of the right until it is fulfilled. Hence, if a legal transaction is subject to an impossible suspensive clause, it will never come into effect, which is why the law declares the agreement invalid.

    1. Why 2025 Signals the End of the Traditional Event Loop. With multishot receives and accepts, an io_uring-based server can initialize its I/O intents once and then simply process completions in a loop. The “event loop” becomes a completion-processing loop,

      The section header says 2025, but Linux 6.0 was released in October 2022. Still, it is an exciting shift, and phase shifts tend to happen slowly.
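      The "arm once, then only process completions" idea in the quoted passage can be illustrated with a toy model. This is only a sketch: `ToyCompletionQueue`, `arm_multishot`, `post`, and `run_completion_loop` are hypothetical names invented for illustration and are not the io_uring or liburing API; the "kernel" here is simulated by plain method calls.

```python
from collections import deque


class ToyCompletionQueue:
    """Toy stand-in for a completion queue. Hypothetical; not the io_uring API."""

    def __init__(self):
        self._armed = set()
        self._completions = deque()

    def arm_multishot(self, op):
        # Registered once; stays active across many completions.
        self._armed.add(op)

    def post(self, op, payload):
        # Simulates the kernel posting a completion for an armed operation.
        if op in self._armed:
            self._completions.append((op, payload))

    def drain(self):
        # Yield completions in arrival order until the queue is empty.
        while self._completions:
            yield self._completions.popleft()


def run_completion_loop(queue, handlers):
    """Dispatch each completion to its handler; nothing is re-armed in the loop."""
    return [handlers[op](payload) for op, payload in queue.drain()]


def demo():
    queue = ToyCompletionQueue()
    queue.arm_multishot("accept")  # armed once, up front
    queue.arm_multishot("recv")
    queue.post("accept", 7)
    queue.post("recv", b"ping")
    queue.post("recv", b"pong")
    queue.post("send", b"ignored")  # never armed, so no completion is queued
    handlers = {
        "accept": lambda fd: f"accepted fd {fd}",
        "recv": lambda data: f"received {data.decode()}",
    }
    return run_completion_loop(queue, handlers)
```

      The point of the sketch is the shape of the loop: the two operations are registered a single time, and the loop body only consumes completions, with no per-event resubmission step.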

    1. This part of the report aims to answer one of the most frequent questions in the housing debate: Is there a shortage of flats in Czechia? And if so, how many are missing?

      Given what the report ultimately concluded, it would have been tactically wiser to tone these aims down a little.

  3. www.planalto.gov.br
    1. five days

      Same provision as in the CPC for cases where no specific time limit exists. Accordingly, the act must be performed within 5 days.

    1. I

      ADMINISTRATIVE LAW. ADMINISTRATIVE STATEMENT NO. 2/STJ. STATE CIVIL SERVANT. DISABILITY RETIREMENT. REVERSION. GROUNDS FOR THE INCAPACITY FOR WORK NO LONGER SUBSIST. POSSIBILITY. LAPSE OF RIGHT (DECADÊNCIA). NOT OCCURRED. ACTIO NATA THEORY. - 1. There are no obstacles to hearing the special appeals submitted to this Superior Court by the State and by the appellant Assembly. - 2. Disability retirement is <u>temporary</u> in nature. - 3. Once it is found that the grounds for the incapacity for work no longer subsist, the Public Administration must return the servant retired for disability to public service. - 4. "A servant retired for disability may be summoned at any time for reassessment of the conditions that gave rise to the retirement, with reversion and return to activity taking place when the official medical board declares that the grounds for retirement no longer subsist (...)" (MS 15.141/DF, Rel. Ministro HAMILTON CARVALHIDO, CORTE ESPECIAL, DJe 24/05/2011). - 5. The claim arises only upon <u>knowledge that the grounds no longer subsist</u> for the retirement, since this is not a case of annulment or revocation of the original granting act. - 6. "The limitation period for the right to claim begins to run only when the holder of the violated subjective right comes to know the fact and the extent of its consequences, in accordance with the 'actio nata' principle" (REsp 1257387/RS, Rel. Ministra ELIANA CALMON, SEGUNDA TURMA, DJe 17/09/2013). - 7. Motions for clarification received as internal appeals (agravos regimentais); internal appeals dismissed.

      (EDcl no REsp No. 1.443.365/SC, Reporting Justice Mauro Campbell Marques, Second Panel, decided 10/5/2016, DJe 16/5/2016.)

    1. Author response:

      Reviewer #1 (Public review):

      Major Concerns:

      (1) Lack of Direct Evidence for RadD-NKp46 Interaction

      The central claim that RadD interacts with NKp46 is not formally demonstrated. A direct binding assay (e.g., Biacore, ELISA, or pull-down with purified proteins) is essential to support this assertion. The absence of this fundamental experiment weakens the mechanistic conclusions of the study.

      The reviewer is correct. Direct binding assays are currently not feasible because RadD is a huge protein that would take years to purify. Instead, we performed immunoprecipitation assays with NKp46-Ig (Author response images 1 and 2). Fusobacteria were lysed using RIPA buffer, and the lysates were centrifuged twice to separate the supernatant from the pellet (which contains the bacterial membranes). The resulting lysates were incubated overnight with 2.5 µg of purified NKp46 and protein G beads. After thorough washing, the bound proteins were placed in sample buffer and heated at 95 °C for 8 minutes. The eluates were run on a 10% acrylamide gel and visualized by Coomassie blue staining. As can be seen, NKp46-Ig precipitated a protein band of around 350 kDa in both F. polymorphum ATCC10953 (Author response image 1) and F. nucleatum ATCC23726 (Author response image 2).

      Author response image 1. NKp46 immunoprecipitation with Fusobacterium polymorphum (ATCC 10953) lysates. The resulting supernatant and pellet lysates of Fusobacterium were immunoprecipitated (IP) with 2.5 μg of control fusion protein (RBD-Ig) or with NKp46-Ig. 2.5 μg of purified fusion proteins were also run on the gel.

      Author response image 2. NKp46 immunoprecipitation with Fusobacterium nucleatum (ATCC 23726) lysates. The resulting supernatant and pellet lysates of Fusobacterium were immunoprecipitated (IP) with 2.5 μg of control fusion protein (RBD-Ig) or with NKp46-Ig. 2.5 μg of purified fusion proteins were also run on the gel.

      (2) Figure 2: Binding Specificity and Bacterial Strains

      A CEACAM1-Ig control should be included in all binding experiments to distinguish between specific and non-specific Ig interactions. There is differential Ig binding between strains ATCC 23726 and 10953. The authors should quantify RadD expression in each strain to determine if the difference in binding is due to variation in RadD levels.

      No significant difference in mCEACAM-1-Ig binding was observed across multiple independent experiments. Author response image 3 shows a representative histogram showing mCEACAM-1-Ig binding to F. nucleatum ATCC 23726 and F. polymorphum ATCC 10953. Comparable binding levels were detected in both bacterial species (upper histogram). Similarly, NKp46-Ig and Ncr1-Ig fusion proteins exhibited comparable binding patterns (lower histogram). It is currently not possible to quantify RadD expression directly, as no anti-RadD antibody is available.

      Author response image 3. CEACAM-1 Ig binding to Fusobacterium ATCC 23726 and ATCC 10953. Upper histograms show staining with secondary antibody alone (gray) compared to CEACAM-1 Ig (black line). Lower histograms show binding of NKp46 and Ncr1 fusion proteins to the two Fusobacterium strains. Gray represents secondary antibody controls.

      (3) Figure 3: Flow Cytometry Inconsistencies and Missing Controls

      What do the FITC-negative, Ig-negative events represent? The authors should clarify whether these are background signals, bacterial aggregates, or debris.

      We now present the gating strategy used in these experiments (Author response image 4). The Ig-negative (fusion protein-negative) samples were bacterial samples stained only with the APC secondary antibody (anti-human AF647). The FITC-negative events represent unlabeled bacteria.

      Author response image 4. Gating strategy for FITC-labeled Fusobacterium stained with fusion proteins. Bacteria were first gated as shown in the left panel. The gated population was then further analyzed in the right plot: the lower-left quadrant represents bacterial debris, the upper-left quadrant corresponds to FITC-stained bacteria only, and the upper-right quadrant shows bacteria double-positive for FITC and APC, indicating binding of the fusion proteins.

      Panel B, CEACAM1-Ig binding appears markedly increased compared to WT bacteria. The reason for this enhancement should be discussed-does it reflect upregulation of the bacterial ligand or an artifact of overexpression? Fluorescence compensation should be carefully reviewed for the NKp46/NCR1-Ig binding assays to ensure that the signals are not due to spectral overlap or nonspecific binding. Importantly, binding experiments using the FadI/RadD double knockout strain are missing and should be included. This control is essential.

      We do not know why CEACAM1-Ig binding is increased. We agree that the FadI/RadD double-knockout strain would be a valuable control, but we do not currently have it.

      In Panel E, the basis for calculating fold-change in MFI is unclear. Please indicate the reference condition to which the change is normalized.

      The mean fluorescence intensity (MFI) fold change was calculated by dividing the MFI obtained from staining with the fusion proteins by the MFI of the corresponding secondary antibody control (bacteria incubated without fusion proteins).
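      The normalization described above can be written as a one-line calculation. The following is a minimal sketch, assuming MFI values are available as plain numbers; the function name `mfi_fold_change` and the example values are hypothetical and not taken from the manuscript.

```python
def mfi_fold_change(mfi_fusion_protein: float, mfi_secondary_only: float) -> float:
    """Fold change = MFI with fusion protein / MFI of the secondary-antibody control.

    The control is the same bacteria incubated without fusion protein.
    """
    if mfi_secondary_only <= 0:
        # A non-positive control MFI would make the ratio meaningless.
        raise ValueError("secondary-antibody control MFI must be positive")
    return mfi_fusion_protein / mfi_secondary_only


# Hypothetical illustrative values, not data from the study.
example = mfi_fold_change(1200.0, 300.0)
```

      With these illustrative numbers the fold change is 4.0, that is, staining with the fusion protein is four times brighter than the secondary-only background.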

      (4) Figure 4: Binding Inhibition and Receptor Sensitivity

      Panel A lacks representative FACS plots and is currently difficult to interpret.

      Fusobacteria binding to CEACAM-1, NKp46, and NCR1 fusion proteins was tested in the presence of 5 and 10 mM L-arginine (Author response image 5). L-arginine inhibited the binding of NKp46-Ig and NCR1-Ig, whereas no effect was observed on CEACAM-1-Ig binding.

      Author response image 5. Fusobacterium binding inhibition by L-Arginine. The figure shows the binding of CEACAM1-Ig (left panel), NKp46-Ig (middle panel), and Ncr1-Ig (right panel) in the presence of 0 mM (black), 5 mM (red), and 10 mM (blue) L-arginine.

      Differences in the sensitivity of human vs. mouse NKp46 to arginine inhibition should be discussed, given species differences in receptor-ligand interactions.

      Ncr1, the murine orthologue of human NKp46, shares approximately 58% sequence identity with its human counterpart (1). The observed differences in arginine-mediated inhibition of bacterial binding between mouse and human NKp46 might stem from structural differences or distinct posttranslational modifications, such as glycosylation. Indeed, prediction algorithms combined with high-performance liquid chromatography analysis revealed that Ncr1 possesses two putative novel O-glycosylation sites, of which only one is conserved in humans (2).

      References

      (1) Biassoni R., Pessino A., Bottino C., Pende D., Moretta L., Moretta A. The murine homologue of the human NKp46, a triggering receptor involved in the induction of natural cytotoxicity. Eur J Immunol. 1999 Mar; 29(3).

      (2) Glasner A., Roth Z., Varvak A., Miletic A., Isaacson B., Bar-On Y., Jonjić S., Khalaila I., Mandelboim O. Identification of putative novel O-glycosylations in the NK killer receptor Ncr1 essential for its activity. Cell Discov. 2015 Dec 22; 1:15036.

      What are the inhibition results using F. nucleatum strains deficient in FadI?

      The inhibition pattern observed in the F. nucleatum ΔFadI mutant was comparable to that of the wild-type strain (Author response image 6). When cultured under identical conditions and exposed to increasing concentrations of arginine (0, 5, and 10 mM), the F. nucleatum ΔFadI strain also demonstrated a dose-dependent reduction in binding to NKp46 and Ncr1.

      Author response image 6. Arginine inhibition of NKp46-Ig and Ncr1-Ig binding in F. nucleatum ΔFadI. Histograms show NKp46-Ig (A, C) and Ncr1-Ig (B, D) binding to F. nucleatum ATCC10953 ΔFadI (A and B) and to F. nucleatum ATCC23726 ΔFadI (C and D) following exposure to 5 mM and 10 mM L-arginine. Panels (E) and (F) display the mean fluorescence intensity (MFI) quantification corresponding to (A and B) and (C and D), respectively.

      In Panel B, CEACAM1-Ig and RadD-deficient bacteria must be included as negative controls for binding specificity upon anti-NKp46 blocking.

      We appreciate the request to include CEACAM1-Ig and RadD-deficient bacteria as negative controls for specificity under anti-NKp46 blocking. We do not think this is necessary, for several reasons. The 02 antibody is specific for NKp46; we used other anti-NKp46 antibodies, which did not block the interaction, as well as an irrelevant antibody; and we showed that arginine produced a dose-dependent reduction in NKp46/Ncr1 binding, consistent with an arginine-inhibitable RadD interaction already shown in our manuscript (Fig. 4A). The ΔRadD strains we used already demonstrate loss of NKp46/Ncr1 binding and loss of NK-boosting activity (Figs. 3, 5). Collectively, these data establish that NKp46/Ncr1 recognition of a high-molecular-weight ligand consistent with RadD is specific and functionally relevant.

      (5) Figure 5: Functional NK Activation and Tumor Killing

      In Panels B and C, the key control condition (NK cells + anti-NKp46, without bacteria) is missing. This is needed to evaluate if NKp46 recognition is involved in tumor killing. The authors should explicitly test whether pre-incubation of NK cells with bacteria enhances their anti-tumor activity.

      No significant difference in NK cell cytotoxicity was observed between untreated NK cells and NK cells incubated with anti-NKp46 antibody in the absence of bacteria. Therefore, the NK + anti-NKp46 (O2) group was included as an additional control alongside the other experimental conditions shown in Figures 5b and 5c, and is presented in Author response image 7 below.

      Author response image 7. NK cytotoxicity against breast cancer cell lines. NK cell cytotoxicity against T47D (left) and MCF7 (right) breast cancer cell lines. This experiment follows the format of Figure 5b and 5c, with the addition of the NK cells + O2 antibody group. No significant differences were observed when values were normalized to NK cells alone.

      Could bacteria induce stress signals in tumor cells that sensitize them to NK killing? This distinction is critical.

      It remains unclear whether the bacteria induce stress-related signals in tumor cells that render them more susceptible to NK cell–mediated cytotoxicity.

      (6) Figure 5D: Mechanism of Peripheral Activation

      It is suggested that contact between bacteria and NK cells in the periphery leads to their activation. Can the authors confirm whether this pre-activation leads to enhanced killing of tumor targets, or if bacteria-tumor co-localization is required? The literature indicates that F. nucleatum localizes intracellularly within tumor cells. If so, how is RadD accessible to NKp46 on infiltrating NK cells?

      We do not expect that pre-activation of NK cells with bacteria would enhance their tumor-killing capacity. In fact, when NK cells were co-incubated with bacteria, we occasionally observed NK cell death. Although F. nucleatum can reside intracellularly, bacterial entry requires prior adhesion to tumor cells. At this stage—before internalization—the bacteria are accessible for recognition and binding by NK cells.

      (8) Figure 5E and In Vivo Relevance

      Surprisingly, F. nucleatum infection is associated with increased tumor burden. Does this reflect an immunosuppressive effect? Are NK cells inhibited or exhausted in infected mice (TIGIT, SIGLEC7...)? If NK cell activation leads to reduced tumor control in the infected context, the role of RadD-induced activation needs further explanation. RadD-deficient bacteria, which do not activate NK cells, result in even poorer tumor control. This paradox needs to be addressed: how can NK activation impair tumor control while its absence also reduces tumor control?

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. The increased tumor burden observed in infected mice may therefore result from bacterial interference with immune cell infiltration and accumulation within the tumor microenvironment (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun 11, 3259 (2020)). Consequently, the NK cells that do reach the tumor site can recognize and kill F. nucleatum–bearing tumor cells through RadD–NKp46 interactions. In the absence of RadD, this recognition is impaired, leading to reduced NK-mediated cytotoxicity and increased tumor growth.

      (9) NKp46-Deficient Mice: Inconsistencies

      In Ncr1⁻/⁻ mice, infection with WT or RadD-deficient F. nucleatum has no impact on tumor burden. This suggests that NKp46 is dispensable in this context and casts doubt on the physiological relevance of the proposed mechanism. This contradiction should be discussed more thoroughly.

      Ncr1 is also directly involved in mediating NK cell–dependent killing of tumor cells, even in the absence of bacterial infection. Therefore, in Ncr1-deficient mice, F. nucleatum has no additional effect on tumor progression (Glasner, A., Ghadially, H., Gur, C., Stanietsky, N., Tsukerman, P., Enk, J., Mandelboim, O. Recognition and prevention of tumor metastasis by the NK receptor NKp46/NCR1. J Immunol. 2012).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A previous study by this group (PMID: 38952680) demonstrated that RadD of F. nucleatum binds to NK cells via Siglec-7, thereby diminishing their cytotoxic potential. They further proposed that the RadD-Siglec-7 interaction could act as an immune evasion mechanism exploited by tumor cells. In contrast, the present study reports that RadD of F. nucleatum can also bind to the activating receptor NKp46 on NK cells, thereby enhancing their cytotoxic function.

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. In contrast, NKp46 and its murine homologue, Ncr1, both recognize and bind the bacterium.

      While F. nucleatum-mediated tumor progression has been documented in breast and colon cancers, the current study proposes an NK-activating role for F. nucleatum in HNSC. However, it remains unclear whether tumor-infiltrating NK cells in HNSC exhibit differential expression of NKp46 compared to Siglec-7. Furthermore, heterogeneity within the NK cell compartment, particularly in the relative abundance of NKp46⁺ versus Siglec-7⁺ subsets, may differ substantially among breast, colon, and HNSC tumors. Such differences could have been readily investigated using publicly available single-cell datasets. A deeper understanding of this subset heterogeneity in NK cells would better explain why F. nucleatum is positively associated with a favorable prognosis in HNSC but correlates with poor outcomes in breast and colon cancers.

      Currently, there are no publicly available single-cell datasets suitable for characterizing NK cell heterogeneity in the context of F. nucleatum infection—particularly regarding the expression of Siglec-7, NKp46, or CEACAM1 and their potential association with poor clinical outcomes in breast, head and neck squamous cell carcinoma (HNSC), or colorectal cancer (CRC). Furthermore, no RNA-seq datasets are available for breast cancer cases specifically associated with F. nucleatum infection and poor prognosis. Therefore, we analyzed bulk RNA expression datasets for Siglec-7 and CEACAM1 and evaluated their associations with HNSC and CRC using the same patient databases utilized in our manuscript (Author response image 8). No significant differences in Siglec-7 expression were detected between HNSC and CRC samples (Author response image 8A). Although CEACAM1 mRNA levels did not differ between F. nucleatum–positive and –negative cases within either cancer type, its overall expression was higher in CRC compared to HNSC (Author response image 8B).

      Author response image 8. Siglec7 and Ceacam1 expression and the prognostic effect of F. nucleatum in a tumor-type-specific manner. Comparison of Siglec7 (A) and Ceacam1 (B) expression across HNSC and CRC tumors. Log₂ expression levels of NKp46 mRNA were compared across HNSC and CRC cohorts, stratified by F. nucleatum-positive and -negative status. Results were analyzed by one-way ANOVA with Bonferroni post hoc correction.

      (2) The in vivo tumor data (Figure 5D-F) appear to contradict the authors' claims. Specifically, Figure 5E suggests that WT mice engrafted with AT3 breast tumors and inoculated with WT F. nucleatum exhibited an even greater tumor burden compared to mice not inoculated with F. nucleatum, indicating a tumor-promoting effect. This finding conflicts with the interpretation presented in both the results and discussion sections.

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. The increased tumor burden observed in infected mice may therefore result from bacterial interference with immune cell infiltration and accumulation within the tumor microenvironment (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun 11, 3259 (2020)). Consequently, the NK cells that do reach the tumor site can recognize and kill F. nucleatum–bearing tumor cells through RadD–NKp46 interactions. In the absence of RadD, this recognition is impaired, leading to reduced NK-mediated cytotoxicity and increased tumor growth.

      (3) Although the authors acknowledge that F. nucleatum may have tumor context-specific roles in regulating NK cell responses, it is unclear why they chose a breast cancer model in which F. nucleatum has been reported to promote tumor growth. A more appropriate choice would have been the well-established preclinical oral cancer model, such as the 4-nitroquinoline 1-oxide (4NQO)-induced oral cancer model in C57BL/6 mice, which would more directly relate to HNSC biology.

      The tumor model we employed is, to date, the only model in which F. nucleatum has been shown to exert a measurable effect, which is why we selected it for our study (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun. 2020; 11: 3259). We have not tested the 4-nitroquinoline-1-oxide (4NQO)–induced oral cancer model, and we are uncertain whether its use would be ethically justified.

      (4) Since RadD of F. nucleatum can bind to both Siglec-7 and NKp46 on NK cells, exerting opposing functional effects, the expression profiles of both receptors on intratumoral NK cells should be evaluated. This would clarify the balance between activating and inhibitory signals in the tumor microenvironment and provide a more mechanistic explanation for the observed tumor context-dependent outcomes.

      This question was answered in Author response image 8 above.

    1. What struck me most in the reading was the comparison between Socrates' distrust of writing and current criticism of artificial intelligence. I found it very interesting how the author shows that, although a technology may generate fear or resistance, it ultimately ends up transforming the way we think and live, as happened with writing or social media. The sentence that struck me most was: “If absolutely everyone adopted its use in all areas of life, soon no one would have any skills.” It made me reflect on the dependence we are developing on artificial intelligence and on the risk of losing the ability to learn and create for ourselves. I think the text could be improved if the author went a little deeper into possible ways of integrating AI without losing the value of human learning, for example by showing positive examples of how this technology can complement our abilities rather than replace them. It would also be interesting to include a more global perspective, considering how different cultural contexts approach the adoption of AI.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review): 

      Summary: 

      The manuscript explores behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H2S motivates the study of responses in C. elegans. The revised manuscript does not seem to have significantly addressed what was lacking in the initial version. 

      The authors have added further characterization of possible ASJ sensing of H2S by calcium imaging but ASJ does not appear to be directly involved. Genetic and parallel analysis of O2 and CO2 responsive pathways do not reveal further insights regarding potential mechanisms underlying H2S sensing. Gene expression analysis extends prior work. Finally, the authors have examined how H2S-evoked locomotory behavioral responses are affected in mutants with altered stress and detoxification response to H2S, most notably hif-1 and egl-9. These data, while examining locomotion, are more suggestive that observed effects on animal locomotion are secondary to altered organismal toxicity as opposed to specific behavioral responses.

      Overall, the manuscript provides a wide range of intriguing observations, but mechanistic insight or a synthesis of disparate data is lacking. 

      We thank the reviewer for the valuable feedback. We agree that while our investigation provides broad coverage, it does not fully resolve the mechanisms of H<sub>2</sub>S perception. As both reviewers noted, the avoidance response to high levels of H<sub>2</sub>S is most likely driven by its toxicity, particularly at the level of mitochondria, rather than by direct perception of H<sub>2</sub>S. We also favor this model and have revised the results and discussion to highlight this interpretation, while acknowledging that other mechanisms cannot be excluded (main changes lines 387-402 and 535-547).

      Building on this view, our observations point toward mitochondrial ROS transients as the trigger for H<sub>2</sub>S avoidance. First, toxic levels of H<sub>2</sub>S are known to promote ROS production (1). Second, similar to acute H<sub>2</sub>S, brief exposure to rotenone, an ETC complex I inhibitor that rapidly generates mitochondrial ROS, triggers locomotory responses (Figure 7E) (Lines 393-396). Third, regardless of duration, rotenone exposure inhibits H<sub>2</sub>S-evoked avoidance (Figure 7E) (Lines 389-391), likely by preventing or dampening H<sub>2</sub>S-evoked mitochondrial ROS bursts when ETC function is impaired and ROS is already high. Notably, animals subjected to prolonged rotenone exposure, ETC mutants, and quintuple sod mutants, each experiencing chronically high ROS levels, fail to respond to H<sub>2</sub>S and display reduced locomotory activity, presumably due to ROS toxicity and/or activation of stress-adaptive mechanisms (Figure 7).

      Consistent with the activation of stress-responsive pathways, H<sub>2</sub>S exposure alters expression of genes controlled by SKN-1 and HIF-1 signaling. Both pathways are ROS-sensitive and promote adaptation to chronic ROS production (2-4). Their activation, as in egl-9, renders these animals insensitive to H<sub>2</sub>S-evoked ROS transients (Figure 5B) (Lines 303-305). Conversely, mutants defective in these adaptive pathways, such as hif-1, still show initial locomotory responses to H<sub>2</sub>S, but rapidly lose activity during prolonged H<sub>2</sub>S exposure (Figure 5D) (Lines 318-319). These observations suggest that the HIF-1 pathway is dispensable for initiating the response to H<sub>2</sub>S-evoked ROS transients, but essential for protecting against ROS toxicity.

      In this context, the neural circuit elements we examined, such as ASJ neurons, are not directly involved in H<sub>2</sub>S perception (Lines 165-169 and 448-457). Instead, they likely modulate a circuit that is responsive to ROS toxicity. This circuit is also influenced by ambient O<sub>2</sub> levels, the state of the O<sub>2</sub>-sensing circuit, and nutrient status, in a manner reminiscent of the CO<sub>2</sub> responses (5, 6).

      Reviewer #4 (Public review): 

      Summary: 

      The authors establish a behavioral paradigm for avoidance of H2S and conduct a large candidate screen to identify genetic requirements. They follow up by genetically dissecting a large number of implicated pathways - insulin, TGF-beta, oxygen/HIF-1, and mitochondrial ROS, which have varied effects on H2S avoidance. They additionally assay whole-animal gene expression changes induced by varying concentrations and durations of H2S exposure. 

      Strengths: 

      The implicated pathways are tested extensively through mutants of multiple pathway molecules. The authors address previous reviewer concerns by directly testing the ability of ASJ to respond to H2S via calcium imaging. This allows the authors to revise their previous conclusion and determine that ASJ does not directly respond to H2S and likely does not initiate the behavioral response. 

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      Despite the authors' focus on acute perception of H2S, I don't think the experiments tell us much about perception. I think they indicate pathways that modulate the behavior when disrupted, especially because most manipulations used broadly affect physiology on long timescales. For instance, genetic manipulation of ASJ signaling, oxygen sensing, HIF-1 signaling, mitochondrial function, as well as starvation are all expected to constitutively alter animal physiology, which could indirectly modulate responses to H2S. The authors rule out effects on general locomotion in some cases, but other physiological changes could relatively specifically modulate the H2S response without being involved in its perception.

      I am actually not convinced that H2S is directly perceived by the C. elegans nervous system at all. As far as I can tell, the avoidance behavior could be a response to H2S-induced tissue damage rather than the gas itself. 

      We thank the reviewer for the valuable insights, and fully agree that H<sub>2</sub>S may not be directly perceived by C. elegans. Please see detailed responses below.

      Reviewer #4 (Recommendations for the authors): 

      The clarity of the paper is improved in this version. My main issue has to do with "perception" of H2S. At times the authors suggest that hydrogen sulfide should be perceived by a neural circuit ("we did not specifically identify the neural circuit mediating H2S signaling"), while at other times they discuss the possibility that it is not directly perceived neuronally ("Supporting the idea that acute mitochondrial ROS generation initiates avoidance of high H2S levels,"). The authors should clearly state their model for H2S perception. Do they think there is a receptor and sensory neuron for H2S (not identified in this paper)? If not, what does it mean for there to be a neural circuit mediating the response? To me, it looks more like what is being "perceived" by a neural circuit is ROS-induced toxicity, not H2S itself. 

      To drill down on direct modulation of acute perception, are any of the pathway manipulations used in this paper performed on the timescale of perception? Rotenone for 10 mins is close to that timescale, and in fact it increases speed independently of H2S, consistent with ROS-induced toxicity, not H2S being the signal that induces the behavior. Optogenetic activation of RMG could also be on the acute timescale. Can the authors clarify for how long blue light was on the worms before the start of the assay? Or was it turned on at the same time as video acquisition commenced? This could be evidence that RMG acutely modulates this behavioral response.

      I feel that the ASJ calcium imaging data should be in the main figure given its importance in revising the original model. 

      We thank the reviewer for the valuable advice.

      As suggested, ASJ calcium imaging data are displayed in the main figure (Figure 2I) (Line 167).

      As both reviewers noted, our initial presentation was not sufficiently clear regarding the mechanism underlying H<sub>2</sub>S avoidance. We agree with the reviewer that H<sub>2</sub>S avoidance is unlikely to be mediated by direct perception via an H<sub>2</sub>S-specific receptor, but likely arises from acute mitochondrial dysfunction and ROS generation.

      ROS

      In line with the reviewer’s perspective, our observations point toward mitochondrial ROS transients as the trigger for H<sub>2</sub>S avoidance. First, toxic levels of H<sub>2</sub>S are known to promote ROS production (1). Second, similar to acute H<sub>2</sub>S, brief exposure to rotenone, an ETC complex I inhibitor that rapidly generates mitochondrial ROS, triggers locomotory responses (Figure 7E) (Lines 393-396). Third, regardless of duration, rotenone exposure inhibits H<sub>2</sub>S-evoked avoidance (Figure 7E) (Lines 389-391), likely by preventing or dampening H<sub>2</sub>S-evoked mitochondrial ROS bursts when ETC function is impaired and ROS is already high. Notably, animals subjected to prolonged rotenone exposure, ETC mutants, and quintuple sod mutants, each experiencing chronically high ROS levels, fail to respond to H<sub>2</sub>S and display reduced locomotory activity, presumably due to ROS toxicity and/or activation of stress-adaptive mechanisms (Figure 7). We revised the Results and Discussion to present the model more consistently (main changes lines 387-402 and 535-547).

      Consistent with the activation of stress-responsive pathways, H<sub>2</sub>S exposure alters expression of genes controlled by SKN-1 and HIF-1 signaling. Both pathways are ROS-sensitive and promote adaptation to chronic ROS production (2-4). Their activation, as in egl-9, renders these animals insensitive to H<sub>2</sub>S-evoked ROS transients (Figure 5B) (Lines 303-305). Conversely, mutants defective in these adaptive pathways, such as hif-1, still show initial locomotory responses to H<sub>2</sub>S, but rapidly lose activity during prolonged H<sub>2</sub>S exposure (Figure 5D) (Lines 318-319). These observations suggest that the HIF-1 pathway is dispensable for initiating the response to H<sub>2</sub>S-evoked ROS transients, but essential for protecting against ROS toxicity.

      ASJ neurons

      ASJ neurons and DAF-11 signaling are required for H<sub>2</sub>S-evoked behavioral responses. However, ASJ does not exhibit an H<sub>2</sub>S-evoked calcium transient. This suggests that ASJ neurons do not directly detect H<sub>2</sub>S (Lines 165-169 and 448-457), but likely modulate the circuit responsive to ROS toxicity. This circuit can also be modulated by ambient O<sub>2</sub> levels, the state of the O<sub>2</sub>-sensing circuit, and nutrient status, in a manner reminiscent of the CO<sub>2</sub> responses (5, 6).

      O<sub>2</sub> sensing circuit

      Consistent with the reviewer’s view, we favor the model that H<sub>2</sub>S avoidance is likely induced by ROS transients. We believe that the state of the O<sub>2</sub>-sensing circuit, similar to ASJ neurons, modulates the neural circuit that is responsive to H<sub>2</sub>S-evoked ROS toxicity. This circuit is inhibited as long as the O<sub>2</sub>-sensing circuit is active. In the RMG optogenetic experiment, channelrhodopsin was photo-stimulated as soon as the assay was initiated at 7% O<sub>2</sub> (Methods Lines 633-634 and Figure legend Lines 1177-1178); therefore, RMG remained active throughout the assay, including at 7% O<sub>2</sub>. Our interpretation is that RMG activation inhibits this ROS-responsive circuit and H<sub>2</sub>S avoidance. However, these observations do not resolve whether H<sub>2</sub>S is acutely and directly perceived. The modulation of the H<sub>2</sub>S response by the O<sub>2</sub> circuit is discussed in Lines 437-447.

      References

      (1) J. Jia et al., SQR mediates therapeutic effects of H(2)S by targeting mitochondrial electron transport to induce mitochondrial uncoupling. Sci Adv 6, eaaz5752 (2020).

      (2) S. J. Lee, A. B. Hwang, C. Kenyon, Inhibition of Respiration Extends C. elegans Life Span via Reactive Oxygen Species that Increase HIF-1 Activity. Current Biology 20, 2131-2136 (2010).

      (3) C. Lennicke, H. M. Cocheme, Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81, 3691-3707 (2021).

      (4) D. A. Patten, M. Germain, M. A. Kelly, R. S. Slack, Reactive oxygen species: stuck in the middle of neurodegeneration. J Alzheimers Dis 20 Suppl 2, S357-367 (2010).

      (5) A. J. Bretscher, K. E. Busch, M. de Bono, A carbon dioxide avoidance behavior is integrated with responses to ambient oxygen and food in Caenorhabditis elegans. Proc Natl Acad Sci U S A 105, 8044-8049 (2008).

      (6) E. A. Hallem, P. W. Sternberg, Acute carbon dioxide avoidance in Caenorhabditis elegans. Proc Natl Acad Sci U S A 105, 8038-8043 (2008).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors clusters within the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, immunohistochemistry, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, revealing broad and in some cases receptor-specific effects on cAMP, calcium, and ERK signaling.

      Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection and knockdown experiments uncover a growth-promoting role for one CT-like peptide (AjCT2). Injection of AjCT2 at high concentration, or long-term knockdown of the AjCT precursor, affects diverse growth-related parameters including weight gain rate, specific growth rate, and transcript levels of growth-regulating transcription factors. The authors also reveal a growth-promoting function for the PDFR-like receptor AjPDFR2, suggesting that this receptor mediates the effects of AjCT2 on growth.

      Weaknesses:

      The authors present a more detailed phylogenetic analysis in the revised version, including a larger number of species. But some clusters in the analysis are not well supported because they have only low bootstrap values. This makes it difficult to interpret the clustering in some parts of the tree.

      Thank you for the reviewer’s comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL and therefore they have been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were also included in the previous tree but these have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and therefore they are not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably achieving a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterised in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterised previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterised in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      Expression of CT-like peptides was investigated both at transcript and protein level, but insight into the expression of the three peptide receptors is limited. This makes it difficult to understand the mechanism underlying the (different) functions of the two CT-like peptides in vivo. The authors identify differences in signal transduction cascades activated by each peptide, which might underpin distinct functions, but these differences were established only in heterologous cells.

      We appreciate the reviewer's insightful comments. Regarding expression of CT-like peptide receptors, we have quantitatively analyzed the mRNA expression levels of the three receptors in key tissues using qRT-PCR (Figure 6, line 319), and receptor expression exhibits significant tissue-specific differences. Combined with the heterologous expression assays and in vivo functional validation, we believe our findings have provided clear mechanistic insights into the functional divergence of the two CT-like peptides. Investigation of the expression of the three receptor proteins in A. japonicus would require generation of specific antibodies, which was beyond the scope of this study. Furthermore, immunohistochemical visualization of neuropeptide receptor expression in other invertebrates has not been reported widely, which likely reflects technical difficulties in generating antibodies that can be used to specifically detect receptor proteins, which are typically expressed at a low level in comparison to the neuropeptides that act as their ligands.

      We acknowledge that investigating signal transduction cascades in heterologous cells (rather than native A. japonicus cells) is a limitation. However, as a non-model organism, A. japonicus currently lacks established cell lines for such research. Therefore, using heterologous cells was the most feasible approach to examine the differential signaling cascades activated by the peptides through the three receptors. Importantly, our in vivo experiments demonstrated that long-term knockdown of either the AjCT precursor or AjPDFR2 resulted in similar and significant growth defects. The phenotypic consistency strongly suggests that AjCT2 and AjPDFR2 function within the same signaling pathway, with AjPDFR2 serving as the key receptor functionally activated by AjCT2.

      The authors show overlapping phenotypes for a long-term knockdown of the AjCT precursor and the AjPDFR2 receptor, suggesting that the growth-regulating functions of AjCT2 are mediated by this receptor pathway. However, it remains unclear whether this mechanism underpins the growth-regulating function of AjCT2, until further in vivo evidence for this ligand-receptor interaction is presented. For example, the authors could investigate whether knockdown of AjPDFR2 attenuates the effects of AjCT2 peptide injection. In addition, a functional PDF system in this species remains uncharacterized, and a potential role of PDF-like peptides in growth regulation has not yet been investigated in A. japonicus. Therefore, it also remains unclear whether the ability of CT-like peptides to activate PDFRs is an evolutionarily ancient property of this peptide family or whether this is an example of convergent evolution in some protostomian (Drosophila) and deuterostomian (sea cucumber) species.

      Thank you for the reviewer’s insightful comments and constructive questions. We acknowledge the request for more direct evidence to demonstrate how AjCT2 functions in vivo through AjPDFR2. However, long-term knockdown of the AjCT precursor and AjPDFR2 both resulted in identical and significant growth defect phenotypes. The high phenotypic consistency, combined with the activation effect of AjCT2 on AjPDFR2 in heterologous cells, strongly suggests that they function within the same signaling pathway, with AjPDFR2 serving as the key receptor functionally activated by AjCT2. While exogenous peptide injection combined with receptor knockdown is a classic method for verifying receptor activation, phenotypic overlap itself is widely accepted in genetic research as robust evidence for pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017). A. japonicus is a non-model organism with a 3-month aestivation period in summer followed shortly by winter hibernation. During these periods, we are unable to conduct in vivo experiments. Any single experimental suggestion from reviewers could potentially require one more year of research, and we have already conducted an additional year of research, in response to reviewer feedback, since submitting the original manuscript. We hope therefore that the challenges associated with working with aquatic invertebrate non-model organisms are recognized by the reviewers.

      We fully agree that the functional PDF/PDFR system in A. japonicus and its potential role in growth regulation remain uncharacterized. Currently, the precursors of the PDF-type neuropeptide in echinoderms remain unidentified, which precludes clear pharmacological characterization of the two receptors. While further exploration of echinoderm PDF-type neuropeptides is still needed, our phylogenetic analysis, conducted using the maximum likelihood method with optimized parameters and rigorous sequence curation, demonstrates that the deuterostomian PDFRs (including AjPDFR1 and AjPDFR2) are positioned in a clade with the well-characterized protostomian PDFR clades with extremely high bootstrap support (value = 100). Therefore, these two receptors in A. japonicus clearly belong to the PDF receptor family and our findings clearly indicate that the ability of CT-like peptides to activate PDFRs is either an evolutionarily ancient and conserved property or has arisen independently in different lineages. Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      References:

      Bauknecht P, Jékely G. Large-Scale Combinatorial Deorphanization of Platynereis Neuropeptide GPCRs. Cell reports, 2015, 12(4), 684–693. doi:  10.1016/j.celrep.2015.06.052.

      Beets I, Zels S, Vandewyer E, Demeulemeester J, et al. System-wide mapping of peptide-GPCR interactions in C. elegans. Cell reports, 2023, 42(9), 113058. doi: 10.1016/j.celrep.2023.113058.

      Cardoso J C, Mc Shane J C, Li Z, et al. Revisiting the evolution of Family B1 GPCRs and ligands: Insights from mollusca. Molecular and cellular endocrinology, 2024, 586, 112192. doi: 10.1016/j.mce.2024.112192.

      Gorn A H, Lin H Y, Yamin M, et al. Cloning, characterization, and expression of a human calcitonin receptor from an ovarian carcinoma cell line. The Journal of clinical investigation, 1992, 90(5), 1726–1735. doi: 10.1172/JCI116046.

      Huang T, Su J, Wang X, et al. Functional Analysis and Tissue-Specific Expression of Calcitonin and CGRP with RAMP-Modulated Receptors CTR and CLR in Chickens. Animals: an open access journal from MDPI, 2024, 14(7), 1058. doi: 10.3390/ani14071058.

      Johnson E C, Shafer O T, Trigg J S, et al. A novel diuretic hormone receptor in Drosophila: evidence for conservation of CGRP signaling. Journal of Experimental Biology, 2005, 208(7): 1239-1246. doi: 10.1242/jeb.01529.

      McLatchie L M, Fraser N J, Main M J, et al. RAMPs regulate the transport and ligand specificity of the calcitonin-receptor-like receptor. Nature, 1998, 393(6683): 333-339. doi: 10.1038/30666.

      Schwartz J, Réalis-Doyelle E, Dubos M P, et al. Characterization of an evolutionarily conserved calcitonin signaling system in a lophotrochozoan, the Pacific oyster (Crassostrea gigas). Journal of Experimental Biology, 2019, 222(13): jeb201319. doi: 10.1242/jeb.201319.

      Sekiguchi T, Kuwasako K, Ogasawara M, et al. Evidence for conservation of the calcitonin superfamily and activity-regulating mechanisms in the basal chordate Branchiostoma floridae: insights into the molecular and functional evolution in chordates. Journal of Biological Chemistry, 2016, 291(5): 2345-2356. doi: 10.1074/jbc.M115.664003.

      Shafer, O. T., & Taghert, P. H. (2009). RNA-interference knockdown of Drosophila pigment dispersing factor in neuronal subsets: the anatomical basis of a neuropeptide's circadian functions. PloS one, 4(12), e8298. doi: 10.1371/journal.pone.0008298.

      Van Sinay, E., Mirabeau, O., Depuydt, G., Van Hiel, M. B., Peymen, K., Watteyne, J., Zels, S., Schoofs, L., & Beets, I. (2017). Evolutionarily conserved TRH neuropeptide pathway regulates growth in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America, 114(20), E4065–E4074. doi: 10.1073/pnas.1617392114.

      Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor, but they also activate the two "PDF receptors", ex vivo. They also explore secondary messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization and finally test the effects of these peptides on tissue contractions, feeding and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and "PDF receptors" is weak.

      Strengths:

      This is the first study to report pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and secondary messenger pathways) support this conclusion.

      Weaknesses:

      The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionary ancient since similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. The bootstrap support is quite low in a lot of instances, especially for the deuterostomian and protostomian PDFR clades which is below 30. With such low support, it is unclear if the clade comprising deuterostomian "PDFR" is in fact PDFRs and not another receptor type whose endogenous ligand (besides CT) remains to be discovered.

      We thank the reviewer for these comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL, and they have therefore been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version, the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were included in the previous tree but have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and are therefore not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterized in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterized previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterized in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      References:

      Bauknecht P, Jékely G. Large-Scale Combinatorial Deorphanization of Platynereis Neuropeptide GPCRs. Cell reports, 2015, 12(4), 684–693. doi:  10.1016/j.celrep.2015.06.052.

      Beets I, Zels S, Vandewyer E, Demeulemeester J, et al. System-wide mapping of peptide-GPCR interactions in C. elegans. Cell reports, 2023, 42(9), 113058. doi: 10.1016/j.celrep.2023.113058.

      Cardoso J C, Mc Shane J C, Li Z, et al. Revisiting the evolution of Family B1 GPCRs and ligands: Insights from mollusca. Molecular and cellular endocrinology, 2024, 586, 112192. doi: 10.1016/j.mce.2024.112192.

      Gorn A H, Lin H Y, Yamin M, et al. Cloning, characterization, and expression of a human calcitonin receptor from an ovarian carcinoma cell line. The Journal of clinical investigation, 1992, 90(5), 1726–1735. doi: 10.1172/JCI116046.

      Huang T, Su J, Wang X, et al. Functional Analysis and Tissue-Specific Expression of Calcitonin and CGRP with RAMP-Modulated Receptors CTR and CLR in Chickens. Animals: an open access journal from MDPI, 2024, 14(7), 1058. doi: 10.3390/ani14071058.

      Johnson E C, Shafer O T, Trigg J S, et al. A novel diuretic hormone receptor in Drosophila: evidence for conservation of CGRP signaling. Journal of Experimental Biology, 2005, 208(7): 1239-1246. doi: 10.1242/jeb.01529.

      McLatchie L M, Fraser N J, Main M J, et al. RAMPs regulate the transport and ligand specificity of the calcitonin-receptor-like receptor. Nature, 1998, 393(6683): 333-339. doi: 10.1038/30666.

      Schwartz J, Réalis-Doyelle E, Dubos M P, et al. Characterization of an evolutionarily conserved calcitonin signaling system in a lophotrochozoan, the Pacific oyster (Crassostrea gigas). Journal of Experimental Biology, 2019, 222(13): jeb201319. doi: 10.1242/jeb.201319.

      Sekiguchi T, Kuwasako K, Ogasawara M, et al. Evidence for conservation of the calcitonin superfamily and activity-regulating mechanisms in the basal chordate Branchiostoma floridae: insights into the molecular and functional evolution in chordates. Journal of Biological Chemistry, 2016, 291(5): 2345-2356. doi: 10.1074/jbc.M115.664003.

      Reviewer #2 (Recommendations for the authors):

      Figure 1C: The bootstrap support is quite low in a lot of instances, especially for the deuterostomian and protostomian PDFR clades which is below 30. With such support, I would be hesitant to label the blue clade as deuterostomian PDFR for two reasons: 1) no members of this clade have been shown to be activated by a PDF-like substance and 2) the current study shows that these receptors are activated by CT-type peptides. Therefore, the phylogenetic analyses do not support the conclusions of this paper. What is the basis for calling these receptors PDFR and not CTR in light of weak phylogenetic support?

      We thank the reviewer for these comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL, and they have therefore been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version, the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were included in the previous tree but have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and are therefore not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterized in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterized previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterized in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      We agree with the reviewer that no members of the PDF-type receptor clade in deuterostomes have yet been shown to be activated by a PDF-like substance. That is because the precursors of the PDF-type neuropeptides in echinoderms remain unidentified so far, which precludes clear pharmacological characterization of these receptors within the deuterostomian PDFR clade. However, the new phylogenetic tree now provides strong support (bootstrap value = 100) for the clade comprising deuterostomian and protostomian PDFRs, confirming the classification of AjPDFR1 and AjPDFR2 as PDF-type receptors. 

      The new results following AjCT and AjPDFR2 knockdown are a welcome addition. While this additional evidence supports the claim that AjCT could mediate its effects via AjPDFR2, it does not show that AjCT acts as an endogenous ligand for PDFR in vivo. In combination with the weak phylogenetic analyses, I would recommend that the authors tone down their claims that they have functionally characterized a PDFR (in the title and text).

      Thank you for your insightful comments and we do understand the reviewer’s concern. 

      Regarding “the weak phylogenetic analyses”, as highlighted above, we have produced a new phylogenetic tree (Figure 2, line 206) that provides strong bootstrap support for the clade comprising deuterostome and protostome PDF-type receptors. For this reason, it is our opinion that inclusion of “pigment-dispersing factor-type receptors” in the title of the paper is appropriate. Details of the phylogenetic analysis method have been added in lines 727-739, and the description of the new phylogenetic tree has been modified accordingly in the revised manuscript (lines 169-183). In addition, long-term knockdown of the AjCT precursor and of AjPDFR2 both resulted in identical and significant growth defect phenotypes. Such phenotypic overlap is widely accepted in genetic research as strong evidence for pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017). This high degree of phenotypic consistency, coupled with our in vitro finding that AjCT2 specifically activates AjPDFR2, strongly supports the conclusion that AjCT2 and AjPDFR2 function within the same signaling pathway in vivo, with AjPDFR2 serving as the key receptor functionally activated by AjCT2.

      References:

      Shafer, O. T., & Taghert, P. H. (2009). RNA-interference knockdown of Drosophila pigment dispersing factor in neuronal subsets: the anatomical basis of a neuropeptide's circadian functions. PloS one, 4(12), e8298. doi: 10.1371/journal.pone.0008298.

      Van Sinay, E., Mirabeau, O., Depuydt, G., Van Hiel, M. B., Peymen, K., Watteyne, J., Zels, S., Schoofs, L., & Beets, I. (2017). Evolutionarily conserved TRH neuropeptide pathway regulates growth in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America, 114(20), E4065–E4074. doi: 10.1073/pnas.1617392114.

      Since there is no formal logic defining the use of "type" vs "like" vs "related", I would encourage the authors to use one term (of their choice) to avoid unnecessary confusion. Or another possibility is that these relationships are defined at some point in the manuscript so that it becomes clear to the reader.

      We thank the reviewer for this comment. The term “CT-related peptides” is defined in the Introduction (lines 54-58). As per your suggestion, we have now also defined both “CT-type peptides” and “CT-like peptides” in the Introduction (lines 76-79). “CT-type peptides” are characterized by an N-terminal disulphide bridge, whereas “CT-like peptides” (diuretic hormone 31 (DH31)-type peptides) lack this feature. Additionally, in accordance with these definitions, we have corrected three descriptions in the revised manuscript (lines 80, 83 and 88, for “CT-type peptides”) to ensure consistent and accurate usage of these terms.

      "To provide in vivo evidence supporting CT-mediated activation of "PDF" receptors, we conducted the following experiments: Firstly, we confirmed that AjPDFR1 and AjPDFR2were the functional receptors of AjCT1and AjCT2 (Figure 2, 3 and 4). Secondly, injection of AjCT2 and siAjCTP1/2-1 in vivo induced corresponding changes in AjPDFR1and AjPDFR2expression levels in the intestine (Figure 8C, 9A, 9B and 9C)."

      None of these experiments provide direct evidence that CT activates PDFR in vivo. The functional studies are indeed a welcome addition but they cannot discriminate between correlation and causation.

      We thank the reviewer for these insightful comments. We agree that the functional studies do not constitute direct proof of CT’s activation of PDFR in vivo. However, we observed identical and significant growth defect phenotypes following long-term knockdown of the AjCT precursor and of AjPDFR2. This high degree of phenotypic congruence, combined with the established in vitro activation of AjPDFR2 by AjCT2, provides strong support for the conclusion that AjCT2 acts as the key endogenous ligand activating the AjPDFR2 signaling pathway in vivo. Importantly, such phenotypic overlap has been widely accepted in genetic research as strong evidence for functional pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017).

      References:

      Shafer, O. T., & Taghert, P. H. (2009). RNA-interference knockdown of Drosophila pigment dispersing factor in neuronal subsets: the anatomical basis of a neuropeptide's circadian functions. PloS one, 4(12), e8298. doi: 10.1371/journal.pone.0008298.

      Van Sinay, E., Mirabeau, O., Depuydt, G., Van Hiel, M. B., Peymen, K., Watteyne, J., Zels, S., Schoofs, L., & Beets, I. (2017). Evolutionarily conserved TRH neuropeptide pathway regulates growth in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America, 114(20), E4065–E4074. doi: 10.1073/pnas.1617392114.

    1. Dry seeds were treated with 100 ppm GA₃, 2% CaCl₂, 1% KH₂PO₃ and 40 ppm Na₂SeO₃ at a temperature of 16 to 18 °C for 20 h. The treated seeds were washed 2 or 3 times with distilled water and sown directly in seedling trays for the next SpeedyPaddy cycle.

      The immature seeds were treated with a mixture of hormones and salts to break dormancy, stimulate germination and strengthen the embryo, allowing the next generation to be sown in less than a day.
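
      As a rough aid for preparing the soaking solution quoted above, the sketch below converts the stated concentrations into grams of each compound per batch. It rests on two assumptions that the quoted passage does not state: that 1 ppm is about 1 mg per litre of dilute aqueous solution, and that the percentages are weight/volume (1% = 10 g/L). The names `TREATMENT`, `grams_per_litre` and `batch_masses` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; not part of the quoted protocol.
# Assumption: 1 ppm ~ 1 mg solute per litre of dilute aqueous solution.
# Assumption: percentages are weight/volume, so 1 % = 10 g per litre.

# Concentrations as quoted in the highlighted passage: (value, unit).
TREATMENT = {
    "GA3": (100, "ppm"),
    "CaCl2": (2, "%"),
    "KH2PO3": (1, "%"),
    "Na2SeO3": (40, "ppm"),
}


def grams_per_litre(value, unit):
    """Convert a concentration in ppm or % (w/v) to grams per litre."""
    if unit == "ppm":
        return value / 1000.0  # mg/L -> g/L
    if unit == "%":
        return value * 10.0  # g per 100 mL -> g/L
    raise ValueError(f"unsupported unit: {unit}")


def batch_masses(volume_litres):
    """Return grams of each compound needed for the given solution volume."""
    return {
        name: grams_per_litre(value, unit) * volume_litres
        for name, (value, unit) in TREATMENT.items()
    }


if __name__ == "__main__":
    # Example: masses for half a litre of soaking solution.
    for name, grams in batch_masses(0.5).items():
        print(f"{name}: {grams:g} g")
```

      If the percentages in the protocol are meant as weight/weight rather than weight/volume, the `"%"` branch would need a different conversion.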

    2. speed breeding

      This is a method used in crop genetic improvement to make plants grow, flower and produce seed much faster. Instead of waiting for a single harvest per year, speed breeding can achieve 4 or 5 plant generations in the same year.

    1. It is important to understand that the publics with which the company maintains contact are different; therefore, strategies must be directed towards them and grounded in the efficient use of the appropriate communication channels, in order to deliver the proposed products or services effectively to the target audience.

      I think it is essential to emphasize this, since each company operates in a different line of business and must therefore pursue different strategies, although all of them should lead to the same objective: good two-way communication.

    2. Communication flows in the company must be multidirectional, so that the message being conveyed reaches the different levels of the hierarchy from the various departments. In this way, errors caused by a lack of understanding in production processes are avoided and, as a result, a significant amount of the company's time, effort and available resources is saved. The advantages of a good business communication system are reflected in the industry's ability to gain market positioning and to differentiate itself through products or services that are better and more complete than those of the competition.

      Key Function of Communication: Its main role is to allow the strategic plan to be executed efficiently at all hierarchical levels, optimizing time, resources (human and financial) and effort.

    3. Gabriel Alejandro Diaz Muñoz, David Rodolfo Guambi Espinoza. AXIOMA - Revista Científica de Investigación, Docencia y Proyección Social. December 2022. Number 27, pp. 72-78. ISSN: 1390-6267, E-ISSN: 2550-6684.

       Figure 1. Business communication process according to the flow of information. Source: Castro, 2017, p. 16.

       The channels through which all company-related information flows are regarded as the vehicle that carries the content of messages from sender to receiver, and they constitute the link between the source of the message and the recipient (Oyarvide, Reyes and Montaño, 2017). Communication in modern companies is usually written and oral, and the latter is generally used for the normal conduct of daily activities, but the choice of oral or written media depends on how formally or informally the message is to be conveyed. In companies, matters of greater importance are commonly handled or communicated by email, memorandum or official letter, headed by the name of the addressee and the department to which they belong, and even including a brief greeting of consideration and esteem, or a farewell at the end of the text. Table 1 details the characteristics, advantages and disadvantages of formal and informal communication.

       Table 1. Types of communication: characteristics, advantages and disadvantages

       Characteristics
       • Formal: Official company channels and official email are used. Informal: Text messages, telephone calls or verbal communication are used.
       • Formal: There are deadlines, defined and established in advance, for sending the message. Informal: It is unplanned and happens casually.
       • Formal: Managers and all members of the company are involved. Informal: It is used most commonly among co-workers.
       • Formal: It may be oral or written. Informal: It is generally oral.

       Advantages
       • Formal: Errors caused by confusion or misunderstanding are less likely. Informal: It is fast and less controlled, and carries no accountability to the company's senior management.

       Disadvantages
       • Formal: It can sometimes become bureaucratic. Informal: The information is not always reliable.
       • Formal: Some members of the company may perceive it as inflexible, since the same order and structure must always be followed. Informal: It is not useful as a tool for decision-making.

       Source: Own elaboration based on Carvajal et al. (2018, p. 64).

       In business terms, several elements determine a company's success. Communication is one of the many factors that make it possible to maintain good relations among team members through the exchange of information and messages transmitted through different channels, both to share similar opinions and thoughts and to express personal ideologies and, once the best idea or strategy has been selected, to draw up action plans that foster teamwork, allow objectives to be met and facilitate organizational development. Fernández (2016) maintains that communication, besides being a powerful tool, is an instrument of change that enables the introduction, dissemination, acceptance and internalization of the new values and management guidelines that accompany organizational development. Communication, specifically, is an absolutely necessary practice, since communication processes link and interweave the relationships among staff, consolidating bonds of cooperation and camaraderie so that, as a result, the organization progresses and becomes more competitive, which is reflected in the professional development and personal growth of team members.

       Díaz, Valdes and Quintana (2018) state that the management carried out by executives must be aimed at meeting institutional objectives, but without forgetting to provide incentives, recogn

      Need for Planning: Communication must not be an improvised process. The article stresses that it must be structured, orderly and planned by senior management, and integrated into the company's overall strategic plan.

    4. The current business context promotes increasingly globalized and competitive markets. This forces organizations to differentiate themselves and position their products in the market and in the minds of their customers, taking into account their characteristics, needs and wishes, in order to develop a competitive advantage that lets them stand out from their competitors (Olivar, 2020). Faced with this situation, companies find it imperative to develop and execute effective management processes, the product of the intellect of organizational leaders. These processes are generally related to factors that significantly affect competitiveness, total quality, efficiency and a focus on continuous improvement. Logically, the combination of all these elements influences the company's productivity so that it can achieve differentiation, brand positioning and, as a result, profits and profitability. To achieve this, however, it is essential that communication from the company's senior and middle management to the different departments be effective and that the message reach all staff at their various levels of the hierarchy, in order to obtain the expected results and, in turn, to urge everyone involved to bring about changes and transformations in response to the demands of the environment and of the market in general (Matos de Rojas et al., 2018).

      I fully agree with the observation that globalization and competition force companies to differentiate themselves through effective communication strategies. In my view, this section highlights a key point: communication influences not only marketing or corporate image but also internal operational efficiency, since poorly transmitted information can cause delays or duplicated effort. In this sense, communication management acts as the company's nervous system, enabling coordination among its different areas.

    5. Pilligua and Arteaga (2019), for their part, refer to what was mentioned in the previous section, maintaining that organizational climate defines the way each person perceives their work, analyzing the physical and human environment in which daily activities take place, which directly affects staff satisfaction and, therefore, productivity.

      The link the authors draw between poor communication and a deteriorating work climate seems very apt to me. An environment where messages do not flow or are ambiguous tends to generate demotivation, frustration and high staff turnover. In my experience, organizational communication seeks not only to inform but also to give meaning to work. When staff understand the purpose of their tasks and perceive consistency between what is said and what is done, their commitment and satisfaction increase. This makes communication a pillar of organizational culture.

    1. due

      Start date of the temporary incapacity benefit (auxílio por incapacidade temporária): - the 16th day of leave, for employees; - the day the incapacity begins, for other insured persons on leave for more than 15 days; - the date the application is filed, if on leave for 30 days after the

  4. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. 10.6. Bibliography
       [j1] Social model of disability. November 2023. Page Version ID: 1184222120. URL: https://en.wikipedia.org/w/index.php?title=Social_model_of_disability&oldid=1184222120#Social_construction_of_disability (visited on 2023-12-07).
       [j2] Color blindness. December 2023. Page Version ID: 1188749829. URL: https://en.wikipedia.org/w/index.php?title=Color_blindness&oldid=1188749829 (visited on 2023-12-07).
       [j3] David Robson. The women with superhuman vision. BBC, February 2022. URL: https://www.bbc.com/future/article/20140905-the-women-with-super-human-vision (visited on 2023-12-07).
       [j4] Mayo Clinic Staff. Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) - Symptoms and causes. 2023. URL: https://www.mayoclinic.org/diseases-conditions/chronic-fatigue-syndrome/symptoms-causes/syc-20360490 (visited on 2023-12-07).
       [j5] Ableism. December 2023. Page Version ID: 1188412565. URL: https://en.wikipedia.org/w/index.php?title=Ableism&oldid=1188412565 (visited on 2023-12-07).
       [j6] Ash. Autism is NOT A Disability. July 2022. URL: https://www.autism360.com/autism-is-not-a-disability/ (visited on 2023-12-07).
       [j7] Neurodiversity. November 2023. Page Version ID: 1187185735. URL: https://en.wikipedia.org/w/index.php?title=Neurodiversity&oldid=1187185735#Neurotypical (visited on 2023-12-07).
       [j8] Mayo Clinic Staff. Generalized anxiety disorder - Symptoms and causes. 2017. URL: https://www.mayoclinic.org/diseases-conditions/generalized-anxiety-disorder/symptoms-causes/syc-20360803 (visited on 2023-12-07).
       [j9] Mayo Clinic Staff. Depression (major depressive disorder) - Symptoms and causes. 2022. URL: https://www.mayoclinic.org/diseases-conditions/depression/symptoms-causes/syc-20356007 (visited on 2023-12-07).
       [j10] Myopia. December 2023. Page Version ID: 1188263181. URL: https://en.wikipedia.org/w/index.php?title=Myopia&oldid=1188263181 (visited on 2023-12-07).
       [j11] How to ADHD. What is ADHD? July 2020. URL: https://www.youtube.com/watch?v=xMWtGozn5jU (visited on 2023-12-07).
       [j12] How to ADHD. What is Executive Function and Why Do We Need it? March 2021. URL: https://www.youtube.com/watch?v=H4YIHrEu-TU (visited on 2023-12-07).
       [j13] Assistive technology. December 2023. Page Version ID: 1188353371. URL: https://en.wikipedia.org/w/index.php?title=Assistive_technology&oldid=1188353371 (visited on 2023-12-07).
       [j14] Liftware - Eat with confidence. URL: https://www.liftware.com/ (visited on 2023-12-07).
       [j15] C. L. Lynch. Invisible Abuse: ABA and the things only autistic people can see. NeuroClastic, March 2019. URL: https://neuroclastic.com/invisible-abuse-aba-and-the-things-only-autistic-people-can-see/ (visited on 2023-12-07).
       [j16] The Lies and Dangers of "Conversion Therapy". URL: https://www.hrc.org/resources/the-lies-and-dangers-of-reparative-therapy (visited on 2023-12-07).
       [j17] Universal design. December 2023. Page Version ID: 1188054790. URL: https://en.wikipedia.org/w/index.php?title=Universal_design&oldid=1188054790 (visited on 2023-12-07).
       [j18] Jacob O. Wobbrock, Shaun K. Kane, Krzysztof Z. Gajos, Susumu Harada, and Jon Froehlich. Ability-Based Design: Concept, Principles and Examples. ACM Trans. Access. Comput., 3(3):9:1–9:27, April 2011. URL: https://doi.org/10.1145/1952383.1952384 (visited on 2023-12-07), doi:10.1145/1952383.1952384.
       [j19] Inclusive design. December 2023. Page Version ID: 1188074097. URL: https://en.wikipedia.org/w/index.php?title=Inclusive_design&oldid=1188074097 (visited on 2023-12-07).
       [j20] Rumman Chowdhury. Sharing learnings about our image cropping algorithm. May 2021. URL: https://blog.twitter.com/engineering/en_us/topics/insights/2021/sharing-learnings-about-our-image-cropping-algorithm (visited on 2023-12-07).
       [j21] Cynthia Bennett. Cynthia Bennett – Human-Computer Interaction Researcher. 2022. URL: https://www.bennettc.com/ (visited on 2023-12-07).
       [j22] Sasha Costanza-Chock. Design Justice: Community-Led Practices to Build the Worlds We Need. The MIT Press, 2020. ISBN 978-0-262-35686-2 978-0-262-04345-8. URL: https://directory.doabooks.org/handle/20.500.12854/78577 (visited on 2023-12-15), doi:10.7551/mitpress/12255.001.0001.
       [j23] Meg Miller and Ilaria Parogni. The Hidden Image Descriptions Making the Internet Accessible. The New York Times, February 2022. URL: https://www.nytimes.com/interactive/2022/02/18/arts/alt-text-images-descriptions.html (visited on 2023-12-07).
       [j24] Alannah Oleson. Beyond “Average” Users: Building Inclusive Design Skills with the CIDER Technique. Bits and Behavior, October 2022. URL: https://medium.com/bits-and-behavior/beyond-average-users-building-inclusive-design-skills-with-the-cider-technique-413969544e6d (visited on 2023-12-07).

      I looked at Sasha Costanza-Chock’s Design Justice book [j22], and I really liked how it focuses on who actually gets to be part of the design process. The author talks about how design should be led by the people who are most affected by it, instead of just big companies or tech experts. That connects really well to what the chapter said about “who gets to be designers.” It made me think that if more people from different backgrounds helped design technology, things like the soap dispenser problem probably wouldn’t happen.

    1. § 1º

      Súmula 576/STJ - In the absence of an administrative application to the INSS, the start date for implementing a judicially granted disability retirement (aposentadoria por invalidez) shall be the date of valid service of process.

      How can Súmula 576 of the STJ be reconciled with the STF decision that requires a prior administrative application (RE 631240/MG)?

      • If we examine the precedents that gave rise to Súmula 576-STJ, we see that they involve lawsuits filed before the STF decision in RE 631240/MG, that is, at a time when the prevailing case law did not require a prior administrative application before the insured person could bring the action.
      • The debates surrounding Súmula 576-STJ therefore took place in cases that arose at a point in time when the insured person could still choose either to apply for the benefit through the administrative route first or to file a lawsuit directly claiming disability retirement. Súmula 576-STJ thus emerged mainly to resolve those cases.

      (CAVALCANTE, Márcio André Lopes. Súmula 576-STJ. Buscador Dizer o Direito, Manaus. Available at: https://buscadordizerodireito.com.br/jurisprudencia/3660/sumula-576-stj.)

  5. www.planalto.gov.br
    1. from the moment

      The proceedings will not necessarily be annulled from the outset, but only from the moment at which the MP (Ministério Público) should have intervened.

    2. can only be decreed

      Note that nullity for failure to notify the MP is not automatic. The MP itself must state whether or not there was prejudice before the nullity is decreed.

    1. IV
      • Informativo No. 731
      • April 4, 2022.
      • QUARTA TURMA (Fourth Panel)
      • Case: REsp 1.978.138-SP, Rapporteur Justice Antonio Carlos Ferreira, Quarta Turma, unanimous, decided on 22/03/2022.

      Area of Law: CIVIL PROCEDURE

      Topic: Peace, Justice and Strong Institutions. Public civil action (ação civil pública). Standing to sue (legitimidade ativa ad causam). Indirect public administration. Thematic relevance (pertinência temática). Requirement.

      Highlight - The standing of legal entities of the indirect public administration to bring a public civil action depends on thematic relevance between their institutional purposes and the interest protected.

      Information from the Full Opinion - To begin with, thematic relevance consists of the "harmonization between the institutional purposes of the civil associations or public bodies with standing and the object to be protected in the public civil action. In other words, those entities may bring a public civil action only in defense of an interest whose protection falls within their institutional purpose".

      • It is true that art. 5 of Lei n. 7.347/1985 expressly requires only associations, which are legal entities under private law, to demonstrate thematic relevance in order to bring a public civil action.

      • Consequently, on a literal interpretation, autarchies (autarquias), public companies, public foundations and mixed-capital companies (sociedades de economia mista) would not need to demonstrate thematic relevance to file collective actions.

      • From that perspective, members of the indirect public administration would acquire sweeping powers, competing even with the institutional purposes of the Ministério Público and the Defensoria Pública, and would become true "universal attorneys" with standing to file the most varied collective claims regardless of their field of activity.

      • Such a conception <u>ignores</u> the legal and statutory competences of the institutions, which delimit the field of activity of the legal entities that make up the indirect public administration. Along the same lines, scholars hold that "*the mere factual existence of an entity of the indirect Public Administration is not enough: its statutory regime must be examined* (statute, regulation, contract or constitutive act, etc.). It is its governing statute that will confer adequate standing (or not) on the legal entity, to differing degrees: an autarchy is one thing; a mixed-capital company with shares listed on the stock exchange is another".

      • Therefore, a legal entity of the indirect public administration that has no connection to the legal claim asserted, and whose institutional purpose does not cover the subject matter in dispute, cannot be regarded as the holder of the interest when bringing a collective action.

    1. It is simple for an attacker to set up an evil twin attack in public places when the network utilizes open Wi-Fi.

      An attacker can easily set up an ET (evil twin) attack in public places when the network uses open Wi-Fi.

    1. fine
      • Informativo 1143
      • ADPF 1011 / PE
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Justice GILMAR MENDES
      • Judgment: 28/06/2024 (Virtual)
      • Area of Law: Civil Procedure, Constitutional
      • Subject: Enforcement; Simple Fine; Court of Accounts; Standing (Legitimidade Ad Causam)/Accounting, Financial and Budgetary Oversight; External Control; Court of Accounts

      Fines imposed by a state Court of Accounts: standing of public entities to enforce them

      Thesis established - 1. The injured Municipality has standing to enforce a credit arising from a fine imposed by a state Court of Accounts on a municipal public agent for damage caused to the municipal treasury.

      • 2. It falls to the Member State to enforce credits arising from simple fines imposed by state Courts of Accounts on municipal public agents for failure to comply with the rules of Financial Law or for breach of the duties of cooperation that legislation imposes on the public agents under audit.

      Summary - States have standing to enforce purely punitive fines imposed by their Courts of Accounts on municipal public agents who, by their acts, infringe the rules of Financial Law or violate the duties of cooperation with the oversight body imposed by legislation.

      • The Federal Constitution of 1988 grants Courts of Accounts throughout the country the power to apply the sanctions provided by law to those responsible for unlawful expenditure or irregular accounts (1).

      • According to the judgment that gave rise to the thesis of Tema 642 of general repercussion, what determines which entity is competent to enforce a fine imposed by state Courts of Accounts is the legal nature of the sanction. A simple fine imposed on a municipal public agent (which falls under the punitive form of financial liability) for serious failure to comply with financial, accounting and budgetary rules, or as a direct consequence of breaching the duties of cooperation that audited agents owe to the oversight body (ancillary obligations), serves as a tool to deter future violations of those rules and, in some cases, to reaffirm the authority of the decisions or measures ordered by the Courts of Accounts.

      • By contrast, the penalties of imputation of debt and of a fine proportional to the damage fall under the restitutionary form of financial liability, since they aim to restore the treasury following misappropriation, undue payment, or failure to collect or settle, as provided by law.

      • In this context, when the sanctions imposed by a state Court of Accounts on a municipal public agent concern reimbursement of the treasury, standing to enforce them lies with the municipality whose public assets were affected (2), whereas the state itself has standing to enforce fines that stem from the sanctioning power of the Court of Accounts (a pecuniary sanction unrelated to any damage to the treasury) (3).

      • On the basis of these and other understandings, the Full Court unanimously upheld the action and also (i) held that this decision does not automatically affect res judicata formed before publication of the minutes of this judgment; and (ii) ordered the addition of a new proposition (item 2) to the thesis of Tema 642 of general repercussion, so as to cover the Court's new understanding.


      • Informativo 1029
      • RE 1003433 / RJ - Tema 642
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Justice MARCO AURÉLIO
      • Author of the court's opinion: Justice ALEXANDRE DE MORAES
      • Judgment: 14/09/2021 (Virtual)
      • Area of Law: Civil Procedure, Administrative
      • Subject: Enforcement; External control

      Standing to enforce a fine for damage caused to the municipal treasury

      Thesis established - The injured Municipality has standing to enforce a credit arising from a fine imposed by a state Court of Accounts on a municipal public agent for damage caused to the municipal treasury.

      Summary - States do not have standing to enforce fines imposed by state Courts of Accounts on municipal public agents who, by their acts, have caused losses to municipalities.

      • If the fine imposed by the Court of Accounts stems from acts that caused loss to the municipal treasury, the party with standing to enforce the tax credit is the injured municipality, not the state (1). A different understanding would amount to unjust enrichment.

      • On the basis of this understanding, the Full Court, by majority, in deciding Tema 642 of general repercussion, dismissed the extraordinary appeal. Justices Marco Aurélio (rapporteur) and Edson Fachin dissented.

      Precedents: RE 525.663 AgR and RE 223.037

    2. regulator
      • Informativo 1007
      • ADI 1668 / DF
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Justice EDSON FACHIN
      • Judgment: 27/02/2021 (Virtual)
      • Area of Law: Administrative
      • Subject: REGULATORY AGENCIES

      Telecommunications services: creation of ANATEL and powers of the regulatory body

      Summary - The power granted to the head of the Executive Branch to issue a decree in order to institute or eliminate provision of the service under the public regime, whether or not concurrently with provision under the private regime, and to approve the general plan of grants for the service under the public regime and the plan of universalization targets for the service provided under the public regime, is fully consistent with the regulatory power provided for in art. 84, IV, final part, and VI, of the Federal Constitution (CF). Art. 18, I, II and III of Lei 9.472/1997 is compatible with arts. 21, XI, and 48, XII, of the Federal Constitution (CF).

      • The power of the Agência Nacional de Telecomunicações (ANATEL) to issue rules is subject to the statutory and regulatory provisions governing the grant, provision and enjoyment of telecommunications services under the public regime and the private regime. Art. 19, IV and X, of Lei 9.472/1997 is therefore constitutional.

      • Search and subsequent seizure carried out without a court order, based solely on the police power vested in ANATEL, is <u>unconstitutional</u> because it violates the principle of the inviolability of the home, in light of art. 5, XI, of the Federal Constitution. Art. 19, XV, of Lei 9.472/1997 is therefore unconstitutional. The power granted to ANATEL's Board of Directors (Conselho Diretor) to issue its own bidding and contracting rules (Lei 9.472/1997, art. 22, II) must observe the body of rules governing public bidding and contracts, in keeping with the principle of legality.

      • Given the specific nature of telecommunications services, new bidding modalities may validly be created by a <u>statute of the same rank</u> as the General Bidding Law (Lei 8.666/1993). Their regulation must therefore be made by statute, not by sub-statutory acts, in compliance with articles 21, XI, and 22, XXVII, of the constitutional text. For that reason, the expression “serão disciplinados pela Agência” (shall be regulated by the Agency) in art. 55 of Lei 9.472/1997 is unconstitutional.

      • The hiring referred to in art. 59 of Lei 9.472/1997 of technicians or specialized firms, including independent consultants and external auditors, to carry out activities within ANATEL's remit must follow the regular bidding procedure laid down by the governing laws.

      • The possibility of concurrent public and private regimes for providing the service, as well as the definition of service modalities, are strictly technical questions within the purview of the agency, which is responsible for establishing the normative bases of each matter relating to the execution, definition and establishment of the rules peculiar to each service.

      • ANATEL may not regulate a simplified bidding procedure by means of a rule ranking below the General Bidding Law, on pain of violating the principle of legal reservation (reserva legal). The expressions “simplificado” (simplified) and “nos termos por ela regulados” (as regulated by it) in art. 119 of Lei 9.472/1997 are therefore unconstitutional.

      • The power granted to the head of the Executive Branch to issue a decree in order to institute or eliminate provision of the service under the public regime, whether or not concurrently with provision under the private regime, and to approve the general plan of grants for the service under the public regime and the plan of universalization targets for the service provided under the public regime, is fully consistent with the regulatory power provided for in art. 84, IV, final part, and VI, of the Federal Constitution (CF). Art. 18, I, II and III of Lei 9.472/1997 (1) is compatible with arts. 21, XI, and 48, XII, of the Federal Constitution (CF) (2).

      • Indeed, the measures provided for in art. 18 concern the execution of the telecommunications policy defined in the body of Lei 9.472/1997 and are conditioned by several provisions of that statute.

      • The caput of art. 18 of Lei 9.472/1997 therefore complies with those constitutional provisions, which give the President of the Republic the power to issue decrees and regulations for the faithful execution of the law and grant him the power to provide, by decree, for the organization and functioning of the federal administration.

      • It is inherent in the regulatory power to act secundum legem and intra legem. Thus, within the limits of the legislation governing the matter, Lei 9.472/1997, while conferring that power on the President of the Republic, also sets parameters for its exercise.

      • The power of the Agência Nacional de Telecomunicações (ANATEL) to issue rules is subject to the statutory and regulatory provisions governing the grant, provision and enjoyment of telecommunications services under the public regime and the private regime. Art. 19, IV and X, of Lei 9.472/1997 (3) is therefore constitutional.

      • In line with the case law of the Supremo Tribunal Federal (STF) (4), it falls to regulatory agencies such as ANATEL to carry out the ordering and oversight of the sectors subject to them. For that function to be properly performed, the power to issue rules arises as inherent in the agencies' regulatory activity; within their sphere of action and the limits of the applicable body of rules, it is for them to regulate the provision of services.

      • This is therefore not a delegation of legislative powers, because the issuing of regulatory rules is always grounded in the law, which also serves as its limit but does not exhaust the possibilities for mediating the various interests that regulatory bodies are called on to reconcile.

      • Search and subsequent seizure carried out without a court order, based solely on the police power vested in ANATEL, is unconstitutional because it violates the principle of the inviolability of the home, in light of art. 5, XI, of the Federal Constitution (5). Art. 19, XV, of Lei 9.472/1997 (6) is therefore unconstitutional.

      • The possibility of ordering the closure of establishments, facilities or equipment, and the seizure of goods or products, under art. 3, sole paragraph, of Lei 10.871/2004 (which provides for the creation of careers and the organization of permanent positions in the special autarchies known as regulatory agencies), constitutes an exercise of the Public Administration's police power, which is self-executing and inherent in the exercise of that function (7).

      • However, art. 19, XV, of Lei 9.472/1997, which provides for the search and seizure of goods, has a different dimension. It should be stressed that, according to the STF, the concept of home (domicílio) is not limited to a residential dwelling but also covers any private compartment where someone exercises a profession or activity (8).

      • The power granted to ANATEL's Board of Directors (Conselho Diretor) to issue its own bidding and contracting rules (Lei 9.472/1997, art. 22, II) (9) must observe the body of rules governing public bidding and contracts, in keeping with the principle of legality.

      • Indeed, regulatory agencies do not have the prerogative to legislate on public bidding. First, because doing so violates the exclusive legislative competence of the Union (CF, art. 22, XXVII). Second, because innovating in the legal order is not among the attributes of these bodies' regulatory function: they fill deliberate gaps of a technical nature in legislation, but they may not establish, in an original and primary manner, duties and obligations for private parties, still less engage in creative activity with regard to bidding and contracting modalities.

      • Given the specific nature of telecommunications services, new bidding modalities may validly be created by a statute of the same rank as the General Bidding Law (Lei 8.666/1993). Their regulation must therefore be made by statute, not by sub-statutory acts, in compliance with articles 21, XI, and 22, XXVII, of the constitutional text. For that reason, the expression “serão disciplinados pela Agência” (shall be regulated by the Agency) in art. 55 of Lei 9.472/1997 (10) is unconstitutional.

      • The introduction into the legal order of new bidding modalities by a statute with the same status as the General Bidding Law does not violate the Constitution. However, for the principle of legal reservation to be respected, and bearing in mind that the consultation (consulta) is an instrument not restricted to ANATEL, its application having been extended to all regulatory agencies by art. 37 of Lei 9.986/2000, its regulation must be made by statute.

      • The hiring referred to in art. 59 of Lei 9.472/1997 (11) of technicians or specialized firms, including independent consultants and external auditors, to carry out activities within ANATEL's remit must follow the regular bidding procedure laid down by the governing laws.

      • In fact, hiring without the bidding procedure laid down by the governing laws violates art. 22, XXVII, of the CF.

      • The possibility of concurrent public and private regimes for providing the service, as well as the definition of service modalities, are strictly technical questions within the purview of the agency, which is responsible for establishing the normative bases of each matter relating to the execution, definition and establishment of the rules peculiar to each service.

      • Given the defining parameters in the legislation, and the constitutional permission for telecommunications services to be provided under the private regime by means of authorization, no unconstitutionality is found in articles 65, III, §§ 1 and 2, 66 and 69 of Lei 9.472/1997 (12).

      • Assigning the agency the power to define the services does not exceed the limits of its regulatory power.

      • Art. 21, XI, of the Constitution permits the exploitation, “directly or by authorization, concession or permission, of telecommunications services, as provided by law”.

      • Therefore, notwithstanding the more general provision of art. 175 of the CF (13), in the case of telecommunications services it is the constitutional text itself that permits exploitation by authorization, which means giving the Administration the option of instituting a private regime, subject to free competition, even if partially derogated by the regulation established by ANATEL (14).

      • ANATEL may not regulate a simplified bidding procedure by means of a rule ranking below the General Bidding Law, on pain of violating the principle of legal reservation (reserva legal). The expressions “simplificado” (simplified) and “nos termos por ela regulados” (as regulated by it) in art. 119 of Lei 9.472/1997 (15) are therefore unconstitutional.

      • Bidding rules are mandatory and leave no room for free action by any given administrator, however senior.

      • On the basis of this understanding, the Full Court, by majority, partially upheld the request made in a direct action brought against provisions of Lei 9.472/1997, which provides for the organization of telecommunications services, the creation and functioning of a regulatory body and other institutional aspects, under Constitutional Amendment 8/1995. Justice Roberto Barroso dissented.

    3. cooperative act
      • ADI 429
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Justice LUIZ FUX
      • Judgment: 20/08/2014
      • Publication: 30/10/2014

      DIRECT ACTION OF UNCONSTITUTIONALITY. TAX LAW. GENERAL RULES OF TAX LAW. ICMS. CONSTITUTION OF THE STATE OF CEARÁ. CHALLENGE TO ARTICLES 192, §§ 1 AND 2; 193 AND ITS SOLE PARAGRAPH; 201 AND ITS SOLE PARAGRAPH; 273, SOLE PARAGRAPH; AND 283, III, OF THE STATE CONSTITUTION. ADEQUATE TAX TREATMENT OF THE COOPERATIVE ACT AND EXEMPTION FROM STATE TAXES FOR SMALL AND MICRO ENTERPRISES; SMALL AND MICRO RURAL PRODUCERS; AND COMPANIES THAT EMPLOY PERSONS WITH DISABILITIES ON THEIR STAFF OR THAT MANUFACTURE AND SELL ALTERNATIVELY MANUFACTURED DEVICES FOR PERSONS WITH DISABILITIES. PROVISIONS LAID DOWN IN THE STATE CONSTITUTION. VIOLATION OF ARTICLE 146, ITEM III, SUBITEM “C”, OF THE CRFB/88. CONCURRENT COMPETENCE OF THE UNION, STATES AND FEDERAL DISTRICT TO LEGISLATE ON TAX LAW. ARTICLE 24, ITEM I, OF THE CRFB/88. NO UNCONSTITUTIONALITY. OTHER CHALLENGED PROVISIONS. UNILATERAL GRANT OF TAX BENEFITS AND INCENTIVES. ICMS. ABSENCE OF AN INTERSTATE AGREEMENT (CONVÊNIO). BREACH OF ARTICLE 155, § 2, ITEM XII, “G”, OF THE CRFB/88. CAPUT OF ART. 193 OF THE STATE CONSTITUTION. INTERPRETATION IN CONFORMITY WITH THE CONSTITUTION WITHOUT DECLARATION OF NULLITY. EXCLUSION OF ICMS FROM ITS SCOPE.

      • 1. Brazilian federalism is expressed, among other fields, in the tax sphere through the provision of exclusive legislative-fiscal competences of the political entities, with the establishment of general rules reserved to Complementary Law (Lei Complementar).

      • 2. The grant of tax benefits is not a matter reserved to the exclusive legislative initiative of the Head of the Executive Branch under article 61, § 1, item II, subitem b, of the CRFB/88.

      • 3. The power to exempt derives from the power to tax; where the latter exists, nothing prevents entities vested with taxing competence, such as the Member States, from defining cases of exemption or non-incidence for taxes in general, in light of the rules of taxing competence, and this does not bar the state Constitution from addressing the subject.

      • 4. Art. 146, III, “c”, of the CRFB/88 provides that a complementary law shall establish general rules on tax matters and, in particular, on the adequate tax treatment to be given to the cooperative act carried out by cooperative societies.

      • 5. The alleged unconstitutionality of the state Constitution does not exist, because the competence to legislate on tax law is concurrent: it is for the Union to establish general rules and for the Member States and the Federal District to supplement the gaps in the federal law on general rules so as to adapt them to local particularities. Hence, absent a federal law of general rules on the matters listed in the cited constitutional article, the States may exercise <u>full legislative</u> competence (§ 3 of art. 24 of the CRFB/88).

      • 6. Consequently, § 1 of article 192 of the Ceará Constitution, which provides that “the cooperative act, carried out between the member and the cooperative, does not constitute a market transaction”, is not unconstitutional.

      • 7. The Supreme Court, in examining an analogous situation, held that, until the complementary law referred to in art. 146, III, “c”, of the CRFB/88 is enacted, it cannot be maintained that the Member State, which has concurrent competence in tax law (article 24, I and § 3, of the Constitution), may not, on the basis of local legislation, give cooperatives the treatment it deems adequate, especially since adequate treatment <u>does not necessarily mean privileged treatment</u>, verbis: “There is, in this case, no violation of article 146, III, ‘c’, of the Constitution, since that constitutional provision did not grant cooperatives tax immunity; for that reason, until the complementary law to which it refers is enacted, it cannot be maintained that, on the basis of the local legislation mentioned in the judgment under appeal, the Member State, which has concurrent competence in tax law (article 24, I and § 3, of the Constitution), may not give Cooperatives the treatment it deems adequate, especially since adequate treatment does not necessarily mean privileged treatment.” (RE 141.800, Rapporteur Justice MOREIRA ALVES, DJ of 30.10.97).

      • 8. The unilateral grant of tax benefits relating to ICMS, without the prior conclusion of an intergovernmental agreement as required by LC nº 24/75, which the Court's case law unequivocally treats as having been received by the Constitution, violates article 155, § 2, XII, “g”, of the CRFB/88.

      • 9. The constitutional command in art. 155, § 2, item “g”, which reserves to federal complementary law the task of “regulating the manner in which, by deliberation of the States and the Federal District, tax exemptions, incentives and benefits shall be granted and revoked”, applied in casu, makes manifest the substantive unconstitutionality of the provisions of the Ceará Constitution that grant a tax incentive incompatible with the CRFB/88. Precedents: ADI 84, Rapporteur Justice ILMAR GALVÃO, Tribunal Pleno, decided on 15/02/1996, DJ 19-04-1996.

      • 10. The grant of tax benefits relating to ICMS without the prior and necessary conclusion of an agreement among the States and the Federal District is manifestly unconstitutional. Precedents: ADI 2906/RJ, Rapporteur Justice Marco Aurélio, 1.6.2011; ADI 2376/RJ, Rapporteur Justice Marco Aurélio, 1.6.2011; ADI 3674/RJ, Rapporteur Justice Marco Aurélio, 1.6.2011; ADI 3413/RJ, Rapporteur Justice Marco Aurélio, 1.6.2011; ADI 4457/PR, Rapporteur Justice Marco Aurélio, 1.6.2011; ADI 3794/PR, Rapporteur Justice Joaquim Barbosa, 1.6.2011; ADI 2688/PR, Rapporteur Justice Joaquim Barbosa, 1.6.2011; ADI 1247/PA, Rapporteur Justice Dias Toffoli, 1.6.2011; ADI 3702/ES, Rapporteur Justice Dias Toffoli, 1.6.2011; ADI 4152/SP, Rapporteur Justice Cezar Peluso, 1.6.2011; ADI 3664/RJ, Rapporteur Justice Cezar Peluso, 1.6.2011; ADI 3803/PR, Rapporteur Justice Cezar Peluso, 1.6.2011; ADI 2549/DF, Rapporteur Justice Ricardo Lewandowski, 1.6.2011.

      • 11. On these premises, it must be concluded that: a) § 2 of art. 192 of the Ceará Constitution grants an ICMS exemption for implements and equipment intended for persons with physical, hearing, visual, mental and multiple disabilities, as well as for domestically manufactured motor vehicles of up to 90 HP adapted for use by persons with disabilities, which leads to a declaration of its unconstitutionality, without pronouncement of nullity, for a period of twelve months. b) The caput of article 193 of the Ceará Constitution exempts micro-enterprises from state taxes, while its sole paragraph expressly extends the exemption to ICMS, which leads to a declaration of unconstitutionality of the sole paragraph and of the caput, the latter by interpretation in conformity so as to exclude ICMS from its scope. c) The unconstitutionality of article 201 and its sole paragraph of the Ceará Constitution is manifest, since a simple reading of the provisions shows that the state tax with such a scope is ICMS, verbis: “Art. 201. No tax shall be levied, as the law shall provide, on any agricultural product belonging to the basic food basket, produced by small and micro rural producers who use only family labor, and sold directly to final consumers. Sole paragraph. The non-incidence covers products from production and producer associations and cooperatives whose membership consists exclusively of small and micro producers and landless rural workers.” d) The sole paragraph of art. 273 and item III of art. 283 of the Ceará Constitution suffer from the same unconstitutionality, verbis: “Art. 273. Every public or private entity whose activities include serving children and adolescents, including security agencies, has as its priority purpose ensuring their fundamental rights. Sole paragraph. Private companies that employ contingents of up to five percent of persons with disabilities on their staff shall enjoy tax incentives of a one percent reduction in ICMS. (…) Art. 283. To encourage the manufacture and sale of alternatively manufactured devices for persons with disabilities, the State shall grant: (…) III - a one hundred percent exemption from ICMS.”

      • 12. Request for a declaration of unconstitutionality partially upheld to declare: (i) paragraph 2 of art. 192 unconstitutional, without pronouncement of nullity, for a period of twelve months; (ii) the caput of art. 193 partially unconstitutional, giving it an interpretation in conformity so as to exclude ICMS from its scope; (iii) the sole paragraph of article 193 unconstitutional; (iv) article 201, caput, and its sole paragraph unconstitutional; (v) the sole paragraph of article 273 unconstitutional; (vi) item III of article 283 unconstitutional; and to dismiss the request as to the caput and § 1 of article 192, all of these being articles of the Ceará Constitution.

      Note - Judgment(s) cited: (COMPETENCE, FEDERAL COMPLEMENTARY LAW, REGULATION, TAX BENEFIT, ICMS) ADI 84 (TP). (UNCONSTITUTIONALITY, MEMBER STATE, UNILATERAL GRANT, TAX BENEFIT, ICMS) ADI 1247 (TP), ADI 2376 (TP), ADI 2549 (TP), ADI 2688 (TP), ADI 2906 (TP), ADI 3413 (TP), ADI 3664 (TP), ADI 3674 (TP), ADI 3702 (TP), ADI 3794 (TP), ADI 3803 (TP), ADI 3809 (TP), ADI 4152 (TP), ADI 4457 (TP). (COMPETENCE, MEMBER STATE, REGULATION, TAX TREATMENT, COOPERATIVE) RE 141800 (2ªT). (TAX EXEMPTION, ICMS, RELIGIOUS TEMPLE) ADI 3421 (TP). Number of pages: 44. Analysis: 12/12/2014, RAF.

      Scholarly works: Ferreira. BRANCO, Paulo Gustavo Gonet. Curso de Direito Constitucional. 8. ed. São Paulo: Saraiva, 2013. p. 803/804. PYRRHO, Sérgio. Soberania, ICMS e isenções os convênios e os tratados internacionais. Rio de Janeiro: Lumen Juris, 2008. p. 32. TORRES, Ricardo Lobo. Tratado de direito constitucional financeiro

      Note: Although the CF provides for a federal complementary law to determine the adequate treatment of the cooperative act, this does not mean that the Member States are barred from legislating fully on the matter while the required federal complementary law has not been enacted. That is, the general rule of art. 24, § 3, CF applies, which gives the States full legislative competence if the Union does not issue general rules.

    4. complementary law
      • Informativo 991
      • RE 784439 / DF
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Justice ROSA WEBER
      • Judgment: 26/06/2020 (Virtual)
      • Area of Law: Tax
      • Subject: ISS; Taxes

      Exhaustive nature of the list of services subject to ISS

      Thesis established

      • The list of services subject to ISS referred to in art. 156, III, of the Federal Constitution is exhaustive; the tax may nevertheless be levied on activities inherent in the services listed by law, by way of extensive interpretation.

      Summary - The lists of services provide that the name given to a service is irrelevant and include expressions that allow extensive interpretation of some of their items, notably by resorting to the formula “e congêneres” (and the like). There is no constitutional obstacle to this legislative scheme, and any interpretive excesses that occur can be resolved by the Judiciary.

      Legislation: CF/1988, art. 5, LV, art. 156, III.

      Precedents: RE 592.905/SC, Rapporteur Justice Eros Grau, DJe of 5.3.2010 (Tema 125 RG); RE 651.703/PR, Rapporteur Justice Luiz Fux, DJe of 26.4.2017 (Tema 581 RG)

      Note: Clipping of the virtual sessions. Judgment published in the DJe of 15.9.2020.


      • Informativo 1110
      • ADI 5674 / DF
      • Judging body: Full Court (Tribunal Pleno)
      • Rapporteur: Justice ANDRÉ MENDONÇA
      • Judgment: 29/09/2023 (virtual session)
      • Area of law: Tax, Constitutional
      • Subject: Taxes; ISS; Taxable events; Lodging / National Tax System; Municipal taxes; ISS

      ISS: levy on lodging-related activities

      Summary - It is constitutional to levy the Tax on Services of Any Nature (ISS) on activities related to lodging of any nature, as provided in sub-item 9.01 of the list of services annexed to Complementary Law 116/2003.

      • Contracts for lodging of any nature, in the establishments set out in that list, are predominantly service contracts. Moreover, the ISS applies to activities that constitute obligations to do and mixed obligations, which include an obligation to give (1).

      • The lodging relationship must not be confused with a lease of real property. It is therefore improper to exclude from the tax base of this municipal tax the portion corresponding to the rental of the accommodation unit, since the contractually agreed provision of services is a single whole and makes economic sense only when viewed as a unit.

      • Thus, given that the uniformity of federal legislation prevails, the STJ's understanding is reinforced: all components of the price of the hotel service are part of the ISS tax base.

      • On these and other grounds, the Full Court unanimously dismissed the action and affirmed the constitutionality of sub-item 9.01 of the list of services annexed to Complementary Law 116/2003 (2).

      (1) Precedents cited: RE 651.703 (Tema 581 RG); RE 603.136 (Tema 300 RG) and RE 784.439 (Tema 296 RG).

      (2) List of services annexed to Complementary Law 116/2003: “9 – Services related to lodging, tourism, travel and the like. 9.01 – Lodging of any nature in hotels, condominium apart-services, flats, apart-hotels, residence hotels, residence-services, suite services, maritime hotels, motels, boarding houses and the like; seasonal occupancy with provision of services (the value of meals and tips, when included in the daily rate, is subject to the Tax on Services).”

      Legislation: List of services annexed to Complementary Law 116/2003: sub-item 9.01.

      Precedents: RE 651.703 (Tema 581 RG); RE 603.136 (Tema 300 RG) and RE 784.439 (Tema 296 RG).

    1. which holds for some

      Why does it hold for this interval? Is it the error in approximation because the function may require more than 3 Taylor terms? If so, why did they not use Big O notation as above?

    1. II

      Indeed, as the judgment in ADI 5938 shows, removal must be automatic and unconditional. Requiring a pregnant worker to present a medical report in order to be removed undermines the protection of maternity and the full protection of the child. It creates situations in which, through lack of knowledge, fear of dismissal or the like, the woman is exposed to a potentially unhealthy environment of the highest degree, including while breastfeeding.

    2. Art. 394-A
      • Informativo 942
      • ADI 5938 / DF
      • Judging body: Full Court (Tribunal Pleno)
      • Rapporteur: Justice ALEXANDRE DE MORAES
      • Judgment: 29/05/2019 (in person)
      • Area of law: Labor
      • Subject: PROTECTION OF MATERNITY

      CLT, art. 394-A: unhealthy work and removal of pregnant and breastfeeding workers

      Summary - The Full Court, by majority, confirmed the preliminary injunction granted by Justice Alexandre de Moraes (rapporteur) in a single-judge decision and partially granted the request made in the direct action, declaring unconstitutional the expression “quando apresentar atestado de saúde, emitido por médico de confiança da mulher, que recomende o afastamento” (“when she presents a health certificate, issued by a physician she trusts, recommending removal”), contained in items II and III of art. 394-A of the Consolidation of Labor Laws (CLT) (1), inserted by art. 1 of Law 13.467/2017.

      • The Court noted that, in its earlier wording, the provision required a pregnant or breastfeeding employee to be removed, for the duration of pregnancy and breastfeeding, from any unhealthy activities, operations or locations, and to work in a healthy location.

      • With the amendment made by Law 13.467/2017, which enacted the 2017 “Labor Reform”, art. 394-A came to allow a pregnant woman to continue working even under unhealthy conditions of minimum or medium degree. More seriously still, during breastfeeding it allowed her to keep working even under the maximum degree of unhealthiness. It also placed on the pregnant or breastfeeding worker the burden of presenting a health certificate, issued by a physician she trusts, attesting to the need for removal. This change exposed these workers to unhealthy activities.

      • The Court stressed that the Federal Constitution (CF) proclaims, in the caput of art. 6, the protection of maternity as a social right linked to human dignity. This protection is the rationale for numerous other instrumental social rights, such as maternity leave; the right to job security, which covers protection of the employment relationship against arbitrary dismissal of a pregnant worker without just cause; and, under art. 7, the protection of women's labor market through specific incentives (item XX) and the reduction of risks inherent to work through health, hygiene and safety rules (item XXII).

      • From this perspective, protecting pregnant or breastfeeding women from unhealthy work is an important instrumental social right that protects both the woman and the child. These are rules that safeguard women's social rights and give effect to the full protection of the newborn, allowing the child to live with the mother during the first months of life in a harmonious and safe way, free from the dangers of an unhealthy environment. The need for this social right to have maximum effectiveness also follows from the absolute priority that art. 227 of the Constitution (2) gives to the full protection of the child, including the unborn child and the breastfeeding newborn.

      • This is a right held by two parties. The protection of maternity and the full protection of the child are rights that cannot be waived. They cannot be set aside because of lack of knowledge, the impossibility arising from distance from medical centers, or the pregnant or breastfeeding woman's own negligence in presenting a medical certificate, on pain of harming both her and the newborn. Other reasons could lead a woman not to present the document, such as fear of later dismissal or pressure not to submit the certificate.

      • Accordingly, the challenged expressions are not consistent with the constitutional provisions. Providing for the <u>automatic removal</u> of pregnant or breastfeeding women from unhealthy environments is in line with the case law of the Federal Supreme Court (STF) on the full protection of maternity and of the child's health.

      • In this case, the change made by the law sought to reverse the burden of proving and documenting the unhealthy condition, and thereby to invert the protection of maternity and of the unborn child or newborn. It wrongly assumed that, as a rule, minimum and medium unhealthiness during pregnancy, and even maximum unhealthiness during breastfeeding, pose no risk. This weakens the full protection of the constitutionally protected interest, because it makes it harder for the employee to exercise her rights. The case is related to a recent judgment on Tema 497 of general repercussion (RE 629.053), concerning the job stability of pregnant employees.

      • In that judgment, the STF held that social rights as a whole were enshrined in the Constitution as one of the categories of fundamental rights. They are true positive liberties, binding in a Social State governed by the rule of law, aimed at improving the living conditions of the vulnerable and at achieving social equality.

      • Justice Edson Fachin stressed that this is not a matter of granting women any favor from a constitutional standpoint. Justice Roberto Barroso added that the requirement violates the precautionary principle, which also applies to the workplace and under which, whenever there is risk or uncertainty, the more conservative and protective position should prevail.

      • Justice Rosa Weber set out the history of the right and the main international instruments on the subject. She argued that the amendment is an undeniable social setback, since it repeals an earlier rule prohibiting such work by pregnant and breastfeeding women, and that it disregards the working mother's fundamental right to health, because it shifts to the protected person herself the responsibility for judging whether a certificate stating the need for removal from work is warranted. Justice Luiz Fux also pointed to unconstitutionality on the ground of violation of gender equality, agreeing with the points made by Justice Alexandre de Moraes (rapporteur) and Justice Rosa Weber.

      • Justice Celso de Mello reinforced these grounds and noted that the clause prohibiting social regression reflects, as it is put into effect, a true negative dimension of social rights: it prevents the levels of realization of these prerogatives, once achieved, from being reduced, degraded or eliminated.

      • Justice Marco Aurélio dissented and would have dismissed the action. In his view, the provisions merely establish a freedom for the woman worker, in that they provide for the possibility of removal from the unhealthy environment, and aim to meet the demands of the labor market so as not to create obstacles to hiring women. The Justice stated that requiring the medical certificate is not unreasonable.

      (1) CLT: “Art. 394-A. Without prejudice to her remuneration, including the unhealthy-work premium, the employee shall be removed from: (...) II – activities considered unhealthy to a medium or minimum degree, when she presents a health certificate, issued by a physician she trusts, recommending removal during pregnancy; III – activities considered unhealthy to any degree, when she presents a health certificate, issued by a physician she trusts, recommending removal during breastfeeding.”

      (2) CF/1988: “Art. 227. It is the duty of the family, society and the State to ensure to children, adolescents and young people, with absolute priority, the right to life, health, food, education, leisure, vocational training, culture, dignity, respect, freedom, and family and community life, and to protect them from all forms of neglect, discrimination, exploitation, violence, cruelty and oppression.”

      (3) CF/1988: “Art. 1. The Federative Republic of Brazil, formed by the indissoluble union of the States and Municipalities and the Federal District, is constituted as a Democratic State governed by the rule of law and has as its foundations: (...) IV – the social values of labor and of free enterprise;”

      Legislation: CF, arts. 1, IV and 227. CLT, art. 394-A, II and III.

      Precedents: RE 629.053

  6. www.planalto.gov.br www.planalto.gov.br
    1. § 8o

      By simple amendment, the President may, by order, extend the effects of the suspension to later injunctions with an identical subject matter. In other words, for the sake of procedural speed, separate proceedings are not needed to suspend similar injunctions, provided the initial petition is amended.

      Note that the suspension therefore does not, as a rule, reach future injunctions; for those, an <u>amendment</u> to the initial petition is required.

    1. Principle of Sufficient Reason: Leibniz formulated this principle, which states that there must be a sufficient reason for anything to exist, for any event to occur, or for any truth to hold. According to him, even if we cannot know this reason, it must exist. The Existence of God: For Leibniz, the ultimate sufficient reason for the existence of the universe is God. God, on his argument, is a necessary being whose essence implies its existence. That is, God's existence is logically and metaphysically necessary. The Contingent World: Everything in our universe is contingent; it could exist or not exist, and therefore it needs an external reason for its existence. This contingent world cannot be the ultimate reason for its own existence. The Choice of the Best Possible World: Leibniz argued that, among all possible worlds, God, being perfect and benevolent, would have chosen to create the best of them. The existence of "something" rather than "nothing" is explained by the fact that nothingness would be less perfect than the existence of this world, which, despite its imperfections, allows good and order to exist. The Simplified Ontological Argument: Although Leibniz also contributed to the ontological argument, in the context of this question his reasoning implies that the mere possibility of a necessary being (God) leads to its existence, because nothingness would have no reason to prevail over something that has a reason to exist.

      Leibniz shoots himself in the foot. If God is necessary but also perfect, then the world, as his creation, could not have been otherwise and could not have failed to exist: God could not have refrained from creating it, since his decision to create the world is perfect and his perfection admits no other decision. The world is therefore necessary, not contingent, because it is a necessary consequence of a necessary being, and so both exist necessarily. Moreover, there can have been no moment or instance in which the world did not exist, because God could not have remained in a state of imperfection (if creating the world and coexisting with it is perfect, then the contrary is imperfect). The world must therefore have existed always, together with God himself, and so it never began to exist.

    2. Being as the Fundamental Question: For Heidegger, the question of being (Sein) is the most fundamental question in philosophy. He distinguishes between "being" and "beings" (entities, the things that are). Philosophers have traditionally concerned themselves with beings, but Heidegger wants to return to the forgotten question of being itself. Dasein: Heidegger introduces the concept of Dasein, the human being insofar as it is able to ask about being. Dasein is "being-there", and its essence lies in its existence, in its being in the world and its capacity to question being. Nothingness: In "What Is Metaphysics?", Heidegger explores the relationship between being and nothingness. For him, nothingness is not simply the absence of something but a concept we must experience in order to understand being. Nothingness reveals itself in anxiety (Angst), a feeling that makes us aware of the possibility of non-existence and so makes being stand out more clearly. The Abandonment of Being: Heidegger holds that the history of metaphysics has been a history of the forgetting of being, in which the question of being has been replaced by questions about beings. This forgetting culminates in what he calls "nihilism", where nothingness turns against being itself, leading to a crisis in the understanding of the meaning of being. The Clearing of Being: Heidegger suggests that we must return to a more originary thinking, in which being shows itself in what he calls "the clearing" (Lichtung), an open space where being can be thought and experienced beyond the traditional categories of metaphysics. Being and Time: In "Being and Time", Heidegger argues that time is the horizon from which we understand being. Authentic existence involves a proper relationship with time, acknowledging our finitude and the temporality of being.

      Heidegger appropriates the concept of "nothingness" to explain the human mental experience of imagining nothingness and its consequences, which leads to a deep appreciation of being. He prattles on about the concept of nothingness, distorting it and making his idea harder to understand. A reader always distorts an idea somewhat on their own, but choosing prattle over explicit expression adds one more unnecessary layer to the interpretation of the idea.

    1. Art. 33

      The tenant has a right of first refusal to buy the property that the landlord intends to sell. If the tenant is passed over, the tenant still has means of obtaining ownership of the property.

      The requirements for constituting a real right in the tenant's favor are:

      • Claim the property for oneself within 6 (six) months, counted from the date of registration at the real estate registry;
      • Deposit the price of the property and the other transfer costs;
      • Have the lease recorded on the property's registration (matrícula), provided it was recorded at least 30 (thirty) days before the sale;

      Note that, under the law, recording the lease on the property's registration has 2 important effects:

      • It ensures that any new owner respects the lease term, barring termination of the contract before the contractual term has run;
      • It ensures that the tenant can acquire the property if the landlord disregards the tenant's right of first refusal.

      Finally, it is worth noting that recording, as an expression of the publicity of acts concerning real rights, is essential for producing erga omnes effect. Accordingly, to secure the real right of acquisition, recording the lease is indispensable.

      By contrast, in the other scenario involving a breach of the tenant's right of first refusal, a claim for damages does not depend on public registration.

    2. continuation clause (cláusula de vigência)

      STF Súmula 442: Registration of the lease in the Real Estate Registry, for the continuation clause to be valid against the purchaser of the property or against third parties, dispenses with transcription in the Registry of Titles and Documents.


      Note that the right to have the lease continue when the property is sold to a third party has 3 requirements:

      • The lease contains a continuation clause;
      • The lease is recorded on the property's registration;
      • The lease is for a fixed term.

      Accordingly, if any of the requirements above is missing, there is no right to continuation.

  7. bafybeibje2lf6mvlla6qirggc5kwjnk2cpcfki43qw2i2x3vbyidopdxbe.ipfs.inbrowser.link bafybeibje2lf6mvlla6qirggc5kwjnk2cpcfki43qw2i2x3vbyidopdxbe.ipfs.inbrowser.link
    1. Two hundred years ago, before the advent of capitalism, a man's social status remained unchanged from the beginning to the end of his life: he inherited it from his ancestors and it never changed. If he was born poor, he would be poor forever; if rich, a lord or a duke, he would keep his dukedom, and the property that went with it, for the rest of his days

    1. In 2024, a total of 46% of persons forming young households lived in owner-occupied housing. By contrast, 41% of persons in these households lived in rented housing in the same year. In addition, since 2011 the share of persons living in rented housing has risen steadily (by just under 12 p.p.), while the share of persons living in ownership forms of housing (cooperative and owner-occupied) has fallen by almost 16 p.p.

      Here I would turn the logic of the message around

      Over the past 15 years there has been a major shift in the structure by legal tenure of the dwelling. While in 2011 63% of young households lived in an ownership form of housing (owner-occupied or cooperative) and 32% in rented housing, by 2024 the ratio was almost even (48% and 42%, respectively), with the share of households in ownership forms having fallen by 15 p.p.

    2. which was 16 p.p. more than for other economically active households

      ...which is roughly twice as much as working-age households had to spend.

    3. Compared with other households, the growth in the share of young households in rented housing is even more pronounced. Especially in regional capitals and in Prague, the share of young households living in rented housing in 2024 was 40 p.p. higher than the share of other economically active households, whose presence in rented housing in regional capitals has long been declining.

      Comparing young households in rented housing with working-age households in rented housing makes it clear that, especially in regional capitals, the divergent trends of these two groups keep widening: while the share of working-age households in rented housing fell from 24% to 17% between 2008 and 2024 (a decrease of 7 p.p.), the share among young households rose from 43% to 59% over the same period (an increase of 16 p.p.).

    4. persons forming young households

      It would also seem simpler to me here to talk about households rather than about those persons, because

      a) the wording is a bit clumsy

      b) you talk about persons in the previous chart/text, and the reader has to be super attentive to tell young persons apart from persons living in young households ;)

      I have already changed the wording in the previous comment to households; see what you think

    1. CONSEJOS

      Translate the following extract into Spanish in your head, without looking at the text, and then check whether you got it right or whether other vocabulary choices are possible:

      The advantages of sharing housing are extensive: the most prominent is the savings you have by paying only for a room and not for the whole house. By sharing expenses, there is the possibility of having a larger space for a lower price. As if that were not enough, following the roommate guide that you can see in the video, it is most likely that no misunderstandings will be generated between the cohabitants and you can even make plans together.

      Although there are also disadvantages, the list of advantages can be longer than the list of problems. It is up to the individual to decide which lifestyle he or she prefers to lead.

    1. Damas

      Say whether the following statements about the game of damas (checkers) are true or false: 1. It is played on the same board as chess. 2. The winner is whoever captures the most of the opponent's pieces.

    2. Match each game with its objective: 1. The winner is whoever scores the most points by adding up the value of the letters and/or words. 2. The winner is whoever gets their pieces off the board first. 3. The winner is whoever brings their pieces to the center of the board. 4. The winner is whoever guesses the most words acted out.

    3. Este juego milenario tiene su origen en el juego hindú Chaturanga (información en inglés) o juego del ejército. El objetivo del juego es vencer al adversario acechando al “rey” de manera que no pueda escapar. Esta jugada se conoce como “jaque mate”. Se juega en un tablero cuadriculado 8 x 8 y cada oponente cuenta con 16 piezas para llevar a cabo su cometido. Es un juego que exige concentración y planificación del movimiento de las fichas. De primera intención pueda parecer complicado pero luego que se entienden las reglas tiene la capacidad de atrapar a sus jugadores.

      Find in this extract a synonym of amenazar, adversario, misión, enganchar.

    1. "Ojalá las clases de idiomas tuvieran menos que ver con aprender el idioma y más con explorar el mundo".

      Explain this sentence in your own words and say whether or not you agree, and why.

    2. "Es un poco incómodo cuando la gente dice que tienes algún tipo de 'don' lingüístico o que eres una especie de genio del lenguaje", dice Rawlings.

      How would you explain this quote from Rawlings in your own words?

    1. Match these sentences with the following dishes: quesadilla, tamal, burrito, tacos:

      1. This dish originated in the silver mines of eighteenth-century Mexico.
      2. This dish consists of tortillas that can be folded and filled with beans, guacamole and cheese.
      3. This dish is traditionally wrapped in corn husks and is based on a filled corn dough.
      4. This dish consists of flour tortillas folded in half with potatoes, beans or meat and melted cheese inside.
    2. "El ceviche es el plato nacional de Perú, que consiste en rodajas de pescado o marisco crudo que se condimentan con sal, cebolla y chiles, y luego se marinan en jugo de limón. Debido a la acidez del jugo, la textura del pescado cambia, al igual que su color: de rosa a blanco", describe TasteAtlas. El platillo de Perú, según los votantes en el sitio web, se colocó en el número 58 de los 100 más populares en el mundo. En la región, es el número ocho.

      Find a synonym of: trozos, destemplado, aderezar, adobar

    1. Así son los mejores sistemas educativos del mundo

      Comprehension questions:

      1. "Cambiar el rumbo, transformar la educación" ("Change course, transform education"). How would you explain this motto in your own words?

      2. True or false? Canada leads the world powers in the amount of money it devotes to education.

      3. True or false? In Finland, parents frequently take part in school decision-making.

      4. True or false? In Hong Kong, the emphasis is on memorization and creativity.

    2. La educación pública es el pilar fundamental del sistema finlandés, así como sus maestros, que están altamente valorados. De hecho, antes de llegar a ser docentes, los estudiantes pasan por un sistema de selección muy exigente. Debido al elevado estatus que consiguen y, al contrario que en otros modelos como el de Hong Kong, los padres influyen poco en las decisiones de la escuela.

      True or false? In Finland, parents frequently take part in school decision-making.

    3. Su historia como colonia británica es determinante en su sistema educativo, que no dista mucho de los occidentales. No obstante, desde que se inició la reforma educativa en el 2000, los objetivos han variado y se orientan a una mayor creatividad frente a una menor memorización. El sistema se enfoca al desarrollo personal y el aprendizaje a lo largo de la vida, según las declaraciones de la doctora Catherine K. K. Chan, subsecretaria de Educación de Hong Kong, en el libro ‘Gigantes de la Educación’.  Los padres también desempeñan un papel muy activo en la educación de los pequeños. Por ese motivo, las academias y clases privadas triunfan en Hong Kong, tanto que han convertido a sus profesores en auténticas celebrities; los llamados ‘Tutor kings’.

      True or false? In Hong Kong, the emphasis is on memorization and creativity.

    4. En el país norteamericano las escuelas públicas conviven con las privadas. Sin embargo, el 95% de los padres elige la educación pública para sus hijos, según la Asociación Canadiense de Escuelas Públicas. Se trata de un país que invierte mucho en educación; destina más fondos (per cápita) que cualquier otro país del G8.

      True or false? Canada leads the world powers in the amount of money it devotes to education.

    1. Find in the text an idiomatic expression that means the following: to end a situation of indifference, distrust or tension with another person by starting a conversation with them and trying to create a pleasant atmosphere.

    1. "Es muy humano generalizar; a veces las descripciones son jocosas o simpáticas, otras veces pueden resultar más hirientes y eso es reflejo de insensibilidad, ignorancia, desconocimiento o simple odio a los otros"

      Do you agree with this statement? Why?

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the distinct contributions of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to Reviewer #1 Public Review #3 below.

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below). 

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition,  we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.” 
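The unique effects discussed above can be illustrated with a minimal sketch. It is not the authors' analysis code: the data are synthetic, the variable names are hypothetical, and only the unique effects are computed (a full commonality analysis would further split the shared part into pairwise and three-way common effects). A predictor's unique effect is taken as the drop in R² when that predictor alone is removed from the full regression model.

```python
import numpy as np

def r_squared(X, y):
    """R^2 (sum-of-squares definition) of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def unique_effects(predictors, y):
    """Return the full-model R^2 and each predictor's unique effect.

    A unique effect is the drop in R^2 when that predictor alone is
    removed from the full model.
    """
    names = list(predictors)
    full = r_squared(np.column_stack([predictors[n] for n in names]), y)
    unique = {}
    for name in names:
        reduced = np.column_stack([predictors[n] for n in names if n != name])
        unique[name] = full - r_squared(reduced, y)
    return full, unique

# Synthetic data (hypothetical): Brain Age only carries age information,
# while Brain Cognition also carries an age-independent brain signal.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_signal = rng.normal(size=n)
fluid_cognition = -0.5 * age + 0.4 * brain_signal + rng.normal(size=n)
predictors = {
    "chronological_age": age,
    "brain_age": age + 0.5 * rng.normal(size=n),
    "brain_cognition": -0.5 * age + 0.4 * brain_signal + 0.5 * rng.normal(size=n),
}

full_r2, unique = unique_effects(predictors, fluid_cognition)
shared = full_r2 - sum(unique.values())  # everything not unique to one predictor
print(f"full R^2 = {full_r2:.3f}, shared = {shared:.3f}")
for name, value in unique.items():
    print(f"unique effect of {name}: {value:.3f}")
```

In this toy setting the unique effect of the simulated Brain Cognition is clearly larger than that of the simulated Brain Age, which mirrors the pattern reported above, where unique effects of up to around 11% compare with around 32% explained by Brain Age and chronological age together (roughly one-third).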

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally. 

      Thank you for your comments on this issue. 

      We have now discussed these broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021), and

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age prediction models from MRI data of largely healthy participants and apply the built age prediction models to participants who are also largely healthy. Accordingly, the age prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. 

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest.

      Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We made additional clarification to make this clearer (see below). We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models.  

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features),  “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the sets of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
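The three steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the data are synthetic, there are three hypothetical sets of features instead of 18, a single stacked model instead of eight, and a small illustrative hyperparameter grid. The function names `nested_cv_stacking` and `r2_sum_of_squares` are made up for this sketch.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

def r2_sum_of_squares(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def tuned_elastic_net(X, y, grid, seed):
    """Grid search over five inner folds; refits the best model on all of X."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), grid, scoring="r2", cv=inner)
    return search.fit(X, y)

def nested_cv_stacking(feature_sets, y, grid, seed=0):
    """Return out-of-sample stacked predictions for every participant."""
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    stacked_pred = np.empty(len(y))
    for train_idx, test_idx in outer.split(y):
        # Step 1: tune one model per set of features on the outer-fold training set.
        base = [tuned_elastic_net(X[train_idx], y[train_idx], grid, seed)
                for X in feature_sets]
        # Step 2: predicted values on the same training set become the features
        # of the stacked model, tuned with a fresh set of inner folds.
        level2_train = np.column_stack(
            [m.predict(X[train_idx]) for m, X in zip(base, feature_sets)])
        stack = tuned_elastic_net(level2_train, y[train_idx], grid, seed + 1)
        # Step 3: apply the already tuned models to the untouched test set.
        level2_test = np.column_stack(
            [m.predict(X[test_idx]) for m, X in zip(base, feature_sets)])
        stacked_pred[test_idx] = stack.predict(level2_test)
    return stacked_pred

rng = np.random.default_rng(0)
n = 150
feature_sets = [rng.normal(size=(n, 5)) for _ in range(3)]
y = sum(X[:, 0] for X in feature_sets) + 0.5 * rng.normal(size=n)
grid = {"alpha": np.logspace(-2, 0, 4), "l1_ratio": [0.2, 0.5, 0.8]}

stacked_pred = nested_cv_stacking(feature_sets, y, grid)
r2 = r2_sum_of_squares(y, stacked_pred)
mae = np.mean(np.abs(y - stacked_pred))
pearson_r = np.corrcoef(y, stacked_pred)[0, 1]
print(f"R^2 = {r2:.2f}, MAE = {mae:.2f}, r = {pearson_r:.2f}")
```

The point of the structure is that the outer-fold test indices are only ever passed to `predict`, never to `fit`, at either level of the stack.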

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models. 

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits? 

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range = .31-.94) and fluid cognition-prediction (range = .16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model.  The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.  
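As a rough illustration of the rank-stability computation described above, the sketch below correlates the coefficient vectors of every pair of outer folds, which gives 10 values of Spearman's ρ for five folds. The coefficients are simulated (19 hypothetical features, matching the size of the subcortical volume set) and the helper name `rank_stability` is an assumption of this sketch.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

def rank_stability(coefs_by_fold):
    """Spearman's rho between the coefficient vectors of every pair of folds."""
    rhos = []
    for coef_a, coef_b in combinations(coefs_by_fold, 2):
        rho, _ = spearmanr(coef_a, coef_b)
        rhos.append(rho)
    return rhos

# Simulated Elastic Net coefficients: a shared pattern plus fold-specific noise.
rng = np.random.default_rng(0)
shared_pattern = rng.normal(size=19)
coefs_by_fold = [shared_pattern + 0.3 * rng.normal(size=19) for _ in range(5)]

rhos = rank_stability(coefs_by_fold)
print(f"{len(rhos)} pairwise rhos, mean = {np.mean(rhos):.2f}")
```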

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.  

      Thank you for the opportunity for us to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). 

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods:

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and the Freesurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors on FSL, and regressed them in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a highpass at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26-min long data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was preprocessed and concatenated across the four runs. We applied the same computations (i.e., highpass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC.
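The FC feature computation described above (pairwise Pearson's correlations, r-to-z transformation, then PCA defined on the training set only) can be sketched as follows. This is an illustration, not part of the quoted Methods: the time series are random, and the demo uses 20 regions and 10 components rather than 379 regions and 75 components to keep it light. The helper names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def n_region_pairs(n_regions):
    """Number of non-overlapping region pairs (upper triangle, no diagonal)."""
    return n_regions * (n_regions - 1) // 2

def fc_features(timeseries):
    """Fisher r-to-z transformed correlations for every unique pair of regions.

    timeseries has shape (frames, regions).
    """
    corr = np.corrcoef(timeseries, rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)
    return np.arctanh(corr[upper])

rng = np.random.default_rng(0)
n_participants, n_frames, n_regions = 40, 100, 20
features = np.stack([
    fc_features(rng.normal(size=(n_frames, n_regions)))
    for _ in range(n_participants)
])

# To avoid data leakage, the PCA is fitted on the training participants only
# and its definition is then applied to the test participants.
train, test = features[:30], features[30:]
pca = PCA(n_components=10).fit(train)
train_pcs = pca.transform(train)
test_pcs = pca.transform(test)

print(n_region_pairs(379))  # 71631, the number of FC indices quoted above
```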

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established preprocessing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us the ability to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the feature’s coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘ℓ1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; ℓ1 ratio=0) or absolute (known as ‘Lasso’; ℓ1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      \[ \min_{\beta} \; \frac{1}{2 n_{\text{samples}}} \lVert y - X\beta \rVert_2^2 + \alpha \cdot \ell_1\text{ratio} \cdot \lVert \beta \rVert_1 + \frac{1}{2} \, \alpha \cdot (1 - \ell_1\text{ratio}) \cdot \lVert \beta \rVert_2^2 \]

      where X is the features, y is the target, and β is the coefficient. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and ℓ1 ratio using 25 numbers in linear space, ranging from 0 to 1.
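As an illustration only (not part of the quoted Methods), the sketch below builds a hyperparameter grid matching the description above and writes out the objective that scikit-learn documents for `ElasticNet`, namely 1/(2n)·||y − Xβ||² + α·ℓ1 ratio·||β||₁ + 0.5·α·(1 − ℓ1 ratio)·||β||². The data are synthetic, the intercept is switched off to keep the objective in its simplest form, and `elastic_net_objective` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_objective(X, y, coef, alpha, l1_ratio):
    """Objective minimised by sklearn's ElasticNet (no intercept)."""
    n_samples = X.shape[0]
    residual = y - X @ coef
    data_fit = residual @ residual / (2 * n_samples)
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(coef))
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(coef ** 2)
    return data_fit + l1_penalty + l2_penalty

# Grid as described: 70 alphas in log space from .1 to 100,
# and 25 l1 ratios in linear space from 0 to 1.
alphas = np.logspace(-1, 2, 70)
l1_ratios = np.linspace(0, 1, 25)

# Synthetic check that a fitted model lowers the objective relative to
# all-zero coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
alpha, l1_ratio = alphas[0], 0.5
model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False).fit(X, y)

at_fit = elastic_net_objective(X, y, model.coef_, alpha, l1_ratio)
at_zero = elastic_net_objective(X, y, np.zeros(X.shape[1]), alpha, l1_ratio)
print(f"objective at fitted coefficients: {at_fit:.3f}; at zero: {at_zero:.3f}")
```

Under this parameterisation a larger alpha scales both penalty terms, while the ℓ1 ratio only shifts weight between the absolute-value and squared penalties.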

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘ℓ1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting them into five folds and visualised the coefficients on brain images using Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we, first, multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and, then, summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
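The mapping from component-level coefficients back to ROI pairs described above can be sketched in a few lines. This is an assumption-laden illustration on synthetic data (190 hypothetical ROI pairs and 10 components rather than 71,631 and 75), and the fixed hyperparameter values are arbitrary rather than tuned.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_participants, n_pairs, n_components = 60, 190, 10
fc = rng.normal(size=(n_participants, n_pairs))
target = fc[:, :5].sum(axis=1) + rng.normal(size=n_participants)

# Fit the PCA and the Elastic Net on the full (synthetic) dataset.
pca = PCA(n_components=n_components).fit(fc)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(pca.transform(fc), target)

# components_ has shape (n_components, n_pairs). Multiply the absolute
# loadings of each component by that component's Elastic Net coefficient,
# then sum across components to get one value per ROI pair.
pair_importance = np.abs(pca.components_).T @ model.coef_
print(pair_importance.shape)  # (190,)
```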

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain-cognition relationship: Integrating task-based fMRI across tasks markedly boosts prediction and test-retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      In order to assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community, measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls, but hardly ever individuals with a different, e.g. late blindness history. In order to improve the specificity of our conclusions, we used a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note, there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase, and the existence of nystagmus is a recruitment criterion to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” of congenital visual deprivation conditions and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanent congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any results of such future work would need to refer to our study, and results in these additional groups would not invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in response to review 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drives the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced varying durations of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      This caveat has been addressed in the revised manuscript.

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction said that there was a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested that Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in congenitally permanent blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous efforts (and time for the patients who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      It has to be noted that in the present study significance testing for correlations was corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of an E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenitally blind humans with those of congenitally permanently blind humans (from previous studies) allowed us to identify changes of E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer #3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship and to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1 3.1) Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Review 2 and Review 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting such correlations between visual deprivation history and measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. The existence of sensitive periods, which are typically assumed to not follow a linear gradual decline of neuroplasticity, does not necessarily allow predicting a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects on the brain and behavior by early deprivation which are observed only later in life when the function/neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2 3.2) Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3 3.3) Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We have added the confidence intervals for all measured correlations to the second revision of our manuscript.
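
      For illustration only, one common way to attach a confidence interval to a Pearson correlation is Fisher's z-transform. The sketch below is not taken from the manuscript: the function names and the data values are hypothetical, and it assumes approximately bivariate normal data, which may be a strong assumption in small samples.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical values, chosen only to exercise the functions.
x = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.0, 2.3, 2.7, 3.0]
y = [1.0, 1.4, 1.1, 1.9, 1.7, 2.4, 2.2, 2.9, 2.6, 3.3]
r = pearson_r(x, y)
low, high = fisher_ci(r, len(x))
print(f"r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

      With small n the interval is wide, which is the uncertainty the reviewer asked to see reported.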

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than make strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9), and reported our findings with effect sizes, appropriate caution and context.
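
      As a sketch of what the reviewer's suggestion could look like in practice (it was not part of the reported analyses), a permutation test for a correlation shuffles one variable to build a null distribution of r. The version below uses a Monte Carlo approximation rather than full enumeration; the function names, the number of permutations and the seed are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)            # break the pairing between x and y
        if abs(pearson_r(x, y_shuffled)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data with a perfect linear relationship.
x = [float(i) for i in range(10)]
y = [2.0 * v + 1.0 for v in x]
print(permutation_pvalue(x, y))
```

      The test makes no normality assumption, but its resolution is limited by the number of permutations drawn.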

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we have changed Figure 4 to say ‘adjusted p’, which is indeed what we reported.
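
      To make the labeling issue concrete: a Bonferroni-adjusted p-value is the raw p-value multiplied by the number of tests and capped at 1, which is why values such as p > 0.999 can appear. The following is a minimal sketch with hypothetical raw p-values, not the values from the manuscript.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.004, 0.03, 0.41, 0.62]            # hypothetical raw p-values
adjusted = bonferroni_adjust(raw)
for p_raw, p_adj in zip(raw, adjusted):
    label = "> 0.999" if p_adj > 0.999 else f"= {p_adj:.3f}"
    print(f"raw p = {p_raw:.3f}, adjusted p {label}")
```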

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were only performed in the congenital cataract reversal group since the sighted control group comprised individuals with vision in the normal range; such analyses would thus not be meaningful in that group. Table 1, which lists the individual visual acuities for all participants, including the normally sighted controls, shows the low variance in the latter group.

      For variables in which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We have now highlighted these motivations more clearly in the Methods of the revised manuscript (Page 16, Lines 405-410).

      (9 3.4) Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been the subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurements (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.

      Quote:

      “(3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work from Gao et al., multiple papers, which have also been cited, used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy and Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG. Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity still needs to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first which assessed both MRS-measured neurotransmitter levels and EEG aperiodic activity.”

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12 3.5) Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice conflates subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern compared to analyzing the aperiodic signal. Furthermore, in contrast, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As previously mentioned in the Methods (Page 15, Line 376) and in the previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency, as per the Nyquist theorem (https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
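
      To illustrate why the anti-aliasing filter matters here, the following sketch computes where a sinusoid would appear if data were resampled to 60 Hz without any low-pass filtering. It is plain frequency-folding arithmetic and does not reproduce EEGLAB's pop_resample; the function name and the example frequencies are chosen for illustration.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a sinusoid of f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


fs_new = 60.0            # resampling rate named in the text
nyquist = fs_new / 2.0   # 30 Hz: components above this fold back if unfiltered
for f in (10.0, 25.0, 35.0, 50.0):
    print(f"{f:.0f} Hz component appears at {alias_frequency(f, fs_new):.0f} Hz")
```

      In this arithmetic an unfiltered 50 Hz component would fold to 10 Hz, inside the analyzed 1-20 Hz range, which is the scenario a low-pass filter applied before downsampling is meant to prevent.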

      Quote:

      “- The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We have made this clearer in the Methods of the second revision (Page 15, Line 367).

      Our consistent results of group differences across all three EEG conditions, thus, exclude any possibility that they were driven by aliasing artifacts.

      The expected effects of this anti-aliasing filter can be seen in the attached Author response image 1, showing an example participant’s spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz plotted in the manuscript), clearly showing a 30-40 dB drop at 30 Hz. Any aliasing due to, for example, remaining line noise, would additionally be visible in this figure (as well as Figure 3) as a peak.

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript, and in previous reviews, so far there has been no consensus on the exact range of measuring aperiodic activity. We made a principled decision based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz) (Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies) and the purpose of the visual stimulation experiment (to look at the lower frequency range by stimulating up to 60 Hz, thereby limiting us to quantifying below 30 Hz), that 1-20 Hz would be the fit range in this dataset.

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40Hz and 1-19Hz). This is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data-driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).”

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal and/or a request to provide reasons for why it might not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work in the time (not frequency) domain on event-related potential (ERP) analysis has suggested that the baselining step might cause spurious effects (Delorme, 2023) (although see (Tanner et al., 2016)). We did not perform ERP analysis at any stage. One recent study suggests spurious group differences in the 1/f signal might be driven by an inappropriate dB division baselining method (Gyurkovics et al., 2021), which we did not perform.

      Any effect of our baselining procedure on the FFT spectrum would be below the 1 Hz range, which we did not analyze.  
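
      The point about mean subtraction can be checked numerically. The sketch below assumes a plain, unwindowed discrete Fourier transform of a single epoch (tapering or other spectral estimators could introduce some leakage into neighbouring bins); the synthetic epoch and all names are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short demo."""
    n = len(signal)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(signal))
        for k in range(n)
    ]


n = 64
# Hypothetical epoch: a DC offset of 5.0 plus sinusoids at 3 and 10 cycles per epoch.
epoch = [
    5.0 + math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 10 * t / n)
    for t in range(n)
]
mean = sum(epoch) / n
demeaned = [v - mean for v in epoch]       # baseline removal as described in the text

before, after = dft(epoch), dft(demeaned)
max_change_nonzero_bins = max(abs(a - b) for a, b in zip(before[1:], after[1:]))
print(abs(before[0]), abs(after[0]), max_change_nonzero_bins)
```

      Under these assumptions only the 0 Hz bin changes, so a fit restricted to 1 Hz and above would be unaffected by this step.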

      Each of the preprocessing steps in the manuscript match pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19), reports of other experimenters, groups and locations, validating that our results are robust.

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “(3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the Methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~50 Hz) as an account for the observed group effects in the 1-20 Hz frequency range highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1 second segments rejected for each resting state condition and the percentage of 6.25 second long segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).”
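
      The fitting procedure described in the quoted response (a straight-line fit to the 1-20 Hz spectrum in log-log space with the 8-14 Hz alpha range excluded, subtraction of the fit to obtain a corrected spectrum, and R² as fit quality) can be sketched as follows. This is an illustration on a synthetic spectrum, not the authors' code; the function name, the simulated exponent and the shape of the alpha bump are assumptions.

```python
import math


def loglog_fit(freqs, power, exclude=(8.0, 14.0)):
    """Least-squares line through log10(power) vs log10(freq), skipping `exclude`."""
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if not (exclude[0] <= f <= exclude[1])
    ]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pts)
    ss_tot = sum((y - my) ** 2 for _, y in pts)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic spectrum: power = 100 / f^1.5 plus a hypothetical alpha bump at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 / f ** 1.5 + 3.0 * math.exp(-0.5 * (f - 10.0) ** 2) for f in freqs]

slope, intercept, r2 = loglog_fit(freqs, power)
# Corrected spectrum: original log-power minus the aperiodic fit at every frequency.
corrected = [
    math.log10(p) - (intercept + slope * math.log10(f)) for f, p in zip(freqs, power)
]
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
```

      In this convention the slope is negative and the aperiodic exponent is its absolute value; excluding the alpha range keeps the oscillatory peak from pulling the line, while the peak remains visible in the corrected spectrum.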

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided."

      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.

      The recording of EEG resting state data started in 2013, while MRS testing could only be set up by the second half of 2019. Moreover, not all subjects who qualify for EEG recording qualify for being scanned (e.g., due to MRI safety or claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach as mentioned by the editor. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to #3 below

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And this is the third goal of this present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
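
      The variance partitioning described above can be illustrated with a minimal sketch of a three-predictor commonality analysis. This is not the authors' code: the data are synthetic, the helper names (`r_squared`, `commonality_three`) and all effect sizes are hypothetical, and ordinary least squares is assumed for every sub-model. The unique effect of a regressor is the drop in R² when it is removed from the full model, and the seven unique and common components sum to the full-model R².

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality_three(predictors, y):
    """Partition the full-model R^2 of exactly three predictors into 7 components."""
    names = list(predictors)
    assert len(names) == 3, "this sketch only covers the three-predictor case"
    a, b, c = names
    r2 = {}
    for k in (1, 2, 3):
        for subset in combinations(names, k):
            X = np.column_stack([predictors[name] for name in subset])
            r2[frozenset(subset)] = r_squared(X, y)

    def R(*subset):
        return r2[frozenset(subset)]

    full = R(a, b, c)
    return {
        f"unique({a})": full - R(b, c),
        f"unique({b})": full - R(a, c),
        f"unique({c})": full - R(a, b),
        f"common({a},{b})": R(a, c) + R(b, c) - R(c) - full,
        f"common({a},{c})": R(a, b) + R(b, c) - R(b) - full,
        f"common({b},{c})": R(a, b) + R(a, c) - R(a) - full,
        f"common({a},{b},{c})": R(a) + R(b) + R(c) - R(a, b) - R(a, c) - R(b, c) + full,
    }


# Synthetic illustration only: effect sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 500
chron_age = rng.normal(size=n)
brain_signal = rng.normal(size=n)  # cognition-related MRI signal unrelated to age
predictors = {
    "brain_age": chron_age + 0.5 * rng.normal(size=n),
    "chron_age": chron_age,
    "brain_cog": -0.6 * chron_age + 0.6 * brain_signal + 0.5 * rng.normal(size=n),
}
fluid_cog = -0.6 * chron_age + 0.5 * brain_signal + rng.normal(size=n)

parts = commonality_three(predictors, fluid_cog)
full_r2 = r_squared(np.column_stack(list(predictors.values())), fluid_cog)
```

      In this toy setting, `parts["unique(brain_cog)"]` plays the role of the unique effect of Brain Cognition, i.e., the variation in the target that the other two regressors miss.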

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally.

      Thank you for your comments on this issue.

      We now discuss these broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021),

      and

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p. 7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We made additional clarification to make this clearer (see below). We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models.

      From Methods: “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. As in the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. As in the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the sets of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
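
      The three steps above can be sketched in scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the data are synthetic, there are three hypothetical sets of features instead of 18, only one stacked model is built, the hyperparameter grid is deliberately small so that the example runs quickly, and the metrics are computed on predictions pooled across the outer-fold test sets.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
# Three hypothetical sets of features, each carrying part of the signal.
feature_sets = {
    name: latent[:, None] * rng.normal(size=(1, p)) + rng.normal(size=(n, p))
    for name, p in [("set_a", 10), ("set_b", 15), ("set_c", 5)]
}
y = latent + 0.5 * rng.normal(size=n)

# Reduced grid for speed (assumption); the text describes a much larger one.
grid = {"alpha": np.logspace(-1, 2, 5), "l1_ratio": [0.1, 0.5, 1.0]}


def tune(X, target, seed):
    """Inner five-fold grid search; returns the model refitted on all of X."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), grid, cv=inner, scoring="r2")
    return search.fit(X, target).best_estimator_


outer = KFold(n_splits=5, shuffle=True, random_state=0)
stacked_pred = np.empty(n)
for train, test in outer.split(y):
    # Step 1: tune one model per set of features on the outer-fold training set only.
    base = {k: tune(X[train], y[train], seed=1) for k, X in feature_sets.items()}
    # Step 2: predictions on the same training set become the stacked model's
    # features; the stacked model is tuned with newly drawn inner folds.
    z_train = np.column_stack(
        [base[k].predict(feature_sets[k][train]) for k in feature_sets]
    )
    stacker = tune(z_train, y[train], seed=2)
    # Step 3: apply the already tuned models to the untouched outer-fold test set.
    z_test = np.column_stack(
        [base[k].predict(feature_sets[k][test]) for k in feature_sets]
    )
    stacked_pred[test] = stacker.predict(z_test)

r, _ = pearsonr(y, stacked_pred)
r2 = r2_score(y, stacked_pred)  # sum-of-squares definition: 1 - SS_res / SS_tot
mae = mean_absolute_error(y, stacked_pred)
```

      The outer-fold test indices never enter `tune`, which is the property the response emphasises: test sets are used only to evaluate the tuned models.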

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
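
      The rank-stability check described above can be sketched as follows. The coefficients here are simulated (a shared hypothetical coefficient vector plus fold-specific noise), so the resulting values say nothing about the actual models; the sketch only shows why five outer folds yield 10 pairwise Spearman’s ρ.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_features, n_folds = 50, 5

# Hypothetical Elastic Net coefficients from five outer folds of one model.
shared = rng.normal(size=n_features)
fold_coefs = [shared + 0.5 * rng.normal(size=n_features) for _ in range(n_folds)]

# One Spearman's rho per unordered pair of folds: C(5, 2) = 10 values.
rhos = []
for coef_a, coef_b in combinations(fold_coefs, 2):
    rho, _ = spearmanr(coef_a, coef_b)
    rhos.append(rho)

rho_range = (min(rhos), max(rhos))
```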

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Thank you for the opportunity to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009).

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods: “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned of potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Glasser et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) with 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26-min long data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was pre-processed and concatenated across the four runs. We applied the same computations (i.e., high-pass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC.
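
      The FC feature construction shared by Task FC and Rest FC can be sketched as below. This is a toy illustration, not the authors' code: the time series are random numbers, the sample is tiny (30 hypothetical participants), and the PCA therefore uses 10 components rather than the 75 reported in the text. It shows how 379 regions give 379 × 378 / 2 = 71,631 unique region pairs, and how fitting the PCA on the training set only avoids data leakage.

```python
import numpy as np
from sklearn.decomposition import PCA


def fc_vector(timeseries):
    """Map (n_frames, n_regions) time series to Fisher-z transformed FC indices."""
    corr = np.corrcoef(timeseries, rowvar=False)   # Pearson's r for each region pair
    upper = np.triu_indices(corr.shape[0], k=1)    # non-overlapping pairs only
    return np.arctanh(corr[upper])                 # r-to-z transformation


rng = np.random.default_rng(0)
n_subjects, n_frames, n_regions = 30, 60, 379
fc = np.stack(
    [fc_vector(rng.normal(size=(n_frames, n_regions))) for _ in range(n_subjects)]
)

# Toy split; in the described pipeline this happens inside each cross-validation fold.
train, test = np.arange(24), np.arange(24, 30)

# Fit the PCA on the training set only, then apply its definition to the test set.
pca = PCA(n_components=10).fit(fc[train])   # 75 components in the text
train_pcs = pca.transform(fc[train])
test_pcs = pca.transform(fc[test])
```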

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established pre-processing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par, if not better than, many non-linear and more-complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us the ability to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the features’ coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio=0) or absolute (known as ‘Lasso’; l1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left( \frac{1}{2 \times n_{\text{samples}}} \lVert y - X\beta \rVert_2^2 + \alpha \times l_1\text{-ratio} \times \lVert \beta \rVert_1 + 0.5 \times \alpha \times (1 - l_1\text{-ratio}) \times \lVert \beta \rVert_2^2 \right)$$

      where X is the features, y is the target, β is the coefficient and n_samples is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l_1-ratio using 25 numbers in linear space, ranging from 0 to 1.
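
      As a hedged illustration of the hyperparameter grid and of how α and the l1 ratio enter the penalty, the sketch below writes out the objective that scikit-learn documents for `ElasticNet` and builds the 70 × 25 grid described above. The helper name `elastic_net_objective`, the synthetic data and the choice `fit_intercept=False` (made so that the hand-written objective matches the fitted model exactly) are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def elastic_net_objective(X, y, beta, alpha, l1_ratio):
    """Objective documented for sklearn.linear_model.ElasticNet (no intercept)."""
    n_samples = X.shape[0]
    residual = y - X @ beta
    loss = (residual @ residual) / (2 * n_samples)
    l1_penalty = alpha * l1_ratio * np.abs(beta).sum()          # Lasso part
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * (beta @ beta)   # Ridge part
    return loss + l1_penalty + l2_penalty


# Grid described in the text: 70 log-spaced alphas and 25 linearly spaced l1 ratios.
param_grid = {
    "alpha": np.logspace(-1, 2, 70),     # .1 to 100
    "l1_ratio": np.linspace(0, 1, 25),   # 0 (Ridge) to 1 (Lasso)
}
n_candidates = len(param_grid["alpha"]) * len(param_grid["l1_ratio"])

# Toy check on synthetic data: the fitted coefficients should give a lower
# objective value than a nudged copy of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(size=100)
model = ElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=False).fit(X, y)
at_fit = elastic_net_objective(X, y, model.coef_, alpha=0.1, l1_ratio=0.5)
nudged = elastic_net_objective(X, y, model.coef_ + 0.1, alpha=0.1, l1_ratio=0.5)
```

      With this parameterisation, α scales the overall amount of regularisation, whereas the l1 ratio only sets how that amount is divided between the absolute-value and squared penalties.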

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude is weighted relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘l1 ratio’, so the final coefficients may differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting them into five folds and visualised the coefficients on brain images using Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
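
      The projection of component-level coefficients back onto ROI pairs can be sketched as follows. The dimensions are deliberately small (20 hypothetical regions and 5 components, rather than 379 regions and 75 components), the data are synthetic, and the fixed hyperparameters are arbitrary; only the multiply-and-sum step follows the description above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_subjects, n_regions, n_components = 60, 20, 5
n_pairs = n_regions * (n_regions - 1) // 2   # 190 here; 71,631 for 379 regions

fc = rng.normal(size=(n_subjects, n_pairs))  # stand-in for Fisher-z FC indices
y = fc[:, 0] - fc[:, 1] + rng.normal(size=n_subjects)

pca = PCA(n_components=n_components).fit(fc)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(pca.transform(fc), y)

# Multiply the absolute loadings of each component by that component's
# Elastic Net coefficient, then sum over components: one value per ROI pair.
pair_importance = (np.abs(pca.components_) * enet.coef_[:, None]).sum(axis=0)

# Optional: place the ROI-pair values back into a symmetric region-by-region matrix.
importance_matrix = np.zeros((n_regions, n_regions))
upper = np.triu_indices(n_regions, k=1)
importance_matrix[upper] = pair_importance
importance_matrix = importance_matrix + importance_matrix.T
```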

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the Preclinical Alzheimer’s disease Consortium, ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews of our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This quantification is the third aim of this study.

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of Brain Age in case-control studies.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied the tuned models created in the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
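      The three performance metrics named in the excerpt above can be sketched as follows. The numbers are made up for illustration; the point is that the sum-of-squares R2 penalises systematic bias in predictions, whereas Pearson’s r does not.

```python
import numpy as np

def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (sum of squares residuals / total sum of squares)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_residual = np.sum((observed - predicted) ** 2)
    ss_total = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_residual / ss_total)

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    diff = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up example: predictions are perfectly ordered but shifted by 10 units.
observed = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
predicted = observed + 10.0

r = pearson_r(observed, predicted)
r2 = r2_sum_of_squares(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(r, r2, mae)
```

      In this toy case the correlation is perfect while R2 is only 0.5, which is one reason to report the sum-of-squares definition rather than a squared correlation.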

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
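      The three steps described in the Methods excerpt above can be sketched roughly as follows. This is a simplified illustration, not the authors' code: it assumes three simulated sets of features instead of 18, a single stacked model instead of eight, and a small arbitrary hyperparameter grid. The helper name `tune` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 200

# Simulated stand-ins for separate sets of features.
feature_sets = {
    "set_a": rng.normal(size=(n, 20)),
    "set_b": rng.normal(size=(n, 15)),
    "set_c": rng.normal(size=(n, 10)),
}
y = (feature_sets["set_a"][:, 0] + feature_sets["set_b"][:, 0]
     + 0.5 * feature_sets["set_c"][:, 0] + rng.normal(scale=0.5, size=n))

param_grid = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}

def tune(X, target, seed):
    """Grid search over five inner folds, scored by R2, refit on all of X."""
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), param_grid,
                          cv=inner_cv, scoring="r2")
    return search.fit(X, target)

stacked_pred = np.full(n, np.nan)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(outer_cv.split(y)):
    # Step 1: tune one model per set of features on the outer-fold training set only.
    base = {name: tune(X[train_idx], y[train_idx], seed=fold)
            for name, X in feature_sets.items()}

    # Step 2: predicted values on the same training set become the stacked
    # model's features; tune it with newly drawn inner folds.
    train_level2 = np.column_stack(
        [base[name].predict(feature_sets[name][train_idx]) for name in feature_sets])
    stacked = tune(train_level2, y[train_idx], seed=fold + 100)

    # Step 3: only now touch the outer-fold test set, using the tuned models.
    test_level2 = np.column_stack(
        [base[name].predict(feature_sets[name][test_idx]) for name in feature_sets])
    stacked_pred[test_idx] = stacked.predict(test_level2)

print(np.corrcoef(y, stacked_pred)[0, 1])
```

      In this sketch every parameter and hyperparameter used to predict an outer-fold test observation is estimated from the corresponding outer-fold training set, which is the property Reviewer 1 asked about.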

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found that Spearman’s ρ varied dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
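      The pairwise rank-stability computation can be sketched as below. The coefficient vectors are simulated (a shared signal plus fold-specific noise), and the small `spearman_rho` helper is a hypothetical stand-in that assumes no tied values; a library routine that handles ties would be preferable for real coefficients, particularly when Lasso-type penalties set many of them to exactly zero.

```python
from itertools import combinations

import numpy as np

def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

rng = np.random.default_rng(0)
n_folds, n_features = 5, 50

# Simulated coefficients: one vector per outer fold for the same set of features.
shared_signal = rng.normal(size=n_features)
coefs_per_fold = [shared_signal + rng.normal(scale=0.5, size=n_features)
                  for _ in range(n_folds)]

# Five folds give C(5, 2) = 10 fold pairs, hence 10 values of rho per model.
rhos = [spearman_rho(coefs_per_fold[i], coefs_per_fold[j])
        for i, j in combinations(range(n_folds), 2)]

print(len(rhos), min(rhos), max(rhos), float(np.mean(rhos)))
```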

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
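
      As a rough check on the “around one-third” figure, one can take the approximate values quoted above at face value (a unique effect of Brain Cognition of up to about 11%, against about 32% of variation explained by chronological age and Brain Age together). Both inputs are rounded, so the ratio is only indicative:

```latex
\[
\frac{\text{unique effect of Brain Cognition}}{R^2_{\text{chronological age, Brain Age}}}
\approx \frac{0.11}{0.32} \approx 0.34 \approx \tfrac{1}{3}
\]
```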

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Instead, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain the unique and common effects of the regressors, we need to look at all of the possible changes in R2 when all possible subsets of regressors are excluded or included (see equations 12 and 13 below).

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$\text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}$$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$\text{Unique Effect}_{\text{chronological age}} = \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}}$$

      $$\text{Unique Effect}_{\text{Brain Age index}} = \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}}$$

      $$\text{Unique Effect}_{\text{Brain Cognition}} = \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}}$$

      $$\text{Common Effect}_{\text{chronological age, Brain Age index}} = R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}}$$

      $$\text{Common Effect}_{\text{chronological age, Brain Cognition}} = R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}}$$

      $$\text{Common Effect}_{\text{Brain Age index, Brain Cognition}} = R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}}$$

      $$\text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} = R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \tag{13}$$”
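
      The decomposition in equation 13 is pure arithmetic on the R2 values of the seven possible sub-models, so it can be sketched in a few lines. The sketch below is illustrative only: the function name and the R2 values are hypothetical (not results from the study), A, B and C stand for chronological age, the Brain Age index and Brain Cognition, and the three-way common effect uses the standard three-predictor form in which the seven components sum to the full-model R2.

```python
def commonality_three(r2):
    """Partition the full-model R^2 for three regressors A, B and C.

    `r2` maps a string naming a subset of regressors ("A", "AB", "ABC", ...)
    to the R^2 of the regression that uses only that subset.
    """
    full = r2["ABC"]
    return {
        # Unique effects: drop in R^2 when one regressor is removed.
        "unique_A": full - r2["BC"],
        "unique_B": full - r2["AC"],
        "unique_C": full - r2["AB"],
        # Pairwise common effects: variance shared by exactly two regressors.
        "common_AB": r2["AC"] + r2["BC"] - r2["C"] - full,
        "common_AC": r2["AB"] + r2["BC"] - r2["B"] - full,
        "common_BC": r2["AB"] + r2["AC"] - r2["A"] - full,
        # Three-way common effect: variance shared by all three regressors.
        "common_ABC": (r2["A"] + r2["B"] + r2["C"]
                       - r2["AB"] - r2["AC"] - r2["BC"] + full),
    }


# Hypothetical R^2 values, chosen only to illustrate the arithmetic.
r2 = {"A": 0.30, "B": 0.20, "C": 0.25,
      "AB": 0.32, "AC": 0.40, "BC": 0.38, "ABC": 0.42}

parts = commonality_three(r2)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
print(f"sum of components: {sum(parts.values()):.2f}")
```

      Each unique effect is exactly the “change in R2” obtained by adding one regressor to a model that already contains the other two, which is why the commonality analysis subsumes the incremental-R2 comparison.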

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we included statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, which makes the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured largely the same variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you, Reviewer 3, for making an argument against the use of Brain Age. We largely agree with you. However, our work only focuses on one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added a concluding statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Comment:

      The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).

      Strengths:

      A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.

      We sincerely appreciate the reviewer’s recognition of our efforts in employing a multi-method approach, which integrates three complementary experimental paradigms, each leveraging distinct neurophysiological techniques to provide converging evidence.

      In Experiment 1, we found that the degree of inhibition in the pMTG and LIFG was strongly associated with the overlap in gesture-speech representations, as quantified by mutual information. Experiment 2 revealed the time-sensitive dynamics of the pMTG-LIFG circuit in processing both unisensory (gesture or speech) and multisensory information. Experiment 3, utilizing high-temporal-resolution EEG, independently replicated the temporal dynamics of gesture-speech integration observed in Experiment 2, further validating our findings.

      The striking convergence across these methodologically independent approaches significantly bolsters the robustness and generalizability of our conclusions regarding the neural mechanisms underlying multisensory integration.

      Comment 1: I thank the authors for their careful responses to my comments. However, I remain not convinced by their argumentation regarding the specificity of their spatial targeting and the time-windows that they used.

      The authors write that since they included a sham TMS condition, that the TMS selectively disrupted the IFG-pMTG interaction during specific time windows of the task related to gesture-speech semantic congruency. This to me does not show anything about the specificity of the time-windows itself, nor the selectivity of targeting in the TMS condition.

      (1) Selection of brain regions (IFG/pMTG)

      We thank the reviewer for their thoughtful consideration. The choice of the left IFG and pMTG as regions of interest (ROIs) was informed by a meta-analysis of fMRI studies on gesture-speech integration, which consistently identified these regions as critical hubs (see Author response table 1 for detailed studies and coordinates).

      Author response table 1.

      Meta-analysis of previous studies on gesture-speech integration.

      Based on the meta-analysis of previous studies, we selected the IFG and pMTG as ROIs for gesture-speech integration. The rationale for selecting these brain regions is outlined in the introduction in Lines 63-66: “Empirical studies have investigated the semantic integration between gesture and speech by manipulating their semantic relationship[15-18] and revealed a mutual interaction between them[19-21] as reflected by the N400 latency and amplitude[14] as well as common neural underpinnings in the left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG)[15,22,23].”

      This is further described in Lines 77-78: “Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG”, and in Lines 85-88: “Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to assess whether the activity of these regions was associated with relevant informational matrices”.

      In the Methods section, we clarified the selection of coordinates in Lines 194-200: “Building on a meta-analysis of prior fMRI studies examining gesture-speech integration[22], we targeted Montreal Neurological Institute (MNI) coordinates for the left IFG at (-62, 16, 22) and the pMTG at (-50, -56, 10). In the stimulation protocol for HD-tDCS, the IFG was targeted using electrode F7 as the optimal cortical projection site[36], with four return electrodes placed at AF7, FC5, F9, and FT9. For the pMTG, TP7 was selected as the cortical projection site[36], with return electrodes positioned at C5, P5, T9, and P9.”

      The selection of IFG or pMTG as integration hubs for gesture and speech has also been validated in our previous studies. Specifically, Zhao et al. (2018, J. Neurosci) applied TMS to both areas. Results demonstrated that disrupting neural activity in the IFG or pMTG via TMS selectively impaired the semantic congruency effect (reaction time costs due to semantic incongruence), while leaving the gender congruency effect unaffected.

      These findings identified the IFG and pMTG as crucial hubs for gesture-speech integration, guiding the selection of brain regions for our subsequent studies.

      (2) Selection of time windows

      The five key time windows (TWs) analyzed in this study were derived from our previous TMS work (Zhao et al., 2021, J. Neurosci), where we segmented the gesture-speech integration period (0–320 ms post-speech onset) into eight 40-ms windows. This interval aligns with established literature on gesture-speech integration, particularly the 200–300 ms window noted by the reviewer. As detailed in Lines 776-779: “Procedure of Experiment 2. Eight time windows (TWs, duration = 40 ms) were segmented relative to the speech IP. Among the eight TWs, five (TW1, TW2, TW3, TW6, and TW7) were chosen based on the significant results in our prior study[23]. Double-pulse TMS was delivered over each of the TWs of either the pMTG or the IFG”.

      In our prior work (Zhao et al., 2021, J. Neurosci), we employed a carefully controlled experimental design incorporating two key factors: (1) gesture-speech semantic congruency (serving as our primary measure of integration) and (2) gesture-speech gender congruency (implemented as a matched control factor). Using a time-locked, double-pulse TMS protocol, we systematically targeted each of the eight predefined time windows (TWs) within the left IFG, left pMTG, or vertex (serving as a sham control condition). Our results demonstrated a TW-selective disruption of gesture-speech integration, indexed by the semantic congruency effect (i.e., a reaction-time cost due to semantic conflict), when stimulating the left pMTG in TW1, TW2, and TW7 and when stimulating the left IFG in TW3 and TW6. Crucially, no significant effects were observed either during sham stimulation or for the control gender congruency factor (Figure 3 from Zhao et al., 2021, J. Neurosci).
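
      The window layout described above can be written down explicitly. This is only a sketch: it assumes the eight windows are contiguous and start at 0 ms, following the “0–320 ms” description in this response; the alignment to the speech IP in the actual protocol may differ, and the variable names are hypothetical.

```python
TW_DURATION_MS = 40
N_WINDOWS = 8

# Assumption: contiguous windows, the first starting at 0 ms
# (the zero point of the real protocol may be defined differently).
windows = {
    f"TW{i + 1}": (i * TW_DURATION_MS, (i + 1) * TW_DURATION_MS)
    for i in range(N_WINDOWS)
}

# Windows in which stimulation disrupted the semantic congruency effect,
# as reported in the response text above.
effects_by_site = {
    "pMTG": ["TW1", "TW2", "TW7"],
    "IFG": ["TW3", "TW6"],
}

# The five windows carried forward are the union across both sites.
selected = sorted({tw for tws in effects_by_site.values() for tw in tws})

for tw in selected:
    start, end = windows[tw]
    sites = [site for site, tws in effects_by_site.items() if tw in tws]
    print(f"{tw}: {start}-{end} ms, effect at {', '.join(sites)}")
```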

      This triple dissociation - showing effects only for semantic integration, only in active stimulation, and only at specific time points - provides compelling causal evidence that IFG-pMTG connectivity plays a temporally precise role in gesture-speech integration.

      Note that this work underwent rigorous peer review by two independent experts, both of whom endorsed our methodological approach. Their original evaluations are provided below:

      Reviewer 1: “significance: Using chronometric TMS-stimulation the data of this experiment suggests a feedforward information flow from left pMTG to left IFG followed by an information flow from left IFG back to the left pMTG. The study is the first to provide causal evidence for the temporal dynamics of the left pMTG and left IFG found during gesture-speech integration.”

      Reviewer 2: “Beyond the new results the manuscript provides regarding the chronometrical interaction of the left inferior frontal gyrus and middle temporal gyrus in gesture-speech interaction, the study more basically shows the possibility of unfolding temporal stages of cognitive processing within domain-specific cortical networks using short-time interval double-pulse TMS. Although this method also has its limitations, a careful study planning as shown here and an appropriate discussion of the results can provide unique insights into cognitive processing.”

      References:

      Willems, R.M., Ozyurek, A., and Hagoort, P. (2009). Differential roles for left inferior frontal and superior temporal cortex in multimodal integration of action and language. Neuroimage 47, 1992-2004. 10.1016/j.neuroimage.2009.05.066.

      Drijvers, L., Jensen, O., and Spaak, E. (2021). Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Human Brain Mapping 42, 1138-1152. 10.1002/hbm.25282.

      Drijvers, L., and Ozyurek, A. (2018). Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditions. Brain and Language 177, 7-17. 10.1016/j.bandl.2018.01.003.

      Drijvers, L., van der Plas, M., Ozyurek, A., and Jensen, O. (2019). Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noise. Neuroimage 194, 55-67. 10.1016/j.neuroimage.2019.03.032.

      Holle, H., and Gunter, T.C. (2007). The role of iconic gestures in speech disambiguation: ERP evidence. J Cognitive Neurosci 19, 1175-1192. 10.1162/jocn.2007.19.7.1175.

      Kita, S., and Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48, 16-32. 10.1016/S0749-596x(02)00505-3.

      Bernardis, P., and Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia 44, 178-190. 10.1016/j.neuropsychologia.2005.05.007.

      Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.

      Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.

      Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016.

      Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006

      Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013

      Comment 2: It could still equally well be the case that other regions or networks relevant for gesture-speech integration are targeted, and it can still be the case that these time windows are not specific, and effects bleed into other time periods. There seems to be no experimental evidence here that this is not the case.

      The selection of IFG and pMTG as regions of interest was rigorously justified through multiple lines of evidence. First, a comprehensive meta-analysis of fMRI studies on gesture-speech integration consistently identified these regions as central nodes (see response to comment 1). Second, our own previous work (Zhao et al., 2018, JN; 2021, JN) provided direct empirical validation of their involvement. Third, by employing the same experimental paradigm, we minimized the likelihood of engaging alternative networks. Fourth, even if other regions connected to IFG or pMTG might be affected by TMS, the distinct engagement of specific time windows of IFG and pMTG minimizes the likelihood of consistent influence from other regions.

      Regarding temporal specificity, our 2021 study (Zhao et al., 2021, JN, see details in response to comment 1) systematically examined the entire 0-320ms integration window and found that only select time windows showed significant effects for gesture-speech semantic congruency, while remaining unaffected during gender congruency processing. This double dissociation (significant effects for semantic integration but not gender processing in specific windows) rules out broad temporal spillover.

      Comment 3: To be more specific, the authors write that double-pulse TMS has been widely used in previous studies (as found in their table). However, the studies cited in the table do not necessarily demonstrate the level of spatial and temporal specificity required to disentangle the contributions of tightly-coupled brain regions like the IFG and pMTG during the speech-gesture integration process. pMTG and IFG are located in very close proximity, and are known to be functionally and structurally interconnected, something that is not necessarily the case for the relatively large and/or anatomically distinct areas that the authors mention in their table.

      Our methodological approach is strongly supported by an established body of research employing double-pulse TMS (dpTMS) to investigate neural dynamics across both primary motor and higher-order cognitive regions. As documented in Author response table 1, multiple studies have successfully applied this technique to: (1) primary motor areas (tongue and lip representations in M1), and (2) semantic processing regions (including pMTG, PFC, and ATL). Particularly relevant precedents include:

      (1) Teige et al. (2018, Cortex): Demonstrated precise spatial and temporal specificity by applying 40ms-interval dpTMS to ATL, pMTG, and mid-MTG across multiple time windows (0-40ms, 125-165ms, 250-290ms, 450-490ms), revealing distinct functional contributions from ATL versus pMTG.

      (2) Vernet et al. (2015, Cortex): Successfully dissociated functional contributions of right IPS and DLPFC using 40ms-interval dpTMS, despite their anatomical proximity and functional connectivity.

      These studies confirm double-pulse TMS can discriminate interconnected nodes at short timescales. Our 2021 study further validated this for IFG-pMTG.

      Author response table 2.

      Double-pulse TMS studies on brain regions over 3-60 ms time interval

      References:

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Comment 4: But also more in general: The mere fact that these methods have been used in other contexts does not necessarily mean they are appropriate or sufficient for investigating the current research question. Likewise, the cognitive processes involved in these studies are quite different from the complex, multimodal integration of gesture and speech. The authors have not provided a strong theoretical justification for why the temporal dynamics observed in these previous studies should generalize to the specific mechanisms of gesture-speech integration.

      The neurophysiological mechanisms underlying double-pulse TMS (dpTMS) are well-characterized. While it is established that single-pulse TMS can produce brief artifacts (typically within 0–10 ms) due to transient cortical depolarization (Romero et al., 2019, NC), the dynamics of double-pulse TMS (dpTMS) involve more intricate inhibitory interactions. Specifically, the first pulse increases membrane conductance via GABAergic shunting inhibition, effectively lowering membrane resistance and attenuating the excitatory impact of the second pulse. This results in a measurable reduction in cortical excitability at the paired-pulse interval, as evidenced by suppressed motor evoked potentials (MEPs) (Paulus & Rothwell, 2016, J Physiol). Importantly, this neurophysiological mechanism is independent of cognitive domain and has been robustly demonstrated across multiple functional paradigms.

      In our study, we did not rely on previously reported timing parameters but instead employed a dpTMS protocol using a 40-ms inter-pulse interval. Based on the inhibitory dynamics of this protocol, we designed a sliding temporal window sufficiently broad to encompass the integration period of interest. This approach enabled us to capture and localize the critical temporal window associated with ongoing integrative processing in the targeted brain region.

      We acknowledge that the previous phrasing may have been ambiguous. A clearer and more detailed description of the dpTMS protocol has now been provided in Lines 88-92: “To this end, we employed chronometric double-pulse transcranial magnetic stimulation, which is known to transiently reduce cortical excitability at the inter-pulse interval [27]. Within a temporal period broad enough to capture the full duration of gesture–speech integration [28], we targeted specific timepoints previously implicated in integrative processing within IFG and pMTG [23].”

      References:

      Romero, M.C., Davare, M., Armendariz, M. et al. Neural effects of transcranial magnetic stimulation at the single-cell level. Nat Commun 10, 2642 (2019). https://doi.org/10.1038/s41467-019-10638-7

      Paulus W, Rothwell JC. Membrane resistance and shunting inhibition: where biophysics meets state-dependent human neurophysiology. J Physiol. 2016 May 15;594(10):2719-28. doi: 10.1113/JP271452. PMID: 26940751; PMCID: PMC4865581.

      Obermeier, C., & Gunter, T. C. (2015). Multisensory Integration: The Case of a Time Window of Gesture-Speech Integration. Journal of Cognitive Neuroscience, 27(2), 292-307. https://doi.org/10.1162/jocn_a_00688

      Comment 5: Moreover, the studies cited in the table provided by the authors have used a wide range of interpulse intervals, from 20 ms to 100 ms, suggesting that the temporal precision required to capture the dynamics of gesture-speech integration (which is believed to occur within 200-300 ms; Obermeier & Gunter, 2015) may not even be achievable with their 40 ms time windows.

      Double-pulse TMS has been empirically validated across neurocognitive studies as an effective method for establishing causal temporal relationships in cortical networks, with demonstrated sensitivity at timescales spanning 3-60 ms. Our selection of a 40-ms interpulse interval represents an optimal compromise between temporal precision and physiological feasibility, as evidenced by its successful application in dissociating functional contributions of interconnected regions including ATL/pMTG (Teige et al., 2018) and IPS/DLPFC (Vernet et al., 2015). This methodological approach combines established experimental rigor with demonstrated empirical validity for investigating the precisely timed IFG-pMTG dynamics underlying gesture-speech integration, as shown in our current findings and prior work (Zhao et al., 2021).

      Our experimental design comprehensively sampled the 0-320 ms post-stimulus period, fully encompassing the critical 200-300 ms window associated with gesture-speech integration, as raised by the reviewer. Notably, our results revealed temporally distinct causal dynamics within this period: the significantly reduced semantic congruency effect emerged at IFG at 200-240ms, followed by feedback projections from IFG to pMTG at 240-280ms. This precisely timed interaction provides direct neurophysiological evidence for the proposed architecture of gesture-speech integration, demonstrating how these interconnected regions sequentially contribute to multisensory semantic integration.

      Comment 6: I do appreciate the extra analyses that the authors mention. However, my 5th comment is still unanswered: why not use entropy scores as a continuous measure?

      Analyses with MI and entropy as continuous variables were conducted using Representational Similarity Analysis (RSA) (Popal et al., 2019). These analyses aimed to build a model to predict neural responses based on these feature metrics.

      To capture dynamic temporal features indicative of different stages of multisensory integration, we segmented the EEG data into overlapping time windows (40 ms in duration with a 10 ms step size). The 40 ms window was chosen based on the TMS protocol used in Experiment 2, which also employed a 40 ms time window. The 10 ms step size (equivalent to 5 time points) was used to detect subtle shifts in neural responses that might not be captured by larger time windows, allowing for a more granular analysis of the temporal dynamics of neural activity.
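      The segmentation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a 500 Hz sampling rate (implied by a 10 ms step being equivalent to 5 time points) and a 1000 ms epoch (which yields the 97 windows mentioned in the text). The function name `sliding_windows` and the random data are hypothetical.

```python
import numpy as np

def sliding_windows(eeg, win_len=20, step=5):
    """Cut (channels, samples) data into overlapping windows.

    Returns an array of shape (channels, win_len, n_windows).
    With 2 ms per sample, win_len=20 is 40 ms and step=5 is 10 ms.
    """
    n_samples = eeg.shape[1]
    starts = np.arange(0, n_samples - win_len + 1, step)
    # idx[k, w] is the sample index of the k-th point in window w
    idx = starts[None, :] + np.arange(win_len)[:, None]
    return eeg[:, idx]

rng = np.random.default_rng(0)
eeg = rng.standard_normal((42, 500))   # hypothetical epoch: 42 channels, 1000 ms
windows = sliding_windows(eeg)
print(windows.shape)  # (42, 20, 97)
```

      Under these assumptions, window 0 covers 0-40 ms, window 1 covers 10-50 ms, and so on.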

      Following segmentation, the EEG data were reshaped into a four-dimensional matrix (42 channels × 20 time points × 97 time windows × 20 features). To construct a neural similarity matrix, we averaged the EEG data across time points within each channel and each time window. The resulting matrix was then processed using the pdist function to compute pairwise distances between adjacent data points. This allowed us to calculate correlations between the neural matrix and three feature similarity matrices, which were constructed in a similar manner. These three matrices corresponded to (1) gesture entropy, (2) speech entropy, and (3) mutual information (MI). This approach enabled us to quantify how well the neural responses corresponded to the semantic dimensions of gesture and speech stimuli at each time window.
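      A rough sketch of this RSA step is given below. Several choices are assumptions made for illustration only: the last dimension of the matrix is treated as 20 stimulus items, dissimilarity is taken as the absolute difference between items (the source only states that pdist was used), and the neural and feature dissimilarities are compared with a Pearson correlation. All names and the simulated data are hypothetical.

```python
import numpy as np

def pairwise_abs_diff(values):
    """Condensed dissimilarity vector: |v_i - v_j| for every item pair i < j."""
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(values.size, k=1)
    return np.abs(values[i] - values[j])

def rsa_correlation(neural, feature):
    """Pearson correlation between neural and feature dissimilarities."""
    return np.corrcoef(pairwise_abs_diff(neural), pairwise_abs_diff(feature))[0, 1]

rng = np.random.default_rng(1)
n_channels, n_points, n_windows, n_items = 42, 20, 97, 20
entropy = rng.uniform(0.0, 4.0, n_items)          # hypothetical per-item entropy
data = rng.standard_normal((n_channels, n_points, n_windows, n_items))
data[0, :, 30, :] += entropy                      # plant an effect: channel 0, window 30

# Average over time points within each channel and window, then correlate
mean_amp = data.mean(axis=1)                      # (channels, windows, items)
r_map = np.array([[rsa_correlation(mean_amp[c, w], entropy)
                   for w in range(n_windows)]
                  for c in range(n_channels)])
print(r_map.shape, round(float(r_map[0, 30]), 2))
```

      The same map would be computed once per feature (gesture entropy, speech entropy, MI), giving one correlation value per channel and time window.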

      To determine the significance of the correlations between neural activity and feature matrices, we conducted 1000 permutation tests. In this procedure, we randomized the data or feature matrices and recalculated the correlations repeatedly, generating a null distribution against which the observed correlation values were compared. Statistical significance was determined if the observed correlation exceeded the null distribution threshold (p < 0.05). This permutation approach helps mitigate the risk of spurious correlations, ensuring that the relationships between the neural data and feature matrices are both robust and meaningful.
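      The permutation logic can be illustrated with a short sketch. It assumes that the feature values are shuffled across items (the text says only that the data or feature matrices were randomized) and that the p-value is the proportion of null correlations at least as large as the observed one. Function names and data are hypothetical.

```python
import numpy as np

def pairwise_abs_diff(values):
    """Condensed dissimilarity vector: |v_i - v_j| for every item pair i < j."""
    i, j = np.triu_indices(len(values), k=1)
    return np.abs(values[i] - values[j])

def permutation_pvalue(neural, feature, n_perm=1000, seed=0):
    """One-sided permutation test of the neural/feature RSA correlation."""
    rng = np.random.default_rng(seed)
    neural_d = pairwise_abs_diff(np.asarray(neural, dtype=float))
    feature = np.asarray(feature, dtype=float)
    observed = np.corrcoef(neural_d, pairwise_abs_diff(feature))[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = rng.permutation(feature)       # break the item-to-feature mapping
        null[k] = np.corrcoef(neural_d, pairwise_abs_diff(shuffled))[0, 1]
    # Add-one correction keeps the p-value strictly above zero
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

rng = np.random.default_rng(3)
feature = rng.uniform(0.0, 4.0, 20)                   # hypothetical entropy values
neural = feature + 0.1 * rng.standard_normal(20)      # neural pattern tracking the feature
r_obs, p_val = permutation_pvalue(neural, feature)
print(round(float(r_obs), 2), p_val)
```

      With 1000 permutations the smallest attainable p-value under this scheme is 1/1001.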

      Finally, significant correlations were subjected to clustering analysis, which grouped similar neural response patterns across time windows and channels. This clustering allowed us to identify temporal and spatial patterns in the neural data that consistently aligned with the semantic features of gesture and speech stimuli, thus revealing the dynamic integration of these multisensory modalities across time. Results are as follows:

      (1)  Two significant clusters were identified for gesture entropy (Figure 1 left). The first cluster was observed between 60-110 ms (channels F1 and F3), with correlation coefficients (r) ranging from 0.207 to 0.236 (p < 0.001). The second cluster was found between 210-280 ms (channel O1), with r-values ranging from 0.244 to 0.313 (p < 0.001).

      (2)  For speech entropy (Figure 1 middle), significant clusters were detected in both early and late time windows. In the early time windows, the largest significant cluster was found between 10-170 ms (channels F2, F4, F6, FC2, FC4, FC6, C4, C6, CP4, and CP6), with r-values ranging from 0.151 to 0.340 (p = 0.013), corresponding to the P1 component (0-100 ms). In the late time windows, the largest significant cluster was observed between 560-920 ms (across the whole brain, all channels), with r-values ranging from 0.152 to 0.619 (p = 0.013).

      (3)  For mutual information (MI) (Figure 1 right), a significant cluster was found between 270-380 ms (channels FC1, FC2, FC3, FC5, C1, C2, C3, C5, CP1, CP2, CP3, CP5, FCz, Cz, and CPz), with r-values ranging from 0.198 to 0.372 (p = 0.001).

      Author response image 1.

      Results of RSA analysis.

      These additional findings suggest that even using a different modeling approach, neural responses, as indexed by feature metrics of entropy and mutual information, are temporally aligned with distinct ERP components and ERP clusters, as reported in the current manuscript. This alignment serves to further consolidate the results, reinforcing the conclusion we draw. Considering the length of the manuscript, we did not include these results in the current manuscript.

      Reference:

      Popal, H., Wang, Y., & Olson, I. R. (2019). A guide to representational similarity analysis for social neuroscience. Social cognitive and affective neuroscience, 14(11), 1243-1253.

      Comment 7: In light of these concerns, I do not believe the authors have adequately demonstrated the spatial and temporal specificity required to disentangle the contributions of the IFG and pMTG during the gesture-speech integration process. While the authors have made a sincere effort to address the concerns raised by the reviewers, and have done so with a lot of new analyses, I remain doubtful that the current methodological approach is sufficient to draw conclusions about the causal roles of the IFG and pMTG in gesture-speech integration.

      To sum up:

      (1) Empirical validation from our prior work (Zhao et al., 2018, 2021, JN): The selection of IFG and pMTG as target regions was informed by both: (1) a comprehensive meta-analysis of fMRI studies on gesture-speech integration, and (2) our own prior causal evidence from Zhao et al. (2018, J Neurosci), with detailed stereotactic coordinates provided in the attached Response to Editors and Reviewers letter. The temporal parameters were similarly grounded in empirical data from Zhao et al. (2021, J Neurosci), where we systematically examined eight consecutive 40-ms windows spanning the full integration period (0-320 ms). This study revealed a triple dissociation of effects, which occurred exclusively during (i) semantic integration (but not control tasks), (ii) active stimulation (but not sham), and (iii) specific time windows (but not all time windows), providing robust causal evidence for the spatiotemporal specificity of IFG-pMTG interactions in gesture-speech processing. Notably, all reviewers recognized the methodological strength of this dpTMS approach in their evaluations (see attached JN assessment for details).

      (2) Convergent evidence from Experiment 3: Our study employed a multi-method approach incorporating three complementary experimental paradigms, each utilizing distinct neurophysiological techniques to provide converging evidence. Specifically, Experiment 3 implemented high-temporal-resolution EEG, which independently replicated the time-sensitive dynamics of gesture-speech integration observed in our double-pulse TMS experiments. The remarkable convergence between these methodologically independent approaches, which demonstrate consistent temporal staging of IFG-pMTG interactions across both causal (TMS) and correlational (EEG) measures, significantly strengthens the validity and generalizability of our conclusions regarding the neural mechanisms underlying multisensory integration.

      (3) Established precedents in double-pulse TMS literature: The double-pulse TMS methodology employed in our study is firmly grounded in established neuroscience research. As documented in our detailed Response to Editors and Reviewers letter (citing 11 representative studies), dpTMS has been extensively validated for investigating causal temporal dynamics in cortical networks, with demonstrated sensitivity at timescales ranging from 3-60 ms. Particularly relevant precedents include: (i) Teige et al. (2018, Cortex), who successfully dissociated functional contributions of anatomically proximal regions (ATL vs. pMTG vs. mid-MTG) using 40-ms-interval double-pulse TMS; and (ii) Vernet et al. (2015, Cortex), who effectively distinguished neural processing in interconnected frontoparietal regions (right IPS vs. DLPFC) using 40-ms double-pulse TMS parameters. The parameters in both studies are identical to those employed in our current study.

      (4) Neurophysiological plausibility: The neurophysiological basis for the transient double-pulse TMS effects is well-established through mechanistic studies of TMS-induced cortical inhibition (Romero et al., 2019; Paulus & Rothwell, 2016).

      Taken together, we respectfully submit that our methodology provides robust support for our conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The work by Combrisson and colleagues investigates the degree to which reward and punishment learning signals overlap in the human brain using intracranial EEG recordings. The authors used information theory approaches to show that local field potential signals in the anterior insula and the three sub regions of the prefrontal cortex encode both reward and punishment prediction errors, albeit to different degrees. Specifically, the authors found that all four regions have electrodes that can selectively encode either the reward or the punishment prediction errors. Additionally, the authors analyzed the neural dynamics across pairs of brain regions and found that the anterior insula to dorsolateral prefrontal cortex neural interactions were specific for punishment prediction errors whereas the ventromedial prefrontal cortex to lateral orbitofrontal cortex interactions were specific to reward prediction errors. This work contributes to the ongoing efforts in both systems neuroscience and learning theory by demonstrating how two differing behavioral signals can be differentiated to a greater extent by analyzing neural interactions between regions as opposed to studying neural signals within one region.

      Strengths:

      The experimental paradigm incorporates both a reward and punishment component that enables investigating both types of learning in the same group of subjects allowing direct comparisons.

      The use of intracranial EEG signals provides much needed insight into the timing of when reward and punishment prediction errors signals emerge in the studied brain regions.

      Information theory methods provide important insight into the interregional dynamics associated with reward and punishment learning and allows the authors to assess that reward versus punishment learning can be better dissociated based on interregional dynamics over local activity alone.

      We thank the reviewer for this accurate summary. Please find below our answers to the weaknesses raised by the reviewer.

      Weaknesses:

      The analysis presented in the manuscript focuses solely on gamma band activity. The presence and potential relevance of other frequency bands is not discussed. It is possible that slow oscillations, which are thought to be important for coordinating neural activity across brain regions could provide additional insight.

      We thank the reviewer for pointing us to this missing discussion in the first version of the manuscript. We have now made this point clearer in the Methods sections entitled “iEEG data analysis” and “Estimate of single-trial gamma-band activity”:

      “Here, we focused solely on broadband gamma for three main reasons. First, it has been shown that the gamma band activity correlates with both spiking activity and the BOLD fMRI signals (Lachaux et al., 2007; Mukamel et al., 2004; Niessing et al., 2005; Nir et al., 2007), and it is commonly used in MEG and iEEG studies to map task-related brain regions (Brovelli et al., 2005; Crone et al., 2006; Vidal et al., 2006; Ball et al., 2008; Jerbi et al., 2009; Darvas et al., 2010; Lachaux et al., 2012; Cheyne and Ferrari, 2013; Ko et al., 2013). Therefore, focusing on the gamma band facilitates linking our results with the fMRI and spiking literatures on probabilistic learning. Second, single-trial and time-resolved high-gamma activity can be exploited for the analysis of cortico-cortical interactions in humans using MEG and iEEG techniques (Brovelli et al., 2015; 2017; Combrisson et al., 2022). Finally, while previous analyses of the current dataset (Gueguen et al., 2021) reported an encoding of PE signals at different frequency bands, the power in lower frequency bands was shown to carry redundant information compared to the gamma band power.”

      The data is averaged across all electrodes which could introduce biases if some subjects had many more electrodes than others. Controlling for this variation in electrode number across subjects would ensure that the results are not driven by a small subset of subjects with more electrodes.

      We thank the reviewer for raising this important issue. We would like to point out that neither the gamma activity nor the measures of connectivity were averaged across bipolar recordings within an area. Instead, we used a statistical approach proposed in a previous paper that combines non-parametric permutations with measures of information (Combrisson et al., 2022). As we explain in the “Statistical analysis” section, mutual information (MI) is estimated between PE signals and single-trial modulations in gamma activity separately for each contact (or for each pair of contacts). Then, a one-sample t-test is computed across all of the recordings of all subjects to form the effect size at the group-level. We will address the point of the electrode number in our answer below.
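      The logic of this group-level statistic can be illustrated with a toy sketch. It is not the published implementation: MI is approximated here with the closed-form Gaussian expression, the t-test is taken against the mean MI obtained after shuffling the PE across trials, and the data are simulated. These choices and all names are assumptions made for illustration.

```python
import numpy as np

def gaussian_mi_1d(x, y):
    """MI in bits between two 1-D variables under a Gaussian assumption."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

def one_sample_t(values, popmean):
    """One-sample t statistic of `values` against `popmean`."""
    values = np.asarray(values, dtype=float)
    return (values.mean() - popmean) / (values.std(ddof=1) / np.sqrt(values.size))

rng = np.random.default_rng(0)
n_contacts, n_trials, n_perm = 30, 200, 200
pe = rng.standard_normal((n_contacts, n_trials))                 # hypothetical PE per trial
gamma = 0.3 * pe + rng.standard_normal((n_contacts, n_trials))   # hypothetical gamma power

# MI is estimated separately for each contact, never on averaged signals
mi_obs = np.array([gaussian_mi_1d(pe[c], gamma[c]) for c in range(n_contacts)])

# Null MI values: shuffle the PE across trials within each contact
mi_perm = np.array([[gaussian_mi_1d(rng.permutation(pe[c]), gamma[c])
                     for c in range(n_contacts)]
                    for _ in range(n_perm)])                      # (n_perm, n_contacts)

baseline = mi_perm.mean()                        # assumed reference value for the t-test
t_obs = one_sample_t(mi_obs, baseline)           # group-level effect size across contacts
t_null = np.array([one_sample_t(m, baseline) for m in mi_perm])
p_value = (1 + np.sum(t_null >= t_obs)) / (1 + n_perm)
print(round(float(t_obs), 2), p_value)
```

      Because each contact contributes one MI value, the statistic is computed across recordings rather than on a signal averaged within an area.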

      The potential variation in reward versus punishment learning across subjects is not included in the manuscript. While the time course of reward versus punishment prediction errors is symmetrical at the group level, it is possible that some subjects show faster learning for one versus the other type which can bias the group average. Subject level behavioral data along with subject level electrode numbers would provide more convincing evidence that the observed effects are not arising from these potential confounds.

      We thank the reviewer for the two points raised. We performed additional analyses at the single-participant level to address the issues raised by the reviewer. We should note, however, that these results are descriptive and cannot be generalized to account for population-level effects. As suggested by the reviewer, we prepared two new figures. The first supplementary figure summarizes the number of participants that had iEEG contacts per brain region and pair of brain regions (Fig. S1A in the Appendix). It can be seen that the number of participants sampled in different brain regions is relatively constant (left panel) and the number of participants with pairs of contacts across brain regions is relatively homogeneous, ranging from 7 to 11 (right panel). Fig. S1B shows the number of bipolar derivations per subject and per brain region.

      Author response image 1.

      Single-subject anatomical distribution. (A) Number of unique subjects per brain region and per pair of brain regions. (B) Number of bipolar derivations per subject and per brain region.

      The second supplementary figure describes the estimated prediction error for rewarding and punishing trials for each subject (Fig. S2). The single-subject error bars represent the 95th percentile confidence interval estimated using a bootstrap approach across the different pairs of stimuli presented during the three to six sessions. As the reviewer anticipated, there are indeed variations across subjects, but we observe that RPE and PPE are relatively symmetrical, even at the subject level, and tend toward zero around trial number 10. These results therefore corroborate the patterns observed at the group-level.

      Author response image 2.

      Single-subject estimation of prediction errors. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red), ± 95% confidence interval.
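      A percentile bootstrap of this kind could look like the sketch below. It assumes that stimulus pairs are the resampling unit and that the interval is taken on the mean PE at each trial. The simulated PE curve, which decays toward zero, and all names are hypothetical.

```python
import numpy as np

def bootstrap_ci(samples, n_boot=2000, ci=95, seed=0):
    """Percentile bootstrap CI of the mean over axis 0.

    samples: (n_pairs, n_trials) array, one row per stimulus pair.
    Returns (lower, upper), each of shape (n_trials,).
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n_pairs = samples.shape[0]
    # Resample stimulus pairs with replacement, n_boot times
    idx = rng.integers(0, n_pairs, size=(n_boot, n_pairs))
    boot_means = samples[idx].mean(axis=1)            # (n_boot, n_trials)
    tail = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [tail, 100 - tail], axis=0)
    return lower, upper

rng = np.random.default_rng(1)
trials = np.arange(20)
# Hypothetical single-subject PE: decays toward zero, 8 stimulus pairs
pe = np.exp(-trials / 4.0) + 0.2 * rng.standard_normal((8, trials.size))
lower, upper = bootstrap_ci(pe)
mean_pe = pe.mean(axis=0)
print(lower.shape, upper.shape)  # (20,) (20,)
```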

      Finally, to assess the variability of local encoding of prediction errors across participants, we quantified the proportion of subjects having at least one significant bipolar derivation encoding either the RPE or PPE (Fig. S4). As expected, we found various proportions of unique subjects with significant R/PPE encoding per region. The lowest proportion was achieved in the ventromedial prefrontal cortex (vmPFC) and lateral orbitofrontal cortex (lOFC) for encoding PPE and RPE, respectively, with approximately 30% of the subjects having the effect. Conversely, we found highly reproducible encodings in the anterior insula (aINS) and dorsolateral prefrontal cortex (dlPFC) with a maximum of 100% of the 9 subjects having at least one bipolar derivation encoding PPE in the dlPFC.

      Author response image 3.

      Taken together, we acknowledge a certain variability per region and per condition. Nevertheless, the results presented in the supplementary figures suggest that the main results do not arise from a minority of subjects.

      We would like to point out that in order to assess across-subject variability, a much larger number of participants would have been needed, given the low signal-to-noise ratios observed at the single-participant level. We thus prefer to add these results as supplementary material in the Appendix, rather than in the main text.

      It is unclear if the findings in Figures 3 and 4 truly reflect the differential interregional dynamics in reward versus punishment learning or if these results arise as a statistical byproduct of the reward vs punishment bias observed within each region. For instance, the authors show that information transfer from anterior insula to dorsolateral prefrontal cortex is specific to punishment prediction error. However, both anterior insula and dorsolateral prefrontal cortex have higher prevalence of punishment prediction error selective electrodes to begin with. Therefore the findings in Fig 3 may simply be reflecting the prevalence of punishment specificity in these two regions above and beyond a punishment specific neural interaction between the two regions. Either mathematical or analytical evidence that assesses if the interaction effect is simply reflecting the local dynamics would be important to make this result convincing.

      This is an important point that we partly addressed in the manuscript. More precisely, we investigated whether the synergistic effects observed between the dlPFC and vmPFC encoding global PEs (Fig. 5) could be explained by their respective local specificity. Indeed, since we reported larger proportions of recordings encoding the PPE in the dlPFC and the RPE in the vmPFC (Fig. 2B), we checked whether the synergy between dlPFC and vmPFC could be mainly due to complementary roles where the dlPFC brings information about the PPE only and the vmPFC brings information to the RPE only. To address this point, we selected PPE-specific bipolar derivations from the dlPFC and RPE-specific from the vmPFC and, as the reviewer predicted, we found synergistic II between the two regions probably mainly because of their respective specificity. In addition, we included the II estimated between non-selective bipolar derivations (i.e. recordings with significant encoding for both RPE and PPE) and we observed synergistic interactions (Fig. 5C and Fig. S9). Taken together, the local specificity certainly plays a role, but this is not the only factor in defining the type of interactions.

      Concerning the interaction information results (II, Fig. 3), several lines of evidence suggest that local specificity cannot account alone for the II effects. For example, the local specificity for PPE is observed across all four areas (Fig. 2A) and the percentage of bipolar derivations displaying an effect is large (equal or above 10%) for three brain regions (aINS, dlPFC and lOFC). If the local specificity were the main driving cause, we would have observed significant redundancy between all pairs of brain regions. On the other hand, the interaction between the aINS and lOFC displayed no significant redundant effect (Fig. 3B). Another example is the result observed in lOFC: approximately 30% of bipolar derivations display a selectivity for PPE (Fig. 2B, third panel from the left), but do not show clear signs of redundant encoding at the level of within-area interactions (Fig. 3A, bottom-left panel). Similarly, the local encoding for RPE is observed across all four brain regions (Fig. 2A) and the percentage of bipolar derivations displaying an effect is large (equal or above 10%) for three brain regions (aINS, dlPFC and vmPFC). Nevertheless, significant between-regions interactions have been observed only between the lOFC and vmPFC (Fig. 3B bottom right panel).

      To further support the reasoning, we performed a simulation to show that it is possible to observe synergistic interactions between two regions with the same specificity. As an example, we may consider one region locally encoding early trials of RPE and a second region encoding the late trials of the RPE. Combining the two with the II would lead to synergistic interactions, because each one of them carries information that is not carried by the other. To illustrate this point, we simulated the data of two regions (x and y). To simulate redundant interactions (first row), each region receives a copy of the prediction (one-to-all) and for the synergy (second row), x and y receive early and late PE trials, respectively (all-to-one). This toy example illustrates that the local specificity is not the only factor determining the type of their interactions. We added the following result to the Appendix.

      Author response image 4.

      Local specificity does not fully determine the type of interactions. Within-area local encoding of PE using the mutual information (MI, in bits) for regions X and Y and between-area interaction information (II, in bits) leading to (A) redundant interactions and (B) synergistic interactions about the PE
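      A toy version of such a simulation is sketched below. It is not the authors' simulation: MI is computed with a covariance-based Gaussian approximation, interaction information is taken as II = I(X,Y;PE) - I(X;PE) - I(Y;PE) with negative values read as redundancy and positive values as synergy, and the noise level and trial counts are arbitrary. All names are hypothetical.

```python
import numpy as np

def gaussian_mi(a, b):
    """MI in bits between a (n, da) and b (n, db) under a Gaussian assumption."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)

    def logdet(m):
        cov = np.atleast_2d(np.cov(m, rowvar=False))
        return np.linalg.slogdet(cov)[1]

    return 0.5 * (logdet(a) + logdet(b) - logdet(np.hstack([a, b]))) / np.log(2)

def interaction_information(x, y, s):
    """II = I(X,Y;S) - I(X;S) - I(Y;S); < 0 redundancy, > 0 synergy (assumed convention)."""
    return gaussian_mi(np.column_stack([x, y]), s) - gaussian_mi(x, s) - gaussian_mi(y, s)

rng = np.random.default_rng(0)
n = 5000
pe = rng.standard_normal(n)                      # simulated prediction error

# Redundancy: both regions receive a noisy copy of the full PE
x_red = pe + 0.5 * rng.standard_normal(n)
y_red = pe + 0.5 * rng.standard_normal(n)

# Synergy: x carries the PE on early trials only, y on late trials only
early = np.arange(n) < n // 2
x_syn = pe * early + 0.5 * rng.standard_normal(n)
y_syn = pe * ~early + 0.5 * rng.standard_normal(n)

ii_red = interaction_information(x_red, y_red, pe)
ii_syn = interaction_information(x_syn, y_syn, pe)
print(round(float(ii_red), 3), round(float(ii_syn), 3))
```

      In both scenarios each region carries information about the same variable on its own (non-zero local MI), yet the sign of II differs, which is the point made in the text.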

      Regarding the information transfer results (Fig. 4), similar arguments hold and suggest that the prevalence is not the main factor explaining the arising transfer entropy between the anterior insula (aINS) and dorsolateral prefrontal cortex (dlPFC). Indeed, the lOFC has a strong local specificity for PPE, but the transfer entropy between the lOFC and aINS (or dlPFC), shown in Fig. S7, does not show significant differences in encoding between PPE and RPE.

      Indeed, such transfer can only be found when there is a delay between the gamma activity of the two regions. In this example, the transfer entropy quantifies the amount of information shared between the past activity of the aINS and the present activity of the dlPFC conditioned on the past activity of the dlPFC. The conditioning ensures that the present activity of the dlPFC is not only explained by its own past. Consequently, if both regions exhibit various prevalences toward reward and punishment but without delay (i.e. at the same timing), the transfer entropy would be null because of the conditioning. In fact, between 10 and 20% of bipolar recordings show a selectivity for the reward PE (represented by a proportion of 40-60% of subjects, Fig. S4). However, the transfer entropy estimated from the aINS to the dlPFC across rewarding trials is flat and clearly non-significant. If the transfer entropy were a byproduct of the local specificity then we should observe an increase, which is not the case here.
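      The role of the delay can be illustrated with a toy example. The sketch assumes a Gaussian (covariance-based) estimate of the conditional mutual information I(target_t ; source_{t-lag} | target_{t-lag}), a single lag, and simulated white-noise signals. It is not the estimator used in the manuscript, and all names are hypothetical.

```python
import numpy as np

def _logdet_cov(*columns):
    """Log-determinant of the covariance of the stacked 1-D variables."""
    cov = np.atleast_2d(np.cov(np.column_stack(columns), rowvar=False))
    return np.linalg.slogdet(cov)[1]

def gaussian_te(source, target, lag=1):
    """Transfer entropy source -> target in bits, Gaussian assumption.

    TE = I(target_t ; source_{t-lag} | target_{t-lag})
       = H(Y, Yp) + H(Xp, Yp) - H(Yp) - H(Y, Xp, Yp)
    """
    y_now, y_past, x_past = target[lag:], target[:-lag], source[:-lag]
    te = 0.5 * (_logdet_cov(y_now, y_past) + _logdet_cov(x_past, y_past)
                - _logdet_cov(y_past) - _logdet_cov(y_now, x_past, y_past))
    return te / np.log(2)

rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)                       # hypothetical sender activity

# Receiver driven by the sender with a one-sample delay
y_delayed = 0.5 * rng.standard_normal(n)
y_delayed[1:] += 0.8 * x[:-1]

# Receiver sharing the same signal at the same time (no delay)
y_sync = 0.8 * x + 0.5 * rng.standard_normal(n)

te_delayed = gaussian_te(x, y_delayed)
te_sync = gaussian_te(x, y_sync)
print(round(float(te_delayed), 3), round(float(te_sync), 4))
```

      In this toy setting both receivers are equally informative about the sender's signal, but only the delayed coupling yields a clearly non-zero transfer entropy.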

      Reviewer #2:

      Summary:

      Reward and punishment learning have long been seen as emerging from separate networks of frontal and subcortical areas, often studied separately. Nevertheless, both systems are complementary and distributed representations of rewards and punishments have been repeatedly observed within multiple areas. This raised the unsolved question of the possible mechanisms by which both systems might interact, which this manuscript went after. The authors skillfully leveraged intracranial recordings in epileptic patients performing a probabilistic learning task combined with model-based information theoretical analyses of gamma activities to reveal that information about reward and punishment was not only distributed across multiple prefrontal and insular regions, but that each system showed specific redundant interactions. The reward subsystem was characterized by redundant interactions between orbitofrontal and ventromedial prefrontal cortex, while the punishment subsystem relied on insular and dorsolateral redundant interactions. Finally, the authors revealed a way by which the two systems might interact, through synergistic interaction between ventromedial and dorsolateral prefrontal cortex.

      Strengths:

      Here, the authors performed an excellent reanalysis of a unique dataset using innovative approaches, pushing our understanding on the interaction at play between prefrontal and insular cortex regions during learning. Importantly, the description of the methods and results is truly made accessible, making it an excellent resource to the community.

      This manuscript goes beyond what is classically performed using intracranial EEG dataset, by not only reporting where a given information, like reward and punishment prediction errors, is represented but also by characterizing the functional interactions that might underlie such representations. The authors highlight the distributed nature of frontal cortex representations and propose new ways by which the information specifically flows between nodes. This work is well placed to unify our understanding of the complementarity and specificity of the reward and punishment learning systems.

      We thank the reviewer for the positive feedback. Please find below our answers to the weaknesses raised by the reviewer.

      Weaknesses:

      The conclusions of this paper are mostly supported by the data, but whether the findings are entirely generalizable would require further information/analyses.

      First, the authors found that prediction errors very quickly converge toward 0 (less than 10 trials) while subjects performed the task for sets of 96 trials. Considering all trials, and therefore having a non-uniform distribution of prediction errors, could potentially bias the various estimates the authors are extracting. Separating trials between learning (at the start of a set) and exploiting periods could prove that the observed functional interactions are specific to the learning stages, which would strengthen the results.

      We thank the reviewer for this question. We would like to note that the probabilistic nature of the learning task does not allow a strict distinction between the exploration and exploitation phases. Indeed, the probability of obtaining the less rewarding outcome was 25% (i.e., for 0€ gain in the reward learning condition and -1€ loss in the punishment learning condition). Thus, participants tended to explore even during the last set of trials in each session. This is evident from the average learning curves shown in Fig. 1B of (Gueguen et al., 2021). Learning curves show rates of correct choice (75% chance of 1€ gain) in the reward condition (blue curves) and incorrect choice (75% chance of 1€ loss) in the punishment condition (red curves).

      Regarding the evolution of PEs, as reviewer #1 suggested, we added a new figure representing the single-subject estimates of the R/PPE (Fig. S2). Here, the confidence interval is obtained across all pairs of stimuli presented during the different sessions. We retrieved the general trend of the R/PPE converging toward zero around 10 trials. Although both average reward and punishment prediction errors converge toward zero in approximately 10 trials, single-participant curves display large variability, including at the end of each session. As a reminder, the 96 trials represent the total number of trials for one session for the four pairs, and the number of trials for each pair of stimuli was only 24.

      Author response image 5.

      Single-subject estimation of prediction errors. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red), ± 95% confidence interval

      However, the convergence of the R/PPE is due to the average across the pairs of stimuli. In the figure below, we superimposed the estimated R/PPE, per pair of stimuli, for each subject. It becomes very clear that high values of PE can be reached, even for late trials. Therefore, we believe that the split into early/late trials because of the convergence of PE is far from being trivial.

      Author response image 6.

      Single-subject estimation of prediction errors per pair of stimuli. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red)

      Consequently, nonzero RPE and PPE occur during the whole session, and separating trials between learning (at the start of a set) and exploiting periods, as suggested by the reviewer, does not allow a strict dissociation between learning and no-learning. Nevertheless, we tested the analysis proposed by the reviewer at the local level. We split the 24 trials of each pair of stimuli into early, middle and late trials (8 trials each). We then reproduced Fig. 2 by computing the mutual information between the gamma activity and the R/PPE for subsets of trials: early (first row) and late trials (second row). We retrieved significant encoding of both R/PPE in the aINS, dlPFC and lOFC in both early and late trials. The vmPFC also showed significant encoding of both during early trials. The only difference emerges in the late trials of the vmPFC, where we found a strong encoding of the RPE only. It should also be noted that, since we are sub-selecting the trials, the statistical analyses are performed using only a third of the trials.

      Taken together, the combination of high values of PE achieved even for late trials and the fact that most of the findings are reproduced even with a third of the trials does not justify the split into early and late trials here. Crucially, this latest analysis confirms that the neural correlates of learning that we observed reflect PE signals rather than early versus late trials in the session.

      Author response image 7.

      MI between gamma activity and R/PPE using early and late trials. Time courses of MI estimated between the gamma power and both RPE (blue) and PPE (red) using either early or late trials (first and second row, respectively). Horizontal thick lines represent significant clusters of information (p<0.05, cluster-based correction, non-parametric randomization across epochs).

      Importantly, it is unclear whether the results described are a common feature observed across subjects or the results of a minority of them. The authors should report and assess the reliability of each result across subjects. For example, the authors found RPE-specific interactions between vmPFC and lOFC, even though less than 10% of sites represent RPE or both RPE/PPE in lOFC. It is questionable whether such a low proportion of sites might come from different subjects, and therefore whether the interactions observed are truly observed in multiple subjects. The nature of the dataset obviously precludes from requiring all subjects to show all effects (given the known limits inherent to intracerebral recording in patients), but it should be proven that the effects were reproducibly seen across multiple subjects.

      We thank the reviewer for this remark, which was also raised by the first reviewer. We added a supplementary figure describing the number of unique subjects per brain region and per pair of brain regions (Fig. S1A), as well as the number of bipolar derivations per region and per subject (Fig. S1B).

      Author response image 8.

      Single-subject anatomical repartition. (A) Number of unique subjects per brain region and per pair of brain regions. (B) Number of bipolar derivations per subject and per brain region.

      Regarding the reproducibility of the results across subjects for the local analysis (Fig. 2), we also added the instantaneous proportion of subjects having at least one bipolar derivation showing a significant encoding of the RPE and PPE (Fig. S4). We found a minimum proportion of approximately 30% of unique subjects having the effect in the lOFC and vmPFC, for the RPE and PPE respectively. On the other hand, in both the aINS and dlPFC, between 50 and 100% of the subjects showed the effect. Therefore, local encoding of RPE and PPE was never represented by a single subject.
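
      The quantity described above, the instantaneous proportion of subjects with at least one significant bipolar derivation, can be computed as in the following sketch. The function name, the array layout (derivations by time points) and the toy values are hypothetical and serve only to illustrate the counting rule.

```python
import numpy as np

def proportion_of_subjects(significant, subject_ids):
    """Proportion of unique subjects with >= 1 significant derivation, per time point.

    significant : bool array, shape (n_derivations, n_times)
    subject_ids : array, shape (n_derivations,), subject label of each derivation
    """
    significant = np.asarray(significant, dtype=bool)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    # For each subject, flag time points where any of its derivations is significant
    per_subject = np.array(
        [significant[subject_ids == s].any(axis=0) for s in subjects]
    )
    return per_subject.mean(axis=0)

# Toy example: 4 derivations from 3 subjects, 3 time points
significant = np.array([
    [1, 0, 0],   # subject "s1"
    [0, 0, 0],   # subject "s1"
    [1, 1, 0],   # subject "s2"
    [0, 0, 0],   # subject "s3"
], dtype=bool)
subject_ids = np.array(["s1", "s1", "s2", "s3"])

proportions = proportion_of_subjects(significant, subject_ids)
print(proportions)
```

      Counting subjects rather than derivations prevents a subject with many recording sites in a region from dominating the estimate.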

      Author response image 9.

      Similarly, we performed statistical analysis on interaction information at the single-subject level and counted the proportion of unique subjects having at least one pair of recordings with significant redundant and synergistic interactions about the RPE and PPE (Fig. S5). Consistent with the results shown in Fig. 3, the proportions of significant redundant and synergistic interactions are shown as negative and positive, respectively. For the within-region interactions, approximately 60% of the subjects show redundant interactions about the R/PPE in the aINS and about the PPE in the dlPFC, and 40% about the RPE in the vmPFC. For the across-region interactions, 60% of the subjects have redundant interactions between the aINS-dlPFC and dlPFC-lOFC about the PPE, and 30% have redundant interactions between lOFC-vmPFC about the RPE. Globally, we reproduced the main results shown in Fig. 3.

      Author response image 10.

      Inter-subject reproducibility of redundant interactions about PE signals. Time courses of the proportion of subjects having at least one pair of bipolar derivations with a significant interaction information (p<0.05, cluster-based correction, non-parametric randomization across epochs) about the RPE (blue) or PPE (red). Data are aligned to the outcome presentation (vertical line at 0 seconds). Proportions of subjects with redundant (solid) and synergistic (dashed) interactions are plotted downward and upward, respectively.

      Finally, the timings of the observed interactions between areas preclude one of the authors' main conclusions. Specifically, the authors repeatedly concluded that the encoding of RPE/PPE signals are "emerging" from redundancy-dominated prefrontal-insular interactions. However, the between-region information and transfer entropy between vmPFC and lOFC for example is observed almost 500ms after the encoding of RPE/PPE in these regions, questioning how it could possibly lead to the encoding of RPE/PPE. It is also noteworthy that the two information measures, interaction information and transfer entropy, between these areas happened at non overlapping time windows, questioning the underlying mechanism of the communication at play (see Figures 3/4). As an aside, when assessing the direction of information flow, the authors also found delays between pairs of signals peaking at 176ms, far beyond what would be expected for direct communication between nodes. Discussing this aspect might also be of importance as it raises the possibility of third-party involvement.

      The local encoding of RPE in the vmPFC and lOFC is observed in a time interval ranging from approximately 0.2-0.4s to 1.2-1.4s after outcome presentation (blue bars in Fig. 2A). The encoding of RPE by interaction information covers a time interval from approximately 1.1s to 1.5s (blue bars in Fig. 3B, bottom right panel). Similarly, significant TE modulations between the vmPFC and lOFC specific for RPE occur mainly in the 0.7s-1.1s range. Thus, it seems that the local encoding of RPE precedes the effects observed at the level of the neural interactions (II and TE). On the other hand, the modulations in MI, II and TE related to PPE co-occur in a time window from 0.2s to 0.7s after outcome presentation. Thus, we agree with the reviewer that a generic conclusion about the potential mechanisms relating the three levels of analysis cannot be drawn. We therefore replaced the term “emerge from”, which may be misinterpreted as hinting at a potential mechanism, with “occur with” throughout the manuscript. We nevertheless concluded that the three levels of analysis (and phenomena) co-occur in time, thus hinting at a potential across-scales interaction that needs further study. Indeed, our study suggests that further work, beyond the scope of the current study, is required to better understand the interaction between scales.

      Regarding the delay for the conditioning of the transfer entropy, the value of 176 ms reflects the delay at which we observed a maximum of transfer entropy. However, we did not use a single delay for conditioning; we used every possible delay between [116, 236] ms, as explained in the Methods section. We would like to stress that transfer entropy is a directed metric of functional connectivity, and it can only be interpreted as quantifying statistical causality defined in terms of predictability according to the Wiener-Granger principle, as detailed in the Methods. Thus, it cannot be interpreted in Pearl’s causal terms or as indexing any type of direct communication between nodes. This is a known limitation of the method, which has been stressed in past literature and that we believe does not need to be addressed here.

      To account for this, we revised the discussion to make sure this issue is addressed in the following paragraph:

      “Here, we quantified directional relationships between regions using the transfer entropy (Schreiber, 2000), which is a functional connectivity measure based on the Granger-Wiener causality principle. Tract tracing studies have revealed strong interconnections between the lOFC and vmPFC in the macaque (Carmichael and Price, 1996; Öngür and Price, 2000). In humans, cortico-cortical anatomical connections have mainly been investigated using diffusion magnetic resonance imaging (dMRI). Several studies found strong probabilities of structural connectivity between the anterior insula with the orbitofrontal cortex and dorsolateral part of the prefrontal cortex (Cloutman et al., 2012; Ghaziri et al., 2017), and between the lOFC and vmPFC (Heather Hsu et al., 2020). In addition, the statistical dependency (e.g. coherence) between the LFP of distant areas could be potentially explained by direct anatomical connections (Schneider et al., 2021; Vinck et al., 2023). Taken together, the existence of an information transfer might rely on either direct or indirect structural connectivity. However, here we also reported differences of TE between rewarding and punishing trials given the same backbone anatomical connectivity (Fig. 4). [...]”

      Reviewer #3:

      Summary:

      The authors investigated whether learning processes relying on distinct reward or punishment outcomes in probabilistic instrumental learning tasks involve functional interactions of two different cortico-cortical gamma-band modulations, suggesting that learning signals such as reward or punishment prediction errors can be processed by two dominant interactions, between areas lOFC-vmPFC and areas aINS-dlPFC, and later integrated in support of switching between reward and punishment learning conditions. Using the well-known analyses of mutual information, interaction information, and transfer entropy, the authors reached this conclusion by identifying directional task information flow between redundancy-dominated and synergy-dominated interactions. This integral concept also provided a unifying view to explain how functionally distributed reward and/or punishment information was segregated and integrated across cortical areas.

      Strengths:

      The dataset used in this manuscript may come from previously published works (Gueguen et al., 2021) or from the same grant project, judging by the methods. Previous works have shown strong evidence about why gamma-band activities and those 4 areas are important. For further analyses, the current manuscript moved the ideas forward to examine how reward/punishment information is transferred between recorded areas according to the task conditions. The standard measurements such as mutual information, interaction information, and transfer entropy showed time-series activities at the millisecond level and allowed us to learn the directional information flow during a certain window. In addition, the diagram in Figure 6 summarized the results and proposed an integral concept with functional heterogeneities in cortical areas. The findings in this manuscript will support the ideas from human fMRI studies and add new insight to electrophysiological studies with non-human primates.

      We thank the reviewer for the summary as well as for highlighting the strengths. Please find below our answers regarding the weaknesses of the manuscript.

      Weaknesses:

      After reading through the manuscript, the term "non-selective" in the abstract confused me and I did not actually know what it meant and how it fits the conclusion. If I learned the methods correctly, the 4 areas were studied in this manuscript because of their selective responses to the RPE and PPE signals (Figure 2). The redundancy- and synergy-dominated subsystems indicated that two areas shared similar and complementary information, respectively, due to the negative and positive value of interaction information (Page 6). For me, it doesn't mean they are "non-selective", especially in redundancy-dominated subsystem. I may miss something about how you calculate the mutual information or interaction information. Could you elaborate this and explain what the "non-selective" means?

      In the study performed by Gueguen et al. in 2021, the authors used a general linear model (GLM) to link the gamma activity to both the reward and punishment prediction errors and they looked for differences between the two conditions. Here, we reproduced this analysis except that we used a measure from information theory (mutual information) that is able to capture both linear and non-linear (although monotonic) relationships between the gamma activity and the prediction errors. The clusters we reported reflect significant encoding of the RPE and/or the PPE. From Fig. 2, it can be seen that the four regions have a gamma activity that is modulated according to both reward and punishment PE. We used the term “non-selective” because the regions did not exclusively encode one or the other, but instead contained various proportions of bipolar derivations encoding either one or both of them.
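
      To illustrate what capturing linear and monotonic non-linear relationships can mean in practice, the sketch below uses a rank-based (Gaussian-copula) MI estimate. This choice of estimator is an assumption made for illustration, since the response does not specify the estimator here, and the function names and simulated signals are hypothetical. Because the estimate depends only on ranks, it is unchanged by any strictly monotonic transform of the signal, whereas the Pearson correlation is not.

```python
import numpy as np
from statistics import NormalDist

def copula_normalize(v):
    """Map values to standard-normal scores through their ranks."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable") + 1
    u = ranks / (len(v) + 1)              # empirical CDF values strictly in (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])

def rank_gaussian_mi(a, b):
    """MI (bits) between two 1-D variables after copula normalization."""
    r = np.corrcoef(copula_normalize(a), copula_normalize(b))[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

rng = np.random.default_rng(2)
pe = rng.standard_normal(2000)                        # simulated prediction error
gamma_linear = pe + 0.5 * rng.standard_normal(2000)   # linear dependence on the PE
gamma_monotonic = np.exp(2.0 * gamma_linear)          # monotonic, non-linear version

mi_linear = rank_gaussian_mi(pe, gamma_linear)
mi_monotonic = rank_gaussian_mi(pe, gamma_monotonic)
r_linear = np.corrcoef(pe, gamma_linear)[0, 1]
r_monotonic = np.corrcoef(pe, gamma_monotonic)[0, 1]
print(f"MI: {mi_linear:.3f} vs {mi_monotonic:.3f} bits")
print(f"Pearson r: {r_linear:.3f} vs {r_monotonic:.3f}")
```

      A linear model such as a GLM is sensitive to the drop in linear correlation in the second case, while the rank-based MI treats both cases identically.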

      The directional information flows identified in this manuscript were evidenced by the recording contacts of iEEG with levels of concurrent neural activities to the task conditions. However, are the conclusions well supported by the anatomical connections? Is it possible that the information was transferred to the target via another area? These questions may remain to be elucidated by using other approaches or animal models. It would be great to point this out here for further investigation.

      We thank the reviewer for this interesting question. We added the following paragraph to the discussion to clarify the current limitations of the transfer entropy and the link with anatomical connections:

      “Here, we quantified directional relationships between regions using the transfer entropy (Schreiber, 2000), which is a functional connectivity measure based on the Granger-Wiener causality principle. Tract tracing studies have revealed strong interconnections between the lOFC and vmPFC in the macaque (Carmichael and Price, 1996; Öngür and Price, 2000). In humans, cortico-cortical anatomical connections have mainly been investigated using diffusion magnetic resonance imaging (dMRI). Several studies found strong probabilities of structural connectivity between the anterior insula with the orbitofrontal cortex and dorsolateral part of the prefrontal cortex (Cloutman et al., 2012; Ghaziri et al., 2017), and between the lOFC and vmPFC (Heather Hsu et al., 2020). In addition, the statistical dependency (e.g. coherence) between the LFP of distant areas could be potentially explained by direct anatomical connections (Schneider et al., 2021). Taken together, the existence of an information transfer might rely on either direct or indirect structural connectivity. However, here we also reported differences of TE between rewarding and punishing trials given the same backbone anatomical connectivity (Fig. 4). Our results are further supported by a recent study involving drug-resistant epileptic patients with resected insula who showed poorer performance than healthy controls in case of risky loss compared to risky gains (Von Siebenthal et al., 2017).”

      References

      Carmichael ST, Price J. 1996. Connectional networks within the orbital and medial prefrontal cortex of macaque monkeys. J Comp Neurol 371:179–207.

      Cloutman LL, Binney RJ, Drakesmith M, Parker GJM, Lambon Ralph MA. 2012. The variation of function across the human insula mirrors its patterns of structural connectivity: Evidence from in vivo probabilistic tractography. NeuroImage 59:3514–3521. doi:10.1016/j.neuroimage.2011.11.016

      Combrisson E, Allegra M, Basanisi R, Ince RAA, Giordano BL, Bastin J, Brovelli A. 2022. Group-level inference of information-based measures for the analyses of cognitive brain networks from neurophysiological data. NeuroImage 258:119347. doi:10.1016/j.neuroimage.2022.119347

      Ghaziri J, Tucholka A, Girard G, Houde J-C, Boucher O, Gilbert G, Descoteaux M, Lippé S, Rainville P, Nguyen DK. 2017. The Corticocortical Structural Connectivity of the Human Insula. Cereb Cortex 27:1216–1228. doi:10.1093/cercor/bhv308

      Gueguen MCM, Lopez-Persem A, Billeke P, Lachaux J-P, Rheims S, Kahane P, Minotti L, David O, Pessiglione M, Bastin J. 2021. Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nat Commun 12:3344. doi:10.1038/s41467-021-23704-w

      Heather Hsu C-C, Rolls ET, Huang C-C, Chong ST, Zac Lo C-Y, Feng J, Lin C-P. 2020. Connections of the Human Orbitofrontal Cortex and Inferior Frontal Gyrus. Cereb Cortex 30:5830–5843. doi:10.1093/cercor/bhaa160

      Lachaux J-P, Fonlupt P, Kahane P, Minotti L, Hoffmann D, Bertrand O, Baciu M. 2007. Relationship between task-related gamma oscillations and BOLD signal: new insights from combined fMRI and intracranial EEG. Hum Brain Mapp 28:1368–1375. doi:10.1002/hbm.20352

      Mukamel R, Gelbard H, Arieli A, Hasson U, Fried I, Malach R. 2004. Coupling Between Neuronal Firing, Field Potentials, and fMRI in Human Auditory Cortex. Cereb Cortex 14:881.

      Niessing J, Ebisch B, Schmidt KE, Niessing M, Singer W, Galuske RA. 2005. Hemodynamic signals correlate tightly with synchronized gamma oscillations. Science 309:948–951.

      Nir Y, Fisch L, Mukamel R, Gelbard-Sagiv H, Arieli A, Fried I, Malach R. 2007. Coupling between neuronal firing rate, gamma LFP, and BOLD fMRI is related to interneuronal correlations. Curr Biol 17:1275–1285.

      Öngür D, Price JL. 2000. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex 10:206–219.

      Schneider M, Broggini AC, Dann B, Tzanou A, Uran C, Sheshadri S, Scherberger H, Vinck M. 2021. A mechanism for inter-areal coherence through communication based on connectivity and oscillatory power. Neuron 109:4050-4067.e12. doi:10.1016/j.neuron.2021.09.037

      Schreiber T. 2000. Measuring information transfer. Phys Rev Lett 85:461.

      Von Siebenthal Z, Boucher O, Rouleau I, Lassonde M, Lepore F, Nguyen DK. 2017. Decision-making impairments following insular and medial temporal lobe resection for drug-resistant epilepsy. Soc Cogn Affect Neurosci 12:128–137. doi:10.1093/scan/nsw152

      Recommendations for the authors

      Reviewer #1

      (1) Overall, the writing of the manuscript is dense and makes it hard to follow the scientific logic and appreciate the key findings of the manuscript. I believe the manuscript would be accessible to a broader audience if the authors improved the writing and provided greater detail for their scientific questions, choice of analysis, and an explanation of their results in simpler terms.

      We extensively modified the introduction to better describe the rationale and research question.

      (2) In the introduction the authors state "we hypothesized that reward and punishment learning arise from complementary neural interactions between frontal cortex regions". This stated hypothesis arrives rather abruptly after a summary of the literature given that the literature summary does not directly inform their stated hypothesis. Put differently, the authors should explicitly state what the contradictions and/or gaps in the literature are, and what specific combinations of findings guide them to their hypothesis. When the authors state their hypothesis the reader is still left asking: why are the authors focusing on the frontal regions? What do the authors mean by complementary interactions? What specific evidence or contradiction in the literature led them to hypothesize that complementary interactions between frontal regions underlie reward and punishment learning?

      We extensively modified the introduction and provided a clearer description of the brain circuits involved and the rationale for searching redundant and synergistic interactions between areas.

      (3) Related to the above point: when the authors subsequently state "we tested whether redundancy- or synergy dominated interactions allow the emergence of collective brain networks differentially supporting reward and punishment learning", the Introduction (up to the point of this sentence) has not been written to explain the synergy vs. redundancy framework in the literature and how this framework comes into play to inform the authors' hypothesis on reward and punishment learning.

      We extensively modified the introduction and provided a clearer description of redundant and synergistic interactions between areas.

      (4) The explanation of redundancy vs synergy dominated brain networks itself is written densely and hard to follow. Furthermore, how this framework informs the question on the neural substrates of reward versus punishment learning is unclear. The authors should provide more precise statements on how and why redundancy vs. synergy comes into play in reward and punishment learning. Put differently, this redundancy vs. synergy framework is key for understanding the manuscript and the introduction is not written clearly enough to explain the framework and how it informs the authors' hypothesis and research questions on the neural substrates of reward vs. punishment learning.

      Same as above

      (5) While the choice of these four brain regions in the context of reward and punishment learning does make sense, the authors do not outline a clear scientific justification as to why these regions were selected in relation to their question.

      Same as above

      (6) Could the authors explain why they used gamma band power (as opposed to or in addition to the lower frequency bands) to investigate MI. Relatedly, when the authors introduce MI analysis, it would be helpful to briefly explain what this analysis measures and why it is relevant to address the question they are asking.

      Please see our answer to the first public comment. We added a paragraph to the discussion section to justify our choice of focusing on the gamma band only. We added the following sentence to the result section to justify our choice for using mutual-information:

      The MI allowed us to detect both linear and non-linear relationships between the gamma activity and the PE

      An extended explanation justifying our choice for the MI was already present in the method section.

      (7) The authors state that "all regions displayed a local "probabilistic" encoding of prediction errors with temporal dynamics peaking around 500 ms after outcome presentation". It would be helpful for the reader if the authors spelled out what they mean by probabilistic in this context as the term can be interpreted in many different ways.

      We agree with the reviewer that the term “probabilistic” can be interpreted in different ways. In the revised manuscript we changed “probabilistic” for “mixed”.

      (8) The authors should include a brief description of how they compute RPE and PPE in the beginning of the relevant results section.

      The explanation of how we estimated the PE is already present in the result section: “We estimated trial-wise prediction errors by fitting a Q-learning model to behavioral data. Fitting the model consisted in adjusting the constant parameters to maximize the likelihood of observed choices etc.”
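
      As a rough illustration of the procedure quoted above, the sketch below computes trial-wise prediction errors with a basic Q-learning rule and fits its parameters by maximizing the likelihood of the observed choices. The softmax choice rule, the parameter names `alpha` (learning rate) and `beta` (inverse temperature), the grid search, and the toy data are assumptions made for illustration. The exact parameterization and optimizer of the model used in the study are not given in this response.

```python
import numpy as np

def q_learning(choices, outcomes, alpha, beta, n_options=2):
    """Return trial-wise prediction errors and the log-likelihood of the choices."""
    q = np.zeros(n_options)                  # initial action values
    pes = np.zeros(len(choices))
    log_lik = 0.0
    for t, (c, r) in enumerate(zip(choices, outcomes)):
        logits = beta * q
        logits = logits - logits.max()       # numerical stability of the softmax
        p = np.exp(logits) / np.exp(logits).sum()
        log_lik += np.log(p[c])              # likelihood of the observed choice
        pes[t] = r - q[c]                    # prediction error on this trial
        q[c] += alpha * pes[t]               # value update of the chosen option
    return pes, log_lik

def fit_grid(choices, outcomes, alphas, betas):
    """Pick (alpha, beta) maximizing the log-likelihood over a small grid."""
    scored = [(q_learning(choices, outcomes, a, b)[1], a, b)
              for a in alphas for b in betas]
    _, best_alpha, best_beta = max(scored)
    return best_alpha, best_beta

# Toy data for one pair of stimuli: chosen option (0 or 1) and obtained outcome
choices = [0, 0, 1]
outcomes = [1.0, 1.0, 0.0]

pes, log_lik = q_learning(choices, outcomes, alpha=0.5, beta=2.0)
best_alpha, best_beta = fit_grid(choices, outcomes,
                                 alphas=np.linspace(0.1, 0.9, 9),
                                 betas=np.linspace(0.5, 5.0, 10))
print(pes, log_lik, best_alpha, best_beta)
```

      The prediction errors obtained with the fitted parameters are the trial-wise regressors that are then related to the gamma activity.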

      (9) It is unclear from the Methods whether the authors have taken any measures to address the likely difference in the number of electrodes across subjects. For example, it is likely that some subjects have 10 electrodes in vmPFC while others may have 20. In group analyses, if the data is simply averaged across all electrodes then each subject contributes a different number of data points to the analysis. Hence, a subject with more electrodes can bias the group average. A starting point would be to state the variation in number of electrodes across subjects per brain region. If this variation is rather small, then simple averaging across electrodes might be justified. If the variation is large then one idea would be to average data across electrodes within subjects prior to taking the group average or use a resampling approach where the minimum number of electrodes per brain area is subsampled.

      We addressed this point in our public answers. As a reminder, the new version of the manuscript contains a figure showing the number of unique patients per region, and figures showing the PE and the local encoding at the single-participant level.

      (10) One thing to consider is whether the reward and punishment in the task is symmetrical in valence. While 1$ increase and 1$ decrease is equivalent in magnitude, the psychological effect of the positive (vs. the negative) outcome may still be asymmetrical and the direction and magnitude of this asymmetry can vary across individuals. For instance, some subjects may be more sensitive to the reward (over punishment) while others are more sensitive to the punishment (over reward). In this scenario, it is possible that the differentiation observed in PPE versus RPE signals may arise from such psychological asymmetry rather than the intrinsic differences in how certain brain regions (and their interactions) may encode for reward vs punishment. Perhaps the authors can comment on this possibility, and/or conduct more in depth behavioral analysis to determine if certain subjects adjust their choice behavior faster in response to reward vs. punishment contexts.

      While it could be possible that individuals display different sensitivities vis-à-vis positive and negative prediction errors (and, indeed, a vast body of human reinforcement learning literature seems to point in this direction; Palminteri & Lebreton, 2022), it is unclear to us how such differences would translate into the recruitment of anatomically distinct areas for reward and punishment prediction errors. It is important to note here that our design partially orthogonalized positive and reward vs. negative and punishment PEs, because the neutral outcome can generate both positive and negative prediction errors, as a function of the learning context (reward-seeking and punishment avoidance). Back to the main question, Lefebvre et al. (2017), for instance, investigated with fMRI the neural correlates of reward prediction errors only and found that inter-individual differences in learning rates for positive and negative prediction errors correlated with differences in the degree of striatal activation and not with the recruitment of different areas. To sum up, while we acknowledge that individuals may display different sensitivity to prediction errors (and reward magnitudes), we believe that such differences should translate into differences in the degree of activation of a given system (the reward system vs. the punishment one) rather than differences in neural system recruitment.

      (11) As summarized in Fig 6, the authors show that information transfer from aINS to dlPFC was PPE specific whereas the information transfer from vmPFC to lOFC was RPE specific. What is unclear is whether these findings arise as an inevitable statistical byproduct of the fact that aINS has high PPE-specificity and that vmPFC has high RPE-specificity. In other words, it is possible that the analyses in Figs. 3 and 4 are sensitive to the fact that there is a larger proportion of electrodes with either PPE or RPE sensitivity in aINS and vmPFC respectively, and as such, the II analysis might reflect the dominant local encoding properties above and beyond reflecting the interactions between regions per se. Simply put, could the analysis in Fig 3B turn out in any other way given that there are more PPE specific electrodes in aINS and more RPE specific electrodes in vmPFC? One option to address this question would be to limit the electrodes included in the analyses (in Fig 3B for example) so that each region has the same number of PPE and RPE specific electrodes included.

      Please see the simulation we added to the revised manuscript (Fig. S10) demonstrating that synergistic interactions can emerge between regions with the same specificity.

      Regarding the possibility that Fig. 3 and 4 are sensitive to the number of bipolar derivations being R/PPE specific, a counter-example is the vmPFC. The vmPFC has a few recordings specific to punishment (Fig. 2) in almost 30% of the subjects (Fig. S4). However, there is no II about the PPE between recordings of the vmPFC (Fig. 3). The same reasoning also holds for the lOFC. Therefore, the proportion of recordings being RPE or PPE-specific is not sufficient to determine the type of interactions.

      (12) Related to the point above, what would the results presented in Fig 3A (and 3B) look like if the authors ran the analyses on RPE specific and PPE specific electrodes only? Is the vmPFC-vmPFC RPE effect in Fig 3A arising simply due to the high prevalence of RPE specific electrodes in vmPFC (as shown in Fig. 2)?

      Please see our answer above.

      Reviewer #2:

      Regarding Figure 2A, the authors argued that their findings "globally reproduced their previously published findings" (from Gueguen et al, 2021). It is worth noting though that in their original analysis, both aINS and lOFC show differential effects (aINS showing greater punishment compared to reward, and the opposite for lOFC) compared to the current analysis. Although I would be inclined to believe that the nonlinear approach used here might explain part of the differences (as the authors discussed), I am very wary of the other argument advanced: "the removal of iEEG sites contaminated with pathological activity". This raised some red flags. Does that mean some of the conclusions observed in Gueguen et al (2021) are only the result of noise contamination, and therefore should be disregarded? The authors might want to add a short supplementary figure using the same approach as in Gueguen (2021) but using the subset of contacts used here, to reassure potential readers of the validity of their previous manuscript.

      We appreciate the reviewer's concerns and understand the request for additional information. However, we would like to point out that the figure suggested by the reviewer is already present in the supplementary files of Gueguen et al. 2021 (see Fig. S2). The results of this study should not be disregarded, as the supplementary figure reproduces the results of the main text after excluding sites with pathological activity. Including or excluding sites contaminated with epileptic activity does not have a significant impact on the results, as analyses are performed at each time-stamp and across trials, and epileptic spikes are never aligned in time across trials.

      That being said, there are some methodological differences between the two studies. To extract gamma power, Gueguen et al. filtered and averaged 10 Hz sub-bands, while we used multi-tapers. Additionally, they used a temporal smoothing of 250 ms, while we used less smoothing. However, as explained in the main text, we used information-theoretical approaches to capture the statistical dependencies between gamma power and PE. Despite divergent methodologies, we obtained almost identical results.

      The data and code supporting this manuscript should be made available. If raw data cannot be shared for ethical reasons, single-trial gamma activities should at least be provided. Regarding the code used to process the data, sharing it could increase the appeal (and use) of the methods applied.

      We thank the reviewer for this suggestion. We added a section entitled “Code and data availability” and gave links to the scripts, notebooks and preprocessed data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      Weaknesses:

      There is a concern of double-dipping in one of the tests (Experiment 2, Figure 3: interaction of Upright/Inverted X Congruent/Incongruent). I raised this concern on the original submission, and it has not been resolved properly. The follow-up statistical test (after channel selection using the interaction contrast permutation test) still is geared towards that same contrast, even though the latter is now being tested differently. (Perhaps not explicitly testing the interaction, but in essence still testing the same.) A very simple solution would be to remove the post-hoc statistical tests and simply acknowledge that you're comparing simple means, while the statistical assessment was already taken care of using the permutation test. (In other words: the data appear compelling because of the cluster test, but NOT because of the subsequent t-tests.)

      We are sorry that we did not explain this issue clearly before, which might have caused some misunderstanding. When performing the cluster-based permutation test, we only tested whether the audiovisual congruency effect (congruent vs. incongruent) between the upright and inverted conditions was significantly different [i.e., (UprCon – UprInc) vs. (InvCon – InvInc)], without conducting extra statistical analyses on whether the congruency effect was significant in each orientation condition. Such an analysis yielded a cluster with a significant interaction between audiovisual integration and BM orientation for the cortical tracking effect at 1Hz (but not at 2Hz). However, this does not provide valid information about whether the audiovisual congruency effect at this cluster is significant in each orientation condition, given that a significant interaction effect may result from various patterns of data across conditions: such as significant congruency effects in both orientation conditions (Author response image 1a), a significant congruency effect in the upright condition and a non-significant effect in the inverted condition (Author response image 1b), or even non-significant yet opposite effects in the two conditions (Author response image 1c). Here, our results conform to the second pattern, indicating that cortical tracking of the high-order gait cycles involves a domain-specific process exclusively engaged in the AVI of BM. In a similar vein, the non-significant interaction found at 2Hz does not necessarily indicate that the congruency effect is non-significant in each orientation condition (Author response image 1f&e). Indeed, the congruency effect was significant in both the upright and inverted conditions at 2Hz in our study despite the non-significant interaction, suggesting that neural tracking of the lower-order step cycles is associated with a domain-general AVI process mostly driven by temporal correspondence in physical stimuli.

      Therefore, we need to perform subsequent t-tests to examine the significance of the simple effects in the two orientation conditions, which do not duplicate the cluster-based permutation test (for interaction only) and cause no double-dipping. Results from interaction and simple effects, put together, provide solid evidence that the cortical tracking of higher-order and lower-order rhythms involves BM-specific and domain-general audiovisual processing, respectively.

      To avoid ambiguity, we have removed the sentence “We calculated the audiovisual congruency effect for the upright and the inverted conditions” (line 194, which referred to the calculation of the indices rather than any statistical tests) from the manuscript. We have also clarified the meanings of the findings based on the interaction and simple effects together at the two temporal scales, respectively (Lines 205-207; Lines 213-215).

      Author response image 1.

      Examples of different patterns of data yielding a significant or nonsignificant interaction effect.
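
      The argument above, that one interaction contrast is compatible with several simple-effect patterns, can be sketched numerically. The following is a minimal illustration only: the condition means are hypothetical values chosen for the example (not data from the study), and the function names are ours.

```python
# Hypothetical cluster-averaged tracking amplitudes (arbitrary units), given as
# (UprCon, UprInc, InvCon, InvInc). These numbers are made up for illustration.
PATTERNS = {
    "a: congruency effect in both orientations": (3.0, 1.0, 2.0, 1.0),
    "b: congruency effect in upright only": (2.0, 1.0, 1.0, 1.0),
    "c: small, opposite effects": (1.5, 1.0, 1.0, 1.5),
}


def simple_effects(upr_con, upr_inc, inv_con, inv_inc):
    """Return the congruency effect (congruent - incongruent) per orientation."""
    return upr_con - upr_inc, inv_con - inv_inc


def interaction_contrast(upr_con, upr_inc, inv_con, inv_inc):
    """Return (UprCon - UprInc) - (InvCon - InvInc)."""
    upright, inverted = simple_effects(upr_con, upr_inc, inv_con, inv_inc)
    return upright - inverted


for label, means in PATTERNS.items():
    upright, inverted = simple_effects(*means)
    print(f"{label}: upright={upright:+.1f}, inverted={inverted:+.1f}, "
          f"interaction={interaction_contrast(*means):+.1f}")
```

      All three hypothetical patterns share the same interaction contrast while their simple effects differ, which is why the interaction test alone does not say whether the congruency effect is present in each orientation.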

      Reviewer #2 (Public review):

      Summary:

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      Weaknesses:

      In the revised version of the paper, the manuscript better relays the results and anticipates analyses, and this version adequately resolves some concerns I had about analysis details. Still, it is my view that the findings of the study are basic neural correlate results that do not provide insights into neural mechanisms or the causal relevance of neural effects towards behavior and cognition. The presence of an inversion effect suggests that the supra-additivity is related to cognition, but that leaves open whether any detected neural pattern is actually consequential for multi-sensory integration (i.e., correlation is not causation). In other words, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion, and it is possible that such neural correlates relate to behaviorally relevant neural mechanisms, but based on the current task and associated analyses this has not been shown.

      Thank you for providing these thoughtful comments regarding the theoretical implications of our neural findings. Previous behavioral evidence highlights the specificity of the audiovisual integration (AVI) of biological motion (BM) and reveals the impairment of such ability in individuals with autism spectrum disorder. However, the neural implementation underlying the AVI of BM, its specificity, and its association with autistic traits remain largely unknown. The current study aimed to address these issues.

      It is noteworthy that the operation of multisensory integration does not always depend on specific tasks, as our brains tend to integrate signals from different sensory modalities even when there is no explicit task. Hence, many studies have investigated multisensory integration at the neural level without examining its correlation with behavioral performance. For example, the widely known super-additivity mode for multisensory integration proposed by Perrault and colleagues was based on single-cell recording findings without behavioral tasks (Perrault et al., 2003, 2005). As we mentioned in the manuscript, the super-additive and sub-additive modes indicate non-linear interaction processing, either with potentiated neural activation to facilitate the perception or detection of near-threshold signals (super-additive) or a deactivation mechanism to minimize the processing of redundant information cross-modally (sub-additive) (Laurienti et al., 2005; Metzger et al., 2020; Stanford et al., 2005; Wright et al., 2003). Meanwhile, the additive integration mode represents a linear combination between two modalities. Distinguishing among these integration modes helps elucidate the neural mechanism underlying AVI in specific contexts, even though the neural-level AVI effects sometimes do not directly correspond to a significant behavioral-level AVI effect (Ahmed et al., 2023; Metzger et al., 2020). In the current study, we unveiled the dissociation of multisensory integration modes between neural responses at two temporal scales (Exps. 1a & 1b), which may involve the cooperation of a domain-specific and a domain-general AVI process (Exp. 2). While these findings were not expected to be captured by a single behavioral index, they revealed the multifaceted mechanism whereby hierarchical cortical activity supports audiovisual BM integration. They also advance our understanding of the emerging view that multi-timescale neural dynamics coordinate multisensory integration (Senkowski & Engel, 2024), especially from the perspective of natural stimuli processing.
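
      The three integration modes described above amount to comparing the audiovisual response with the sum of the unimodal responses. A minimal sketch follows; the function name, the `tol` parameter and the example values are assumptions introduced for illustration, and in practice the comparison is made with a statistical test across participants rather than a fixed threshold.

```python
def integration_mode(av, a, v, tol=0.0):
    """Classify a response as super-additive, additive, or sub-additive.

    av, a, v: response amplitudes in the audiovisual, auditory-only and
    visual-only conditions (same units). tol is a hypothetical tolerance
    standing in for the statistical test that real data would require.
    """
    linear_sum = a + v
    if av > linear_sum + tol:
        return "super-additive"
    if av < linear_sum - tol:
        return "sub-additive"
    return "additive"


# Made-up amplitudes, for illustration only.
print(integration_mode(av=1.5, a=0.5, v=0.5))   # AV exceeds A + V
print(integration_mode(av=1.0, a=0.5, v=0.5))   # AV equals A + V
print(integration_mode(av=0.75, a=0.5, v=0.5))  # AV falls short of A + V
```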

      Meanwhile, our finding that the cortical tracking of higher-order rhythmic structure in audiovisual BM specifically correlated with individual autistic traits extends previous behavioral evidence that ASD children exhibited reduced orienting to audiovisual synchrony in BM (Falck-Ytter et al., 2018), offering new evidence that individual differences in audiovisual BM processing are present at the neural level and associated with autistic traits. This finding opens the possibility of utilizing the cortical tracking of BM as a potential neural marker to assist the diagnosis of autism spectrum disorder (see more details in our Discussion Lines 334-346).

      However, despite the main objective of the current study focusing on the neural processing of BM, we agree with the reviewer that it would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to BM perception, for further justifying that inputs are being integrated in the service of behavior. In the current study, we adopted a color-change detection task entirely unrelated to audiovisual correspondence but only for maintaining participants’ attention. The advantage of this design is that it allows us to investigate whether and how the human brain integrates audiovisual BM information under task-irrelevant settings, as people in daily life can integrate such information even without a relevant task. However, this advantage is accompanied by a limitation: the task does not facilitate the direct examination of the correlation between neural responses and behavioral performance, since the task performance was generally high (mean accuracy >98% in all experiments). Future research could investigate this issue by introducing behavioral tasks more relevant to BM perception (e.g., Shen et al., 2023). They could also apply advanced neuromodulation techniques to elucidate the causal relevance of the cortical tracking effect to behavior (e.g., Kösem et al., 2018, 2020).

      We have discussed the abovementioned points as a separate paragraph in the revised manuscript (Lines 322-333). In addition, since the scope of the current study does not involve a causal relationship with behavioral performance, we have removed or modified the descriptions related to "functional relevance" in the manuscript (Abstract; Introduction, lines 101-103; Results, line 239; Discussion, line 336; Supplementary Information, lines 794 and 803). Moreover, we have strengthened the descriptions of the theoretical implications of the current findings in the abstract.

      We hope these changes adequately address your concern.

      References

      Ahmed, F., Nidiffer, A. R., O’Sullivan, A. E., Zuk, N. J., & Lalor, E. C. (2023). The integration of continuous audio and visual speech in a cocktail-party environment depends on attention. Neuroimage, 274, 120143. https://doi.org/10.1016/j.neuroimage.2023.120143

      Falck-Ytter, T., Nyström, P., Gredebäck, G., Gliga, T., Bölte, S., & the EASE team. (2018). Reduced orienting to audiovisual synchrony in infancy predicts autism diagnosis at 3 years of age. Journal of Child Psychology and Psychiatry, 59(8), 872–880. https://doi.org/10.1111/jcpp.12863

      Kösem, A., Bosker, H., Jensen, O., Hagoort, P., & Riecke, L. (2020). Biasing the Perception of Spoken Words with Transcranial Alternating Current Stimulation. Journal of Cognitive Neuroscience, 32, 1–10. https://doi.org/10.1162/jocn_a_01579

      Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. (2018). Neural Entrainment Determines the Words We Hear. Current Biology, 28(18), 2867-2875.e3. https://doi.org/10.1016/j.cub.2018.07.023

      Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166(3), 289–297. https://doi.org/10.1007/s00221-005-2370-2

      Metzger, B. A., Magnotti, J. F., Wang, Z., Nesbitt, E., Karas, P. J., Yoshor, D., & Beauchamp, M. S. (2020). Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 40(36), 6938–6948. https://doi.org/10.1523/JNEUROSCI.0279-20.2020

      Perrault, T. J., Vaughan, J. W., Stein, B. E., & Wallace, M. T. (2003). Neuron-Specific Response Characteristics Predict the Magnitude of Multisensory Integration. Journal of Neurophysiology, 90(6), 4022–4026. https://doi.org/10.1152/jn.00494.2003

      Perrault, T. J., Vaughan, J. W., Stein, B. E., & Wallace, M. T. (2005). Superior Colliculus Neurons Use Distinct Operational Modes in the Integration of Multisensory Stimuli. Journal of Neurophysiology, 93(5), 2575–2586. https://doi.org/10.1152/jn.00926.2004

      Senkowski, D., & Engel, A. K. (2024). Multi-timescale neural dynamics for multisensory integration. Nature Reviews Neuroscience, 25(9), 625–642. https://doi.org/10.1038/s41583-024-00845-7

      Shen, L., Lu, X., Wang, Y., & Jiang, Y. (2023). Audiovisual correspondence facilitates the visual search for biological motion. Psychonomic Bulletin & Review, 30(6), 2272–2281. https://doi.org/10.3758/s13423-023-02308-z

      Stanford, T. R., Quessy, S., & Stein, B. E. (2005). Evaluating the Operations Underlying Multisensory Integration in the Cat Superior Colliculus. Journal of Neuroscience, 25(28), 6499–6508. https://doi.org/10.1523/JNEUROSCI.5095-04.2005

      Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech. Cerebral Cortex, 13(10), 1034–1043. https://doi.org/10.1093/cercor/13.10.1034

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and somewhat unusual paper supporting the idea that creatine is a neurotransmitter in the central nervous system of vertebrates. The idea is not entirely new, and the authors carefully weigh the evidence, both past and newly acquired, to make their case. The strength of the paper lies in the importance of the potential discovery - as the authors point out, creatine ticks more boxes on criteria of neurotransmitters than some of the ones listed in textbooks - and the list of known transmitters (currently 16) certainly is textbook material. A further strength of the manuscript is the careful consideration of a list of criteria for transmitters and newly acquired evidence for four of these criteria: 1. evidence that creatine is stored in synaptic vesicles, 2. mutants for creatine synthesis and a vesicular transporter show reduced storage and release of creatine, 3. functional measurement that creatine release has an excitatory or inhibitory (here inhibitory) effect in vivo, and 4. ATP-dependence. The key weakness of the paper is that there is no single clear 'smoking gun', like a postsynaptic creatine receptor, that would really demonstrate the function as a transmitter. Instead, the evidence is of a cumulative nature, and not all bits of evidence are equally strong. On balance, I found the path to discovery and the evidence assembled in this manuscript to establish a clear possibility, positive evidence, and to provide a foundation for further work in this direction.

      It is notable that, historically, no neurotransmitter has ever been established in a single paper. While creatine will not be an exception, the data presented in this paper go further than any previous paper in demonstrating the possibility of a new neurotransmitter. However, we added an entire paragraph in the Discussion about the differences between Cr and classic neurotransmitters such as Glu, beginning with the absence of a molecularly defined receptor at this point and the Ca2+-independent component of Cr release induced by extracellular K+.

      We thank the reviewer for noting that the evidence we obtained now supports that creatine satisfies all 4 criteria of transmitters.

      We respectfully disagree with the point about a smoking gun: any of these four is a smoking gun, while the satisfaction of all 4 is quite strong, more than a smoking gun.

      We disagree that a receptor “would really demonstrate the function of a transmitter”. Textbook criteria for a transmitter usually require postsynaptic responses, not a molecularly defined receptor. Molecularly defining the receptors for many of the known transmitters required many years of work, and these molecules were accepted as transmitters before their receptors were molecularly defined. As long as there is a postsynaptic response, there is of course a receptor, though its molecular properties should be further studied. For example, responses to choline were discovered in 1900 (Hunt, Am J Physiol 3, xviii-xix, 1900), those to acetylcholine in 1906 (Hunt and Taveau, Br Med J 2:1788-1789, 1906), and those to extracts of the suprarenal glands before 1894 (Oliver and Schäfer, J Physiol 18:230-276, 1895). Henry Dale was awarded a Nobel prize in 1936 partly for his work on acetylcholine. Receptors for acetylcholine and noradrenaline were not molecularly defined until the 1970s and 1980s. Before then, they were known only as mediators of responses to natural transmitters and synthesized chemicals.

      There were two previous reports that creatine could be taken into brain slices (Almeida et al., 2006) or synaptosomes (Peral, Vázquez-Carretero and Ilundain, 2010). These were used by the reviewer to argue that the idea of creatine as a neurotransmitter “is not entirely new”. However, no one has followed up these studies for 10 years, thus they would not be considered as good smoking guns. While we have reproduced the synaptosome uptake result (together with our new finding that this uptake was dependent on SLC6A8), it should be noted that uptake of molecules into synaptosomes is not absolutely required for a neurotransmitter because degradation of a transmitter is equally valid. Furthermore, molecules required synaptically but not as a transmitter can also be transported into the synaptic terminal.

      Our detection of Cr in the synaptic vesicles provides much stronger evidence supporting its importance. If a smoking gun is important, the detection of creatine in the SVs is the best smoking gun, whose discovery in fact was the reason leading us to study its release, postsynaptic responses as well as repeating the uptake experiment with genetic mutants.

      Reviewer #2 (Public Review):

      Summary:

      Bian et al studied creatine (Cr) in the context of central nervous system (CNS) function. They detected Cr in synaptic vesicles purified from mouse brains with anti-Synaptophysin using capillary electrophoresis-mass spectrometry. Cr levels in the synaptic vesicle fraction were reduced in mice lacking the Cr synthetase AGAT, or the Cr transporter SLC6A8. They provide evidence for Cr release within several minutes after treating brain slices with KCl. This KCl-induced Cr release was partially calcium-dependent and was attenuated in slices obtained from AGAT and SLC6A8 mutant mice. Cr application also decreased the excitability of cortical pyramidal cells in one third of the cells tested. Finally, they provide evidence for SLC6A8-dependent Cr uptake into synaptosomes, and ATP-dependent Cr loading into synaptic vesicles. Based on these data, the authors propose that Cr may act as a neurotransmitter in the CNS.

      Strengths:

      1) A major strength of the paper is the broad spectrum of tools used to investigate Cr.

      2) The study provides strong evidence that Cr is present in/loaded into synaptic vesicles.

      Weaknesses:

      (in sequential order)

      1) Are Cr levels indeed reduced in Agat-/-? The decrease in Cr IgG in Agat-/- (and Agat+/-) is similar to the corresponding decrease in Syp (Fig. 3B). What is the explanation for this? Is the decrease in Cr in Agat-/- significant when considering the drop in IgG? The data should be normalized to the respective IgG control.

      We measured the Cr concentration in the whole brain lysates using Creatine Assay Kit (Sigma, MAK079). Cr levels in the brain were reduced in Agat-/- mice. The Cr concentration in AGAT-/- mice was reduced to about 1/10 of AGAT+/+ and AGAT+/- mice (Author response image 1).

      Author response image 1.

      Cr concentration in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=5 male mice for each group). *, p<0.05, **, p<0.001, one-way ANOVA with Tukey’s correction.

      As pointed out by the reviewer, the decrease in Cr IgG in Agat-/- seems similar to the corresponding decrease in Syp (Fig. 3B in the paper). Cr pulled down by IgG was 0.46 ± 0.04, 0.37 ± 0.06 and 0.17 ± 0.03 pmol/μg anti-syp antibody for Agat+/+, Agat+/-, and Agat-/- mice, respectively. There was a trend toward reduced Cr pulled down by IgG in Agat-/-; however, there were no statistically significant differences between Agat-/- and Agat+/+, or between Agat-/- and Agat+/-, as determined by one-way ANOVA (Fig. 3B in the paper). Because Agat-/- reduced the Cr concentration in the brain, we speculate that the apparent drop in Cr pulled down by IgG may have partially resulted from the overall reduction of Cr content in the brain.

      The absolute content of Cr pulled down by Syp in Agat-/- mice was reduced to 21.6% of that in Agat+/+ mice and 23.6% of that in Agat+/- mice (Fig. 3B in the paper). As suggested by the reviewer, we normalized the Cr pulled down by Syp to the respective IgG control (Author response image 2). The normalized Cr content in Agat-/- mice tended to be lower than in Agat+/+ and Agat+/- mice, but the difference was not statistically significant (n=10 for each group, one-way ANOVA).

      Author response image 2.

      Normalized Cr content in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=10 for each group). Cr pulled down by anti-Syp antibody was normalized to that of IgG.
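
      The effect of this normalization can be approximated from the group means quoted above. The sketch below is a back-of-the-envelope calculation only: it uses the reported means (not per-sample data), a ratio of means is not the same as a mean of per-sample ratios, and the variable and function names are ours. The analysis shown in Author response image 2 remains the reference.

```python
# Group means quoted in the response (pmol per ug antibody) for the IgG control.
IGG_PULLDOWN = {"Agat+/+": 0.46, "Agat+/-": 0.37, "Agat-/-": 0.17}

# Cr pulled down by anti-Syp in Agat-/-, as a fraction of each reference group
# (21.6% of Agat+/+ and 23.6% of Agat+/-, as stated in the response).
SYP_KO_RELATIVE = {"Agat+/+": 0.216, "Agat+/-": 0.236}


def normalized_ko_ratio(reference):
    """Approximate (Syp_KO / IgG_KO) / (Syp_ref / IgG_ref) from group means."""
    igg_ratio = IGG_PULLDOWN["Agat-/-"] / IGG_PULLDOWN[reference]
    return SYP_KO_RELATIVE[reference] / igg_ratio


for reference in ("Agat+/+", "Agat+/-"):
    ratio = normalized_ko_ratio(reference)
    print(f"Agat-/- vs {reference}: normalized ratio ~ {ratio:.2f}")
```

      On these means, the IgG-normalized Syp pull-down in Agat-/- comes out at roughly 0.58 of Agat+/+ and 0.51 of Agat+/-, a smaller relative drop than the unnormalized 21.6% and 23.6%.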

      2) The data supporting that depolarization-induced Cr release is SLC6A8 dependent is not convincing because the relative increase in KCl-induced Cr release is similar between SLC6A8-/Y and SLC6A8+/Y (Fig. 5D). The data should be also normalized to the respective controls.

      As suggested by the reviewer, we normalized the Cr release during KCl stimulation to the baseline (Author response image 3). The ratio of Cr release evoked by high KCl stimulation to the baseline was similar in WT and Slc6a8 knockouts. This suggests that Cr is not released through the SLC6A8 transporter.

      Author response image 3.

      Normalized Cr release from slices from Slc6a8+/Y and Slc6a8-/Y mice (n=7 slices for each group). Cr released evoked by high KCl stimulation was normalized to baseline.

      However, without Slc6a8, KCl-induced release of Cr was significantly reduced (Figure 5D in the paper). This is because Slc6a8 is the transporter that mediates Cr uptake into synaptic terminals (Figure 5D and 8C in the paper). Therefore, the reduced Cr content in SVs (Figure 2C in the paper) indirectly reduced Cr release.
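
      The distinction drawn here, a similar evoked-to-baseline ratio alongside a smaller absolute release, can be shown with a short sketch. The numbers below are hypothetical (arbitrary units, not measurements from the paper), and the function names are ours.

```python
def fold_change(evoked, baseline):
    """Evoked release expressed as a multiple of baseline."""
    return evoked / baseline


def absolute_increase(evoked, baseline):
    """Evoked release above baseline, in the same units as the inputs."""
    return evoked - baseline


# Hypothetical Cr levels (arbitrary units), chosen only to illustrate the logic.
wild_type = {"baseline": 2.0, "kcl": 6.0}
knockout = {"baseline": 0.5, "kcl": 1.5}

for name, group in (("Slc6a8+/Y", wild_type), ("Slc6a8-/Y", knockout)):
    print(name,
          "fold change:", fold_change(group["kcl"], group["baseline"]),
          "absolute increase:", absolute_increase(group["kcl"], group["baseline"]))
```

      In this toy case both genotypes show the same fold change, while the absolute KCl-evoked increase is four times smaller in the knockout, which is the pattern expected if the transporter limits the releasable pool rather than the release step itself.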

      3) The majority (almost 3/4) of depolarization-induced Cr release is Ca2+ independent (Fig. 5G). Furthermore, KCl-induced, Ca2+-independent release persists in SLC6A8-/Y (Fig. 5G). What is the model for Ca2+-independent Cr release? Why is there Ca2+-independent Cr release from SLC6A8 KO neurons? How does this relate to the prominent decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G)? They show a prominent decrease in Cr control levels in SLC6A8-/Y in Fig. 5D. Were the data shown in Fig. 5D obtained in the presence or absence of Ca2+? Could the decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G) be due to decreased Cr baseline levels in the presence of Ca2+ (Fig. 5D)?

      These are interesting questions that, at this point, can only be answered by reference to the literature. For example, one possibility is that Ca2+-independent Cr release might occur in glia, since, as pointed out by the reviewer in Point 6, high GAMT levels were reported for astrocytes and oligodendrocytes (Schmidt et al. 2004; Rosko et al. 2023). As reported, other neuromodulators such as taurine can be released from astrocytes (Philibert, Rogers, and Dutton 1989) or slices (Saransaari and Oja 2006) in a Ca2+-independent manner. In addition, in the absence of potassium stimulation, Ca2+ depletion led to increased release of taurine in cultured astrocytes (Takuma et al. 1996) or in the striatum in vivo (Molchanova, Oja, and Saransaari 2005). Similarly, in SLC6A8 KO slices, Ca2+ depletion (Figure 5G) also increased creatine baseline levels as compared to those in normal ACSF (Figure 5D). Another possibility is that Ca2+-independent Cr release might occur in neurons lacking SLC6A8 expression.

      As mentioned in the paper, data shown in Figure 5D was obtained in the presence Ca2+. Reduction of Ca2+-dependent Cr release evoked by potassium in SLC6A8-/Y (Figure 5G) may be due to decreased Cr baseline levels in the presence of Ca2+ and reduced Cr in synaptic vesicles (Figure 5D).

      4) Cr levels are strongly reduced in Agat-/- (Figure 6B). However, KCl-induced Cr release persists after loss of AGAT (Figure 6B). These data do not support that Cr release is Agat dependent.

      Although KCl-induced Cr release persisted in AGAT-/- mutants, it dropped to 11.6% of the level in WT mice (Figure 6B). AGAT is not directly involved in the release, but is required for providing sufficient Cr.

      5) The authors show that Cr application decreases excitability in ~1/3 of the tested neurons (Figure 7). How were responders and non-responders defined? What justifies this classification? The data for all Cr-treated cells should be pooled. Are there indeed two distributions (responders/non-responders)? Running statistics on pre-selected groups (Figure 7H-J) is meaningless. Given that the effects could be seen 2-8 minutes after Cr application - at what time points were the data shown in Figure 7E-J collected? Is the Cr group shown in Figure 7F significantly different from the control group/wash?

      The responders were defined by three criteria: (1) When Cr was applied, the rheobase was increased compared with both control and wash conditions. (2) The number of total evoked spikes was lower during Cr application than in both control and wash. (3) The number of total evoked spikes was decreased by at least 10% relative to control or wash.

      For all the individual responders, the rheobase was increased when Cr was applied (Figure 7E and 7F). In individual non-responders, by contrast, the rheobase following Cr application was either identical to both control and wash (n=19/35), identical to either control or wash (n=11/35), between control and wash (n=2/35), or smaller than both control and wash (n=3/35). Thus, the responders and non-responders were separable. When the rheobase data were pooled, many points overlapped, so we did not pool the data here.

      As suggested, we pooled the ratio of spike changes in response to 100 μM Cr application for all neurons (Author response image 4). Evoked spikes of non-responders typically (34/35) changed within the range of -10% to 10%.

      Author response image 4.

      Relative changes of total evoked spikes in response to 100 μM Cr. Responders are represented by red dots and non-responders by black dots. Dashed black line indicates 10%. Relative change = (Cr - (Control + Wash)/2) / ((Control + Wash)/2) × 100%.
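
      The relative-change formula in the caption and the three responder criteria can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function and argument names are hypothetical, and criterion (3) is assumed here to be evaluated against the averaged control/wash baseline used in the caption.

```python
def relative_change(cr, control, wash):
    """Percent change of the Cr value relative to the mean of control and wash."""
    baseline = (control + wash) / 2
    return (cr - baseline) / baseline * 100


def is_responder(rheobase_cr, rheobase_control, rheobase_wash,
                 spikes_cr, spikes_control, spikes_wash):
    """Apply the three criteria described in the response (assumed reading)."""
    # (1) Rheobase is higher under Cr than in both control and wash.
    higher_rheobase = rheobase_cr > rheobase_control and rheobase_cr > rheobase_wash
    # (2) Fewer total evoked spikes under Cr than in both control and wash.
    fewer_spikes = spikes_cr < spikes_control and spikes_cr < spikes_wash
    # (3) Spike count drops by at least 10% (assumption: vs. the averaged baseline).
    large_drop = relative_change(spikes_cr, spikes_control, spikes_wash) <= -10.0
    return higher_rheobase and fewer_spikes and large_drop


if __name__ == "__main__":
    # Hypothetical cell: rheobase in pA, spikes as total evoked spike counts.
    print(relative_change(8, 10, 10))
    print(is_responder(100, 75, 75, 8, 10, 10))
```

      With this reading, a cell whose spike count changes by less than 10% in either direction is never classified as a responder, which matches the -10% to 10% band reported for non-responders.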

      In Figure 7E-J, we collected data at time points when the maximal response was reached. The Cr group shown in Figure 7F was indeed significantly different from the control group/wash (p<0.05, paired t test, for data points collected under 75-500 pA current injection).

      6) Indirect effects: The phenotypes could be partially caused by indirect effects of perturbing the Cr/PCr/CK system, which is known to play essential roles in ATP regeneration, Ca2+ homeostasis, neurotransmission, intracellular signaling systems, axonal and dendritic transport... Similarly, high GAMT levels were reported for astrocytes (e.g., Schmidt et al. 2004; doi: 10.1093/hmg/ddh112), and changes in astrocytic Cr may underlie the phenotypes. Cr has been also reported to be an osmolyte: a hyperosmotic shock of astrocytes induced an increase in Cr uptake, suggesting that Cr can work as a compensatory osmolyte (Alfieri et al. 2006; doi: 10.1113/jphysiol.2006.115006). Potential indirect effects are also consistent with a trend towards decreased KCl-induced GABA (and Glutamate) release in SLC6A8-/Y (Figure 5C). These indirect effects may in part explain the phenotypes seen after perturbing Agat, SLC6A8, and should be thoroughly discussed.

      We discussed possible non-transmitter roles of creatine/phosphocreatine in the Discussion, and we added the possibility of astrocytic Cr to the Discussion. The trend towards decreased KCl-induced GABA (and glutamate) release in SLC6A8-/Y (Figure 5C) was not significant.

      7) As stated by the authors, there is some evidence that Cr may act as a co-transmitter for GABAA receptors (although only at high concentrations). Would a GABAA blocker decrease the fraction of cells with decreased excitability after Cr exposure?

      We performed another experiment in hippocampal CA1 pyramidal neurons showing that Cr at 100 μM did not change GABAergic neurotransmission (n=8, Author response image 5). Inhibitory postsynaptic currents (IPSCs) recorded in the presence of glutamate receptor blockers (10 μM APV and 10 μM CNQX) were not changed by 100 μM creatine, in either frequency or amplitude (Author response image 5B and C). These results did not support Cr activation of GABAA receptors.

      Author response image 5.

      IPSCs recorded in hippocampal CA1 pyramidal neurons. (A) Representative raw traces before (Control), during (Creatine) and after (Wash) the application of 100 μM creatine. (B&C) Group data of IPSC frequency (B) and amplitude (C) averaged over 1 min.

      8) The statement "Our results have also satisfied the criteria of Purves et al. 67,68, because the presence of postsynaptic receptors can be inferred by postsynaptic responses." (l.568) is not supported by the data and should be removed.

      We have deleted this sentence, though what could mediate postsynaptic responses other than receptors?

      Reviewer #3 (Public Review):

      SUMMARY:

      The manuscript by Bian et al. promotes the idea that creatine is a new neurotransmitter. The authors conduct an impressive combination of mass spectrometry (Fig. 1), genetics (Figs. 2, 3, 6), biochemistry (Figs. 2, 3, 8), immunostaining (Fig. 4), electrophysiology (Figs. 5, 6, 7), and EM (Fig. 8) in order to offer support for the hypothesis that creatine is a CNS neurotransmitter.

      We thank the reviewer for the summary.

      STRENGTHS:

      There are many strengths to this study.

      • The combinatorial approach is a strength. There is no shortage of data in this study.

      • The careful consideration of specific criteria that creatine would need to meet in order to be considered a neurotransmitter is a strength.

      • The comparison studies that the authors have done in parallel with classical neurotransmitters are helpful.

      • Demonstration that creatine has inhibitory effects is another strength.

      • The new genetic mutations for Slc6a8 and AGAT are strengths and potentially incredibly helpful for downstream work.

      WEAKNESSES:

      • Some data are indirect. Even though Slc6a8 and AGAT are helpful sentinels for the presence of creatine, they are not creatine themselves. Therefore, the conclusions that are drawn should be circumspect.

      SLC6A8 and AGAT mutants are not essential for Cr’s role as a neurotransmitter.

      • Regarding Slc6a8, it seems to work only as a reuptake transporter - not as a transporter into SVs. Therefore, we do not know what the transporter is.

      Indeed, SLC6A8 is only a transporter on the cytoplasmic membrane, not a transporter on synaptic vesicles. We have shown biochemistry here, and we have unpublished data that showed other SLCs on SVs, which did not include SLC6A8.

      • Puzzlingly, Slc6a8 and AGAT are in different cells, setting up the complicated model that creatine is created in one cell type and then processed as a neurotransmitter in another.

      • No candidate receptor for creatine has been identified postsynaptically.

      • Because no candidate receptor has been identified, is it possible that creatine is exerting its effects indirectly through other inhibitory receptors (e.g., GABAergic Rs)?

      As shown in our response to Question 7 of Reviewer 2, Cr did not exert its effects through inhibitory GABAA receptors.

      • More broadly, what are the other possibilities for roles of creatine that would explain these observations other than it being a neurotransmitter? Could it simply be a modifier that exists in the SVs (lots of molecules exist in SVs)?

      We discussed the possibility of a non-transmitter role for creatine/phosphocreatine in the Discussion.

      • The biochemical studies are helpful in terms of comparing relevant molecules (e.g., Figs. 8 and S1), but the images of the westerns are all so fuzzy that there are questions about processing and the accuracy of the quantification.

      Multiple members (>4) of our group have carried out SV purifications repeatedly over the last decade, and we are highly confident in the SV purifications presented in Figs. 8 and S1.

      There are several criteria that define a neurotransmitter. The authors nicely delineated many criteria in their discussion, but it is worth it for readers to do the same with their own understanding of the data.

      By this reviewer's understanding (and the Purves' textbook definition) a neurotransmitter: 1) must be present within the presynaptic neuron and stored in vesicles; 2) must be released by depolarization of the presynaptic terminal; 3) must require Ca2+ influx upon depolarization prior to release; 4) must bind specific receptors present on the postsynaptic cell; 5) exogenous transmitter can mimic presynaptic release; 6) there exists a mechanism of removal of the neurotransmitter from the synaptic cleft.

      The six criteria seem to be required only by the reviewer. As discussed in our Discussion, Purves’ textbook did not list six criteria but only three, “the substance must be present within the presynaptic neuron; the substance must be released in response to presynaptic depolarization, and the release must be Ca2+ dependent; specific receptors for the substance be present on the postsynaptic cell” (Purves et al., 2001, 2016).

      Kandel et al. (2013, 2021) listed 4 criteria for a neurotransmitter: “it is synthesized in the presynaptic neuron; it is present within vesicles and is released in amounts sufficient to exert a defined action on the postsynaptic neuron or effector organ; when administered exogenously in reasonable concentrations it mimics the action of the endogenous transmitter; a specific mechanism usually exists for removing the substance from the synaptic cleft”.

      While we agree that any neuroscientist can have his/her own criteria, it is more reasonable to accept the textbooks that have been widely read for decades.

      For a paper to claim that the work has identified a new neurotransmitter, several of these criteria would be met - and the paper would acknowledge in the discussion which ones have not been met. For this particular paper, this reviewer finds that condition 1 is clearly met.

      Conditions 2 and 3 seem to be met by electrophysiology, but there are caveats here. High KCl stimulation is a blunt instrument that will depolarize absolutely everything in the prep all at once and could result in any number of non-specific biological reactions as a result of K+ rushing into all neurons in the prep. Moreover, the results in 0 Ca2+ are puzzling. For creatine (and for the other neurotransmitters), why is there such a massive uptick in release, even when the extracellular saline is devoid of calcium?

      To avoid the disadvantage of high KCl stimulation, we performed optogenetic experiments recently, with encouraging preliminary data. We do not know the source of Ca2+-independent release of Cr and neurotransmitters, though astrocytes are a possibility.

      Condition 4 is not discussed in detail at all. In the discussion, the authors elide the criterion of receptors specified by Purves by inferring that the existence of postsynaptic responses implies the existence of receptors. True, but does it specifically imply the existence of creatinergic receptors? This reviewer does not think that is necessarily the case. The authors should be appropriately circumspect and consider other modes of inhibition that are induced by activation or potentiation of other receptors (e.g., GABAergic or glycinergic).

      Our results did not support Cr stimulation of inhibitory GABAA receptors (see our answer to Point 7 of Reviewer 2).

      Condition 5 may be met, because the authors applied exogenous creatine and observed inhibition (Fig. 7). However, this is tough to know without understanding the effects of endogenous release of creatine. If they were to test whether the absence of creatine caused excess excitation (at putative creatinergic synapses), then that would be supportive of the same.

      After the submission of our manuscript, we found a recent paper showing that slc6a8 knockout led to increased excitation in pyramidal neurons in the prefrontal cortex (PFC), with increased firing frequency (Ghirardini et al., 2023). Because we have shown that slc6a8 knockout causes a decrease of Cr in SVs (Figure 2 in our paper), this result provides the evidence described under Condition 5 by this reviewer: a decrease of Cr in SVs led to excess excitation.

      For condition 6, the authors made a great effort with Slc6a8. This is a very tough criterion to understand for many synapses and neurotransmitters.

      In terms of fundamental neuroscience, the story would be impactful if proven correct. There are certainly more neurotransmitters out there than currently identified.

      The impact as framed by the authors in the abstract and introduction for intellectual disability is uncertain (forming a "new basis for ID pathogenesis") and it seems quite speculative beyond the data in this paper.

      We deleted this sentence.

      Reviewer #1 (Recommendations For The Authors):

      To strengthen the manuscript, I suggest the following considerations:

      1) The key missing evidence to my mind is a receptor - but this is clearly outside the scope of this paper. Yet, I am surprised that in the list of criteria for neurotransmitters in general there is no mention of a receptor. Furthermore, many receptors have been identified through receptor agonists or antagonists, like neurotoxins or drugs. The authors do not talk about putative receptors except for a sentence in the discussion where they speculate on a GPCR. There are numerous GPCR agonists and antagonists, which may be a long-shot, or something even a bit more designed based on knowledge about creatine? I do not think the publication of this manuscript should have been made dependent on finding an agonist or antagonist of this specific unknown receptor (if it exists), but it would be good to have at least some leads on this from the authors what has been tried or what could be done? How about a manipulation of G-protein-coupled signal transduction to support the idea that there IS such a GPCR? There may be a real opportunity here to test existing compounds in wild type, the slc6a8 and agat mutants.

      We will keep trying, but accept the reality that Rome was not built in a single day and that no transmitter was proven by one single paper.

      A key new puzzle piece of evidence is the identification of creatine in synaptic vesicles. The experiment relies heavily on the purity of the SV fraction using the anti-synaptophysin antibody. I am quite sure that these preparations contain many other compartments - and of course a big mix of synaptic (and other) vesicles. Would it be possible to purify with an anti slc6a8 antibody?

      Slc6a8 is expressed on the plasma membrane of neurons (refs. 7-9), rather than on synaptic vesicles. Consistent with this, we could not detect an obvious Slc6a8-HA signal in the starting material (Lane S in Author response image 6) that was used for SV purification. We have tried to purify SVs with an HA antibody in Slc6a8-HA mice, but SV markers could not be detected.

      Author response image 6.

      Lack of Slc6a8-HA in our starting material. In Slc6a8-HA knock-in mice, the HA signal was present in whole brain homogenate (H), but not obvious in supernatants (S) following centrifugation at 35,000 × g. In contrast, the SV marker Syp was present in supernatants.

      The K stimulation protocol in slices is relatively crude, as all neurons in the slice get simultaneously overactivated - and some of the effects on Ca-dependent release are not very strong (e.g. the 35 neurons that were not responsive to creatine at all). A primary neuronal culture of neurons that respond to creatine would strengthen this section.

      To avoid the disadvantage of K stimulation, we also performed optogenetic experiments recently and obtained encouraging preliminary results.

      Reviewer #2 (Recommendations For The Authors):

      1) The different sections of the manuscript are not separated by headers.

      2) The beginning of the results section either does not reference the underlying literature or refers to unpublished data.

      We have kept some background at the beginning of the Results section.

      3) The text contains many opinions and historical information that are not required (e.g., "It has never been easy to discover a new neurotransmitter, especially one in the central nervous system (CNS). We have been searching for new neurotransmitters for 12 years."; l. 17).

      This is a field that has been dormant for decades and such background introductions are helpful for at least some readers.

      4) Almeida et al. (2008; doi: 10.1002/syn.20280) provided evidence for electrical activity-, and Ca2+-dependent Cr release from rat brain slices. This paper should be introduced in the introduction.

      Those were stand-alone papers which have not been reproduced or paid attention to. Our introduction part did not mention them because our research did not begin with those papers. We had no idea that those papers existed when we began. We started with SV purification and only read those papers afterwards. Thus, they were not necessary background to our paper but can be discussed after we discovered Cr in SVs.

      5) Fig. 7: A Y-scale for the stimulation protocol is missing.

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      The main suggestion by this reviewer (beyond the details in the public review) is to consider the full spectrum of biology that is consistent with these results. By my reading, creatine could be a neurotransmitter, but other possibilities also exist, and the authors need to highlight those too.

      We have discussed a possible non-transmitter role in the Discussion.

      References

      Ghirardini, E., G. Sagona, A. Marquez-Galera, F. Calugi, C. M. Navarron, F. Cacciante, S. Chen, F. Di Vetta, L. Dada, R. Mazziotti, L. Lupori, E. Putignano, P. Baldi, J. P. Lopez-Atalaya, T. Pizzorusso, and L. Baroncelli. 2023. Cell-specific vulnerability to metabolic failure: the crucial role of parvalbumin expressing neurons in creatine transporter deficiency. Acta Neuropathol Commun, 11: 34. doi: 10.1186/s40478-023-01533-w.

      Lowe, M. T., Faull, R. L., Christie, D. L. & Waldvogel, H. J. Distribution of the creatine transporter throughout the human brain reveals a spectrum of creatine transporter immunoreactivity. J Comp Neurol 523, 699-725 (2015). https://doi.org/10.1002/cne.23667

      Mak, C. S. et al. Immunohistochemical localisation of the creatine transporter in the rat brain. Neuroscience 163, 571-585 (2009). https://doi.org/10.1016/j.neuroscience.2009.06.065

      Molchanova, S. M., Oja, S. S. & Saransaari, P. Mechanisms of enhanced taurine release under Ca2+ depletion. Neurochem Int 47, 343-349 (2005). https://doi.org/10.1016/j.neuint.2005.04.027

      Philibert, R. A., Rogers, K. L. & Dutton, G. R. K+-evoked taurine efflux from cerebellar astrocytes: on the roles of Ca2+ and Na+. Neurochem Res 14, 43-48 (1989). https://doi.org/10.1007/BF00969756

      Rosko, L. M. et al. Cerebral Creatine Deficiency Affects the Timing of Oligodendrocyte Myelination. J Neurosci 43, 1143-1153 (2023). https://doi.org/10.1523/JNEUROSCI.2120-21.2022

      Saransaari, P. & Oja, S. S. Characteristics of taurine release in slices from adult and developing mouse brain stem. Amino Acids 31, 35-43 (2006). https://doi.org/10.1007/s00726-006-0290-5

      Schmidt, A. et al. Severely altered guanidino compound levels, disturbed body weight homeostasis and impaired fertility in a mouse model of guanidinoacetate N-methyltransferase (GAMT) deficiency. Hum Mol Genet 13, 905-921 (2004). https://doi.org/10.1093/hmg/ddh112

      Speer, O. et al. Creatine transporters: a reappraisal. Mol Cell Biochem 256-257, 407-424 (2004). https://doi.org/10.1023/b:mcbi.0000009886.98508.e7

      Takuma, K. et al. Ca2+ depletion facilitates taurine release in cultured rat astrocytes. Jpn J Pharmacol 72, 75-78 (1996). https://doi.org/10.1254/jjp.72.75

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #2 (Public Review): 

      Regarding the public review of reviewer #2, we update our answers here with the new analyses and modifications made to the manuscript.

      This manuscript is missing a direct phenotypic comparison of control cells to complement that of cells expressing RhoGEF2-DHPH at "low levels" (the cells that would respond to optogenetic stimulation by retracting); and cells expressing RhoGEF2-DHPH at "high levels" (the cells that would respond to optogenetic stimulation by protruding). In other words, the authors should examine cell area, the distribution of actin and myosin, etc in all three groups of cells (akin to the time zero data from figures 3 and 5, with a negative control). For example, does the basal expression meaningfully affect the PRG low-expressing cells before activation e.g. ectopic stress fibers? This need not be an optogenetic experiment, the authors could express RhoGEF2-DHPH without SspB (as in Fig 4G).

      Updated answer: We thank reviewer #2 for this suggestion. PRG-DHPH overexpression is known to affect the phenotype of the cell, as shown in Valon et al., 2017. In our experiments, we could not identify any evidence of a particular phenotype before optogenetic activation apart from the area and spontaneous membrane speed that were already reported in our manuscript (Fig 2E and SuppFig 2). Regarding the distribution of actin and myosin, we did not observe an obvious pattern that would be predictive of the protruding/retracting phenotype. To be more quantitative, we classified (by eye, without knowing the expression level of PRG or the future phenotype) the presence of stress fibers, the amount of cortical actin, the strength of focal adhesions, and the circularity of cells. As shown below, when these classes are binned by level of PRG expression (two levels below the threshold and two above), there is no clear determinant. Thus, we concluded that the main driver of the phenotype was the PRG basal expression rather than any particularity of the actin cytoskeleton/cell shape.

      Author response image 1.

      Author response image 2.

      Relatedly, the authors seem to assume ("recruitment of the same DH-PH domain of PRG at the membrane, in the same cell line, which means in the same biochemical environment." supplement) that the only difference between the high and low expressors is the level of expression. Given the chronic overexpression and the fact that the capacity for this phenotypic shift is not recruitment-dependent, this is not necessarily a safe assumption. The expression of this GEF could well induce e.g. gene expression changes.

      Updated answer: We agree with reviewer #2 that there could be changes in gene expression. In the next point of this supplementary note, we had specified it, by saying « that overexpression has an influence on cell state, defined as protein basal activity or concentration before activation. »  We are sorry if it was not clear, and we changed this sentence in the revised manuscript (in red in the supp note). 

      One advantage of the model is that it does not require any change in absolute concentrations, besides that of the GEF. The model is intended to be minimal; it fits and explains the data with very few parameters. We do not show that there is no change in concentration, but we show that it is not necessary to invoke one. We revised a sentence in the new version of the manuscript to include this point.

      Additional answer: During the revision process, we looked for an experimental demonstration that the phenotypic switch is independent of any change in global gene expression pattern due to the chronic overexpression of PRG. Our idea was to start from a condition of high PRG overexpression, such that cells protrude upon optogenetic activation, and then acutely deplete PRG to see if cells were then retracting. To deplete PRG on a timescale that prevents any change of gene expression, we considered the recently developed CATCHFIRE (PMID: 37640938) chemical dimerizer. We designed an experiment in which the PRG DH-PH domain was expressed in fusion with a FIRE-tag and co-expressed with the FIRE-mate fused to TOM20, together with the optoPRG tool. Upon incubation with the MATCH small molecule, we should be able to recruit the overexpressed PRG to the mitochondria within minutes, thereby preventing it from forming a complex with active RhoA in the vicinity of the plasma membrane. Unfortunately, despite numerous trials, we never achieved the required conditions: we could not obtain cells with high enough expression of PRG-FIRE-tag (for the protrusive response) and low enough expression of optoPRG (for retraction upon PRG-FIRE-tag depletion). We still think this would be a nice experiment to perform, but it will require the establishment of a stable cell line with finely tuned expression levels of the CATCHFIRE system, which goes beyond the timeline of our present work.

      Concerning the overall model summarizing the authors' observations, they "hypothesized that the activity of RhoA was in competition with the activity of Cdc42"; "At low concentration of the GEF, both RhoA and Cdc42 are activated by optogenetic recruitment of optoPRG, but RhoA takes over. At high GEF concentration, recruitment of optoPRG lead to both activation of Cdc42 and inhibition of already present activated RhoA, which pushes the balance towards Cdc42."

      These descriptions are not precise. What is the nature of the competition between RhoA and Cdc42? Is this competition for activation by the GEFs? Is it a competition between the phenotypic output resulting from the effectors of the GEFs? Is it competition from the optogenetic probe and Rho effectors and the Rho biosensors? In all likelihood, all of these effects are involved, but the authors should more precisely explain the underlying nature of this phenotypic switch. Some of these points are clarified in the supplement, but should also be explicit in the main text. 

      Updated answer: We consider the competition between RhoA and Cdc42 as a competition between retraction due to the protein network triggered by RhoA (through ROCK-Myosin and mDia-bundled actin) and the protrusion triggered by Cdc42 (through PAK-Rac-ARP2/3-branched Actin). We made this point explicit in the main text.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Major 

      - why this is only possible for such few cells. Can the authors comment on this in the discussion? Does the model provide any hints? 

      As said in our answer to the public comment of reviewer #1, we think that the low number of cells being able to switch can be explained by two different reasons:

      (1) First, we were looking for clear inversions of the phenotype, where we could see clear ruffles in the case of the protrusion, and clear retractions in the other case. Thus, we discarded cells that would show in-between phenotypes, because we had no quantitative parameter to compare how protrusive or retractile they were. This reduced the number of switching cells.

      (2) Second, we were limited by the dynamics of the optogenetic dimer used here. Indeed, the control of the frequency was limited by the unbinding dynamics of the optogenetic dimer. This timescale (~20 s) is comparable to that of the deactivation of RhoA and Cdc42. Thus, the differences in frequency are smoothed out and we could not vary the frequency enough to increase the number of switches. Thanks to the model, we can predict that increasing the unbinding rate of the optogenetic tool (shorter dimer lifetime) should allow us to increase the number of switching cells.

      We have added a sentence in the discussion to make this second point explicit.

      - I would encourage the authors to discuss this molecular signaling switch in the context of general design principles of switches. How generalizable is this network/mechanism? Is it exclusive to activating signaling proteins or would it work with inhibiting mechanisms? Is the competition for the same binding site between activators and effectors a common mechanism in other switches? 

      The most common design principle for molecular switches is the bistable switch that relies on a nonlinear activation (for example through cooperativity) with a linear deactivation. Such a design allows the switch between low and high levels. In our case, there is no need for a non-linearity since the core mechanism is a competition between the activator and the effectors for the same binding site on active RhoA. Thus, the design principle would be closer to the notion of a minimal “paradoxical component” (PMID: 23352242) that both activates and limits signal propagation, which in our case can be thought of as a self-limiting mechanism to prevent uncontrolled RhoA activation by the positive feedback. Yet, as we show in our work, this core mechanism is not enough for the phenotypic switch to happen since the dual activation of RhoA and Cdc42 is ultimately required for the protrusion phenotype to take over the retracting one. Given the particularity of the switch we observed here, we do not feel comfortable speculating on any general design principles in the main text, but we thank reviewer #1 for his/her suggestion.

      - Supplementary figures - there is a discrepancy between the figures called in the text and the supplementary files, which only include SF1-4. 

      We apologize for this error and we made the correction. 

      - In the text, the authors use Supp Figure 7 to show that the phenotype could not be switched by varying the fold increase of recruitment through changing the intensity/duration of the light pulse. Aside from providing the figure, could you give an explanation or speculation of why? Does the model give any prediction as to why this could be difficult to achieve experimentally (is the range of experimentally feasible fold change of 1.1-3 too small)? Also, could you clarify why the range is different than the 3 to 10-fold mentioned at the beginning of the results section? 

      We thank the reviewer for this question. This difference between frequency and intensity can indeed be understood in a simple manner through the model.

      All the reactions in our model were modeled as linear reactions. Thus, at any timepoint, changing the intensity of the pulse will only change proportionally the amount of the different components (amount of active RhoA, amount of sequestered RhoA, and amount of active Cdc42). This explains why we cannot change the balance between RhoA activity and Cdc42 activity only through the pulse strength. We observed the same experimentally: when we changed the intensity of the pulses, the phenotype would be weaker/stronger, but would never switch, supporting our hypothesis on the linearity of all biochemical reactions.
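
      The proportionality argument can be illustrated with a toy linear system. The sketch below is not the authors' model: the three variables, the rate constants, and the pulse schedule are hypothetical choices made only to show that, when every reaction is linear, scaling the pulse amplitude scales each component by the same factor and therefore leaves the RhoA/Cdc42 ratio unchanged at every timepoint.

```python
import math


def simulate(amplitude, n_steps=6000, dt=0.05, pulse_every=30.0,
             k_off=0.05, k_act_rho=1.0, k_deact_rho=0.1,
             k_act_cdc=0.5, k_deact_cdc=0.3):
    """Euler integration of a toy linear cascade (all rates are hypothetical).

    gef: recruited GEF, increased by `amplitude` at each light pulse
    rho: active RhoA, produced from gef and deactivated linearly
    cdc: active Cdc42, produced from gef and deactivated linearly
    """
    gef = rho = cdc = 0.0
    steps_per_pulse = round(pulse_every / dt)
    trace = []
    for step in range(n_steps):
        if step % steps_per_pulse == 0:
            gef += amplitude  # light pulse recruits more GEF
        d_gef = -k_off * gef
        d_rho = k_act_rho * gef - k_deact_rho * rho
        d_cdc = k_act_cdc * gef - k_deact_cdc * cdc
        gef += dt * d_gef
        rho += dt * d_rho
        cdc += dt * d_cdc
        trace.append((rho, cdc))
    return trace


if __name__ == "__main__":
    weak = simulate(1.0)
    strong = simulate(2.0)
    # Doubling the pulse doubles both outputs, so their ratio is unchanged.
    same_ratio = all(
        math.isclose(r2 / c2, r1 / c1, rel_tol=1e-9)
        for (r1, c1), (r2, c2) in zip(weak, strong)
    )
    print("ratio unchanged by pulse amplitude:", same_ratio)
```

      In such a system, shifting the balance between the two outputs requires changing the timing of the input rather than its strength, which is the intuition behind varying the pulse frequency.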

      On the contrary, changing the frequency has an effect, for a simple reason: the dynamics of RhoA and Cdc42 activation are not the same as the dynamics of inhibition of RhoA by the PH domain (see Figure 4). The inhibition of RhoA by the PH domain is almost instantaneous, while the activation of RhoGTPases has a delay (set by the deactivation parameter k_2). Intuitively, increasing the frequency will lead to sustained inhibition of RhoA, promoting the protrusion phenotype. Decreasing the frequency (with a stronger pulse to keep the same amount of recruited PRG) restricts this inhibition of RhoA to the first seconds following the activation. The delayed activation of RhoA will then take over. 

      We added two sentences in the manuscript to explain in greater detail the difference between intensity and frequency.  

      Regarding the difference between the 1.3-3 fold and the 3 to 10 fold, the explanation is the following: the 3 to 10 fold referred to the cumulative amount of proteins being recruited after multiple activations (steady state amount reached after 5 minutes with one activation every 30s); while the 1.3-3 fold is what can be obtained after only one single pulse of activation.  

      - The transient expression achieves a large range of concentration levels which is a strength in this case. To solve the experimental difficulties associated with this, i.e. finding transfected cells at low cell density, the authors developed a software solution (Cell finder). Since this approach will be of interest for a wide range of applications, I think it would deserve a mention in the discussion part. 

      We thank the reviewer for his/her interest in this small software solution.

      We expanded the description of the tool in the Methods section. The Cell finder is also available with comments on GitHub (https://github.com/jdeseze/cellfinder) and usable by anyone using Metamorph or Micromanager imaging software. 

      Minor 

      - Can the authors describe what they mean with "cell state"? It is used multiple times in the manuscript and can be interpreted as various things. 

      We now explain what we mean by ‘cell state’ in the main text:

      “protein basal activities and/or concentrations - which we called the cell state”

      - “(from 0% to 45%, Figure 2D)", maybe add here: "compare also with Fig. 2A". 

      We completed the sentence as suggested, which clarifies the data for the readers.

      - The sentence "Given that the phenotype switch appeared to be controlled by the amount of overexpressed optoPRG, we hypothesized that the corresponding leakiness of activity could influence the cell state prior to any activation." might be hard to understand for readers unfamiliar with optogenetic systems. I suggest adding a short sentence explaining dark-state activity/leakiness before putting the hypothesis forward. 

      We changed this whole beginning of the paragraph to clarify.

      - Figure 2E and SF2A. I would suggest swapping these two panels as the quantification of the membrane displacement before activation seems more relevant in this context. 

      We thank reviewer #1 for this suggestion and we agree with it (we swapped the two panels)

      - Fig. 2B is missing the white frames in the mixed panels. 

      We are sorry for this mistake, we changed it in the new version.  

      - In the text describing the experiment of Fig. 4G, it would again be helpful to define what the authors mean by cell state, or to state the expected outcome for both hypotheses before revealing the result.

      We clarified above what we mean by cell state, which is the basal protein activities and/or concentrations prior to optogenetic activation. We added the expectation as follows: 

      To discriminate between these two hypotheses, we overexpressed the DH-PH domain alone in another fluorescent channel (iRFP) and recruited the mutated PH at the membrane. “If the binding to RhoA-GTP was only required to change the cell state, we would expect the same statistics than in Figure 2D, with a majority of protruding cells due to DH-PH overexpression. On the contrary, we observed a large majority of retracting phenotype even in highly expressing cells (Figure 4G), showing that the PH binding to RhoA-GTP during recruitment is a key component of the protruding phenotype.”

      - Figure 4H,I: "of cells that overexpress PRG, where we only recruit the PH domain" doesn't match with the figure caption. Are these two constructs in the same cell? If not please clarify the main text. 

      We agree that it was not clear. Both constructs are in the same cell, and we changed the figure caption accordingly.  

      - "since RhoA dominates Cdc42" is this concluded from experiments (if yes, please refer to the figure) or is this known from the literature (if yes, please cite). 

      The assumption that RhoA dominates Cdc42 comes from the fact that we see retraction at low PRG concentration. We assumed that RhoA is responsible for the retraction phenotype. Our assumption is based on the literature (Burridge 2004 as an example of a review, confirmed by many experiments, such as the direct recruitment of RhoA to the membrane, see Berlew 2021) and is supported by our observations of immediate increase of RhoA activity at low PRG. We modified the text to clarify it is an assumption.

      - Fig. 6G  o left: is not intuitive, why are the number of molecules different to start with? 

      The number of molecules is different because they represent the active molecules: increasing the amount of PRG increases the amount of active RhoA and active Cdc42. We updated the figure to clarify this point.

      o right: the y-axis label says "phenotype", maybe change it to "activity" or add a second y-axis on the right with "phenotype"? 

      We updated the figure following reviewer #1's suggestion.

      - Discussion: "or a retraction in the same region" sounds like in the same cell. Perhaps rephrase to state retraction in a similar region? 

      Sorry for the confusion, we changed it to be really clear: “a protrusion in the activation region when highly expressed, or a retraction in the activation region when expressed at low concentrations.”

      Typos: 

      - "between 3 and 10 fold" without s. 

      - Fig. 1H, y-axis label. 

      - "whose spectrum overlaps" with s. 

      - "it first decays, and then rises" with s. 

      - Fig 4B and Fig 6B. Is the time in sec or min? (Maybe double-check all figures). 

      - "This result suggests that one could switch the phenotype in a single cell by selecting it for an intermediate expression level of the optoPRG.". 

      - "GEF-H1 PH domain has almost the same inhibition ability as PRG PH domain". 

      We corrected all these mistakes and thank the reviewer for his careful reading of the manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      Likewise, the model assumes that at high PRG GEF expression, the "reaction is happening far from saturation ..." and that "GTPases activated with strong stimuli -giving rise to strong phenotypic changes- lead to only 5% of the proteins in a GTP-state, both for RhoA and Cdc42". Given the high levels of expression (the absolute value of which is not known) this assumption is not necessarily safe to assume. The shift to Cdc42 could indeed result from the quantitative conversion of RhoA into its active state. 

      We agree with the reviewer that the hypothesis that RhoA is fully converted into its active state cannot be completely ruled out. However, we think that the two following points can justify our choice.

      - First, we see that even in the protruding phenotype, RhoA activity is increasing upon optoPRG recruitment (Figure 3). This means that RhoA is not completely turned into its active GTP-loaded state. The biosensor intensity rises by a factor of 1.5 after 5 minutes (and continues to increase, even if not shown here). Admittedly, this could be explained by the relocation of RhoA to the place of activation, but it still shows that cells with high PRG expression are not completely saturated in RhoA-GTP. 

      - We agree that linearity (no saturation) is still a hypothesis and very difficult to rule out, because it is not only a question of absolute concentrations of GEFs and RhoA, but also a question of their reaction kinetics, which are unknown parameters in vivo. Yet, adding a saturation parameter would mean adding 3 unknown parameters (absolute concentrations of RhoA, as well as two reaction constants). The fact that they are not needed to fit the complex curves of RhoA, as we do with only one parameter, tends to show that the minimal ingredients representing the interaction are captured here.  

      The observed "inhibition of RhoA by the PH domain of the GEF at high concentrations" could result from the ability of the probe to, upon membrane recruitment, bind to active RhoA (via its PH domain) thereby outcompeting the RhoA biosensor (Figure 4A-C). This reaction is explicitly stated in the supplemental materials ("PH domain binding to RhoA-GTP is required for protruding phenotype but not sufficient, and it is acting as an inhibitor of RhoA activity."), but should be more explicit in the main text. Indeed, even when PRG DHPH is expressed at high concentrations, it does activate RhoA upon recruitment (figure 3GH). Not only might overexpression of this active RhoA-binding probe inhibit the cortical recruitment of the RhoA biosensor, but it may also inhibit the ability of active RhoA to activate its downstream effectors, such as ROCK, which could explain the decrease in myosin accumulation (figure 3D-F). It is not clear that there is a way to clearly rule this out, but it may impact the interpretation. 

      This hypothesis is actually what we claim in the manuscript. We think that the inhibition of RhoA by the PH domain is explained by its direct binding. We may have missed what Reviewer #2 wanted to say, but we think that we state it explicitly in the main text:

      “Knowing that the PH domain of PRG triggers a positive feedback loop thanks to its binding to active RhoA 18, we hypothesized that this binding could sequester active RhoA at high optoPRG levels, thus being responsible for its inhibition.”

      And also in the Discussion:

      “However, this feedback loop can turn into a negative one for high levels of GEF: the direct interaction between the PH domain and RhoA-GTP prevents RhoA-GTP binding to effectors through a competition for the same binding site.”

      We may have not been clear, but we think that this is what is happening: the PH domain prevents the binding to effectors and decreases RhoA activity (as was shown in Chen et al. 2010).  

      The X-axis in Figure 4C time is in seconds not minutes. The Y-axis in Figure 4H is unlabeled. 

      We are sorry for the mistake in Figure 4C. We changed the Y-axis in Figure 4H.  

      Although this publication cites some of the relevant prior literature, it fails to cite some particularly relevant works. For example, the authors state, "The LARG DH domain was already used with the iLid system" and refers to a 2018 paper (ref 19), whereas that domain was first used in 2016 (PMID 27298323). Indeed, the authors used the plasmid from this 2016 paper to build their construct. 

      We thank the reviewer for pointing out this error, we have corrected the citation and put the seminal one in the revised version.

      An analogous situation pertains to previous work that showed that an optogenetic probe containing the DH and PH domains in RhoGEF2 is somewhat toxic in vivo (table 6; PMID 33200987). Furthermore, it has previously been shown that mutation of the equivalent of F1044A and I1046E eliminates this toxicity (table 6; PMID 33200987) in vivo. This is particularly important because the Rho probe expressing RhoGEF2-DHPH is in widespread usage (76 citations in PubMed). The ability of this probe to activate Cdc42 may explain some of the phenotypic differences described resulting from the recruitment of RhoGEF2-DHPH and LARG-DH in a developmental context (PMID 29915285, 33200987). 

      We thank reviewer #2 for these comments, and added a small section in the discussion, for optogenetic users: 

      This underlines the attention that needs to be paid to the choice of specific GEF domains when using optogenetic tools. Tools using DH-PH domains of PRG have been widely used, both in mammalian cells and in Drosophila (with the orthologous gene RhoGEF2), and have been shown to be toxic in some contexts in vivo 28. Our study confirms the complex behavior of this domain which cannot be reduced to a simple RhoA activator.   

      Concerning the experiment shown in 4D, it would be informative to repeat this experiment in which a non-recruitable DH-PH domain of PRG is overexpressed at high levels and the DH domain of LARG is recruited. This would enable the authors to distinguish whether the protrusion response is entirely dependent on the cell state prior to activation or the combination of the cell state prior to activation and the ability of PRG DHPH to also activate Cdc42. 

      We thank the reviewer for his suggestion. Yet, we think that we have enough direct evidence that the protruding phenotype is due to both the cell state prior to activation and the ability of PRG DHPH to also activate Cdc42. First, we see a direct increase in Cdc42 activity following optoPRG recruitment (see Figure 6). This increase is sustained in the protruding phenotype and precedes Rac1 and RhoA activity, which shows that it is the first of these three GTPases to be activated. Moreover, we showed that inhibition of PAK by the very specific drug IPA3 is completely abolishing only the protruding phenotype, which shows that PAK, a direct effector of Cdc42 and Rac1, is required for the protruding phenotype to happen. We know also that the cell state prior to activation is defining the phenotype, thanks to the data presented in Figure 2. 

      We further showed in Figure 1 that LARG DH-PH domain was not able to promote protrusion. The proposed experiment would be interesting to confirm that LARG does not have the ability to activate another GTPase, even in a different cell state with overexpressed PRG. However, we are not sure it would bring any substantial findings to understand the mechanism we describe here, given the facts provided above.  

      Similarly, as PRG activates both Cdc42 and Rho at high levels, it would be important to determine the extent to which the acute Rho activation contributes to the observed phenotype (e.g. with Rho kinase inhibitor). 

      We agree with the reviewer that it would be interesting to know whether RhoA activation contributes to the observed phenotype, and we have tried such experiments. 

      For the Rho kinase inhibitor, we tried Y-27632 and we could never prevent the protruding phenotype from happening. However, we could not completely abolish the retracting phenotype either (even when the effect on the cells was quite strong and visible), which could be due to other effectors compensating for this inhibition. As RhoA has many other effectors, it does not tell us that RhoA is not required for protrusion. 

      We also tried C3, which is a direct inhibitor of RhoA. However, it had too much impact on the basal state of the cells, making it impossible to recruit (cells were becoming round and clearly dying). As both the basal state and optogenetic activation require the activation of RhoA, it is hard to draw conclusions from experiments where no cell is responding. 

      The ability of PRG to activate Cdc42 in vivo is striking given the strong preference for RhoA over Cdc42 in vitro (2400X) (PMID 23255595). Is it possible that at these high expression levels, much of the RhoA in the cell is already activated, so that the sole effect that recruited PRG can induce is activation of Cdc42? This is related to the previous point pertaining to absolute expression levels.  

      As discussed before, we think that it is not only a question of absolute expression levels, but also of the affinities between the different partners. But Reviewer #2 is right, there is a competition between the activation of RhoA and Cdc42 by optoPRG, and activation of Cdc42 probably happens at higher concentration because of smaller effective affinity.

      Still, we know that activation of Cdc42 by the PRG DH-PH domain is possible in vivo, as was very clearly shown in Castillo-Kauil et al., 2020 (PMID 33023908). They show that this activation requires the linker between the DH and PH domains of PRG, as well as Gαs activation, which requires a change in PRG DH-PH conformation. This conformational switch does not happen in vitro, which might explain why the affinity for Cdc42 was found to be very low. 

      Minor points 

      In both the abstract and the introduction the authors state, "we show that a single protein can trigger either protrusion or retraction when recruited to the plasma membrane, polarizing the cell in two opposite directions." However, the cells do not polarize in opposite directions, ie the cells that retract do not protrude in the direction opposite the retraction (or at least that is not shown). Rather a single protein can trigger either protrusion or retraction when recruited to the plasma membrane, depending upon expression levels. 

      We thank the reviewer for this remark, and we agree that we had not shown any data supporting a change in polarization. We solved this issue, by showing now in Supplementary Figure 1 the change in areas in both the activated and in the not activated region. The data clearly show that when a protrusion is happening, the cell retracts in the non-activated region. On the other hand, when the cell retracts, a protrusion happens in the other part of the cell, while the total area is staying approximately constant. 

      We added the following sentence to describe our new figure:

      Quantification of the changes in membrane area in both the activated and non-activated part of the cell (Supp Figure 1B-C) reveals that the whole cell is moving, polarizing in one direction or the other upon optogenetic activation.

      While the authors provide extensive quantitative data in this manuscript and quantify the relative differences in expression levels that result in the different phenotypes, it would be helpful to quantify the absolute levels of expression of these GEFs relative to e.g. an endogenously expressed GEF. 

      We agree with the reviewer's comment, and we also wanted to have an idea of the absolute level of expression of GEFs present in these cells to be able to relate fluorescence intensities to absolute concentrations. We tried different methods, especially with the purified fluorescent protein, but obtaining exact numbers is a hard task.

      We ended up quantifying the amount of fluorescent protein within a stable cell line thanks to ELISA and comparing it with the mean fluorescence seen under the microscope. 

      We estimated that the switch concentration was around 200 nM, which is 8 times higher than the mean endogenous concentration according to https://opencell.czbiohub.org/, but should be reachable locally in wild-type cells, or globally in mutated cancer cells. 

      Given the numerical data (mostly) in hand, it would be interesting to determine whether RhoGEF2 levels, cell area, the pattern of actin assembly, or some other property is most predictive of the response to PRG DHPH recruitment. 

      We think that the manuscript made it clear that the concentration of PRG DHPH is almost 100% predictive of the response to PRG DHPH. We believe that other phenotypes such as the cell area or the pattern of actin assembly would only be consequences of this. Interestingly, as experimenters we were absolutely not able to predict the behavior from the shape of the cell alone, even after hundreds of activation experiments. We also tried to find characteristics that would distinguish both populations with the data in our hands and could not find any.

      There is some room for general improvement/editing of the text. 

      We tried our best to improve the text, following the reviewers' suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their insightful comments and recommendations. We have extensively revised the manuscript in response to the valuable feedback. We believe the result is a more rigorous and thoughtful analysis of the data. Furthermore, our interpretation and discussion of the findings is more focused and highlights the importance of the circuit and its role in the response to stress. Thank you for helping to improve the presented science.

      Key changes made in response to the reviewers' comments include:

      • Revision of statistical analyses for nearly all figures, with the addition of a new table of summary statistics to include F and/or t values alongside p-values.

      • Addition of statistical analyses for all fiber photometry data.

      • Examination of data for possible sex dependent effects.

      • Clarification of breeding strategies and genotype differences, with added details to methods to improve clarity.

      • Addressing concerns about the specificity of virus injections and the spread, with additional details added to methods.

      • Modification of terminology related to goal-directed behavior based on reviewer feedback, including removal of the term from the manuscript.

      • Clarification and additional data on the use of photostimulation and its effects, including efforts to inactivate neurons for further insight, despite technical challenges.

      • Correction of grammatical errors throughout the manuscript.

      Reviewer 1:

      Despite the manuscript being generally well-written and easy to follow, there are several grammatical errors throughout that need to be addressed.

      Thank you for highlighting this issue. Grammatical errors have been fixed in the revised version of the manuscript.

      Only p values are given in the text to support statistical differences. This is not sufficient. F and/or t values should be given as well.

      In response to this critique and similar comments from Reviewer 2, we re-evaluated our approach to statistical analyses and extensively revised analyses for nearly all figures. We also added a new table of summary statistics (Supplemental Table 1) containing the type of analysis, statistic, comparison, multiple comparisons, and p value(s). For Figures 4C-E, 5C, 6C-E, 7H-I, and 8H we analyzed these data using two-way repeated measures (RM) ANOVA that examined the main effect of time (either number of sessions or stimulation period) in the same animal and compared that to the main effect of genotype of the animal (Cre+ vs Cre-), and if there was an interaction. For Supplemental Figure 7A we also conducted a two-way RM ANOVA with time as a factor and activity state (number of port activations in active vs inactive nose port) as the other in Cre+ mice. For Figures 5D-E we conducted a two-way mixed model ANOVA that accounted and corrected for missing data. In figures that only compared two groups of data (Figures 5F-L, 6F, 8C-D, 8I, and Supp 6F-G) we used two-tailed t-test for the analysis. If our question and/or hypothesis required us to conduct multiple comparisons between or within treatments, we conducted Bonferroni’s multiple comparisons test for post hoc analysis (we note which groups we compared in Supplemental Table 1). For figures that did or did not show a change in calcium activity (Figure 3G, 3I-K, 7B, 7D-E, 8E-F), we compared waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). The time windows we used as comparison are noted in Supplemental Table 1, and if the comparisons were significant at 95%, 99%, and 99.9% thresholds.
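
      As a minimal illustration of the post hoc correction mentioned above, the Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The function names and the p-values below are hypothetical and are not taken from Supplemental Table 1.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which comparisons remain significant after the adjustment."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical raw p-values from three post hoc comparisons.
raw_p_values = [0.004, 0.030, 0.200]
print(bonferroni_adjust(raw_p_values))
print(significant_after_bonferroni(raw_p_values))
```

      Statistical packages typically apply this correction internally; the sketch only makes the arithmetic explicit.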

      None of the comparisons that were significant in the prior analyses fell below the threshold for significance. Of those previously found to be not significantly different, only one changed. In Figure 6E there is now a significant baseline difference between Cre+ and Cre- mice, with Cre- mice taking longer to first engage the port compared to Cre+ mice (p=0.045). Although the more rigorous approach to the statistical analyses did not change our interpretations, we feel it enhanced the paper and thank the reviewer for pushing this improvement.

      Moreover, the fibre photometry data does not appear to have any statistical analyses reported - only confidence intervals represented in the figures without any mention of whether the null hypothesis that the elevations in activity observed are different from the baseline.

      This is particularly important where there is ambiguity, such as in Figure 3K, where the spontaneous activity of the animal appears to correlate with a spike in activity but the text mentions that there is no such difference. Without statistics, this is difficult to judge.

      Thank you for highlighting this critical point and providing an opportunity to strengthen our manuscript. We added statistical analyses of all fiber photometry data using a recently described approach based on waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). In the statistical summary (Supplemental Table 1) we note the time window that we used for comparison in each analysis and whether the comparisons were significant at 95%, 99%, and 99.9% thresholds. Thank you for highlighting this and helping make the manuscript stronger.
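
      The general idea behind a waveform confidence interval can be sketched as follows: resample trials with replacement, recompute the mean waveform each time, and flag time points where the resulting interval excludes baseline. This is a simplified percentile bootstrap written for illustration only. It is not the implementation used in the manuscript or in the cited method, which may include further steps not reproduced here, and the function names and data are hypothetical.

```python
import random


def bootstrap_waveform_ci(trials, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean waveform.

    trials: list of equal-length lists, one waveform per trial.
    Returns (lower, upper), each with one value per time point.
    """
    rng = random.Random(seed)
    n_trials = len(trials)
    n_time = len(trials[0])
    boot_means = [[] for _ in range(n_time)]
    for _ in range(n_boot):
        # Resample whole trials with replacement to keep each waveform intact.
        sample = [trials[rng.randrange(n_trials)] for _ in range(n_trials)]
        for t in range(n_time):
            boot_means[t].append(sum(trial[t] for trial in sample) / n_trials)
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    lower, upper = [], []
    for t in range(n_time):
        ordered = sorted(boot_means[t])
        lower.append(ordered[lo_idx])
        upper.append(ordered[hi_idx])
    return lower, upper


def significant_timepoints(lower, upper, baseline=0.0):
    """True where the confidence interval does not contain the baseline."""
    return [lo > baseline or hi < baseline for lo, hi in zip(lower, upper)]


# Synthetic example: first time point centered on zero, second clearly elevated.
example_trials = [[0.1, 1.0], [-0.1, 0.9], [0.1, 1.1], [-0.1, 1.0], [0.1, 0.9],
                  [-0.1, 1.1], [0.1, 1.0], [-0.1, 0.9], [0.1, 1.1], [-0.1, 1.0]]
ci_lower, ci_upper = bootstrap_waveform_ci(example_trials)
print(significant_timepoints(ci_lower, ci_upper))
```

      Repeating the call with alpha set to 0.01 or 0.001 would correspond to the 99% and 99.9% thresholds.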

      With respect to Figure 3K, we are not certain we understood the spike in activity the reviewer referred to. Figures 3J and K include both velocity data (gold) and the Ca2+-dependent signal (blue). We used episodes of velocity that were comparable to the avoidance response during the ambush test and found no significant differences in the Ca2+ signal when gating around changes in velocity in the absence of a stressor (Supplemental Table 1). This is in contrast to the significant change in Ca2+ signal following a mock predator ambush (Figure 3J). We interpret these data together to indicate that locomotion does not correlate with an increase in calcium activity in SuMVGLUT2+::POA neurons, but that coping with a stressor does. This conclusion is further examined in Supplemental Figure 5, including examining cross-correlation to test for a temporally offset relationship between velocity and Ca2+ signal in SuMVGLUT2+::POA neurons.
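
      A cross-correlation test for a temporally offset relationship can be sketched by computing the Pearson correlation between two traces at a range of lags and checking whether any lag stands out. The code below is a generic illustration on synthetic data in which the second trace is deliberately delayed by two samples. The function names, sign convention and data are assumptions, not the analysis pipeline used for Supplemental Figure 5.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(velocity, calcium, lag):
    """Correlate velocity[t] with calcium[t + lag].

    A positive lag means the calcium trace follows the velocity trace.
    Both traces must have the same length.
    """
    if lag >= 0:
        xs = velocity[:len(velocity) - lag]
        ys = calcium[lag:]
    else:
        xs = velocity[-lag:]
        ys = calcium[:len(calcium) + lag]
    return pearson(xs, ys)


def best_lag(velocity, calcium, max_lag):
    """Lag (in samples) with the largest absolute correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: abs(lagged_correlation(velocity, calcium, lag)))


# Synthetic traces: the "calcium" trace is the "velocity" trace delayed by 2 samples.
velocity_trace = [0, 1, 3, 2, 5, 4, 8, 1, 0, 2, 7, 3]
calcium_trace = [0, 0] + velocity_trace[:-2]
print(best_lag(velocity_trace, calcium_trace, max_lag=3))
```

      With real recordings, the absence of a clear peak at any lag would be consistent with no temporally offset relationship between the two signals.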

      The use of photostimulation only is unfortunate, it would have been really nice to see some inactivation of these neurons as well. This is because of the well-documented issues with being able to determine whether photostimulation is occurring in a physiological manner, and therefore makes certain data difficult to interpret. For instance, with regards to the 'active coping' behaviours - is this really the correct characterisation of what's going on? I wonder if the mice simply had developed immobile responding as a coping strategy but when they experience stimulation of these neurons that they find aversive, immobility is not sufficient to deal with the summative effects of the aversion from the swimming task as well as from the neuronal activation? An inactivation study would be more convincing.

      We agree with the reviewer's point: experiments demonstrating the necessity of SuMVGLUT2+::POA neurons would have added to the story here. We carried out multiple experiments aimed at addressing questions about the necessity of SuMVGLUT2+::POA neurons in stress coping behaviors, specifically the forced swim assay. Efforts included employing chemogenetic, optogenetic, and tetanus toxin-based methods. We observed no effects on locomotor activity or stress coping. These experiments are both technically difficult and challenging to interpret. Interpretation of negative results, as we obtained, is particularly difficult because of potential technical confounds. Selective targeting of SuMVGLUT2+::POA neurons for inhibition requires three viral injections and two recombination steps, increasing variability and reducing the number of neurons impacted. Alternatively, photoinhibition targeting SuMVGLUT2+::POA cells can be done using Retro-AAV injected into POA and a fiber implant over SuM. We tried both approaches. The data obtained were difficult to interpret because of questions about adequate coverage of the SuMVGLUT2+::POA population by virally expressed constructs and/or light spread. The challenge of adequate coverage to effectively prevent output from the targeted population is further compounded by challenges inherent in neural inhibition, specifically determining whether the inhibition created at the cellular level is adequate to block output in the context of excitatory inputs, or whether neurons must first be engaged in a particular manner for inhibition to be effective. Baseline neural activity, release probability, and post-synaptic effects could all be relevant, which photo-inhibition will potentially not resolve. So, while the trend is to always show “necessary and sufficient” effects, we’ve tried nearly everything, and we simply cannot conclude much from our mixed results.

      There are also well-established problems with existing photo-inhibition methods, which are often ignored even as people use and tout them. We have a lot of expertise in photo-inhibition optogenetics, and indeed have used it with some success and developed new methods, yet in this particular case we are unable to draw conclusions related to inhibition. People have experienced similar challenges in locus coeruleus neurons, which have very low basal activity: inhibition with chemogenetics is very hard, as is inhibition with optogenetic pump-based approaches, because the neurons fire robust rebound APs. We have spent almost 2.5 years trying to get this to work in this circuit because reviewers have been insistent on this result for the paper to be conclusive. Unfortunately, it simply isn’t possible in our view until we know more about the cell types involved. This is all in spite of experience using the approach in many other publications.

      We also employed less selective approaches, such as injecting AAV-DIO-tetanus toxin light chain (Tettox) constructs directly into SuM VGLUT2-Cre mice, but found off-target effects impacting animal wellbeing and impeding behavioral testing due to viral spread to surrounding areas.

      While we are disappointed at being unable to directly address questions about the necessity of SuMVGLUT2+::POA neurons in active coping with experimental data, we were unable to obtain results allowing for clear interpretation across numerous other domains the reviewers requested. We also feel strongly that until we have a clear picture of the molecular cell type architecture in the SuM, and Cre-drivers to target subsets of neurons, this question will be difficult to resolve for any group. We are working now on RNAseq and related spatial transcriptomics efforts in the SuM and examining additional behavioral paradigms to resolve these issues, so stay tuned for future publications.

      Accordingly, we avoid making statements relating to necessity in the manuscript, in spite of having several lines of physiological data showing strong, robust correlations between behavior and the SuMVGLUT2+::POA circuit.

      Nose poke is only nominally instrumental as it cannot be shown to have a unique relationship with the outcome that is independent of the stimuli-outcome relationships (in the same way that a lever press can, for example). Moreover, there is nothing here to show that the behaviours are goal-directed.

      Thank you for highlighting this point. We removed the goal-directed terminology from the manuscript. Since the mice perform highly selective (active vs inactive) port activation robustly across multiple days of training, the behavior likely transitions to habitual behavior. We only tested the valuation of stimulus termination on the final day of training, with a time-limited progressive ratio test. With respect to lever press versus active port activation, we are unclear how using a lever in this context would offer a different interpretation. Lever pressing may be more sensitive to changes in valuation when compared to nose poke port activation (Atalayer and Rowland 2008); however, in this study the focus of the operant behavior is separating innate behaviors from learned action–outcome instrumental behaviors for threat response (LeDoux and Daw 2018). The robust, highly selective activation of the active port illustrated in Figure 6 fits an action–outcome instrumental behavior wherein mice learn to engage the active but not the inactive port to terminate photostimulation. The first activation of the port occurs through exploration of the arena but, as demonstrated by the number of active port activations and the decline in time to the first active port engagement, mice expressing ChR2eYFP learn to engage the port to terminate the stimulation. To aid in illustrating this point we have added Supplemental Figure 7 showing active and inactive port activations for both Cre+ and Cre- mice. This adds clarity to the high rate of selective port activation driven by stimulation of SuMVGLUT2+::POA neurons compared to controls. Eliminating the goal-directed terminology and providing additional data narrows and supports one of the key points of the operant experiment.

      With regards to Figure 1: This is a nice figure, but I wonder if some quantification of the pathways and their density might be helpful, perhaps by measuring the intensity of fluorescence in image J (as these are processes, not cell bodies that can be counted)? Mind you, they all look pretty dense so perhaps this is not necessary! However, because the authors are looking at projections in so-called 'stress-engaged regions', the amygdala seems conspicuous by its absence. Did the authors look in the amygdala and find no projections? If so it seems that this would be worth noting.

      This is an interesting question, but it has proven to be very technically challenging. We consulted with several leaders in the field who routinely use complementary viral tracing methods. We were unable to devise a satisfactorily meaningful quantitative (as opposed to qualitative) approach to compare SuMVGLUT2+::POA to SuMVGLUT2+ projections. Several limitations hinder a meaningful quantitative approach. One limitation was the need for different viral strategies to label the two populations. Labeling SuMVGLUT2+::POA neurons requires using VGLUT2-Flp mice with two injections into the POA and one into SuM. Two recombinase steps were required, reducing efficiency of overlap. This combination of viral injections, particularly the injections of RetroAAVs in the POA, can induce significant quantitative variability due to tropism, efficacy, and variability of retro-viral methods, and viral infection generally. These issues are often totally ignored in similar studies across the “neural circuit” landscape, but that doesn’t make them less relevant here.

      Although people in the field do this and show quantification, we believe that it can be a quite misleading read-out of functionally relevant circuitry, given that neurotransmitter release ultimately is amplified by receptors post-synaptically, and many examples of robust behavioral effects have been observed with low fiber tracing in complementary methods (McCall, Siuda et al. 2017). In contrast, the broader SuMVGLUT2+ population was labeled using a single injection into the SuM. This means there is likely more efficient expression of the fluorophore. Additionally, in areas that contain both terminals and passing fibers, interpreting the fluorescent signal is challenging. Together, these factors limit a meaningful quantitative comparison and make interpretation difficult. In this context, we focused on a conservative qualitative presentation to demonstrate two central points: 1) SuMVGLUT2+::POA neurons are a subset of SuMVGLUT2+ neurons that project to specific areas, excluding dentate gyrus, and 2) they arborize extensively to multiple areas that have been linked to threat responses. We agree that there is much to be learned about how different populations in SuM connect to targets in different regions of the brain, and this question should continue to be examined with different techniques. A meaningful quantitative study comparing projections is technically complex and, we feel, beyond our ability for this study.

      Also, for the reasons above we do not believe that quantification provides exceptional clarity with respect to the putative function of the circuit, glutamate released, or other cotransmitters given known amplification at the post-synaptic side of the circuit.

      With regard to the amygdala, other studies on SuM projections have found efferent projections to amygdala (Ottersen, 1980; Vertes, 1992). In our study we were unable to definitively determine projections from SuMVGLUT2+::POA neurons to amygdala, which if present are not particularly dense. For this reason we were conservative and do not comment on this particular structure.

      I would suggest removing the term goal-directed from the manuscript and just focusing on the active vs. passive distinction.

      We removed the use of goal-directed. Thank you for helping us clarify our terminology.

      The effect observed in Figure 7I is interesting, and I'm wondering if a rebound effect is the most likely explanation for this. Did the authors inhibit the VGAT neurons in this region at any other times and observe a similar rebound? If such a rebound was not observed it would suggest that it is something specific about this task that is producing the behaviour. I would like it if the authors could comment on this.

      We agree that the results showing the change in coping strategy (passive to active) in forced swim after but not during stimulation of SuMVGAT+ neurons are quite interesting (Figure 7I). This experiment activated SuMVGAT+ neurons during a section of the forced swim assay, and mice showed a robust shift to mobility after the stimulation of SuMVGAT+ neurons stopped. We did not carry out inhibition of SuMVGAT+ neurons in this manuscript. As the reviewer suggested, strong inhibition of local SuM neurons, including SuMVGLUT2+::POA neurons, could lead to rebound activity that may shift coping behaviors in confusing ways. We agree this is an interesting idea but do not have data to support the hypothesis further at this time.

      Reviewer 2

      (1) These are very difficult, small brain regions to hit, and it is commendable to take on the circuit under investigation here. However, there is no evidence throughout the manuscript that the authors are reliably hitting the targets and the spread is comparable across experiments, groups, etc., decreasing the significance of the current findings. There are no hit/virus spread maps presented for any data, and the representative images are cropped to avoid showing the brain regions lateral and dorsal to the target regions. In images where you can see the adjacent regions, there appears expression of cell bodies (such as Supp 6B), suggesting a lack of SuM specificity to the injections.

      We agree with the reviewer that the areas studied are small and technically challenging to hit. This was one of the driving motivations for using multiple tools in tandem to restrict the area targeted for stimulation. Approaches included using retrograde AAVs to express ChR2eYFP in SuMVGLUT2+::POA neurons, thereby restricting expression to VGLUT2+ neurons that project to the POA. Targeting was further limited by placement of the optic fiber over cell bodies in SuM. Thus, only neurons that are VGLUT2+, project to the POA, and were close enough to the fiber were activated by photostimulation. Regrettably, we were not able to compile images from mice where the fiber was misplaced, leading to loss of behavioral effects. We would have liked to provide that here to address this comment. Unfortunately, generating heat maps for injections is not possible for anatomic studies that use unlabeled recombinase as part of an intersectional approach. In addition, the injection site of a retroAAV can be difficult to determine accurately because neurons remote to the injection site and their processes are labeled.

      Experiments described in Supplemental Figure 6B on VGAT neurons in SuM were designed and interpreted to support the point that SuMVGLUT2+::POA neurons are a distinct population that does not overlap with GABAergic neurons. For this point it is important that we targeted SuM, but highly confined targeting is not needed to support the central interpretation of the data. We do see labeling in SuM in VGAT-Cre mice, but photostimulation of SuMVGAT+ neurons does not generate the behavioral changes seen with activation of SuMVGLUT2+::POA neurons. As the reviewer points out, SuM is a small target and viral injection is likely to spread beyond the anatomic boundaries to other VGAT+ neurons in the region, which are not the focus here. The activation would be restricted by the spread of light from the fiber over SuM (estimated to be about a 200 µm sphere in all directions). We did not further examine projections or localization of VGAT+ neurons in this study but focused on the differential behavioral effects of SuMVGLUT2+::POA neurons.

      (2) In addition, the whole brain tracing is very valuable, but there is very little quantification of the tracing. As the tracing is the first several figures and supp figure and the basis for the interpretation of the behavior results, it is important to understand things including how robust the POA projection is compared to the collateral regions, etc. Just a rep image for each of the first two figures is insufficient, especially given the above issue raised. The combination of validation of the restricted expression of viruses, rep images, and quantified tracing would add rigor that made the behavioral effects have more significance.

      For example, in Fig 2, how can one be sure that the nature of the difference between the nonspecific anterograde glutamate neuron tracing and the Sum-POA glutamate neuron tracing is real when there is no quantification or validation of the hits and expression, nor any quantification showing the effects replicate across mice? It could be due to many factors, such as the spread up the tract of the injection in the nonspecific experiment resulting in the labeling of additional regions, etc.

      Relatedly, in Supp 4, why isn’t C normalized to DAPI, which they show, or area? Similar for G what is the mcherry coverage/expression, and why isn’t Fos normalized to that?

      Thank you for highlighting the importance and value of the anatomy. Two points based on the anatomic studies are central to our interpretation of the experimental data. First, SuMVGLUT2+::POA neurons are a distinct population within the SuM. We show this by demonstrating they are not GABAergic and that they do not project to dentate gyrus. Projections from SuM to dentate gyrus have been described in multiple studies (Boulland et al., 2009; Haglund et al., 1987; Hashimotodani et al., 2018; Vertes, 1992) and we demonstrate them here for SuMVGLUT2+ cells. Using an intersectional approach in VGLUT2-Flp mice, we show SuMVGLUT2+::POA neurons do not project to dentate gyrus. We show cell bodies of SuMVGLUT2+::POA neurons located in SuM across multiple figures, including cleared brain images. Thus, SuMVGLUT2+::POA neurons are SuM neurons that are not GABAergic and that send projections to a distinct subset of targets, most notably excluding dentate gyrus. Second, SuMVGLUT2+::POA neurons arborize, sending projections to multiple regions. We show this using a combinatorial genetic and viral approach to restrict expression of eYFP to only neurons that are in SuM (based on viral injection), project to the POA (based on retrograde AAV injection in POA), and are VGLUT2+ (VGLUT2-Flp mice). Thus, any eYFP-labeled projection comes from SuMVGLUT2+::POA neurons. We further confirmed projections using retroAAV injection into areas identified using anterograde approaches (Supplemental Figure 2). As discussed above in replies to Reviewer 1, we feel limitations are present that preclude meaningful quantitative analysis. We thus opted for a conservative interpretation as outlined.

      Prior studies have shown efferent projections from SuM to many areas, and projections to dentate gyrus have received substantial attention (Boulland et al., 2009; Haglund, Swanson, and Kohler, 1984; Hashimotodani et al., 2018; Soussi et al., 2010; Vertes, 1992; Pan and McNaughton, 2004). We saw many of the same projections from SuMVGLUT2+ neurons. We found no projections from SuMVGLUT2+::POA neurons to dentate gyrus (Figure 2). Our description of the SuM projection to dentate gyrus is not new, but finding a population of neurons in SuM that does not project to dentate gyrus but does project to other regions in hippocampus is new. This finding cannot be explained by spread of the virus in the tract or non-selective labeling.

      (3) The authors state that they use male and female mice, but they do not describe the n’s for each experiment or address sex as a biological variable in the design here. As there are baseline sex differences in locomotion, stress responses, etc., these could easily factor into behavioral effects observed here.

      Sex-specific effects are possible; however, the studies presented here were not designed or powered to directly examine them. A point about experimental design that helps mitigate against a strong sex-dependent effect is that the paradigm we used often examined baseline (pre-stimulation) behavior, how behavior changed during stimulation, and how behavior returned (or not) to baseline after stimulation. Thus, we test changes in individual behaviors. Although we had limited statistical power, we conducted analyses to examine the effects of sex as a variable in the experiments and found no differences between males and females.

      (4) In a similar vein as the above, the authors appear to use mice of different genotypes (however the exact genotypes and breeding strategy are not described) for their circuit manipulation studies without first validating that baseline behavioral expression, habituation, stress responses are not different. Therefore, it is unclear how to interpret the behavioral effects of circuit manipulation. For example in 7H, what would the VGLUT2-Cre mouse with control virus look like over time? Time is a confound for these behaviors, as mice often habituate to the task, and this varies from genotype to genotype. In Fig 8H, it looks like there may be some baseline differences between genotypes- what is normal food consumption like in these mice compared to each other? Do Cre+ mice just locomote and/or eat less? This issue exists across the figures and is related to issues of statistics, potential genotype differences, and other experimental design issues as described, as well as the question about the possibility of a general locomotor difference (vs only stress-induced). In addition, the authors use a control virus for the control groups in VGAT-Cre manipulation studies but do not explain the reasoning for the difference in approach.

      Thank you for highlighting the need for greater clarity about the breeding strategies used and for these related questions. We address the breeding strategy first and then turn to the additional concerns raised. We have added details to the methods section to address this point. For VGLUT2-Cre mice we use littermate controls from a Cre/WT x WT/WT cross. The VGLUT2-Cre line (RRID:IMSR_JAX:028863) (Vong L, et al. 2011) used here has been used in many other reports. We are not aware of any reports indicating a phenotype associated with the addition of the IRES-Cre to the Slc17a6 locus, and there is no expected impact on expression of VGLUT2. Also, we see in many of the experiments here that the baseline behaviors (Figures 4, 5, and 7) are not different between the Cre+ and Cre- mice. For VGAT-Cre mice we used a different breeding strategy that allowed us to achieve greater control of the composition of litters and more efficient cohorts. A Cre/Cre x WT/WT cross yielded all Cre/WT litters. The AAV injected, ChR2eYFP or eYFP, allowed us to balance the cohort.

      Regarding Figure 7H, which shows time immobile on the second day of a swim test, data from the Cre- mice demonstrate the natural course of progression during the second day of the test. The control mice in the VGAT-Cre cohort (Figure 7I) have a similar trend. The change in behavior during the stimulation period in the Cre+ mice is caused by the activation of SuMVGLUT2+::POA neurons. The behavioral shift largely, but not completely, returns to baseline when the photostimulation stops. We have no reason to believe a VGLUT2-Cre+ mouse injected with control AAV to express eYFP would be different from a WT littermate injected with AAV expressing ChR2eYFP in a Cre-dependent manner.

      Turning to concerns related to Figure 8H, which shows data from fasted mice quantifying time spent interacting with a food pellet immediately after presentation of a chow pellet, we found no significant difference between the control and Cre+ mice. We are unaware of any evidence indicating that the two groups should have a different baseline, since the Cre insertion is not expected to alter gene expression and we are unaware of reports of a phenotype relating to feeding and the presence of the transgene in this mouse line. Even if there were a small baseline shift, this would not explain the large abrupt shift induced by the photostimulation. As noted above, we saw shifts in behavior abruptly induced by the initiation of photostimulation when compared to baseline in multiple experiments. This shift would not be explained by a hypothetical difference in the baseline behaviors of littermates.

      (5) The statistics used throughout are inappropriate. The authors use serial Mann-Whitney U tests without a description of data distributions within and across groups. Further, they do not use any overall F tests even though most of the data are presented with more than two bars on the same graph. Stats should be employed according to how the data are presented together on a graph. For example, stats for pre-stim, stim, and post-stim behavior X between Cre+ and Cre- groups should employ something like a two-way repeated measures ANOVA, with post-hoc comparisons following up on those effects and interactions. There are many instances in which one group changes over time or there could be overall main effects of genotype. Not only is serially using Mann-Whitney tests within the same panel misleading and statistically inaccurate, but it cherry-picks the comparisons to be made to avoid more complex results. It is difficult to comprehend the effects of the manipulations presented without more careful consideration of the appropriate options for statistical analysis.

      We thank the reviewer for pointing this out and suggesting alternative analyses; we agree with the assessment on this topic. Therefore, we have extensively revised the statistical approach to our data using the suggested approach. Reviewer 1 made a similar comment, and we would like to point to our reply to Reviewer 1’s second point regarding what we changed and added in the new statistical analyses. Further, we have added a full table detailing the statistical values for each figure to the paper.

      Conceptual:

      (6) What does the signal look like at the terminals in the POA? Any suggestion from the data that the projection to the POA is important?

      This is an interesting question that we will pursue in future investigations into the roles of the POA. We used the projection to the POA from SuM to identify a subpopulation in SuM, and we were surprised to find the extensive arborization of these neurons to many areas associated with threat responses. We focused on the cell bodies as “hubs” with many “spokes”. Extensive studies are needed to understand the roles of individual projections and their targets. There is also the hypothetical technical challenge of manipulating one projection without activating retrograde propagation of action potentials to the soma. At the current time we have no specific insights into the roles of the isolated projection to POA. Interpretation of experiments activating only one “spoke” of the hub would be challenging. Simple terminal stimulation experiments are challenged by the need to separate POA projections from activation of passing fibers targeting more anterior structures of the accumbens and septum.

      (7) Is this distinguishing active coping behavior without a locomotor phenotype? For example, Fig. 5I and other figure panels show a distance effect of stimulation (but see issues raised about the genotype of comparison groups). In addition, locomotor behavior is not included for many behaviors, so it is hard to completely buy the interpretation presented.

      We agree with the reviewer and thank them for highlighting this fundamental challenge in studies examining active coping behaviors in rodents, which require movement. Additionally, actively responding to threatening stressors would include increased locomotor activity. Separation of movement alone from active coping can be challenging. Because of these concerns we undertook experiments using diverse behavioral paradigms to examine the elicited behaviors and the recruitment of SuMVGLUT2+::POA neurons to stressors. We conducted experiments to directly examine behaviors evoked by photoactivation of SuMVGLUT2+::POA neurons. In these experiments we observed a diversity of behaviors including increased locomotion and jumping but also treading/digging (Figure 4). These are behaviors elicited in mice by threatening and noxious stimuli. An increase in only running or only jumping could signify a specific locomotor effect, but this is not what was observed. Based on these behaviors, we expected to find evidence of increased movement in open field (Figure 5G-I) and light dark choice (Figure 5J-L) assays. For many of the assays, reporting distance traveled is not practical. An important set of experiments that argues against a generic increase in locomotion is the operant behavior experiments, which require the animal to engage in a learned behavior while receiving photostimulation of SuMVGLUT2+::POA neurons (Figure 6). This is particularly true for testing using a progressive ratio, when the time of ongoing photostimulation is longer, yet animals actively and selectively engage the active port (Figure 6G-H). Further, we saw a shift in behavioral strategy induced by photoactivation in the forced swim test (Figure 7H). Thus, activation of SuMVGLUT2+::POA neurons elicited a range of behaviors that included swimming, jumping, treading, and learned responses, not just increased movement.
Together these data strongly argue that SuMVGLUT2+::POA neurons do not only promote increased locomotor behavior. We interpret these data together with the data from fiber photometry studies to show SuMVGLUT2+::POA neurons are recruited during acute stressors, contribute to the aversive affective component of stress, and promote active behaviors without constraining the behavioral pattern.

      Regarding genotype, we address this in comments above as well, but we believe that clarifying the use of littermates, the extensive use of the VGLUT2-Cre line by multiple groups, and an experimental design allowing for comparison of baseline, stimulation-evoked, and post-stimulation behaviors within and across genotypes mitigate possible concerns relating to the genotype.

      (8) What is the role of GABA neurons in the SuM and how does this relate to their function and interaction with glutamate neurons? In Supp 8, GABA neuron activation also modulates locomotion and in Fig 7 there is an effect on immobility, so this seems pretty important for the overall interpretation and should probably be mentioned in the abstract.

      Thank you for noting these interesting findings. We added text to the abstract to highlight these findings. Possible roles of GABAergic neurons in SuM extend beyond the scope of the current study, particularly since SuM neurons have been shown to release both GABA and glutamate (Li Y, Bao H, Luo Y, et al. 2020, Root DH, Zhang S, Barker DJ et al. 2018). GABAergic neurons regulate dentate gyrus (Ajibola MI, Wu JW, Abdulmajeed WI, Lien CC 2021), REM sleep (Billwiller F, Renouard L, Clement O, Fort P, Luppi PH 2017), and novelty processing (Chen S, He L, Huang AJY, Boehringer R et al. 2020). The population of exclusively GABAergic vs dual neurotransmitter neurons in SuM requires further dissection to be understood. How they may relate to SuMVGLUT2+::POA neurons requires further investigation.

      Questions about figure presentation:

      (9) In Fig 3, why are heat maps shown as a single animal for the first couple and a group average for the others?

      Thank you for highlighting this point for further clarification. We modified the labels in the figure to help make clear which panels are from one animal across multiple trials and which are from multiple animals. In the ambush assay each animal had only one trial, to avoid habituation to the mock predator. Accordingly, we do not have multiple trials for each animal in this test. In contrast, the dunk assay (10 trials/animal) and the shock assay (5 trials/animal) had multiple trials for each animal. When there are multiple trials per animal, we present data from a representative animal as well as the aggregate data.

      Why is the temporal resolution for J and K different even though the time scale shown is the same?

      Thank you for noticing this error carried forward from a prior draft of the figure so we could correct it. We replaced the image in 3J with a more correctly scaled heatmap.

      What is the evidence that these signal changes are not due to movement per se?

      Thank you for the question. There are two points of evidence. First, all the 465 nm excitation (Ca2+ dependent) data were collected in an interleaved fashion with 415 nm (isosbestic) excitation data. The isosbestic signal is derived from GCaMP emission but is independent of Ca2+ binding (Martianova E, Aronson S, Proulx CD. 2019). This approach, time-division multiplexing, can correct the calcium-dependent signal for changes that are most often due to mechanical change. The second piece of evidence is experimental. Using multiple cohorts of mice, we examined whether the change in Ca2+ signal was correlated with movement. We used the threshold of velocity of movement seen following the ambush. We found no correlation between high velocity movements and Ca2+ signal (Figure 3K), including in cross-correlational analysis (Supplemental Figure 5). Based on these points together, we conclude that the change in the Ca2+ signal in SuMVGLUT2+::POA neurons is not due to movement-induced mechanical changes, and we find no correlation to movement unless a stressor is present, i.e., mock predator ambush or forced swim. Further, the stressors evoke very different locomotor responses: fleeing, jumping, or swimming.
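
      The isosbestic correction described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes the common approach of linearly fitting the 415 nm trace to the 465 nm trace and expressing the residual as dF/F. The function names and the synthetic traces are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def isosbestic_dff(sig_465, sig_415):
    """Hypothetical helper: scale the 415 nm (Ca2+-independent) trace to the
    465 nm (Ca2+-dependent) trace, then return (signal - fitted) / fitted."""
    slope, intercept = linear_fit(sig_415, sig_465)
    fitted = [slope * v + intercept for v in sig_415]
    return [(s - f) / f for s, f in zip(sig_465, fitted)]

# Synthetic traces (assumed values): a 1 Hz mechanical artifact shared by both
# channels, plus a Ca2+ transient present only in the 465 nm channel.
t = [i / 10 for i in range(200)]                      # 20 s sampled at 10 Hz
artifact = [0.2 * math.sin(math.pi * ti) for ti in t]
transient = [1.0 if 8.0 <= ti < 10.0 else 0.0 for ti in t]
sig_415 = [5.0 + a for a in artifact]
sig_465 = [10.0 + 2.0 * a + c for a, c in zip(artifact, transient)]

dff = isosbestic_dff(sig_465, sig_415)
inside = [d for d, c in zip(dff, transient) if c > 0]
outside = [d for d, c in zip(dff, transient) if c == 0]
```

      In this toy case the shared artifact largely cancels, so `outside` is nearly flat while `inside` retains the transient. Real recordings may also need bleaching correction and filtering, which are omitted here.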

      (10) In Fig 4, the authors carefully code various behaviors in mice. While they pick a few and show them as bars, they do not show the distribution of behaviors in Cre- vs Cre+ mice before manipulation (to show they have similar behaviors) or how these behaviors shift categories in each group with stimulation. Which behaviors in each group are shifting to others across the stim and post-stim periods compared to pre-stim?

      This is an important point. We selected behaviors to highlight in Figure 4C-E because these behaviors are exhibited in response to stress (De Boer & Koolhaas, 2003; van Erp et al., 1994). For the highlighted behaviors (jumping, treading/digging, and grooming), we show baseline (pre-photostimulation), stimulation, and post-stimulation values for Cre+ and Cre- mice, with the values for each animal plotted. We show all nine behaviors as a heat map in Figure 4B. The panels show changes that may occur as a function of time and changes induced by photostimulation.

      The heatmaps demonstrate that photostimulation of SUMVGLUT2+::POA neurons causes a suppression of walking, grooming, and immobile behaviors with an increase in jumping, digging/treading, and rapid locomotion. After stimulation stops, there is an increase in grooming and time immobile. The control mice show a range of behaviors with no shifts noted with the onset or termination of photostimulation.

      Of note, issues of statistics, genotype, and SABV are important here. For example, the hint that treading/digging may have a slightly different pre-stim basal expression, it seems important to first evaluate strain and sex differences before interpreting these data.

      We examined the effects of sex as a biological variable in the experiments reported in the manuscript and found no differences between males and females in any of the experiments where we had enough animals of each sex (minimum of 5 mice) for meaningful comparisons. We did this by comparing means and SEM of males and females within each group (e.g., Cre+ males vs Cre+ females, Cre- males vs Cre- females) and then conducted a t-test to see if there was a difference. For figures that show time as a variable (e.g., Figure 6C-E), we compared males and females with time x sex as main factors (including multiple comparisons if needed). We found no significant main effects or interactions between males and females. Because of this, and to maximize statistical power, we decided to keep males and females together in all the analyses presented in the manuscript. It is worth noting also that the core of the experimental design employed is a change in behavior caused by photostimulation. The mice are also the same strain, with the only difference being the modification to add an IRES and sequence for Cre behind the coding sequence of the Slc17a6 (VGLUT2) gene.

      (11) Why do the authors use 10 Hz stimulation primarily? is this a physiologically relevant stim frequency? They show that they get effects with 1 Hz, which can be quite different in terms of plasticity compared to 10 Hz.

      Thank you for raising this important question. Because tests like open field and forced swim are subject to habituation and cannot be run multiple times per animal, a single test frequency was needed for consistency across multiple experiments. The frequency of 10 Hz was selected because it falls within the range of reported firing rates for SuM neurons (Farrel et al., 2021; Pedersen et al., 2017) and based on the robust but submaximal effects seen in the real-time place preference assays. Identification of the native firing rates during stress response would be ideal, but gathering these data for the identified population remains a daunting task.

      (12) In Fig 5A-F, it is unclear whether locomotion differences are playing a role. Entrances (which are low for both groups) are shown but distance traveled or velocity are not.

      In B, there is no color in the lower left panel. where are these mice spending their time? How is the entirety of the upper left panel brighter than the lower left? If the heat map is based on time distribution during the session, there should be more color in between blue and red in the lower left when you start to lose the red hot spots in the upper left, for example. That is, the mice have to be somewhere in apparatus. If the heat map is based on distance, it would seem the Cre- mice move less during the stim.

      We appreciate the opportunity to address this question, and the attention to detail the reviewer applied to our paper. In the real-time place preference (RTPP) test, stimulation was provided only while the animal was on the stimulation side. Mice quickly leave the stimulation side of the arena, as seen in the supplemental video, particularly at the higher frequencies. Thus, the time during which stimulation is applied is quite low. During trials using higher frequency stimulation, the mice often retreat to a corner after entering the stimulation side. Changes in locomotor activity alone could drive changes in the number of entrances, but we did not find this. In regard to the heat map, the color scale is dynamically set for each of the paired examples, which are pulled from a single trial. To maximize the visibility between the paired examples, the color scale does not transfer between the trials. As a result, in the example for 10 Hz the mouse spent a larger amount of time in the area corresponding to the lower right corner of the image, and the maximum value of the color scale is assigned to that region. As seen in the supplemental video, mice often retreated to the corner of the non-stimulation side after entering the stimulation side. The control animal did not spend a concentrated amount of time in any one region; thus there is a lack of warmer colors. In contrast, in the baseline condition both Cre+ and Cre- mice spent time in areas distributed on both sides of the arena, as expected. As a result, the maximum value in the heat map is lower and more areas are coded in warmer colors, allowing for easier visual comparison within the pair. Using the scale for the 10 Hz pair across all pairs leads to mostly dark images. We considered ways to optimize visualization across and within pairs and focused on the within-pair comparison for visualization.
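
      The effect of setting the color scale separately for each pair can be sketched as follows. This is an illustrative sketch with invented occupancy values, not the authors' plotting code; `normalize_pair` and the 2x2 grids are hypothetical.

```python
def grid_max(grid):
    """Largest value in a 2D occupancy grid (seconds per spatial bin)."""
    return max(max(row) for row in grid)

def normalize(grid, vmax):
    """Map occupancy to [0, 1], where 1 corresponds to the hottest color."""
    return [[v / vmax for v in row] for row in grid]

def normalize_pair(cre_pos, cre_neg):
    """Hypothetical helper: one shared color scale within a Cre+/Cre- pair."""
    vmax = max(grid_max(cre_pos), grid_max(cre_neg))
    return normalize(cre_pos, vmax), normalize(cre_neg, vmax)

# Assumed occupancy (s) in a 2x2 arena; values are invented for illustration.
baseline_pos = [[30, 25], [20, 25]]   # time spread across the arena
baseline_neg = [[25, 25], [25, 25]]
stim10_pos = [[2, 3], [5, 90]]        # time concentrated in the lower right
stim10_neg = [[25, 30], [20, 25]]

# Per-pair scaling: each pair is normalized to its own maximum.
base_p, base_n = normalize_pair(baseline_pos, baseline_neg)
stim_p, stim_n = normalize_pair(stim10_pos, stim10_neg)

# Reusing the 10 Hz pair's maximum for the baseline pair darkens it.
shared_max = max(grid_max(stim10_pos), grid_max(stim10_neg))
base_p_shared = normalize(baseline_pos, shared_max)
```

      With per-pair scaling the baseline pair spans the full color range, while the control animal in the 10 Hz pair stays in the cooler part of the scale because the pair's maximum is set by the corner where the Cre+ mouse concentrated its time.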

      (13) By starting with 1 Hz, are the experimenters inducing LTD in the circuit? What would happen if you stop stimming after the first epoch? Would the behavioral effect continue? What does the heat map for the 1 Hz stim look like?

      Relatedly, it is a lot of consistent stimulation over time and you likely would get glutamate depletion without a break in the stim for that long.

      Thank you for the opportunity to add clarity around this point regarding the trials in RTPP testing. Importantly, the trials were not carried out in order of increasing frequency of stimulation, as plotted. Rather, the order of trials was, to the extent possible with the number of mice, counterbalanced across the five conditions. Thus, possible contribution of effects of one trial on the next were minimized by altering the order of the trials.

      We have added a heat map for the 1 Hz condition to figure 5B.

      For experiments on RTPP the average stimulation time at 10 Hz was less than 10 seconds per event. As a result, the data are unlikely to be affected by possible depletion of synaptic glutamate. For experiments using sustained stimulation (open field or light dark choice assays), where 10 Hz stimulation was applied for the entire trial, we have no clear data to address whether this might be a factor.

      (14) In Fig 6, the authors show that the Cre- mice just don't do the task, so it is unclear what the utility of the rest of the figure is (such as the PR part). Relatedly, the pause is dependent on the activation, so isn't C just the same as D? In G and H, why is a subset of Cre+ mice shown?

      Why not all mice, including Cre- mice?

      Thank you for the opportunity to improve the clarity of this section. A central aspect of the experiments in Figure 6 is the aversiveness of SuMVGLUT2+::POA neuron photostimulation, as shown in Figure 5B-F. The aversion to photostimulation drives task performance in the negative reinforcer paradigm. The mice perform a task (active port activation) to terminate the negative reinforcer (photostimulation of SuMVGLUT2+::POA neurons). Accordingly, control mice are not expected to perform the task because SuMVGLUT2+::POA neurons are not activated and, thus, the mice are not motivated to perform the task.

      A central point we aim to convey in this figure is that while SuMVGLUT2+::POA neurons are being stimulated, mice perform the operant task. They selectively activated the active port (Supplemental Figure 7). As expected, control mice activate the active port at a low level in the process of exploring the arena. This diminishes on subsequent trials as mice habituate to the arena (Figure 6D). The data in Figures 6 C and D are related but can be divergent. Each pause in stimulation requires a port activation on an FR1 test, but the number of port activations can exceed the number of pauses, which are 10 seconds long, if the animal continues to activate the port. Comparing data in Figures 6 C and D reveals that mice generally activated the port two to three times for each pause earned, with a trend towards greater efficiency on day 4, with more rewards and fewer activations.
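
      The relationship between port activations and pauses earned can be sketched as follows. This is a minimal illustration, not the acquisition code used in the study: the time stamps are hypothetical, and the rule that an activation made during an ongoing pause neither earns nor extends a pause is an assumption made here for the sketch.

```python
PAUSE_S = 10.0  # pause length in seconds, as stated in the response


def count_pauses_fr1(activation_times, pause_s=PAUSE_S):
    """Count pauses earned on an FR1 schedule.

    Assumption: an activation made while a pause is already running is
    recorded but does not earn or extend a pause.
    """
    pauses = 0
    pause_ends_at = float("-inf")
    for t in sorted(activation_times):
        if t >= pause_ends_at:
            pauses += 1
            pause_ends_at = t + pause_s
    return pauses


# Hypothetical active-port time stamps in seconds (not data from the study).
activations = [5.0, 6.2, 9.8, 30.0, 31.5, 58.0, 59.0, 61.0]
pauses = count_pauses_fr1(activations)
activations_per_pause = len(activations) / pauses

print(pauses, round(activations_per_pause, 2))
```

      In this hypothetical series, eight activations earn three pauses, which is how activation counts can exceed pause counts on an FR1 schedule.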

      The purpose of the progressive ratio test is to examine if photostimulation of SuMVGLUT2+::POA neurons continues to drive behavior as the effort required to terminate the negative stimulus increases. As seen in Figures 6 G and H, the stimulation of SuMVGLUT2+::POA neurons remains highly motivating. In the 20-minute trial we did not find a break point even as the number of port activations required to pause the stimulation exceeded 50. We do not show the Cre- mice in Figure 6G and H because they did not perform the task, as seen in Figure 6F. For technical reasons in early trials, we have fully time-stamped data for rewards and port activations from only a subset of the Cre+ mice. Of note, this subset contains both the highest and lowest performing mice from the entire data set.

      Taken together, we interpret the results of the operant behavioral testing as demonstrating that SuMVGLUT2+::POA neuron activation is aversive, can drive performance of an operant task (as opposed to fixed escape behaviors), and is highly motivating.

      (15) In Fig 7, what does the GCaMP signal look like if aligned to the onset of immobility? It looks like since the hindpaw swimming is short and seems to precede immobility, and the increase in the signal is ramping up at the onset of hindpaw swimming, it may be that the calcium signal is aligned with the onset of immobility.

      What does it look like for swimming onset?

      In I, what is the temporal resolution for the decrease in immobility? Does it start prior to the termination of the stim, or does it require some elapsed time after the termination, etc?

      Thank you for the opportunity to address these points and improve the clarity of our interpretation of the data. Regarding aligning the Ca2+ signal from fiber photometry recordings to swimming onset and offset, it is important to note that the swimming bouts are not the same length. As a result, in the time prior to alignment to the offset of behaviors, animals will have been swimming for different lengths of time. In Figure 7 C, we use the behavioral heat map to convey the behavioral average. Below we show the Ca2+-dependent signal aligned at the offset of hindpaw swim for an individual mouse (A) and for the total cohort (B). This alignment shows that the Ca2+-dependent signal declines corresponding to the termination of hindpaw swimming. Because these bouts last less than the total window shown, the data are largely included in Figure 7 C and D, which are aligned to onset. Due to the nuance of the difference in the alignment and the partial redundancy, we elected to include the requested alignment to swimming offset in the reply rather than in a primary figure.

      Author response image 1.

      Turning to the question regarding swimming onset, the animals started swimming immediately when placed in the water and maintained swimming and climbing behaviors until shifting behaviors as illustrated in Figure 7A and B. During this time the Ca2+-dependent signal was elevated but there is only one trial per animal. This question can perhaps be better addressed in the dunk assay presented in Figure 3C, F and G and Supplemental Figure 4 H and I. Here swimming started with each dunk and the Ca2+ signal increased.

      Regarding the question about Figure 7I, we scored entire periods (2 min) in aggregate. We noted in videos of the behavior test that there was an abrupt decrease in immobility tightly corresponding to the end of stimulation. In a few animals this shift occurred approximately 15-20 s before the end of stimulation. This may relate to the depletion of neurotransmitter as suggested by the reviewer.

      Reviewer 3

      Major points

      (1) Results in Figure 1 suggested that SuM-Vglu2::POA projected not only POA but also to the diverse brain regions. We can think of two models which account for this. One is that homogeneous populations of neurons in SuM-Vglu2::POA have collaterals and innervated all the efferent targets shown in Figure 1. Another is to think of distinct subpopulations of neurons projecting subsets of efferent targets shown in Figure 1 as well as POA. It is suggested to address this by combining approaches taken in experiments for Figure 1 and Supplemental Figure 2.

      Thank you for raising this interesting point. We have attempted combining retroAAV injections into multiple areas that receive projections from SuMVGLUT2+::POA neurons. However, we have found the results unsatisfactory for separating the two models proposed. Using eYFP- and tdTomato-expressing retroAAVs, we saw some overlapping expression in SuM. We are not able to conclude if this indicates separate populations or partial labeling of a homogeneous population. A third option seems possible as well. There could be a mix of neurons projecting to different combinations of downstream targets. This seems particularly difficult to address using fluorophores. We are preparing to apply additional methodologies to this question, but it extends beyond the scope of this manuscript.

      (2) Since the authors drew a hypothetical model in which the diverse brain regions mediate the effect of SuM-Vglu2::POA activation in behavioral alterations at least in part, examination of the concurrent activation of those brain regions upon photoactivation of SuM-Vglu2::POA is suggested. This would help the readers to understand which neural circuits act upon the induction of active coping behavior under stress.

      Thank you for raising this important point. We agree that activating glutamatergic neurons should lead to activation of postsynaptic neurons in the target regions. Delineating this in vivo is less straightforward. Doing so requires much greater knowledge of the postsynaptic partners of SuMVGLUT2+::POA neurons. There are a number of issues that would need to be accounted for. Undertaking two-color photostimulation plus fiber photometry is possible but not a technical triviality. Further, it is possible that we would measure Ca2+ signals in neurons that have no relevant input or that local circuits in a region may shape the signal. We would also lack the temporal resolution to distinguish monosynaptic from polysynaptic connections. Thus, we would struggle to know if the change in signal was due to the excitatory input from SuM or from a second region. At present, we remain unclear on how to pursue this question experimentally in a manner that is likely to generate clearly interpretable results.

      (3) In Figure 4, "active coping behaviors" must be called "behaviors relevant to the active behaviors" or "active coping-like behaviors", since those behaviors were in the absence of stressors to cope with.

      Thank you for the suggestion on how to clarify our terminology. We have adopted the active coping-like term.

      (4) For the Dunk test, it is suggested to describe the results and methods more in detail, since the readers would be new to it. In particular, the mice could change their behavior between dunks under this test, although they still showed immobility across trials as in Supplemental Figure 4I. Since neural activity during the test was summarized across trials as in Figure 3, it is critical to examine whether the behavior changes according to time.

      Thank you for identifying this opportunity to improve our manuscript. We have expanded and added a detailed description of the dunk test in the methods section.

      As for Supplemental Figure 4I, we apologize for the confusion because the purpose of this figure is to show that mice remained mobile for the entire 30-second dunk trial. This did not appreciably change over the 10 trials. We have revised this figure to plot both immobile and mobile time to achieve greater clarity on this point.

      Minor points

      Typos

      In Figure 1, please add a serotype of AAVs to make it compatible with other figures and their legends.

      In the main text and Figure 2K, the authors used MHb/LHb and mHb/lHb in a mixed fashion. Please make them unified.

      In the figure legend of Figure 6, change "SuMVGLUT2+::POA neurons drive" to "SuMVGLUT2+::POA neurons " in the title.

      In line 86, please change "Retro-AAV2-Nuc-flox(mCherry)-eGFP" to "AAV5-Nuc-flox(mCherry)eGFP".

      In line 80, please change "Positive controls" to "As positive controls, ".

      Thank you for taking the time and making the effort to identify and call these out. We have corrected them.

    1. Author Response

      The following is the authors’ response to the previous reviews

      The revised manuscript is much improved - many unclear points are now better explained. However, in our opinion, some issues could still be significantly improved.

      1. Statistics: none of us are experts in statistics but several things remain questionable in our opinion and if it were our study, we would consult with an expert:

      a) while we understand the authors note about N-chasing and p-hacking, we wonder how the number of N's was premeditated before obtaining the results. Why in 4M an N of 3 is sufficient while in 3E the N is >20 (and not mentioned). At the very least, we think it would be wise to be cautious when stating something as not-significant when it is clear (as in 4M) that the likelihood of it actually being statistically significant is quite large.

      b) In most analyses, the data is not only normalized by actin or some other measure but also to the first (i.e left side on the graph) condition, resulting in identical data points that equal '1' (in Figure 4 alone - C; I; K; M; and O) - while this might be scientifically sound, it should be mentioned (the specific normalization) and also note that this technique shadows any real variance that exists in the original data in this condition. consider exploring techniques to overcome this issue.

      c) In 3C, - if we understand the experiment, you want to convince us that the DIFFERENCE between eB2-FC compared to FC is larger in the control compared to the experiment. We are not absolutely sure that the statistical tools employed here are sufficient - which is why we would consult an expert.

      A) We are aware that many studies do not consistently quantify such experiments. For example, there are essentially no published examples of the signalling timelines of EphB2 receptors as in Fig. 5. By striving to quantify such biochemical effects, an unquantified experiment stands out, and so perhaps we were too strict by trying to quantify as many experiments as possible, resulting in low n’s for some of them. We acknowledge that additional experiments on EPHB1 protein stability may reach significance. We have adjusted our text on lines 332-335 to point to this interesting trend, and slightly changed the conclusion to this section. Similarly, we commented on similar trends when describing Figs. 1E and 4G on lines 901 and 952.

      B) For the Western blot band intensity normalisation, we believe that our method is scientifically sound. Normally, when the replicate samples are loaded on one gel and blotted on the same membrane, the experimenter only needs to normalise the target band intensity to its cognate loading control band intensity for quantitation. However, we usually have a large number of samples from multiple experiments, carried out on different dates. For example, in Fig. 4B,C there are 7 biological replicates collected from 7 experiments and in Fig. 4D there are 10 protein samples. It is not possible for us to run all samples on the same gel. In addition, due to the combined effects of variance in transfer efficiency, the potency of antibodies, detection efficiency and the developing time for each blot, it is practically impossible to generate similar band intensity for each batch. Thus, we use normalisation of test bands to the loading control for individual experiments, and this analysis method is widely accepted by reputable journals with a focus on biochemical experiments (for example: PMID 37695914: Fig. 3 A,B,C; PMID 36282215: Fig. 3 B,C,D,E; PMID 33843588: Fig. 3 C,D,E,F,G,H). Since the value of the first sample on the plot is 1, which is a hypothetical value and does not meet the parametric test requirement, we performed one-sample t-test for statistics when other samples are compared with the first sample (PMID 35243233 Fig. 6 A,B,C,D; https://www.graphpad.com/quickcalcs/oneSampleT1/, “A one sample t-test compares the mean with a hypothetical value. In most cases, the hypothetical value comes from theory. For example, if you express your data as 'percent of control', you can test whether the average differs significantly from 100.”). Thus, we believe that our normalisation and statistical methods are both correct with a large number of precedents.
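
      The one-sample t-test described above, in which normalised values are compared against the hypothetical value of 1 assigned to the first condition, can be sketched as follows. The fold-change values and the helper `one_sample_t` are hypothetical and only illustrate the calculation; the p value would then be read from a t distribution with n - 1 degrees of freedom (for example, `scipy.stats.ttest_1samp` computes both the statistic and the p value).

```python
import math
import statistics


def one_sample_t(values, mu0=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical band intensities, each normalised to its loading control and
# then to the first condition of the same experiment (so the reference is 1).
# These values are illustrative only and are not data from the study.
treated = [0.62, 0.71, 0.55, 0.80, 0.66]
t_stat, dof = one_sample_t(treated, mu0=1.0)

print(round(t_stat, 2), dof)
```

      Because the reference condition is fixed at 1 in every experiment, it carries no variance of its own, which is why the comparison is made against a constant rather than between two samples.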

      C) This comment refers to the cell collapse experiment shown in Fig. 3C for which the data are plotted in Fig. 3D. We stand by the statistical method used. There are two groups of cells (CTRLCRISPR and MYCBP2 CRISPR) and two treatments for each cell group (Fc control and eB2), thus we should use two-way ANOVA. Since we compared the cell retraction effects of Fc and eB2 on the two groups of cells, Sidak post hoc comparison is the right method to avoid errors introduced by multiple comparisons. Here is an example of an eLife article that used the same statistical method for similar comparisons: PMID 37830910, Fig. 1 H,I. To make the comparison easier, we grouped the experiments by cell type (CTRLCRISPR and MYCBP2 CRISPR) as opposed to by treatment. Below, the old version is on the right, and the new version is on the left. The conclusion is that eB2 induces less cell collapse in cells depleted of MYCBP2, when compared to the control cells. However, eB2 is still able to collapse cells lacking MYCBP2.

      Author response image 1.
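
      For readers unfamiliar with the Sidak adjustment mentioned above, a minimal sketch of the single-step formula follows. The raw p values are hypothetical, and statistical packages may apply the correction with additional details (for example, to comparisons derived from the ANOVA error term), so this shows only the core idea of adjusting each p value for the number of comparisons in the family.

```python
def sidak_adjust(p_values):
    """Single-step Sidak-adjusted p values for a family of m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical unadjusted p values for two planned comparisons
# (Fc vs eB2 within each cell group); not values from the study.
raw = [0.001, 0.020]
adjusted = sidak_adjust(raw)

print([round(p, 6) for p in adjusted])
```

      Each adjusted p value is at least as large as its raw counterpart, which is how the correction limits the family-wise error rate.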

      Revisiting these data, we noticed an error introduced when CC compiled the data used to generate Fig. 3D. The data were acquired from nine biological replicates per condition. CC used a mix of two methods for cell collapse rate calculation: the first method involved the sum of collapsed cells and all cells from multiple regions of one coverslip (biological replicate). The second method involved computing a collapse rate in each region which then was used to calculate the average collapse rate for the entire coverslip (technical replicate). Given the small cell numbers due to sparse culture conditions, we believe that the first method is a more conservative approach. We hence re-plotted all replicate data using the first method. This resulted in slightly different % collapse and p values. These were changed accordingly in the text and plot and do not affect the conclusion of this experiment.
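
      The difference between the two collapse-rate calculations can be made concrete with a minimal sketch. The region counts below are hypothetical, as are the function names; the point is only that pooling counts across regions and averaging per-region rates give different answers when regions contain few cells.

```python
# Hypothetical (collapsed, total) cell counts for three regions of one coverslip.
# Illustrative numbers only, not data from the study.
regions = [(1, 2), (3, 10), (2, 4)]


def pooled_rate(regions):
    """First method: sum collapsed cells and all cells across regions, then divide."""
    collapsed = sum(c for c, _ in regions)
    total = sum(n for _, n in regions)
    return collapsed / total


def mean_of_region_rates(regions):
    """Second method: compute a rate per region, then average the rates."""
    return sum(c / n for c, n in regions) / len(regions)


print(pooled_rate(regions))
print(round(mean_of_region_rates(regions), 4))
```

      In the second method a region with two cells carries the same weight as a region with ten, so sparse regions can shift the coverslip average; pooling weights every cell equally.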

      2) thanks for the clarification that the interaction between the extracellular domain of EPHB2 and MYCBP2 might not occur directly - however, unless we missed this it was not clearly stated in the text. It is an important point and also a cool direction for the future - to find the elusive co-receptor that actually helps EPHB2 and MYCBP2 form a complex.

      We now also refer to this in the results section on line 215.

      “Since EPHB2 is a transmembrane protein and MYCBP2 is localised in the cytosol, these experiments suggest that the interaction between the extracellular domain of EPHB2 and MYCBP2 might be indirect and mediated by other unknown transmembrane proteins.”

      3) The Hela CRISPR cell line is better explained in the response letter but still not sufficiently explained in the text for a non-expert reader. If the authors want any reader to comprehend this, we would strongly recommend adding a scheme.

      We now include a schematic outlining the CRISPR cell generation as Fig. 3A and its description on line 926.

      Author response image 2.

      4) To clarify some of our previous (and persisting) concerns about Figure 3D/E - it is true that a reduction in 25% of cell size is dramatic. But (if we understand correctly) your claim is that a reduction in 22% (this is a guess, as the actual numbers are not supplied) is significantly less than 25%. Even if it is, statistically speaking, significant, what is the physiological relevance of this very slight effect? In this experiment, the N was quite large, and we wonder if the images in D are representative - it would be nice to label the data points in E to highlight which images you used.

      We now mention the average cell area contraction measurements in the legend to Fig. 3F on line 935. We also tracked down the individual cells shown in Fig. 3E and they are now labelled as data points in blue in Fig. 3F. HeLa cell collapse is a simplified model of EPHB2 function and we do not know whether the difference between the behaviour of CTRLCRISPR and MYCBP2 CRISPR cells is physiologically significant and thus we prefer not to speculate on this.

      5) Figure 3F and other stripe assays - In the end, it is your choice how to quantify. We believe that quantifying area of overlap is a more informative and objective measurement that might actually benefit your analyses. That said, if you do keep the quantification as it is now, you have to define the threshold of what you mean by "cell/s (or an axon in 7A, where it is even more complicated as are you alluding to primary, secondary, or even smaller branches) are RESIDING within the stripe". Is 1% overlap sufficient or do you need 10 or 50% overlap?

      We now added this statement to the methods on line 745: “A cell was considered to be on an ephrin-B2 stripe when more than 50% of its nucleus was located on that stripe”. For chick explant stripe assay, when measuring the length of an axon on a stripe, we only measured the main axons originated from the explants.

      For explant/stripe experiments in Fig. 7 AB, we now use the term “GFP-expressing neurite” rather than “branch”. This was already present in the results of the previous version, but the methods and legend needed to be brought up to date (lines 786 and 1008). We think that “branch” was a confusing term that was supposed to mean the same thing as “neurite” but came across as some indication of branching. We do not know whether the GFP+ neurites were primary or secondary extensions of explants, or in fact, whether some of them contained more than one axon. We also adjusted the method to reflect the fact that some stripes were used in conjunction with a single explant and added a reference to a previous study extensively using this method (Poliak et al., 2015) on line 778.

      6) We still don't get the link to the lysosomal degradation. Your data suggests that in your cells EPHB2 is primarily degraded by the lysosomal pathway and not proteasome. Any statement about MYCBP2 is not strongly supported by the data, in our opinion - Unless you develop some statistical measurement that shows that the effect of BafA1 is statistically different in MYCBP2 cells than in control cells. Currently, this is not the case and the link is therefore not warranted in our opinion.

      We generated a new version of Fig. 4K with average increase in EPHB2 levels in the presence of BafA1 and CoQ, compared to DMSO treated controls (see below). BafA1 and CoQ restored EPHB2 protein levels by 19% and 14% respectively in CtrlCRISPR cells, while the inhibitors restored EPHB2 protein levels by 40% and 35% respectively in MYCBP2 CRISPR cells.

      Author response image 3.

      For each of the 4 replicates, the increase in EPHB2 levels by BafA1 compared to DMSO is as follows:

      Author response table 1.

      These values are not significantly different between CtrlCRISPR cells versus MYCBP2 CRISPR cells (p = 0.08, Student’s t test). The same holds for the CoQ experiment. We now temper our conclusion for this experiment: Although the difference in percentage increase between CTRLCRISPR cells and MYCBP2CRISPR cells is not significant, this trend raises the possibility that the loss of MYCBP2 promotes EPHB2 receptor degradation through the lysosomal pathway (line 319). We also adjusted the section title (line 306).

      7) While the C. elegans part is now MUCH better explained - we are not sure we understand the additional insight. The fact that vab-1 and glo4 double mutants are additive as are vab1 and fsn1, suggest they act in parallel (if the mutants are NULL, and not if they are hypomorphs, if one wants to be accurate) - how this relates to your story is unclear. The vab1/rpm1 double mutant is still uninformative and incomplete. rpm1 phenotype is so severe that nothing would make it more severe. We read the Jin paper that the authors directed to - nothing makes the rpm1 phenotype more severe. Yes, some DOWNSTREAM elements make the rpm1 phenotype LESS severe - this is not something you were testing, to the best of our knowledge. Rather, you wanted to see if rpm1 mutant resulted in stabilization of vab1 and thus suppression of vab1 phenotype - we are just not sure the system is amenable to test (actually reject) your hypothesis that Vab1 is degraded by rpm1. Also, assuming we are talking about NULLs, the fact that the rpm1 phenotype is WAY stronger than the vab1 mutant, suggests that rpm1 functions via multiple routes, adding even more complexity to the system. Given these results, despite the much improved clarity, we are still not sure that the worm data adds new insight, rather than potentially confusing the reader.

      We realise that the genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network are complicated. However, we insist on keeping the data for the sake of its availability for future studies and completeness. We also think it is important for readers and the community to see these data, even if the authors and reviewers are not entirely in agreement about the importance/interpretation of experimental outcomes. It is our hope that the community will examine the results and draw their own conclusions.

      A few points of clarification:

      The C. elegans experiments were designed to test genetically if the vertebrate interactions between EPHB2 and MYCBP2 and its signalling network are conserved. We studied two kinds of interactions: (1) between vab-1 and RPM-1/MYCBP2 downstream proteins (GLO-4 and FSN-1) and (2) between vab-1 and rpm-1. For these studies, we used null alleles for vab-1, glo-4 and fsn-1 which is now noted on lines 440, 453, 475 and 859. Our findings are consistent with the VAB-1 Ephrin receptor functioning in parallel to known RPM-1 binding proteins. This is further supported by new data: vab-1; fsn-1 double mutants showed enhanced incidence of axon overextension defects using a second transgenic background, zdIs5 (Pmec-4::GFP), to visualize axon termination (Fig. 8F).

      This second transgenic background also allowed us to generate new data to address your concerns about phenotypic saturation in rpm-1 mutants. To do this, we used the zdIs5 (Pmec-4::GFP) genetic background, in which axon termination defects are not saturated in rpm-1 mutants (Fig. 8F) because they can be enhanced by other mutants such as cdc-42 and unc-33 (Fig. 7C, D, in Borgen et al. Development 144, 4658–4672 (2017), PMID 29084805). In this new background, we found that vab-1 loss of function fails to enhance the incidence of severe “hook” defects in rpm-1 mutants, which is an indication that the two genes function in the same pathway. Importantly, prior studies in this background also showed that mutants in the RPM-1 signalling network (e.g. fsn-1, glo-4 and ppm-2) do not enhance the incidence of severe “hook” defects as double mutants with rpm-1 compared to rpm-1 single mutants (Fig. 7B, ibid.).

      To reflect these ideas more clearly, we revised the Results section pertaining to C. elegans genetics (starting on line 418) and tempered our discussion (line 517). Basically, this section now says that we studied genetic interactions between vab-1 and the RPM-1/MYCBP2 signalling network. From these experiments we conclude that: (1) The enhancement of overextension defects in vab-1; glo-4 and vab-1; fsn-1 double mutants compared to single mutants indicates that VAB-1/EPHR functions in parallel to known RPM-1 binding proteins to facilitate axon termination, and (2) Since the vab-1; rpm-1 double mutants do not display an increased frequency or severity of overextension defects compared to rpm-1 single mutants, VAB-1/EPHR functions in the same genetic pathway as RPM-1/MYCBP2.

      The new genetic data included in this version were generated by Karla J. Opperman who is now included as a co-author.

      Further corrections:

      Author response image 4.

      Because of the errors associated with quantifications in Fig. 3D (see above), we reviewed other quantification methodologies and noticed another discrepancy that required a correction. In the hippocampal neuron growth cone collapse assay shown in the previous version of Fig. 7 D (left), the growth cones were classified into three groups: 1, fully collapsed; 2, hard to tell, but not fully collapsed; 3, fan-shape cones. Two different quantifications were performed as follows: (1), number of fully collapsed cones divided by the numbers of all growth cones; (2), number of fully collapsed cones divided by [number of fully collapsed cones + fan-shape cones]. CC erroneously used the second method to generate Fig. 7D.
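
      The two growth cone quantifications differ only in their denominator, as the following minimal sketch shows. The counts and function names are hypothetical and are not data from the study.

```python
def collapse_fraction_all(collapsed, ambiguous, fan):
    """Quantification (1): fully collapsed cones / all growth cones."""
    return collapsed / (collapsed + ambiguous + fan)


def collapse_fraction_excluding_ambiguous(collapsed, ambiguous, fan):
    """Quantification (2): fully collapsed / (fully collapsed + fan-shape cones)."""
    return collapsed / (collapsed + fan)


# Hypothetical counts per class (group 1, group 2, group 3); illustrative only.
counts = {"collapsed": 20, "ambiguous": 10, "fan": 20}

rate_all = collapse_fraction_all(**counts)
rate_excl = collapse_fraction_excluding_ambiguous(**counts)

print(rate_all, rate_excl)
```

      Dropping the ambiguous class from the denominator can only raise the reported collapse fraction, so the first quantification is the more conservative of the two whenever ambiguous cones are present.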

      We think that the first method is more appropriate. Furthermore, since n=5 for the Fc and eB1-Fc conditions, but n=3 for the eB2-Fc condition, we decided to omit it. The final plot for figure 7D is the following:

      Author response image 5.

      Our conclusion still stands that exogenous FBD1 WT overexpression impaired the growth cone collapse mediated by EphB.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      Liver cancer shows a higher incidence in males than in females, with incompletely understood causes. This study utilized a mouse model that lacks the bile acid feedback mechanisms (FXR/SHP DKO mice) to study how dysregulation of bile acid homeostasis and high circulating bile acid levels may underlie the gender-dependent prevalence and prognosis of HCC. By transcriptomics analysis comparing male and female mice, unique sets of gene signatures were identified and correlated with HCC outcomes in human patients. The study showed that ovariectomy increased HCC incidence in female FXR/SHP DKO mice that were otherwise resistant to age-dependent HCC development, and that removing bile acids by blocking intestinal bile acid absorption reduced HCC progression in FXR/SHP DKO mice. Based on these findings, the authors suggest that gender-dependent bile acid metabolism may play a role in the male-dominant HCC incidence, and that reducing bile acid level and signaling may be beneficial in HCC treatment.

      Strengths:

      (1) Chronic liver diseases often precede the development of liver and bile duct cancer. Advanced chronic liver diseases are often associated with dysregulation of bile acid homeostasis and cholestasis. This study takes advantage of a unique FXR/SHP DKO model that develops high organ bile acid exposure and spontaneous age-dependent HCC in males but not females to identify unique HCC-associated gene signatures. The study showed that the unique gene signature in female DKO mice that had lower HCC incidence also correlated with lower grade HCC and better survival in human HCC patients.

      (2) The study also suggests that differentially regulated bile acid signaling or gender-dependent response to altered bile acids may contribute to gender-dependent susceptibility to HCC development and/or progression.

      (3) The sex-dependent differences in bile acid-mediated pathology clearly exist but are still not fully understood at the mechanistic level. Female mice have been shown to be more sensitive to bile acid toxicity in a few cholestasis models, while this study showed a male dominance of bile acid promotion of HCC. This study used ovariectomy to demonstrate that female hormones are possible underlying factors. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer.

      We thank Reviewer 1 for their positive and thorough assessment of our manuscript.

      Weaknesses:

      (1) HCC shows heterogeneity, and it is unclear what tissues (tumor or normal) were used from the DKO mice and human HCC gene expression dataset to obtain the gene signature, and how the authors reconcile these gene signatures with HCC prognosis.

      Mice studies: Aged DKO mice develop aggressive tumors (major and minor nodules; see Figure 1), and the entire liver is burdened with multiple tumor nodules. It is technically challenging to demarcate the tumor boundaries as most of the surrounding tissues do not display normal tissue architecture. Therefore, livers from age- and sex-matched wild-type C57/BL6 mice were used as control tissue. All the mice were inbred in our facility. Spatial transcriptomics and longitudinal studies are ongoing to collect tumors at earlier time points wherein we can differentiate tumor and non-tumor tissue.

      Human studies: We mined five separate clinical data sets. The human HCC gene expression data comprised samples from the (i) National Cancer Institute (NCI) cohort (GEO accession numbers, GSE1898 and GSE4024) and (ii) Korea, (iii) Samsung, (iv) Modena, and (v) Fudan cohorts as previously described (GEO accession numbers, GSE14520, GSE16757, GSE43619, GSE36376, and GSE54236). We have added a new Supplementary Table 4 giving details of these datasets. Depending on the cohort, they are primarily HCC samples (surgical resections of HCC) and control samples, with some tumors and paired non-tumor tissues.

      (2) The authors identified a unique set of gene expression signatures that are linked to HCC patient outcomes, but analysis of these gene sets to understand the causes of cancer promotion is still lacking. The studies of urea cycle metabolism and estrogen signaling were preliminary and inconclusive. These mechanistic aspects may be followed up in revision or future studies.

      We agree. Experiments to establish HCC causality and promotion are complex, given the heterogeneous nature of liver cancer. Moreover, the length of time (12 months) needed for cancer to develop spontaneously in this DKO mouse model makes such experiments challenging. As mentioned by the reviewer, mechanistic studies are ongoing, and longitudinal time course experiments are actively being pursued to delineate causality. Having said that, we mined the TCGA LIHC (The Cancer Genome Atlas Liver Hepatocellular Carcinoma) database to examine the expression of the individual urea cycle genes and found them suppressed in liver tumorigenesis (new Supplementary Figure 4). We also evaluated whether estrogen receptor (Er) targets altered in DKO females (DKO_Estrogen) correlate with overall survival in HCC (new Supplementary Figure 6). We note that Er expression per se is reduced in males and females upon liver tumorigenesis. Also, the DKO_Estrogen signature correlated positively with better overall survival (new Supplementary Figure 6). These findings further bolster the relevance of urea cycle metabolism and estrogen signaling during HCC.

      (3) While high levels of bile acids are convincingly shown to promote HCC progression, their role in HCC initiation is not established. The DKO model may be limited to conditions of extremely high levels of organ bile acid exposure. The DKO mice do not model the human population of HCC patients with various etiology and shared liver pathology (i.e. cirrhosis). Therefore, high circulating bile acids may not fully explain the male prevalence of HCC incidence.

      We agree that our studies do not show that bile acids can initiate HCC, and that bile acids may be only one of the many factors that contribute to the high male prevalence of HCC. This is exactly why we do not write about HCC initiation anywhere in the manuscript. To clarify further, we have added a sentence to the revised discussion to highlight this aspect: “while this study demonstrates bile acids promote HCC progression it does not investigate or provide evidence if excess bile acids are sufficient for HCC initiation.”

      (4) The authors showed lower circulating bile acids and increased fecal bile acid excretion in female mice and hypothesized that this may be a mechanism underlying the lower bile acid exposure that contributed to lower HCC incidence in female DKO mice. Additional analysis of organ bile acids within the enterohepatic circulation may be performed because a more accurate interpretation of the circulating bile acids and fecal bile acids can be made in reference to organ bile acids and total bile acid pool changes in these mice.

      As shown in this manuscript, we provide BA compositional analyses from the liver, serum, urine, and feces (Figures 5 and 6, new Supplementary Figure 8, Supplementary Tables 4 and 5). Unfortunately, we did not collect the intestinal tissue or gallbladders for BA analysis in this study. Separate cohorts of mice are being aged for future BA analyses from different organs within the enterohepatic loop. We thank you for this suggestion. Nevertheless, we have previously measured and reported elevated BA values in the intestines and the gallbladder of young DKO mice (PMC3007143).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The translational value to human HCC is not so strong yet. Authors show that there is a correlation between the female-selective gene signature and low-grade tumors and better survival in HCC patients overall. However, these data do not show whether this signature is more highly correlated with female tumor burden and survival. In other words, whether the mechanisms of female protection may be similar between humans and mice. In that respect, it would also be good to elaborate on whether women have higher fecal BA excretion and lower serum BA concentration.

      The reviewer poses an interesting question to test if the DKO female-specific signatures are altered differently in male vs. female HCC samples. As we found the urea cycle and estrogen signaling to be protective and enriched in our mouse model, we tested their expression pattern using the TCGA-LIHC RNA-seq data. We found urea cycle genes and Er transcripts broadly reduced in tumor samples irrespective of the sex (new Supplementary Figure 4 and Supplementary Figure 6), indicating that these pathways are compromised upon tumorigenesis even in the female livers. 

      While prior studies have shown (i) a smaller BA pool w synthesis in men than women (PMID: 22003820), we did not find a study that systematically investigated BA excretion between the sexes in HCC context. The reviewer is spot on in suggesting BA analysis from HCC and unaffected human fecal samples from both sexes. Designing and performing such studies in the future will provide concrete proof of whether BA excretion protects female livers from developing liver cancer. We thank you for these suggestions.

      (2) The authors should perform a thorough spelling and grammar check.

      We apologize for the typos, which have been fixed, and as suggested by the reviewer, we have performed a grammar check.

      (3) There are quite some errors and inaccuracies in the result section, figures, and legends. The authors should correct this.

      We apologize for the inadvertent errors in the manuscript, and we have clarified these inaccuracies in the revised version. Thank you.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 1A-F, This statement of altered liver steatosis needs to be further supported by measurement of liver triglycerides. Lower magnification images of Sirius red stain should be shown for better evaluation of liver fibrosis.

      Unfortunately, we did not measure liver triglycerides; the Sirius red-stained samples have faded, and lower-magnification images are unavailable at this juncture. We have modified our results accordingly.

      We did not take the gross picture of WT female and DKO female livers in the same frame as shown below. Since the manuscript is focused on male and female differences in liver cancer incidence, we provided DKO male and female liver images as Figure 1D in the paper.

      Author response image 1.

      Gross liver images of year-old WT and DKO mice, showing prominent hepatocarcinogenesis in DKO male mice.

      (2) Can the authors clarify if the gene transcriptomics was performed with normal or tumor tissues of DKO mice?

      Gene transcriptomics were performed with the tumor tissue of DKO mice. We have previously published data from younger, non-tumor-bearing DKO male mice (PMCID: PMC3007143).

      (3) Supplementary Figure 3C. Could the authors confirm if this is F vs M or just DKO female since it does not seem to match the result description in the main text? It is better practice to indicate the sub-panels of the Supplementary Figures in the main text while describing the results.

      As the reviewer correctly points out, Supplementary Figure 3C shows the DKO F vs M signature, not the DKO_female signature, and this has been clarified in the text. We have also now included the DKO_F data to reduce confusion.

      (4) Figure 3. Legend, the data presented are not well explained in the Legend, especially the labeling and what is being presented and compared.

      As suggested by the reviewer, we have modified the legend accordingly.

      (5) Supplementary Table 4 does not contain total serum bile acid as described in the main text.

      We agree with the reviewer. We provided primary and secondary BA concentrations (Supplementary Table 4, now Supplementary Table 5 in the revised version, rows 20 and 21) but not their sum. We have modified the text accordingly.

      (6) Method section: many experiments lack descriptions of details.

      We have added details on the animal experimental design and ER ChIP-PCR, included schematics of the experiments in the main and supplemental figures, and expanded the descriptions of the metabolomics and BA analyses.

      Reviewer #2 (Recommendations For The Authors):

      General:

      (1) The authors are advised to do a thorough grammar and spelling check.

      We have performed a spelling and grammar check, as suggested, using the online platform Grammarly. Thank you.

      Results:

      (1) Figure 1:

      The authors should show in Figure 1D female WT and female DKO liver.

      See Author response image 1, added in our response to point 1 of Reviewer 1’s recommendations.

      In the Figure legend, (A-E) should be replaced by (A+D). 

      Thank you. We have modified it accordingly.

      The authors do not refer to 1J in the text, please add this reference.

      Thank you for pointing this out. We have referenced 1J in the text.

      The description of 1H does not elaborate on the sex differences in ALT/AST levels, as this is the focus of the manuscript.

      We have added a sentence to show that the injury markers are higher in DKO males, which is consistent with an advanced disease. Thanks.

      The authors should use the correct nomenclature in Figure 1I/1J (gene vs protein and capitals vs non-capitals).

      Figures 1I and 1J show gene expression of Fxr and Shp; hence, we used the non-capitalized, italicized nomenclature. Thanks.

      (2) Figure 2:

      The x-axis length is different in Figures 2A and 2B. Please correct to visualize the differences between males and females better.

      The x-axis length has been fixed as suggested. Thanks.

      (3) Figure 3:

      The authors should elaborate on how the patients were assigned to each gene signature. This is not fully clear.

      The gene sets obtained from the WT and DKO mice were used. The process is shown as a schematic in Supplemental Figure 2C, and the gene list is included in an Excel sheet as Supplemental Table 1.

      We are curious how these data (F3A-C) would look when separating male and female human patients.

      We performed an overall survival analysis on subgroups of patients and provide it below. We segregated the HCC cohort data by sex and age (>55 yr, since we assumed 55 as the age of menopause) and evaluated the DKO gene signature. Similar to the original Figure 3, we find that, irrespective of sex and age, the DKO FvsM gene signature corresponds with better overall survival in both men and women. These findings align with the combined overall survival analysis shown in the original Figure 3 of the manuscript, and therefore we did not modify it. If deemed necessary, we are happy to include the figure below in the main manuscript.

      Author response image 2.

      Correlation of gene signatures obtained from WT and DKO mouse model with the survival data of HCC patients segregated by age and sex. The Kaplan Meier Survival graphs were generated based on WT and DKO transcriptome changes using five HCC clinical cohorts. Analysis of OS (Overall Survival) in patients ((A) Men and (B) Women) using the gene signatures representative of either male WT or male DKO, female WT or female DKO, and unique changes observed in female DKO mice but not in male DKO mice.
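
As an illustration only, the kind of subgroup overall-survival comparison described above can be sketched with a plain Kaplan-Meier estimator computed separately per group. This is a minimal sketch, not the authors' pipeline: the function names `kaplan_meier` and `km_by_group`, the group labels, and the toy follow-up times are all hypothetical, and the assignment of patients to a gene signature (done by the authors' own model) is assumed to have happened upstream.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 death.

    times  -- follow-up time per patient (hypothetical units, e.g. months)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def km_by_group(records):
    """records: iterable of (group_label, time, event) -> {group_label: curve}."""
    grouped = {}
    for group, t, e in records:
        ts, es = grouped.setdefault(group, ([], []))
        ts.append(t)
        es.append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in grouped.items()}
```

Calling `km_by_group` on records labeled, for example, by sex and signature status yields one curve per subgroup; a formal comparison between curves (such as a log-rank test) is a separate step that is not shown here.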

      What was used as the control signature in Figure 3C? Please specify this.

      For Figure 3C, we compared the DKO_M signature to the DKO F vs M signature. These genes are listed in an Excel sheet (Supplementary Table 1).

      The authors claim that DKO female mice display chronic cholestasis, similar to their male counterparts. Please refer to previous work or show the data.

      Elevated serum BA levels in DKO females are reported in Supplementary Table 5, and we find comparable hepatic BA composition in Figure 5F.

      (4) Figure 4: Labels for the x-axis are missing in Figure 4C. Please add legends or labels to the bars.

      The x axis label is included in the top Serum BAs in (M)

      In Figure 4I, the percentage of input is quite low. An IgG control would show whether recruitment of ERalpha to the shown loci is significant above background levels. Also, ChIP on the OVX liver could serve as a negative control.

      We did use IgG as a control pull-down, and only signals above this background were considered. We have not performed ChIP in OVX livers, which would be an excellent negative control for future studies. Thank you.

      The results and legends refer to ChIP-qPCR, while methods only mention ChIP-seq. Please adapt.

      We sincerely apologize for the mistake. We used published ChIP-seq data to identify putative binding sites and then performed ChIP-PCR to validate them. We have clarified and rectified this error. Thank you.

      Significance indications in the figure legend do not correspond with significance indications in the figure. Please explain the used significance symbols in the figure in the legend.

      Thank You. The legends and their significance have been matched.

      (5) Figure 5:

      Authors claim lowered total serum BA in females compared to males, and reference to Supplementary Table 4. However, these data are not provided, only percentages and ratios are displayed.

      In the revised version, this has become Table 5. See response to the same concern noted by Reviewer 1, Point 5 above.

      Figure 5D: Are sulphated BA also elevated in WT females? Please provide these data.

      There is no significant urinary excretion of BAs in WT control animals. We have previously measured and found none. But under cholestatic conditions BAs are observed in urine. Therefore, sulphated BA levels were found only in the DKO mice. 

      Figure 5H: Is the fecal BA excretion in WT females also proportionally higher than in males? Please provide these data.

      We were unable to perform the untargeted metabolomics profiling of WT fecal samples. When we measured BAs in the feces, as expected, very low concentrations were present irrespective of sex (~0.01 M), and we did not find any sex difference. Also, prior studies in the 129SVJ strain reported comparable fecal excretion (PMC150802). We did not find any clinical studies that measured fecal BA between the sexes.

      (6) Figure 6:

      References in the text of the result section to Figure 6 are wrong. The authors should change this.

      Thank You. This has been rectified.

      Significance indications in the legend do not correspond with significance indications in the figure. Please explain the used significance symbols in the figure in the legend.

      Thank You. The legends and their significance have been matched.

      (7) Supplemental Figure 3:

      Please adapt the title of this figure; the sentence is incorrect. The description of this figure is very poor.

      We have modified the legend and the title of Supplemental Figure 3 to make them more appropriate. Thanks.

      Please explain what the blue and red dots represent.

      Each blue or yellow dot indicates the Bayesian probability generated from our BCCP model.

      What are the bold horizontal lines representing? Why are there no dots in some box plots? Please elaborate.

      The box represents the interquartile range (IQR), encompassing the middle 50% of the data. The bottom and top edges correspond to the 25th and 75th percentiles, respectively, while the bold horizontal line indicates the median value.

      The absence of visible dots in certain categories—particularly in higher CLIP and TNM stages—is due to the small number of patients, all of whom had similar Bayesian prediction probabilities. As these values cluster tightly around the median, the individual dots may be overlapped and hidden behind the median line.
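
The box-plot description above can be made concrete with a small sketch that computes the quantities a box encodes (25th percentile, median, 75th percentile, IQR). This is an illustration under stated assumptions: the helper name `box_stats` and the example values are hypothetical, and the "inclusive" quartile convention used here may differ from the one used by the authors' plotting software.

```python
import statistics


def box_stats(values):
    """Return the quartiles and IQR that define a box in a box plot.

    Requires at least two values. Uses the 'inclusive' quartile convention,
    which is an assumption; plotting packages differ in how they interpolate.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# A well-populated category gives a visible box ...
spread = box_stats([1, 2, 3, 4, 5])

# ... whereas a few patients with nearly identical probabilities collapse
# the box onto the median line, so their dots overlap it.
clustered = box_stats([0.9, 0.9, 0.9])
```

In the second case the IQR is effectively zero, which is consistent with the authors' explanation of why dots can be hidden behind the median line in sparsely populated stages.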

      The figure is not visually easy to understand, please reconsider the representation.  

      We hope the modified figure legends, with the explanation of the lines and the points in the graphs, increase the clarity and make the figures acceptable.

      Please add the DKO_female signature plot.

      We have added this graph to Supplemental Figure 3.

      (8) Supplemental 4A:

      Fold change at Z-score is missing. This should be added.

      Thank you; we have added this information.

      (9) Supplemental 5:

      The scale bar is missing. This should be included.

      The figure is now supplemental figure 8 and the scale bar has been added.

      Methods:

      (1) Did the authors use ChIP-sequencing or ChIP-qPCR? Please describe the correct method.

      We apologize for the error. We have used ChIP-PCR and rectified it in our methods and in our response to a figure 4 query.

      (2) It is unclear how the mouse model was generated. Please refer to earlier publications.

      The mice were generated in-house at UIUC, and we have added this sentence to the Methods section. The original reference has been cited in the text (PMCID: PMC3007143).

      Discussion:

      (1) The authors claim in the discussion: 'consistently higher recruitment of ER to the classical BA synthetic genes ...' This is not shown in Figure 4I, only ER recruitment to Cyp7a1 is significantly higher in females. Please rephrase.

      We agree, and we have modified the sentence. Cyp7A1 accounts for ~75% of BA synthesis and is a rate-limiting gene in the classical BA synthesis pathway.

      (2) The authors could make their statements stronger if they could elaborate on whether women have more fecal BA excretion, and if there are differences in serum BA concentration in HCC between male and female patients. 

      Unfortunately, we were unable to find clinical studies with appropriate controls which examined and reported serum BA in HCC in a sex specific manner.

      In addition, to understand whether the female-specific protections in humans are similar to mice, it would be nice to show correlations of the female-specific mouse signature with male and female liver signatures.

      At this time, we do not have sufficiently large control or precancerous early-stage patient datasets from both sexes to make such comparisons. Nevertheless, these sex-specific signatures have translational relevance. Author response image 2 shows that the DKO male signature correlates with poor overall survival in males, whereas neither the DKO male nor the DKO female signature predicts outcome in females. In contrast, the DKO female-specific gene signature (DKOFvsM) correlates with better overall survival in both men and women.

      (3) The authors state in the discussion: 'Currently we do not know how to reconcile this data other than indicating a potential ER independent mechanism.' We do not understand the reasoning behind this statement. Please clarify.

      We find that increased Erα expression in DKO coincides with CA-mediated suppression of BA synthesis genes in the absence of Fxr and Shp. But we also noticed that in OVX DKO mice, Erα expression is blunted, and so is basal BA synthesis gene expression. Putting together these data, it is intriguing that Erα expression correlates both positively and negatively with BA synthesis genes. To reconcile these contrasting results, we have written the following sentence in the discussion.

      “These findings suggest Erα expression is linked to both positive and negative regulation of BA synthesis genes. But we do not know how ER elicits these differential effects on BA synthesis.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to first thank the Editor as well as the three reviewers for their enthusiasm and for conducting another careful evaluation of our manuscript. We appreciate their thoughtful and constructive comments and suggestions. Some concerns regarding experimental design, data analysis, and over-interpretation of our findings still remain unresolved after the initial revision. Here we have endeavored to address these remaining concerns through further refinement of our writing and by including these concerns in the discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review):

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      I acknowledge the authors' efforts to address the comments received. However, my concerns persist:

      Thanks very much again for the re-evaluation and comments. Please find our revision plans for each comment below.

      (1) The authors contend that shorter reaction times correlated with increased distances between individuals in social space imply that participants construct and utilize two-dimensional representations. This method is adapted from a previous study by Park et al. Yet, there is a fundamental distinction between the two studies. In the prior work, participants learned relationships between adjacent individuals, receiving feedback on their decisions, akin to learning spatial locations during navigation. This setup leads to two different predictions: If participants rely on memory to infer relationships, recalling more pairs would be necessary for distant individuals than for closer ones. Conversely, if participants can directly gauge distances using a cognitive map, they would estimate distances between far individuals as quickly as for closer ones. Consequently, as the authors suggest, reaction times ought to decrease with increasing decision value, which, in this context, corresponds to distances. However, the current study allowed participants to compare all possible pairs without restricting learning experiences, rendering the application of the same methodology for testing two-dimensional representations inappropriate. In this study, the results could be interpreted as participants not forming and utilizing two-dimensional representations.

      We apologize for not being clear enough about our task design; we have made relevant changes in the methodology section of the manuscript to make it clearer. The reviewer’s concern is that participants learned about all the pairs in the comparison task, which makes the distance effect invalid. We would like to clarify that during all the memory test tasks (the comparison task, the collect task, and the recall task outside and inside the scanner), participants never received feedback on whether their responses were correct. Therefore, the comparison task in our study is similar to that in the previous study by Park et al. (2021). Participants do not have access to correct responses for all possible pairs of comparison prior to or during this task, so they would need to make inferences based on memory retrieval.

      (2) The confounding of visual features with the value of social decision-making complicates the interpretation of this study's results. It remains unclear whether the observed grid-like effects are due to visual features or are genuinely indicative of value-based decision-making, as argued by the authors. Contrary to the authors' argument, this issue was not present in the previous study (Constantinescu et al.). In that study, participants associated specific stimuli with the identities of hidden items, but these stimuli were not linked to decision-making values (i.e., no image was considered superior to another). The current study's paradigm is more akin to that of Bao et al., which the authors mention in the context of RSA analysis. Indeed, Bao et al. controlled the length of the bars specifically to address the problem highlighted here. Regrettably, in the current paradigm, this conflation remains inseparable.

      We’d like to thank the reviewer for facilitating the discussion on the question of ‘social space’ vs. ‘sensory space’. The task in the scanner did not require value-based decision making. It is akin to both the Bao et al. (2019) study and the Constantinescu et al. (2016) study in the sense that all three tasks ask participants to imagine moving along a trajectory in an abstract, non-physical space, and the trajectory is grounded in sensory cues. Participants were trained to associate the sensory cues with abstract (social/nonsocial) concepts. We think that the paradigm is a relatively faithful replication of the study by Constantinescu et al. Nonetheless, we agree that a design similar to Bao et al. (2019), which controls for sensory confounds, or a value-based decision-making task in the scanner similar to that of Park et al. (2021), would be better suited to address this concern, and we have included this limitation in the discussion section.

      (3) While the authors have responded to comments in the public review, my concerns noted in the Recommendation section remain unaddressed. As indicated in my recommendations, there are aspects of the authors' methodology and results that I find difficult to comprehend. Resolving these issues is imperative to facilitate an appropriate review in subsequent stages.

      Considering that the issues raised in the previous comments remain unresolved, I have retained my earlier comments below for review.

      We apologize for not addressing the recommendations properly; please find our detailed responses and plans for revision below.

      I have some comments. I hope that these can help.

      (1) While the explanation of Fig.4A-C is lacking in both the main text and figure legend, I am not sure if I understand this finding correctly. Did the authors find the effects of hexagonal modulation in the medial temporal gyrus and lingual gyrus correlate with the individual differences in the extent to which their reaction times were associated with the distances between faces when choosing a better collaborator? If so, I am not sure what argument the authors try to draw from these findings. Do the authors argue that these brain areas show hexagonal modulation, which was not supported in the previous analysis (Fig.3)? What is the level of correlation between these behavioral measures and the grid consistency effects in the vmPFC and EC, where the authors found actual grid-like activity? How do the authors interpret this finding? More importantly, how does this finding associate with other findings and the argument of the study?

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. The exploratory analysis reported in Figure 4 uses whole-brain analysis to examine: 1) whether there is any correlation between the strength of the grid-like representation of the social value map and behavioral indicators of map-like representation; and 2) whether there is any correlation between the strength of the grid-like representation of this social value map and participants’ social traits.

      To be more specific, for the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. We interpreted a stronger distance effect as a behavioral index of a better internal map-like representation. We interpreted a stronger grid consistency effect as a neural index of a better representation of the 2D social space. Therefore, we wanted to see whether a correlation exists between the behavioral and neural indices of map-like representation.

      To achieve this goal, behavioral indicators were entered as covariates in the second-level analysis of the GLM testing the grid consistency effect (GLM2). Figure 3 shows results from GLM2 without the covariates. Figure 4 shows clusters whose neural indices of map-like representation covaried with the behavioral indices and survived multiple-comparison correction. Indeed, in these regions, the grid consistency effect was not significant at the group level (and so is not shown in Figure 3). We tried to interpret this finding in our discussion (line 374-289 for temporal lobe correlation, line 395-404 for precuneus correlation).

      Finally, we would like to point out that including the covariates in GLM2 did not change the results in Figure 3; the clusters in Figure 3 still survive correction. Meanwhile, these clusters in Figure 3 did not show correlation with behavioral indicators of map-like representation.

      Author response image 1.

      (2) There are no behavioral results provided. How accurately did participants perform each of the tasks? How are the effects of grid consistency associated with the level of accuracy in the map test?

      Why did participants perform the recall task again outside the scanner?

      We will endeavor to improve the signposting of the corresponding figures in the main text. For the behavioral results, we reported the statistics in the section “Participants construct social value map after associative learning of avatars and corresponding characteristics” in the main text, and the plots are shown in Figure 1. In particular, Figure 1F shows the accuracy of the tasks in training, as well as of the recall task in the scanner. As for the correlation, we did not find a significant correlation between behavioral accuracy and the grid consistency effect. We will make this clearer in the results section.

      (3) The methods did not explain how the grid orientation was estimated and what the regressors were in GLM2. I don't think equations 2 and 3 are quite right.

      For the grid orientation estimation method, we provided a detailed description in Supplementary Methods 2.2.2. We will add links to this section in the main text.

      Equations 2 and 3 describe how the parametric regressors entered into GLM2 were formed and provide the prerequisites for calculating grid orientations. Equation 2 is the result of directly applying the angle addition and subtraction theorems, so it should be correct. We will try to make the rationale clearer in the supplementary text.
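
For readers unfamiliar with the approach, the role of the angle subtraction theorem can be illustrated with the quadrature formulation commonly used in grid-code fMRI analyses. The manuscript's equations 2 and 3 are not reproduced in this response, so the sketch below is an assumption about their general form rather than a restatement of them. It relies on the identity cos(6(θ − φ)) = cos(6θ)cos(6φ) + sin(6θ)sin(6φ): regressing the signal on cos(6θ) and sin(6θ) gives two weights from which the orientation φ can be recovered. The function names and the simulated signal are hypothetical.

```python
import math


def estimate_grid_orientation(angles_deg, signal):
    """Estimate grid orientation (degrees) from trajectory angles and a signal.

    Fits signal ~ b_cos * cos(6*theta) + b_sin * sin(6*theta) by least squares
    (no intercept; the signal is assumed to be mean-centered), then returns
    atan2(b_sin, b_cos) / 6, which lies in (-30, 30] degrees.
    """
    c = [math.cos(math.radians(6 * a)) for a in angles_deg]
    s = [math.sin(math.radians(6 * a)) for a in angles_deg]
    # Normal equations of the two-regressor least-squares problem.
    scc = sum(x * x for x in c)
    sss = sum(x * x for x in s)
    scs = sum(x * y for x, y in zip(c, s))
    scy = sum(x * y for x, y in zip(c, signal))
    ssy = sum(x * y for x, y in zip(s, signal))
    det = scc * sss - scs * scs
    b_cos = (scy * sss - ssy * scs) / det
    b_sin = (ssy * scc - scy * scs) / det
    return math.degrees(math.atan2(b_sin, b_cos)) / 6


def alignment_regressor(angle_deg, orientation_deg):
    """Six-fold alignment of a trajectory with the estimated grid orientation."""
    return math.cos(math.radians(6 * (angle_deg - orientation_deg)))


# Simulated, noise-free signal with a true orientation of 20 degrees.
angles = list(range(0, 360, 10))
signal = [3.0 * math.cos(math.radians(6 * (a - 20))) for a in angles]
phi_hat = estimate_grid_orientation(angles, signal)
```

In practice the orientation is estimated on one portion of the data and the alignment regressor is tested on an independent portion; that cross-validation step is omitted here for brevity.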

      (4) With the increase in navigation distances, more grid cells would activate. Therefore, in theory, the activity in the entorhinal cortex should increase with the Euclidean distances, which has not been found here. I wonder if there was enough variability in the Euclidean distances that can be captured by neural correlates. This would require including the distributions of Euclidean distances according to their trajectory angles. Regarding how Fig.1E is generated, I don't understand what this heat map indicates. Additionally, it needs to be confirmed if the grid effects remain while controlling for the Euclidean distances of navigation trajectories.

      We did not specifically control for trajectory length; we only controlled for the distribution of trajectory directions to be uniform. We have included a figure of the distribution of Euclidean distances in Figure S9 and the distribution of trajectory directions in Figure S8.

      Author response image 2.

      As for Figure 1E, we aimed to reproduce the findings from Figure 1F in Constantinescu et al. (2016), where they showed that participants progressively refined the locations of the outcomes through training. We divided the space into 15×15 subregions, computed the amount of time spent in each subregion, and plotted the result as Figure 1E. Brighter colors in Figure 1E indicate a greater amount of time spent in the corresponding subregion. Note that all these timing indices were computed as a percentage of the total time spent in the explore task in a given session. If participants were well-acquainted with the space and avatars, they would spend more time at the avatars (brighter colors at avatar locations) in the review session compared to the learning session.
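
      The binning described above can be sketched as follows. This is a minimal illustration rather than the analysis code: the function name, the assumption that both trait axes are scaled to [0, 1], and the sample positions and dwell times are all hypothetical.

```python
def occupancy_map(xs, ys, dwell_times, n_bins=15, lo=0.0, hi=1.0):
    """Percentage of total exploration time spent in each subregion.

    xs, ys      : positions on the two trait axes (assumed to lie in [lo, hi])
    dwell_times : time spent at each sampled position
    Returns an n_bins x n_bins nested list indexed as grid[row][col].
    """
    grid = [[0.0] * n_bins for _ in range(n_bins)]
    width = (hi - lo) / n_bins
    for x, y, t in zip(xs, ys, dwell_times):
        # Clamp so that a position exactly at the upper edge falls in the last bin.
        col = max(0, min(int((x - lo) / width), n_bins - 1))
        row = max(0, min(int((y - lo) / width), n_bins - 1))
        grid[row][col] += t
    total = sum(dwell_times)
    if total == 0:
        return grid
    # Express each cell as a percentage of the total time in the session.
    return [[100.0 * cell / total for cell in row] for row in grid]


# Hypothetical samples: two visits near one corner, one longer visit near the other.
occ = occupancy_map([0.10, 0.10, 0.90], [0.10, 0.10, 0.90], [1.0, 1.0, 2.0])
```

      Normalizing by the session total makes maps from the learning and review sessions comparable even when the sessions differ in length.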

      As for the effect of distance on the grid-like representation, we did not include distance as a parametric modulator in the grid consistency effect GLM (GLM2) because there were too few trials in each bin (6-8 trials). However, there is indirect evidence that could potentially rule out this confound: in the distance representation analysis, we did not find a distance representation in any of the clusters that showed a significant grid-like representation (regions in Figure 2).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid. From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Weaknesses:

      In the revised manuscript, the authors soften their claims about finding a grid code in the entorhinal cortex and provide additional caveats about limitations in their findings. It seems that the authors and reviewers are in agreement about the following weaknesses, which were part of my original review: Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      In the authors' response to reviews, they provide additional clarification about their exploratory analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. My guess is that readers would find it useful if some of this language were included in the main text, especially with regard to an explanation regarding the rationale for these exploratory studies.

      Thank you very much again for your careful re-evaluation and suggestions. We have tried to improve our writing and incorporate the suggestions in the new revision.

      Reviewer #3 (Public Review):

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes, and is relatively well powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably Park et al., 2021, Nature Neuroscience.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      This data does not specifically add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that, when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios, social space (as well as navigation and semantic concepts) must be higher dimensional, or at least more than two dimensional. It is unclear if this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, which would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raise the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much again for your careful re-evaluation and comments. We have tried to incorporate some of the suggested papers into our discussion. In summary, we agree that there is more to six-fold symmetric code that can be utilized to represent “conceptual space”. We think that the next step for a stronger claim would be to find the representation of more spontaneous non-spatial maps.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents useful findings on several phage from deep sea isolates of Lentisphaerae strains WC36 and zth2 that further our understanding of deep sea microbial life. The manuscript's primary claim is that phage isolates augment polysaccharide use in Pseudomonas bacteria via auxiliary metabolic genes (AMGs). However, the strength of the evidence is incomplete and does not support the primary claims. Namely, no data are presented to rule out phage contamination in the polysaccharide stock solution, AMGs are potentially misidentified, and there is missing evidence of successful infection.

      Thanks for the Editor’s and Reviewers’ positive and constructive comments, which helped us improve the quality of our manuscript entitled “Deep-sea bacteriophages facilitate host utilization of polysaccharides” (paper#eLife-RP-RA-2023-92345). The comments are valuable; we have studied them carefully and made corresponding revisions according to the suggestions. We removed some uncertain results and strengthened other parts of the manuscript, which improved the accuracy and impact of the revised version. Revised portions are marked in blue in the modified manuscript. Please find the detailed responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This manuscript describes the identification and isolation of several phage from deep sea isolates of Lentisphaerae strains WC36 and zth2. The authors observe induction of several putative chronic phages with the introduction of additional polysaccharides to the media. The authors suggest that two of the recovered phage genomes encode AMGs associated with polysaccharide use. The authors also suggest that adding the purified phage to cultures of Pseudomonas stutzeri 273 increased the growth of this bacterium due to augmented polysaccharide use genes from the phage. While the findings were of interest and relevance to the field, it is my opinion that several of the analyses fall short of supporting the key assertions presented.

      Thanks for your comments. We removed some uncertain results and strengthened other parts of the manuscript, which improved the accuracy and impact of the revised version. Please find the detailed responses below.

      Strengths: Interesting isolate of deep sea Lentisphaerae strains which will undoubtedly further our understanding of deep sea microbial life.

      Thanks for your positive comments.  

      Weaknesses:

      (1) Many of the findings are consistent with a phage contamination in the polysaccharide stock solution. 

      Thanks for your comments. We are confident that the phages are specifically derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons: (1) the polysaccharide stock solution was strictly sterilized to remove any phage contamination; (2) we performed multiple TEM checks of the rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and found no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used were not contaminated with phages; in addition, we also observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2); (3) the polysaccharide (starch) alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated the growth of Pseudomonas stutzeri 273 (Author response image 1). The above results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      Author response image 1.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36. 

       

      (2) The genes presented as AMGs are largely well known and studied phage genes which play a role in infection cycles.

      Thanks for your comments. Indeed, these AMGs may be common only in virulent phages and have never been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis, or the injection of viral DNA upon infection. However, chronic phages do not lyse the host. Therefore, the persistence of these genes in chronic phages may be due to their ability to assist the host in metabolizing polysaccharides. Finally, according to your suggestions, we have weakened the role of AMGs and added “potential” in front of the term. The detailed information is shown below.

      (3) The evidence that the isolated phage can infect Pseudomonas stutzeri 273 is lacking, putting into question the dependent results.

      Thanks for your comments. Actually, we selected many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could assist them in the degradation and utilization of polysaccharides, and found that Phages-WC36 could only promote the growth of strain 273. It has been reported that filamentous phages can recognize and bind to the host pili, which causes the pili to shrink and brings the filamentous phages closer to, and possibly through, the outer membrane of host cells. A possible mechanism by which other chronic phages are released without breaking the host is that they are enclosed in a lipid membrane and released from the host cells in a nonlytic manner. Thus, these chronic phages may have a wider host range. However, we were unable to further reveal the infection mechanism because the necessary techniques were not available to us. Therefore, according to your suggestions, we have deleted this section in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I have previously reviewed this manuscript as a submission to another journal in 2022. My recommendations here mirror those of my prior suggestions, now with further added details.

      Thanks for your great efforts in reviewing our manuscript and for your valuable suggestions on both the previous and the current versions.

      Specific comments:

      Comment 1: Line 32. Rephrase to "polysaccharides cause the induction of multiple temperate phages infecting two strains of Lentisphaerae (WC36 and zth2) from the deep sea."

      Thanks for your positive suggestion. We have modified this description as “Here, we found for the first time that polysaccharides induced the production of multiple temperate phages infecting two deep-sea Lentisphaerae strains (WC36 and zth2).” in the revised manuscript (Lines 31-33). 

      Comment 2: Line 66. "Chronic" infections are not "lysogenic" as described here, suggesting the former is a subcategory of the latter. If you are going to introduce lifecycles you need a brief sentence distinguishing "chronic" from "lysogenic"

      Thanks for your positive suggestion. We added this sentence as “Currently, more and more attention has been paid to chronic life cycles where bacterial growth continues despite phage reproduction (Hoffmann Berling and Maze, 1964), which was different from the lysogenic life cycle that could possibly lyse the host under some specific conditions.” in the revised manuscript (Lines 66-69).

      Comment 3: Line 72. Please avoid generalized statements like "a hand-full" (or "plenty" line 85). Try to be at least somewhat quantitative regarding how many chronic phages are known. This is a fairly common strategy among archaeal viruses. 

      Thanks for your suggestion. Given that some filamentous phages also have a chronic life cycle that is not explicitly reported, we cannot accurately estimate their numbers. According to your suggestions, we have modified these descriptions as “however, to our best knowledge, only few phages have been described for prokaryotes in the pure isolates up to date (Roux et al., 2019; Alarcón-Schumacher et al., 2022; Liu et al., 2022).” in the revised manuscript (Lines 73-75). In addition, the number of chronic phages in the biosphere cannot be accurately estimated, according to the latest report (Chevallereau et al., 2022), which showed that “a large fraction of phages in the biosphere are produced through chronic life cycles”. Therefore, we have modified this description as “Therefore, a large percentage of phages in nature are proposed to replicate through chronic life cycles” in the revised manuscript (Lines 87-88). 

      Comment 4: Line 93. While Breitbart 2012 is a good paper to cite here, there have been several, much more advanced analysis of the oceans virome. https://doi.org/10.1016/j.cell.2019.03.040 is one example, but there are several others. A deeper literature review is required in this section.  

      Thanks for your valuable suggestions. We have added several references and modified this description as “A majority of these viruses are bacteriophages, which exist widely in oceans and affect the life activities of microbes (Breitbart, 2012; Roux et al., 2016; Gregory et al., 2019; Dominguez-Huerta et al., 2022).” in the revised manuscript (Lines 94-97).

      References related to this response:

      Roux, S., Brum, J.R., Dutilh, B.E., Sunagawa, S., Duhaime, M.B., Loy, A., Poulos, B.T., Solonenko, N., Lara, E., Poulain, J., et al. (2016) Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537:689-693. 

      Gregory, A.C., Zayed, A.A., Conceição-Neto, N., Temperton, B., Bolduc, B., Alberti, A., Ardyna, M., Arkhipova, K., Carmichael, M., Cruaud, C., et al. (2019) Marine DNA Viral Macro- and Microdiversity from Pole to Pole. Cell 177:1109-1123.e1114. 

      Dominguez-Huerta, G., Zayed, A.A., Wainaina, J.M., Guo, J., Tian, F., Pratama, A.A., Bolduc, B., Mohssen, M., Zablocki, O., Pelletier, E., et al. (2022) Diversity and ecological footprint of Global Ocean RNA viruses. Science 376:1202-1208.

      Comment 5: Line 137. I see the phage upregulation in Figure 1, however in the text and figure it would be good to also elaborate on what the background expression generally looks like. Perhaps a transcriptomic read normalization and recruitment to the genome with a display of the coverage map, highlighting the prophage would be helpful. Are the polysaccharides directly influencing phage induction or is there some potential for another cascading effect?

      Thanks for your comments. We have elaborated all expressions of phage-associated genes under different conditions in the Supplementary Table 1, which showed that the background expressions were very low. The numbers in Fig. 1C were the gene expressions (by taking log2 values) of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin compared with the rich medium alone.

      In addition, our RT-qPCR results (Fig. 1D) also confirmed that these genes encoding phage-associated proteins were significantly upregulated when 10 g/L laminarin was added in the rich medium. According to your suggestions, we have modified this description as “In addition to the up-regulation of genes related to glycan transport and degradation, when 10 g/L laminarin was added in the rich medium, the most upregulated genes were phage-associated (e. g. phage integrase, phage portal protein) (Fig. 1C and Supplementary Table 1), which were expressed at the background level in the rich medium alone.” in the revised manuscript (Lines 136-140). Based on the present results, we speculate that polysaccharides might directly induce phage production, which needs to be verified by a large number of experiments in the future.

      Comment 6: Line 179. We need some assurance that phage was not introduced by your laminarin or starch supplement. Perhaps a check on the TEM/sequencing check of supplement itself would be helpful? This may be what is meant on Line 188 "without culturing bacterial cells" however this is not clearly worded if that is the case. Additional note, further reading reinforces this as a key concern. Many of the subsequent results are consistent with a contaminated starch stock. 

      Thanks for your comments. We are confident that the phages are specifically derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons. (1) We performed multiple TEM checks of the rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and found no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used are not contaminated with phages. In addition, we also observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2). According to your suggestions, we have modified this description as “We also tested and confirmed that there were not any phage-like structures in rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or in 10 g/L starch alone (Supplementary Fig. 1B), ruling out the possibility of phage contamination from the polysaccharides (laminarin/ starch).” in the revised manuscript (Lines 158-162) and “Meanwhile, we also checked the polysaccharides (laminarin/ starch) in rich medium directly by TEM and did not find any phage-like structures (Supplementary Fig. 2).” in the revised manuscript (Lines 178-180). (2) The polysaccharide stock solution was strictly sterilized to remove any phage contamination. (3) The polysaccharide (starch) alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated the growth of Pseudomonas stutzeri 273 (Author response image 2). The above results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      In addition, given that polysaccharides are a critical energy source for most microorganisms, we asked whether polysaccharides also induce the production of bacteriophages in other deep-sea bacteria. To this end, we cultured deep-sea representatives from four other phyla (Chloroflexi, Tenericutes, Proteobacteria, and Actinobacteria) in medium supplemented with laminarin/starch and checked the supernatants of the cell suspensions by TEM as described above. We could not find any phage-like structures in these cell suspensions (Author response image 3), which also confirmed that there was no phage contamination in the polysaccharides.

      Author response image 2.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36.   

      Author response image 3.

      TEM observation of the supernatants of cell suspensions of a Chloroflexi strain, a Tenericutes strain, a Proteobacteria strain and an Actinobacteria strain cultivated in the rich medium supplemented with 10 g/L laminarin and 10 g/L starch. No phage-like particles could be observed.

      Comment 7: Line 223. Correct generalized wording "long time". 

      Thanks for your comments. We have changed “after for a long time” to “after 30 days” in the revised manuscript (Line 197).

      Comment 8: Line 229. Please more explicitly describe what these numbers are (counts of virion like structures - filamentous and hexagonal respectively?), the units (per µL?), and how these were derived. The word "around" should be replaced with mean and standard deviation values for each count from replicates, without which these are not meaningful.

      Thanks for your comments. The average numbers per microliter (µL) of filamentous and hexagonal phages in each condition were respectively calculated by randomly choosing ten TEM images. According to your suggestions, we have modified this description as “Specifically, the average number per microliter of filamentous phages (9.7, 29 or 65.3) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4.3, 13.7 or 35.3) (Fig. 3B). The average number per microliter of hexagonal phages (9, 30, 46.7) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4, 11.3 or 17.7) (Fig. 3C).” in the revised manuscript (Lines 203-210).
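
      The request for a mean and standard deviation per condition could be met with a calculation along the following lines. This is only a sketch: the per-image counts are placeholder values and not our data, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# Placeholder counts of phage-like particles in ten TEM images for one
# condition (hypothetical values, for illustration only).
counts_per_image = [3, 5, 4, 6, 2, 5, 4, 3, 5, 3]

avg = mean(counts_per_image)   # arithmetic mean across the ten images
sd = stdev(counts_per_image)   # sample standard deviation (n - 1 denominator)

print(f"{avg:.1f} ± {sd:.1f} particles per image (n = {len(counts_per_image)})")
```

      Reporting each condition as mean ± SD with the number of images makes the spread across replicates visible alongside the averages.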

      Comment 9: Line 242. This section should be included in the discussion of Figure 2 - around line 194.

      Thanks. According to your suggestion, we have moved this section to the discussion corresponding to Figure 2 (Lines 183-191).

      Comment 10: Figure 3. Stay consistent in the types of figures generated per strain. Figure 3A should be a growth curve.

      Thanks for your comments. Actually, Figure 3A is a growth curve. The corresponding description, “(A) Growth curve of strain WC36 cultivated in either rich medium alone or rich medium supplemented with 5 g/L or 10 g/L laminarin for 30 days.”, is given in the Figure 3A legend of the manuscript.

      Comment 11: Line 312. Move the discussion of AMGs to after the discussion of the phage genome identification.

      Thanks for your valuable comments. According to your suggestions, we have moved the discussion of AMGs to after the discussion of the phage genome identification.

      Comment 12: Line 312. It would be informative to sequence in-bulk each of your treatments as opposed to just sequencing the viral isolates (starch and no host included) to see what viruses can be identified in each. ABySS is also not a common assembler for viral analysis. Is there literature to support it as a sufficient tool in assembling viral genomes? What sequencing depths were obtained in your samples?

      Thanks for your comments. In previous studies, we did sequence the starch or laminarin alone (no host included) and did not detect any phage-related sequences. The ABySS software is described in the following papers (Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res. 2017 May;27(5):768-777; Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009 Jun;19(6):1117-23.), and it was also used to assemble viral genomes in these studies (Guo Y, Jiang T. First Report of Sugarcane Mosaic Virus Infecting Goose Grass in Shandong Province, China. Plant Dis. 2024 Mar 21. doi: 10.1094/PDIS-11-23-2514-PDN; Tang M, Chen Z, Grover CE, Wang Y, Li S, Liu G, Ma Z, Wendel JF, Hua J. Rapid evolutionary divergence of Gossypium barbadense and G. hirsutum mitochondrial genomes. BMC Genomics. 2015 Oct 12;16:770.). The sequencing depths of the phages of strains WC36 and zth2 were 350x and 365x, respectively.

      Comment 13: Line 323. Replace "eventually" with more detail about what was done to derive the genomes. Were these the only four sequences identified as viral?

      Thanks for your comments. We used the ABySS software (http://www.bcgsc.ca/platform/bioinfo/software/abyss) to perform genome assembly with multiple-Kmer parameters. VIBRANT v1.2.1 (Kieft et al., 2020), DRAM-v (Shaffer et al., 2020), VirSorter v1.0.5 (with categories 1 (“pretty sure”) and 2 (“quite sure”)) (Roux et al., 2015) and VirFinder v1.1 (with statistically significant viral prediction: score > 0.9 and P-value < 0.05) (Ren et al., 2017) with default parameters were used to identify viral genomes from these assembled sequences by searching against both the cultured and non-cultured viral NCBI-RefSeq database (http://blast.ncbi.nlm.nih.gov/) and the IMG/VR database (Camargo et al., 2023). The GapCloser software (https://sourceforge.net/projects/soapdenovo2/files/GapCloser/) was subsequently applied to fill the remaining local inner gaps and correct single-base polymorphisms in the final assembly results. All the detailed processes are described in the supplementary information. Only these four viral sequences had high scores, but they are not complete genomes. Some shorter viral sequences with lower scores were excluded.
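
      The VirFinder cutoff stated above (score > 0.9 and P-value < 0.05) amounts to a simple filter, which might be sketched as follows. The contig names and values are hypothetical, and the record layout is an assumption made for illustration; it is not VirFinder's output format.

```python
def passes_virfinder_cutoff(score, p_value, min_score=0.9, max_p=0.05):
    """Return True when a contig meets both thresholds quoted in the response."""
    return score > min_score and p_value < max_p


# Hypothetical predictions for three assembled contigs.
contigs = [
    {"name": "contig_A", "score": 0.97, "p_value": 0.010},
    {"name": "contig_B", "score": 0.93, "p_value": 0.080},  # fails on P-value
    {"name": "contig_C", "score": 0.60, "p_value": 0.001},  # fails on score
]

kept = [c["name"] for c in contigs
        if passes_virfinder_cutoff(c["score"], c["p_value"])]
```

      Both conditions must hold, so a contig with a high score but a non-significant P-value is excluded, as is a significant contig with a low score.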

      Comment 14: Line 328. We need some details about the host genomes here. How were these derived? What is their completeness/contamination? What is their size? If the bins are poor, these would not serve as a reliable comparison to identify integrated phage.

      Thanks for your comments. For genomic sequencing, strains WC36 and zth2 were grown in liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). Thereafter, genome sequencing was carried out with both the Illumina NovaSeq PE150 (San Diego, USA) and the Nanopore PromethION platform (Oxford, UK) at the Beijing Novogene Bioinformatics Technology Co., Ltd. Library construction, sequencing, and assembly were performed as previously described (Zheng et al., 2021). We used seven databases to predict gene functions, including Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole-genome BLAST search (E-value less than 1e-5, minimal alignment length percentage larger than 40%) was performed against the above seven databases.

      The genomes of strains WC36 and zth2 were both 100% complete, as checked with CheckM v1.2.2. The genome sizes of strains WC36 and zth2 were 3,660,783 bp and 3,198,720 bp, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in the GenBank database with accession numbers CP085689 and CP071032, respectively.

      Moreover, to verify the absence of host contamination in the phage sequencing results, we used the BWA-MEM alignment algorithm (version 0.7.15) to map the host WGS reads to these phage sequences. None of the raw reads of the host strains (WC36 and zth2) mapped to these phage sequences (Author response image 4, shown below). In addition, we also evaluated the assembly graph underlying the host consensus assemblies. Clean reads were mapped to the complete bacterial genome sequences with Bowtie 2 (version 2.5.0), BWA (version 0.7.8) and SAMTOOLS (version 0.1.18). The results showed that the total mismatch rates of strains WC36 and zth2 were almost 0% and 0.03%, respectively (Author response table 1, shown below). In addition, we also collected the cells of strains WC36 and zth2 and sent them to another company for whole genome sequencing (named WC36G and ZTH, GenBank accession numbers CP151801 and CP119760, respectively). The genomes of strains WC36G and ZTH were also 100% complete. The genome sizes of strains WC36G and ZTH were 3,660,783 bp and 3,198,714 bp, respectively. The raw reads of strains WC36G and zth2 also did not map to the phage sequences. Therefore, we can confirm that these bacteriophage genomes were completely outside of the host chromosomes.

      Author response image 4.

      The read mapping from WGS to phage sequences.

      Author response table 1.

      Sequencing depth and coverage statistics.

      References related to this response:

      Zheng, R., Liu, R., Shan, Y., Cai, R., Liu, G., and Sun, C. (2021b) Characterization of the first cultured free-living representative of Candidatus Izemoplasma uncovers its unique biology ISME J 15:2676-2691. 

      Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Nat Genet 25:25-29. 

      Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., and Hattori, M. (2004) The KEGG resource for deciphering the genome Nucleic Acids Res 32:D277-280. 

      Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database Nucleic Acids Res 43:D261-269. 

      Bairoch, A., and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 Nucleic Acids Res 28:45-48.

      Comment 15: Line 333. This also needs some details. What evidence do you have that these are not chromosomal? If not chromosomal where can they be found? Sequencing efforts should also be able to yield extrachromosomal elements such as plasmids etc... If you were to sequence your purified isolate cultures from the rich media alone and include all assemblies (not just those binned for example) as a reference, would you be able to recruit viral reads? The way this reads suggests that Chevallereau et al., worked specifically with these phage, which is not the case - please rephrase.

      Thanks for your comments. We carefully compared the bacteriophage genomes with those of the corresponding hosts (strains WC36 and zth2) using Galaxy Version 2.6.0 (https://galaxy.pasteur.fr/) (Afgan et al., 2018) with the NCBI BLASTN method, and used BWA-MEM to map reads from host whole genome sequencing (WGS) to these bacteriophages. Both analyses showed that the bacteriophage genomes are completely outside of the host chromosomes. Therefore, we hypothesized that the phage genomes might exist in the host in a form similar to that of a plasmid.

      Comment 16: Line 335. More to the point here that we need confirmation that these phages were not introduced in the polysaccharide treatment

      Thanks for your comments. Please find our answers to this concern in the responses to comment 1 of the “weakness” part and comment 6 of the “Recommendations For The Authors” part.

      Comment 17: Line 342. Lacking significant detail here. Phylogeny based on what gene(s), how were the alignments computed/refined, what model used etc..?

      Thanks for your comments. According to your suggestions, all the related information is now given in the “Materials and methods” section of the manuscript. The maximum likelihood phylogenetic tree of Phage-WC36-2 and Phage-zth2-2 was constructed based on the terminase large subunit protein (terL). The proteins used to construct the phylogenetic trees were all obtained from the NCBI databases. All sequences were aligned with MAFFT version 7 (Katoh et al., 2019) and manually corrected. The phylogenetic trees were constructed using the W-IQ-TREE web server (http://iqtree.cibiv.univie.ac.at) with the “GTR+F+I+G4” model (Trifinopoulos et al., 2016). Finally, we used the online tool Interactive Tree of Life (iTOL v5) (Letunic and Bork, 2021) to edit the tree.

      Comment 18: Line 346. How are you specifically defining AMGs in this study? Most of these are well-known and studied phage genes with specific life cycle functions and could not be considered as polysaccharide processing AMGs even though in host cells many do play a role in polysaccharide processing systems. A substantially deeper literature review is needed in this section, which would ultimately eliminate most of these from the potential AMG pools. Further, the simple HMM/BLASTp evalues are not sufficient to support the functional annotation of these genes. At a minimum, catalytic/conserved regions should be identified, secondary structures compared, and phylogenetic analysis (where possible) developed etc... My recommendation is to eliminate this section entirely from the manuscript. 

      Categorically:

      - Glycoside hydrolase (various families), glucosaminidases, and transglycosylase are all very common to phage and operate generally as a lysins, facilitating the release of virions from the host cell upon lysis, or injection of viral DNA upon infection https://doi.org/10.3389/fmicb.2016.00745 (and citations therein) https://doi.org/10.1016/j.cmi.2023.10.018 etc... In order to confirm these as distinct AMGs we would need a very detailed analysis indicating that these are not phage infection cycle/host recognition related, however I strongly suspect that under such interrogation, these would prove to be as such.

      -TonB related systems including ExbB are well studied among phages as part of the trans-location step in infection. These could not be considered as AMGs. https://doi.org/10.1128/JB.00428-19. Other TonB dependent receptors play a role in host recognition.

      -Several phage acetyltransferases play a role in suppressing host RNA polymerase in order to reserve host cell resources for virion production, including polysaccharide production. https://doi.org/10.3390/v12090976. Further it has been shown that the E. coli gene neuO (O-acetyltransferase) is a homologue of lambdoid phage tail fiber genes https://doi.org/10.1073/pnas.0407428102. I suspect the latter is also the case here and this is a tail fiber gene.

      Thanks for your valuable comments. According to your suggestions, we have reanalyzed these AMGs and made some modifications (the new version of Fig. 5A, shown below). Genes encoding proteins associated with polysaccharide transport and degradation may be common only in virulent phages and have never been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis or the injection of viral DNA upon infection; chronic phages, by contrast, do not lyse the host. It has been reported that filamentous phages can recognize and bind to the host pili, which causes the pili to shrink and brings the filamentous phages closer to, and possibly through, the outer membrane of host cells (Riechmann et al., 1997; Sun et al., 1987). Other chronic phages might be released without breaking the host by being enclosed in a lipid membrane and leaving the host cells in a nonlytic manner. It has recently been reported that tailless Caudoviricetes phage particles are enclosed in a lipid membrane and released from the host cells in a nonlytic manner (Liu et al., 2022), and that prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Therefore, the persistence of these genes in chronic phages may be due to their ability to assist the host in metabolizing polysaccharides.

      Finally, according to your suggestions, we have weakened the role of AMGs and added “potential” in front of it.

      References related to this response:

      Riechmann L, Holliger P. (1997) The C-terminal domain of TolA is the coreceptor for filamentous phage infection of E. coli Cell 90:351-60.

      Sun TP, Webster RE. (1987) Nucleotide sequence of a gene cluster involved in entry of E colicins and single-stranded DNA of infecting filamentous bacteriophages into Escherichia coli J Bacteriol 169:2667-74. 

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. (2022) Chronic release of tailless phage particles from Lactococcus lactis Appl Environ Microbiol 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. (2022) Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23 mBio 13:e0237522.

      Comment 19: Line 354. To make this statement that these genes are missing from the host, we would need to know that these genomes are complete.

      Thanks for your comments. The completeness of the genomes of strains WC36 and zth2 was 100%, as checked with CheckM v1.2.2. The genome sizes of strains WC36 and zth2 were 3,660,783 bp and 3,198,720 bp, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in the GenBank database with accession numbers CP085689 and CP071032, respectively. In addition, we collected cells of strains WC36 and zth2 and sent them to another company for whole genome sequencing (named WC36G and ZTH; GenBank accession numbers CP151801 and CP119760, respectively). The completeness of the genomes of strains WC36G and ZTH was also 100%. The genome sizes of strains WC36G and ZTH were 3,660,783 bp and 3,198,714 bp, respectively. Therefore, the genomes of strains WC36 and zth2 are complete and circular.

      Comment 20: Figure 5. Please see https://peerj.com/articles/11447/ and https://doi.org/10.1093/nar/gkaa621 for a detailed discussion on vetting AMGs. Several of these should be eliminated according to the standards set in the field. More specifically, and by anecdotal comparison with other inoviridae genomes, for Phage-WC36-1 and Phage-zth2-1, I am not convinced that the transactional regulator and glycoside hydrolase are a part of the phage genome. The phage genome probably ends at the strand switch.

      Thanks for your comments. According to your suggestions, we have analyzed these two articles carefully and modified the genome of Phage-WC36-1 and Phage-zth2-1 by anecdotal comparison with other inoviridae genomes. As you said, the transactional regulator and glycoside hydrolase are not a part of the phage genome.

      The new version Fig. 5A was shown.

      References related to this response:

      Shaffer, M., Borton, M.A., McGivern, B.B., Zayed, A.A., La Rosa, S.L., Solden, L.M., Liu, P., Narrowe, A.B., Rodríguez-Ramos, J., Bolduc, B., et al. (2020) DRAM for distilling microbial metabolism to automate the curation of microbiome function Nucleic Acids Res 48:8883-8900

      Pratama, A.A., Bolduc, B., Zayed, A.A., Zhong, Z.P., Guo, J., Vik, D.R., Gazitúa, M.C., Wainaina, J.M., Roux, S., and Sullivan, M.B. (2021) Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation PeerJ 9:e11447

      Comment 21: Line 380. This section needs to start with detailed evidence that this phage can even infect this particular strain. Added note, upon further reading the serial dilution cultures are not sufficient to prove these phage infect this Pseudomonas. We need at a minimum a one-step growth curve and wet mount microscopy. It is much more likely that some carry over contaminant is invading the culture and influencing OD600. With the given evidence, I am not at all convinced that these phages have anything to do with Pseudomonas polysaccharide use and I recommend either drastically revising this section or eliminating it entirely.

      Line 386-389. Could this be because you are observing your added phage in the starch enriched media while no phage were introduced with the "other types of media" so none would be observed? This could have nothing to do with infection dynamics. Further, this would also be consistent with your starch solution being contaminated by phage.

      Line 399. Again consistent with the starch media being contaminated.

      Line 401-408. This is more likely to do with the augmentation of the media with an additional carbon source and not involving the phage. 

      Line 410. I am not convinced that these viruses infect the Pseudomonas strain. Extensive further evidence of infection is needed to make these assertions.  Figure 6A. We need confirmation that the isolate culture remains pure and there are no other contaminants introduced with the phage.

      Thanks for your comments. As shown above, the polysaccharides (laminarin/starch) were not contaminated with any phages. We selected many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could assist them in the degradation and utilization of polysaccharides, and found that Phages-WC36 could only promote the growth of strain 273. Filamentous phages and hexagonal phages were detected in the supernatant of strain 273 cultured in basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36. After 3 passages of serial cultivation in basal medium supplemented with 5 g/L starch, filamentous phages and hexagonal phages were still present in the basal medium supplemented with starch, but not in the basal medium alone, which may mean that Phages-WC36 could infect strain 273 and that starch is an important inducer. In addition, the Phages-WC36 used in the growth assay of strain 273 were purified multiple times and finally suspended in SM buffer (0.01% gelatin, 50 mM Tris-HCl, 100 mM NaCl and 10 mM MgSO4). Thus, the phages provided do not contain extracellular enzymes and/or nutrients. We also set up three control groups in the growth assay of strain 273: basal medium, basal medium supplemented with Phages-WC36, and basal medium supplemented with starch. If the Phages-WC36 preparation contained extracellular enzymes and/or nutrients, strain 273 would also grow well in the basal medium supplemented only with Phages-WC36. However, the poor growth of strain 273 in the basal medium supplemented with Phages-WC36 further confirmed that these phage preparations did not contain extracellular enzymes and/or nutrients.

      Finally, a possible mechanism by which chronic phages are released without breaking the host is that they are enclosed in a lipid membrane and released from the host cells in a nonlytic manner. Thus, these chronic phages may have a wider host range. However, we were unable to further disclose the infection mechanism in this paper. Therefore, according to your suggestions, we have deleted this section entirely.

      Comment 27: Line 460. Details about how these genomes were reconstructed is needed here.  

      Thanks for your comments. According to your suggestions, we have added the detailed information about the genome sequencing, annotation, and analysis as “Genome sequencing, annotation, and analysis of strains WC36 and zth2 For genomic sequencing, strains WC36 and zth2 were grown in the liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated by using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). Thereafter, the genome sequencing was carried out with both the Illumina NovaSeq PE150 (San Diego, USA) and Nanopore PromethION platform (Oxford, UK) at the Beijing Novogene Bioinformatics Technology Co., Ltd. A complete description of the library construction, sequencing, and assembly was performed as previously described (Zheng et al., 2021b). We used seven databases to predict gene functions, including Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database databases), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole genome Blast search (E-value less than 1e-5, minimal alignment length percentage larger than 40%) was performed against above seven databases.” in the revised manuscript (Lines 333-351).

      Comment 28: Line 462. Accession list of other taxa in the supplement would help here.  

      Thanks for your comments. The accession numbers of these strains are displayed after the strain names in Figure 1A. According to your suggestions, we have added an accession list of these taxa (Supplementary Table 6) in the revised manuscript.

      Comment 29: Line 463. Is there any literature to support that these are phylogenetically informative genes for Inoviridae?  

      Thanks for your comments. Some studies (Zeng et al., 2021; Evseev et al., 2023) support that these are phylogenetically informative genes for Inoviridae. We have added these references in the revised manuscript.

      References related to this response:

      Zeng, J., Wang, Y., Zhang, J., Yang, S., and Zhang, W. (2021) Multiple novel filamentous phages detected in the cloacal swab samples of birds using viral metagenomics approach Virol J 18:240

      Evseev, P., Bocharova, J., Shagin, D., and Chebotar, I. (2023) Analysis of Pseudomonas aeruginosa isolates from patients with cystic fibrosis revealed novel groups of filamentous bacteriophages. Viruses 15: 2215

      Reviewer #2 (Public Review):

      Summary: This paper investigates virus-host interactions in deep-sea bacteriophage systems which employ a seemingly mutualistic approach to viral replication in which the virus aids host cell polysaccharide import and utilization via metabolic reprogramming. The hypothesis being tested is supported with solid and convincing evidence and the findings are potentially generalizable with implications for our understanding of polysaccharide-mediated virus-host interactions and carbon cycles in marine ecosystems more broadly.

      Thanks for your positive comments.

      Strengths: This paper synthesizes sequencing and phylogenic analyses of two Lentisphaerae bacteria and three phage genomes; electron microscopy imaging of bacterial/phage particles; differential gene expression analyses; differential growth curve analyses, and differential phage proliferation assays to extract insights into whether laminarin and starch can induce both host growth and phage proliferation. The data presented convincingly demonstrate that both host culture density and phage proliferation increase as a result having host, phage, and polysaccharide carbon source together in culture.

      Thanks for your positive comments.  

      Weaknesses (suggestions for improvement): 

      (1) The article would be strengthened by the following additional experiment: providing the phage proteins hypothesized to be aiding host cell growth (red genes from Figure 5...TonB system energizer ExbB, glycosidases, etc) individually or in combination on plasmids rather than within the context of the actual phage itself to see if such additional genes are necessary and sufficient to realize the boosts in host cell growth/saturation levels observed in the presence of the phages tested.

      Thanks for your valuable comments. It is a really good idea to express these polysaccharide-degradation proteins individually or in combination on plasmids to see their effects in the host cell. However, we have not yet succeeded in constructing a genetic and expression system for the strictly anaerobic strain WC36, which hinders further detailed investigation of the functions of these polysaccharide-degradation proteins. In our lab, we are trying our best to build the genetic and expression system for strain WC36. We will definitely test your idea in the future.

      (2) The paper would also benefit from additional experiments focused on determining how the polysaccharide processing, transport, and metabolism genes are being used by the phages to either directly increase viral infection/replication or else to indirectly do so by supporting the growth of the host in a more mutualistic manner (i.e. by improving their ability to import, degrade, and metabolize polysaccharides).  

      Thanks for your valuable comments. Indeed, because the chronic phage genome is not within the chromosome of the host, it is very hard to disclose the exact auxiliary process and mechanism of chronic phages. At present, we are trying to construct a genetic manipulation system for the strictly anaerobic host WC36, and we will gradually reveal this auxiliary mechanism in the future. In addition, in line with Reviewer 1’s suggestions, the revised manuscript now focuses on the finding that polysaccharides induce deep-sea bacteria to release chronic phages, and most of the content on phages assisting host metabolism of polysaccharides has been deleted.

      (3) The introduction would benefit from a discussion of what is known regarding phage and/or viral entry pathways that utilize carbohydrate anchors during host entry. The discussion could also be improved by linking the work presented to the concept of "selfishness" in bacterial systems (see for instance Giljan, G., Brown, S., Lloyd, C.C. et al. Selfish bacteria are active throughout the water column of the ocean. ISME COMMUN. 3, 11 (2023) https://doi.org/10.1038/s43705-023-00219-7). The bacteria under study are gram negative and it was recently demonstrated (https://www.nature.com/articles/ismej201726) that "selfish" bacteria sequester metabolizable polysaccharides in their periplasm to advantage. It is plausible that the phages may be hijacking this "selfishness" mechanism to improve infectivity and ENTRY rather than helping their hosts to grow and proliferate so they can reap the benefits of simply having more hosts to infect. The current work does not clearly distinguish between these two distinct mechanistic possibilities. The paper would be strengthened by at least a more detailed discussion of this possibility as well as the authors' rationale for interpreting their data as they do to favor the "mutualistic" interpretation. In the same light, the paper would benefit from a more careful choice of words which can also help to make such a distinction more clear/evident/intentional. As currently written the authors seem to be actively avoiding giving insights wrt this question.

      Thanks for your valuable comments. According to your suggestions, we have added the related discussion as “Moreover, it was recently demonstrated that selfish bacteria, which were common throughout the water column of the ocean, could bind, partially hydrolyze, and transport polysaccharides into the periplasmic space without loss of hydrolysis products (Reintjes et al., 2017; Giljan et al., 2023). Based on our results, we hypothesized that these chronic phages might also enter the host through this “selfishness” mechanism while assisting the host in metabolizing polysaccharides, thus not lysing the host. On the other hand, these chronic phages might hijack this “selfishness” mechanism to improve their infectivity and entry, rather than helping their hosts to grow and proliferate, so they could reap the benefits of simply having more hosts to infect. In the future, we need to construct a genetic operating system of the strictly anaerobic host strain WC36 to detailedly reveal the relationship between chronic phage and host.” in the revised manuscript (Lines 305-316). 

      References related to this response:

      Reintjes, G., Arnosti, C., Fuchs, B.M., and Amann, R. (2017) An alternative polysaccharide uptake mechanism of marine bacteria ISME J 11:1640-1650

      Giljan, G., Brown, S., Lloyd, C.C., Ghobrial, S., Amann, R., and Arnosti, C. (2023) Selfish bacteria are active throughout the water column of the ocean ISME Commun 3:11

      (4) Finally, I would be interested to know if the author’s sequencing datasets might be used to inform the question raised above by using bacterial immunity systems such as CRISPR/Cas9. For example, if the phage systems studied are truly beneficial/mutualistic for the bacteria then it’s less likely that there would be evidence of targeted immunity against that particular phage that has the beneficial genes that support polysaccharide metabolism.

      Thanks for your comments. According to your suggestions, we have carefully analyzed the genome of strain WC36 and found no CRISPR/Cas9-related genes. Considering our result that the number of chronic phages increased with longer culture time, we speculate that the host might have no targeted immunity against these chronic phages.

      Reviewer #2 (Recommendations For The Authors):

      There are some minor grammatical errors and unclear statements (lines 99-100, 107-109, 163, 222, 223, 249-250, 254) which should also be fixed before final publication. 

      Thanks for your valuable comments. We have fixed these minor grammatical errors and unclear statements in the revised manuscript.

      Lines 99-100: we have modified this description as “For instance, AMGs of marine bacteriophages have been predicted to be involved in photosynthesis (Mann et al., 2003), nitrogen cycling (Ahlgren et al., 2019; Gazitúa et al., 2021), sulfur cycling (Anantharaman et al., 2014; Roux et al., 2016), phosphorus cycling (Zeng and Chisholm, 2012), nucleotide metabolism (Sullivan et al., 2005; Dwivedi et al., 2013; Enav et al., 2014), and almost all central carbon metabolisms in host cells (Hurwitz et al., 2013).” in the revised manuscript (Lines 100-105).

      Lines 107-109: we have modified this description as “However, due to the vast majority of deep-sea microbes cannot be cultivated in the laboratory, most bacteriophages could not be isolated.” in the revised manuscript (Lines 110-111).

      Line 163: we have modified this description as “Based on the growth curve of strain WC36, we found that the growth rate of strictly anaerobic strain WC36 was relatively slow.” in the revised manuscript (Lines 149-151).

      Lines 222-223: we have modified this description as “Regardless of whether the laminarin was present, the bacterial cells kept their cell shape intact, indicating they were still healthy after 30 days” in the revised manuscript (Lines 195-197).

      Lines 249-250: we have modified this description as “However, the entry and exit of the hexagonal phages into the WC36 cells were not observed.” in the revised manuscript (Lines 190-191).

      Line 254: we have modified this description as “To explore whether the production of bacteriophages induced by polysaccharide is an individual case, we further checked the effect of polysaccharides on another cultured deep-sea Lentisphaerae strain zth2.” in the revised manuscript (Lines 213-215).

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the Reviewers and Editors for the constructive comments, which we believe have significantly improved the quality of our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      We thank Reviewer #1 for this pertinent comment and the opportunity to address this issue. A very similar concern was also raised by Reviewer #2. Below we try to clarify the motivations that led us to predict that the hypothesized long-term predictions should manifest at the onset (and not within or at the end) of a perceptual chunk.

      Reviewers #1 and #2 contest a critical assumption of our study, namely that long-term predictions should occur at the beginning of a rhythmic chunk as opposed to its completion. They also contest the prediction deriving from this view, i.e., that omitting the first sound in a perceptual chunk (short for Spanish, long for Basque) would lead to larger error responses than omitting a later element. They suggest an alternative view: the omission of tones at the end of a perceptual rhythmic chunk would evoke larger error responses than omissions at its onset, as subjects are more likely to predict the completion of the chunk than its beginning. This view predicts an interaction effect in the opposite direction of our findings.

      While we acknowledge this as a plausible hypothesis, we believe that the current literature provides strong support for our view. Indeed, many studies in the rhythm and music perception literature have investigated the ERP responses to deviant sounds and omissions placed at different positions within rhythmic patterns (e.g., Ladinig et al., 2009; Bouwer et al., 2016; Brochard et al., 2003; Potter et al., 2009; Yabe et al., 2001). For instance, Ladinig et al. (2009) presented participants with metrical rhythmical sound sequences composed of eight tones. In some deviant sequences, the first or a later tone was omitted. They found that earlier omissions elicited earlier and higher-amplitude MMN responses than later omissions (irrespective of attention). Overall, this and other studies showed that the amplitude of ERP responses is larger when deviants occur at positions that are expected to be the “start” of a perceptual group - “on the beat” in musical terms - and declines toward the end of the chunk. According to some of these studies, the first element of a chunk is particularly important to track the boundaries of temporal sequences, which is why more predictive resources are invested at that position. We believe that this body of evidence provides a robust basis for our hypotheses and the directionality of our predictions.

      An additional point that should be considered concerns the amplitude of the prediction error response elicited by the omission. From a predictive coding perspective, the omission of the onset of a chunk should elicit larger error responses because the system is expecting the whole chunk (i.e., two tones/more acoustic information). On the other hand, the omission of the second tone - in the transition between two tones within the chunk - should elicit a smaller error response because the system is expecting only the missing tone (i.e. less acoustic information). 

      Given the importance of these points, we have now included them in the updated version of the paper, in which we try to better clarify the rationale behind our hypothesis (see Introduction section, around the 10th paragraph).

      (2) The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      We thank the Reviewer for this comment. As mentioned in the provisional response, the approach employed to identify the presence of an interaction effect was conservative: we used a non-parametric test on combined gradiometer data, without making a priori assumptions about the location of the effect, and employed small cluster thresholds (cfg.clusteralpha = 0.05) to increase the chances of detecting highly localized clusters with large effect sizes. The fact that the interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. It should also be considered that in the present analyses we focused on planar gradiometer data, which, compared to magnetometers and axial gradiometers, offer finer spatial resolution and are better suited to picking up relatively small effects.

      The partial overlap of the cluster with the activation peaks may simply reflect the fact that different sources contribute to the generation of the omission-MMN, which has been reported in several studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).  We value the Reviewer’s input and are grateful for the opportunity to address these considerations.

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      We thank the Reviewer for the comment and appreciate the opportunity to address these concerns. We have re-evaluated the boxplot in Figure 2E and want to clarify that the two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to the standard Tukey’s rule. As shown in the figure below, no participant fell outside the upper (Q3 + 1.5 × IQR) and lower (Q1 - 1.5 × IQR) whiskers of the boxplot.
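
      As a minimal sketch of the Tukey rule referred to in this response, the following Python snippet computes the two fences and flags any value outside them. The function names and the amplitude values are hypothetical and serve only as an illustration; they are not the study's data. Quartile conventions also differ between software packages, so the "inclusive" method used here is an assumption.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the full population of interest;
    # other quartile conventions can shift the fences slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical amplitudes (arbitrary units), for illustration only.
amplitudes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(tukey_fences(amplitudes))
print(tukey_outliers(amplitudes))
print(tukey_outliers(amplitudes + [20.0]))
```

      Under this rule a participant can sit visibly apart from the rest of the group and still fall inside the fences, which is the distinction drawn in the response above.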

      Moreover, we believe that the presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels (see Fig. 1 C; Supplementary Fig. 2 A). 

      Based on these considerations - and along with the evidence collected in the control study and the source reconstruction data reported in the new version of the manuscript - we find it unlikely that the interaction effect is driven by outliers or by a main effect of omission type. We appreciate the opportunity provided by the Reviewer to address these concerns, as we believe they strengthen the claim that the observed effect is driven by the hypothesized long-term linguistic priors rather than uncontrolled group differences.

      Author response image 1.

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      We appreciate the Reviewer’s suggestion to incorporate more comprehensive source analyses. In the new version of the paper, we perform new analyses on the source data using a new atlas with more fine-grained parcellations of the regions of interest (ROIs) (Brainnetome atlas; Fan et al., 2016) and focusing on peak activity to increase the sensitivity of the response in space and time. We therefore invite the Reviewer to read the updated part on source reconstruction included in the Results and Methods sections of the paper.

      Reviewer #1 (Recommendations For The Authors):

      While I have described my biggest concerns with respect to this work in the public review, here I list more specific points that I hope will help to improve the manuscript. Some of these are very minor, but I hope you will still find them constructive. 

      (1) I understand the difficulties implied in recruiting subjects from two different linguistic groups, but with 20 subjects per group and a between-groups design, the current study is somewhat underpowered. A post-hoc power analysis shows an achieved power of 46% for medium effect sizes (d = 0.5, and alpha = 0.05, one-sided test). A sensitivity analysis shows that the experiment only has 80% power for effect sizes of d = 0.8 and above. It would be important to acknowledge this limitation in the manuscript. 

      We thank the Reviewer for reporting these analyses. It must be noted that our effect of interest was based on Molnar et al.’s (2016) behavioral experiment, in which a sample size of 16 subjects per group was sufficient to detect the perceptual grouping effect. In Yoshida et al. (2010), the perceptual grouping effect emerged with two groups of 20 7–8-month-old Japanese- and English-learning infants. Based on these previous findings, we believe that a sample size of 20 participants per group can be considered appropriate for the current MEG study. We clarified these aspects in the Participants section of the manuscript, in which we specified that previous behavioral studies detected the perceptual grouping effect with similar sample sizes. Moreover, to acknowledge the limitation highlighted by the Reviewer, we also include the power and sensitivity analysis in a note in the same section (see note 2 in the Participants section).
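
      For readers who wish to reproduce the order of magnitude of the power and sensitivity figures quoted by the Reviewer, the sketch below uses a normal approximation to a one-sided, two-sample comparison with equal group sizes. This is an assumption made for illustration: the Reviewer's values presumably come from an exact t-based calculation, so the approximation lands close to, but not exactly on, the reported 46% and d = 0.8. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    # Expected test statistic for effect size d with equal group sizes.
    noncentrality = d * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def approx_detectable_d(power, n_per_group, alpha=0.05):
    """Approximate smallest effect size detectable with the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) / sqrt(n_per_group / 2)


print(round(approx_power(0.5, 20), 2))
print(round(approx_detectable_d(0.80, 20), 2))
```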

      (2) All the line plots in the manuscript could be made much more informative by adding 95% CI bars. For example, in Figure 4A, the omission response for the long tone departs from the one for the short tone very early. Adding CIs would help to assess the magnitude of that early difference. Error bars are present in Figure 3, but it is not specified what these bars represent. 

      Thanks for the comments. We added the explanation of the error bars in the new version of Figure 3. For the remaining figures, we prefer maintaining the current version of the ERF, as the box-plots accompanying them provide information about the distribution of the effect across participants.

      (3) In the source analysis, there is only mention of an interaction trend in the left auditory cortex, but no statistics are presented. If the authors prefer to mention such a trend, I think it would be important to provide its stats to allow the reader to assess its relevance. 

      We performed new analyses on the source data, all reported in the updated version of the manuscript.

      (4) In the discussion section, the authors refer to the source analysis and state that "the interaction is evident in the left". But if only a statistical trend was observed, this statement would be misleading. 

      We agree with this comment. We invite the Reviewer to check the new part on source reconstruction, in which we perform contrasts going in the same direction as the sensor-level data.

      (5) In the discussion the authors argue that "This result highlights the presence of two distinct systems for the generation of auditory" that operate at different temporal scales, but the current work doesn't offer evidence for the existence of two different systems. The effects of long-term priors and short-term priors presented here are not dissociated and instead sum up. It remains possible that a single system is in place, collecting statistics of stimuli over a lifetime, including the statistics experienced during the experiment. 

      Thanks for pointing that out. We changed the sentence above as follows: “This result highlights the presence of an active predictive system that relies on natural sound statistics learned over a lifetime to process incoming auditory input”.

      (6) In the discussion, the authors acknowledge that the omission response has been interpreted both as pure prediction and as pure prediction error. Then they declare that "Overall, these findings are consistent with the idea that omission responses reflect, at least in part, prediction error signals.". However an argument for this statement is not provided. 

      Thanks for pointing out this lack of argument. In the new version of the manuscript, we explained our rationale as follows: “Since sensory predictive signals primarily arise in the same regions as the actual input, the activation of a broader network of regions in omission responses compared to tones suggests that omission responses reflect, at least in part, prediction error signals”.

      (7) In the discussion the authors present an alternative explanation in which both groups might devote more resources to the processing of long events, because these are relevant content words. Following this, they argue that "Independently on the interpretation, the lack of a main effect of omission type in the control condition suggests that the long omission effect is driven by experience with the native language." However as there was no manipulation of duration in the control experiment, a lack of the main effect of omission type there does not rule out the alternative explanation that the authors put forward. 

      This is correct; thanks for noticing it. We removed the sentence above to avoid ambiguities.

      Minor points: 

      (8) The scale of the y-axis in Figure 2C might be wrong, as it goes from 9 to 11 and then to 12. If the scale is linear, the top value should be 13, or the bottom value should be 10. 

      Figure 2C has been modified accordingly, thanks for noticing the error.

      (9) There is a very long paragraph starting on page 7 and ending on page 8. Toward the end of the paragraph, the analysis of the control condition is presented. That could start a new paragraph.

      Thanks for the suggestion. We modified the manuscript as suggested.

      Reviewer #2 (Public Review):

      (1) Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Thanks for the suggestion. A similar point was also advanced by Reviewer 1. In general, we believe our work speaks to the predictive nature of such experience-dependent effects and shows that these linguistic priors shape sensory processes at very early stages. This is discussed in the sixth and seventh paragraphs of the Discussion section. In the new version of the article, we modified some statements and tried to make them more coherent with the scope of the present work. For instance, we replaced “This result highlights the presence of two distinct systems for the generation of auditory predictive models, one relying on the transition probabilities governing the recent past, and another relying on natural sound statistics learned over a lifetime” with “This result highlights the presence of an active predictive system that relies on natural sound statistics learned over a lifetime to process incoming auditory input”.

      (2) Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      A similar point was advanced by Reviewer #1. We tried to clarify the rationale behind our hypothesis. Please refer to the response provided to the first comment of Reviewer #1 above.

      (3) In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      We thank the Reviewer for this comment. We explored in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed a very interesting hypothesis. In the phase analyses reported below we focused on the instantaneous phase angle time locked to the onset of short and long tones presented in the experiment.

      In short, we extracted time intervals of two seconds centered on the onset of the tones for each participant (~200 trials per condition) and, using a wavelet transform (implemented in FieldTrip ft_freqanalysis), we targeted the 0.92 Hz frequency that corresponds to the presentation rate of our pairs of tones. We extracted the phase angle for each time point and, using the circular statistics toolbox implemented in MATLAB, we computed the Rayleigh z-scores across the whole sensor space for each tone (long and short) and group (Spanish (Spa) dominants and Basque (Eus) dominants). This method evaluates the instantaneous phase clustering at a specific time point, thus assessing the presence of a specific oscillatory pattern at the onset of each tone.
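
      To make the clustering measure concrete, the sketch below computes a Rayleigh z-score from a set of phase angles using the common definition z = n·R², where R is the mean resultant length. It is a plain-Python illustration under that assumption and is not the MATLAB toolbox code used for the analyses; the function names and phase values are hypothetical.

```python
import cmath
import math


def mean_resultant_length(phases):
    """Length of the average unit vector for phase angles given in radians."""
    n = len(phases)
    resultant = sum(cmath.exp(1j * p) for p in phases) / n
    return abs(resultant)


def rayleigh_z(phases):
    """Rayleigh z statistic, z = n * R**2 (larger means tighter clustering)."""
    return len(phases) * mean_resultant_length(phases) ** 2


# Hypothetical single-trial phases at tone onset, for illustration only.
clustered = [0.3] * 10                                   # identical phases
uniform = [2 * math.pi * k / 8 for k in range(8)]        # evenly spread phases

print(rayleigh_z(clustered))  # close to the number of trials
print(rayleigh_z(uniform))    # close to zero
```

      With perfectly aligned phases z approaches the number of trials, whereas phases spread evenly around the circle give a z near zero.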

      Author response image 2.

      Here we observe that the phase clustering was stronger in the right sensors for both groups. The critical point is to evaluate the phase angle (in radians) for the two groups and the two tones and to see whether there are statistical differences. We focused first on the sensor with the highest clustering (right temporal MEG1323) and observed very similar phase angles for the two groups for both long and short tones (see image below). We then focused on the four left fronto-temporal sensor pairs that showed the significant interaction: here we observed one sensor (MEG0412) with different effects for the two groups (the group-by-tone interaction was significant, p=0.02): for short tones the “Watson (1961) approximation U2 test” showed a p-value of 0.11, while for long tones the p-value was 0.03 (after correction for multiple comparisons).

      Overall, the present findings suggest a tendency for the two groups to phase-align differently to long and short tones over left fronto-temporal sensors. However, the effect could be detected in only one gradiometer sensor and was not statistically robust. The effect in the right hemisphere was statistically more robust, but it was not sensitive to group language dominance.

      Due to the inconclusive nature of these analyses regarding the role of language experience in shaping the phase alignment to rhythmic sound sequences, we prefer to keep these results in the public review rather than incorporating them in the article. Nonetheless, we believe that this decision does not undermine the main finding that the group differences in the MMN amplitude are driven by long-term predictions – especially in light of the many studies indicating the MMN as a putative index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). Moreover, as suggested in the preliminary reply, although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that these phenomena are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and long-term prediction errors are not mutually exclusive.

      Author response image 3.

      (4) Source localization is performed on sensor-level significant data. The lack of source-level statistics weakens the conclusions that can be extracted. Furthermore, only the source reflecting the interaction pattern is taken into account in detail as supporting their hypotheses, overlooking other sources. Also, the right IFG source activity is not depicted, but looking at whole brain maps seems even stronger than the left. To sum up, source localization data, as informative as it could be, does not strongly support the authors' claims in its current state.

      A similar comment was also advanced by Reviewer #1 (comment 2). We appreciate the suggestion to incorporate more comprehensive source analyses. In the new version of the paper, we perform new analyses on the source data using a new atlas with more fine-grained parcellations of the ROIs, and focusing on peak activity to increase the sensitivity of the response in space and time. We therefore invite the Reviewer to read the updated part on source reconstruction included in the Results and Methods sections of the paper.

      In the article, we report only the source reconstruction data from ROIs in the left hemisphere, because it is there that the interaction effect arises at the sensor level. However, we also explored the homologous regions in the right hemisphere, as requested by the Reviewer. A cluster-based permutation test focusing on the interaction between language group and omission type was performed on both the right STG and IFG data. No significant interaction emerged in either of these regions. Below is a plot of the source activity time series over ROIs in the right STG and IFG.

      Author response image 4.

      Reviewer #2 (Recommendations For The Authors):

      In this set of private recommendations for the authors, I will outline a couple of minor comments and try to encourage additional data analyses that, in my opinion, would strengthen the evidence provided by the study. 

      (1) As I noted in the public review, I believe an oscillatory analysis of the data would, on one hand, provide stronger support for the behavioral effect of rhythmic perceptual organization given the lack of behavioral direct evidence; and, on the other hand, provide evidence (to be discussed if so) for a role of entrained oscillation phase in explaining the different pattern of omission responses. One analysis the authors could try is to measure the phase angle of an oscillation, the frequency of which relates to the length of the binary pattern, at the onset of short and long tones, separately, and compare it across groups. Also, single trials of omission responses could be sorted according to that phase. 

      Thanks for the suggestion. Please see phase analyses reported above.

      (2) I wonder why source activity for the right IFG was not shown. I urge the authors to provide and discuss a more complete picture of the source activity found. Given the lack of source statistics (which could be performed), I find it a must to give an overall view. I find it so because I believe the distinction between perceptual grouping effects due to inherent acoustic differences across languages or semantic differences is so interesting. 

      Thanks again for the invitation to provide a more complete picture of the source activity data. As mentioned in the response above, we invite the Reviewer to read the new related part included in the Results and Methods sections of the paper. In our updated source reconstruction analysis, we find that some regions around the left STG show a pattern that resembles the one found at the sensor-level, providing further support for the “acoustic” (rather than syntactic/semantic) nature of the effect. 

      We did not report ROI analysis on the right hemisphere because the interaction effect at sensor level emerged on the left hemisphere. Yet, we included a summary of this analysis in the public response above. 

      (3) Related to this, I have to acknowledge I had to read the whole Molnar et al. (2016) study to find the only evidence so far that, acoustically, in terms of sound duration, Basque and Spanish differ. This was hypothesized before but only at Molnar, an acoustic analysis is performed. I think this is key, and the authors should give it a deeper account in their manuscript. I spend my review of this study thinking, well, but when we speak we actually bind together different words and the syllabic structure does not need to reflect the written one, so maybe the effect is due to a high-level statistical prior related to the content of the words... but Molnar showed me that actually, acoustically, there's a difference in accent and duration: "Taken together, Experiments 1a and 1b show that Basque and Spanish exhibit the predicted differences in terms of the position of prosodic prominence in their phonological phrases (Basque: trochaic, Spanish: iambic), even though the acoustic realization of this prominence involves not only intensity in Basque but duration, as well. Spanish, as predicted, only uses duration as a cue to mark phrasal prosody." 

      Thanks for the suggestion, the distinction in terms of sound duration in Spanish and Basque reported by Molnar is indeed very relevant for the current study. 

      We added a few sentences to highlight the acoustic analysis by Molnar and the consequent acoustic nature of the reported effect.

      In the introduction: “Specifically, the effect has been proposed to depend on the quasiperiodic alternation of short and long auditory events in the speech signal – reported in previous acoustic analyses (Molnar et al., 2016) – which reflect the linearization of function words (e.g., articles, prepositions) and content words (e.g., nouns, adjectives, verbs).”

      In the discussion, paragraph 3, we changed “We hypothesized that this effect is linked to a long-term “duration prior” originating from the syntactic function-content word order of language, and specifically, from its acoustic consequences on the prosodic structure” with “We hypothesized that this effect is linked to a long-term “duration prior” originating from the acoustic properties of the two languages, specifically from the alternation of short and long auditory events in their prosody”.

      In the discussion, end of paragraph eight: “The reconstruction of cortical sources associated with the omission of short and long tones in the two groups showed that an interaction effect mirroring the one at the sensor level was present in the left STG, but not in the left IFG (fig. 3, B, C, D). Pairwise comparisons within different ROIs of the left STG indicated that the interaction effect was stronger over primary (BA 41/42) rather than associative (BA 22) portions of the auditory cortex. Overall, these results suggest that the “duration prior” is linked to the acoustic properties of a given language rather than its syntactic configurations”.

      Now, some minor comments: 

      (1) Where did the experiments take place? Were they in accordance with the Declaration of Helsinki? Did participants give informed consent? 

      All the requested information has been added to the updated version of the manuscript. Thanks for pointing this out.

      (2) The fixed interval should be called inter-stimulus interval. 

      Thanks for pointing this out. We changed the wording as suggested.

      (3) The authors state that "Omission responses allow to examine the presence of putative error signals decoupled from bottom-up sensory input, offering a critical test for predictive coding (Walsh et al 2020, Heilbron and Chait, 2018).". However the way omission responses are computed in their study is by subtracting the activity from the previous tone. This necessarily means that in the omission activity analyzed, there's bottom-up sensory input activity. As performing another experiment with a control condition in which a sequence of randomly presented tones with different durations to compare directly the omission activity in both sequences (experimental and control) is possibly too demanding, I at least urge the authors to incorporate the fact that their omission responses do reflect also tone activity. And consider, for future experiments, the inclusion of further control conditions. 

      Thanks for the opportunity to clarify this aspect. Actually, the way we computed the omission MMN is not by subtracting the activity of the previous tone from the omission, but by subtracting the activity of randomly selected tones across the whole experiment. That is, we randomly selected around 120 long and short tones (i.e., about the same number as the omissions); we computed the ERF for the long and short tones; we subtracted these ERF from the ERF of the corresponding short and long omissions. We clarified these aspects in both the Materials and Methods (ERF analysis paragraph) and Results section.

      Moreover, the subtraction strategy - which is the standard approach to calculating the MMN - allows us to handle possible neural carryover effects arising from the perception of the tone preceding the omission.

      The sentence "Omission responses allow to examine the presence of putative error signals decoupled from bottom-up sensory input, offering a critical test for predictive coding (Walsh et al 2020, Heilbron and Chait, 2018)." simply refers to the fact that the error responses resulting from an omission are purely endogenous, as omissions are just the absence of an expected input (i.e., silence). On the other hand, when a predicted sequence of tones is disrupted by an auditory deviant (e.g., a tone with a different pitch or duration than the expected one), the resulting error response is not purely endogenous, but partially includes the response to the acoustic properties of the deviant.
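
      The subtraction described in this response can be sketched as follows: tone trials are drawn at random, averaged into an ERF, and that ERF is subtracted from the ERF of the omission trials. This is a simplified single-sensor illustration in plain Python; the function names, the fixed random seed, and the toy trial values are assumptions introduced here and do not reproduce the actual analysis pipeline.

```python
import random


def average_erf(trials):
    """Average equally long single-trial time courses sample by sample."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def omission_mmn(omission_trials, tone_trials, n_tones, seed=0):
    """ERF of omissions minus ERF of a random subset of tone trials."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    selected_tones = rng.sample(tone_trials, n_tones)
    omission_erf = average_erf(omission_trials)
    tone_erf = average_erf(selected_tones)
    return [o - t for o, t in zip(omission_erf, tone_erf)]


# Toy data: two omission trials and five identical tone trials, three samples each.
toy_omissions = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
toy_tones = [[1.0, 1.0, 1.0] for _ in range(5)]

print(omission_mmn(toy_omissions, toy_tones, n_tones=3))
```

      Selecting roughly as many tone trials as there are omissions keeps the two averages comparable in noise level, which is the reason given above for matching the trial counts.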

      (4) When multiple clusters emerged from a comparison, only the most significant cluster was reported. Why? 

      We found more than one significant cluster only in the comparison between pure omissions vs tones (figure 2 A, B). The additional significant cluster from this comparison is associated with a P-value of 0.04, emerges slightly earlier in time, and goes in the same direction as the cluster reported in the paper i.e., larger ERF responses for omission vs tones. We added a note specifying the presence of this second cluster, along with a figure on the supplementary material (Supplementary Fig. 1 A, B).

      (5) Fig 2, if ERFs are baseline corrected -50 to 0ms, why do the plots show pre-stimulus amplitudes not centered at 0? 

      This is because we combined the latitudinal and longitudinal gradiometers on the ERF obtained after baseline correction, by computing the root mean square of the signals at each sensor position (see also  https://www.fieldtriptoolbox.org/example/combineplanar_pipelineorder/). This information is reported in the methods part of the article.
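
      The effect described in this response can be illustrated with a small sketch. It assumes that the two planar gradiometer signals at a sensor position are combined as the square root of the sum of their squares, so that the combined signal is never negative; this is an assumption made for illustration, and the function names and sample values are hypothetical. Each component has a zero-mean baseline, yet the combined baseline sits above zero.

```python
from math import hypot


def baseline_correct(signal, n_baseline):
    """Subtract the mean of the first n_baseline samples from the signal."""
    baseline = sum(signal[:n_baseline]) / n_baseline
    return [x - baseline for x in signal]


def combine_planar(horizontal, vertical):
    """Combine two gradiometer time courses sample by sample (always >= 0)."""
    return [hypot(h, v) for h, v in zip(horizontal, vertical)]


# Hypothetical gradiometer samples; the first two samples are the baseline.
horizontal = baseline_correct([1.0, -1.0, 3.0], n_baseline=2)
vertical = baseline_correct([-2.0, 2.0, 4.0], n_baseline=2)

combined = combine_planar(horizontal, vertical)
baseline_mean = sum(combined[:2]) / 2

print(combined)
print(baseline_mean)  # positive even though each component's baseline mean is zero
```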

      (6) Fig 2, add units to color bars. 

      Sure.

      (7) Fig 2 F and G, put colorbar scale the same for all topographies. 

      Sure, thanks for pointing this out.

      (8) The interaction effect language (Spanish; Basque) X omission type (short; long) appears only in a small cluster of 4 sensors not located at the locations with larger amplitudes to omissions. Authors report it as left frontotemporal, but it seems to me frontocentral with a slight left lateralization.

      (1) The fact that the cluster reflecting the interaction effect does not overlap with the peaks of activity is not surprising in our view. Many sources contribute to the generation of the MMN. The goal of our work was to establish whether there is also evidence for a long-term system (among the many) contributing to this. That is why we perform a first analysis on the whole omission response network (likely including many sources and predictive/attentional systems), and then we zoom in and focus on our hypothesized interaction. We never claim that the main source underlying the omission-MMN is the long-term predictive system.

      (2) The exact location of those sensors is at the periphery of the left-hemisphere omission response, which mainly reflects activity from the left temporal regions. The sensor location of this cluster could be influenced by multiple factors, including (i) the direction of the source dipoles determining an effect; (ii) the combination of multiple sources contributing to the activity measured at a specific sensor location, whose unmixing could be solved only with a beamforming source approach. Based on all the evidence we collected, including the source analyses, we concluded that the major contributors to the sensor-level interaction emerge from both frontal and temporal regions.

      Reviewer #3 (Public Review):

      (1) The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01>P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      We appreciate the positive feedback from Reviewer #3. We agree that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Below are a few comments concerning the weakness highlighted: 

      (i) Concerning the sample size: a similar point was raised by Reviewer #1. We report our reply as presented above: “Although a sample size of 20 participants per group can be considered relatively small for detecting an effect in a between-group design, it must be noted that our effect of interest was based on Molnar et al.’s (2016) experiment, where a sample size of 16 subjects per group was sufficient to detect the perceptual grouping effect. In Yoshida et al., 2010, the perceptual grouping effect arose with two groups of 20 7–8-month-old Japanese- and English-learning infants. Based on these findings, we believe that a sample size of 20 participants per group can be considered appropriate for the current study”. We clarified these aspects in the new version of the manuscript.

      (ii) We believe that the lack of behavioral data does not undermine the main findings of this study, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen 2008; Yoshida et al., 2010; Molnar et al. 2014; Molnar et al. 2016). As highlighted by Reviewer #2, having Spanish and Basque dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it”.

      (iii) Regarding the fact that the “crucial effects are all mostly in the .01>P<.05 range”: we want to stress that the approach we used to detect the interaction effect was conservative, using a cluster-based permutation approach with no a priori assumptions about the location of the effect. The robustness of our approach has also been highlighted by Reviewer 2: “Data analyses. Sound, state-of-the-art methodology in the event-related field analyses at the sensor level.” In sum, despite some crucial effects being in the .01>P<.05 range, we believe that the statistical soundness of our analysis, combined with the lack of effect in the control condition, provides compelling evidence for our H1.

      Reviewer #3 (Recommendations For The Authors):

      Figures - Recommend converting all diagrams and plots to vector images to ensure they remain clear when zoomed in the PDF format. 

      Sure, thanks. 

      Figure 1: To improve clarity, the representation of sound durations in panels C and D should be revisited. The use of quavers/eighth notes can be confusing for those familiar with musical notation, as they imply isochrony. If printed in black and white, colour distinctions may be lost, making it difficult to discern the different durations. A more universal representation, such as spectrograms, might be more effective. 

      Thanks for the suggestion. It’s true that the quavers/eighth notes might be confusing in that respect. However, we consider this notation a relatively standard way to define paradigms in auditory neuroscience; see, for instance, the two papers below. In the new version of the manuscript, we specified in the figure caption that the notes refer to individual tones, in order to avoid ambiguities.

      - Wacongne, C., Labyt, E., Van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754-20759.

      - Dehaene, S., Meyniel, F., Wacongne, C., Wang, L., & Pallier, C. (2015). The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees. Neuron, 88(1), 2-19.

      Figure 2 : In panel C of Figure 2, please include the exact p-value for the interaction observed. Refrain from using asterisks or "n.s." and opt for exact p-values throughout for the sake of clarity. 

      Thank you for your suggestion. We have included the exact p-value for the interaction in panel C of Figure 2. However, for the remaining figures, we have chosen to maintain the use of asterisks and "n.s.". We would like our pictures to convey the key findings concisely, while the numerical details can be found in the article text. The caption below the image also provides guidance on the interpretation of the p-values: (statistical significance: **p < 0.01, *p < 0.05, and ns p > 0.05).  

      Figure 3 Note typo "Omission reponse"

      Fixed. Thanks for noticing the typo. 

      A note: we moved the figure reflecting the main effect of long tone omission and the lack of main effect of language background (Figure 4 in the previous manuscript) to the supplementary material (Supplementary Figure 2).

      References

      Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex?. Neuroscience, 389, 54-73.

      Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

      Ladinig, O., Honing, H., Háden, G., & Winkler, I. (2009). Probing attentive and preattentive emergent meter in adult listeners without extensive music training. Music Perception, 26(4), 377-386. 

      Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological Science, 14(4), 362-366.

      Potter, D. D., Fenwick, M., Abecasis, D., & Brochard, R. (2009). Perceiving rhythm where none exists: Event-related potential (ERP) correlates of subjective accenting. Cortex, 45(1), 103-109.

      Bouwer, F. L., Werner, C. M., Knetemann, M., & Honing, H. (2016). Disentangling beat perception from sequential learning and examining the influence of attention and musical abilities on ERP responses to rhythm. Neuropsychologia, 85, 80-90.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Despite the breadth of information presented, however, further quantification of results and explanation of experimental approaches would be needed to support some of the authors' claims. Additionally, a more thorough discussion is needed to contextualize their findings relative to previous work.

      (1) a. Important quantification is lacking for the data presented. For example, multiple figures include immunohistochemistry or immunocytochemistry data (Figures 1, 5, 6), but they are presented without accompanying measures like fractions of cells labeled or comparisons against controls.

      We would like to clarify that the immunohistochemistry or immunocytochemistry data presented are meant to be qualitative rather than quantitative. The main purpose of the images is to show the presence or absence of markers of OEC subtypes rather than how much is present. That being said, in the revision we now add quantitative estimates of cell fractions for OECs along with other major cell types in Supplemental Table 1 and each OEC subtype marker in Supplemental Table 2. 

      b. As a result, for axons projecting via OEC bridges in Figure 1, it is unclear how common these bridges are in the presence or absence of OECs.

      We note that bridges of axons crossing the lesion core are extremely rare following a severe spinal cord injury in adult mammals, including spinal cord transected rats. Our first example of axon bridging following a complete spinal cord transection and OEC transplantation was reported in Thornton et al. (2018) and compared to an incomplete transection in a fibroblast-transplanted control in Figure 4 of that paper. That figure also appeared on the cover of Experimental Neurology when the paper was published. Figure 1 in the current paper comes from an independent experiment that replicated the previously observed rare bridge formation. We noted this in the revised manuscript.

      Page 6: “We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals.”

      c. For Figure 6., it is unclear whether cells having an alternative OEC morphology coincide with progenitor OEC subtype marker genes to a statistically significant degree. (see top paragraph on page 11)

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other, an astrocyte. The purpose of Figure 6 is to determine if there is a link between our OEC subtypes based on scRNA-seq and those previously described based on morphology alone (Franceschini and Barnett, 1996). There could be agreement between the morphology of OECs (large and flat, or small and fusiform) and their progenitor status, but it is not required that the two classification types significantly overlap. Here we report the percentage of morphology-based cell subtypes that show expression of our OEC subtype markers to estimate the overlap between the two. Our results indicate the two types of OEC morphologies share a certain degree of overlap, a finding that indicates similarities as well as differences between the two classification methods.

      In our results section we show that ~3/4ths of the Ki67-expressing OEC progenitor cells sampled were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remaining ~1/4th of the Ki67-labeled  OECs were fusiform in shape and expressed Ngfr<sup>p75</sup> strongly. We feel that this is important to include as it is the only previous report of OB-OEC subtypes. The statistics of these results were in our original manuscript on page 11 and we further revise the text as follows:

      Page 12: “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large percentage (~3/4ths) of proliferative OECs are characterized by large, flat morphology and weak Ngfr<sup>p75</sup> expression resembling the previously described morphology-based astrocyte-like subtype. Our results indicate the two types of OEC classifications share a certain degree of overlap, indicating similarities but also differences between the two classification methods.”

      d. Similar quantification is missing in other types of data such as Western blot images (Fig. 9) and OEC marker gene data (for which p-values are not reported; Table S2). 

      Response on Western blots: The Western blot signals shown in Figure 9 are from experiments that were designed to be qualitative rather than quantitative, addressing the question, “Can we detect Reelin signals or not in the different samples?” Both Western blots show that Reln<sup>+/+</sup> mouse olfactory bulbs (d) or cortices (e) contain Reelin whereas Reln<sup>-/-</sup> samples do not, and therefore provide positive and negative controls, respectively. The rat olfactory nerve layer (ONL, laminae I-II of olfactory bulb, d lane 1; e lane 3) contains mainly OECs wrapped around the axons of the olfactory sensory neurons that transmit olfactory signals into the olfactory bulb. To address your request for quantification, Dr. Khankan measured the density of the three isoforms of Reelin (400 kD, 300 kD and 180 kD) in Fig. 9e and normalized them against the GAPDH control (37 kD). The graph below shows the normalized band density in arbitrary units on the Y-axis for the first 3 conditions, i.e., Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cerebral cortices and rat Reln<sup>+/+</sup> ONL. Because the conditioned medium was collected from tissue culture medium rather than cells or tissue, the GAPDH control was not present and therefore these data cannot be normalized in a similar analysis.

      Author response image 1.
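The normalization described in the Western blot response amounts to dividing each Reelin band density by the GAPDH density measured in the same lane. A minimal Python sketch follows; the function name and all density values are hypothetical and are not measurements from the study.

```python
def normalize_to_loading_control(band_densities, control_density):
    """Divide each band density by the loading-control density of the same lane."""
    if control_density is None or control_density <= 0:
        # A lane without a loading-control signal cannot be normalized.
        raise ValueError("loading-control density must be a positive number")
    return {band: density / control_density
            for band, density in band_densities.items()}


# Hypothetical densitometry readings in arbitrary units (not data from the study).
lane = {"Reelin 400 kD": 1200.0, "Reelin 300 kD": 800.0, "Reelin 180 kD": 400.0}
gapdh = 1600.0
normalized = normalize_to_loading_control(lane, gapdh)
```

Passing `None` as the control density raises an error in this sketch, mirroring the situation of the conditioned medium, where no GAPDH band is available.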

      Response for OEC marker gene data: We now add new full supplementary Table S1 (for major cell types) and Table S2 (for OEC subtypes) to report statistical p values and adjusted p values, as well as additional statistical information including the percentage of cells expressing a subtype marker in a given subtype versus in other subtypes.
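The two percentage columns mentioned for the supplementary tables can be illustrated with a short Python sketch. It assumes that a cell counts as expressing a gene when its value is greater than zero; that detection rule, the function name, and the toy values are assumptions rather than details taken from the manuscript.

```python
def percent_expressing(expression, labels, target):
    """Return (% of cells in `target`, % of all other cells) with expression > 0."""
    inside = [x > 0 for x, lab in zip(expression, labels) if lab == target]
    outside = [x > 0 for x, lab in zip(expression, labels) if lab != target]

    def pct(flags):
        return 100.0 * sum(flags) / len(flags) if flags else float("nan")

    return pct(inside), pct(outside)


# Toy expression values for one marker gene across eight cells (hypothetical).
expression = [2, 0, 1, 3, 0, 0, 1, 0]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
in_subtype, in_others = percent_expressing(expression, labels, "A")
```

A marker is most informative when the first percentage is high and the second is low, which is the comparison these columns let a reader make.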

      e. The addition of quantitative measures and, where appropriate, statistical comparisons with p-values or other significance measures, would be important for supporting the authors' claims and more rigorously conveying the results.

      As detailed in the above responses, we now add quantifications and statistics to support the claims and enhance the rigor of our analysis.

      (2) a. Some aspects of the experimental design that are relevant to the interpretation of the results are not explained. For example, OECs appear to be collected from only female rats, but the potential implications of this factor are not discussed.

      We added a short explanation in the Discussion and Methods section regarding why spinal cord injury studies are carried out on female rats.

      Page 24, Discussion: “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies prefer females as their short urethra facilitates daily manual bladder expression. Our study, therefore, was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation.”

      Page 26, Methods: “Only females were used in order to match the sex of previous SCI studies conducted exclusively on female rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018). Following complete thoracic spinal cord transection, an adult rat is unable to urinate voluntarily and therefore urine must be manually “expressed” twice a day throughout the experiment. Females have a shorter urethra than males, and thus their bladders are easier to empty completely.”

      b. Additionally, it is unclear from the manuscript to what degree immunopurified cells are OECs as opposed to other cell types. The antibody used to retain OECs, nerve growth factor receptor p75 (Ngfr-p75), can also be expressed by non-OEC olfactory bulb cell types including astrocytes [1-3]. The possible inclusion of Ngfr-p75-positive but non-OEC cell types in the OEC culture is not sufficiently addressed.

      (a) Cragnolini, A.B. et al., Glia, (2009), doi: 10.1002/glia.20857.

      (b) Vickland H. et al., Brain Res., (1991), doi: 10.1016/0006-8993(91)91659-O.

      (c) Ung K. et al., Nat Commun., (2021), doi: 10.1038/s41467-021-25444-3.

      Our OECs are dissected primarily from the olfactory nerve layer that is concentrated medially and ventrally around the olfactory bulb together with a small part of the glomerular layer (layer II). OECs are the only glia present in the olfactory nerve layer. Thus, although it is possible that other cell types also express Ngfr-p75 as pointed out by the reviewer and in the references provided, our OEC dissection method severely limits the number of astrocytes that might be included in our cultures. We further provide additional evidence (see updated Figure 2d and the detailed responses to the next question) that our immunopanned OECs using our dissection method consistently express all classic OEC markers but do not consistently express the majority of classic markers for other glial cell types such as astrocytes or oligodendrocytes.

      Such non-OEC cell types are also not distinguished in the analysis of single-cell RNA sequencing data (only microglia, fibroblasts, and OECs are identified; Figure 2). Thus, it is currently unclear whether results related to the OEC subtype may have been impacted by these experimental factors.

      We need to clarify that when determining potential cell types in Figure 2, we compared our cell cluster marker genes against a broad array of cell types including astrocytes, oligodendrocytes and Schwann cells, but the gene overlap was only significant for microglia, fibroblasts, and OECs, which we labeled in new Figure 2d. We added more details in methods and results to clarify how we determined the cell types in Figure 2 (text added below). We did consider all the potential cell types that could have been present in our OEC cultures, including astrocytes. However, astrocyte or oligodendrocyte markers were not significantly enriched in the clusters, but markers for microglia, fibroblasts, and OECs were prominent in the cell clusters.

      In the revised Figure 2d, we now illustrate that the OEC clusters not only express typical OEC markers, but also express a few but not all marker genes from other glial cells. We show the comparative data on markers for astrocytes, oligodendrocytes, and Schwann cells in Figure 2d in parallel with the marker genes for OECs, microglia, and fibroblasts. For each of the other glial cell types, there are some genes which overlap with OECs, and that is the reason why we identified OECs as hybrid glia.

      Page 6, Results: “Based on previously reported cell type marker genes for fibroblasts and major glial cell types including OECs, astrocytes, oligodendrocytes, and microglia, we found elevated expression of OEC marker genes in clusters 2, 3 and 7, microglia marker genes in clusters 4, 6, and 7, and fibroblast marker genes in clusters 0, 1, and 5 (Figure 2d).”

      Page 33, Methods: “Additional marker genes for fibroblasts and multiple glial cell types including astrocytes, oligodendrocytes, and microglia were also used to compare with those of the cell clusters.”
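The response does not state which test was used to judge whether the overlap between cluster markers and known cell-type markers was significant. One common choice is a hypergeometric (one-sided Fisher) test, sketched below in Python as an assumption; the function name, the placeholder gene identifiers, and the background size are hypothetical.

```python
from math import comb


def overlap_p_value(cluster_markers, reference_markers, n_background):
    """Return (overlap size, P(overlap >= observed)) under a hypergeometric null."""
    cluster, reference = set(cluster_markers), set(reference_markers)
    k = len(cluster & reference)          # observed overlap
    n, big_k = len(cluster), len(reference)
    total = comb(n_background, n)         # ways to draw n genes from the background
    tail = sum(comb(big_k, i) * comb(n_background - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return k, tail / total


# Placeholder gene identifiers, not real marker lists.
cluster_genes = ["g1", "g2", "g3", "g4", "g9"]
reference_genes = ["g1", "g2", "g3", "g4", "g5"]
overlap, p_value = overlap_p_value(cluster_genes, reference_genes, n_background=20)
```

In practice the background would be the set of all genes detected in the experiment, and the resulting p values would themselves be corrected for the number of cell types and clusters compared.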

      (3) The introduction, while well written, does not discuss studies showing no significant effect of OEC implantation after spinal cord injury. The discussion also fails to sufficiently acknowledge this variability in the efficacy of OEC implantation. This omission amplifies bias in the text, suggesting that OECs have significant effects that are not fully reflected in the literature. The introduction would need to be expanded to properly address the nuance suggested by the literature regarding the benefits of OECs after spinal cord injury. Additionally, in the discussion, relating the current study to previous work would help clarify how varying observations may relate to experimental or biological factors.

      We appreciate the insightful comment and have now included information about the variability in OEC transplantation in previous studies in both the introduction and discussion sections. We discuss technical differences that lead to variability in the Introduction and how our findings could help interpret the variability in the Discussion.

      Page 4-5: Text added to the Introduction: “The outcomes of OEC transplantation studies after spinal cord injury vary substantially in the literature due to many technical differences between their experimental designs. The source of OECs has a great impact on the outcome, with OB-OECs showing more promise than peripheral lamina propria-derived OECs, and purified, freshly-prepared OECs being required for optimal OEC survival. Other important variables include the severity of the injury (hemisection to complete spinal cord transection), the age of the spinal cord injured host (early postnatal versus adult), and OEC transplant strategies (delayed or acute transplantation, cell transplants with or without a matrix; Franssen et al., 2007). Franssen et al. (2007) evaluated studies that used only OECs as a transplant, and reported that 41 out of 56 studies showed positive effects, such as OEC stimulation of regeneration, positive interactions with the glial scar and remyelination of axons. More recent systematic reviews and meta-analyses on the effects of OEC transplantation following different spinal cord injury models reported that OECs significantly improved locomotor function (Watzlawick et al., 2016; Nakjavan-Shahraki et al., 2018), but did not improve neuropathic pain (Nakjavan-Shahraki et al., 2018).”

      Pages 24-25: Discussion on OEC source variability  “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between these OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and help resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      Reviewer #1 (Recommendations For The Authors):

      This is an extremely well-written and impactful series of experiments from a renowned leader in the field. The experimental questions are timely, with similar therapeutic approaches being prepared for clinical trial. The results address a gap that has persisted in the field for several decades and one that has been considered by many scientists long before technology existed to find answers. This highlights the importance of these experiments and the results reported here. With these things in mind, there are only a few minor factors that I have, that should be addressed to strengthen the paper.

      We truly appreciate the positive evaluations from the reviewer!

      Primary concerns

      (1) Quantification of results: The authors report on the data with broad brush strokes, missing the opportunity to quantify results and strengthen the interpretations. For instance, when describing gene expression, what proportion of cells analyzed were expressing these genes? How did this compare with detectable levels of protein? Can the author draw correlations between data sets collected that could offer even more insight into the identities of the cells studied? There is also a missed opportunity to evaluate how transplantation into injured neural tissue might alter gene expression of the phenotypes identified prior to transplantation.

      We appreciate these insightful comments and have added quantitative information and other relevant discussions in the revision. We now add Suppl. Tables 1 (for major cell types including OECs, fibroblasts, and microglia) and 2 (for OEC subtypes) to indicate the proportion of cells expressing each marker gene in each given cell cluster/subtype in the column “Percentage of cells expressing the gene in the subtype/cell type” versus the proportion of cells expressing the given marker genes in other cell types in the column “Percentage of cells expressing the gene in the other subtypes/cell types.” In the new supplementary tables, we report statistical p values and adjusted p values after multiple testing correction to indicate statistical significance.
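The response refers to adjusted p values without naming the correction. As an illustration only, the Python sketch below shows two widely used options, Bonferroni and Benjamini-Hochberg; which one (if either) was applied in the supplementary tables is not stated, and the example p values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]   # hypothetical raw p values for four marker genes
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among the genes called significant.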

      Regarding the comparison with protein levels, we carried out immunohistochemistry experiments to confirm the proteins corresponding to OEC subtype markers. Our findings show that proteins for the gene markers can be detected, and thereby support our scRNA-seq findings. However, the immunofluorescence only provides a qualitative measure of protein levels in situ, so we cannot perform a correlation analysis. This is something we plan to pursue in a follow-up study with measurable protein levels. We also discuss future directions to examine the genes and proteins in in vivo transplantation studies in the Discussion.

      (2) Discussion and interpretation: Greater depth to interpretation and discussion of data and its impact on future work is needed. For example, on pages 20-21, the authors reflect briefly on why Reelin might be of interest (it could lead to Dab-1 expression), but why is that important? There are several instances like this where it would be useful for the authors to provide a little more insight into the potential impact of these data and interpretations.

      We appreciate these valuable suggestions. We have revised our Results and Discussion sections to offer deeper insight and interpretation of the importance of the data, especially that for Reelin.

      Page 17: Results: “In the canonical Reelin-signaling pathway, Reelin binds to the very-low-density lipoprotein receptor (Vldlr) and apolipoprotein E receptor 2 (ApoER2) and induces Src-mediated tyrosine phosphorylation of the intracellular adaptor protein Disabled-1 (Dab1). Both Reelin and Dab1 are highly expressed in embryos and contribute to correct neuronal positioning.”

      Page 22-23, Discussion: “Reelin is a developmentally expressed protein detected in specific neurons, in addition to OECs and Schwann cells. The canonical Reelin-signaling pathway involves neuronal-secreted Reelin binding to Vldlr and ApoER2 receptors expressed on Dab1-labeled neurons. Following Reelin binding, Dab1 is phosphorylated by Src family kinases which initiates multiple downstream pathways. Very little is known, however, about Reelin secreted by glia. Panteri et al. (2006) reported that Schwann cells express low levels of Reelin in adults, and that it is upregulated following a peripheral nerve crush, as is reported above for many neurotrophic factors. Reelin loss in Schwann cells reduced the diameter of small myelinated axons but did not affect unmyelinated axons (Panteri et al., 2005). In the olfactory system, OECs ensheath the Dab1-labeled, unmyelinated axons of olfactory sensory neurons which are continuously generated and die throughout life. OEC transplantation following spinal cord injury would provide an exogenous source of Reelin that could phosphorylate Dab1-containing neurons or their axons. Dab1 is expressed at high levels in the axons of some projection neurons, such as the corticospinal pathway (Abadesco et al., 2014). Future experiments are needed to explore the function that glial-secreted Reelin may have on axonal regeneration.”

      Minor concerns

      (3) The authors reflect on the spontaneous glial bridge that develops in the repairing spinal cord of Zebrafish, but perhaps even more relevant is that this same phenomenon occurs in mammals as well if the spinal cord is injured during early development (opossum; Lane et al, EJN 2007). This should be considered and the statement that there is little regeneration in the mammalian spinal cord should be clarified.

      We appreciate this insightful comment. We now add discussions of the axonal regeneration and bridging observed following severe spinal cord injury in young developing mouse and opossum spinal cords.

      Page 23: “Adult mammals show little evidence of spontaneous axonal regeneration after a severe spinal cord injury in contrast to transected neonatal rats (Bregman, 1987; Bregman et al., 1993) and young postnatal opossums (Lane et al., 2007). In immature mammals, axons continue to project across or bridge the spinal cord transection site during development. Lower organisms such as fish show even more evidence of regeneration following severe SCI. Mokalled et al. (2016) reported that glial secretion of Ctgfa/Ccn2 was both necessary and sufficient to stimulate a glial bridge for axon regeneration across the zebrafish transection site. Cells in the injury site that express Ctgf include ependymal cells, endothelial cells, and reactive astrocytes (Conrad et al., 2005; Mokalled et al., 2016; Schwab et al., 2001). Here we show that, although rare, Ctgf-positive OECs can contribute to glial bridge formation in adult rats. The most consistent finding among our severe SCI studies combined with OEC transplantation is the extent of remodeling of the injury site and axons growing into the inhibitory lesion site, together with OECs and astrocytes. The formation of a glial bridge across the injury was critical to the spontaneous axon generation seen in zebrafish (Mokalled et al., 2016) and likely contributed to the axon regeneration detected in our OEC transplanted, transected rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018).”

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript title and abstract must include the species and sex studied.

      The title and abstract have been modified as suggested.

      Page 1: “Olfactory ensheathing cells from adult female rats are hybrid glia that promote neural repair”

      (2) OECs submitted for sequencing were like those about to be transplanted; however, the phenotype of the cells would likely change immediately and shift over time post-implantation. Please briefly address or discuss this point in the Discussion (or Results).

      We have added this important discussion point.

      Pages 23-24: Discussion: “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      (3) Page 12: Use of "monocytes" - the word "monocyte" implies a circulating, undifferentiated innate immune cell. This should not be used interchangeably with macrophage or microglia.

      We agree and now refer to microglia or macrophages depending on the context. We retained the term monocyte in Table 3 where the cited references reported a top 20 gene in these cells.

      (4) Page 12: "We now show that these unique monocytes reported between the bundles of olfactory axons surrounded by OECs (Smithson & Kawaja, 2010), are in fact, a distinct subtype of OECs."

      Is it possible to conclude that these cells are a "distinct subtype of OECs?" Perhaps these cells are a hybrid between microglia/macrophages and OECs? This is speculative, so should be worded more carefully - especially in the Results section. Please clarify, dampen conclusions, and/or better justify the wording here.

      We agree and have modified the entire paragraph to dampen and more carefully explain our conclusions. We also added an additional observation that strengthens the relationship between OECs and microglial/macrophages.  

      Page 12, Results: Additional observation: “In fact, all top 20 genes in cluster 3 are expressed in microglia, macrophages, and/or monocytes (Suppl. Table 3).”

      Page 13, Results: The statement referenced in your review was deleted and we wrote the following: “Smithson and Kawaja (2010) identified unique microglial/macrophages that immunolabeled with Iba-1 (Aif1) and Annexin A3 (Anxa3) in the olfactory nerve and outer nerve layer of the olfactory bulb. These authors proposed that Iba1-Anxa3 double-labeled cells were a distinct population of microglia/macrophages that protected the olfactory system against viral invasion into the cranial cavity. Based on our scRNA-seq data we offer an alternative interpretation that at least some of these Iba-1-Anxa3 cells may be a hybrid OEC-microglial cell type. Supporting this interpretation, there are a number of reports that suggest OECs frequently function as phagocytes (e.g., Khankan et al., 2016; Nazareth et al., 2020; Su et al. 2013).”

      (5) Page 13: "Pseudotime trajectory analysis, a widely used approach to predict cell plasticity and lineages based on scRNA-seq data, suggests that there are potential transitions between specific OEC subclusters." This is interesting but is somewhat unclear. Please add one more sentence to aid the reader's understanding regarding how this analysis is performed.

      Thank you for your valuable feedback. We have revised the text for clarity as follows:

      Page 14, Results: “We performed pseudotime trajectory analysis using the Slingshot algorithm to infer lineage trajectories, cell plasticity and lineages by ordering cells in pseudotime based on their transcriptional progression reflected in scRNA-seq data. Transcriptional progression refers to the changes in gene expression profiles of cells as they undergo differentiation or transition through different states. The trajectory analysis results suggest that there are potential transitions between specific OEC subclusters.”
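
      As a rough illustration of how a cluster-based trajectory can be inferred, the sketch below reproduces only the first stage commonly described for Slingshot: connecting cluster centroids with a minimum spanning tree. It is not the authors' analysis. The cluster names, the 2-D toy embedding, and the use of plain Euclidean distance are all assumptions made for illustration, and the later curve-fitting and pseudotime-assignment stages are omitted.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def cluster_mst(centroids):
    """Prim's algorithm over cluster centroids.

    `centroids` maps a cluster name to its centroid; the tree is grown from
    the alphabetically first cluster. Returns a list of (parent, child) edges.
    """
    names = sorted(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge linking the current tree to a new cluster.
        _, parent, child = min(
            (math.dist(centroids[a], centroids[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Hypothetical toy embedding: three clusters laid out along one axis.
cells = {
    "c0": [(0.0, 0.0), (0.2, 0.1)],
    "c1": [(1.0, 0.0), (1.2, 0.1)],
    "c2": [(2.0, 0.0), (2.2, 0.1)],
}
lineage_edges = cluster_mst({name: centroid(pts) for name, pts in cells.items()})
```

      In this toy case the tree links neighbouring clusters in sequence, which is the kind of structure that would then be read as a candidate transition path between subclusters.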

      (6) The authors could discuss potential reasons for variability in OEC treatment results after spinal cord injury between studies and labs. How might sequencing results here inform the debate about whether OECs are helpful or not?

      In response to the Public Review, we added discussions about the variability in OEC treatments between studies in both the Introduction and Discussion, and these comments are copied on pages 6-7 of this document. In the Discussion we included a statement about how the current findings may inform the debate on OECs.

      (7) Discussion: please add a discussion of limitations and future directions that addresses the following points:

      a) Please add one sentence on the lack of studying sex differences - only females were studied here.

      b) There is no correlation or modulation of any target genes, so all results here are correlative.

      c) Please add a brief paragraph with future directions for the field, including acknowledgment that the role of OECs in repair after SCI is not fully resolved and that future studies might consider targeting some of the specific pathways described herein.

      d) Which pathways and OEC subpopulations likely best support repair, and how might these be reinforced or better maintained in the SCI environment? If not known, what are the next steps for identifying the most reparative OEC subtype?

      Thank you for the valuable suggestions. We have added these to the discussion as detailed below.

      Pages 23-25, Discussion:

      “Limitations of these OEC scRNA-Seq studies”

      “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies are conducted on females as their short urethra facilitates daily manual bladder expression. Our study was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation. We also did not modulate any of the genes or proteins in the identified OEC subtypes to test their causal and functional roles, thus our findings remain correlative in the current study. Future gene/protein modulation studies are necessary to understand the functional roles of the individual OEC subtypes in the context of their reparative functions to determine which pathways and subtypes are more critical and can be enhanced for neural repair. Our current findings build the foundation for these future studies to help resolve the role of OECs in spinal cord injury repair.” 

      “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between the two OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      (8) Figure 6: What is the major point of this figure and its related immunocytochemistry? Please clarify.

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other, an astrocyte. The purpose of Figure 6 is to determine if there is a link between our scRNA-seq-based OEC subtypes and those previously described based on morphology alone (Franceschini and Barnett, 1996). In our results section we show that ~3/4ths of the sampled OECs that were Ki67+ progenitor cells were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remainder were Schwann cell-like, fusiform in shape and strongly Ngfr<sup>p75</sup>-labeled. Our results indicate that the two OEC classifications overlap to some degree, indicating similarities but also differences between the classification methods.

      (9) Figure 9, caption: "OEC whole cell lysates (WCL; lanes: 4, 6, and 8), and OEC conditioned medium (CM; lanes: 5 and 7)."  This statement is unclear - please clarify the result here.

      We added clarification to the legend for Figure 9d. 

      Page 50: (d) “Western blot confirms the expression of Reelin in rat olfactory nerve layer I and layer II (ONL; lane 1 of western blot). Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse olfactory bulbs were used as positive and negative controls, respectively (lanes: 2 and 3). Reelin that was synthesized by cultured OECs was found in whole cell lysates (WCL; lanes: 4, 6, and 8), whereas Reelin that was secreted by cultured OECs into tissue culture medium was measured in the OEC “conditioned medium” (CM; lanes: 5 and 7). GAPDH was the loading control for tissue homogenates (lanes 1-4, 6, 8).”

      (10) Methods: A Cat. No. for all antibodies and key supplies should be included.

      Response: All of the antibody information in the revised version is in Suppl. Table 4. Information for other key supplies is included in the extensive methods section.

      (11) Methods: How was primary antibody specificity validated for less-used antibodies? Background staining can be a major issue after SCI; e.g., with the CTGF antibody used in Figure 5.

      The spinal cord section shown in Figure 5 was compared to sections from the same SCI cohort that had been injected with control cells, i.e. skin fibroblasts. We have used the first two antibodies (anti-Glial fibrillary acidic protein and anti-Green fluorescent protein) for many years so only the CTGF was a “less-used antibody.” Our strategy for working with “less-used” or “newly-purchased” antibodies was as follows.

      First, we studied the literature to find the best antibodies for neuronal tissue. Many of the images in Figure 7 were generated with antibodies purchased just for this study. Our goal was to characterize them on normal adult lamina propria and olfactory bulb tissues rather than in the injured spinal cord where background can be an issue. In the olfactory bulb we examined the olfactory nerve layer where OECs are concentrated and then examined the olfactory epithelium, lamina propria, and the deep layers of the olfactory bulb to find regions without immunolabel. As described above, we tested anti-CTGF antibodies in SCI sections implanted with skin fibroblast controls when conducting experiments for CTGF in sections with OECs. New antibodies were tested at multiple concentrations and we tried different immunocytochemical techniques. CTGF is expressed by several different cell types, but expression is low in most of the areas above and below the injury site. Despite our success with many “newly-purchased” antibodies, there were at least 4 of them for which we were never able to obtain specific labeling.

      (12) Will the data (especially the sequencing data) be shared publicly?

      The data has been uploaded to and shared via the public data repository GEO. Data availability is stated on the title page of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important evidence supporting the ability of a new type of neuroimaging, OPM-MEG system, to measure beta-band oscillation in sensorimotor tasks on 2-14 years old children and to demonstrate the corresponding development changes, since neuroimaging methods with high spatiotemporal resolution that could be used on small children are quite limited. The evidence supporting the conclusion is solid but lacks clarifications about the much-discussed advantages of OPM-MEG system (e.g., motion tolerance), control analyses (e.g., trial number), and rationale for using sensorimotor tasks. This work will be of interest to the neuroimaging and developmental science communities.

      We thank the editors and reviewers for their time and comments on our manuscript. We have responded in detail to the comments, on a point-by-point basis, below. Included in our responses (and our revised manuscript) are additional analyses to control for trial count, clarification of the advantages of OPM-MEG, and justification of our use of sensory (as distinct from motor) stimulation. In what follows, our responses are in bold typeface; additions to our manuscript are in bold italic typeface. 

      Reviewer #1 (Public Review):

      Summary:

      Compared with conventional SQUID-MEG, OPM-MEG offers theoretical advantages of sensor configurability (that is, sizing to suit the head size) and motion tolerance (the sensors are intrinsically in the head reference frame). This study purports to be the first to experimentally demonstrate these advantages in a developmental study from age 2 to age 34. In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance - neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      Thank you for reviewing our manuscript. We agree that our results demonstrate substantial equivalence with conventional MEG. However, as mentioned by Reviewer 3, most past studies have “focused on older children and adolescents (e.g., 9-15 years old)” whereas our youngest group is 2-5 years. We believe that by obtaining data of sufficient quality in these age groups, without the need for any restriction of head movement, we have demonstrated the advantage of OPM-MEG. We now have made this clear in our discussion:

      “…our primary aim was to test the feasibility of OPM-MEG for neurodevelopmental studies. Our results demonstrate we were able to scan children down to age 2 years, measuring high-fidelity electrophysiological signals and characterising the neurodevelopmental trajectory of beta oscillations. The fact that we were able to complete this study demonstrates the advantages of OPM-MEG over conventional-MEG, the latter being challenging to deploy across such a large age range…”

      Strengths:

      A replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      As noted above the demonstration of equivalence was one of our primary aims. We have elaborated further on the advantages below.

      Weaknesses:

      The authors describe 64 tri-axial detectors, which they refer to as 192 channels. This is in keeping with some of the SQUID-MEG description, but possibly somewhat disingenuous. For the scientific literature, perhaps "64 tri-axial detectors" is a more parsimonious description.

      The number of channels in a MEG system refers to the number of independent measurements of magnetic field. This, in turn, tells us the number of degrees of freedom in the data that can be exploited by algorithms like signal space separation or beamforming. E.g. the MEGIN (cryogenic) MEG system has 306 channels: 102 magnetometers and 204 planar gradiometers. Sensors are constructed as “triple sensor elements” with one magnetometer and 2 gradiometers (in orthogonal orientations) centred on a single location. In our system, each sensor has three orthogonal metrics of magnetic field which are (by definition) independent. We have 64 such sensors, and therefore 192 independent channels – indeed when implementing algorithms like SSS we have shown we can exploit this number of degrees of freedom.<sup>1</sup> 192 channels is therefore an accurate description of the system.

      A small fraction (<20%) of trials were eliminated for analysis because of "excess interference" - this warrants further elaboration.

      We agree that this is an important point. We now state in our methods section:

      “…Automatic trial rejection was implemented with trials containing abnormally high variance (exceeding 3 standard deviations from the mean) removed. All experimental trials were also inspected visually by an experienced MEG scientist, to exclude trials with large spikes/drifts that were missed by the automatic approach. In the adult group, there was a significant overlap between automatically and manually detected bad trials (0.7 ± 1.6 trials were only detected manually). In the children, 10.0 ± 9.4 trials were only detected manually…”
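
      The automatic rejection rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each trial is reduced to a single variance value, and a trial is flagged when that value exceeds the across-trial mean by more than 3 standard deviations. The function name and the toy data are hypothetical.

```python
import statistics

def reject_high_variance_trials(trials, n_sd=3.0):
    """Split trial indices into (kept, rejected) using a variance threshold.

    `trials` is a list of trials, each a list of samples. A trial is rejected
    when its variance exceeds mean + n_sd * SD of the per-trial variances.
    """
    variances = [statistics.pvariance(trial) for trial in trials]
    threshold = statistics.fmean(variances) + n_sd * statistics.pstdev(variances)
    kept = [i for i, v in enumerate(variances) if v <= threshold]
    rejected = [i for i, v in enumerate(variances) if v > threshold]
    return kept, rejected

# Hypothetical data: 20 well-behaved trials plus one containing a large spike.
clean = [[0.0, 1.0, 0.0, -1.0] for _ in range(20)]
spiky = [[0.0, 50.0, 0.0, -50.0]]
kept, rejected = reject_high_variance_trials(clean + spiky)
```

      A single extreme trial also inflates the mean and standard deviation it is compared against, which is one reason a threshold of this kind can miss artefacts and why a visual check, as described in the response, is a sensible complement.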

      We also note that the other reviewers and editor questioned whether the higher rejection rate in children had any bearing on results. This is an extremely important question. In revising the manuscript this has also been taken into account with all data reanalysed with equal trial counts in children and adults. Results are presented in Supplementary Information Section 5.

      Figure 3 shows a reduced beta ERD in the youngest children. Although the authors claim that OPM-MEG would be similarly sensitive for all ages and that SQUID-MEG would be relatively insensitive to young children, one trivial counterargument that needs to be addressed is that OPM has NOT in fact increased the sensitivity to young child ERD. This can possibly be addressed by analogous experiments using a SQUID-based system. An alternative would be to demonstrate similar sensitivity across ages using OPM to a brain measure such as evoked response amplitude. In short, how does Figure 3 demonstrate the (theoretical) sensitivity advantage of OPM MEG in small heads?

      We completely understand the referees’ point – indeed the question of whether a neuromagnetic effect really changes with age, or apparently changes due to a drop in sensitivity (caused by reduced head size or - in conventional MEG and fMRI - increased subject movement) is a question that can be raised in all neurodevelopmental studies.

      Our authors have many years’ experience conducting studies using conventional MEG (including in neurodevelopment) and agreed that the idea of scanning subjects down to age two in conventional MEG would not be practical; their heads are too small and they typically fail to tolerate an environment where they are forced to remain still for long periods. Even if we tried a comparative study using conventional MEG, the likely data exclusion rate would be so high that the study would be confounded. This is why most conventional MEG studies only scan older children and adolescents. For this reason, we cannot undertake the comparative study the reviewer suggests. There are however two reasons why we believe sensitivity is not driving the neurodevelopmental effects that we observe:

      Proximity of sensors to the head: 

      For an ideal wearable MEG system, the distance between the sensors and the scalp surface (sensor proximity) would be the same regardless of age (and size), ensuring maximum sensitivity in all subjects. To test how our system performed in this regard, we undertook analyses to compute scalp-to-sensor distances. This was done in two ways:

      (1) Real distances in our adaptable system: We took the co-registered OPM sensor locations and computed the Euclidean distance from the centre of the sensitive volume (i.e. the centre of the vapour cell) to the closest point on the scalp surface. This was measured independently for all sensors, and an average across sensors calculated. We repeated this for all participants (recall participants wore helmets of varying size and this adaptability should help minimise any relationship between sensor proximity and age).

      (2) Simulated distances for a non-adaptable system: Here, the aim was to see how proximity might have changed with age, had only a single helmet size been used. We first identified the single example subject with the largest head (scanned wearing the largest helmet) and extracted the scalp-to-sensor distances as above. For all other subjects, we used a rigid body transform to co-register their brain to that of the example subject (placing their head (virtually) inside the largest helmet). Proximity was then calculated as above and an average across sensors calculated. This was repeated for all participants.

      In both analyses, sensor proximity was plotted against age and significant relationships probed using Pearson correlation. 
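
      The proximity measure described in the two steps above can be sketched in a few lines. This is an illustrative sketch only: the coordinates are hypothetical, the scalp is represented as a plain list of surface points, and the helper names are not taken from the authors' code.

```python
import math

def mean_scalp_to_sensor_distance(sensors, scalp_points):
    """Average, across sensors, of the Euclidean distance from each sensor
    (centre of its sensitive volume) to the closest scalp surface point."""
    nearest = [min(math.dist(s, p) for p in scalp_points) for s in sensors]
    return sum(nearest) / len(nearest)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical single participant: two sensors and two scalp points (mm).
sensors = [(0.0, 0.0, 10.0), (0.0, 10.0, 0.0)]
scalp = [(0.0, 0.0, 8.0), (0.0, 7.0, 0.0)]
proximity = mean_scalp_to_sensor_distance(sensors, scalp)
```

      Computing one such value per participant and passing the list, together with the participants' ages, to `pearson_r` gives the kind of proximity-versus-age relationship that the response reports.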

      In addition, we also wanted to probe the relation between sensor proximity and head circumference. Head circumference was estimated by binarising the whole-head MRI (to delineate the volume of the head) and selecting the axial slice with the largest circumference. We then plotted sensor proximity versus head circumference, for both the real (adaptive) and simulated (non-adaptive) case (expecting a negative relationship – i.e. larger heads mean closer sensor proximity). The slope of the relationship was measured and we used a permutation test to determine whether the use of adaptable helmets significantly lowered the identified slope (i.e. do adaptable helmets significantly improve sensor proximity in those with smaller head circumference).
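
      The response does not say how the permutation test was constructed, so the following is only one plausible formulation, stated here as an assumption. Because ordinary least-squares slopes are linear in the dependent variable, the difference between the simulated and real slopes equals the slope of the per-subject extra distance (simulated minus real) against head circumference. Shuffling that extra distance across subjects gives a null distribution for the slope difference. All names and numbers below are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def slope_difference_permutation_test(circumference, real, simulated,
                                      n_perm=2000, seed=0):
    """One-sided test that the simulated (non-adaptable) slope is steeper
    (more negative) than the real (adaptable) slope.

    Returns (observed slope difference, permutation p-value).
    """
    rng = random.Random(seed)
    # Extra scalp-to-sensor distance each subject would incur in one big helmet.
    extra = [s - r for r, s in zip(real, simulated)]
    observed = ols_slope(circumference, extra)  # = slope(simulated) - slope(real)
    shuffled = list(extra)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between extra distance and head size
        if ols_slope(circumference, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 12 subjects, head circumference in cm, distances in mm.
circ = list(range(46, 58))
real = [20 - 0.2 * (c - 46) for c in circ]
sim = [r + (57 - c) for r, c in zip(real, circ)]
observed, p_value = slope_difference_permutation_test(circ, real, sim)
```

      In this toy example the extra distance falls steadily with head size, so almost no shuffle reproduces a slope as negative as the observed one and the p-value is small.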

      Results are shown in Figure R1. We found no measurable relationship between sensor proximity and age (r = -0.195; p = 0.171) in the case of the real helmets (panel A). When simulating a non-adaptable helmet, we did see a significant effect of age on scalp-to-sensor distance (r = -0.46; p = 0.001; panel B). This demonstrates the advantage of the adaptability of OPM-MEG; without the ability to flexibly locate sensors, we would have a significant confound of sensor proximity. 

      Plotting sensor proximity against head circumference we found a significant negative relationship in both cases (r = -0.37; p = 0.007 and r = -0.78; p = 0.000001); however, the difference between slopes was significant according to a permutation test (p < 0.025) suggesting that adaptability has indeed improved sensor proximity in those with smaller head circumference. This again shows the benefits of adaptability to head size.

      Author response image 1.

      Scalp-to-sensor distance as a function of age (A/B) and head circumference (C/D). A and C show the case for the real helmets; B and D show the simulated non-adaptable case.

      In sum, the ideal wearable system would see sensors located on the scalp surface, to get as close as possible to the brain in all subjects. Our system of multiple helmet sizes is not perfect in this regard (there is still a significant relationship between proximity and head circumference). However, our solution has offered a significant improvement over a (simulated) non-adaptable system. Future systems should aim to improve even further on this, either by using additively manufactured bespoke helmets for every subject (this is a gold standard, but also costly for large studies), or potentially adaptable flexible helmets.

      Burst amplitudes:

      The reviewer suggested to “demonstrate similar sensitivity across ages using OPM to a brain measure”. We decided not to use the evoked response amplitude (as suggested), since this would be expected to change with age. Instead, we used the amplitude of the bursts.

      Our manuscript shows a significant correlation between beta modulation and burst probability – implying that the stimulus-related drop in beta amplitude occurs because bursts are less likely to occur. Further, we showed significant age-related changes in both beta amplitude and burst probability leading to a conclusion that the age dependence of beta modulation was caused by changes in the likelihood of bursts (i.e. bursts are less likely to ’switch off’ during sensory stimulation in children). We have now extended these analyses to test whether burst amplitude also changes significantly with age – we reasoned that if burst amplitude remained the same in children and adults, this would not only suggest that beta modulation is driven by burst probability (distinct from burst amplitude), but also show directly that the beta effects we see are not attributable to a lack of sensitivity in younger people. 

      We took the (unnormalized) beamformer projected electrophysiological time series from sensorimotor cortex and filtered it 5-48 Hz (the motivation for the large band was because bursts are known to be pan-spectral and have lower frequency content in children; this band captures most of the range of burst frequencies highlighted in our spectra). We then extracted the timings of the bursts, and for each burst took the maximum projected signal amplitude. These values were averaged across all bursts in an individual subject, and plotted for all subjects against age.
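
      The final averaging step described above can be sketched as follows. The sketch assumes that filtering and burst detection have already been done upstream and that bursts are supplied as sample-index intervals. It also assumes that "maximum amplitude" means the largest absolute value within the burst. Both are assumptions for illustration, and the function name is hypothetical.

```python
def mean_burst_amplitude(signal, bursts):
    """Average peak amplitude across bursts for one subject.

    `signal` is a list of samples (already band-pass filtered);
    `bursts` is a list of (start, stop) sample indices, stop exclusive.
    """
    peaks = [max(abs(v) for v in signal[start:stop]) for start, stop in bursts]
    return sum(peaks) / len(peaks)

# Hypothetical time series with two bursts marked by index ranges.
signal = [0.0, 1.0, -3.0, 2.0, 0.0, 0.0, 4.0, -1.0, 0.0]
bursts = [(1, 4), (6, 8)]
subject_amplitude = mean_burst_amplitude(signal, bursts)
```

      One value of this kind per subject, plotted against age, corresponds to the comparison the response describes.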

      Author response image 2.

      Beta burst amplitude as a function of age; A) shows index finger stimulation trials; B) shows little finger stimulation trials. In both cases there was no significant modulation of burst amplitude with age.

      Results (see Figure R2) showed that the amplitude of the beta burst showed no significant age-related modulation (R<sup>2</sup> = 0.01, p = 0.48 for index finger and R<sup>2</sup> = 0.01, p = 0.57 for the little finger). This is distinct from both burst probability and task induced beta modulation. This adds weight to the argument that the diminished beta modulation in children is not caused by a lack of sensitivity to the MEG signal and supports our conclusion that burst probability is the primary driver of the age-related changes in beta oscillations.

      Both of the above analyses have been added to our supplementary information and mentioned in the main manuscript. The first shows no confound of sensor proximity to the scalp with age in our study. The second shows that the bursts underlying the beta signal are not significantly lower amplitude in children – which we reasoned they would be if sensitivity was diminished at younger ages. We believe that the two together suggest that we have mitigated a sensitivity confound in our study.

      The data do not make a compelling case for the motion tolerance of OPM-MEG. Although an apparent advantage of a wearable system, an empirical demonstration is still lacking. How was motion tracked in these participants?

      We agree that this was a limitation of our experiment. 

      We have the equipment to track motion of the head during an experiment, using IR retroreflective markers placed on the helmet and a set of IR cameras located inside the MSR. However, the process takes a long time to set up, it lacks robustness, and would have required an additional computer (the one we typically use was already running the somatosensory stimulus and video). When the study was designed, we were concerned that the increased set up time for motion tracking would cause children to get bored, and result in increased participant drop out. For this reason we decided not to capture motion of the head during this study.

      With hindsight this was a limitation which – as the reviewer states – makes us unable to prove that motion robustness was a significant advantage for this study. That said, during scanning there was both a parent and an experimenter in the room for all of the children scanned, and anecdotally we can say that children tended to move their head during scans – usually to talk to the parent. Whilst this cannot be quantified (and is therefore unsatisfactory) we thought it worth mentioning in our discussion, which reads:

      “…One limitation of the current study is that practical limitations prevented us from quantitatively tracking the extent to which children (and adults) moved their head during a scan. Anecdotally however, experimenters present in the room during scans reported several instances where children moved, for example to speak to their parents who were also in the room. Such levels of movement could not be tolerated in conventional MEG or MRI and so this again demonstrates the advantages afforded by OPM-MEG…”

      As a note, empirical demonstrations of the motion tolerance of OPM-MEG have been published previously: Early demonstrations included Boto et al.<sup>2</sup>, who captured beta oscillations in adults playing a ball game, and Holmes et al., who measured visual responses as participants moved their head to change viewing angle<sup>3</sup>. In more recent demonstrations, Seymour et al. measured the auditory evoked field in standing mobile participants<sup>4</sup>; Rea et al. measured beta modulation as subjects carried out a naturalistic handwriting task<sup>5</sup>; and Holmes et al. measured beta modulation as a subject walked around a room<sup>6</sup>.

      Furthermore, while the introduction discusses at some length the phenomenon of PMBR, there is no demonstration of the recording of PMBR (or post-sensory beta rebound). This is a shame because there is literature suggesting an age-sensitivity to this, that the optimal sensitivity of OPM-MEG might confirm/refute. There is little evidence in Figure 3 for adult beta rebound. Is there an explanation for the lack of sensitivity to this phenomenon in children/adolescents? Could a more robust paradigm (button-press) have shed light on this?

      We understand the question. There are two limitations to the current study in respect to measuring the PMBR:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed. For this reason we opted for entirely passive stimulation, requiring no active engagement from our participants. The advantage of this was a stimulus that all subjects could engage with. However, this was at the cost of a diminished rebound.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s<sup>7,8</sup>. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline<sup>9</sup>, though this has rarely been adhered to in the literature. Here, we wanted to keep recordings short for the comfort of the younger participants, so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly; one can only measure beta modulation with the task. This limitation has now been addressed explicitly in our discussion:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation<sup>7–9</sup>. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      Data on functional connectivity are valuable but do not rely on OPM recording. They further do not add strength to the argument that OPM MEG is more sensitive to brain activity in smaller heads - in fact, the OPM recordings seem plagued by the same insensitivity observed using conventional systems.

      Given the demonstration above that bursts are not significantly diminished in amplitude in children relative to adults; and further given the demonstrations in the literature (e.g. Seedat et al.<sup>10</sup>) that functional connectivity is driven by bursts, we would argue that the effects of connectivity changing with age are not related to sensitivity but rather genuinely reflect a lack of coordination of brain activity.

      The discussion of burst vs oscillations, while highly relevant in the field, is somewhat independent of the OPM recording approach and does not add weight to the OPM claims.

      We agree that the burst vs. oscillations discussion does not add weight to the OPM claims per se. However, we had two aims of our paper, the second being to “investigate how task-induced beta modulation in the sensorimotor cortices is related to the occurrence of pan-spectral bursts, and how the characteristics of those bursts change with age.” As the reviewer states, this is highly relevant to the field, and therefore we believe adds impact, not only to the paper, but also by extension to the technology.

      In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance, neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      We thank the referee for the time and important contributions to this paper. We believe the fact that we were able to record good data in children as young as two years old was, in itself, an experimental realisation of the ‘theoretical advantages’ of OPM-MEG. Our additional analyses, inspired by the reviewers’ comments, help to clarify the advantages of OPM-MEG over conventional technology. The reviewers’ insights have without doubt improved the paper.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a new 192-channel OPM system that can be configured using different helmets to fit individuals from 2 to 34 years old. To demonstrate the veracity of the system, they conduct a sensorimotor task aimed at mapping developmental changes in beta oscillations across this age range. Many past studies have mapped the trajectory of beta (and gamma) oscillations in the sensorimotor cortices, but these studies have focused on older children and adolescents (e.g., 9-15 years old) and used motor tasks. Thus, given the study goals, the choice of a somatosensory task was surprising and not justified. The authors recorded a final sample of 27 children (2-13 years old) and 24 adults (21-34 years) and performed a time-frequency analysis to identify oscillatory activity. This revealed strong beta oscillations (decreases from baseline) following the somatosensory stimulation, which the authors imaged to discern generators in the sensorimotor cortices. They then computed the power difference between the 0.3-0.8 s period and the 1.0-1.5 s post-stimulation period and showed that the beta response became stronger with age (more negative relative to the stimulation period). Using these same time windows, they computed the beta burst probability and showed that this probability increased as a function of age. They also showed that the spectral composition of the bursts varied with age. Finally, they conducted a whole-brain connectivity analysis. The goals of the connectivity analysis were not as clear, as prior studies of sensorimotor development have not conducted such analyses, and such whole-brain connectivity analyses are typically performed on resting-state data, whereas here the authors performed the analysis on task-based data. In sum, the authors demonstrate that they can image beta oscillations in young children using OPM and discern developmental effects.

      Thank you for this summary and for taking the time to review our manuscript.

      Strengths:

      Major strengths of the study include the novel OPM system and the unique participant population going down to 2-year-olds. The analyses are also innovative in many respects.

      Thank you – we also agree that the major strength is in the unique cohort.

      Weaknesses:

      Several weaknesses currently limit the impact of the study. 

      First, the choice of a somatosensory stimulation task over a motor task was not justified. The authors discuss the developmental motor literature throughout the introduction, but then present data from a somatosensory task, which is confusing. Of note, there is considerable literature on the development of somatosensory responses so the study could be framed with that.

      We completely understand the referee’s point, and we agree that the motivation for the somatosensory task was not made clear in our original manuscript.

      Our choice of task was motivated completely by our targeted cohort; whilst a motor task would have been our preference, it was generally felt that making two-year-olds comply with instructions to press a button would have been a significant challenge. In addition, there would likely have been differences in reaction times. By opting for a passive sensory stimulation we ensured compliance, and the same stimulus for all subjects. We have added text on this to our introduction as follows:

      “…Here, we combine OPM-MEG with a burst analysis based on a Hidden Markov Model (HMM) 10–12 to investigate beta dynamics. We scanned a cohort of children and adults across a wide age range (upwards from 2 years old). Because of this, we implemented a passive somatosensory task which can be completed by anyone, regardless of age…”

      We also state in our discussion:

      “…here we chose to use passive (sensory) stimulation. This helped ensure compliance with the task in subjects of all ages and prevented confounds of e.g. reaction time, force, speed and duration of movement which would be more likely in a motor task.7,8 However, there are many other systems to choose and whether the findings here regarding beta bursts and the changes with age also extend to other brain networks remains an open question.…”

      Regarding the neurodevelopmental literature – we are aware of the literature on somatosensory evoked responses – particularly median nerve stimulation – but we can find little on the neurodevelopmental trajectory of somatosensory induced beta oscillations (the topic of our paper). We have edited our introduction as follows:

      “…All these studies probed beta responses to movement execution; in the case of tactile stimulation (i.e. sensory stimulation without movement) both task induced beta power loss, and the post stimulus rebound have been consistently observed in adults9,13–18. Further, beta amplitude in sensory cortex has been related to attentional processes19 and is broadly thought to carry top-down influence on primary areas20. However, there is less literature on how beta modulation changes with age during purely sensory tasks.…”

      We would be keen for the reviewer to point to any specific papers in the literature that we may have missed.

      Second, the primary somatosensory response actually occurs well before the time window of interest in all of the key analyses. There is an established literature showing mechanical stimulation activates the somatosensory cortex within the first 100 ms following stimulation, with the M50 being the most robust response. The authors focus on a beta decrease (desynchronization) from 0.3-0.8 s which is obviously much later, despite the primary somatosensory response being clear in some of their spectrograms (e.g., Figure 3 in older children and adults). This response appears to exhibit a robust developmental effect in these spectrograms so it is unclear why the authors did not examine it. This raises a second point; to my knowledge, the beta decrease following stimulation has not been widely studied and its function is unknown. The maps in Figure 3 suggest that the response is anterior to the somatosensory cortex and perhaps even anterior to the motor cortex. Since the goal of the study is to demonstrate the developmental trajectory of well-known neural responses using an OPM system, should the authors not focus on the best-understood responses (i.e., the primary somatosensory response that occurs from 0.0-0.3 s)?

      We understand the reviewer’s point. The original aim of our manuscript was to investigate the neurodevelopmental trajectory of beta oscillations, not the evoked response. In fact, the evoked response in this paradigm is complicated by the fact that there are three stimuli in a very short (<500 ms) time window. For this reason, we prefer the focus of our paper to remain on oscillations.

      Nevertheless, we agree that not including the evoked responses was a missed opportunity.  We have now added evoked responses to our analysis pipeline and manuscript. As surmised by the reviewer, the M50 shows neurodevelopmental changes (an increase with age). Our methods section has been updated accordingly and Figure 3 has been modified. The figure and caption are copied below for the convenience of the reviewer.

      Author response image 3.

      Beta band modulation with age: (A) Brain plots show slices through the left motor cortex, with a pseudo-T-statistical map of beta modulation (blue/green) overlaid on the standard brain. Peak MNI coordinates are indicated for each subgroup. Time frequency spectrograms show modulation of the amplitude of neural oscillations (fractional change in spectral amplitude relative to the baseline measured in the 2.5-3 s window). Vertical lines indicate the time of the first braille stimulus. In all cases results were extracted from the location of peak beta desynchronisation (in the left sensorimotor cortex). Note the clear beta amplitude reduction during stimulation. The inset line plots show the 4-40 Hz trial averaged phase-locked evoked response, with the expected prominent deflections around 20 and 50 ms. (B) Maximum difference in beta-band amplitude (0.3-0.8 s window vs 1-1.5 s window) plotted as a function of age (i.e., each data point shows a different participant; triangles represent children, circles represent adults). Note significant correlation (R² = 0.29, p = 0.00004 *). (C) Amplitude of the P50 component of the evoked response plotted against age. There was no significant correlation (R² = 0.04, p = 0.14). All data here relate to the index finger stimulation; similar results are available for the little finger stimulation in Supplementary Information Section 1.

      Regarding the developmental effects, the authors appear to compute a modulation index that contrasts the peak beta window (.3 to .8) to a later 1.0-1.5 s window where a rebound is present in older adults. This is problematic for several reasons. First, it prevents the origin of the developmental effect from being discerned, as a difference in the beta decrease following stimulation is confounded with the beta rebound that occurs later. A developmental effect in either of these responses could be driving the effect. From Figure 3, it visually appears that the much later rebound response is driving the developmental effect and not the beta decrease that is the primary focus of the study. Second, these time windows are a concern because a different time window was used to derive the peak voxel used in these analyses. From the methods, it appears the image was derived using the .3-.8 window versus a baseline of 2.5-3.0 s. How do the authors know that the peak would be the same in this other time window (0.3-0.8 vs. 1.0-1.5)? Given the confound mentioned above, I would recommend that the authors contrast each of their windows (0.3-0.8 and 1.0-1.5) with the 2.5-3.0 window to compute independent modulation indices. This would enable them to identify which of the two windows (beta decrease from 0.3-0.8 s or the increase from 1.0-1.5 s) exhibited a developmental effect. Also, for clarity, the authors should write out the equation that they used to compute the modulation index. The direction of the difference (positive vs. negative) is not always clear.

      We completely understand the referee’s point; referee 1 made a similar point. In fact, there are two limitations of our paradigm regarding the measurement of PMBR versus the task-induced beta decrease:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, as described above it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s7,8. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline9. Here, we wanted to keep recordings relatively short for the younger participants, and so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly because the PMBR of the nth trial is still ongoing when the (n+1)th trial begins. Because of this, there is no genuine rest period, and so the stimulus induced beta decrease and subsequent rebound cannot be disentangled. This limitation has now been made clear in our discussion as follows:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation7–9. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      To clarify our method of calculating the modulation index, we have added the following statement to the methods:

      “The beta modulation index was calculated using the equation , where , and are the average Hilbert-envelope-derived amplitudes in the stimulus (0.3-0.8s), post-stimulus (1-1.5s) and baseline (2.5-3s) windows, respectively.”
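
      A minimal sketch of how a window-based index of this kind could be computed is given here. The equation itself did not survive in the statement quoted above, so the form used in the sketch, (post-stimulus amplitude minus stimulus amplitude) divided by baseline amplitude, is an assumption, as are the names `window_mean` and `beta_modulation_index` and the toy envelope values. Only the three time windows are taken from the methods text. The sketch also returns the two baseline-referenced contrasts recommended by the reviewer, which make the sign of each effect explicit.

```python
from statistics import mean

# Time windows in seconds, as stated in the methods text.
STIM = (0.3, 0.8)
POST = (1.0, 1.5)
BASE = (2.5, 3.0)


def window_mean(times, envelope, window):
    """Average envelope amplitude over samples with start <= t < end."""
    start, end = window
    values = [a for t, a in zip(times, envelope) if start <= t < end]
    if not values:
        raise ValueError(f"no samples fall inside window {window}")
    return mean(values)


def beta_modulation_index(times, envelope):
    """Return an assumed modulation index and two baseline contrasts.

    ASSUMPTION: index = (A_post - A_stim) / A_base. The original equation
    is missing from the source text, so this form is illustrative only.
    """
    a_stim = window_mean(times, envelope, STIM)
    a_post = window_mean(times, envelope, POST)
    a_base = window_mean(times, envelope, BASE)
    return {
        "index": (a_post - a_stim) / a_base,
        # Separate contrasts against baseline, as the reviewer suggested.
        "stim_vs_base": (a_stim - a_base) / a_base,
        "post_vs_base": (a_post - a_base) / a_base,
    }


# Hypothetical trial-averaged beta envelope sampled at 10 Hz over 0-3 s:
# reduced amplitude during stimulation, elevated amplitude afterwards.
times = [i / 10 for i in range(30)]
envelope = [
    0.5 if 0.3 <= t < 0.8 else 1.2 if 1.0 <= t < 1.5 else 1.0
    for t in times
]
result = beta_modulation_index(times, envelope)
```

      With this sign convention a larger positive index means stronger modulation. As noted above, the short trial length means the baseline window may still contain rebound activity, so the separate contrasts should be read with that caveat.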

      Another complication of using a somatosensory task is that the literature on bursting is much more limited and it is unclear what the expectations would be. Overall, the burst probability appears to be relatively flat across the trial, except that there is a sharp decrease during the beta decrease (.3-.8 s). This matches the conventional trial-averaging analysis, which is good to see. However, how the bursting observed here relates to the motor literature and the PMBR versus beta ERD is unclear.

      Again, we agree completely; a motor task would have better framed the study in the context of existing burst literature – but as mentioned above, making 2-year-olds comply with the instructions for a motor task would have been difficult. Interestingly in a recent paper, Rayson et al. used EEG to investigate burst activity in infants (9 and 12 months) and adults during observed movement execution, with results showing stimulus induced decrease in beta burst rate at all ages, with the largest effects in adults21. This paper was not yet published when we submitted our article but does help us to frame our burst results since there is strong agreement between their study and ours. We now mention this study in both our introduction and discussion. 

      Another weakness is that all participants completed 42 trials, but 19% of the trials were excluded in children and 9% were excluded in adults. The number of trials is proportional to the signal-to-noise ratio. Thus, the developmental differences observed in response amplitude could reflect differences in the number of trials that went into the final analyses.

      This is an important observation and we thank the reviewer for raising the issue. We have now re-analysed all of our data, removing trials in the adults such that the overall number of trials was the same as for the children. All effects with age remained significant. We chose to keep the Figures in the main manuscript with all good trials (as previously) and present the additional analyses (with matched trial numbers) in supplementary information. However, if the reviewer feels strongly, we could do it the other way around (there is very little difference between the results).
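
      The trial-matching step described above could be sketched as follows. This is an illustration only: the function name `match_trial_counts`, the fixed random seed, and the use of random subsampling are assumptions, since the response does not state how the adult trials to be removed were chosen. The trial counts in the example (38 and 34) are hypothetical values roughly implied by the 9% and 19% exclusion rates applied to 42 trials.

```python
import random


def match_trial_counts(trials, target_count, seed=0):
    """Randomly keep `target_count` trials, preserving their original order.

    ASSUMPTION: random subsampling with a fixed seed; the selection rule
    actually used for the re-analysis is not given in the response.
    """
    if target_count > len(trials):
        raise ValueError("cannot keep more trials than are available")
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(trials)), target_count))
    return [trials[i] for i in kept]


# Hypothetical adult with 38 good trials, reduced to a hypothetical
# child-level count of 34 before computing any age-related effect.
adult_trials = [f"trial_{i:02d}" for i in range(38)]
matched = match_trial_counts(adult_trials, target_count=34)
```

      Fixing the seed keeps the subsample reproducible; repeating the analysis over several seeds would show whether the result depends on which trials happen to be dropped.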

      Reviewer #3 (Public Review):

      This study demonstrated the application of OPM-MEG in neurodevelopment studies of somatosensory beta oscillations and connections with children as young as 2 years old. It provides a new functional neuroimaging method that has high spatiotemporal resolution and is wearable, which makes it a useful new tool for studies in young children. They have constructed a 192-channel wearable OPM-MEG system that includes field compensation coils which allow free head movement scanning with a relatively high ratio of usable trials. Beta band oscillations during somatosensory tasks are well localized and the modulation with age is found in the amplitude, connectivity, and panspectral burst probability. It is demonstrated that the wearable OPM-MEG could be used in children as a quite practical and easy-to-deploy neuroimaging method with performance as good as conventional MEG. With both good spatial (several millimeters) and temporal (milliseconds) resolution, it provides a novel and powerful technology for neurodevelopment research and clinical applications not limited to somatosensory areas.

      We thank the reviewer for their summary, and their time in reviewing our manuscript.

      The conclusions of this paper are mostly well supported by data acquired under the proper method. However, some aspects of data analysis need to be improved and extended.

      (1) The colour bars selected for the pseudo-T-statistic pictures of beta modulation in Figures 2 and 3, which are blue/black and red/black, are not easily distinguished from the anatomical images which are grey-scale. A colour bar without black/white would make these figures better. The peak point locations are also suggested to be marked in Figure 2 and averaged locations in Figure 3 with an error bar.

      Thank you for this comment which we certainly agree with. The colour scheme used has now been changed to avoid black. We have also added peak locations. 

      (2) The data points in plots are not consistent across figures. In Figures 3 and 5, they are classified into triangles and circles for children and adults, but all are circles in Figures 4 and 6.

      Thank you! We apologise for the confusion. Data points are now consistent across plots.

      (3) Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward modelling may still be impacted by the small head profile. Add more information about source localization accuracy and stability across ages or head size.

      This is an excellent point. We have added to our discussion relating to the accuracy of the forward model. 

      “…We failed to see a significant difference in the spatial location of the cortical representations of the index and little finger; there are three potential reasons for this. First, the system was not designed to look for such a difference – sensors were sparsely distributed to achieve whole head coverage (rather than packed over sensory cortex to achieve the best spatial resolution in one area22). Second, our “pseudo-MRI” approach to head modelling (see Methods) is less accurate than acquisition of participant-specific MRIs, and so may mask subtle spatial differences. Third, we used a relatively straightforward technique for modelling magnetic fields generated by the brain (a single shell forward model). Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward model may still be impacted by the small head profile. This may diminish spatial resolution and future studies might look to implement more complex models based on e.g. finite element modelling23. Finally, previous work 24 suggested that, for a motor paradigm in adults, only the beta rebound, and not the power reduction during stimulation, mapped motortopically. This may also be the case for purely sensory stimulation. Nevertheless, it remains the case that by placing sensors closer to the scalp, OPM-MEG should offer improved spatial resolution in children and adults; this should be the topic of future work…”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Major items to further test include the differing number of trials, the windowing issue, and the focus on motor findings in the intro and discussion. First, I would recommend the authors adjust the number of trials in adults to equate them between groups; this will make their developmental effects easier to interpret.  

      Thank you for raising this important point. This has now been done and appears in our supplementary information as discussed above.

      Second, to discern which responses are exhibiting developmental effects, the authors need to contrast the 0.3-0.8 window with the later window (2.5-3.0), not the window that appears to have the PMBR-like response. This artificially accentuates the response. I also think they should image the 1.0-1.5 vs 2.5-3.0s window to determine whether the response in this time window is in the same location as the decrease and then contrast this for beta differences. 

      We completely understand this point, which relates to separating the reduction in beta amplitude during stimulation and the rebound post stimulation. However, as explained above, doing so unambiguously would require the use of much longer trials. Here we were only able to measure stimulus induced beta modulation (distinct from the separate contributions of the task induced beta power reduction and rebound). It may be that future studies, with >10 s trial length, could probe the role of the PMBR, but such studies require long paradigms which are challenging to implement with children.

      Third, changing the framing of the study to highlight the somatosensory developmental literature would also be an improvement.

      We have added to our introduction as stated in the responses above.

      Finally, the connectivity analysis on data from a somatosensory task did not make sense given the focus of the study and should be removed in my opinion. It is very difficult to interpret given past studies used resting state data and one would expect the networks to dynamically change during different parts of the current task (i.e., stimulation versus baseline).

      We appreciate the point regarding connectivity. However, it was our intention to examine the developmental trajectory of beta oscillations, and a major role of beta oscillations is in mediating connectivity. It is true that most studies are conducted in the resting state (or more recently – particularly in children – during movie watching). The fact that we had a sensory task running is a confound; nevertheless, the connectivity we derived in adults bears a marked similarity to that from previous papers (e.g. 25) and we do see significant changes with age. We therefore believe this to be an important addition to the paper and we would prefer to keep it.

      References

      (1) Holmes, N., Bowtell, R., Brookes, M. J. & Taulu, S. An Iterative Implementation of the Signal Space Separation Method for Magnetoencephalography Systems with Low Channel Counts. Sensors 23, 6537 (2023).

      (2) Boto, E. et al. Moving magnetoencephalography towards real-world applications with a wearable system. Nature (2018) doi:10.1038/nature26147.

      (3) Holmes, N. et al. A bi-planar coil system for nulling background magnetic fields in scalp mounted magnetoencephalography. NeuroImage 181, 760–774 (2018).

      (4) Seymour, R. A. et al. Using OPMs to measure neural activity in standing, mobile participants. NeuroImage 244, 118604 (2021).

      (5) Rea, M. et al. A 90-channel triaxial magnetoencephalography system using optically pumped magnetometers. Annals of the New York Academy of Sciences 1517, https://doi.org/10.1111/nyas.14890 (2022).

      (6) Holmes, N. et al. Enabling ambulatory movement in wearable magnetoencephalography with matrix coil active magnetic shielding. NeuroImage 274, 120157 (2023).

      (7) Pakenham, D. O. et al. Post-stimulus beta responses are modulated by task duration. NeuroImage 206, 116288 (2020).

      (8) Fry, A. et al. Modulation of post-movement beta rebound by contraction force and rate of force development. Human Brain Mapping 37, 2493–2511 (2016).

      (9) Pfurtscheller, G. & Lopes da Silva, F. H. Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clin Neurophysiol 110, 1842–1857 (1999).

      (10) Seedat, Z. A. et al. The role of transient spectral ‘bursts’ in functional connectivity: A magnetoencephalography study. NeuroImage 209, 116537 (2020).

      (11) Baker, A. P. et al. Fast transient networks in spontaneous human brain activity. eLife 2014, 1867 (2014).

      (12) Vidaurre, D. et al. Spectrally resolved fast transient brain states in electrophysiological data. NeuroImage 126, 81–95 (2016).

      (13) Gaetz, W. & Cheyne, D. Localization of sensorimotor cortical rhythms induced by tactile stimulation using spatially filtered MEG. NeuroImage 30, 899–908 (2006).

      (14) Cheyne, D. et al. Neuromagnetic imaging of cortical oscillations accompanying tactile stimulation. Cognitive Brain Research 17, 599–611 (2003).

      (15) van Ede, F., Jensen, O. & Maris, E. Tactile expectation modulates pre-stimulus β-band oscillations in human sensorimotor cortex. NeuroImage 51, 867–876 (2010).

      (16) Salenius, S., Schnitzler, A., Salmelin, R., Jousmäki, V. & Hari, R. Modulation of Human Cortical Rolandic Rhythms during Natural Sensorimotor Tasks. NeuroImage 5, 221–228 (1997).

      (17) Cheyne, D. O. MEG studies of sensorimotor rhythms: A review. Experimental Neurology 245, 27–39 (2013).

      (18) Kilavik, B. E., Zaepffel, M., Brovelli, A., MacKay, W. A. & Riehle, A. The ups and downs of beta oscillations in sensorimotor cortex. Experimental Neurology 245, 15–26 (2013).

      (19) Bauer, M., Oostenveld, R., Peeters, M. & Fries, P. Tactile Spatial Attention Enhances Gamma-Band Activity in Somatosensory Cortex and Reduces Low-Frequency Activity in Parieto-Occipital Areas. J. Neurosci. 26, 490–501 (2006).

      (20) Barone, J. & Rossiter, H. E. Understanding the Role of Sensorimotor Beta Oscillations. Frontiers in Systems Neuroscience 15, (2021).

      (21) Rayson, H. et al. Bursting with Potential: How Sensorimotor Beta Bursts Develop from Infancy to Adulthood. J Neurosci 43, 8487–8503 (2023).

      (22) Hill, R. M. et al. Optimising the Sensitivity of Optically-Pumped Magnetometer Magnetoencephalography to Gamma Band Electrophysiological Activity. Imaging Neuroscience (2024) doi:10.1162/imag_a_00112.

      (23) Stenroos, M., Hunold, A. & Haueisen, J. Comparison of three-shell and simplified volume conductor models in magnetoencephalography. NeuroImage 94, 337–348 (2014).

      (24) Barratt, E. L., Francis, S. T., Morris, P. G. & Brookes, M. J. Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage 181, 831–844 (2018).

      (25) Rier, L. et al. Test-Retest Reliability of the Human Connectome: An OPM-MEG study. Imaging Neuroscience (2023) doi:10.1162/imag_a_00020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point reply in response to the Reviewer’s comments

      Reviewer #1

      Public review:

      [1] (a) Given that only a fraction of the FAPs express BDNF after injury, the authors need to demonstrate the specificity of the Prrx1-Cre for FAPs. This is particularly important because muscle stem cells also express GDNF receptors (Fig. 3C & D) and myogenic progenitors/satellite cells produce BDNF after nerve injury (Griesbeck et al., 1995 (PMID 8531223); Omura et al., 2005 (PMID 16221288)). (b) Moreover, as the authors point out, there are multipotent mesenchymal precursor cells in the nerve that migrate into the surrounding tissue following nerve injury and contribute to regeneration (Carr et al, PMID 30503141). Therefore, there are multiple possible sources of BDNF, highlighting the need to clearly demonstrate that FAP-derived BDNF is essential.

      - (a) As the Reviewer noted, both GDNF receptor expression and increased BDNF expression in response to nerve injury are detectable in both FAPs and muscle stem cells (MuSCs). Therefore, we agree with the Reviewer that demonstrating the specificity of Prrx1-Cre in FAPs is crucial to support our claim. In our previous publication (Kim et al., 2022), using Prrx1-Cre; Rosa-eYFP mice, we showed that while most of the CD31-CD45-Vcam1-Sca1+ FAPs are eYFP+, CD31-CD45-Vcam1+Sca1- MuSCs do not express eYFP (Liu et al., 2015; Kim et al., 2022) (Author response image 1). Additionally, genomic DNA PCR using mononuclear cells sorted from our Prrx1Cre; Bdnffl/fl mice showed that DNA recombination in the floxed Bdnf gene could only be detected in FAPs and CD31-CD45-Vcam1-Sca1- cells, but not in MuSCs (Author response image 2). This is consistent with a previous report that showed Prrx1-Cre activity in FAPs, pericytes, vascular smooth muscle cells (vSMCs) and tenocytes (Leinroth et al., 2022), where pericytes, vSMCs and tenocytes are included in the CD31-CD45-Vcam1-Sca1- population (Giordani et al., 2019). Together, these results demonstrate that while Prrx1-Cre is active in FAPs, it is absent in MuSCs.

      Author response image 1.

      Expression of eYFP in muscle-resident, lineage-negative, live mononuclear cells isolated from Prrx1Cre;RosaeYFP mice. Supplemental Figure 3A from Kim et al., 2022. Lin-: lineage-negative (CD31-CD45-); Neg.: Vcam1-Sca1-.

      Author response image 2.

      Recombination of the floxed Bdnf gene in the mononuclear cells sorted from muscles of Prrx1Cre; Bdnffl/fl or Bdnffl/fl mice. Genotypes and cell types sampled for each lane are specified. P4, P5, and P6 indicate primers used for each PCR. Lin+: lineage(CD31/CD45)-positive; DN: CD31-CD45-Vcam1-Sca1-.

      - (b) We appreciate and agree with the Reviewer’s comment that additional experiments are needed to confirm that FAP-derived BDNF is indeed essential for nerve regeneration, considering other potential cellular sources of BDNF, such as nerve-resident mesenchymal precursor cells. One possible experiment that could demonstrate the requirement of FAP-derived BDNF in nerve regeneration would be to transplant wild-type FAPs into our Prrx1Cre; Bdnffl/fl mice and test whether the delay in nerve regeneration and remyelination is rescued, making the process similar to that in control mice. Unfortunately, since the genetic background of our Prrx1Cre; Bdnffl/fl mice is a mixture of B6, 129S4, and BALB/c, immune rejection of the transplanted cells may occur, which makes the experiment technically difficult. Another experimental approach could involve the use of a FAP-specific Cre mouse line, as we have mentioned in the Discussion of our original manuscript. However, such a line does not yet exist due to the lack of a marker gene that is expressed specifically in FAPs, but not in nerve-resident mesenchymal precursor cells. Overcoming such technical challenges and demonstrating the requirement of FAP-derived BDNF in nerve regeneration would significantly strengthen our report, though we regret that these methods are currently unavailable.

      [2] Similarly, the authors should provide some evidence that BDNF protein is produced by FAPs. All of their data for BDNF expression is based on mRNA expression and that appears to only be increased in a small subset of FAPs. Perhaps an immunostaining could be done to demonstrate up-regulation of BDNF in FAPs after injury.

      - We appreciate the Reviewer’s constructive comment. To demonstrate that BDNF protein is produced by FAPs upon nerve injury, we performed western blot analysis. FAPs were isolated from either sciatic nerve crush injury-affected muscles at 7 days post injury (dpi) or from the contralateral, uninjured muscles, and protein samples were prepared for SDS-PAGE and western blot using anti-BDNF, anti-PDGFRα and anti-GAPDH antibodies. As a result, while both nerve injury-affected and uninjured muscle-derived FAPs expressed PDGFRα, the mature form of BDNF protein was only detected in nerve injury-affected FAPs, showing that BDNF is indeed expressed in FAPs at the protein level after injury. We have added this new result as Figure 4F in the New Figure 4 with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) in our manuscript to include the new results in detail.

      [3] The suggestion that Schwann cell-derived GDNF is responsible for upregulation of BDNF in the FAPs is indirect, based largely on the data showing that injection of GDNF into the muscle is sufficient to up-regulate BDNF (Fig. 4F & G). However, to more directly connect the 2 observations in a causal way, the authors should inject a Ret/GDNF antagonist, such as a Ret-Fc construct, then measure the BDNF levels.

      - We appreciate the Reviewer’s constructive comment, and we agree that testing the necessity of GDNF/RET signaling in BDNF upregulation is crucial to link the expression of the two neurotrophic factors in a causal way. To antagonize GDNF/RET signaling, we injected anti-GDNF antibodies into the tibialis anterior and gastrocnemius muscles following sciatic nerve crush injury to block the activity of intramuscular GDNF protein. Although the differences were not statistically significant, we observed a tendency towards decreased Bdnf mRNA expression upon anti-GDNF injection compared to IgG controls. We have added this new result as New Figure 4—figure supplement 2, and revised our manuscript to include the details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). We have also changed the title of New Figure 4 (line 332) to encompass the new results. We are aware that further experiments, which may involve increasing the number of animals tested, increasing the antibody injection dosage or frequency, or using genetic models such as Plp1CreER; Gdnffl/fl, should be carried out to validate our hypothesis with statistical significance. Unfortunately, due to limited time, resources, and research funds, we were unable to perform such additional experiments. We hope that the Reviewer understands these limitations.

      [4] (a) In assessing the regeneration after nerve crush, the authors focus on remyelination, for example, assessing CMAP and g-ratios. However, they should also quantify axon regeneration, which can be done distal to the crush injury at earlier time points, before the 6 weeks scored in their study. Evaluating axon regeneration, which occurs prior to remyelination, would be especially useful because BDNF can act on both Schwann cells, to promote myelination, and axons, enhancing survival and growth. (b) They could also evaluate the stability of the neuromuscular junctions, particularly if a denervation was done with the conditional knock outs, although that may be a bit beyond the scope of this study.

      - (a) As the Reviewer mentioned, BDNF is known to act on both Schwann cells and axons, where it promotes myelination and axonal growth, respectively (Oudega and Hagg, 1998; Zhang et al., 2000; Chan et al., 2001; Xiao et al., 2009; English et al., 2013). We fully agree with the Reviewer’s comment that quantification of axon regeneration, which could be achieved through immunostaining of the distal part of the sciatic nerve at earlier time points after injury, would shed light on whether FAP-derived BDNF can also contribute to axon regeneration in addition to remyelination. Unfortunately, we could not perform such additional experiments within the limited time frame, since preparing sufficient numbers of control and conditional knockout mice that match the age groups used in this study (3-4 months old), waiting an additional 2-4 weeks after nerve crush injury for sample collection, and then immunostaining for quantification could take almost 6 months in total. We hope that the Reviewer understands this limitation.

      - (b) We appreciate the Reviewer’s constructive comment. Although the number of animals used for neuromuscular junction (NMJ) analyses was not sufficient, we had briefly examined the structure of NMJs at 4 weeks post nerve crush injury in control (Ctrl) and conditional knockout (cKO) mice as a preliminary experiment. As a result, no significant differences were observed between Ctrl and cKO mice in terms of NMJ morphology and innervation (Author response image 3). 

      Author response image 3.

      Structures of neuromuscular junctions from Ctrl vs cKO mice at 4 weeks post nerve crush injury. Whole-mount immunostaining was done using the extensor digitorum longus muscles that were affected by sciatic nerve crush injury. Samples were stained with α-bungarotoxin (green), neurofilament (red), and synaptophysin (blue). Scale bar: 50 μm. 

      Going back to part (a) of this Reviewer’s comment, considering the data presented in Author response image 3, where innervation of axons into acetylcholine receptor clusters was not significantly different between Ctrl versus cKO mice, FAP-derived BDNF may not be critical for axonal growth upon nerve injury. Although we acknowledge that additional experiments are required to draw a meaningful conclusion on this point, we could not perform such additional experiments due to insufficient time and resources. We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] In citing the ability of BDNF to promote Schwann cell myelination the authors should include Chan et al., 2001 (PMID 11717413) in addition to the Zhang et al, 2000 and Xiao et al, 2009 references.

      - We apologize for omitting the reference mentioned by the Reviewer. We have added the suggested reference in our revised manuscript (lines 395, 425, and 517).

      Reviewer #2

      Public review:

      [1] Although, I find the data the authors generated enough for their claims. I do see them as relatively poor, and (a) a complementary analysis of protein expression would strengthen the paper through immunostaining of the different genes mentioned for FAPs and Schwann cells. The model is entirely supported by measuring mRNA levels and negative regulation of gene expression in specific cells. Additionally, (b) what happens to the structure of the neuromuscular junction after regeneration when GDNF or BDNF expression is reduced? (c) The determination of decreasing levels of FAPs BDNF mRNA during aging is interesting; is the gain of BDNF expression in FAPs reverting the phenotype?

      - (a) We appreciate and agree with the Reviewer’s comment that validation of BDNF protein expression in FAPs and GDNF protein expression in Schwann cells upon nerve injury would strengthen this paper. Regarding GDNF protein expression in Schwann cells upon nerve injury, it has already been demonstrated by previous studies (Höke et al., 2002; Xu et al., 2013). For BDNF protein expression in FAPs upon nerve injury, we performed western blot analysis for validation, as mentioned in the response to Reviewer #1 Public review [2]. The results showed that while the mature form of BDNF protein could not be readily detected in FAPs isolated from uninjured muscles, it could be detected in FAPs isolated from sciatic nerve crush injury-affected muscles at 7 days post injury. We have added the new result as Figure 4F in the New Figure 4 with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) in our manuscript to include the new results in detail.

      - (b) Though the data are preliminary, we examined the structures of neuromuscular junctions (NMJs) from control and Prrx1Cre; Bdnf fl/fl mice at 4 weeks post injury in the extensor digitorum longus muscles, as mentioned in the response to Reviewer #1 Public review [4](b). We could not identify significant differences between control and Prrx1Cre; Bdnf fl/fl mice, in which BDNF expression is reduced specifically in Prrx1-expressing cells, including FAPs (Attached Figure 3). Since other cellular sources of BDNF, such as Schwann cells, exist, regeneration of the NMJs may not have been as significantly affected as remyelination in our Prrx1Cre; Bdnf fl/fl mice. However, further experiments with a sufficient number of mice and more observation time points are required to statistically validate this hypothesis in detail. Unfortunately, preparing samples for such additional analyses would take more than four months, as we need to produce sufficient numbers of control and Prrx1Cre; Bdnf fl/fl mice that match the age groups used in this study. We hope that the Reviewer understands our limitation.

      Regarding analyzing NMJ structures after regeneration affected by reduced GDNF levels, using genetic models such as Plp1CreER; Gdnffl/fl mice would be appropriate, as we have used the Prrx1Cre; Bdnffl/fl mice in this study to reduce BDNF levels produced by FAPs. Unfortunately, we do not have the Gdnffl mice, and obtaining these mice to produce Plp1CreER; Gdnffl/fl mice and performing the additional experiment would take too much time for this current revision. In a further study, we will try to perform the additional experiment by obtaining the required mouse line. We hope that the Reviewer understands our limitation.

      - (c) We thank the Reviewer for highlighting this point. In this paper, we have shown that BDNF expression upon nerve injury is decreased in aged FAPs compared to young adult FAPs, and suggested that this may be one of the causes of the delayed nerve regeneration phenotype in aged mice. Previously, it has been reported that while intramuscular injection of BDNF accelerates nerve regeneration, intramuscular injection of anti-BDNF antibodies delays the regeneration process (Zheng et al., 2016). This implies that intramuscular levels of active BDNF can significantly influence the speed of nerve regeneration. Therefore, a gain of BDNF expression in aged FAPs may contribute to reversing the delayed nerve regeneration phenotype in aged mice, since it would provide an additional supply of active, intramuscular BDNF, which has previously been shown to accelerate nerve regeneration. Though experimental validation is required to support such a claim, we could not obtain sufficient numbers of aged mice within the limited time frame. We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] The authors should include the experimental design and several drawings in the leading figures indicating, for example, how remyelination after injury was quantified and how the response of regenerated sciatic nerve to a depolarizing stimulus was studied.

      - We apologize for any confusion caused by insufficient information provided in the leading figures. Unfortunately, due to limited space, we could not add experimental designs or drawings in the leading figures. Instead, to do our best to comply with the Reviewer’s comment, we have revised the figure legends in the leading figures so that the experimental designs or diagrams can be referred to in the figure supplements. We hope that the Reviewer understands this limitation.

      Reviewer #3

      Public review:

      [1] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([2]), as the same comment is repeated.

      [2] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its down-stream signaling in FAP cells. There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. The data that GDNF signaling is inducing the synthesis and secretion of BDNF is also not conclusive.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([3]), as the same comment is repeated.

      Recommendations for the authors:

      [1] Although this is a novel study and contains very well-performed parts, the GDNF section is preliminary and requires additional experimentation. In the introduction authors describe well FAPs but even do not mention how GDNF is signaling. Moreover, the reader may get an impression that Ras-MAPK pathway is the only or at least the main GDNF signaling pathway. In fact, for neurons Akt and Src signaling pathways play also crucial role.

      - We apologize for the missing content in the Introduction section of our manuscript and for any confusion caused by our misleading description of the GDNF signaling pathway. We have revised our manuscript to include the GDNF signaling pathway in the Introduction section, along with a description of other downstream signaling pathways of GDNF that are known to play crucial roles, as mentioned by the Reviewer (lines 115-130). Additionally, we changed the expression in the Results section to avoid making any misleading impressions (lines 318-319).

      [2] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - We thank the Reviewer for the constructive comment. Though we fully agree with the Reviewer that validating the expression of RET and GFRα1 proteins in FAPs is needed, we were unable to obtain the antibodies required for such experiments within the limited time frame for this revision. We hope that the Reviewer understands our limitation. Although we could not directly show the expression of those GDNF receptor genes at the protein level in FAPs, intramuscular GDNF injection was sufficient to induce Bdnf expression in FAPs compared to PBS control in the absence of nerve damage. It is therefore likely that GDNF receptors are indeed expressed at the protein level in FAPs, since otherwise FAPs would not have been able to respond to the injected GDNF protein. Nevertheless, in a future study, we will try to validate the protein-level expression of GDNF receptors in FAPs to comply with the Reviewer’s suggestion and to further support this study.

      [3] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its down-stream signaling in FAP cells. Authors can monitor activation of MAPK pathway by detecting phospho-Erk and PI3 kinase-Akt pathway measuring phospho-S6 using immunohistochemistry. We can recommend to use the following antibodies: pErk1/2 (1:300, Cell Signaling, Cat# 4370L RRID:AB_2297462), pS6 (1:300, Cell Signaling, Cat# 4858L RRID:AB_1031194). These experiments are crucial because RET and GFRa1 proteins maybe not expressed at the sufficient level on the cell surface.

      - We sincerely appreciate the Reviewer’s constructive comment. In this study, we suggested that the GDNF-BDNF axis within FAPs would signal through the MAPK pathway based on the bioinformatic analysis of our single cell RNA-seq data and matching the results with the previously known pathways. We fully agree that monitoring the activation of the MAPK pathway and the PI3K-Akt pathway by immunohistochemistry would experimentally demonstrate whether GDNF can activate those pathways within FAPs through GFRα1/RET activation. Unfortunately, we could not obtain the antibodies suggested by the Reviewer for this revision due to insufficient research funds and the limited time frame. We hope that the Reviewer understands our limitation. In future studies, we will try to validate the detailed molecular pathway that mediates the GDNF-BDNF axis in FAPs by incorporating the methodology suggested by the Reviewer, along with implementation of genetic models such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl or Prrx1Cre; Gfra1fl/fl to validate whether Schwann cell-derived GDNF can actually signal through its canonical receptor RET/GFRα1 expressed in FAPs to induce expression of BDNF upon nerve injury.

      [4] (a) There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. Authors can use GDNF blocking antibodies, siRNA or use RET or GFRa1 cKO mice to delete them from FAP cells. (b) The data that GDNF signaling is inducing the synthesis and secretion of BDNF is also not conclusive. Authors should show that GDNF injection is increasing BDNF protein levels in FAPs. To get sufficient material for ELISA detection of BDNF is perhaps problematic. However, authors can use BDNF antibodies from Icosagen company and use IHC.

      - (a) We appreciate the Reviewer for the critical comment. As mentioned in the reply for Reviewer #1 Public review [3], we used GDNF blocking antibodies to reduce GDNF signaling within the tibialis anterior and gastrocnemius muscles by intramuscular injection after sciatic nerve crush injury, and included the result as a new figure supplement in our revised manuscript (New Figure 4—figure supplement 2) with its details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). Though the results were not statistically significant, intramuscular injection of anti-GDNF antibodies showed a tendency toward reduced Bdnf expression in FAPs, compared to IgG controls. As mentioned in the reply for Reviewer #1 Public review [3], and as suggested by the Reviewer, using cKO mice such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl, or Prrx1Cre; Gfra1fl/fl mice would further validate the GDNF-BDNF axis suggested in this study, likely with statistical significance. Unfortunately, obtaining these genetic models within the limited time frame of this current revision is not feasible. We will try to adopt such models in our future study to validate the role of Schwann cell-derived GDNF in inducing BDNF expression in FAPs via activation of RET/GFRα1.  

      - (b) We thank the Reviewer for the constructive comment. Though we fully agree that the experiment suggested by the Reviewer would validate the synthesis and secretion of BDNF protein by GDNF signaling in FAPs, we were not able to perform it due to a lack of research funds to obtain a sufficient amount of the GDNF protein. We hope that the Reviewer understands our limitation. Still, combining the results from New Figure 4H in this study with the New Figure 4F, where GDNF injection induced Bdnf mRNA expression in FAPs, and BDNF protein expression in FAPs in response to nerve injury was demonstrated via western blot, we anticipate that GDNF injection would increase BDNF protein levels in FAPs, though direct validation of this statement would require conducting the additional experiments mentioned by the Reviewer.

      References

      Chan JR, Cosgaya JM, Wu YJ, and Shooter EM (2001). Neurotrophins are key mediators of the myelination program in the peripheral nervous system. Proceedings of the National Academy of Sciences 98:14661-14668.

      English AW, Liu K, Nicolini JM, Mulligan AM, and Ye K (2013). Small-molecule trkB agonists promote axon regeneration in cut peripheral nerves. Proc Natl Acad Sci U S A 110:16217-22. doi:10.1073/pnas.1303646110

      Giordani L, He GJ, Negroni E, Sakai H, Law JY, Siu MM, Wan R, Corneau A, Tajbakhsh S, and Cheung TH (2019). High-dimensional single-cell cartography reveals novel skeletal muscle-resident cell populations. Molecular Cell 74:609-621. e6.

      Höke A, Gordon T, Zochodne D, and Sulaiman O (2002). A decline in glial cell-line-derived neurotrophic factor expression is associated with impaired regeneration after long-term Schwann cell denervation. Experimental Neurology 173:77-85.

      Kim J-H, Kang J-S, Yoo K, Jeong J, Park I, Park JH, Rhee J, Jeon S, Jo Y-W, and Hann S-H (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight 7:

      Leinroth AP, Mirando AJ, Rouse D, Kobayashi Y, Tata PR, Rueckert HE, Liao Y, Long JT, Chakkalakal JV, and Hilton MJ (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports 39:

      Liu L, Cheung TH, Charville GW, and Rando TA (2015). Isolation of skeletal muscle stem cells by fluorescence-activated cell sorting. Nature protocols 10:1612-1624.

      Oudega M, and Hagg T (1998). Neurotrophins promote regeneration of sensory axons in the adult rat spinal cord. Brain Research 818:431-438. doi:10.1016/S0006-8993(98)01314-6

      Xiao J, Wong AW, Willingham MM, Kaasinen SK, Hendry IA, Howitt J, Putz U, Barrett GL, Kilpatrick TJ, and Murray SS (2009). BDNF exerts contrasting effects on peripheral myelination of NGF-dependent and BDNF-dependent DRG neurons. J Neurosci 29:4016-22. doi:10.1523/JNEUROSCI.3811-08.2009

      Xu P, Rosen KM, Hedstrom K, Rey O, Guha S, Hart C, and Corfas G (2013). Nerve injury induces glial cell line-derived neurotrophic factor (GDNF) expression in Schwann cells through purinergic signaling and the PKC-PKD pathway. Glia 61:1029-1040.

      Zhang JY, Luo XG, Xian CJ, Liu ZH, and Zhou XF (2000). Endogenous BDNF is required for myelination and regeneration of injured sciatic nerve in rodents. European Journal of Neuroscience 12:4171-4180. doi:10.1111/j.1460-9568.2000.01312.x

      Zheng J, Sun J, Lu X, Zhao P, Li K, and Li L (2016). BDNF promotes the axonal regrowth after sciatic nerve crush through intrinsic neuronal capability upregulation and distal portion protection. Neuroscience letters 621:1-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses to be addressed: 

      (1) More detail is required to understand the effects of genetic and drug manipulations on heart rate as these are important experiments. At the very least, a discussion on the limitations of these manipulations is needed. 

      - For example, how does one separate the pulsatile versus nutritive effects of blood flow/heartrate reduction? 

      - The conclusion that arterial SMC differentiation is driven by pulsatile blood flow needs to be toned down. Indeed, this conclusion is mainly supported by in vitro cell co-cultures exposed to laminar versus pulsatile flow. In vivo, reducing Tnnt2a expression affects cardiac contractility and blood flow does not selectively affect pulsatility. To make this conclusion, the authors would need an experimental means to selectively dampen the pulsatility of blood flow.

      We understand this concern and have toned down the statements in our conclusions related to pulsatile flow by using 'flow' instead of 'pulsatile flow' throughout the text, except in the part describing the in vitro co-cultures. We also added a paragraph to discuss the limited capability to qualitatively reduce blood flow in vivo, and acknowledged that the effects of nutrients and flow reduction could not be uncoupled in live zebrafish embryos. We proposed that in the future, in vitro 3D vascular culture models may be combined with microfluidics to precisely calibrate nutrient composition in culture media, flow velocity, and pulse; these methods would help address these questions more thoroughly. See pages 11-12, lines 312-322.

      (2) Since mural cells are sensitive to transmural pressure, could the authors elaborate on the potential role of raised intravascular pressure in SMC differentiation? This would better parallel rodents and humans. 

      We thank you for this suggestion. We added a paragraph to discuss the potential role of raised intravascular pressure in VSMC differentiation in the discussion section (see page 11 line 296-311).

      (3) The authors use nifedipine to reduce blood flow. Nifedipine is a specific and potent inhibitor of voltage-dependent calcium channels (VDCC) which are expressed in SMCs. Prior studies (PMID: 35588738) showed that VDCC blockers increased rather than inhibited SMC differentiation. Nifedipine is also likely to act upon VSMC calcium handling in the circle of Willis, which may in turn affect cell maturation. Could the authors comment on this seeming discrepancy?

      It is possible that off-target or indirect effects of Nifedipine decrease smooth muscle cell proliferation, or that altered cardiac contractility fundamentally alters aspects of vascular development other than blood flow. 

      - Additionally, it would be helpful to report the quantitative heart rate reduction achieved with Nifedipine. This would clear up concerns that the heart rate reduction is too large for normal vascular development to occur, and thus decrease proliferation rate independent of changes in blood flow pulsatility. 

      We concur with these comments, which is why our experiments with nifedipine are reinforced by an alternative, non-pharmacological strategy to inhibit blood flow: the use of a morpholino against the tnnt2a gene. Results with either nifedipine or the tnnt2a morpholino show a lack of VSMC maturation. In addition, we provided the quantitative heart rate reduction achieved with nifedipine in new Figure S2A-S2C, which shows that the drug does not completely halt the heartbeat but decreases the heart rate. Moreover, we note that zebrafish embryos can survive and develop a normal blood vascular system without any heartbeat. Hence, we exclude the possibility that the effect on VSMC maturation is linked to non-specific effects caused by the loss of heartbeat. Nevertheless, we now acknowledge in our discussion the limitation of nifedipine, as it may affect VSMCs through VDCCs (page 12, lines 323-334).

      We also added a paragraph in the discussion section to compare nifedipine, an L-type VDCC blocker, and ML218, a T-type VDCC selective inhibitor from the previous study (Ando et al., 2022). We noted that in this previous study, the increase in VSMC differentiation only occurred on anterior metencephalic central arteries (AMCtAs) that are more than 40 mm away from the BCA; these AMCtAs are much smaller than CoW arteries and have a different geometry, and hence possibly different kinetics of VSMC maturation (Ando et al., 2022), as the findings in our manuscript would suggest.

      (4) The authors should provide more information on how blood flow velocity and wall shear stress are calculated from the Circle of Willis vascular structure. It is presumed that these values are dependent upon the 3-D morphology of the vessel network, as labeled by intravenous dextran dye, but this is not clear. (a second reviewer similarly comments: I was unclear how flow velocity values were obtained in Fig. 3E. Are they based on computational simulation, or are they experimentally calculated following the dextran injection?) Small local differences in vessel diameter and shape will influence blood flow velocity, but these morphological changes are not clearly articulated. Further, it is unclear how flow input levels to the CaDI and basilar arteries are decided across time points. For instance, is it possible to measure the blood flow speed empirically with line-scanning or high-speed tracking of labeled blood cells or particles? This would provide validation of the modeling results. 

      The computational fluid dynamic simulation was performed according to a previous study from our lab (Barak et al., 2021). Blood flow velocity and wall shear stress depend on the 3D morphology of the vessel network labeled by intravascular dextran. Details on how the computational fluid dynamic simulation was performed have been added to the Methods section (page 17, lines 433-449).
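      To make the dependence of wall shear stress on vessel geometry and flow speed concrete, the sketch below uses the textbook Poiseuille approximation for steady laminar flow in a straight cylindrical vessel, τ = 8·μ·v_mean / d. This is an illustration only and is not the CFD simulation used in the study; the function name, the viscosity value, and the example inputs are hypothetical.

```python
def poiseuille_wall_shear_stress(mean_velocity_m_s, diameter_m, viscosity_pa_s):
    """Estimate wall shear stress (Pa) for steady laminar flow in a straight tube.

    Uses the Poiseuille relation tau = 8 * mu * v_mean / d.
    This is a simplification: real vessels branch, taper, and carry
    pulsatile flow, which is why a full CFD simulation is normally preferred.
    """
    if diameter_m <= 0:
        raise ValueError("diameter must be positive")
    if viscosity_pa_s <= 0:
        raise ValueError("viscosity must be positive")
    return 8.0 * viscosity_pa_s * mean_velocity_m_s / diameter_m


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the call signature:
    # mean velocity 1 mm/s, diameter 10 um, assumed viscosity 3 mPa*s.
    tau = poiseuille_wall_shear_stress(1e-3, 10e-6, 3e-3)
    print(f"estimated wall shear stress: {tau:.3f} Pa")
```

      Under this approximation, halving the vessel diameter at a fixed mean velocity doubles the estimated wall shear stress, which is one reason small local differences in vessel shape matter for the simulated values.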

      Moreover, to address this reviewer's concern, we have now provided new experimental measurements of blood flow, using red blood cell (RBC) velocity obtained by axial line scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos at 54 hpf, 3 dpf, and 4 dpf. Using the experimental RBC velocity, we re-ran the computational fluid dynamic simulation. The new findings align with our conclusion and are further elaborated upon in our response to point 6 below. Details on how RBC velocity was calculated have been added to the Methods section (page 16, lines 414-431).
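      As a generic illustration of how a velocity can be derived from line-scan data, the sketch below fits a straight line to the position of a single tracked cell along the scan axis over successive scan times; the slope of that line is the velocity. This is a simplified assumption about the analysis, not the procedure described in the Methods, and the function name and sample values are hypothetical.

```python
def velocity_from_linescan(times_ms, positions_um):
    """Return velocity (um/ms) as the least-squares slope of position vs time.

    times_ms: scan times at which the cell was detected (milliseconds)
    positions_um: position of the cell along the scan line (micrometres)
    """
    n = len(times_ms)
    if n != len(positions_um) or n < 2:
        raise ValueError("need at least two paired (time, position) samples")
    mean_t = sum(times_ms) / n
    mean_p = sum(positions_um) / n
    # Sum of squared deviations in time; zero means all samples share one time.
    s_tt = sum((t - mean_t) ** 2 for t in times_ms)
    if s_tt == 0:
        raise ValueError("time points must not all be identical")
    s_tp = sum((t - mean_t) * (p - mean_p)
               for t, p in zip(times_ms, positions_um))
    return s_tp / s_tt


if __name__ == "__main__":
    # Hypothetical streak: the cell advances 0.5 um per 1 ms scan line.
    times = [0.0, 1.0, 2.0, 3.0]
    positions = [0.0, 0.5, 1.0, 1.5]
    print(velocity_from_linescan(times, positions), "um/ms")
```

      Note that 1 μm/ms equals 1 mm/s, so the result converts directly to the units commonly used for blood flow speed.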

      (5) Does the cardiac injection of dextran itself affect the diameter of the arteries, given the invasiveness of the procedure? This could be examined in fish with a transgenic endothelial label with and without dextran. 

      Here, we performed an experiment on wildtype zebrafish at 5 days post-fertilization (dpf) with and without Dextran injection, examining the effects of Dextran injection on vessel diameters. As shown in the representative image below, the XZ panel clearly illustrates a Dextran-filled PCS vessel with no alteration in vessel size. Dextran microangiography, a technique employed to obtain vessel geometry with fluorescent microsphere, has been well established in zebrafish (Kamei et al., 2010). Our findings, demonstrating that Dextran does not affect vessel size, are consistent with previous studies utilizing Dextran microangiography.

      Author response image 1.

      (6) The data from the microangiography experiment in Figure 3 does not fully support the stated results. The authors report that the CaDI had the highest blood flow speed starting from 54 hpf, but it does not appear to be higher than the other arteries at this time point. Additionally, there is not sufficient evidence that wall shear stress coincides with smooth muscle cell differentiation in the CaDI. Wall shear stress appears to be similar between 54 hpf and 3 dpf in the CaDI, only increasing between 3 dpf and 4 dpf, while differentiation is shown to begin at 3 dpf. The authors need to address this and/or soften conclusions. 

      First, in response to this specific reviewer concern, we measured red blood cell (RBC) velocity using axial line scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos (the detailed method was added to the Methods section of the manuscript). We replaced the computationally simulated blood flow velocity with RBC velocity in new Figure 3E-3G, and re-ran the computational simulation of wall shear stress (WSS) using the RBC velocity in new Figure 3I-3K. We compared RBC velocity and WSS among different vessels at each time point. We confirmed that CaDI has the highest RBC velocity from 54 hpf to 4 dpf (new Figure 3A-3C, and 3E-3G) and found an overall increase in average WSS from 54 hpf to 4 dpf (new Figure 3A-3C, and 3H). Further, WSS in CaDI was significantly higher than in BCA and PCS at 54 hpf, 3 dpf, and 4 dpf (new Figure 3A-3C, 3I-3K). Altogether, the CFD simulation suggests that CoW arteries experience different hemodynamic WSS that is associated with the spatiotemporal pattern of VSMC differentiation on CoW arteries. (Page 6, lines 153-162)

      Second, to assess the correlation between WSS and VSMC differentiation in CaDI, we performed Pearson correlation analysis. In the image provided here, we plotted a linear regression of the normalized number of acta2+ cells in CaDI against WSS across developmental stages (54 hpf, 3 dpf, and 4 dpf), and calculated the Pearson correlation coefficient using GraphPad Prism 10.0.3. The correlation coefficient was r = 0.595, suggesting that the two variables (acta2+ cells and WSS) tend to increase together across developmental stages (54 hpf, 3 dpf, and 4 dpf).

      Author response image 2.
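      For readers who wish to reproduce this type of analysis outside GraphPad Prism, the following is a minimal sketch of the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name and the input values are hypothetical placeholders and are not the data underlying the reported r = 0.595.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-deviation and squared deviations around the sample means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Placeholder values only: per-embryo WSS and normalized acta2+ cell counts.
    wss = [0.8, 1.1, 1.0, 1.6, 1.4, 2.0]
    acta2_cells = [0.1, 0.0, 0.4, 0.3, 0.9, 0.7]
    print(f"r = {pearson_r(wss, acta2_cells):.3f}")
```

      A positive r indicates only that the two variables tend to increase together; it does not by itself establish that WSS drives differentiation.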

      Third, we softened our conclusion to state that RBC velocity was differentially distributed across CoW arteries while VSMC differentiation occurred in these vessels.

      (7) It is unclear if acta2 expression is conferring vascular tone, as would be expected if the cells are behaving as mature VSMCs. Does arterial diameter decrease with an increase in acta2 expression? Are acta2-positive mural cells associated with more dynamic changes in arteriole diameter under basal or stimulated conditions? 

      Thank you for this interesting question. VSMC maturation and vasoactivity could be further investigated in the future. Our study focused on the early stage of VSMC differentiation, in which pdgfrb+ progenitors start to express the VSMC marker acta2. We discussed the onset of transgelin expression and loss of abcc9 expression as markers of VSMC maturation. In addition, a previous study found that VSMC-covered vessels in the zebrafish brain dilate as early as 4 dpf and constrict at 6 dpf (Bahrami & Childs, 2020). Future studies may focus on the association between the expression of different VSMC markers and VSMC functional maturation (page 10, lines 272-279).

      (8) The authors argue that CoW vessels transition from venous to arterial identity (Fig. 1). However, kdrl is not an ideal arterial marker for this experiment as it is expressed in both arteries and veins. While it is true that many arterial beds have stronger kdrl expression than the veins, its expression in both arteries and veins changes with developmental stage, and its expression level may vary depending on the type of vessel. Therefore, showing that kdrl increases from 32 hpf - 4 dpf in CoW vessels is not convincing because its expression may increase in both venous or arterial vasculature as the vessels mature. In addition, flt4 expression is not exclusively venous; for example, it has noticeable expression in the dorsal aorta at 24-32 hpf stages. It would be helpful to confirm this transition by analyzing additional arterial and venous markers. 

      We acknowledge this and we added a paragraph to discuss the limitation. We combined loss of flt4 and increase in kdrl to establish the temporal sequence of circle of Willis morphogenesis, arterial specification, and VSMC differentiation. We acknowledge that additional arterial and venous markers need to be analyzed for a more thorough characterization of arterial specification in vertebrate brain vascular development. See page 12 line 335-341.

      (9) The authors show that acta2+ VSMCs are absent in tnnt2a MO embryos, concluding that blood flow is required for their differentiation from pericytes. However, there is no data showing that pericytes are still present in tnnt2a MO embryos. Although this has been previously shown by Ando et al 2016, it would be beneficial to confirm in the current study as this is a critical piece of evidence needed for this conclusion. 

      To determine if blood flow is dispensable for pdgfrb+ progenitor recruitment, we performed tnnt2a MO (0.35 ng/embryo) injection in Tg(pdgfrb:egfp, kdrl:ras-mcherry) ncv22/s896. Loss of blood flow did not affect pdgfrb+ progenitor emergence around the CoW (new Figure S2G-S2H) at 3 days post fertilization (dpf). This is consistent with the previous observation in Figure S2C of Ando et al. 2016 (Ando et al., 2016).

      (10) The authors show that klf2a MO injected embryos have a reduced number of VSMCs at 3 dpf but a normal number at 4 dpf (Fig. 6), concluding that klf2a is only important to initiate CaDI muscularization. If this is true, it would raise important questions about how VSMCs differentiate at a later stage in the absence of klf2a. For instance, is blood flow not required to differentiate at a later stage, or is there another factor that compensates in the absence of klf2a? The alternative explanation/ caveat is that klf2a MO loses efficacy with development, leading to the recovery of VSMCs at this stage. Therefore, it would be important to confirm this result using a genetic klf2a mutant. 

      Thank you for pointing this out. We note that, based on the klf2a reporter line, klf2a activity in CoW arterial endothelial cells is highly correlated with the number of acta2+ VSMCs in CaDI, BCA and PCS at 3 dpf (r = 0.974, new Figure S5J). Interestingly, however, klf2a activity remained stable from 3 dpf to 4 dpf, well beyond the initiation of VSMC differentiation. Thus, we speculate that sustained klf2a expression may support further maturation of VSMCs, as acta2+ VSMCs showed distinct morphology at 4 dpf compared with 3 dpf. (Page 10, lines 268-272). As for the observation that klf2a morphants have a normal number of VSMCs at 4 dpf, we think that in addition to the temporary effect of the morpholino, a proximal explanation is compensation by the paralogous klf2b in zebrafish. We acknowledge that further characterization of CoW VSMC development in klf2a and klf2b double genetic mutants (Rasouli et al., 2018; Steed et al., 2016) may help determine whether klf2b compensates for klf2a in CoW VSMC differentiation beyond 4 dpf. See page 10-11, lines 292-295.

      (11) A large part of the discussion focuses on Notch and Wnt signaling, as downstream Klf2 effectors. While these are reasonable hypotheses to propose, there is no data on the involvement of these pathways in the current study. It seems excessive to speculate on detailed mechanisms of how Klf2 activates Notch and Wnt signaling in the absence of data showing that these pathways are affected in CoW vessels. Therefore, the discussion could be shortened here unless additional data can be obtained to demonstrate the involvement of these pathways in VSMCs in CoW.

      We concur and have condensed the discussion on Notch and Wnt signaling as downstream klf2 effectors.

      Minor comments: 

      (1) Line 138 "CaDI is the only vessels in the CoW receiving pulsatile arterial blood low ... ". Adding a reference to support this statement would be useful. 

      We agree and revised this sentence to ‘CaDI receive proximal arterial feed through lateral dorsal aorta from cardiac outflow tract (Isogai et al., 2001)’. It was also based on our general observation of zebrafish vascular anatomy and blood flow under a confocal microscope.

      (2) The image insets in Figs. 1A, 2A, 4E-L, 5A, 6A are quite small. Please make them larger to help the reader interpret the findings. 

      We agree. We maximized the image size to help the reader interpret the findings, and to visualize confocal images and schematics side-by-side.

      (3) The schematics in Figs. 1-2, and 4-6 are helpful, but the different cell types are difficult to see because they are small and their colors/shapes are not very distinct. 

      We agree. We increased the size and color contrast to provide better visualization of the schematics in new Figures 1-2 and 4-6.

      (4) It is stated that there are no diameter differences between different arteries, but statistics are not reported. 

      The statistics in Figure 3D were performed by ordinary two-way ANOVA followed by Tukey’s multiple comparisons test, with a single pooled variance. Here we added pairwise comparisons among vessels in the CoW. Hence, where not indicated, the differences are non-significant.

      (5) Figure 3F would be better visualized on a log scale, as it is difficult to see the differences between each post-fertilization timepoint. 

      We agree. In the new Figure 3H, the average wall shear stress (WSS) in CoW arteries is presented on a log scale on the y axis to show the differences between each post-fertilization timepoint.

      (6) Please provide more background and validation on the pericyte cell line, and their use for the questions in this study. 

      Thank you for the question. TgBAC(pdgfrb:egfp)ncv22 was generated and described by Ando et al. 2016 to clarify mural cell coverage of vascular endothelium in zebrafish (Ando et al., 2016). We added a description to the Methods section to provide background and validation on this pericyte line (see page 13, lines 368-372).

      (7) Flow velocity and WSS changes are shown in each vessel in Figs. 3E,G. However, the comparison should be made between different types of vessels to see if there is a statistical difference and PCS, for example, which would explain differences in VSMC coverage. 

      We agree. We compared the differences among arteries in the CoW at each developmental timepoint and performed ordinary one-way ANOVA with Tukey’s multiple comparisons test. Figure 3E is replaced by new Figure 3E-G and Figure 3G is replaced by new Figure 3I-K.

      (8) Similarly, between CaDI, the number of klf2a cells in Fig. 5B should be compared between different vessels, not between different stages of the same vessel. 

      We agree. In new Figure 5B-E, the number of klf2a+ cells per 100 μm vessel length are compared among different vessels at each developmental stage and analyzed by ordinary one-way ANOVA with Tukey’s multiple comparisons test.

      (9) When quantifying klf2+ cells in Fig. 5, it would be helpful to quantify klf2 expression level between cells in different vessels. This could be done by quantifying GFP expression in existing images. The difference in expression level may explain the variation between CaDI and PCS more accurately than just the difference in cell number. 

      GFP expression reflects the stability of GFP protein expression and labels discrete nuclei with active klf2a expression. Hence, quantifying the GFP level might not give an accurate readout of klf2a expression per se but rather of its activity. For this reason, we don’t think that this experiment would add an accurate measurement of klf2a expression.

      (10) Do data points in Figure 4D correspond to different cells in the same chamber experiment? If so, they cannot be treated as independent replicates. Each data point should correspond to an independent replicate experiment. 

      We agree. Now in the figure legend, we report the number of cells analyzed.

      (11) Graph placement is confusing in Figs. 4I, M. An adjacent Fig. 4G shows Nifedipine treated embryos, while the graph next to (Fig. 4I) shows acta+ cell number from tnnt2a 4 dpf experiment. Similarly, the bottom Fig. 4K tnn2a 4 dpf MO experiment has an adjacent graph Fig. 4M, which shows nifedipine treatment quantification, which makes it very confusing. 

      We agree. We rearranged Figure 4E (representative images of control embryos at 3 dpf and 4 dpf), Figure 4F (tnnt2a MO embryos at 3 dpf and 4 dpf), and Figure 4G (nifedipine-treated embryos at 3 dpf and 4 dpf).

      References:

      Ando, K., Fukuhara, S., Izumi, N., Nakajima, H., Fukui, H., Kelsh, R. N., & Mochizuki, N. (2016). Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development, 143(8), 1328-1339. https://doi.org/10.1242/dev.132654

      Ando, K., Tong, L., Peng, D., Vazquez-Liebanas, E., Chiyoda, H., He, L., Liu, J., Kawakami, K., Mochizuki, N., Fukuhara, S., Grutzendler, J., & Betsholtz, C. (2022). KCNJ8/ABCC9-containing K-ATP channel modulates brain vascular smooth muscle development and neurovascular coupling. Dev Cell, 57(11), 1383-1399 e1387. https://doi.org/10.1016/j.devcel.2022.04.019

      Bahrami, N., & Childs, S. J. (2020). Development of vascular regulation in the zebrafish embryo. Development, 147(10). https://doi.org/10.1242/dev.183061

      Barak, T., Ristori, E., Ercan-Sencicek, A. G., Miyagishima, D. F., Nelson-Williams, C., Dong, W., Jin, S. C., Prendergast, A., Armero, W., Henegariu, O., Erson-Omay, E. Z., Harmanci, A. S., Guy, M., Gultekin, B., Kilic, D., Rai, D. K., Goc, N., Aguilera, S. M., Gulez, B., . . . Gunel, M. (2021). PPIL4 is essential for brain angiogenesis and implicated in intracranial aneurysms in humans. Nat Med, 27(12), 2165-2175. https://doi.org/10.1038/s41591-021-01572-7

      Isogai, S., Horiguchi, M., & Weinstein, B. M. (2001). The vascular anatomy of the developing zebrafish: an atlas of embryonic and early larval development. Dev Biol, 230(2), 278-301. https://doi.org/10.1006/dbio.2000.9995

      Kamei, M., Isogai, S., Pan, W., & Weinstein, B. M. (2010). Imaging blood vessels in the zebrafish. In Methods in cell biology (Vol. 100, pp. 27-54). Elsevier.

      Rasouli, S. J., El-Brolosy, M., Tsedeke, A. T., Bensimon-Brito, A., Ghanbari, P., Maischein, H. M., Kuenne, C., & Stainier, D. Y. (2018). The flow responsive transcription factor Klf2 is required for myocardial wall integrity by modulating Fgf signaling. Elife, 7. https://doi.org/10.7554/eLife.38889

      Steed, E., Faggianelli, N., Roth, S., Ramspacher, C., Concordet, J. P., & Vermot, J. (2016). klf2a couples mechanotransduction and zebrafish valve morphogenesis through fibronectin synthesis. Nat Commun, 7, 11646. https://doi.org/10.1038/ncomms11646

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their careful reading of our manuscript and for the detailed and constructive feedback on our work. Please find attached the revised version of the manuscript. We performed an extensive revision of the manuscript to address the issues raised by the referees. We provide new analyses (regarding response consistency and neural complexity), and we added supplementary figures and edited figures and text. Based on the reviewers’ comments, we introduced several major changes to the manuscript.

      Most notably, we

      • added a limitation statement to emphasize the speculative nature of our interpretation of the timing of word processing/associative binding

      • emphasized the limitations of the control condition

      • added analyses on the interaction between memory retrieval after 12h versus 36h

      • clarified our definition of episodic memory

      • added detailed analyses of the “Feeling of having heard” responses and the confidence ratings

      We hope that the revised manuscript addresses the reviewers' comments to their satisfaction. We believe that the manuscript has been significantly improved owing to the feedback provided. Below you can find a point-by-point response to each reviewer comment in blue. We look forward to the revised manuscript being published in eLife.

      Reviewer #1 (Public Review):

      The authors show that concurrently presenting foreign words and their translations during sleep leads to the ability to semantically categorize the foreign words above chance. Specifically, this procedure was successful when stimuli were delivered during slow oscillation troughs as opposed to peaks, which has been the focus of many recent investigations into the learning & memory functions of sleep. Finally, further analyses showed that larger and more prototypical slow oscillation troughs led to better categorization performance, which offers hints to others on how to improve or predict the efficacy of this intervention. The strength here is the novel behavioral finding and supporting physiological analyses, whereas the biggest weakness is the interpretation of the peak vs. trough effect.

      R1.1. Major importance:

      I believe the authors could attempt to address this question: What do the authors believe is the largest implication of this study? How far can this technique be pushed, and how can it practically augment real-world learning?

      We revised the discussion to put more emphasis on possible practical applications of this study (lines 645-656).

      In our opinion, the strength of this paper is its contribution to the basic understanding of information processing during deep sleep, rather than its insights on how to augment real-world learning. Given the currently limited data on learning during sleep, we believe it would be premature to make strong claims about potential practical applications of sleep-learning. In addition, as pointed out in the discussion section, we do not know what adverse effects sleep-learning has on other sleep-related mechanisms such as memory consolidation.

      R1.2. Lines 155-7: How do the authors argue that the words fit well within the half-waves when the sounds lasted 540 ms and didn't necessarily start right at the beginning of each half-wave? This is a major point that should be discussed, as part of the down-state sound continues into the up-state. Looking at Figure 3A, it is clear that stimulus presented in the slow oscillation trough ends at a time that is solidly into the upstate, and would not neurolinguists argue that a lot of sound processing occurs after the end of the sound? It's not a problem for their findings, which is about when is the best time to start such a stimulus, but it's a problem for the interpretation. Additionally, the authors could include some discussion on whether possibly presenting shorter sounds would help to resolve the ambiguities here.

      The word pairs’ presentations lasted on average ~540 ms. Importantly, the word pairs’ onset was timed to occur 100 ms before the maximal amplitude of the targeted peaks/troughs.

      Therefore, most of a word’s sound pattern appeared during the negative-going half-wave (about 350 ms of 540 ms). Importantly, Brodbeck and colleagues (2022) have shown that phonemes are continuously analyzed and interpreted with delays of about 50-200 ms, peaking at a delay of 100 ms. These results suggest that word processing started just after the negative maximum of a trough and finished during the next peak. Our interpretation (e.g. line 520+) suggests that low-level auditory processing reaches the auditory cortex before the positive-going half-wave. During the positive-going half-wave, the higher-level semantic networks appear to extract the presented word's meaning and associate the two simultaneously presented words. We clarified the time course regarding slow-wave phases and sound presentation in the manuscript (lines 158-164). Moreover, we added the limitation that we cannot know for sure when and in which slow-wave phase words were processed (lines 645-656). Future studies might want to look at shorter-lasting stimuli to narrow down the timing of the word processing steps in relation to the sleep slow waves.
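
      The timing argument above can be checked with simple interval arithmetic. The sketch below is illustrative only: the onset lead (100 ms) and mean stimulus duration (540 ms) are taken from this response, whereas the 500 ms half-wave duration and its symmetry around the trough maximum are assumptions made for the example, and all names are hypothetical.

```python
ONSET_BEFORE_TROUGH_MS = 100.0  # from the response: onset 100 ms before the trough maximum
STIMULUS_DURATION_MS = 540.0    # from the response: average word-pair duration
ASSUMED_HALF_WAVE_MS = 500.0    # ASSUMPTION: negative half-wave length, symmetric around the trough


def overlap_ms(a_start, a_end, b_start, b_end):
    """Length of the overlap between intervals [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def time_in_negative_half_wave(half_wave_ms=ASSUMED_HALF_WAVE_MS):
    """Milliseconds of the stimulus that fall inside the negative half-wave.

    Time 0 is the trough maximum; the half-wave spans +/- half_wave_ms / 2.
    """
    stim_start = -ONSET_BEFORE_TROUGH_MS
    stim_end = stim_start + STIMULUS_DURATION_MS
    return overlap_ms(stim_start, stim_end, -half_wave_ms / 2, half_wave_ms / 2)


def processing_window(delay_ms):
    """Stimulus interval shifted by a fixed processing delay (relative to the trough maximum)."""
    stim_start = -ONSET_BEFORE_TROUGH_MS
    return (stim_start + delay_ms, stim_start + STIMULUS_DURATION_MS + delay_ms)


print(time_in_negative_half_wave())  # 350.0 under the assumed 500 ms half-wave
print(processing_window(100.0))      # (0.0, 540.0): starts at the trough maximum
```

      Under these assumptions the overlap is 350 ms, in line with the figure quoted above, and a 100 ms processing delay places the start of word processing at the trough maximum. A different assumed half-wave duration would change the overlap accordingly.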

      R1.3. Medium importance:

      Throughout the paper, another concern relates to the term 'closed-loop'. It appears this term has been largely misused in the literature, and I believe the more appropriate term here is 'real-time' (Bergmann, 2018, Frontiers in Psychology; Antony et al., 2022, Journal of Sleep Research). For instance, if there were some sort of algorithm that assessed whether each individual word was successfully processed by the brain during sleep and then the delivery of words was subsequently changed, that could be more accurately labelled as 'closed-loop'.

      We acknowledge that the meaning of “closed-loop” in its narrowest sense is not fulfilled here. We believe that “slow oscillation phase-targeted, brain-state-dependent stimulation” is the most appropriate term to describe the applied procedure (BSDBS, Bergmann, 2018). We changed the wording in the manuscript to “brain-state-dependent stimulation algorithm”. Nevertheless, we would like to point out that the algorithm we developed and used (TOPOSO) is very similar to the algorithms often termed closed-loop algorithms in memory and sleep research (e.g. Esfahani et al., 2023; Garcia-Molina et al., 2018; Ngo et al., 2013; for a comparison of TOPOSO to these techniques see Wunderlin et al., 2022, and for more information about TOPOSO see Ruch et al., 2022).

      R1.4. Figure 5 and corresponding analyses: Note that the two conditions end up with different sounds with likely different auditory complexities. That is, one word vs. two words simultaneously likely differ on some low-level acoustic characteristics, which could explain the physiological differences. Either the authors should address this via auditory analyses or it should be added as a limitation.

      This is correct: the two conditions differ in auditory complexity. Accordingly, we added this issue as another limitation of the study (lines 651-653). We had chosen a single-word control condition to ensure that no associative learning (between pseudowords) could take place in the control condition, because this was the critical learning process in the experimental condition. We would like to point out that we observed significant differences in brain responses to the presentation of word pairs (experimental condition) vs single pseudowords (control condition) in the Trough condition, but not in the Peak condition. If low-level acoustic characteristics indeed explained the EEG differences between the two conditions, then one would expect these differences to occur in both the Trough and the Peak condition, because earlier studies showed that low-level acoustic processing proceeds in both phases of slow waves (Andrillon et al., 2016; Batterink et al., 2016; Daltrozzo et al., 2012).

      R1.5. Line 562-7 (and elsewhere in the paper): "episodic" learning is referenced here and many times throughout the paper. But episodic learning is not what was enhanced here. Please be mindful of this wording, as it can be confusing otherwise.

      The reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013).

      We stand by our claim that sleep-learning was of episodic nature. Here we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000) and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). We revised the manuscript to clarify that and how our definition differs from traditional definitions. Please see reviewer comment R3.1 for a more extensive answer.

      Reviewer #2 (Public Review):

      In this project, Schmidig, Ruch and Henke examined whether word pairs that were presented during slow-wave sleep would leave a detectable memory trace 12 and 36 hours later. Such an effect was found, as participants showed a bias to categorize pseudowords according to a familiar word that they were paired with during slow-wave sleep. This behavior was not accompanied by any sign of conscious understanding of why the judgment was made, and so demonstrates that long-term memory can be formed even without conscious access to the presented content. Unconscious learning occurred when pairs were presented during troughs but not during peaks of slow-wave oscillations. Differences in brain responses to the two types of presentation schemes, and between word pairs that were later correctly- vs. incorrectly-judged, suggest a potential mechanism for how such deep-sleep learning can occur.

      The results are very interesting, and they are based on solid methods and analyses. Results largely support the authors' conclusions, but I felt that there were a few points in which conclusions were not entirely convincing:

      R2.1. As a control for the critical stimuli in this study, authors used a single pseudoword simultaneously played to both ears. This control condition (CC) differs from the experimental condition (EC) in a few dimensions, among them: amount of information provided, binaural coherence and word familiarity. These differences make it hard to conclude that the higher theta and spindle power observed for EC over CC trials indicate associative binding, as claimed in the paper. Alternative explanations can be made, for instance, that they reflect word recognition, as only EC contains familiar words.

      We agree. In the revised version of the manuscript, we emphasise this as a limitation of our study (lines 653-656). Moreover, we understand that the differences between the stimuli of the control and the experimental condition need not rely only on the associative binding of two words. We tempered our interpretation of the findings.

      Interestingly, EC vs CC exhibits differences following trough targeting but not peak targeting (see R1.4). If indeed all the EC vs CC differences were unrelated to associative binding, we would expect the same EC vs CC differences when peaks were targeted. Hence, the selective EC vs CC differences in the trough condition suggest that the brain is more responsive to sound, information, word familiarity and word semantics during troughs, where we found successful learning, compared to peaks, where no learning occurred. Trough-targeted word pairs (EC) versus foreign words (CC) enhanced the theta power at 500 ms following word onset, and this theta enhancement correlated significantly with interindividual retrieval performance, indicating that theta probably promoted associative learning during sleep. This correlation was not significant for spindle power.

      R2.2. The entire set of EC pairs were tested both following 12 hours and following 36 hours. Exposure to the pairs during test #1 can be expected to have an effect over memory one day later, during test #2, and so differences between the tests could be at least partially driven by the additional activation and rehearsal of the material during test #1. Therefore, it is hard to draw conclusions regarding automatic memory reorganization between 12 and 36 hours after unconscious learning. Specifically, a claim is made regarding a third wave of plasticity, but we cannot be certain that the improvement found in the 36 hour test would have happened without test #1.

      We understand that the retrieval test at 12h may have had an impact on performance on the retrieval test at 36h. Practicing retrieval of newly formed memories is known to facilitate future retrieval of the same memories (e.g. Karpicke & Roediger, 2008). Hence, practicing the retrieval of sleep-formed memories during the retrieval test at 12h may have boosted performance at 36h.

      However, recent literature suggests that retrieval practice is only beneficial when corrective feedback is provided (Belardi et al., 2021; Metcalfe, 2017). In our study, we only presented the sleep-played pseudowords at test and participants received no feedback regarding the accuracy of their responses. Thus, a proper conscious re-encoding could not take place. Nevertheless, the retrieval at 12h may have altered performance at 36h in other ways. For example, it could have tagged the reactivated sleep-formed memories for enhanced consolidation during the next night (Rabinovich Orlandi et al., 2020; Wilhelm et al., 2011).

      We included a paragraph on the potential carry-over effects from retrieval at 12h on retrieval at 36h in the discussion section (line 489-496; line 657-659). Furthermore, we removed the arguments about the “third wave of plasticity”.

      R2.3. Authors claim that perceptual and conceptual processing during sleep led to increased neural complexity in troughs. However, neural complexity was not found to differ between EC and CC, nor between remembered and forgotten pairs. It is therefore not clear to me why the increased complexity that was found in troughs should be attributed to perceptual and conceptual word processing, as CC contains meaningless vowels. Moreover, from the evidence presented in this work at least, I am not sure there is room to infer causation - that the increase in HFD is driven by the stimuli - as there is no control analysis looking at HFD during troughs that did not contain stimulation.

      With the analysis of the HFD we would like to provide an additional perspective to the oscillation-based analysis. We checked whether the boundary condition of Peak and Trough targeting changes the overall complexity or information content in the EEG. Our goal was to assess the change in neural complexity (relative to a pre-stimulus baseline) following the successful vs unsuccessful encoding of word pairs during sleep.

      We acknowledge that a causal interpretation about HFD is not warranted, and we revised the manuscript accordingly. It was unexpected that we could not find the same results in the contrast of EC vs CC or correct vs incorrect word pairs. We suggest that our signal-to-noise ratio might have been too low.

      One could argue that the phase targeting alone (without stimulation) induces peak/trough differences in complexity. We cannot completely rule out this concern. However, we tried to use the EEG that was not influenced by the ongoing slow wave: the EEG 2000-500 ms before the stimulus onset and 500-2000 ms after the stimulus onset. We thereby excluded the 1 s of the targeted slow wave, hoping that most of the phase-inherent complexity would have faded out (see Figure 2). We could not further extend the time window of analysis due to the minimal stimulus onset interval of 2 s. Of course, we cannot exclude that the targeted Trough impacted the following HFD. We clarified this in the manuscript (lines 384-425).
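
      To make the complexity measure concrete, the sketch below shows one common way to estimate a fractal dimension from a signal window, assuming that HFD here denotes the Higuchi fractal dimension. It is not the analysis code used for the manuscript: the function name, the `kmax` value, and the test signals are illustrative assumptions.

```python
import math


def higuchi_fd(x, kmax=8):
    """Estimate the Higuchi fractal dimension of a 1-D sequence.

    kmax (largest lag) is a free parameter; 8 is an arbitrary illustrative choice.
    """
    n = len(x)
    if kmax < 2 or n < 2 * kmax:
        raise ValueError("need kmax >= 2 and len(x) >= 2 * kmax")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # start offsets 0 .. k-1
            n_steps = (n - 1 - m) // k
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalised curve length for lag k and offset m.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        mean_len = sum(lengths) / k
        if mean_len <= 0:
            raise ValueError("fractal dimension is undefined for a constant signal")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    # Least-squares slope of log(L(k)) against log(1/k).
    mean_a = sum(log_inv_k) / kmax
    mean_b = sum(log_len) / kmax
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mean_a) ** 2 for a in log_inv_k)
    return num / den


# A straight line is maximally regular, so its estimate is close to 1.
print(round(higuchi_fd(list(range(200))), 6))
```

      In an analysis of the kind described above, such a function would be applied separately to the pre-stimulus window (2000-500 ms before onset) and the post-stimulus window (500-2000 ms after onset) of each trial. Smooth signals yield values near 1 and white noise yields values near 2.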

      Furthermore, we did find a difference in neural complexity between the pre-stimulus baseline and the post-stimulus period in the Peak condition but not in the Trough condition (we now added this contrast to the manuscript, lines 416-419). Hence, the change in neural complexity is a reaction to the interaction of the specific slow-wave phase with the processing of the word pairs. Even though these results cannot provide unambiguous, causal links, we think they can serve as an important starting point for other studies to decipher neural complexity during slow-wave sleep.

      Reviewer #3 (Public Review):

      The study aims at creating novel episodic memories during slow wave sleep, that can be transferred in the awake state. To do so, participants were simultaneously presented during sleep both foreign words and their arbitrary translations in their language (one word in each ear), or as a control condition only the foreign word alone, binaurally. Stimuli were presented either at the trough or the peak of the slow oscillation using a closed-loop stimulation algorithm. To test for the creation of a flexible association during sleep, participant were then presented at wake with the foreign words alone and had (1) to decide whether they had the feeling of having heard that word before, (2) to attribute this word to one out of three possible conceptual categories (to which translations word actually belong), and (3) to rate their confidence about their decision.

      R3.1. The paper is well written, the protocol ingenious and the methods are robust. However, the results do not really add conceptually to a prior publication of this group showing the possibility to associate in slow wave sleep pairs of words denoting large or small object and non words, and then asking during ensuing wakefulness participant to categorise these non words to a "large" or "small" category. In both cases, the main finding is that this type of association can be formed during slow wave sleep if presented at the trough (versus the peak) of the slow oscillation. Crucially, whether these associations truly represent episodic memory formation during sleep, as claimed by the authors, is highly disputable as there is no control condition allowing to exclude the alternative, simpler hypothesis that mere perceptual associations between two elements (foreign word and translation) have been created and stored during sleep (which is already in itself an interesting finding). In this latter case, it would be only during the awake state when the foreign word is presented that its presentation would implicitly recall the associated translation, which in turn would "ignite" the associative/semantic association process eventually leading to the observed categorisation bias (i.e., foreign words tending to be put in the same conceptual category than their associated translation). In the absence of a dis-confirmation of this alternative and more economical hypothesis, and if we follow Ocam's razor assumption, the claim that there is episodic memory formation during sleep is speculative and unsupported, which is a serious limitation irrespective of the merits of the study. The title and interpretations should be toned down in this respect

      Our study conceptually adds to and extends the findings by Züst et al. (a) by highlighting the precise time-window or brain state during which sleep-learning is possible (e.g. slow-wave trough targeting), (b) by demonstrating the feasibility of associative learning during night sleep, and (c) by uncovering the longevity of sleep-formed memories.

      We acknowledge that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013). We stand by our claim that sleep-learning was of episodic nature. We use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). The core computational features of episodic memory are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory.

      Therefore, we revised the manuscript to emphasize how our definition differs from traditional definitions (line 64).

      For the current study, we designed a retrieval task that calls on the core computational features of episodic memory by assessing flexible retrieval of sleep-formed compositional word-word associations. Reviewer 3 suggests an alternative interpretation for the learning observed here: mere perceptual associations between foreign words and translation words are stored during sleep, and semantic associations are only inferred at retrieval testing during ensuing wakefulness. First, these processing steps would require rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval, which would still require hippocampal processing and computations in the episodic memory system. Second, this mechanism seems highly laborious and inefficient: the sound pattern of a word at 12 hours after learning would trigger the reactivation of an associated sound pattern of another word, and this sound pattern would then elicit the activation of the translation word’s semantics, leading to the selection of the correct superordinate semantic category at test.

      Overall, we believe that our pairwise-associative learning paradigm triggered a rapid conceptual-associative encoding process mediated by the hippocampus that provided for flexible representations of foreign and translation words in episodic memory. This study adds to the existing literature by examining specific boundary conditions of sleep-learning and demonstrates the longevity (at least 36 hours) of sleep-learned associations.

      Other remarks:

      R3.2. Lines 43-45: the assumption that the sleeping brain decides whether external events can be disregarded, require awakening or should be stored for further consideration in the waking state is dubious, and the supporting references date from a time (the 1960s) during which hypnopedia was investigated in badly controlled sleep conditions (leaving open the possibility that it occurred during micro-awakenings).

      We revised the manuscript to add more recent and better controlled studies that bolster this claim from the 1960s (lines 40-51). Recently, it has been shown that the sleeping brain preferentially processes relevant information, for example the information conveyed by unfamiliar voices (Ameen et al., 2022), emotional content (Holeckova et al., 2006; Moyne et al., 2022), and our own compared to others’ names (Blume et al., 2018).

      R3.3. 1st paragraph, lines 48-53 , the authors should be more specific about what kind of new associations and at which level they can be stored during sleep according to recent reports, as a wide variety of associations (mostly elementary levels) are shown in the cited references. Limitations in information processing during sleep should also be acknowledged.

      In the lines to which R3 refers, we cite an article (Ruch & Henke, 2020) in which two of the three authors of the current manuscript elaborate in detail what kind of associations can be stored during sleep. We revised these lines to present more clearly the current understanding of the potential and the limitations of sleep-learning (lines 40-51). Although information processing during sleep is generally reduced (Andrillon et al., 2016), a variety of different kinds of associations can be stored, ranging from tone-odour to word-word associations (Arzi et al., 2012, 2014; Koroma et al., 2022; Züst et al., 2019).

      R3.4. The authors ran their main behavioural analyses on delayed retrieval at 36h rather than 12h with the argument that retrieval performance was numerically larger at 36 than 12h but the difference was non-significant (line 181-183), and that effects were essentially similar. Looking at Figure 2, is the trough effect really significant at 12h ? In any case, the fact that it is (numerically) higher at 36 than 12h might suggest that the association created at the first 12h retrieval (considering the alternative hypothesis proposed above) has been reinforced by subsequent sleep.

      The Trough effect at 12h is not significant, as stated on line 185 (“Planned contrasts against chance level revealed that retrieval performance significantly exceeded chance at 36 hours only (P36hours = 0.036, P12hours = 0.094).”). It seems that our wording was not clear. Therefore, we refined the description of the behavioural analysis in the manuscript (lines 188-193).

      In brief, we report an omnibus ANOVA with a significant main effect of targeting type (Trough vs Peak: F(1,28) = 5.237, p = 0.030, d = 0.865). Because Trough-targeting led to significantly better memory retention than Peak-targeting, we computed a second ANOVA, solely including participants with trough-targeted word-pair encoding. The memory retention in the Trough condition is above chance (MTrough = 39.11%, SD = 10.76; FIntercept (1,14) = 5.660, p = 0.032) and does not significantly differ between the 12h and 36h retrieval (FEncoding-Test Delay (1,14) = 1.308, p = 0.272). However, the retrieval performance at 36h numerically exceeds the performance at 12h, and the direct comparison against chance reveals that the 36h but not the 12h retrieval was significant (P36hours = 0.036, P12hours = 0.094). Hence, we found no evidence for above-chance performance at the 12h retrieval and focused on the retrieval after 36h in the EEG analysis.

      We agree with the reviewer that the subsequent sleep seems to have improved consolidation and subsequent retrieval. We assume that the reviewer suggests that participants merely formed perceptual associations during sleep and encoded episodic-like associations during testing at 12h (as pointed out in R 3.1). However, we believe that it is unlikely that the awake encoding of semantic associations during the 12h retrieval led to improved performance after 36h. We changed the discussion regarding the interaction between retrieval at 12h and 36h (lines 505-512, also see R 2.2).

      R3.5. In the discussion section, lines 419-427, the argument is somewhat circular in claiming episodic memory mechanisms based on functional neuroanatomical elements that are not tested here, and the supporting studies conducted during sleep were in a different setting (e.g. TMR).

      Indeed, the TMR and animal studies are a different setting compared to the present study. We re-wrote this part and only focused on the findings of Züst and colleagues (2019), who examined hippocampal activity during the awake retrieval of sleep-formed memories (lines 472-482). Additionally, we would like to emphasise that our main reasoning is that the task requirements called upon the episodic memory system.

      R3.6. Supplementary Material: in the EEG data the differentiation between correct and incorrect ulterior classifications when presented at the peak of the slow oscillation is only significant in association with 36h delayed retrieval but not at 12h, how do the authors explain this lack of effect at 12 hour ?

      We assume that the reviewer refers to the TROUGH condition (word-pairs targeted at a slow-wave trough) and not as written to the peak condition. We argue that the retention performance at 12h is not significantly above chance (M12hours = 37.4%, P12hours = 0.094).

      Hence, the distinction between “correctly” and “incorrectly” categorised word pairs was not informative for the EEG analysis during sleep. Whatever the reason that the 12h retrieval was not significantly above chance, the less successful memory recall, and thus the less balanced trial count, makes recall accuracy at 12h a worse criterion for separating EEG trials than the recall performance after 36 hours.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor importance:

      Abstract: The opening framing is confusing here and in the introduction. Why frame the paper in the broadest terms about awakenings and threats from the environment when this is a paper about intersections between learning & memory and sleep? I do understand that there is an interesting point to be made about the counterintuitive behavioral findings with respect to sleep generally being perceived as a time when stimuli are blocked out, but this does not seem to me to be the broadest points or the way to start the paper. The authors should consider this but of course push back if they disagree.

      We understand the reviewer’s criticism but believe that this has more to do with personal preferences than with the scientific value or validity of our work. We believe that it is our duty as researchers to present our study in a broader context because this may help readers from various fields to understand why the work is relevant. To some readers, evidence for learning during sleep may seem trivial, to others, it may seem impossible or a weird but useless conundrum. By pointing out potential evolutionary benefits of the ability to acquire new information during sleep, we help the broad readership of eLife understand the relevance of this work.

      Lines 31-32: "Neural complexity" -> "neural measures of complexity" because it isn't clear what "neural complexity" means at this point in the abstract. Though, note my other point that I believe this analysis should be removed.

      To our understanding, “neural complexity” is a frequently used term in the field and yields more than 4000 entries on Google Scholar, whereas “neural measures of complexity” yields only 3 hits [September 2023]. In order to link our study with other studies on neural complexity, we would like to keep this terminology. As an example, two recent publications using “neural complexity” are Lee et al. (2020) and Frohlich et al. (2022).

      Lines 42-43: The line of work on 'sentinel' modes would be good to cite here (e.g., Blume et al., 2017, Brain & Language).

      We added the suggested citation to the manuscript (line 52).

      Lines 84-90: While I appreciate the authors desire to dig deep and try to piece this all together, this is far too speculative in my opinion. Please see my other points on the same topic.

      In this paragraph, we point out why both peaks and troughs are worth exploring for their contributions to sensory processing and learning during sleep. Peaks and troughs are contributing mutually to sleep-learning. Our speculations should inspire further work aimed at pinning down the benefits of peaks and troughs for sleep-learning. We clarified the purpose and speculative nature of our arguments in the revised version of the manuscript.

      Line 109: "outlasting" -> "lasting over" or "lasting >"

      We changed the wording accordingly.

      Line 111: I believe 'nonsense' is not the correct term here, and 'foreign' (again) would be preferred. Some may be offended to hear their foreign word regarded as 'nonsense'. However, please let me know if I have misunderstood.

      We would like to use the linguistic term “pseudoword” (aligned with reviewer 2’s comment) and we revised the manuscript accordingly.

      Figure 1A: "Enconding" -> "Encoding"

      Thank you for pointing this out.

      Lines 201-2: Were there interactions between confidence and correctness on the semantic categorization task? Were correct responses given with more confidence than incorrect ones? This would not necessarily be a problem for the authors' account, as there can of course be implicit influences on confidence (i.e., fluency).

      As is stated in the results section, confidence ratings did not differ significantly between correct and incorrect assignments (Trough condition: F(1,14) = 2.36, p = 0.15; Peak condition: F(1,14) = 0.48, p = 0.50).

      Line 236: "Nicknazar" -> "Niknazar"

      Thank you for pointing this out.

      Line 266: "profited" -> "benefited"

      We changed the wording accordingly.

      Lines 280-4: There seems some relevance here with Malerba et al. (2018) and her other papers to categorize slow oscillations.

      Diving into the details on how to best categorise slow oscillations is beyond the scope of this manuscript. Here, we build on work from the field of microstate analyses and use two measures to describe and quantify the targeted brain states: the topography of the electric field (i.e., the correlation of the electric field with an established template or “microstate”), and the field strength (global field power, GFP). While the topography of a quasi-stable electric field reflects activity in a specific neural network, the strength (GFP) of a field most likely mirrors the degree of activation (or inactivity) in the specific network. Here, we find that consistent targeting of a specific network state yielding a strong frontal negativity benefitted learning during sleep. For a more detailed explanation of the slow-wave phase targeting see (Ruch et al., 2022).
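
The two descriptors named above can be illustrated with a minimal sketch. It assumes the standard definitions: GFP as the spatial standard deviation of the potentials across electrodes at one time point, and topographic similarity as the Pearson correlation across electrodes between a momentary map and a template map. The function names, the four-electrode montage, and all values are hypothetical and are not taken from the authors' analysis.

```python
import math


def global_field_power(potentials):
    """Spatial standard deviation across electrodes at one time point."""
    n = len(potentials)
    mean = sum(potentials) / n  # removing the mean is equivalent to average referencing
    return math.sqrt(sum((v - mean) ** 2 for v in potentials) / n)


def template_correlation(potentials, template):
    """Pearson correlation across electrodes between a map and a template map."""
    n = len(potentials)
    mean_p = sum(potentials) / n
    mean_t = sum(template) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(potentials, template))
    norm = math.sqrt(
        sum((p - mean_p) ** 2 for p in potentials)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return cov / norm


# Hypothetical 4-electrode maps, frontal electrodes first, values in microvolts.
template = [-1.0, -0.5, 0.5, 1.0]          # template with frontal negativity
strong_map = [-40.0, -20.0, 20.0, 40.0]    # same shape, strong field
weak_map = [-4.0, -2.0, 2.0, 4.0]          # same shape, weak field

print(template_correlation(strong_map, template), global_field_power(strong_map))
print(template_correlation(weak_map, template), global_field_power(weak_map))
```

In this toy example both maps match the template equally well, and only GFP separates the strongly from the weakly expressed state, which is why the two measures are reported side by side.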

      Lines 343-6: Was it intentional to have 0.5 s (0.2-0.7 s) surrounding the analysis around 500 ms but only 0.4 s (0.8-1.2 s) surrounding the analysis around 1 s? Could the authors use the same size interval or justify having them be different?

      We apologise for the misleading phrasing and we clarified this in the revised manuscript. We applied the same procedure for the comparison of later correctly vs incorrectly classified pseudowords as we did for the comparison between EC and CC. Hence, we analysed the entire window from 0s to 2.5s with a cluster-based permutation approach. Contrary to the EC vs CC contrast, no cluster remained significant for the comparison of the subsequent memory effect. By mistake we reported the wrong time window. In the revised manuscript, the paragraph is corrected (lines 364-369).
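
For readers unfamiliar with the approach, the following is a simplified sketch of a cluster-based permutation test on a single channel. It is not the authors' pipeline: it clusters over time only (no spatial neighbourhood), uses sign flipping of within-subject condition differences, and takes the summed absolute t-value as the cluster statistic. The cluster-forming threshold of 2.145 (two-sided critical t for 14 degrees of freedom), the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def t_values(diffs):
    """One-sample t-value at each time point; diffs is subjects x time points."""
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [d[t] for d in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def max_cluster_mass(tvals, threshold):
    """Largest summed |t| over a run of adjacent supra-threshold points of one sign."""
    best, current, sign = 0.0, 0.0, 0
    for t in tvals:
        s = 1 if t > threshold else (-1 if t < -threshold else 0)
        if s != 0 and s == sign:
            current += abs(t)            # cluster continues
        elif s != 0:
            current, sign = abs(t), s    # new cluster starts
        else:
            current, sign = 0.0, 0       # cluster ends
        best = max(best, current)
    return best


def cluster_permutation_p(diffs, threshold=2.145, n_perm=1000, seed=0):
    """Monte Carlo p-value of the largest observed cluster under random sign flips."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for d in diffs:
            s = rng.choice((-1, 1))      # flip the whole time course of a subject
            flipped.append([s * v for v in d])
        if max_cluster_mass(t_values(flipped), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated data: 15 subjects, 50 time points, an effect at points 10 to 20.
rng = random.Random(1)
diffs = []
for _ in range(15):
    row = [rng.gauss(0.0, 1.0) for _ in range(50)]
    for t in range(10, 21):
        row[t] += 1.5
    diffs.append(row)

mass, p = cluster_permutation_p(diffs, n_perm=500)
print(mass, p)
```

Because the whole window is tested with one statistic (the largest cluster), no separate analysis windows need to be chosen in advance, which is the point made in the response above.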

      Line 356-entire HFD section: it is unclear what's gained by this analysis, as it could simply be another reflection of the state of the brain at the time of word presentation. In my opinion, the authors should remove this analysis and section, as it does not add clarity to other aspects of the paper.

      (If the authors keep the section) Line 361-2 - "Moreover, high HFD values have been associated with cognitive processing (Lau et al., 2021; Parbat & Chakraborty, 2021)." This statement is vague. Could the authors elaborate?

      Please see our answer to Reviewer 2 (2.3) for a more detailed explanation. In brief, we would like to keep the analysis with the broad time window of -2 to -0.5 and from 0.5 to 2 s.
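
As background to the measure under discussion, here is a minimal sketch of a fractal dimension estimate, assuming that HFD stands for the Higuchi fractal dimension and using Higuchi's curve-length method. The choice of `kmax = 8`, the function name, and the test signals are assumptions for illustration and do not reproduce the parameters or windows used in the manuscript.

```python
import math
import random


def higuchi_fd(x, kmax=8):
    """Estimate the Higuchi fractal dimension of a 1-D signal (len(x) must exceed kmax)."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                       # zero-based start offset
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # normalise for the number of steps, then divide by the step size
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    # least-squares slope of log curve length against log(1/k)
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_len) / len(log_len)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


# A straight line is maximally regular; white noise is maximally irregular.
line_fd = higuchi_fd([float(i) for i in range(200)])
noise_rng = random.Random(0)
noise_fd = higuchi_fd([noise_rng.gauss(0.0, 1.0) for _ in range(2000)])
print(line_fd, noise_fd)
```

The estimate lies close to 1 for a smooth line and approaches 2 for white noise, so higher values indicate a more irregular signal. A perfectly flat signal has zero curve length and is not handled by this sketch.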

      Lines 403-4: How was it determined that these neural networks mediated both conscious/unconscious processes? Perhaps the authors meant to make a different point, but the way it reads to me is that there is evidence that some neural networks are conscious and others are not and both forms engage in similar functions.

      We revised the manuscript to be more precise and clear: “The conscious and unconscious rapid encoding and flexible retrieval of novel relational memories was found to recruit the same or similar networks including the hippocampus (Henke et al., 2003; Schneider et al., 2021). This suggests that conscious and unconscious relational memories are processed by the same memory system.” (p. 22, top).

      Lines 433-41: Performance didn't actually significantly increase from 12 to 36 hours, so this is all too speculative in my opinion.

      We removed the speculative claim that performance may have increased from the retrieval at 12 hours to the retrieval at 36 hours.

      Line 534: "assisted by enhanced" -> "coincident with". It's unclear whether theta reflects successful processing as having occurred or whether it directly affects or assists with it.

      We have adjusted the wording to be more cautious, as suggested (line 588).

      Line 572-4: Rothschild et al. (2016) is relevant here.

      Unfortunately, we do not see the relevance of this article within the context of our work.

      Line 577 paragraph: The authors may consider adding a note on the importance of ethical considerations surrounding this form of 'inception'.

      We extended this part by adding ethical considerations to the discussion section (Stickgold et al., 2021, line 657).

      Line 1366: It would be better if the authors could eventually make their data publicly available. This is obviously not required, but I encourage the authors to consider it if they have not considered it already.

      In my opinion, the discussion is too long. I really appreciate the authors trying to figure out the set of precise times in which each level of neural processing might occur and how this intersects with their slow oscillation phase results. However, I found a lot of this too speculative, especially given that the sounds may bleed into parts of other phases of the slow oscillation. I do not believe this is a problem unique to these authors, as many investigators attempting to target certain phases in the target memory reactivation literature have faced the same problem, but I do believe the authors get ahead of the data here. In particular, there seems to be one paragraph in the discussion that is multiple pages long (p. 22-24). This paragraph I believe has too much detail and should be broken up regardless, as it is difficult for the reader to follow.

      Considering the recent literature, we believe this interpretation best explains the data. As argued earlier, we believe that a speculative interpretation of the reported phenomena can provide substantial added value because it inspires future experimental work. We have improved the manuscript by clearly distinguishing between data and interpretation. We do declare the speculative nature of some offered interpretations. We hope that these speculations, which are testable hypotheses (!), will eventually be confirmed or refuted experimentally.

      Reviewer #2 (Recommendations For The Authors):

      I very much enjoyed the paper and think it describes important findings. I have a few suggestions for improvement, and minor comments that caught my eye during reading:

      (1) I was missing an analysis of CC ERP, and its comparison to EC ERP.

      We added this analysis to the manuscript (line 299-301). The comparison of CC ERP with EC ERP did not yield any significant cluster for either the peak (cluster-level Monte Carlo p=0.54) or the trough (cluster-level Monte Carlo p>0.37). We assume that the noise level was too high for the identification of differences between CC and EC ERP.

      (2) Regarding my public review comment #2, some light can be shed on between-test effects, I believe, using an item-based analysis - looking at correlations between items' classifications in test #1 and test #2. The assumption seems to be that items that were correct in test #1 remained correct in test #2 while other new correct classifications were added, owing to the additional consolidation happening between the two tests. But that is an empirical question that can be easily tested. If no consistency in item classification is found, on the other hand, or if only consistency in correct classification is found, that would be interesting in itself. This item-based analysis can help tease away real memory from random correct classification. For instance, the subset of items that are consistently classified correctly could be regarded as non-fluke at higher confidence and used as the focus of subsequent-memory analysis instead of the ones that were correct only in test #2.

      Thanks, we re-analysed the data accordingly. Participants were consistent at choosing a specific object category for an item at 12 hours and 36 hours (consistency rate = 47% same category, chance level is 1/3). Moreover, the consistency rate did not differ between the Trough and the Peak condition (MTrough = 47.2%, MPeak = 47.0%, P = 0.98). The better retrieval performance in the Trough compared to the Peak condition after 36 hours is due to: A) if participants were correct at 12h, they again chose the correct answer at 36h (Trough: 20%, Peak: 14%). B) Following an incorrect answer at 12h, participants switched to another object category at 36h (Trough: 72%, Peak: 67%). C) If participants switched the object category following an incorrect answer at 12h, they switched more often to the correct category at 36h in the Trough versus the Peak condition (Trough: 56%, Peak: 53%). Hence, the data support the reviewer’s assumption: items that were correct after 12 hours remained correct after 36 hours, while other new correct classifications were generated at 36h owing to the additional consolidation happening between the two tests. We added this finding to the manuscript (lines 191-200, Figure S6):
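
The item-based analysis described above can be sketched as follows. The record layout, the function name, and the six example items are hypothetical. The response does not state the denominators of percentages A to C, so the sketch makes one possible reading explicit: A as the share of all items answered correctly at both tests, B as the share of initially incorrect items whose category changed, and C as the share of those changed items that ended up correct.

```python
def consistency_summary(items):
    """items: list of (choice_12h, choice_36h, correct_category) tuples."""
    n = len(items)
    same = sum(1 for a, b, _ in items if a == b)
    both_correct = sum(1 for a, b, c in items if a == c and b == c)
    wrong_12h = [(a, b, c) for a, b, c in items if a != c]
    switched = [(a, b, c) for a, b, c in wrong_12h if b != a]
    switched_to_correct = sum(1 for _, b, c in switched if b == c)
    return {
        "consistency_rate": same / n,                              # same category twice
        "correct_both": both_correct / n,                          # reading of A
        "switch_after_error": len(switched) / len(wrong_12h),      # reading of B
        "switch_to_correct": switched_to_correct / len(switched),  # reading of C
    }


# Hypothetical responses of one participant for six pseudowords.
items = [
    ("animal", "animal", "animal"),  # correct at both tests
    ("tool", "tool", "place"),       # incorrect, kept the same category
    ("tool", "place", "place"),      # incorrect, switched to the correct category
    ("place", "animal", "tool"),     # incorrect, switched to another wrong category
    ("animal", "tool", "animal"),    # correct at 12h, incorrect at 36h
    ("place", "place", "place"),     # correct at both tests
]

summary = consistency_summary(items)
print(summary)
```

With three response categories, a participant choosing at random would repeat the same category for about one third of the items, which is the chance level quoted in the response.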

      Author response image 1.

      As suggested, we re-analysed the ERP with respect to the subsequent memory effect. This time we computed four conditions according to the reviewer’s argument about consistently correctly classified pseudowords, presented in the figure below: ERP of trials that were correctly classified at 36h (blue), ERP of trials that were incorrectly classified at 36h (light blue), ERP of trials that were correctly classified twice (brown) and ERP of trials that were not correctly classified twice (orange, all trials that are not in brown). Please note that the two blue lines are reported in the manuscript and include all trials. The brown and the orange lines take the consistency into account and together also include all trials.

      Author response image 2.

      By excluding even more trials from the group of correct retrieval responses, the noise level gets high. Therefore, the difference between the twice-correct and the not-twice-correct trials is not significant (cluster-level Monte Carlo p > 0.27). Because the ERP of twice-correct trials seems very similar to the ERP of the trials correctly classified at 36h at frontal electrodes, we assume that our ERP effect is not driven by a few extreme subjects. Similarly, not-twice-correct trials (orange) have a stronger frontal trough than the trials incorrectly classified at 36h (light blue).

      (3) In a similar vein, a subject-based analysis would be highly interesting. First and foremost, readers would benefit from seeing the lines that connect individual dots across the two tests in figures 2B and 2C. It is reasonable to expect that only a subset of participants were successful learners in this experiment. Finding them and analyzing their results separately could be revealing.

      We added a Figure S1 to the supplementary material, providing the pairing between performance of the 12h and the 36h retrieval.

      It is an interesting idea to look at successful learners alone. We computed the ERP of the subsequent memory effect for those participants who had an above-chance retrieval accuracy at 36h. The result shows a similar effect as reported for all participants (frontal cluster ~0-0.3s). The p-value is only 0.08 because only 9 of 15 participants exhibited an above-chance retrieval performance at 36 hours.

      Author response image 3.

      ERP effect of correct (blue) vs incorrect (light blue) pseudoword category assignment of participants with a retrieval performance above chance at 36h (SD as shades):

      We prefer to not include this data in the manuscript, but are happy to provide it here.

      (4) I wondered why the authors informed subjects of the task in advance (that they will be presented associations when they slept)? I imagine this may boost learning as compared to completely naïve subjects. Whether this is the reason or not, I think an explanation of why this was done is warranted, and a statement whether authors believe the manipulation would work otherwise. Also, the reader is left wondering why subjects were informed only about test #1 and not about test #2 (and when were they told about test #2).

      Subjects were informed of all the tests upfront. We apologize for the inconsistency in the manuscript and revised the method part. The explanation of why participants were informed is twofold: a) Participants had to sleep with in-ear headphones. We wanted to explain to participants why these are necessary and why they should not remove them. b) We hoped that participants would unconsciously expect sounds to be played during sleep, would process these sounds efficiently and would remain deeply asleep (no arousals).

      (5) FoHH is a binary yes/no question, and so may not have been sensitive enough to demonstrate small differences in familiarity. For comparison, the Perceptual Awareness Scale (Ramsøy & Overgaard, 2004) that is typically used in studies of unconscious processing is of a 4-point scale, and this allows to capture more nuanced effects such as partial consciousness and larger response biases. Regardless, it would be informative to have the FoHH numbers obtained in this study, and not just their comparison between conditions. Also, was familiarity of EC and CC pseudowords compared? One may wonder whether hearing the pseudowords clearly vs. in one ear alongside a familiar word would make the word slightly more familiar.

      We apologize for having simplified this part too much in the manuscript. Indeed, the FoHH is comparable to the PAS. We used a 4-point scale, where participants rated their feeling of whether they have heard the pseudoword during previous sleep. In the revised manuscript, we report the complete results (line 203-223). The FoHH did not differ between any of the suggested contrasts. Thus, for both the peak and the trough condition, the FoHH did not differ between sleep-played vs new; correct EC trials vs new; correct vs incorrect EC trials; EC vs CC trials. To illustrate the results, a figure of the FoHH has been added to the supplement (Figure S4).

      (6) Similarly, it would be good to report the numbers of the confidence ratings in the paper as well.

      In the revised manuscript, we extended the description of the confidence rating results. We added the descriptive statistics (line 224-236) and included a corresponding figure in the supplement (Figure S5).

      Minor/aesthetic comments:

      We implemented all the following suggestions.

      (1) I suggest using "pseudoword" or "nonsense word" instead of "foreign word", because "foreign word" typically means a real word from a different language. It is quite confusing when starting to read the paper.

      After reconsidering, we think that pseudoword is the appropriate linguistic term and have revised the manuscript accordingly.

      (2) Lines 1000-1001: "The required sample size of N = 30 was determined based on a previous sleep-learning study". I was missing a description of what study you are referring to.

      (3) I am not sure I understood the claim nor the rationale made in lines 414-417. Is the claim that pairs did not form one integrated engram? How do we know that? And why would having one engram not enable extracting the meaning from a visual-auditory presentation of the cue? The sentence needs some rewording and/or unpacking.

      (4) Were categories counterbalanced (i.e., did each subjects' EC contain 9 animal words, 9 tool words and 9 place words)?

      (5) Asterisks indicating significant effects are missing from Figure 4 and S2.

      (6) Fig1 legend: "Participants were played with pairs" is ungrammatical.

      (7) Line 1093: no need for a comma.

      (8) Line 1336: missing opening parenthesis

      (9) Line 430: "observe" instead of "observed".

      (10) Line 466: two dots instead of one..

      Reviewer #3 (Recommendations For The Authors):

      Methods: 2 separate ANOVAs are performed (lines 160-185), but would not it make more sense to combine both in one ? If kept separated then a correction for multiple comparisons might be needed (p/2 = 0.025)

      We computed an omnibus ANOVA. In a next step, we examined the effect in the significant targeting condition by computing another ANOVA. For further explanations, see reviewer comment 3.4.

      References

      Ameen, M. S., Heib, D. P. J., Blume, C., & Schabus, M. (2022). The Brain Selectively Tunes to Unfamiliar Voices during Sleep. Journal of Neuroscience, 42(9), 1791–1803. https://doi.org/10.1523/JNEUROSCI.2524-20.2021

      Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural Markers of Responsiveness to the Environment in Human Sleep. The Journal of Neuroscience, 36(24). https://doi.org/10.1523/JNEUROSCI.0902-16.2016

      Arzi, A., Holtzman, Y., Samnon, P., Eshel, N., Harel, E., & Sobel, N. (2014). Olfactory Aversive Conditioning during Sleep Reduces Cigarette-Smoking Behavior. Journal of Neuroscience, 34(46). https://doi.org/10.1523/JNEUROSCI.2291-14.2014

      Arzi, A., Shedlesky, L., Ben-Shaul, M., Nasser, K., Oksenberg, A., Hairston, I. S., & Sobel, N. (2012). Humans can learn new information during sleep. Nature Neuroscience, 15(10). https://doi.org/10.1038/nn.3193

      Batterink, L. J., Creery, J. D., & Paller, K. A. (2016). Phase of Spontaneous Slow Oscillations during Sleep Influences Memory-Related Processing of Auditory Cues. Journal of Neuroscience, 36(4), 1401–1409. https://doi.org/10.1523/JNEUROSCI.3175-15.2016

      Belardi, A., Pedrett, S., Rothen, N., & Reber, T. P. (2021). Spacing, Feedback, and Testing Boost Vocabulary Learning in a Web Application. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.757262

      Bergmann, T. O. (2018). Brain State-Dependent Brain Stimulation. Frontiers in Psychology, 9, 2108. https://doi.org/10.3389/fpsyg.2018.02108

      Blume, C., del Giudice, R., Wislowska, M., Heib, D. P. J., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638–648. https://doi.org/10.1016/j.neuroimage.2018.05.056

      Brodbeck, C., & Simon, J. Z. (2022). Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Frontiers in Neuroscience, 16. https://www.frontiersin.org/articles/10.3389/fnins.2022.828546

      Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia, and the Hippocampal System. A Bradford Book.

      Daltrozzo, J., Claude, L., Tillmann, B., Bastuji, H., & Perrin, F. (2012). Working memory is partially preserved during sleep. PloS One, 7(12).

      Dew, I. T. Z., & Cabeza, R. (2011). The porous boundaries between explicit and implicit memory: Behavioral and neural evidence. Annals of the New York Academy of Sciences, 1224(1), 174–190. https://doi.org/10.1111/j.1749-6632.2010.05946.x

      Esfahani, M. J., Farboud, S., Ngo, H.-V. V., Schneider, J., Weber, F. D., Talamini, L. M., & Dresler, M. (2023). Closed-loop auditory stimulation of sleep slow oscillations: Basic principles and best practices. Neuroscience & Biobehavioral Reviews, 153, 105379. https://doi.org/10.1016/j.neubiorev.2023.105379

      Frohlich, J., Chiang, J. N., Mediano, P. A. M., Nespeca, M., Saravanapandian, V., Toker, D., Dell’Italia, J., Hipp, J. F., Jeste, S. S., Chu, C. J., Bird, L. M., & Monti, M. M. (2022). Neural complexity is a common denominator of human consciousness across diverse regimes of cortical dynamics. Communications Biology, 5(1). https://doi.org/10.1038/s42003-022-04331-7

      Gabrieli, J. D. E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 87–115.

      Garcia-Molina, G., Tsoneva, T., Jasko, J., Steele, B., Aquino, A., Baher, K., Pastoor, S., Pfundtner, S., Ostrowski, L., Miller, B., Papas, N., Riedner, B., Tononi, G., & White, D. P. (2018). Closed-loop system to enhance slow-wave activity. Journal of Neural Engineering, 15(6), 066018. https://doi.org/10.1088/1741-2552/aae18f

      Hannula, D. E., Minor, G. N., & Slabbekoorn, D. (2023). Conscious awareness and memory systems in the brain. WIREs Cognitive Science, 14(5), e1648. https://doi.org/10.1002/wcs.1648

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), Article 7. https://doi.org/10.1038/nrn2850

      Henke, K., Mondadori, C. R. A., Treyer, V., Nitsch, R. M., Buck, A., & Hock, C. (2003). Nonconscious formation and reactivation of semantic associations by way of the medial temporal lobe. Neuropsychologia, 41(8), Article 8. https://doi.org/10.1016/S0028-3932(03)00035-6

      Holeckova, I., Fischer, C., Giard, M.-H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject’s own name uttered by a familiar voice. Brain Research, 1082(1), 142–152. https://doi.org/10.1016/j.brainres.2006.01.089

      Karpicke, J. D., & Roediger, H. L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

      Koroma, M., Elbaz, M., Léger, D., & Kouider, S. (2022). Learning New Vocabulary Implicitly During Sleep Transfers With Cross-Modal Generalization Into Wakefulness. Frontiers in Neuroscience, 16, 801666. https://doi.org/10.3389/fnins.2022.801666

      Lee, Y., Lee, J., Hwang, S. J., Yang, E., & Choi, S. (2020). Neural Complexity Measures. Advances in Neural Information Processing Systems, 33, 9713–9724. https://proceedings.neurips.cc/paper/2020/hash/6e17a5fd135fcaf4b49f2860c2474c7c-Abstract.html

      Metcalfe, J. (2017). Learning from Errors. Annual Review of Psychology, 68(1), 465–489. https://doi.org/10.1146/annurev-psych-010416-044022

      Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 62, 62–79. https://doi.org/10.1037/1196-1961.62.1.62

      Moyne, M., Legendre, G., Arnal, L., Kumar, S., Sterpenich, V., Seeck, M., Grandjean, D., Schwartz, S., Vuilleumier, P., & Domínguez-Borràs, J. (2022). Brain reactivity to emotion persists in NREM sleep and is associated with individual dream recall. Cerebral Cortex Communications, 3(1), tgac003. https://doi.org/10.1093/texcom/tgac003

      Ngo, H.-V. V., Martinetz, T., Born, J., & Mölle, M. (2013). Auditory Closed-Loop Stimulation of the Sleep Slow Oscillation Enhances Memory. Neuron, 78(3), Article 3. https://doi.org/10.1016/j.neuron.2013.03.006

      O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketz, N. (2014). Complementary Learning Systems. Cognitive Science, 38(6), 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x

      O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10(4), 389–397. https://doi.org/10.1002/1098-1063(2000)10:4<389::AID-HIPO5>3.0.CO;2-P

      Rabinovich Orlandi, I., Fullio, C. L., Schroeder, M. N., Giurfa, M., Ballarini, F., & Moncada, D. (2020). Behavioral tagging underlies memory reconsolidation. Proceedings of the National Academy of Sciences, 117(30), 18029–18036. https://doi.org/10.1073/pnas.2009517117

      Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1), Article 1. https://doi.org/10.1037/a0013974

      Ruch, S., & Henke, K. (2020). Learning During Sleep: A Dream Comes True? Trends in Cognitive Sciences, 24(3), 170–172. https://doi.org/10.1016/j.tics.2019.12.007

      Ruch, S., Schmidig, F. J., Knüsel, L., & Henke, K. (2022). Closed-loop modulation of local slow oscillations in human NREM sleep. NeuroImage, 264, 119682. https://doi.org/10.1016/j.neuroimage.2022.119682

      Schacter, D. L. (1998). Memory and Awareness. Science, 280(5360), 59–60. https://doi.org/10.1126/science.280.5360.59

      Schneider, E., Züst, M. A., Wuethrich, S., Schmidig, F., Klöppel, S., Wiest, R., Ruch, S., & Henke, K. (2021). Larger capacity for unconscious versus conscious episodic memory. Current Biology, 31(16), 3551-3563.e9. https://doi.org/10.1016/j.cub.2021.06.012

      Shohamy, D., & Turk-Browne, N. B. (2013). Mechanisms for widespread hippocampal involvement in cognition. Journal of Experimental Psychology: General, 142(4), 1159–1170. https://doi.org/10.1037/a0034461

      Squire, L. R., & Dede, A. J. O. (2015). Conscious and Unconscious Memory Systems. Cold Spring Harbor Perspectives in Biology, 7(3), a021667. https://doi.org/10.1101/cshperspect.a021667

      Stickgold, R., Zadra, A., & Haar, A. J. H. (2021). Advertising in Dreams is Coming: Now What? Dream Engineering. https://dxe.pubpub.org/pub/dreamadvertising/release/1

      Tulving, E. (2002). Episodic Memory: From Mind to Brain. Annual Review of Psychology, 53(1), 1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114

      Wilhelm, I., Diekelmann, S., Molzow, I., Ayoub, A., Mölle, M., & Born, J. (2011). Sleep Selectively Enhances Memory Expected to Be of Future Relevance. Journal of Neuroscience, 31(5), 1563–1569. https://doi.org/10.1523/JNEUROSCI.3575-10.2011

      Wunderlin, M., Koenig, T., Zeller, C., Nissen, C., & Züst, M. A. (2022). Automatized online prediction of slow-wave peaks during non-rapid eye movement sleep in young and old individuals: Why we should not always rely on amplitude thresholds. Journal of Sleep Research, 31(6), e13584. https://doi.org/10.1111/jsr.13584

      Züst, M. A., Ruch, S., Wiest, R., & Henke, K. (2019). Implicit Vocabulary Learning during Sleep Is Bound to Slow-Wave Peaks. Current Biology, 29(4), 541-553.e7. https://doi.org/10.1016/j.cub.2018.12.038

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zanetti et al use biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The major novel finding is that association of the VP3 protein with an anionic lipid (PI3P) appears to be important for viral replication, as evidenced through a cellular assay on FFUs.

      Strengths:

      Supports previously published claims that VP3 associates with the early endosome membrane, potentially through binding to PI3P. The finding that mutating a single residue (R200) critically affects early endosome binding and that the same mutation also inhibits viral replication suggests a very important role for this binding in the viral life cycle.

      Weaknesses:

      The manuscript is relatively narrowly focused: the specifics of the bi-molecular interaction between the VP3 of an unusual avian virus and a host cell lipid (PI3P). Further, the affinity of this interaction is low and its specificity relative to other PIPs is not tested, leading to questions about whether VP3-PI3P binding is relevant.

      Regarding the manuscript’s focus, we challenge the notion that studying a single bi-molecular interaction makes the scope of the paper overly narrow. This interaction—between VP3 and PI3P—plays a critical role in the replication of the birnavirus, which is the central theme of our work. Moreover, identifying and understanding such distinct interactions is a fundamental aspect of molecular virology, as they shed light on the precise mechanisms that viruses exploit to hijack the host cell machinery. Consequently, far from being narrowly focused, we believe our work contributes to the broader understanding of host-pathogen interactions.

      As for the low affinity of the VP3-PI3P interaction, we argue that this is not a limitation but rather a biologically relevant feature. As discussed in the manuscript, the moderate strength of this interaction is likely critical for regulating the turnover rate of VP3/endosomal PI3P complexes, which in turn could optimize viral replication efficiency. A stronger affinity might trap VP3 on the endosomal membrane, whereas weaker interactions might reduce its ability to efficiently target PI3P. Thus, the observed affinity may reflect a fine-tuned balance that supports the viral life cycle.

      With regard to specificity, we emphasize that in the context of the paper, we refer to biological specificity, which is not necessarily the same as chemical specificity. The binding of VP3 to early endosomes is “biologically” preconditioned by the distribution of PI3P within the cell. PI3P is predominantly localized in endosomal membranes, which “biologically precludes” interference from other PIPs due to their distinct cellular distributions. Moreover, while early endosomes also contain other anionic lipids, our work demonstrates that among these, PI3P plays a distinctive role in VP3 binding. This highlights its functional relevance in the context of early endosome dynamics.

      Reviewer #3 (Public review):

      Summary:

      Infectious bursal disease virus (IBDV) is a birnavirus and an important avian pathogen. Interestingly, IBDV appears to be a unique dsRNA virus that uses early endosomes for RNA replication, a strategy more common among +ssRNA viruses such as SARS-CoV-2. This work builds on previous studies showing that IBDV VP3 interacts with PIP3 during virus replication. The authors provide further biophysical evidence for the interaction and map the interacting domain on VP3.

      Strengths:

      Detailed characterization of the interaction between VP3 and PIP3 identified R200D mutation as critical for the interaction. Cryo-EM data show that VP3 leads to membrane deformation.

      We thank the reviewer for the feedback.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zanetti et al. use biophysical and cellular assays to investigate the interaction of the birnavirus VP3 protein with the early endosome lipid PI3P. The major novel finding is that the association of the VP3 protein with an anionic lipid (PI3P) appears to be important for viral replication, as evidenced through a cellular assay on FFUs.

      Strengths:

      Supports previously published claims that VP3 may associate with early endosomes and bind to PI3P-containing membranes. The claim that mutating a single residue (R<sub>200</sub>) critically affects early endosome binding and that the same mutation also inhibits viral replication suggests a very important role for this binding in the viral life cycle.

      Weaknesses:

      The manuscript is relatively narrowly focused: one bimolecular interaction between a host cell lipid and one protein of an unusual avian virus (VP3-PI3P). Aspects of this interaction have been described previously. Additional data would strengthen claims about the specificity and some technical issues should be addressed. Many of the core claims would benefit from additional experimental support to improve consistency.

      Indeed, our group has previously described aspects of the VP3-PI3P interaction, as indicated in lines 100-105 of the manuscript. In this manuscript, however, we present previously unreported biochemical and biophysical details of how VP3 associates with early endosomes, showing that it interacts directly with PI3P. Additionally, we have now identified a critical residue in VP3, R<sub>200</sub>, for binding to PI3P and established its key role in the viral life cycle. Furthermore, the molecular dynamics simulations helped us propose a mechanism by which VP3 associates with PI3P in early endosomes. This constitutes a substantial step forward in our understanding of how these "non-canonical" viruses replicate.

      We have now incorporated new experimental and simulation data and have carefully revised the manuscript in accordance with the reviewers’ recommendations. We are confident that these improvements have further strengthened the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Birnavirus replication factories form alongside early endosomes (EEs) in the host cell cytoplasm. Previous work from the Delgui lab has shown that the VP3 protein of the birnavirus strain infectious bursal disease virus (IBDV) interacts with phosphatidylinositol-3-phosphate (PI3P) within the EE membrane (Gimenez et al., 2018, 2020). Here, Zanetti et al. extend this previous work by biochemically mapping the specific determinants within IBDV VP3 that are required for PI3P binding in vitro, and they employ in silico simulations to propose a biophysical model for VP3-PI3P interactions.

      Strengths:

      The manuscript is generally well-written, and much of the data is rigorous and solid. The results provide deep knowledge into how birnaviruses might nucleate factories in association with EEs. The combination of approaches (biochemical, imaging, and computational) employed to investigate VP3-PI3P interactions is deemed a strength.

      Weaknesses:

      (1) Concerns about the sources, sizes, and amounts of recombinant proteins used for co-flotation: Figures 1A, 1B, 1G, and 4A show the results of co-flotation experiments in which recombinant proteins (control His-FYVE v. either full length or mutant His VP3) were either found to be associated with membranes (top) or non-associated (bottom). However, in some experiments, the total amounts of protein in the top + bottom fractions do not appear to be consistent in control v. experimental conditions. For instance, the Figure 4A western blot of His-2xFYVE following co-flotation with PI3P+ membranes shows almost no detectable protein in either top or bottom fractions.

      Liposome-based methods, such as the co-flotation assay, are well-established and widely regarded as the preferred approach for studying protein-phosphoinositide interactions. However, this approach is rather qualitative, as density gradient separation reveals whether the protein is located in the top fractions (bound to liposomes) or the bottom fractions (unbound). Our quantifications aim to demonstrate differences in the bound fraction between liposome populations with and without PI3P. Given the setting of the co-flotation assays, each protein-liposome system [2xFYVE-PI3P(-), 2xFYVE-PI3P(+), VP3-PI3P(-), or VP3-PI3P(+)] is assessed separately, and even if the experimental conditions are homogeneous, it is not surprising to observe differences in the protein level between different experiments. Indeed, the revised version of the manuscript includes membranes with more similar band intensities, as depicted in the new versions of Figures 1 and 4.

      Reading the paper, it was difficult to understand which source of protein was used for each experiment (i.e., E. coli or baculovirus-expressed), and this information is contradicted in several places (see lines 358-359 v. 383-384). Also, both the control protein and the His-VP3-FL proteins show up as several bands in the western blots, but they don't appear to be consistent with the sizes of the proteins stated on lines 383-384. For example, line 383 states that His-VP3-FL is ~43 kDa, but the blots show triplet bands that are all below the 35 kDa marker (Figures 1B and 1G). Mass spectrometry information is shown in the supplemental data (describing the different bands for His-VP3-FL) but this is not mentioned in the actual manuscript, causing confusion. Finally, the results appear to differ throughout the paper (see Figures 1B v. 1G and 1A v. 4A).

      Thank you for pointing out these potentially confusing points in the previous version of the manuscript. Indeed, we were able to produce recombinant VP3 from two sources: baculovirus and Escherichia coli. Initially, we opted for the baculovirus system, based on evidence from previous studies showing that it was suitable for ectopic expression of VP3. Subsequently, we successfully produced VP3 using Escherichia coli. On the other hand, the fusion proteins His-2xFYVE and GST-2xFYVE were only produced in the prokaryotic system, also following previously reported evidence. We confirmed that VP3, produced in either system, exhibited similar behavior in our co-flotation and bio-layer interferometry (BLI) assays. However, the co-flotation and BLI assays shown in Figs. 1 and 4 were performed using the His-VP3 FL, His-VP3 FL R<sub>200</sub>D and His-VP3 FL DCt fusion proteins produced from the corresponding baculoviruses. We have clarified this in the revised version of our manuscript. Please see lines 430-432.

      Additionally, we have made clear that the His-VP3 FL protein purification yielded four distinct bands, and we confirmed their VP3 identity through mass spectrometry in the revised version of the manuscript. Please, see lines 123-124.

      Finally, we replaced membranes for Figs. 4A and 1G (left panel) with those with more similar band intensities. Please, see the new version of Figures 1 and 4.

      (2) Possible "other" effects of the R<sub>200</sub>D mutation on the VP3 protein. The authors performed mutagenesis to identify which residues within patch 2 on VP3 are important for association with PI3P. They found that a VP3 mutant with an engineered R<sub>200</sub>D change (i) did not associate with PI3P membranes in co-floatation assays, and (ii) did not co-localize with EE markers in transfected cells. Moreover, this mutation resulted in the loss of IBDV viability in reverse genetics studies. The authors interpret these results to indicate that this residue is important for "mediating VP3-PI3P interaction" (line 211) and that this interaction is essential for viral replication. However, it seems possible that this mutation abrogated other aspects of VP3 function (e.g., dimerization or other protein/RNA interactions) aside from or in addition to PI3P binding. Such possibilities are not mentioned by the authors.

      The arginine at position 200 of VP3 is not located in any of the protein regions associated with its other known functions: VP3 has a dimerization domain located in the second helical domain, where different amino acids across the three helices form a total of 81 interprotomeric close contacts; however, R<sub>200</sub> is not involved in these contacts (Structure. 2008 Jan;16(1):29-37, doi:10.1016/j.str.2007.10.023); VP3 has an oligomerization domain mapped within the 42 C-terminal residues of the polypeptide, i.e., the segment of the protein composed of the residues at positions 216-257 (J Virol. 2003 Jun;77(11):6438–6449, doi: 10.1128/jvi.77.11.6438-6449.2003); VP3’s ability to bind RNA is facilitated by a region of positively charged amino acids, identified as P1, which includes K<sub>99</sub>, R<sub>102</sub>, K<sub>105</sub>, and K<sub>106</sub> (PLoS One. 2012;7(9):e45957, doi: 10.1371/journal.pone.0045957). Furthermore, our findings indicate that the R<sub>200</sub>D mutant retains a folding pattern similar to the wild-type protein, as shown in Figure 4B. Taken together, these observations lead us to conclude that the loss of replication capacity of R<sub>200</sub>D viruses results from an impaired, or even abolished, VP3-PI3P interaction.

      We agree with the reviewer that this is an important point and have accordingly addressed it in the Discussion section of the revised manuscript. Please, see lines 333-346.

      (3) Interpretations from computational simulations. The authors performed computational simulations on the VP3 structure to infer how the protein might interact with membranes. Such computational approaches are powerful hypothesis-generating tools. However, additional biochemical evidence beyond what is presented would be required to support the authors' claims that they "unveiled a two-stage modular mechanism" for VP3-PI3P interactions (see lines 55-59). Moreover, given the biochemical data presented for R<sub>200</sub>D VP3, it was surprising that the authors did not perform computational simulations on this mutant. The inclusion of such an experiment would help tie together the in vitro and in silico data and strengthen the manuscript.

      We acknowledge that the wording used in the previous version of the manuscript may have overstated the "unveiling" of the two-stage binding mechanism of VP3. Our intention was to propose a potential mechanism that is consistent with both the biophysical experiments and the molecular simulations. In the revised version of the manuscript, we have tempered these claims and framed them more appropriately.

      Regarding the simulations for the R<sub>200</sub>D VP3 mutant, these simulations were indeed performed and included in the original manuscript as part of Figure S14 in the Supplementary Information. However, we realize that this was not sufficiently emphasized in the main text, an oversight on our part. We have now revised the manuscript to highlight these results more clearly.

      Additionally, to further strengthen the connection between experimental and simulation trends, we have now included a new figure in the Supplementary Information (Figure S15). This figure depicts the binding energy of VP3 ΔNt and two of its mutants, VP3 ΔNt R<sub>200</sub>D and VP3 ΔNt P2 Mut, as a function of salt concentration. The results show that as the number of positively charged residues in VP3 is systematically reduced, the binding of the protein to the membrane becomes weaker. The effect is more pronounced at lower salt concentrations, which highlights the weight of electrostatic forces on the adsorption of VP3 on negatively charged membranes. Please, see Supplementary Information (Figure S15).
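
      The salt dependence described above is what one would expect when electrostatic attraction is screened by dissolved ions. As a rough illustration only, and not the method used for Figure S15, the following Python sketch computes the Debye screening length for a 1:1 salt in water. The function name, the default temperature and dielectric constant, and the example concentrations are assumptions chosen for illustration; they are not taken from the manuscript.

```python
import math

# Physical constants in SI units
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length_nm(salt_molar, temp_k=298.15, eps_r=78.5):
    """Debye screening length in nm for a monovalent (1:1) salt.

    salt_molar: salt concentration in mol/L (hypothetical input).
    temp_k, eps_r: assumed temperature and relative permittivity of water.
    """
    ionic_strength = salt_molar * 1000.0  # mol/L -> mol/m^3
    numerator = eps_r * EPS0 * KB * temp_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9  # m -> nm


if __name__ == "__main__":
    # Illustrative concentrations only; not the values used in the study.
    for conc in (0.05, 0.15, 0.5):
        print(f"{conc:.2f} M salt: Debye length = {debye_length_nm(conc):.2f} nm")
```

      The screening length shrinks as salt is added, so charge-charge attraction between basic residues and an anionic membrane acts over a shorter range at high salt. This is consistent with, but does not by itself demonstrate, the trend the authors report.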

      Reviewer #3 (Public Review):

      Summary:

      Infectious bursal disease virus (IBDV) is a birnavirus and an important avian pathogen. Interestingly, IBDV appears to be a unique dsRNA virus that uses early endosomes for RNA replication, a strategy more common among +ssRNA viruses such as SARS-CoV-2.

      This work builds on previous studies showing that IBDV VP3 interacts with PIP3 during virus replication. The authors provide further biophysical evidence for the interaction and map the interacting domain on VP3.

      Strengths:

      Detailed characterization of the interaction between VP3 and PIP3 identified R<sub>200</sub>D mutation as critical for the interaction. Cryo-EM data show that VP3 leads to membrane deformation.

      Weaknesses:

      The work does not directly show that the identified R<sub>200</sub> residues are directly involved in VP3-early endosome recruitment during infection. The majority of work is done with transfected VP3 protein (or in vitro) and not in virus-infected cells. Additional controls such as the use of PIP3 antagonizing drugs in infected cells together with a colocalization study of VP3 with early endosomes would strengthen the study.

      In addition, it would be advisable to include a control for cryo-EM using liposomes that do not contain PIP3 but are incubated with HIS-VP3-FL. This would allow ruling out any unspecific binding that might not be detected on WB.

      The authors also do not propose how their findings could be translated into drug development that could be applied to protect poultry during an outbreak. The title of the manuscript is broad and would improve with rewording so that it captures what the authors achieved.

      In previous work from our group, we demonstrated the crucial role of the VP3 P2 region in targeting early endosomal membranes and in viral replication. This included the use of PI3K inhibitors to deplete PI3P, which showed that both the control RFP-2xFYVE and VP3 lost their ability to associate with early endosomal membranes and that the production of infective viral progeny was reduced (J Virol. 2018 May 14;92(11):e01964-17, doi: 10.1128/jvi.01964-17; J Virol. 2021 Feb 24;95(6):e02313-20, doi: 10.1128/jvi.02313-20). In the present work, to further characterize the role of R<sub>200</sub> in binding to early endosomes and in viral replication, we show that: i) the transfected VP3 R<sub>200</sub>D protein loses the ability to bind to early endosomes in immunofluorescence assays (Figure 2E and Figure 3); ii) the recombinant His-VP3 FL R<sub>200</sub>D protein loses the ability to bind to liposomes PI3P(+) in co-flotation assays (Figure 4A); and iii) the mutant virus R<sub>200</sub>D loses replication capacity (Figure 4C).

      Regarding the cryo-electron microscopy observation, we verified that there is no binding of gold particles to liposomes PI3P(-) when they are incubated solely with the gold-particle reagent, or when they are pre-incubated with the gold-particle reagent with either His-2xFYVE or His-VP3 FL. We have incorporated a new panel in Figure 1C showing a representative image of these results. Please, see lines 143-144 in the revised version of our manuscript and our revised version of Figure 1C.

      We have replaced the title of the manuscript with a more specific one. The current title is "On the Role of VP3-PI3P Interaction in Birnavirus Endosomal Membrane Targeting".

      Regarding the question of how our findings could be translated into drug development, indeed, VP3-PI3P binding constitutes a good potential target for drugs that counteract infectious bursal disease. However, we did not mention this idea in the manuscript, first because it is somewhat speculative and second because infected farms do not implement any specific treatment. The control is based on vaccination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Critical issues to address:

      (1) The citations in the important paragraph on lines 101-5 are not identifiable. These references are described as showing that VP3 is associated with EEs via P2 and PI3P, which is basically what this paper also shows. The significant advance here is unclear.

      We apologize for this mistake. These citations are identifiable in the revised version of the manuscript (lines 100-105). As mentioned before, in this manuscript we present previously unreported biochemical and biophysical details of how VP3 associates with early endosomes, showing that it interacts directly with PI3P. Additionally, we have now identified a critical residue in VP3 P2, R<sub>200</sub>, for binding to PI3P and established its key role in the viral life cycle. Furthermore, the molecular dynamics simulations helped us propose a mechanism by which VP3 associates with PI3P in early endosomes. This constitutes a substantial step forward in our understanding of how these "non-canonical" viruses replicate.

      (2) Even if all the claims were to be clearly supported through major revamping, authors should make the significance of knowing that this protein binds to early endosomes through PI3P more clear?

      Thank you for the recommendation, which aligns with a similar suggestion from Reviewer #2. In response, we have revised the significance paragraph to emphasize the mechanistic aspects of our findings. Please refer to lines 62–67 in the revised manuscript.

      (3) Flotation assay shows binding, but this is not quantitative. An estimate of a Kd would be useful. BLI experiments suggest that half of the binding disappears at 0.5 mM, implying a very low binding affinity.

      We agree with the reviewer that our biophysical and molecular simulation results suggest a specific but weak interaction of VP3 with PI3P-bearing membranes. Indeed, the previous version of the manuscript already contained a paragraph in this regard. Please see lines 323-332 in the revised version of the manuscript.

      From a biological point of view, a low binding affinity of VP3 for the endosomes may constitute an advantage for the virus, in the sense that its traffic through the endosomes may be short lived during its infectious cycle. Indeed, VP3 has been demonstrated to be a "multifunctional" protein involved in several processes of the viral cycle (detailed in lines 84-90), and in our laboratory we have shown that the Golgi complex and the endoplasmic reticulum are organelles where further viral maturation occurs. Taking all of this into account, a high binding affinity of VP3 for endosomes could result in the protein becoming trapped on the endosomal membrane, potentially hindering the progression of the viral infection within the host cell.
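
      The trade-off argued above can be made concrete with a textbook 1:1 binding model, in which the dissociation constant is K_D = k_off / k_on and the mean lifetime of a bound complex is 1 / k_off. The Python sketch below is an illustration under that simplifying assumption, not the authors' analysis: protein binding to a lipid in a membrane is more complex than a 1:1 solution equilibrium, and the rate constants, K_D values, and function names are hypothetical.

```python
def fraction_bound(ligand_conc, kd):
    """Equilibrium fraction of protein bound in a 1:1 model.

    ligand_conc and kd must share the same concentration units.
    """
    return ligand_conc / (kd + ligand_conc)


def residence_time(kd, k_on):
    """Mean lifetime (s) of the bound complex, 1 / k_off.

    kd in M, k_on in 1/(M*s); k_off follows from K_D = k_off / k_on.
    """
    k_off = kd * k_on
    return 1.0 / k_off


if __name__ == "__main__":
    k_on = 1e5  # hypothetical association rate constant, 1/(M*s)
    for label, kd in (("weak binder", 1e-5), ("tight binder", 1e-8)):
        tau = residence_time(kd, k_on)
        print(f"{label}: K_D = {kd:.0e} M, mean bound lifetime = {tau:.3g} s")
```

      At a fixed association rate, lowering K_D lengthens the time each molecule stays bound, which is the sense in which a much tighter interaction could hold VP3 on the endosomal membrane for longer.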

      (4) There are some major internal inconsistencies in the data: Figure 1B quantifies VP3-FL T/B ratio ~4 (which appears inconsistent with the image shown, as the T lanes are much lighter than the B) whereas apparently the same experiment in Figure 1G shows it to be ~0.6. With the error bars shown, these results would appear dramatically different from each other, despite supposedly measuring the same thing. The same issue with the FYVE domain between Figures 1A and 4A.

      We appreciate the reviewer’s comment, as it made us aware of an error in Figure 1B. There, the mean value for the VP3-FL Ts/B ratio is 3.0786 for liposomes PI3P(+) and 0.4553 for liposomes PI3P(-) (please see the new bar graph in Figure 1B). The error likely arose because, given the significance of these experiments, we performed multiple rounds of quantification in search of the most suitable procedure for our observations, which led to a mix-up of data sets. Nevertheless, these corrected values may still seem inconsistent, given that the T lanes are much lighter than the B lanes for VP3-FL in the image shown. Flotation assays are quite labor-intensive and, at least in our experience, yield fairly variable results in terms of quantification. To illustrate this point, the following image shows the three experiments conducted for Figure 1B, where it is clear that, despite producing visually distinct images, all three yielded the same qualitative observation. For Figure 1B, we chose to present the results from experiment #2. However, all three experiments contributed to a Ts/B ratio of 3.0786 for His-VP3 FL, which may account for the apparent inconsistency when focusing solely on the image in Figure 1B.

      Author response image 1.

      We acknowledge that, at first glance, some inconsistencies may appear in the results, and we have thoroughly discussed the best approach for quantification. However, we believe the observations are robust in terms of reproducibility and reliable, as the VP3-PI3P interaction was consistently validated by comparison with liposomes lacking PI3P, where no binding was observed.

      (5) Comparison of PA (or PI) to PI3P at the same molar concentration is inappropriate because PI3P has at least double charge. The more interesting question about specificity would be whether PI45P2 (or even better PI35P2) binds or not. Without this comparison, no claim to specificity can be made.

      For us, "specificity" refers to the requirement of a phosphoinositide in the endosomal membrane for VP3 binding. Phosphoinositides have a conspicuous distribution among cellular compartments, and knowing that VP3 associates with early endosomes, our specificity assays aimed to demonstrate that PI3P is strictly required for the binding of VP3. To validate this, we used PI (lacking the phosphate group) and PA (lacking the inositol group) despite their similar charges. In spite of the potential chemical interactions between VP3 and various phosphoinositides, our experimental results suggest that the virus specifically targets endosomal membranes by binding to PI3P, a phosphoinositide present only in early endosomes.

      That said, we agree with the reviewer’s point and consider it appropriate to soften our specificity claim in the manuscript as follows: “We observed that His-VP3 FL bound to liposomes PI3P(+), but not to liposomes PA or PI, reinforcing the notion that a phosphoinositide is required since neither a single negative charge nor an inositol ring are sufficient to promote VP3 binding to liposomes (SI Appendix, Fig. S2)” (Lines 136-139).

      (6) In the EM images, many of the gold beads are inside the vesicles. How do they cross the membranes?

      They do not cross the membrane. Our EM images are two-dimensional projections, meaning that the gold particles located on top or beneath the plane appear to be inside the liposome.

      (7) Images in Figure 2D are very low quality and do not show the claimed difference between any of the mutants. All red signal looks basically cytosolic in all images. It is not clear what criteria were used for the quantification in Figure 2E. The same issue is in Figure 2E, where no red WT puncta are observable at all. Consistently, there is minimal colocalization in the quantification in Figure S3, which appears to show no significant differences between any of the mutants, in direct contradiction to the claim in the manuscript.

      We apologize for the poor quality of panels in Figures 2D and 2E. Unfortunately, this was due to the PDF conversion of the original files. Please, check the high-quality version of Figure 2. As suggested by reviewers #2 and #3, we have incorporated zoomed panels, which help the reader to better see the differences in distribution.

      As mentioned in the legend to Figure 2, the quantification in Figure 2D was performed by calculating the percentage of cells with a punctate red fluorescent signal (showing VP3 distribution) for each protein. The data were then normalized to the P2 WT protein, which is the VP3 wild type.

      Figure S3 certainly shows a tendency which positively correlates with the results shown in Figure 3, where we used FYVE to detect PI3P on endosomes and observed significantly less co-localization when VP3 bears its P2 region all reversed or lacks the R<sub>200</sub>

      (8) The only significant differences in colocalization are in Figure 3B, whose images look rather dramatically different from the rest of the manuscript, leading to some concern about repeatability. Also, it is unclear how colocalization is quantified, but this number typically cannot be above 1. Finally, it is unclear what is being colocalized here: with three fluorescent components, there are 3 possible binary colocalizations and an additional ternary colocalization.

      We thank the reviewer for pointing out those aspects related to Figure 3. The experiments performed for Figure 3B were conducted by a collaborator abroad handling the purified GST-2xFYVE, which recognizes endogenous PI3P, while the rest of the cell biology experiments were conducted in our laboratory in Argentina. This is why they are aesthetically different. We have made an effort to homogenize their appearance for the revised version of the manuscript. Please, see the new version of Figure 3.

      For quantification of the co-localization of VP3 and EGFP-2xFYVE (Figure 3A), the Manders M2 coefficient was calculated from approximately 30 cells per construct and experiment. The M2 coefficient, which reflects co-localization of signals, is defined as the ratio of the summed intensities of the magenta pixels for which the intensity in the blue channel is above zero to the total intensity in the magenta channel. The JACoP plugin was used to determine M2. For VP3 puncta co-distributing with EEA1 and GST-FYVE (Figure 3B), the number of puncta co-distributing for the three signals was manually determined from approximately 40 cells per construct and experiment, per 200 µm². We understand that coefficients such as Manders or Pearson, which do not exceed 1, are the most commonly used measures to quantify co-localizing immunofluorescent signals; however, this “manual” method has been used and validated in previously published manuscripts [Figures 3 and 7 from (Morel et al., 2013); Figure 7 in (Khaldoun et al., 2014); and Figure 4 in (Boukhalfa et al., 2021)].
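
      To make the M2 definition above concrete, the following is a minimal Python sketch of the calculation. It is not the JACoP implementation: the function name, the use of NumPy, the optional threshold argument, and the toy 2x2 arrays are assumptions introduced only for illustration.

```python
import numpy as np


def manders_m2(magenta, blue, threshold=0.0):
    """Fraction of the total magenta intensity found in pixels where blue > threshold.

    Illustrative helper only (hypothetical name); a threshold of 0 mirrors the
    "above zero" criterion stated in the response.
    """
    magenta = np.asarray(magenta, dtype=float)
    blue = np.asarray(blue, dtype=float)
    if magenta.shape != blue.shape:
        raise ValueError("both channels must have the same shape")
    total = magenta.sum()
    if total == 0:
        return float("nan")  # no magenta signal: the coefficient is undefined
    # Sum magenta intensities only where the blue channel carries signal.
    return float(magenta[blue > threshold].sum() / total)


# Toy example: 15 of the 20 magenta intensity units overlap blue signal.
magenta_channel = [[10, 0], [5, 5]]
blue_channel = [[1, 0], [0, 2]]
print(manders_m2(magenta_channel, blue_channel))  # 0.75
```

      Because the numerator is a subset of the denominator, a coefficient computed this way always lies between 0 and 1, unlike the manually counted puncta per area reported for Figure 3B.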

      (9) SegA/B plasmids are not introduced, and it is not clear what these are or how this assay is meant to work. Where are the foci forming units in the images of Figure 4C? How does this inform on replication? Again, this assay is not quantitative, which is essential here: does the R<sub>200</sub> mutant completely kill activity (whatever that is here)? Or reduce it somewhat?

      We apologize for the missing information. Segments A and B are the components of the IBDV reverse genetics system. For their construction, we used a modification of the system described by Qi and coworkers (Qi et al., 2007), in which the full-length sequences of the IBDV RNA segments A and B, flanked by a hammerhead ribozyme at the 5’-end and the hepatitis delta ribozyme at the 3’-end, were expressed under the control of an RNA polymerase II promoter within the plasmids pCAGEN.Hmz.SegA.Hdz (SegA) and pCAGEN.Hmz.SegB.Hdz (SegB). For this specific experiment we generated a third plasmid, pCAGEN.Hmz.SegA.R<sub>200</sub>D.Hdz (SegA.R<sub>200</sub>D), harboring a mutant version of segment A cDNA containing the R<sub>200</sub>D substitution. Then, QM7 cells were transfected with the plasmids SegA, SegB or SegA.R<sub>200</sub>D alone (as controls) or with a mixture of plasmids SegA+SegB (wild type situation) or SegA.R<sub>200</sub>D+SegB (mutant situation). At 8 h post transfection (p.t.), when new viruses have been able to assemble starting from the two segments of RNA, the cells were recovered and re-plated onto fresh non-transfected cells to reveal the presence (or not) of infective viruses. At 72 h post-plating, the generation of foci forming units (FFUs) was revealed by Coomassie staining. As expected, single transfections of SegA, SegB or SegA.R<sub>200</sub>D did not produce FFUs and, as shown in Figure 4C, the transfection of SegA+SegB produced detectable FFUs (the three circles in the upper panel) while no FFUs (the three circles in the lower panel) were detected after the transfection of SegA.R<sub>200</sub>D+SegB. This system is quantitative, since the FFUs detected 72 h post-plating can be quantified by simply counting them. However, since no FFUs were detected after the transfection of SegA.R<sub>200</sub>D+SegB, as evidenced by a complete monolayer of cells stained blue, we saw no point in quantifying. In turn, this drastic observation indicates that viruses bearing the VP3 R<sub>200</sub>D mutation lose their replication ability (are “dead”), demonstrating the crucial role of this residue in the infectious cycle.

      We agree with the reviewer that a better explanation was needed in the manuscript, so we have incorporated a paragraph in the results section of our revised version of the manuscript (lines 209-219).

      (10) Why pH 8 for simulation?

      The Molecular Theory calculations were performed at pH 8 for consistency with the experimental conditions used in our biophysical assays. These biophysical experiments were also performed at pH 8, following the conditions established in the original study where VP3 was first purified for crystallization (DOI: 10.1016/j.str.2007.10.023).

      (11) There is minimal evidence for the sequential binding model described in the abstract. The simulations do not resolve this model, nor is truly specific PI3P binding shown.

      In response to your concerns, we would like to emphasize that our simulations provide robust evidence supporting the two most important aspects of the sequential binding model: 1) Membrane Approach: In all simulations, VP3 consistently approaches the membrane via its positively charged C-terminal (Ct) region. 2) PI3P Recruitment: Once the protein is positioned flat on the membrane surface, PI3P is unequivocally recruited to the positively charged P2 region. The enrichment of PI3P in the proximity of the protein is clearly observed and has been quantified via radial distribution functions, as detailed in the manuscript and supplementary material.

      While we understand that opinions may vary on the sufficiency of the data to fully validate the model, we believe the results offer meaningful insights into the proposed binding mechanism. That said, we acknowledge that the specificity of VP3 binding may not be restricted solely to PI3P but could extend to phosphoinositides in general. To address this, we performed a new set of co-flotation experiments, which are discussed in detail in our response to point 5.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 1: Consider changing the title to better reflect the mostly biochemical and computational data presented in the paper: "Mechanism of Birnavirus VP3 Interactions with PI3P-Containing Membranes". There are no data to show hijacking by a virus presented.

      We appreciate this recommendation, which was also expressed by reviewer #3. We also thank the reviewer for the suggested title. We have replaced the title of the manuscript with a more specific one. Our current title is

      "On the Role of VP3-PI3P Interaction in Birnavirus Endosomal Membrane Targeting".

      (2) Lines 53-54 and throughout: Consider rephrasing "demonstrate" to "validate" to give credit to Gimenez et al., 2018, 2022 for discovery.

      Thanks for the suggestion. We have followed it accordingly. Please see line 52 from our revised version of the manuscript.

      (3) Line 56-59 and throughout: Consider tempering and rephrasing these conclusions that are based mostly on computational data. For example, change "unveil" to "suggest" or another term.

      We have now modified the wording throughout the manuscript.

      (4) The abstract could also emphasize that this study sought to map the residues within VP3 that are important for PI3P interaction.

      Thanks for the suggestion. We have followed it accordingly. Please, see lines 53-55 from our revised version of the manuscript.

      (5) Lines 63-69: This Significance paragraph seems tangential. The findings in this paper aren't at all related to the evolutionary link between birnaviruses and positive-strand RNA viruses. The significance of the work for me lies in the deep biochemical/biophysical insights into how a viral protein interacts with membranes to nucleate its replication factory.

      We have re-written the significance paragraph highlighting the mechanistic aspect of our findings. Please, see lines 62-67 in our revised version of the manuscript.

      (6) Line 74: Please define "IDBV" abbreviation.

      We apologize for the missing information. We have defined the IBDV abbreviation in our revised version of the manuscript (please, see line 73).

      (7) Line 88: Please define "pVP2" abbreviation.

      We apologize for the missing information. We have defined the pVP2 abbreviation in our revised version of the manuscript (please, see line 87).

      (8) Lines 101-105: Please change references (8, 9, 10) to be consistent with the rest of the manuscript (names, year).

      We apologize for this mistake. These citations are identifiable and consistent in the revised version of the manuscript (lines 100-105).

      (9) Line 125: For a broad audience, consider explaining that recombinant His-2xFYVE domain is known to exhibit PI3P-binding specificity and was used as a positive control.

      Thanks for the recommendation. We have incorporated a brief explanation supporting the use of His-2xFYVE as a positive control in our revised version of the manuscript. Please, see lines 127-129.

      (10) Lines 167-171: The quantitative data in Figure S3 shows that there was a non-significant co-localization coefficient of the R<sub>200</sub>D mutant. For transparency, this should be stated in the Results section when referenced.

      We agree with this recommendation. We have clearly mentioned it in the revised version of the manuscript. Please, see lines 177-179. Also, we have referred to this fact when introducing the assays performed using the purified GST-2xFYVE, shown in Figure 3. Please, see lines 182-184.

      (11) Lines 156 and 173: These Results section titles have nearly identical wording. Consider rephrasing to make it distinct.

      We agree with the reviewer’s observation. In fact, we did this on purpose, as a play on words, but we understand that it could result in an awkward redundancy. So, in the revised version of the manuscript, the two titles are:

      Role of VP3 P2 in the association of VP3 with the EE membrane (line 163).

      VP3 P2 mediates VP3-PI3P association to EE membranes (line 182).

      (12) Line 194: Is it alternatively possible that the R<sub>200</sub>D mutant lost its capacity to dimerize, and that in turn impacted PI3P interaction?

      Thanks for the relevant question. VP3 was crystallized and its structure reported in (Casañas et al., 2008) (DOI: 10.1016/j.str.2007.10.023). In that report, the authors showed that the two VP3 subunits associate in a symmetrical manner by using the crystallographic two-fold axes. Each subunit contributes 30% of the total surface to form the dimer, with 81 interprotomeric close contacts, including polar bonds and van der Waals contacts. The authors identified the group of residues involved in these interactions, among which R<sub>200</sub> is not included. Additionally, the authors determined that the interface of the VP3 dimer in crystals is biologically meaningful (not due to the crystal packing).

      To confirm that the lack of binding was not due to misfolding of the mutant, we compared the circular dichroism spectra of the mutant and wild type proteins, without detecting significant differences (shown in Figure 4B). These observations do not exclude the possibility mentioned by the reviewer, but we believe they constitute solid evidence in support of our observations.

      (13) Lines 231-243: Consider changing verbs to past tense (i.e., change "is" to "was") for the purposes of consistency and tempering.

      Thanks for the recommendation, we have proceeded as suggested. Please, see lines 249-262 in our revised version of the manuscript.

      (14) Lines 306-308: Is there any information about whether it is free VP3 (v. VP3 complexed in RNP) that binds to membrane? I am just trying to wrap my head around how these factories form during infection.

      Thanks for pointing this out. We first observed that in infected cells, all the components of the RNPs [VP3, VP1 (the viral polymerase) and the dsRNA] were associated with the endosomes. Since by that time it had already been elucidated that VP3 "wrapped" the dsRNA within the RNPs (Luque et al., 2009) (DOI: 10.1016/j.jmb.2008.11.029), we reasoned that VP3 was most probably leading this association. We confirmed this after studying its distribution, which was also endosome-associated when VP3 was ectopically expressed. These results were published in (Delgui et al., 2013) (DOI: 10.1128/jvi.03152-12).

      Thus, in our subsequent studies, we have worked with both infection-derived and ectopically expressed VP3 to advance in elucidating the mechanism by which VP3 hijacks the endosomal membranes and its relevance for viral replication, reported in this current manuscript.

      (15) Lines 320-334: This last paragraph discussing evolutionary links between birnaviruses and positive-strand RNA viruses seems tangential and distracting. Consider reducing or removing.

      Thanks for highlighting this aspect of our work. It may be difficult to follow, but in the context of other evidence reported for the Birnaviridae family of viruses, we strongly believe that there is an evolutionary aspect to the observation that these dsRNA viruses replicate in association with membranous organelles, a hallmark of +RNA viruses. However, we agree with the reviewer that this might not be the main point of our manuscript, so we reduced this paragraph accordingly. Please, see lines 358-367 in our revised version of the manuscript.

      (16) Lines 322-324: Change "RdRd" to "RdRp" if keeping paragraph.

      Thanks. We have corrected this mistake in lines 360 and 361.

      (17) Figures 1A, 1B, and throughout: Again, please check and explain protein sizes and amounts. This would improve the clarity of the manuscript.

      All our flotation assays were performed using 1 mM concentration of purified protein in a final volume of 100 mL (mentioned in M&M section). The coding sequence of the complete fusion protein His-2xFYVE (shown in Figs. 1A and 4A left panel) is 954 base pairs (bp) long and encodes 317 residues (~35 kDa). The coding sequence of the complete fusion protein His-VP3 FL (shown in Figs. 1B and 1G left panel) is 861 bp long and encodes 286 residues (~32 kDa). The coding sequence of the complete fusion protein His-VP3 ΔCt (shown in Fig. 1G, right panel) is 753 bp long and encodes 250 residues (~28 kDa). The coding sequence of the complete fusion protein His-VP3 FL R<sub>200</sub>D (shown in Fig. 4A right panel) is 861 bp long and encodes 286 residues (~32 kDa). This latter information was incorporated in our revised version of the manuscript. Please, see lines 381-382, 396-397 and 399-400 from the M&M section, and lines in the corresponding figure legends.
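
      As a quick consistency check on the sizes quoted above, the following Python sketch converts a coding-sequence length into a residue count and an approximate mass. It rests on two assumptions that are not stated in the response: that each quoted length includes one stop codon, and that an average residue weighs about 110 Da (a common rule of thumb, not a sequence-based calculation). The function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass


def residues_from_cds(length_bp, includes_stop=True):
    """Number of encoded residues for a coding sequence of the given length."""
    if length_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = length_bp // 3
    # Assumption: the quoted length counts the stop codon, which encodes no residue.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues):
    """Rough molecular mass estimate in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Lengths in bp as quoted in the response above.
constructs = {
    "His-2xFYVE": 954,
    "His-VP3 FL": 861,
    "His-VP3 deltaCt": 753,
}

for name, length_bp in constructs.items():
    n = residues_from_cds(length_bp)
    print(f"{name}: {length_bp} bp -> {n} residues, ~{approx_mass_kda(n):.1f} kDa")
```

      Under these assumptions the residue counts reproduce the quoted 317, 286 and 250 residues, and the rough mass estimates fall within about 1 kDa of the quoted sizes.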

      (18) Figures 1B and 1G show different results for PI3P(+) membranes. I see protein associated with the top fraction in 1B, but I don't see any such result in 1G.

      As already mentioned, liposome-based methods, such as the co-flotation assay, are well-established and widely regarded as the preferred approach for studying protein-phosphoinositide interactions. However, this approach is rather qualitative, as density gradient separation reveals whether the protein is located in the top fractions (bound to liposomes) or the bottom fractions (unbound). Our quantifications aim to demonstrate differences in the bound fraction between liposome populations with and without PI3P. Given the setting of the co-flotation assays, each protein-liposome system [2xFYVE-PI3P(-), 2xFYVE-PI3P(+), VP3-PI3P(-), or VP3-PI3P(+)] is assessed separately, and even if the conditions are homogeneous, it’s not surprising to observe differences in the protein level between each one. Indeed, the revised version of the manuscript includes a new membrane for Figure 1G, where the association of His-VP3 FL with the top fraction is clearer. Please, see the new version of Figure 1G.

      (19) Figure 1C: Please include cryo-EM images of the liposome PI3P(-) variables to assess the visual differences of the liposomal membranes under these conditions.

      Thanks for the recommendation. We verified that there is no binding of gold particles to liposomes PI3P(-) when they are incubated solely with the gold-particle reagent, or when they are pre-incubated with the gold-particle reagent with either His-2xFYVE or His-VP3 FL. We have incorporated a new panel in Figure 1C showing a representative image of these results. Please, see lines 143-144 in the revised version of our manuscript and our revised version of Figure 1C.

      (20) Figures 2D, 2E, and 3A: The puncta are not obvious in these images. Consider adding Zoomed panels.

      We apologize for this aspect of Figures 2 and 3, also highlighted by reviewer #1. We believe that this was due to the low quality resulting from the PDF conversion of the original files. For Figure 3A, we have homogenized its appearance with that of Figure 3B. Regarding Figure 2, we have incorporated zoomed panels, as suggested. Please, see the revised versions of both Figures.

      (21) Figure 4A: There is almost no protein in the control PI3P(+) blot. Why? Also, the quantification shows no significant membrane association for this control. This result is different from Figure 1A and very confusing (and concerning).

      We apologize for the confusion. We replaced the membranes for Figure 4A (left panel) with ones showing band intensities more similar to those in Figure 1A. Please, see our new version of Figure 4. It is true that the quantification shows no significant difference in the association to liposomes PI3P(+) compared to liposomes PI3P(-); this is due, once more, to the intrinsic lack of homogeneity of co-flotation assays. However, the control shown in Figure 4A is redundant (it has been shown in Figure 1A), and we believe that the new membrane is qualitatively eloquent.

      Reviewer #3 (Recommendations For The Authors):

      (1) Overall, the title is general and does not summarize the study. I recommend making the title more specific. The current title is better suited for a review as opposed to a research article. This study provides further biophysical details on the interaction. This should be reflected in the title.

      We appreciate this recommendation, which was also expressed by reviewer #2. We have chosen a new title for the manuscript: “On the Role of VP3-PI3P Interaction in Birnavirus Endosomal Membrane Targeting”.

      (2) References 8,9,10 are important but they were not correctly cited in the work, this should be corrected.

      We apologize for this mistake. These citations are identifiable in our revised version of the manuscript. See lines 100-105.

      (3) Flotation experiments and cryo-EM convincingly show that VP3 binds to membranes in a PIP3-dependent manner. However, it would be advisable to include a control for cryo-EM using liposomes that do not contain PIP3 but are incubated with HIS-VP3-FL. This would allow us to rule out any unspecific binding that might not be detected on WB.

      Thanks for the advice, also given by reviewer #2. We confirmed that no gold particles were bound to liposomes PI3P(-) even when incubated with the Ni-NTA reagent alone or pre-incubated with His-2xFYVE or His-VP3 FL. We have incorporated a new panel to Figure 1C showing a representative image of these results. Please, see lines 143-144 in the revised version of the manuscript and see the revised version of Figure 1C.

      (4) It is not clear what is the difference between WB in B and WB in G. Figure 1G seems to show the same experiment as shown in B, is this a repetition? In both cases, plots next to WBs show quantification with bars, do they represent STD or SEM? Legend A mentions significance p>0.01 (**) but the plot shows ***. This should be corrected.

      The Western blot membrane in Figure 1B shows the result of a co-flotation assay using the His-VP3 FL protein, while the Western blot membrane in Figure 1G (left panel) shows a co-flotation assay using the His-VP3 FL protein as a positive control. In other words, in 1B the His-VP3 FL protein is the question, while in 1G (left panel) it’s the co-flotation positive control for His-VP3 ΔCt. The bar plots next to the Western blots show the quantification as the mean and the standard deviation (STD). Thanks for highlighting the inconsistency in the significance notation. We have now corrected it in the revised version of the manuscript.

      (5) It would be useful to indicate positively charged residues and P2 on the AF2 predicted structure in Fig 1.

      These are indicated in panels A and B of Figure 2.

      (6) Figure 1 legend: Change cryo-fixated liposomes to cryo-fixation or better to "liposomes were vitrified". There is a missing "o" in the cry-fixation in the methods section.

      Thanks for the recommendation. We have modified the Figure 1 legend to "liposomes were vitrified" (line 758), and fixed the word cryo-fixation in the methods section (line 512).

      (7) Figure 2B. It is not clear how the punctated phenotype was unbiasedly characterized (Figure 2D). I see no difference in the representative images. Magnified images should be shown. This should be measured as colocalization (Pearson's and Mander's coefficient) with an early endosomal marker Rab5. Perhaps this figure could be consolidated with Figure 3.

      Unfortunately, the lack of clarity in Figure 2D was due to the PDF conversion of the original files. Please, observe the high-quality original image above in response to reviewer #1, where we have additionally included zoomed panels, as also suggested by the other reviewers. For quantification of the co-localization of VP3 and either EGFP-Rab5 or EGFP-2xFYVE, the Manders M2 coefficient was calculated from approximately 30 cells per construct and experiment; the results were shown in Figure S3 and Figure 3A, respectively, in our previous version of the manuscript.

      (8) PIP3 antagonist drugs should be used to further substantiate the results. If PIP3 specifically recruits VP3, this interaction should be abolished in the presence of PIP3 drug and VP3 should show a diffused signal.

      We certainly agree with this point. These experiments were performed and the results were reported in (Gimenez et al., 2020). Briefly, in that work, we blocked the synthesis of PI3P in QM7 cells in a stable cell line overexpressing VP3, QM7-VP3, with either the pan-PI3Kinase (PI3K) inhibitor LY294002, or the specific class III PI3K Vps34 inhibitor Vps34-IN1. In Figure 4, we showed that 98% of the cells treated with these inhibitors had the biosensor GFP-2FYVE dissociated from EEs, evidencing the depletion of PI3P in EEs (Figure 4A). In QM7-VP3 cells, we showed that the depletion of PI3P by either inhibitor caused the dissociation of VP3 from EEs and the disaggregation of VP3 puncta toward a cytosolic distribution (Figure 4B). Moreover, since this observation was crucial for our hypothesis, these results were further confirmed with an alternative strategy to deplete PI3P in EEs. We employed a system to inducibly hydrolyze endosomal PI3P through rapamycin-induced recruitment of the PI3P-myotubularin 1 (MTM1) to endosomes in cells expressing MTM1 fused to the FK506 binding protein (FKBP) and the rapamycin-binding domain fused to Rab5, using the fluorescent proteins mCherry-FKBP-MTM1 and iRFP-FRB-Rab5, as described in (Hammond et al., 2014). These results, shown in Figures 5, 6 and 7 in the same manuscript, further reinforced the notion that PI3P mediates and is necessary for the association of VP3 protein with EEs.

      (9) The authors should show the localization of VP3 in IBDV-infected cells and treat cells with PI3P antagonists. The fact that R<sub>200</sub> is not rescued does not necessarily mean that this is because of the failed interaction with PI3P. As the authors wrote in the discussion: VP3 bears multiple essential roles during the viral life cycle (line 305).

      Indeed, after having confirmed that VP3 lost its endosome-associated localization after treatment of the cells with PI3P antagonists, we demonstrated that depletion of PI3P significantly reduced the production of IBDV progeny. For this aim, we used two approaches, the inhibitor Vps34-IN1 and an siRNA against Vps34. In both cases, we observed a significantly reduced production of IBDV progeny (Figures 9 and 10). Specifically related to the reviewer’s question, the localization of VP3 in IBDV-infected cells treated with PI3P antagonists was shown and quantified in Figure 9a.

      (10) Could you provide adsorption-free energy profiles and MD simulations also for the R<sub>200</sub> mutant?

      Following the reviewer’s suggestion, we have added a new figure to the supplementary information (Figure S15). Instead of presenting a full free-energy profile for each protein, we focused on the adsorption free energy (i.e., the minimum of the adsorption free-energy profile) for VP3 ΔNt and its mutants, VP3 ΔNt R<sub>200</sub>D and VP3 ΔNt P2 Mut, as a function of salt concentration. The aim was to compare the adsorption free energy of the three proteins and evaluate the effect of electrostatic forces on it, which become increasingly screened at higher salt concentrations. As shown in the referenced figure, reducing the number of positively charged residues from VP3 ΔNt to VP3 ΔNt P2 Mut systematically weakens the protein’s binding to the membrane. This effect is particularly pronounced at lower salt concentrations, underscoring the importance of electrostatic interactions in the adsorption of the negatively charged VP3 onto the anionic membrane.
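
      The statement that electrostatic forces become increasingly screened at higher salt concentrations can be illustrated with the textbook Debye screening length. The Python sketch below is not the Molecular Theory calculation used in the manuscript; it assumes a simple 1:1 salt in water at 25 °C with a relative permittivity of 78.4, and the function name and the salt concentrations in the loop are chosen only for illustration.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
N_A = 6.02214076e23           # Avogadro constant, 1/mol


def debye_length_nm(salt_molar, temperature_k=298.15, rel_permittivity=78.4):
    """Debye screening length (nm) for a 1:1 salt at the given molar concentration."""
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    # For a 1:1 electrolyte the ionic strength equals the salt concentration.
    ionic_strength = salt_molar * 1000.0  # mol/L -> mol/m^3
    numerator = EPSILON_0 * rel_permittivity * K_B * temperature_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9


# Illustrative concentrations (assumed, not taken from Figure S15).
for salt in (0.01, 0.05, 0.15, 0.5):
    print(f"{salt:.2f} M -> Debye length ~{debye_length_nm(salt):.2f} nm")
```

      The screening length shrinks with the square root of the salt concentration, which is the qualitative reason why charge-driven adsorption is expected to weaken as salt increases.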

      (11) Liposome deformations in the presence of VP3 are interesting (Figure 6G), were these also observed in Figure 1C?

      Good question. The liposome deformations in the presence of VP3 shown in Figure 6G were a robust observation since, as mentioned, they were detectable in 36% of the liposomes PI3P(+), while they were completely absent in PI3P(-) liposomes. Unfortunately, however, the same deformations were not detectable in the experiments performed using gold particles shown in Figure 1C. In this regard, we think it possible that the gold-particle incubation procedure itself, or even the presence of the gold particles in the images, somehow “masks” the deformation effect.

      Bibliography

      Boukhalfa A, Roccio F, Dupont N, Codogno P, Morel E. 2021. The autophagy protein ATG16L1 cooperates with IFT20 and INPP5E to regulate the turnover of phosphoinositides at the primary cilium. Cell Rep 35:109045. doi:10.1016/j.celrep.2021.109045

      Casañas A, Navarro A, Ferrer-Orta C, González D, Rodríguez JF, Verdaguer N. 2008. Structural Insights into the Multifunctional Protein VP3 of Birnaviruses. Structure 16:29–37. doi:10.1016/j.str.2007.10.023

      Delgui LR, Rodriguez JF, Colombo MI. 2013. The Endosomal Pathway and the Golgi Complex Are Involved in the Infectious Bursal Disease Virus Life Cycle. J Virol 87:8993–9007. doi:10.1128/JVI.03152-12

      Gimenez MC, Issa M, Sheth J, Colombo MI, Terebiznik MR, Delgui LR. 2020. Phosphatidylinositol 3-Phosphate Mediates the Establishment of Infectious Bursal Disease Virus Replication Complexes in Association with Early Endosomes. J Virol 95:e02313-20. doi:10.1128/jvi.02313-20

      Hammond GRV, Machner MP, Balla T. 2014. A novel probe for phosphatidylinositol 4-phosphate reveals multiple pools beyond the Golgi. J Cell Biol 205:113–126. doi:10.1083/jcb.201312072

      Khaldoun SA, Emond-Boisjoly MA, Chateau D, Carrière V, Lacasa M, Rousset M, Demignot S, Morel E. 2014. Autophagosomes contribute to intracellular lipid distribution in enterocytes. Mol Biol Cell 25:118. doi:10.1091/mbc.E13-06-0324

      Luque D, Saugar I, Rejas MT, Carrascosa JL, Rodríguez JF, Castón JR. 2009. Infectious Bursal Disease Virus: Ribonucleoprotein Complexes of a Double-Stranded RNA Virus. J Mol Biol 386:891–901. doi:10.1016/j.jmb.2008.11.029

      Morel E, Chamoun Z, Lasiecka ZM, Chan RB, Williamson RL, Vetanovetz C, Dall’Armi C, Simoes S, Point Du Jour KS, McCabe BD, Small SA, Di Paolo G. 2013. Phosphatidylinositol-3-phosphate regulates sorting and processing of amyloid precursor protein through the endosomal system. Nat Commun 4:1–13. doi:10.1038/ncomms3250

      Qi X, Gao Y, Gao H, Deng X, Bu Z, Wang Xiaoyan, Fu C, Wang Xiaomei. 2007. An improved method for infectious bursal disease virus rescue using RNA polymerase II system. J Virol Methods 142:81–88. doi:10.1016/j.jviromet.2007.01.021

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors aimed to confirm the association between the human leukocyte antigen (HLA)-II region and tuberculosis (TB) susceptibility within admixed African populations. Building upon previous findings from the International Tuberculosis Host Genetics Consortium (ITHGC), this study sought to address the limitations of small sample size and the inclusion of admixed samples by employing the Local Ancestry Allelic Adjusted (LAAA) model, as well as identify TB susceptibility loci in an admixed South African cohort. 

      Strengths: 

      The major strengths of this study include the use of six TB case-control datasets collected over 30 years from diverse South African populations and the use of ADMIXTURE for global ancestry inference. The former represents a comprehensive dataset and the latter ensures accurate determination of ancestral contributions. In addition, the identified association in the HLA-DPB1 gene shows near-genome-wide significance, enhancing the credibility of the findings.

      Weaknesses: 

      The major weaknesses of this study include insufficient significant discoveries and reliance on cross-validation. This study only identified one variant significantly associated with TB status, located in an intergenic region with an unclear link to TB susceptibility. Despite identifying multiple lead SNPs, no other variants reached the genome-wide significance threshold, limiting the overall impact of the findings. The absence of an independent validation cohort, with the study relying solely on cross-validation, is also a major limitation. This approach restricts the ability to independently confirm the findings and evaluate their robustness across different population samples.

      Appraisal: 

      The authors successfully achieved their aims of confirming the association between the HLA-II region and TB susceptibility in admixed African populations. However, the limited number of significant discoveries, reliance on cross-validation, and insufficient discussion of model performance and SNP significance weaken the overall strength of the findings. Despite these limitations, the results support the conclusion that considering local ancestry is crucial in genetic studies of admixed populations. 

      Impact:  

      The innovative use of the LAAA model and the comprehensive dataset in this study make substantial contributions to the field of genetic epidemiology. 

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript is about using different analytical approaches to allow ancestry adjustments to GWAS analyses amongst admixed populations. This work is a follow-on from the recently published ITHGC multi-population GWAS (https://doi.org/10.7554/eLife.84394), with a focus on the admixed South African populations. Ancestry adjustment models detected a peak of SNPs in the class II HLA DPB1, distinct from the class II HLA DQA1 loci significant in the ITHGC analysis. 

      Strengths: 

      Excellent demonstration of GWAS analytical pipelines in highly admixed populations. Further confirmation of the importance of the HLA class II locus in genetic susceptibility to TB. 

      Weaknesses: 

      Limited novelty compared to the group's previous publications and the body of work linking HLA class II alleles with TB susceptibility in South Africa or other African populations. This work includes only ~100 new cases and controls beyond what has already been published. High-resolution HLA typing has detected significant signals in both the DQA1 and DPB1 regions identified by the larger ITHGC and in this GWAS analysis respectively (Chihab L et al. HLA. 2023 Feb; 101(2): 124-137). Despite the availability of strong methods for imputing HLA from GWAS data (Karnes J et al. PLoS One 2017), the authors did not confirm with HLA typing the importance of their SNP peak in the class II region. This would have supported the importance of this ancestry adjustment versus prior ITHGC analysis. 

      The populations consider active TB and healthy controls (from high-burden presumed exposed communities) and do not provide QFT or other data to identify latent TB infection. 

      Important methodological points for clarification and for readers to be aware of when reading this paper: 

      (1) One of the reasons cited for the lack of African ancestry-specific associations or suggestive peaks in the ITHGC study was the small African sample size. The current association test includes a larger African cohort and yields a near-genome-wide significant signal in the HLA-DPB1 gene originating from the KhoeSan ancestry. It should be investigated whether the increase in power is due to the increased number of African samples and not necessarily to the use of the LAAA model, as stated on lines 295 and 296. 

      Thank you for your comment. The Manhattan plot in Figure 3 includes the results for all four models: the traditional GWAS model (GAO), the admixture mapping model (LAO), the ancestry plus allelic (APA) model and the LAAA model. In this figure, it is evident that only the LAAA model identified the association peak on chromosome 6, which lends support to the argument that the increase in power is due to the use of the LAAA model and not solely due to the increase in sample size. 

      (2) In line 256, the number of SNPs included in the LAAA analysis was 784,557 autosomal markers; the number of SNPs after quality control of the imputed dataset was 7,510,051 SNPs (line 142). It is not clear how or why ~90% of the SNPs were removed. This needs clarification. 

      Thank you for your recommendation. In our manuscript (line 194), we mention that “…variants with minor allele frequency (MAF) < 1% were removed to improve the stability of the association tests.” A large proportion of imputed variants fell below this MAF threshold, and were subsequently excluded from this analysis. Below, we show the number of imputed variants across MAF bins for one of our datasets [RSA(A)] to substantiate this claim:  

      Author response image 1.
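
      As a rough illustration of the tally described above, the following minimal Python sketch applies the MAF < 1% filter and counts variants per MAF bin. The allele frequencies and bin edges are made up for illustration and are not taken from the RSA(A) dataset; the function names are hypothetical.

```python
from bisect import bisect_right

MAF_THRESHOLD = 0.01  # variants with MAF < 1% are removed (threshold from the text)


def minor_allele_frequency(alt_freq):
    """Fold an alternate allele frequency into a minor allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(alt_freq, threshold=MAF_THRESHOLD):
    """Keep a variant only if its MAF is at or above the threshold."""
    return minor_allele_frequency(alt_freq) >= threshold


def count_by_maf_bin(alt_freqs, edges=(0.0, 0.01, 0.05, 0.1, 0.5)):
    """Count variants per bin [edges[i], edges[i+1]); the last bin also includes 0.5.

    The bin edges are an illustrative assumption.
    """
    counts = [0] * (len(edges) - 1)
    for freq in alt_freqs:
        maf = minor_allele_frequency(freq)
        idx = min(bisect_right(edges, maf) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Toy alternate allele frequencies (hypothetical values).
toy_alt_freqs = [0.001, 0.004, 0.009, 0.02, 0.07, 0.30, 0.50, 0.995]
bin_counts = count_by_maf_bin(toy_alt_freqs)
kept = [f for f in toy_alt_freqs if passes_maf_filter(f)]
print(bin_counts, len(kept))
```

      In this toy input, half of the variants fall in the lowest bin and are dropped by the filter, which mirrors the qualitative point that rare imputed variants dominate the excluded set.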

      (3) The authors have used the significance threshold estimated by STEAM (p-value < 2.5x10<sup>-6</sup>) in the LAAA analysis. Grinde et al. (2019) implemented their significance threshold estimation approach tailored to admixture mapping (local ancestry (LA) model), where there is a reduction in testing burden. The authors should justify why this threshold would apply to the LAAA model (a joint genotype and ancestry approach). 

      Thank you for your recommendation. We describe in the methods (line 189 onwards) that the LAAA model is an extension of the APA model. Since the APA model itself simultaneously performs the null global ancestry only model and the local ancestry model (utilised in admixture mapping), we thus considered the use of a threshold tailored to admixture mapping appropriate for the LAAA model.  

      (4) Batch effect screening and correction (line 174) is a quality control check. This section is discussed after global and local ancestry inferences in the methods. Was this QC step conducted after the inferencing? If so, the authors should justify how the removed SNPs due to the batch effect did not affect the global and local ancestry inferences or should order the methods section correctly to avoid confusion. 

      Thank you for your comments. The batch effect correction method utilised a pseudo-case-control comparison which included global ancestry proportions. Thus, batch effect correction was conducted after ancestry inference. We excluded 36 627 SNPs that were believed to have been affected by the batch effect. We have amended line 186 to include the exact number of SNPs excluded due to batch effect. 

      The ancestry inference by RFMix utilised the entire merged dataset of 7 510 051 SNPs. Thus, the SNPs removed due to the batch effect make up a very small proportion of the SNPs used to conduct global and local ancestry inferences (less than 0.5%). As a result, we do not believe that the removed SNPs would have significantly affected the global and local ancestry inferences. However, we did conduct global ancestry inference with RFMix on each separate dataset as a sanity check. In the tables below, we show the average global ancestry proportions inferred for each separate dataset, the average global ancestry proportions across all datasets and the average global ancestry proportions inferred using the merged dataset. The SAC and Xhosa cohorts are shown in two separate tables due to the different number of contributing ancestral populations to each cohort. The combined average global ancestry proportions across the separate cohorts do not differ significantly from the global ancestry proportions inferred using the merged dataset. 

      Author response table 1.

      Comparison of global ancestry proportions across the separate SAC datasets and the merged cohort.

      Author response table 2.

      Comparison of global ancestry proportions in the Xhosa dataset and the merged cohort. 
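
      The "less than 0.5%" figure quoted in the response can be checked directly from the two SNP counts given there. This is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
# SNP counts as reported in the author response.
removed_snps = 36_627      # SNPs excluded due to the batch effect
merged_snps = 7_510_051    # SNPs in the merged dataset used by RFMix

percent_removed = 100 * removed_snps / merged_snps
print(f"{percent_removed:.2f}% of SNPs removed")
```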

      Reviewer #1 (Recommendations for the authors): 

      Suggestions for Improved or Additional Experiments, Data, or Analyses:   

      (1) It might be beneficial to consider splitting the data into separate discovery and validation cohorts rather than relying solely on cross-validation. This approach could provide a stronger basis for independently confirming the findings. 

      Thank you for your suggestion. However, we are hesitant to divide our already modest dataset (n=1544) into separate discovery and validation cohorts, as this would reduce the statistical power to detect significant associations.

      (2) Clearly stating the process of cross-validation in the methods section and reporting relevant validation statistics, such as accuracy, sensitivity, specificity, and area under the curve (AUC), would provide a more comprehensive assessment of the model's performance.  

      Thank you for your recommendation. We would like to highlight this article, “GWAS in the southern African context” (1), which evaluated the performance of the LAAA model compared to other models in three- and five-way admixed populations. Given the thorough evaluation of the model’s performance in that study, we did not find it necessary to reassess its performance in this manuscript.   

      (3) Analysing racial cohorts separately to see if you can replicate previous results and find significant markers in combined non-African populations that are not evident in African-only samples might be useful. 

      Thank you for your suggestion. We would like to respectfully note that race is a social construct, and its use as a proxy for genetic ancestry can be problematic (2). In our study, we rather rely on genetic ancestry inferred using ancestry inference software to provide a more accurate representation of our cohort's genetic diversity. Additionally, our cohort consists mostly of a highly admixed population group, with some individuals exhibiting ancestral contributions from up to five different global populations. Therefore, it is not possible to categorize our samples into distinct “African-only” or “non-African” groups.

      (4) It might be worthwhile to consider using polygenic risk scores (PRS) to combine multiple genetic influences. This approach could help in identifying cumulative genetic effects that are not apparent when examining individual SNPs.  

      Thank you for your recommendation. Constructing a polygenic risk score (PRS) is beyond the scope of the current study, although it is an ongoing interest in our group. We recognize its potential value and will consider incorporating this approach in future research endeavours or a separate publication. A recent publication by Majara et al. showed that PRS accuracy is low for all traits and varies across ancestrally and ethnically diverse South African groups (3).

      Recommendations for Improving the Writing and Presentation: 

      Including a more thorough discussion of the methodological limitations, such as the challenges of studying admixed populations and the potential limitations of the LAAA model, would provide a more balanced perspective. 

      Thank you for your suggestion. To provide a more balanced perspective, we included the limitations of our study in the discussion, from line 429 to line 451.

      Minor Corrections to the Text and Figures: 

      Including all relevant statistics would improve clarity. For example, providing confidence intervals for the odds ratios and discussing any observed trends or outliers would be beneficial. 

      Thank you for your recommendation. We have added 95% confidence intervals to all odds ratios reported in Table 3. However, beyond the association peak identified in the HLA-II region associated with the phenotype, we do not observe any other trends or outliers in our LAAA analysis.  

      Reviewer #2 (Recommendations for the authors): 

      Points for improvement: 

      (1) Related to the different datasets and inclusions in previous publications, it would also be good to better understand the different numbers of cases and controls included across the previous and current analyses, or discussion thereof. For instance, the RSA(M) dataset includes 555/440 cases/controls for this analysis and only 410/405 cases/controls in the ITHGC analysis. Other discrepancies are noted across the other published datasets compared to those included in this analysis, and these always need to be detailed in a supplement or similar to better understand if this could have introduced bias or was in fact correct based on the additional ancestry-related restriction applied.  

      Thank you for your comments. Table 1 of our manuscript lists the number of individuals in the RSA(M) dataset, including related individuals. As described in line 131, related individuals were subsequently excluded during quality control: “Individual datasets were screened for relatedness using KING software (Manichaikul et al., 2010) and individuals up to second degree relatedness were removed.” The ITHGC only reported the number of unrelated individuals included in their analyses, which would account for the discrepancies in the reported number of cases and controls.  

      (2) The imbalance between cases and controls in this analysis is quite striking, and it is unusual to have the imbalance favour cases over controls. This contrasts with the ITHGC, where there are substantially more controls. There is no comment on how this could potentially impact this analysis. 

      Thank you for your comment. We have included a note on our case-control imbalance in the discussion:

      “While many studies discuss methods for addressing case-control imbalances with more controls than cases (which can inflate type 1 error rates (Zhou et al. 2018; Dai et al. 2021; Öztornaci et al. 2023)), few address the implications of a large case-to-control ratio like ours (952 cases to 592 controls). To assess the impact of this imbalance, we used the Michigan genetic association study (GAS) power calculator (Skol et al. 2006). Under an additive disease model with an estimated prevalence of 0.15, a disease allele frequency of 0.3, a genotype relative risk of 1.5, and a default significance level of 7 × 10<sup>-6</sup>, we achieved an expected power of approximately 75%. With a balanced sample size of 950 cases and 950 controls, power would exceed 90%, but it would drop significantly with a smaller balanced cohort of 590 cases and 590 controls. Given these results, we proceeded with our analysis to maximize statistical power despite the case-control imbalance.” 

      Author response image 2.
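
      To make the reasoning behind this comparison concrete, the sketch below approximates the power of a two-sided allelic case-control test with a normal approximation. It is not the GAS power calculator and makes its own assumptions (Hardy-Weinberg genotype frequencies, additive relative risks of 1, GRR and 2·GRR − 1, and a simple two-proportion z-test on allele frequencies), so its absolute values should not be expected to reproduce the figures quoted above. It is intended only to show how the three sample-size scenarios rank against each other.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, prevalence, risk_allele_freq,
                       genotype_relative_risk, alpha):
    """Approximate power of a two-sided allelic test (illustrative assumptions only)."""
    p = risk_allele_freq
    q = 1.0 - p
    grr = genotype_relative_risk

    # Genotype frequencies for 0, 1 and 2 risk alleles, assuming Hardy-Weinberg.
    geno_freq = (q * q, 2 * p * q, p * p)
    # Assumed additive model: relative risks 1, GRR, 2*GRR - 1.
    rel_risk = (1.0, grr, 2 * grr - 1)

    # Baseline penetrance chosen so that the population prevalence is matched.
    baseline = prevalence / sum(g * r for g, r in zip(geno_freq, rel_risk))
    penetrance = [baseline * r for r in rel_risk]

    # Genotype frequencies conditional on case or control status.
    cases = [g * f / prevalence for g, f in zip(geno_freq, penetrance)]
    controls = [g * (1 - f) / (1 - prevalence) for g, f in zip(geno_freq, penetrance)]

    # Risk allele frequencies in cases and controls.
    p_case = cases[2] + 0.5 * cases[1]
    p_ctrl = controls[2] + 0.5 * controls[1]

    # Standard error of the allele frequency difference (2N alleles per group).
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) ** 0.5
    ncp = abs(p_case - p_ctrl) / se

    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(ncp - critical) + z.cdf(-ncp - critical)


# Parameters taken from the quoted paragraph; sample sizes are the three scenarios discussed.
params = dict(prevalence=0.15, risk_allele_freq=0.3,
              genotype_relative_risk=1.5, alpha=7e-6)
power_observed = allelic_test_power(952, 592, **params)
power_balanced_large = allelic_test_power(950, 950, **params)
power_balanced_small = allelic_test_power(590, 590, **params)
print(power_balanced_small, power_observed, power_balanced_large)
```

      Under these assumptions the imbalanced design falls between the two balanced designs, which is the qualitative argument made in the response: discarding cases to balance the cohort would cost power.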

      Minor comments 

      (1) Referencing around key points of TB epidemiology and disease states seems out of date, given recent epidemiology reviews and seminal nature or lancet review articles. Please update.  

      Thank you for your suggestion. We have included the following recent publications in the introductory paragraph: 

      Zaidi, S. M. A., Coussens, A. K., Seddon, J. A., Kredo, T., Warner, D., Houben, R. M. G. J., & Esmail, H. (2023). Beyond latent and active tuberculosis: a scoping review of conceptual frameworks. EClinicalMedicine, 66, 102332. https://doi.org/10.1016/j.eclinm.2023.102332

      Menzies, N. A., Swartwood, N., Testa, C., Malyuta, Y., Hill, A. N., Marks, S. M., Cohen, T., & Salomon, J. A. (2021). Time Since Infection and Risks of Future Disease for Individuals with Mycobacterium tuberculosis Infection in the United States. Epidemiology, 32(1), 70–78. https://doi.org/10.1097/EDE.0000000000001271  

      Cudahy, P. G. T., Wilson, D., & Cohen, T. (2020). Risk factors for recurrent tuberculosis after successful treatment in a high burden setting: a cohort study. BMC Infectious Diseases, 20(1), 789. https://doi.org/10.1186/s12879-020-05515-4  

      Escombe, A. R., Ticona, E., Chávez-Pérez, V., Espinoza, M., & Moore, D. A. J. (2019). Improving natural ventilation in hospital waiting and consulting rooms to reduce nosocomial tuberculosis transmission risk in a low resource setting. BMC Infectious Diseases, 19(1), 88. https://doi.org/10.1186/s12879-019-3717-9  

      Laghari, M., Sulaiman, S. A. S., Khan, A. H., Talpur, B. A., Bhatti, Z., & Memon, N. (2019). Contact screening and risk factors for TB among the household contact of children with active TB: a way to find source case and new TB cases. BMC Public Health, 19(1), 1274. https://doi.org/10.1186/s12889-019-7597-0  

      Matose, M., Poluta, M., & Douglas, T. S. (2019). Natural ventilation as a means of airborne tuberculosis infection control in minibus taxis. South African Journal of Science, 115(9/10). https://doi.org/10.17159/sajs.2019/5737

      Smith, M. H., Myrick, J. W., Oyageshio, O., Uren, C., Saayman, J., Boolay, S., van der Westhuizen, L., Werely, C., Möller, M., Henn, B. M., & Reynolds, A. W. (2023). Epidemiological correlates of overweight and obesity in the Northern Cape Province, South Africa. PeerJ, 11, e14723. https://doi.org/10.7717/peerj.14723  

      (2) Lines 46 to 48 appear to have two contradictory statements next to each other. The first says there are numerous GWAS investigating TB susceptibility; the second says there are sparse. Please clarify.

      Thank you for bringing this to our attention. We have amended the lines as follows: 

      “Numerous genome-wide association studies (GWASs) investigating TB susceptibility have been conducted across different population groups. However, findings from these studies often do not replicate across population groups (Möller & Kinnear, 2020; Möller et al., 2018; Uren et al., 2017).”

      (3) Add ref in line 69 for two SAC populations.

      Thank you for your recommendation. We have included the citation for the ITHGC meta-analysis paper here: 

      “The authors described possible reasons for the lack of associations, including the smaller sample size compared to the other ancestry-specific meta-analyses, increased genetic diversity within African individuals and population stratification produced by two admixed cohorts from the South African Coloured (SAC) population (Schurz et al. 2024).”

      (4) Write out abbreviations the first time they appear (Line 121).

      Thank you for your recommendation. We have corrected the sentence as follows: 

      “Monomorphic sites were removed. Individuals were screened for deviations in Hardy-Weinberg Equilibrium (HWE) for each SNP and sites deviating from the HWE threshold of 10<sup>-5</sup> were removed.”

      (5) It would be good in the supplement to see if there is a SNP peak in chromosome 20 with a hit that reached significance in the Bantu-speaking African ancestry.

      Thank you for your recommendation. We have included a regional plot for the lead variant identified on chromosome 20 originating from Bantu-speaking African ancestry in the supplementary material (Supplementary Figure 3).

      (6) It would be good to mention the p-values of rs28383206 from the ITHGC paper in this cohort for KhoeSan and Bantu-speaking African ancestries. 

      Thank you for your suggestion. We have included the following paragraph from line 352:

      “The lead variant identified in the ITHGC meta-analysis, rs28383206, was not present in our genotype or imputed datasets. The ITHGC imputed genotypes using the 1000 Genomes (1000G) reference panel (4). Variant rs28383206 has an alternate allele frequency of 11.26% in the African population subgroup within the 1000G dataset (https://www.ncbi.nlm.nih.gov/snp/rs28383206). However, rs28383206 is absent from our in-house whole-genome sequencing (WGS) datasets, which include Bantu-speaking African and KhoeSan individuals. This absence suggests that rs28383206 might not have been imputed in our datasets using the AGR reference panel, potentially due to its low alternate allele frequency in southern African populations. Our merged dataset contained two variants located within 800 base pairs of rs28383206: rs482205 (6:32576009) and rs482162 (6:32576019). However, these variants were not significantly associated with TB status in our cohort (Supplementary Table 1).” Supplementary Table 1 can be found in the supplementary material.

      (7) It would improve the readability of the ancestry proportions listed on lines 236 and 237 if these population groups were linked with the corresponding specific population used in Figure 1, as has been done in Table 2.

      Thank you for your suggestion. We have amended Figure 1 to include the corresponding population labels mentioned in Table 2.  

      (8) In line 209, it is not clear why the number of alleles of a specific ancestry at a locus is referred to as a covariate in admixture mapping when the corresponding marginal effect is the parameter of interest. 

      Thank you for bringing this to our attention. We have amended the description as follows: 

      “(2) Local ancestry (LA) model:

      This model is used in admixture mapping to identify ancestry-specific variants associated with a specific phenotype. The LA model evaluates the number of alleles of a specific ancestry at a locus and includes the corresponding marginal effect as a covariate in association analyses.”

      (9) Table 3 would benefit from a column on whether the SNP was genotyped or imputed. 

      Thank you for your suggestion. We have included a column indicating whether the SNP was genotyped or imputed, as well as an additional column with the INFO score for imputed genotypes. 

      (10) The authors should remove the print and download icons in Figure 1 on lines 240 and 241.

      Thank you for your suggestion. We have amended the figure as requested.  

      (11) In the quality control, the authors use a more relaxed threshold for missingness in individuals (90%) and genotypes (5%) and have strayed away from the conventional 97%-98%. An explanation of the choice of these thresholds will be helpful to the reader.

      Thank you for your suggestion. We aimed to use genotype and individual missingness thresholds similar to those outlined by the ITHGC meta-analysis (which utilised a threshold of 10% for both genotype and individual missingness) and the previous LAAA analysis paper by Swart et al. in 2021. We have amended line 116 for more clarity: 

      “Individuals with genotype call rates less than 90% and SNPs with more than 5% missingness were removed as described previously (5).”
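
      The two missingness filters quoted above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the toy genotype matrix and the order of operations (individuals filtered first, then SNPs on the remaining individuals) are assumptions made for the example.

```python
def filter_by_missingness(genotypes, min_call_rate=0.90, max_snp_missing=0.05):
    """Filter a genotype matrix by individual call rate and per-SNP missingness.

    genotypes: list of rows, one per individual; None marks a missing call.
    Returns the filtered matrix and the indices of the SNPs that were kept.
    """
    n_snps = len(genotypes[0])

    # Step 1 (assumed order): drop individuals with call rate below the minimum.
    kept_rows = [
        row for row in genotypes
        if sum(call is not None for call in row) / n_snps >= min_call_rate
    ]

    # Step 2: drop SNPs whose missingness among remaining individuals is too high.
    kept_cols = [
        j for j in range(n_snps)
        if kept_rows
        and sum(row[j] is None for row in kept_rows) / len(kept_rows) <= max_snp_missing
    ]
    return [[row[j] for j in kept_cols] for row in kept_rows], kept_cols


# Toy data: 20 individuals x 10 SNPs, all calls present except a few.
toy = [[0] * 10 for _ in range(20)]
toy[0][0] = toy[0][1] = None   # individual 0: call rate 80%, removed
toy[1][3] = None               # individuals 1 and 2: call rate 90%, kept
toy[2][3] = None               # SNP 3 is then missing in 2 of 19 individuals, removed

filtered, kept_snp_idx = filter_by_missingness(toy)
print(len(filtered), kept_snp_idx)
```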

      References  

      (1) Swart Y, van Eeden G, Uren C, van der Spuy G, Tromp G, Moller M. GWAS in the southern African context. Cold Spring Harbor Laboratory. 2022;

      (2) Byeon YJJ, Islamaj R, Yeganova L, Wilbur WJ, Lu Z, Brody LC, et al. Evolving use of ancestry, ethnicity, and race in genetics research-A survey spanning seven decades. Am J Hum Genet. 2021 Dec 2;108(12):2215–23.

      (3) Majara L, Kalungi A, Koen N, Tsuo K, Wang Y, Gupta R, et al. Low and differential polygenic score generalizability among African populations due largely to genetic diversity. HGG Adv. 2023 Apr 13;4(2):100184.

      (4) Schurz H, Naranbhai V, Yates TA, Gilchrist JJ, Parks T, Dodd PJ, et al. Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture. eLife. 2024 Jan 15;13.

      (5) Swart Y, Uren C, van Helden PD, Hoal EG, Möller M. Local ancestry adjusted allelic association analysis robustly captures tuberculosis susceptibility loci. Front Genet. 2021 Oct 15;12:716558.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The objective of this investigation was to determine whether experimental pain could induce alterations in cortical inhibitory/facilitatory activity observed in TMS-evoked potentials (TEPs). Previous TMS investigations of pain perception had focused on motor evoked potentials (MEPs), which reflect a combination of cortical, spinal, and peripheral activity, as well as restricting the focus to M1. The main strength of this investigation is the combined use of TMS and EEG in the context of experimental pain. More specifically, Experiment 1 investigated whether acute pain altered cortical excitability, reflected in the modulation of TEPs. The main outcome of this study is that relative to non-painful warm stimuli, painful thermal stimuli led to an increase in the amplitude of the TEP N45, with a larger increase associated with higher pain ratings. Because it has been argued that a significant portion of TEPs could reflect auditory potentials elicited by the sound (click) of the TMS, Experiment 2 constituted a control study that aimed to disentangle the cortical response related to TMS and auditory activity. Finally, Experiment 3 aimed to disentangle the cortical response to TMS and reafferent feedback from muscular activity elicited by suprathreshold TMS applied over M1. The fact that the authors accompanied their main experiment with two control experiments strengthens the conclusion that the N45 TEP peak could be implicated in the perception of painful stimuli.

      Perhaps the addition of a highly salient but non-painful stimulus (i.e. from another modality) would have further ruled out the possibility that the effects on the N45 are predominantly related to the intensity/saliency of the stimulus rather than to pain per se.

      We thank the reviewer for their comment on the possibility of whether stimulus intensity influences the N45 as opposed to pain per se. We agree that the ideal experiment would have included multiple levels of stimulation. We would argue, however, that in Experiment 1, despite the same level of stimulus intensity for all participants (46 degrees), individual differences in pain ratings were associated with the change in the N45 amplitude, suggesting that the results cannot be explained by stimulus intensity, but rather by pain intensity.

      Reviewer #2 (Public Review):

      The authors have used transcranial magnetic stimulation (TMS) and motor evoked potentials (MEPs) and TMS-electroencephalography (EEG) evoked potentials (TEPs) to determine how experimental heat pain could induce alterations in these metrics.
      In Experiment 1 (n = 29), multiple sustained thermal stimuli were administered over the forearm, with the first, second, and third block of stimuli consisting of warm but non-painful (pre-pain block), painful heat (pain block) and warm but non-painful (post-pain block) temperatures respectively. Painful stimuli led to an increase in the amplitude of the fronto-central N45, with a larger increase associated with higher pain ratings. Experiments 2 and 3 studied the correlation between the increase in the N45 in pain and the effects of a sham stimulation protocol/higher stimulation intensity. They found that the centro-frontal N45 TEP was decreased in acute pain. The study comes from a very strong group in the pain field with long experience in psychophysics, experimental pain, neuromodulation, and EEG in pain. They are among the first to report on changes in cortical excitability as measured by TMS-EEG over M1. While their results are in line with reductions seen in motor-evoked responses during pain and effort was made to address possible confounding factors (study 2 and 3), there are some points that need attention. In my view the most important are:

      1) The method used to calculate the resting motor threshold, which is likely to have overestimated its true value: calculating highly abnormal RMT may lead to suprathreshold stimulations in all instances (Experiment 3) and may lead to somatosensory "contamination" due to re-afferent loops in both "supra" and "infra" (aka. less supra) conditions.

      The method used to assess motor threshold was the TMS Motor Threshold Assessment Tool (MTAT), which estimates motor threshold using maximum likelihood parametric estimation by sequential testing (Awiszus, 2003; Awiszus and Borckardt, 2011). This was developed as a quicker alternative for calculating motor threshold compared to the traditional Rossini-Rothwell method, which involves determining the lowest intensity that evokes at least 5/10 MEPs of at least 50 microvolts. The method has been shown to achieve the same accuracy of determining motor threshold as the traditional Rossini-Rothwell method, but with fewer pulses (Qi et al., 2011; Silbert et al., 2013).

      We have now made this clearer in the manuscript:

      “The RMT was determined using the TMS motor thresholding assessment tool, which estimates the TMS intensity required to induce an MEP of 50 microvolts with a 50% probability using maximum likelihood parametric estimation by sequential testing (Awiszus, 2003; Awiszus & Borckardt, 2011). This method has been shown to achieve the accuracy of methods such as the Rossini-Rothwell method (Rossini et al., 1994; Rothwell et al., 1999) but with fewer pulses (Qi, Wu, & Schweighofer, 2011; Silbert, Patterson, Pevcic, Windnagel, & Thickbroom, 2013). The test stimulus intensity was set at 110% RMT to concurrently measure MEPs and TEPs during pre-pain, pain and post-pain blocks.”

      Therefore, the high RMTs in our study cannot be explained by the threshold assessment method. Instead, they are likely explained by aspects of the experimental setup that increased the distance between the TMS coil and the scalp, including the layer of foam placed over the coil, the EEG cap and the fact that the electrodes we used had a relatively thick profile. This has been explained in the paper:

      “We note that the relatively high RMTs are likely due to aspects of the experimental setup that increased the distance between the TMS coil and the scalp, including the layer of foam placed over the coil, the EEG cap and relatively thick electrodes (6 mm).”

      Awiszus, F. (2003). TMS and threshold hunting. In Supplements to Clinical neurophysiology (Vol. 56, pp. 13-23). Elsevier.

      Qi, F., Wu, A. D., & Schweighofer, N. (2011). Fast estimation of transcranial magnetic stimulation motor threshold. Brain stimulation, 4(1), 50-57.

      Silbert, B. I., Patterson, H. I., Pevcic, D. D., Windnagel, K. A., & Thickbroom, G. W. (2013). A comparison of relative-frequency and threshold-hunting methods to determine stimulus intensity in transcranial magnetic stimulation. Clinical Neurophysiology, 124(4), 708-712.

      2) The low number of pulses used for TEPs (close to ⅓ of the usual and recommended)

      We agree that increasing the number of pulses can increase the signal to noise ratio. During piloting, participants were unable to tolerate the painful stimulus for long periods of time and we were required to minimize the number of pulses per condition.

      We note that there is no set advised number of trials in TMS-EEG research. According to the recommendations paper, the number of trials should be based on the outcome measure (e.g., TEP peaks vs. frequency domain measures vs. other measures) and on previous studies investigating test-retest reliability (Hernandez-Pavon et al., 2023). The choice of 66 pulses per condition was based on the study by Kerwin et al. (2018) showing that optimal concordance between TEP peaks can be found with 60-100 TMS pulses delivered in the same run (as in the present study). The concordance was particularly high for the N40 peak at prefrontal electrodes, which was the key peak and electrode cluster in our study. We have made this clearer:

      “Current recommendations (Hernandez-Pavon et al., 2023) suggest basing the number of TMS trials per condition on the key outcome measure (e.g., TEP peaks vs. frequency measures) and on previous test-retest reliability studies. In our study the number of trials was based on a test-retest reliability study by Kerwin, Keller, Wu, Narayan, & Etkin (2018), which showed that 60 TMS pulses (delivered in the same run) was sufficient to obtain reliable TEP peaks (i.e., sufficient within-individual concordance between the resultant TEP peaks of each trial).”

      Further supporting the reliability of the TEP data in our experiment, we note that the scalp topographies of the TEPs for active TMS at various timepoints (Figures 5, 7 and 9) were similar across all three experiments, especially at 45 ms post-TMS (frontal negative activity, parietal-occipital positive activity).

      In addition to this, the intraclass correlation coefficient (Two-way fixed, single measure) for the N45 to active suprathreshold TMS across timepoints for each experiment was 0.90 for Experiment 1 (across pre-pain, pain, post-pain time points), 0.74 for Experiment 2 (across pre-pain and pain conditions), and 0.95 for Experiment 3 (across pre-pain conditions). This suggests that even with the fluctuations in the N45 induced by pain, the N45 for each participant was stable across time, further supporting the reliability of our data. These ICCs are now reported in the supplementary material (subheading: Test-retest reliability of N45 Peaks).
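
      For readers unfamiliar with this statistic, the sketch below computes a two-way, single-measure intraclass correlation from a participants × time points matrix. It assumes the consistency form commonly written ICC(3,1), that is (MS_rows − MS_error) / (MS_rows + (k − 1)·MS_error); the response does not state which variant or software was used, and the amplitudes in the example are invented.

```python
def icc_3_1(scores):
    """Two-way, single-measure, consistency ICC (often written ICC(3,1)).

    scores: list of rows, one per participant; columns are time points.
    """
    n = len(scores)          # participants
    k = len(scores[0])       # time points
    grand_mean = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical N45 amplitudes (microvolts) for 4 participants at
# pre-pain, pain and post-pain time points.
toy_n45 = [
    [-2.1, -2.9, -2.3],
    [-1.0, -1.6, -1.1],
    [-3.4, -4.5, -3.2],
    [-0.5, -1.2, -0.8],
]
print(round(icc_3_1(toy_n45), 2))
```

      Because this form ignores systematic shifts between time points, a consistent pain-related change in the N45 across participants does not by itself lower the coefficient, which fits the interpretation given in the response.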

      Hernandez-Pavon, J. C., Veniero, D., Bergmann, T. O., Belardinelli, P., Bortoletto, M., Casarotto, S., ... & Ilmoniemi, R. J. (2023). TMS combined with EEG: Recommendations and open issues for data collection and analysis. Brain Stimulation, 16(3), 567-593.

      Kerwin, L. J., Keller, C. J., Wu, W., Narayan, M., & Etkin, A. (2018). Test-retest reliability of transcranial magnetic stimulation EEG evoked potentials. Brain stimulation, 11(3), 536-544.
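
      To illustrate the reliability statistic reported above, the following is a minimal sketch of a two-way, single-measure intraclass correlation. It assumes the consistency form, often written ICC(3,1); the response does not state whether consistency or absolute agreement was used, so this choice is an assumption. The function name `icc_3_1` and the example matrix are hypothetical and are not the study's data.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1): two-way model, fixed timepoints, single measure, consistency.

    `scores` is an (n_participants, k_timepoints) array, e.g. one N45
    amplitude per participant per block. Illustrative sketch only.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Partition the total sum of squares into participants (rows),
    # timepoints (columns), and residual error.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Consistency ICC for a single measurement.
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical amplitudes: each participant shifts by the same amount
# between two blocks, so rank order and spacing are fully preserved.
example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_3_1(example), 3))
```

      Because the consistency form ignores a uniform shift between timepoints, a group-level change in the N45 during pain would not by itself lower this coefficient; only changes in the relative ordering or spacing of participants would.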

      Lack of measures to mask auditory noise.

      In TMS-EEG research, various masking methods have been proposed to suppress the somatosensory and auditory artefacts resulting from TMS pulses, such as white noise played through headphones to mask the click sound (Ilmoniemi and Kičić, 2010), and a thin layer of foam placed between the TMS coil and EEG cap to minimize the scalp sensation (Massimini et al., 2005). However, recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as indicated by commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination. To separate the direct cortical response to TMS from sensory evoked activity, Experiment 2 included a sham TMS condition that mimicked the auditory/somatosensory aspects of active TMS to determine whether any alterations in the TEP peaks in response to pain were due to changes in sensory evoked activity associated with TMS, as opposed to changes in cortical excitability. Therefore, the lack of auditory masking does not impact the main conclusions of the paper.

      We have made this clearer:

      “… masking methods have been used to suppress these sensory inputs, (Ilmoniemi and Kičić, 2010; Massimini et al., 2005). However recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many leading authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination.”

      Ilmoniemi, R. J., & Kičić, D. (2010). Methodology for combined TMS and EEG. Brain topography, 22, 233-248.

      Massimini, M., Ferrarelli, F., Huber, R., Esser, S. K., Singh, H., & Tononi, G. (2005). Breakdown of cortical effective connectivity during sleep. Science, 309(5744), 2228-2232.

      Biabani, M., Fornito, A., Mutanen, T. P., Morrow, J., & Rogasch, N. C. (2019). Characterizing and minimizing the contribution of sensory inputs to TMS-evoked potentials. Brain stimulation, 12(6), 1537-1552.

      Conde, V., Tomasevic, L., Akopian, I., Stanek, K., Saturnino, G. B., Thielscher, A., ... & Siebner, H. R. (2019). The non-transcranial TMS-evoked potential is an inherent source of ambiguity in TMS-EEG studies. Neuroimage, 185, 300-312.

      Rocchi, L., Di Santo, A., Brown, K., Ibáñez, J., Casula, E., Rawji, V., ... & Rothwell, J. (2021). Disentangling EEG responses to TMS due to cortical and peripheral activations. Brain stimulation, 14(1), 4-18.

      3) A supra-stimulus heat stimulus not based on individual HPT, that oscillates during the experiment and that led to large variations in pain intensity across participants is unfortunate.

      The choice of whether to calibrate or fix stimulus intensity is a contentious question in experimental pain research. A recent discussion by Adamczyk et al. (2022) explores the pros and cons of each approach and recommends situations where one method may be preferred over the other. That paper suggests that the choice of methodology is related to the research question: when the main outcome of the research is objective (neurophysiological measures) and researchers are interested in the variability in pain ratings, the fixed approach is preferable. Given we explored the relationship between MEP/N45 modulation by pain and pain intensity, this question is better explored by using the same stimulus intensity for all participants, as opposed to calibrating the intensity to achieve a similar level of pain across participants.

      We have made this clearer:

      “Given we were interested in the individual relationship between pain and excitability changes, the fixed temperature of 46ºC ensured larger variability in pain ratings as opposed to calibrating the temperature of the thermode for each participant (Adamczyk et al., 2022).”

      Adamczyk, W. M., Szikszay, T. M., Nahman-Averbuch, H., Skalski, J., Nastaj, J., Gouverneur, P., & Luedtke, K. (2022). To calibrate or not to calibrate? A methodological dilemma in experimental pain research. The Journal of Pain, 23(11), 1823-1832.

      So is the lack of report on measures taken to correct for a fortuitous significance (multiple comparison correction) in such a huge number of serial paired tests.

      Note that we used a Bayesian approach for all analyses as opposed to the traditional frequentist approach. In contrast to the frequentist approach, the Bayesian approach does not require corrections for multiple comparisons (Gelman & Tuerlinckx, 2000), given that it provides a ratio representing the strength of evidence for the null vs. alternative hypotheses as opposed to accepting or rejecting the null hypothesis based on p-values. As such, throughout the paper, we frame our interpretations and conclusions based on the strength of evidence (e.g. anecdotal/weak, moderate, strong, very strong) as opposed to referring to the significance of the effects.

      Gelman A, Tuerlinckx F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational statistics, 15(3):373-90.
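
      As an illustration of how a Bayes factor can be read in terms of strength of evidence rather than significance, the sketch below maps a Bayes factor (alternative over null, BF10) to a verbal label. The cut-offs of 3, 10, 30 and 100 are a commonly used convention and are an assumption here; the response does not state which thresholds were applied. The function name `evidence_label` is hypothetical.

```python
def evidence_label(bf10):
    """Return (strength, favoured hypothesis) for a Bayes factor BF10.

    Thresholds (3, 10, 30, 100) follow a widely used convention and are
    assumed for illustration; they are not taken from the manuscript.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return ("none", "neither")

    # Values below 1 favour the null; invert so one scale serves both.
    favoured = "alternative" if bf10 > 1 else "null"
    strength_ratio = bf10 if bf10 > 1 else 1.0 / bf10

    if strength_ratio < 3:
        strength = "anecdotal"
    elif strength_ratio < 10:
        strength = "moderate"
    elif strength_ratio < 30:
        strength = "strong"
    elif strength_ratio < 100:
        strength = "very strong"
    else:
        strength = "extreme"
    return (strength, favoured)


# A Bayes factor close to 1 carries almost no information either way.
print(evidence_label(1.015))
```

      Under this convention, a value near 1 is labelled as anecdotal, which reflects inconclusive evidence rather than support for the absence of an effect.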

      Reviewer #3 (Public Review):

      The present study aims to investigate whether pain influences cortical excitability. To this end, heat pain stimuli are applied to healthy human participants. Simultaneously, TMS pulses are applied to M1 and TMS-evoked potentials (TEPs) and pain ratings are assessed after each TMS pulse. TEPs are used as measures of cortical excitability. The results show that TEP amplitudes at 45 msec (N45) after TMS pulses are higher during painful stimulation than during non-painful warm stimulation. Control experiments indicate that auditory, somatosensory, or proprioceptive effects cannot explain this effect. Considering that the N45 might reflect GABAergic activity, the results suggest that pain changes GABAergic activity. The authors conclude that TEP indices of GABAergic transmission might be useful as biomarkers of pain sensitivity.

      Pain-induced cortical excitability changes is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are mostly convincing, and the interpretation is adequate. The following clarifications and revisions might help to improve the manuscript further.

      1) Non-painful control condition. In this condition, stimuli are applied at warmth detection threshold. At this intensity, by definition, some stimuli are not perceived as different from the baseline. Thus, this condition might not be perfectly suited to control for the effects of painful vs. non-painful stimulation. This potential confound should be critically discussed.

      In Experiment 3, we also collected warmth ratings to confirm whether the pre-pain stimuli were perceived as different from baseline. This detail has been added to the methods:

      “In addition to the pain rating in between TMS pulses, we collected a second rating for warmth of the thermal stimulus (0 = neutral, 10 = very warm) to confirm that the participants felt some difference in sensation relative to baseline during the pre-pain block. This data is presented in the supplementary material”.

      We did not include these data in the initial submission but have now included them in the supplemental material. These data showed warmth ratings were close to 2/10 on average. This confirms that the non-painful control condition produced some level of non-painful sensation.

      2) MEP differences between conditions. The results do not show differences in MEP amplitudes between conditions (BF 1.015). The analysis nevertheless relates MEP differences between conditions to pain ratings. It would be more appropriate to state that in this study, pain did not affect MEP and to remove the correlation analysis and its interpretation from the manuscript.

      The interindividual relationship between changes in MEP amplitude and individual pain rating is statistically independent from the overall group level effect of pain on MEP amplitude. Therefore, conclusions for the individual and group level effects can be made independently.

      It is also important to note that in the pain literature, there is now increasing emphasis placed on investigating the individual level relationship between changes in cortical excitability and pain as opposed to the group level effect (Seminowicz et al., 2019; Summers et al., 2019). As such, it is important to make these results readily available for the scientific community.

      We have made this clearer:

      “As there is now increasing emphasis placed on investigating the individual level relationship between changes in cortical excitability and pain and not only the group level effect, (Chowdhury et al., 2022; Seminowicz et al., 2018; Seminowicz, Thapa, & Schabrun, 2019; Summers et al., 2019) we also investigated the correlations between pain ratings and changes in MEP (and TEP) amplitude”

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      Summers, S. J., Chipchase, L. S., Hirata, R., Graven-Nielsen, T., Cavaleri, R., & Schabrun, S. M. (2019). Motor adaptation varies between individuals in the transition to sustained pain. Pain, 160(9), 2115-2125.

      Seminowicz, D. A., Thapa, T., & Schabrun, S. M. (2019). Corticomotor depression is associated with higher pain severity in the transition to sustained pain: a longitudinal exploratory study of individual differences. The Journal of Pain, 20(12), 1498-1506.

      3) Confounds by pain ratings. The ISI between TMS pulses is 4 sec and includes verbal pain ratings. Considering this relatively short ISI, would it be possible that verbal pain ratings confound the TEP? Moreover, could the pain ratings confound TEP differences between conditions, e.g., by providing earlier ratings when the stimulus is painful? This should be carefully considered, and the authors might perform control analyses.

      It is unlikely that the verbal ratings contaminated the TEP response as the subsequent TMS pulse was not delivered until the verbal rating was complete and given that each participant was cued by the experimenter to provide the pain rating after each pulse (rather than the participant giving the rating at any time). As such, it would not be possible for participants to provide earlier ratings to more painful stimuli.

      We have made this clearer:

      "To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse.”

      4) Confounds by time effects. Non-painful and painful conditions were performed in a fixed order. Potential confounds by time effects should be carefully considered.

      Previous research suggests that pain alters neural excitability even after pain has subsided. In a recent meta-analysis (Chowdhury et al., 2022) we found effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved. As such, we avoided intermixing pain and warm blocks given subsequent warm blocks would not serve as a valid baseline, as each subsequent warm block would have residual effects from the previous pain blocks.

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      At the same time, given there was no conclusive evidence for a difference in N45 amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), it is unlikely that the effect of pain was an artefact of time, i.e., the explanation that successive thermal stimuli applied to the skin result in an increase in the N45, regardless of whether the stimuli are painful or not. We will make this point in our next revision.

      We have discussed this issue:

      “Lastly, future research should consider replicating our experiment using intermixed pain and no pain blocks, as opposed to fixed pre-pain and pain blocks, to control for order effects i.e., the explanation that successive thermal stimuli applied to the skin results an increase in the N45 peak, regardless of whether the stimuli are painful or not. However, we note that there was no conclusive evidence for a difference in N45 peak amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), suggesting it is unlikely that the observed effects were an artefact of time.”

      5) Data availability. The authors should state how they make the data openly available.

      We have uploaded the MEP, TEP and pain data to the Open Science Framework: https://osf.io/k3psu/

      Reviewer #1 (Recommendations For The Authors):

      I think the study is quite solid and I only have very minor recommendations for the authors:

      • Introduction, p. 3: "Functional magnetic resonance imaging has helped us understand where in the brain pain is processed". This is an overstatement. fMRI provides us with potential biomarkers (e.g. "the pain signature"), but the specificity of these responses for pain is debated and we still do not know where in the brain pain is processed.

      We have amended to:

      “functional magnetic resonance imaging has assisted in the localization of brain structures implicated in pain processing”

      • Introduction, p. 5: "neural baseline" should be "neutral baseline"?

      We thank the reviewer for identifying this – this has now been amended.

      Reviewer #2 (Recommendations For The Authors):

      INTRODUCTION

      The introduction mentions how important extra-motor areas can be explored by TMS-EEG, then the effects of DLPFC rTMS on TEPs ... but you do not explore the DLPFC... Perhaps the introduction should be reframed.

      The current work explores cortical excitability throughout the brain (as shown in our cluster-based permutation and source localization analyses), so our investigations are in line with the introduction's statement about the importance of studying non-motor areas.

      The reference to DLPFC rTMS was to highlight current existing research that has applied TMS-EEG to understand pain. It was not used as a methodological rationale to investigate the DLPFC in the present study. To make the research gap clearer, we state:

      “While these studies assist us in understanding whether TEPs might mediate rTMS-induced pain reductions, no study has investigated whether TEPs are altered in direct response to pain”

      Lines 63-65 the term "TMS" is used to refer to motor corticospinal excitability measures, in contrast to TMS-EEG measures of TEPs. Then the authors come back to TMS-EEG and then again back to MEPs. This is rather confusing: TMS means TMS... the concept of MEP/ motor corticospinal excitability measures is not intuitive when using the term "TMS". I suggest using motor corticospinal excitability measures when referring to MEP/MEP-based measures of cortical excitability...) and M1TMS-EEG-evoked potentials (usually abbreviated to TEPs) to refer to TMS-EEG responses as measured here.

      Throughout the manuscript, we now use the term TEPs when referring to TMS-EEG measures, and MEPs when referring to TMS-EMG measures. The use of TEPs vs. MEPs will make it easier for readers to follow which measures we are referring to.

      Line 83: "As such, the precise origin of the pain mechanism cannot be localized." Please rephrase, the sentence conveys the idea that it is indeed possible to localize the origin of a pain mechanism with a different approach, and we know this is not currently possible, irrespective of the methodological setup.

      We have replaced this with:

      “This makes it unclear as to whether pain processes occur at the cortical, spinal or peripheral level.”

      How can one predetermine the temperature that will be perceived as painful by someone else, and not base it on individual HPT? This is against principles of psychophysics. Please comment. Attesting all participants had HPT below 46 is important, but then being stimulated at 46C when our HPT is 45C is different from when our HPT is 39C. Please explain why the pain intensity was not standardised based on individual HPT.

      Please refer to our response to the public review related to this issue.

      Line 38: "if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline". I do not understand why it is not possible to have a pain-free baseline, followed by a pain/warm sequence.

      In our study, we had the choice of either intermixing blocks or to use a fixed sequence. Previous research suggests that pain alters neural excitability even after pain has subsided. In a recent meta-analysis (Chowdhury et al., 2022) we found effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved. As such, we avoided intermixing pain and warm blocks given subsequent warm blocks would not serve as a valid baseline, as each subsequent warm block would have residual effects from the previous pain blocks.

      We have updated the manuscript to be clearer about why we used a fixed sequence:

      “The pre-pain/pain/post-pain design has been commonly used in the TMS-MEP pain literature, as many studies have demonstrated strong changes in corticomotor excitability that persist beyond the painful period. Indeed, in a systematic review, we showed effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved (Chowdhury et al., 2022). As such, if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline”

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      Please explain, and provide evidence that stimulation of people with predetermined temperatures is able to create warm/pain/warm sensations, without entraining pain in the last warm stimulation.

      A previous study by Dubé and Mercier (2011) used sequences of warm (36°C), painful and neutral (32°C) stimuli and found that participants did not experience pain at any time when the temperature was at a warm temperature of 36°C. We have now cited this study:

      “Based on a previous study (Dubé & Mercier, 2011) which also used sequences of painful (50ºC) and warm (36°C) thermal stimuli, we did not anticipate that the stimulus in the pain block would entrain pain in the post-pain block”

      Dubé, J. A., & Mercier, C. (2011). Effect of pain and pain expectation on primary motor cortex excitability. Clinical neurophysiology, 122(11), 2318-2323.

      METHODS

      It is not clear if participants with chronic pain, present in 20% of the general population, were excluded. If they were, please provide "how" in methods.

      We excluded participants with a history or presence of acute/chronic pain. This has now been clarified:

      “Participants were excluded if they had a history of chronic pain condition or any current acute pain”

      Line 489: the definition of warm detection threshold is unusual, please provide a reference.

      We used an identical method to Furman et al., (2020). We have made the reference to this clearer: “Warmth, cold and pain thresholds were assessed in line with a previous study (Furman et al., 2020)”

      Furman, A. J., Prokhorenko, M., Keaser, M. L., Zhang, J., Chen, S., Mazaheri, A., & Seminowicz, D. A. (2020). Sensorimotor peak alpha frequency is a reliable biomarker of prolonged pain sensitivity. Cerebral Cortex, 30(12), 6069-6082.

      In Experiment 2, please explain how the lack of randomisation between "pre-pain" and "pain" may have influenced results.

      Given we tried to replicate Experiment 1’s methodology as closely as possible (to isolate the source of the effect from Experiment 1), we chose to repeat the same sequence of blocks as Experiment 1: pre-pain followed by pain.

      Given there was no conclusive evidence for a difference in N45 amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), it is unlikely that the effect of pain was an order effect, i.e., the explanation that successive thermal stimuli applied to the skin result in an increase in the N45, regardless of whether the stimuli are painful or not.

      We now discuss the issue of randomization:

      “Lastly, future research should consider replicating our experiment using intermixed pain and no pain blocks, as opposed to fixed pre-pain and pain blocks, to control for order effects i.e. the explanation that successive thermal stimuli applied to the skin results an increase in the N45 peak, regardless of whether the stimuli are painful or not. However, we note that there was no conclusive evidence for a difference in N45 peak amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), suggesting it is unlikely that the observed effects were an artefact of time”

      Also, in Methods in general, disclose how pain intensity was assessed, and how.

      Pain intensity was assessed using a verbal rating scale (0 = no pain, and 10 = most pain imaginable). We have provided more detail:

      “During each 40 second thermal stimulus, TMS pulses were manually delivered, with a verbal pain rating score (0 = no pain, and 10 = worst pain imaginable) obtained between pulses. To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse”

      Please explain how auditory masking was made during data collection.

      Auditory masking noise was not played through the headphones, given that Experiment 2 controlled for auditory evoked potentials. We have made this clearer:

      “Auditory masking was not used. Instead, auditory evoked potentials resulting from the TMS click sound were controlled for in Experiment 2”

      Please explain if online TEP monitoring was used during data collection

      Online TEP monitoring was not available with our EEG software. We have made this clearer in the manuscript:

      “Online TEP monitoring was not available with the EEG software”

      Line 499: what is subthreshold TMS here? You are measuring TEPs, and not MEPs initially, so you may have a threshold for MEPs and TEPs, which are not the same.

      The intensity was calibrated relative to the MEP response (rather than TEP response) - this has now been clarified:

      “… and the inclusion of a subthreshold TMS (90% of resting motor threshold) condition intermixed within both the pre-pain and pain blocks.”

      Please provide a reference and a figure to illustrate the electric stimulation used in the sham procedure in Study 2

      The apparatus for the electrical stimulation is shown in Figure 7A, and was based on previous papers using electrical stimulation over motor cortex to simulate the somatosensory aspect of real TMS (Chowdhury et al., 2022; Gordon et al., 2022; Rocchi et al., 2021). We have made this clearer:

      “Electrical stimulation was based on previous studies attempting to simulate the somatosensory component of active TMS (Chowdhury et al., 2022; Gordon et al., 2022; Rocchi et al., 2021)”

      Gordon, P. C., Jovellar, D. B., Song, Y., Zrenner, C., Belardinelli, P., Siebner, H. R., & Ziemann, U. (2021). Recording brain responses to TMS of primary motor cortex by EEG–utility of an optimized sham procedure. Neuroimage, 245, 118708.

      Chowdhury, N. S., Rogasch, N. C., Chiang, A. K., Millard, S. K., Skippen, P., Chang, W. J., ... & Schabrun, S. M. (2022). The influence of sensory potentials on transcranial magnetic stimulation–Electroencephalography recordings. Clinical Neurophysiology, 140, 98-109.

      Rocchi, L., Di Santo, A., Brown, K., Ibáñez, J., Casula, E., Rawji, V., ... & Rothwell, J. (2021). Disentangling EEG responses to TMS due to cortical and peripheral activations. Brain stimulation, 14(1), 4-18.

      It is not so common to use active electrodes for TMS-EEG. Please confirm the electrodes used and if they are c-ring TMS compatible and provide reference if otherwise (or actual papers recommending active ones)

      To be more specific about the electrode type we have indicated:

      “Signals were recorded from 63 TMS-compatible active electrodes (6mm height, 13mm width), embedded in an elastic cap (ActiCap, Brain Products, Germany), in line with the international 10-10 system”

      A paper directly comparing TEPs between active and passive electrodes found no difference between the two and concluded TEPs can be reliably obtained using active electrodes (Mancuso et al., 2021). There is also evidence that active electrodes have better signal quality than passive electrodes at higher impedance levels (Laszlo et al., 2014).

      This information has now been added to the paper:

      “Active electrodes result in similar TEPs (both magnitude and peaks) to more commonly used passive electrodes (Mancuso et al., 2021). There is also evidence that active electrodes have higher signal quality than passive electrodes at higher impedance levels (Laszlo, Ruiz-Blondet, Khalifian, Chu, & Jin, 2014).”

      There is a growing literature showing that monophonic pulses are not reliable for TEPs when compared to biphasic ones, please provide references. https://doi.org/10.1016/j.brs.2023.02.009

      The reference provided by the reviewer states that biphasic and monophasic pulses both have advantages and disadvantages, rather than stating “monophonic pulses are not reliable for TEPs”. While there is some evidence that the artefacts resulting from monophasic pulses are larger than biphasic pulses, the EEG signal still returns to baseline levels within 5ms of the TMS pulse (Rogasch et al., 2013). Moreover, one paper (Casula et al. 2018) found that the resultant TEPs evoked by monophasic pulses are larger than those resulting from biphasic pulses. The authors postulated that monophasic pulses are more effective at activating widespread cortical areas than biphasic pulses. Ultimately the reference provided by the reviewer concludes that “effect of pulse shape on TEPs has not been systematically investigated and more studies are needed”.

      Rogasch, N. C., Thomson, R. H., Daskalakis, Z. J., & Fitzgerald, P. B. (2013). Short-latency artifacts associated with concurrent TMS–EEG. Brain stimulation, 6(6), 868-876.

      Casula, E. P., Rocchi, L., Hannah, R., & Rothwell, J. C. (2018). Effects of pulse width, waveform and current direction in the cortex: A combined cTMS-EEG study. Brain stimulation, 11(5), 1063-1070.

      In most heads, a pulse in the PA direction is not obtained by a coil oriented 45° to the midline. The latter induces latero-medial pulses, which are good for obtaining MEPs.

      We followed previous studies measuring MEPs from the ECRB elbow muscle (Schabrun et al., 2016; de Martino et al., 2019) whereby the TMS coil handle was angled at 45 degrees relative to the midline in order to induce a posterior-anterior current. We are not aware of literature showing that the 45-degree orientation does not induce a posterior-anterior current in most heads.

      Schabrun, S. M., Christensen, S. W., Mrachacz-Kersting, N., & Graven-Nielsen, T. (2016). Motor cortex reorganization and impaired function in the transition to sustained muscle pain. Cerebral Cortex, 26(5), 1878-1890.

      De Martino, E., Seminowicz, D. A., Schabrun, S. M., Petrini, L., & Graven-Nielsen, T. (2019). High frequency repetitive transcranial magnetic stimulation to the left dorsolateral prefrontal cortex modulates sensorimotor cortex function in the transition to sustained muscle pain. Neuroimage, 186, 93-102.

      The definition of RMT is (very) unusual. RMT provides small 50microV MEPs in 50% of times. If you obtain MEPs at 50microV you are supra threshold!

      The TMS motor threshold assessment tool calculates threshold in the same manner as other threshold tools – it calculates the intensity that elicits an MEP of 50 microvolts, 50% of the time. We have made this clearer:

      “The RMT was determined using the TMS motor thresholding assessment tool, which estimates the TMS intensity required to induce an MEP of 50 microvolts with a 50% probability using maximum likelihood parametric estimation by sequential testing (Awiszus and Borckardt, 2011). This method has been shown to achieve the accuracy of methods such as the Rossini-Rothwell method (Rossini et al., 1994; Rothwell et al., 1999) but with fewer pulses (Qi et al., 2011; Silbert et al., 2013).”

      Please inform the inter TMS pulse interval used of TEPs and whether they were randomly generated.

      The pulses were delivered manually – the interval was not randomly generated – as stated:

      “As TMS was delivered manually, there was no set interpulse interval. However, the 40 second stimulus duration allowed for 11 pulses for each heat stimulus …. (~ 4 seconds in between …)”
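
      The pulse counts quoted in this response can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the 11 pulses are evenly spaced with the first at stimulus onset and the last at stimulus offset; the manuscript only states an approximate interval, since pulses were delivered manually. The function names are hypothetical.

```python
def mean_interpulse_interval(stimulus_s, n_pulses):
    """Mean gap in seconds between pulses spread across one stimulus.

    Assumes evenly spaced pulses spanning the full stimulus duration,
    which is an idealization of manual delivery.
    """
    if n_pulses < 2:
        raise ValueError("At least two pulses are needed to define an interval.")
    return stimulus_s / (n_pulses - 1)


def stimuli_needed(pulses_per_condition, pulses_per_stimulus):
    """Number of thermal stimuli required to reach a target pulse count."""
    # Integer ceiling division, so a partial stimulus counts as a whole one.
    return -(-pulses_per_condition // pulses_per_stimulus)


# 40 s stimulus with 11 pulses, and 66 pulses per condition in total.
print(mean_interpulse_interval(40, 11))
print(stimuli_needed(66, 11))
```

      Under these assumptions the mean interval is 4 seconds, in line with the approximate figure quoted, and 66 pulses per condition correspond to six 40 second stimuli.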

      Why have you stimulated suprathreshold on M1 when assessing TEP´s? The whole idea is that large TEPs can be obtained at lower intensities below real RMT and that prevents re-entering loops of somatosensory and joint movement inputs that insert "noise" to the TEPs.

      The suprathreshold intensity was used to concurrently measure MEPs during pre-pain, pain and post-pain blocks.

      We have made this clearer:

      “The test stimulus intensity was set at 110% RMT to concurrently measure MEPs and TEPs during pre-pain, pain and post-pain blocks.”

      The influence of re-afferent muscle activity was controlled for in Experiment 3.

      Did you assess pain intensity after each of the TEP pulses? Please discuss how such a cognitive task may have influenced results

      Pain intensity was assessed after each TMS pulse, as stated:

      “TMS pulses were manually delivered, with a verbal pain rating score (0 = no pain, and 10 = most pain imaginable) obtained between pulses”

      Reviewer 3 also brought up a concern of whether the verbal rating task might have influenced the TEPs. However, it is unlikely that the task contaminated the TEP response as the subsequent TMS pulse was not delivered until the verbal rating was complete and given that each participant was cued by the experimenter to provide the pain rating after each pulse (rather than the participant giving the rating at any time). We have made this clearer where we state:

      “To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse”

      The QST approach is unusual. Please confirm the sequence of CDT, WDT and HPT were not randomised and that no interval beyond 6sec were used. Proper references are welcome.

      In line with a previous study (Furman et al., 2020), the sequence of the CPT, WDT and HPT was not randomized, and the interval was not more than 6 seconds.

      We have made this clearer:

      “A total of three trials was conducted for each test to obtain an average, with an interstimulus interval of six seconds. The sequence of cold, warmth and pain threshold was the same for all participants (Furman et al. 2020)”

      Performing 60 pulses for TEPs is unusual, and against the minimum number in recommendations

      Please explain and comment. https://doi.org/10.1016/j.brs.2023.02.009

      Please refer to our previous response to this concern in the public reviews.

      Line 578: when you refer to "heat" the reader may confound warm/heat with heat meaning suprathreshold. Please revise the wording.

      We have now replaced the word heat stimulus with thermal stimulus.

      Why were Bayesian statistics used instead of frequentist ones?

      We have made this clearer:

      “Given we were interested in determining the evidence for pain altering TEP peaks in certain conditions (e.g., active TMS) and pain not altering TEP peaks in other conditions (sham TMS), we used a Bayesian approach as opposed to a frequentist approach, which considers the strength of the evidence for the alternative vs. null hypothesis”

      RESULTS

      There is a huge response with high power after 100ms- Please discuss if you believe auditory potentials may have influenced it.

      It is indeed possible that auditory potentials were present at 100ms. We now state:

      “Indeed, the signal at ~100ms post-TMS from Experiment 1 may reflect an auditory N100 response”

      The presence of auditory contamination does not impact the main conclusions of the paper given this was controlled for in Experiment 2.

      Please discuss how pain ranging from 3-10 may have influenced results in the "PAIN" situation,

      It is anticipated that the fixed thermal stimulus intensity approach would lead to large variations in pain ratings (Adamczyk et al., 2022). This is a recommended approach when the aim of the research is to determine relationships between neurophysiological measures and individual differences in pain sensitivity (Adamczyk et al., 2022). Indeed, we were interested in whether alterations in neurophysiological measures were associated with pain intensity, and we found that higher pain ratings were associated with smaller reductions in MEP amplitude and larger increases in N45 amplitude.

      Adamczyk, W. M., Szikszay, T. M., Nahman-Averbuch, H., Skalski, J., Nastaj, J., Gouverneur, P., & Luedtke, K. (2022). To calibrate or not to calibrate? A methodological dilemma in experimental pain research. The Journal of Pain, 23(11), 1823-1832.

      Please indicate if any participants offered pain after warm stimulation ( possible given secondary hyperalgesia after so many plateaux of heat stimulation).

      As stated in the results “All participants reported 0/10 pain during the pre-pain and post-pain blocks”.

      Please discuss the potential effects of having around 10% "bad channels" on average per experiment per participant, and its impact on source localisation and TEP measurement. Same for >5 epochs excluded per participant.

      The number of bad channels has been incorrectly stated by the reviewer as being 10% on average per experiment per participant, whereas the correct number of reported bad channels was 3%, 4.7% and 9.8% for Experiments 1, 2 and 3 respectively (see supplementary material). These numbers are below the accepted number of bad channels to interpolate (10%) in EEG pipelines (e.g., Debnath et al., 2020; Kayhan et al., 2022), so it is unlikely that our channel exclusions significantly influenced the quality of our source localization and TEP data.

      Debnath, R., Buzzell, G. A., Morales, S., Bowers, M. E., Leach, S. C., & Fox, N. A. (2020). The Maryland analysis of developmental EEG (MADE) pipeline. Psychophysiology, 57(6), e13580.

      Kayhan, E., Matthes, D., Haresign, I. M., Bánki, A., Michel, C., Langeloh, M., ... & Hoehl, S. (2022). DEEP: A dual EEG pipeline for developmental hyperscanning studies. Developmental cognitive neuroscience, 54, 101104.

      The number of excluded epochs is unlikely to have influenced the results given there was evidence for no difference in the number of rejected epochs between conditions (E1 BF10 = 0.145, E2 BF10 = 0.27, E3 BF10 = 0.169 – these BFs have now been reported in the supplementary material), and given the reliability of the N45 was high (see response to previous comment on the number of trials per condition).
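
      As a reading aid for the Bayes factors above, the short sketch below converts a BF10 into the direction and approximate strength of evidence. The cut-offs of 3 and 10 and the verbal labels follow a commonly used convention and are an assumption of this sketch; they are not taken from the manuscript, and the function name is hypothetical.

```python
def describe_bf10(bf10):
    """Summarise a Bayes factor BF10 (alternative vs. null hypothesis).

    Returns (favoured_hypothesis, strength, label), where strength is BF10
    when the alternative is favoured and 1/BF10 when the null is favoured.
    The 3 and 10 cut-offs are conventional labels, assumed for this sketch.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 >= 1:
        favoured, strength = "alternative", bf10
    else:
        favoured, strength = "null", 1.0 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    else:
        label = "strong"
    return favoured, round(strength, 2), label

# Bayes factors reported above for the number of rejected epochs.
for experiment, bf in [("E1", 0.145), ("E2", 0.27), ("E3", 0.169)]:
    print(experiment, describe_bf10(bf))
```

      Under these conventional cut-offs, all three reported values (1/BF10 of roughly 6.9, 3.7 and 5.9) fall in the band usually described as moderate evidence for the null hypothesis of no difference between conditions.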

      HPT of 42.9 ± 2.5 °C means many participants had HPT close to 46 °C. Please discuss

      While some participants did indeed have pain thresholds close to 46 degrees, they nonetheless reported pain during the test blocks. While such participants may have reported less pain compared to others, we aimed for larger variations in pain ratings, given one of the research questions was to determine why pain intensity differs between individuals (given the same noxious stimulus). Indeed, we showed that this variation was meaningful (pain intensity was related to alterations in N45 and MEP amplitude).

      Please explain the sentence : line 139 "As such, if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline." I cannot see why.

      Please refer to our previous point on why the fixed sequence was included.

      And on the top of that heat was not individualised according to HPT.

      Please refer to our previous point on why we used a fixed stimulus approach.

      Sequences of warm/heat were not randomised. Please refer to our previous point on the why the sequence of blocks was not randomized.

      Line 197: "However, as this is the first study investigating the effects of experimental pain on TEPs amplitude, there were no a priori regions or timepoints of interest to compare between conditions". This is not clear. It means you have not measured the activity (size of the N45) under the electrode closest to the TMS coil? The TEP is supposed to be higher under the stimulated target/respective corresponding electrode…

      We are not aware of any current recommendations that state that the region of interest should be based on the site of stimulation. The advantage of TMS-EEG is that it allows characterisation of cortical excitability changes throughout the brain, not just the site of stimulation. We based our region of interest on a cluster-based permutation analysis, as recommended by Frömer, Maier, & Abdel Rahman, (2018)

      Frömer, R., Maier, M., & Abdel Rahman, R. (2018). Group-level EEG-processing pipeline for flexible single trial-based analyses including linear mixed models. Frontiers in neuroscience, 12, 48.

      Please explain where N45 values came from.

      The N45 was calculated using the TESA peak function (Rogasch et al., 2017), which identifies a data point that is larger/smaller than +/- 5 data points within a specified time window (e.g., 40-70 ms post-TMS as in the present study). Where multiple peaks are found, the amplitude of the largest peak is returned. Where no peak is found, the amplitude at the specified latency is returned.

      Rogasch, N. C., Sullivan, C., Thomson, R. H., Rose, N. S., Bailey, N. W., Fitzgerald, P. B., ... & Hernandez-Pavon, J. C. (2017). Analysing concurrent transcranial magnetic stimulation and electroencephalographic data: A review and introduction to the open-source TESA software. Neuroimage, 147, 934-951.
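
      The peak-selection rule described above can be illustrated with the short sketch below. It is written in Python for illustration and is not the TESA implementation: the function name, argument names, the 45 ms fallback latency and the reading of "largest" as most negative for a negative peak are assumptions of this sketch.

```python
import math

def find_peak(times_ms, amplitudes, window_ms=(40.0, 70.0),
              target_ms=45.0, polarity="negative", n_neighbours=5):
    """Return (latency_ms, amplitude, peak_found) for one waveform.

    A sample counts as a peak if it is more extreme than each of the
    n_neighbours samples on either side. If several peaks lie in the
    window, the most extreme one is returned. If none is found, the
    amplitude at the sample closest to target_ms is returned instead.
    """
    sign = -1.0 if polarity == "negative" else 1.0
    candidates = []
    for i, t in enumerate(times_ms):
        if not (window_ms[0] <= t <= window_ms[1]):
            continue
        lo, hi = i - n_neighbours, i + n_neighbours
        if lo < 0 or hi >= len(amplitudes):
            continue  # not enough samples on one side to test this point
        centre = sign * amplitudes[i]
        if all(centre > sign * amplitudes[j]
               for j in range(lo, hi + 1) if j != i):
            candidates.append(i)
    if candidates:
        best = max(candidates, key=lambda i: sign * amplitudes[i])
        return times_ms[best], amplitudes[best], True
    nearest = min(range(len(times_ms)),
                  key=lambda i: abs(times_ms[i] - target_ms))
    return times_ms[nearest], amplitudes[nearest], False

# Synthetic waveform sampled every 1 ms with a negative deflection at 45 ms.
times = [float(t) for t in range(101)]
waveform = [-math.exp(-((t - 45.0) / 5.0) ** 2) for t in times]
latency, amplitude, found = find_peak(times, waveform)
```

      With a waveform that has no local extremum in the window (for example a monotonic ramp), the function falls back to the amplitude at the target latency and reports that no peak was found.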

      If only the cluster assessment was made please provide the comparison between P45 from the target TMS channel location in pre pain vs pain.

      We assume the reviewer is referring to the N45 rather than P45, and that by “target” TMS channel they are referring to the stimulated region.

      We first clarify that there is no “target” channel given the motor hotspot differs between individuals and so the channel that is closest to the site of stimulation will always differ.

      Secondly, as stated above, we are not aware of any current recommendations in TMS-EEG research that state that the region of interest for TEP analysis should be based on the site of stimulation. The advantage of TMS-EEG is that it allows characterisation of cortical excitability throughout the brain, not just the site of stimulation. If we based our ROI on the target channel only, we would lose valuable information about excitability changes occurring in other brain regions.

      Lastly, the N45 was localized at frontocentral electrodes, which is also where the cluster differences emerged. As such, we do not believe it would be informative to compare N45 peak amplitude at the region of stimulation.

      Also explain how correction for multiple comparisons was made

      Please refer to our response to the public review related to this issue.

      And report data from pain vs post-pain.

      The pain vs. post-pain comparisons are now reported in the Supplementary material.

      There is a strong possibility the response at N85 is an auditory /muscle signal. Please provide the location of this response.

      We have opted not to include the topography at 85ms in the main paper as it would introduce too much clutter into the figures (which are already very dense), and because the topography was very similar to the topography at 100ms. As an example, for the reviewer, in Author response image 1 we have shown the topography for the pre-pain condition of Experiment 1.

      Author response image 1.

      Experiment 2: I have a strong impression both active TEPs and sham TEPs were contaminated by auditory (and muscle) noise. Please explain.

      While it is possible that auditory noise may have influenced TEPs in the active and sham groups, it does not impact the main conclusions of the paper, given that the purpose of the sham condition was to control for auditory and somatosensory stimulation resulting from TMS.

      While muscle activity may also have influenced the TEPs in active and sham conditions, we used fastICA in all conditions to suppress muscle activity. The fastICA algorithm (Rogasch et al., 2017) runs an independent component analysis on the data, and classifies components as neural, TMS-evoked muscle, eye movements and electrode noise, based on a set of heuristic thresholding rules (e.g., amplitude, frequency and topography of the components). Components classified as TMS-evoked muscle/other muscle artefacts are then removed. In the supplementary material, we further report that the number of components removed did not differ between conditions, suggesting the impact of muscle artefacts is not larger in some conditions vs. others.

      Rogasch, N. C., Sullivan, C., Thomson, R. H., Rose, N. S., Bailey, N. W., Fitzgerald, P. B., ... & Hernandez-Pavon, J. C. (2017). Analysing concurrent transcranial magnetic stimulation and electroencephalographic data: A review and introduction to the open-source TESA software. Neuroimage, 147, 934-951.

      Experiment 3: One interpretation can be that both supra and sub-threshold TMS were leading to somatosensory re-afferent responses, based on the way RMT was calculated, which hyper estimate the RMT and delivers in reality 2 types of supra-threshold stimulations. Please discuss

      Please refer to our response to the public review related to this issue.

      Please provide correlation between N45 size and MEPs amplitudes.

      This has now been included:

      “There was no conclusive evidence of any relationship between alterations in MEP amplitude during pain, and alterations in N100, N45 and P60 amplitude during pain (see supplementary material).”

      The supporting statistics for these analyses have been included in the supplementary material.

      DISCUSSION

      Line 303: " The present study determined whether acute experimental pain induces alterations in cortical inhibitory and/or facilitatory activity observed in TMS-evoked potentials".

      Well, no. The study assessed the N45, and was based on it. It did not really explore other metrics in a systematic fashion. P60 and N100 changes were not replicated in experiments 2 and 3..

      We assume the reviewer is stating that we did not assess other TEP peaks (such as the N15, P30 and P180). However, we did indeed assess these peaks in a systematic fashion. First, we identified the ROI by using a cluster-based analysis. This is a recommended approach when the ROI is unclear (Frömer, Maier, & Abdel Rahman, 2018). We then analysed the TEP representing the mean voltage across the electrodes within the cluster, and then identified any differences in all peaks between conditions (not just the N45). This has been made clearer in the manuscript.

      This has now been included:

      “For all experiments, the mean TEP waveform of any identified clusters from Experiment 1 were plotted, and peaks (e.g., N15, P30, N45, P60, N100) were identified using the TESA peak function (Rogasch et al., 2017)”

      Frömer, R., Maier, M., & Abdel Rahman, R. (2018). Group-level EEG-processing pipeline for flexible single trial-based analyses including linear mixed models. Frontiers in neuroscience, 12, 48.

      And the N45 is not related to facilitatory or inhibitory activity, it is a measure of an evoked response indicating excitability

      Evidence suggests the N45 is mediated by GABAAergic neurotransmission (inhibitory activity), as drugs which increase GABAA receptor activity increase the amplitude of the N45 (Premoli et al., 2014) and drugs which decrease GABAA receptor activity decrease the amplitude of the N45 (Darmani et al., 2016). As such, we and various other empirical papers (e.g., Belardinelli et al., 2021; Noda et al., 2021; Opie et al., 2019) and review papers (Farzan & Bortoletto, 2022; Tremblay et al., 2019) have interpreted changes in the N45 peak as reflecting changes in cortical inhibitory/GABAA mediated activity.

      Premoli, I., Castellanos, N., Rivolta, D., Belardinelli, P., Bajo, R., Zipser, C., ... & Ziemann, U. (2014). TMS-EEG signatures of GABAergic neurotransmission in the human cortex. Journal of Neuroscience, 34(16), 5603-5612.

      Belardinelli, P., König, F., Liang, C., Premoli, I., Desideri, D., Müller-Dahlhaus, F., ... & Ziemann, U. (2021). TMS-EEG signatures of glutamatergic neurotransmission in human cortex. Scientific reports, 11(1), 8159.

      Darmani, G., Zipser, C. M., Böhmer, G. M., Deschet, K., Müller-Dahlhaus, F., Belardinelli, P., ... & Ziemann, U. (2016). Effects of the selective α5-GABAAR antagonist S44819 on excitability in the human brain: a TMS–EMG and TMS–EEG phase I study. Journal of Neuroscience, 36(49), 12312-12320.

      Noda, Y., Barr, M. S., Zomorrodi, R., Cash, R. F., Lioumis, P., Chen, R., ... & Blumberger, D. M. (2021). Single-pulse transcranial magnetic stimulation-evoked potential amplitudes and latencies in the motor and dorsolateral prefrontal cortex among young, older healthy participants, and schizophrenia patients. Journal of Personalized Medicine, 11(1), 54.

      Farzan, F., & Bortoletto, M. (2022). Identification and verification of a 'true' TMS evoked potential in TMS-EEG. Journal of neuroscience methods, 378, 109651.

      Opie, G. M., Foo, N., Killington, M., Ridding, M. C., & Semmler, J. G. (2019). Transcranial magnetic stimulation-electroencephalography measures of cortical neuroplasticity are altered after mild traumatic brain injury. Journal of Neurotrauma, 36(19), 2774-2784.

      Tremblay, S., Rogasch, N. C., Premoli, I., Blumberger, D. M., Casarotto, S., Chen, R., ... & Daskalakis, Z. J. (2019). Clinical utility and prospective of TMS–EEG. Clinical Neurophysiology, 130(5), 802-844.

      Line 321: why have you not measured SEPs in experiment 3?

      It is not possible to directly measure the somatosensory evoked potentials resulting from a TMS pulse, given that the TMS pulse produces a range of signals including cortical activity, muscle/eye blink responses, auditory responses, somatosensory responses and other artefacts. While some researchers attempt to isolate the SEP from TMS using pre-processing methods such as ICA, others use control conditions such as sensory sham conditions (to control for the “tapping” artefact) or subthreshold intensity conditions (to control for reafferent muscle activity), as we have done in Experiment 2 and 3 of our study.

      We have now stated this in the manuscript:

      “As it is extremely challenging to isolate and filter these auditory and somatosensory evoked potentials using pre-processing pipelines, masking methods have been used to suppress these sensory inputs, (Ilmoniemi and Kičić, 2010; Massimini et al., 2005). However recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many leading authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination”

      Line 365: SICI is dependent on GABAa activity. But the way the text is written it conveys the idea that TMS pulses "activate" GABA receptors, which is weird...Please rephrase.

      This has now been reworded.

      “SICI refers to the reduction in MEP amplitude to a TMS pulse that is preceded 1-5ms by a subthreshold pulse, with this reduction believed to be mediated by GABAA neurotransmission (Chowdhury et al., 2022)”

      Reviewer #3 (Recommendations For The Authors):

      -Key references Ye et al., 2022 and Che et al., 2019 need to be included in the reference list.

      These references have now been included in the reference list.

      -Heat pain stimuli and TMS stimuli are applied simultaneously. Sometimes the term "stimulus" is used without specifying whether it refers to TMS pulses or heat pain stimuli. Clarifying this whenever the word "stimulus" is used would enhance clarity for the reader.

      We have now clarified the use of the word “stimulus” throughout the paper.

      -Panels A-D in Figure 6 should be correctly labeled in the text and the figure legend.

      Figure 6 Panel labels have now been amended.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Watanuki et al used metabolomic tracing strategies of U-13C6-labeled glucose and 13C-MFA to quantitatively identify the metabolic programs of HSCs during steady-state, cell-cycling, and OXPHOS inhibition. They found that 5-FU administration in mice increased anaerobic glycolytic flux and decreased ATP concentration in HSCs, suggesting that HSC differentiation and cell cycle progression are closely related to intracellular metabolism and can be monitored by measuring ATP concentration. Using the GO-ATeam2 system to analyze ATP levels in single hematopoietic cells, they found that PFKFB3 can accelerate glycolytic ATP production during HSC cell cycling by activating the rate-limiting enzyme PFK of glycolysis. Additionally, by using Pfkfb3 knockout or overexpressing strategies and conducting experiments with cytokine stimulation or transplantation stress, they found that PFKFB3 governs cell cycle progression and promotes the production of differentiated cells from HSCs in proliferative environments by activating glycolysis. Overall, in their study, Watanuki et al combined metabolomic tracing to quantitatively identify metabolic programs of HSCs and found that PFKFB3 confers glycolytic dependence onto HSCs to help coordinate their response to stress. Even so, several important questions need to be addressed as below:

      We sincerely appreciate the constructive feedback from the reviewer. Additional experiments and textual improvements have been made to the manuscript based on your valuable suggestions. In particular, the major revisions are as follows: First, we investigated the extent to which other metabolites, not limited to the glycolytic system, affect metabolism in HSCs after 5-FU treatment. Second, the extent to which PFKFB3 contributes to the expansion of the HSPC pool in the bone marrow was adjusted to make the description more accurate based on the data. Finally, we overexpressed PFKFB3 in HSCs derived from GO-ATeam2 mice and confirmed that PRMT1 inhibition did not reduce the ATP concentration. We believe that the reviewer's valuable comments have further deepened our knowledge of the significance of glycolytic activation by PFKFB3 that we have demonstrated. Our response to the "Recommendations for Authors" is listed first, followed by our responses to all "Public Review" comments as follows:

      (Recommendations For The Authors):

      1. The methods used in key experiments should be described in more detail. For example, in the section on ‘Conversion of GO-ATeam2 fluorescence to ATP concentration’, the knock-in strategy for GO-ATeam2 should be described, as well as U-13C6 -glucose tracer assays.

      As per your recommendation, we have described the key experimental method in more detail in the revised manuscript: the GO-ATeam2 knock-in method was reported by Yamamoto et al. 1. Briefly, they used a CAG promoter-based knock-in strategy targeting the Rosa26 locus to generate GO-ATeam2 knock-in mice. A description of the method has been added to Methods and the reference has been added to the citation.

      For the U-13C6-glucose tracer analysis, the following points were added to describe the details of the analysis: First, a note was added that the number of cells used for the in vitro tracer analysis was the number of cells used for each sample. Second, we added the solution from which the cells were collected by sorting. Third, we added that the incubation was performed under 1% O2 and 5% CO2.

      1. Confusing image label of Supplemental Figure 1H should be corrected in line 253.

      We have corrected the incorrect figure caption on line 217 in the revised manuscript to "Supplemental Figure 1N" as you suggested.

      1. The percentage of the indicated cell population should also be shown in Figure S1B.

      As you indicated, we have included the percentages for each population in Supplemental Figure 1B.

      Author response image 1.

      1. Please pay attention to the small size of the marks in the graph, such as in Figure S1F and so on.

      As you indicated, we have corrected the very small text contained in Figure S1F. Similar corrections have been made to Figures S1B and S5A.

      1. Please pay attention to the label of line in Figure S6A-D.

      Thank you very much for the advice. We have added line labels to the graph in the original Figures S6A–D.

      (Specific comments)

      1. Based on previous reports, the authors expanded the LSK gate to include as many HSCs as possible (Supplemental Figure 1B). However, while they showed the gating strategy on Day 6 after 5-FU treatment, results from other time-points should also be displayed to ensure the strict selection of time-points.

      Thank you for pointing this out. First, we did not enlarge the Sca-1 gating in this study. We apologize for any confusion caused by the incomplete description. The gating of c-Kit is based on that shown by Umemoto et al (Figure EV1A) 2, who used 250 mg/kg 5-FU, so their c-Kit reduction is more pronounced than ours.

      We followed this study and compared c-Kit expression in Lin-Sca-1+CD150+CD48-EPCR+ gates to BMMNCs on day 6 after 5-FU administration (150 mg/kg). The results are shown below.

      Author response image 2.

      Since the MFI of c-Kit was downregulated, we used gating that extended the c-Kit gate to lower-expression regions on day 6 after 5-FU administration (revised Figure S1C). At other time points, LSK gating was the same as in the PBS-treated group, as noted in the Methods.

      1. In Figure 1, the authors examined the metabolite changes on Day 6 after 5-FU treatment. However, it is important to consider whether there are any dynamic adjustments to metabolism during the early and late stages of 5-FU treatment in HSCs compared to PBS treatment, in order to coordinate cell homeostasis despite no significant changes in cell cycle progression at other time-points.

      Thank you for pointing this out. Below are the results of the GO-ATeam2 analysis during the very early phase (day 3) and late phase (day 15) after 5-FU administration (revised Figures S7A–H).

      Author response image 3.

      In the very early phase, such as day 3 after 5-FU administration, cell cycle progression had not started (Figure S1C) and was not preceded by metabolic changes. Meanwhile, in the late phase, such as day 15 after 5-FU administration, the cell cycle and metabolism returned to a steady state. In summary, the timing of the metabolic changes coincided with that of cell cycle progression. This point is essential for discussing the cell cycle-dependent metabolic system of HSCs and has been newly included in the Results (page 11, lines 321-323).

      1. As is well known, ATP can be produced through various pathways, including glycolysis, the TCA cycle, the PPP, NAS, lipid metabolism, amino acid metabolism and so on. Therefore, it is important to investigate whether treatment with 5-FU or oligomycin affects these other metabolic pathways in HSCs.

      As the reviewer pointed out, ATP production by systems other than the glycolytic system of HSCs is also essential. In this revised manuscript, we examined the effects of the FAO inhibitor (Etomoxir, 100 µM) and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON, 2mM) alone or in combination on the ATP concentration of HSCs after PBS or 5-FU treatment. As shown below, there was no apparent decrease in ATP concentration (revised Figures S7J–M).

      Author response image 4.

      Fatty acid β-oxidation activity was also measured in 5-FU-treated HSCs using the fluorescent probe FAOBlue and was unchanged compared to PBS-treated HSCs (revised Figure S7N).

      Author response image 5.

      Notably, the addition of 100 µM etomoxir plus glucose and Pfkfb3 inhibitors resulted in a rapid decrease in ATP concentration in HSCs (revised Figures S7O–P). This indicates that etomoxir partially mimics the effect of oligomycin, suggesting that at a steady state, OXPHOS is driven by FAO, but can be compensated by the acceleration of the glycolytic system by Pfkfb3. Meanwhile, the exposure of HSCs to Pfkfb3 inhibitors in addition to 2 mM DON, which is an extremely high dose considering that the Ki value of DON for glutaminase is 6 µM, did not reduce ATP (revised Figures S7O–P). This suggests that ATP production from glutaminolysis is limited in HSCs at a steady state.

      Author response image 6.

      These points suggest that OXPHOS is driven by fatty acids at a steady state, but unlike the glycolytic system, FAO is not further activated by HSCs after 5-FU treatment. The results of these analyses and related descriptions are included in the revised manuscript (page 11, lines 332-344).

      1. In part 2, they showed that oligomycin treatment of HSCs exhibited activation of the glycolytic system, but what about the changes in ATP concentration under oligomycin treatment? Are other metabolic systems affected by oligomycin treatment?

      Thank you for your thoughtful comments. The relevant results we have obtained so far with the GO-ATeam2 system are as follows: First, OXPHOS inhibition in the absence of glucose significantly decreases the ATP concentration of HSCs (Figure 4C). Meanwhile, OXPHOS inhibition in the presence of glucose maintains the ATP concentration of HSCs (Figure 5B). Since it is difficult to imagine a completely glucose-free environment in vivo, it is thought that ATP concentration is maintained by the acceleration of the glycolytic system even under hypoxic or other conditions that inhibit OXPHOS.

      Meanwhile, glucose tracer analysis shows that OXPHOS inhibition suppresses nucleic acid synthesis (NAS) except for the activation of the glycolytic system (Figures 2C–F). This is because phosphate groups derived from ATP are transferred to nucleotide mono-/di-phosphate in NAS, but OXPHOS, the main source of ATP production, is impaired, along with the enzyme conjugated with OXPHOS in the process of NAS (dihydroorotate dehydrogenase, DHODH). We have added a new paragraph in the Discussion section (page 17, lines 511-515) to provide more insight to the reader by summarizing and discussing these points.

      1. In Figure 5M, it would be helpful to include a control group that was not treated with 2-DG. Additionally, if Figure 5L is used as the control, it is unclear why the level of ATP does not show significant downregulation after 2-DG treatment. Similarly, in Figure 5O, a control group with no glucose addition should be included.

      Thank you for your advice. The experiments corresponding to the control groups in Figures 5M and O were in Figures 5L and N, respectively, but we have combined them into one graph (revised Figures 5L–M). The results more clearly show that PFKFB3 overexpression enhances sensitivity to 2-DG, but also enhances glycolytic activation upon oligomycin administration.

      Author response image 7.

      1. In this study, their findings suggest that PFKFB3 is required for glycolysis of HSCs under stress, including transplantation. In Figure 7B, the results showed that donor-derived chimerism in PB cells decreased relative to that in the WT control group during the early phase (1 month post-transplant) but recovered thereafter. Although the transplantation cell number is equal in two groups of donor cells, it is unclear why the donor-derived cell count decreased in the 2-week post-transplantation period and recovered thereafter in the Pfkfb3 KO group. Therefore, they should provide an explanation for this. Additionally, they only detected the percentage of donor-derived cells in PB but not from BM, which makes it difficult to support the argument for increasing the HSPC pool.

      As pointed out by the reviewer, it is interesting to note that the decrease in peripheral blood chimerism in the PFKFB3 knockout is limited to immediately after transplantation and then catches up with the control group (Figure 7B). We attribute this to the fact that HSPC proliferation is delayed immediately after transplantation in PFKFB3 deficiency, but after a certain time, PB cells produced by the delayed proliferating HSPCs are supplied. In support of this, the PFKFB3 knockout HSPCs did not exhibit increased cell death after transplantation (Figure 7K), while a delayed cell cycle was observed (Figures 7G–J). A description of this point has been added to the Discussion (page 19, lines 573-579).

      In addition, the knockout efficiency in bone marrow cells could not be verified because the number of cells required for KO efficiency analysis was not available. Therefore, we have added a statement on this point and have toned down our overall claim regarding the extent to which PFKFB3 is involved in the expansion of the HSPC pool (page 15, lines 474-476).

      1. In Figure 7E, they collected the BM reconstructed with Pfkfb3- or Rosa-KO HSPCs two months after transplantation, and then tested their resistance to 5-FU. However, the short duration of the reconstruction period makes it difficult to draw conclusions about the effects on steady-state blood cell production.

      We agree that we cannot conclude from this experiment alone that PFKFB3 is completely unnecessary in steady state because, as you pointed out, the observation period of the experiment in Figure 7E is not long. We have toned down the claim by stating that PFKFB3 is only less necessary in steady-state HSCs compared to proliferative HSCs (page 15, lines 460-461).

      1. PFK is allosterically activated by PFKFB, and other members of the PFKFB family could also participate in the glycolytic program. Therefore, they should investigate their function in contributing to glycolytic plasticity in HSCs during proliferation. Additionally, they should also analyze the protein expression and modification levels of other members. Although PFKFB3 is the most favorable for PFK activation, the role of other members should also be explored in HSC cell cycling to provide sufficient reasoning for choosing PFKFB3.

      To further justify why we chose PFKFB3 among the PFKFB family members, we reviewed our data and the publicly available Gene Expression Commons (GEXC) 3. PFKFB3 is the most highly expressed member of the PFKFB family in HSCs (revised Figure 4F), and its expression increases with proliferation (Author response image 9). In addition, we have cited the literature 4 indicating that AZ PFKFB3 26, the inhibitor used in this paper, is specific to Pfkfb3, and have added a note on this specificity (page 11, lines 327-329). Through these revisions, we sought to strengthen the rationale for Pfkfb3 as the primary target of the analysis.

      Author response image 8.

      Author response image 9.

      1. In this study, the authors identified PRMT1 as the upstream regulator of PFKFB3 that is involved in the glycolysis activation of HSCs. However, PRMT1 is also known to participate in various transcriptional activations. Thus, it is important to determine whether PRMT1 affects glycolysis through transcriptional regulation or through its direct regulation of PFKFB3. Additionally, the authors should investigate whether PRMT1i inhibits ATP production in normal HSCs. Moreover, could Figures 6I and 6J be combined for analysis? Finally, the authors could conduct additional rescue experiments to demonstrate that the effect of PRMT1 inhibitors on ATP production can be rescued by overexpression of PFKFB3.

      PRMT1 inhibition reduced m-PFKFB3 levels in HSCs. In contrast, in proliferating HSCs after 5-FU treatment, Pfkfb3 transcript levels were reduced or unchanged (Figures 6B, G), as was the expression of genes such as Hoxa7/9/10, Itga2b, and Nqo1, which are representative transcriptional targets of PRMT1 (revised Figure S9).

      Author response image 10.

      These results suggest that PRMT1 promotes PFKFB3 methylation, which increases independently of transcription in HSCs after 5-FU treatment.

      A summary analysis of the original Figures 6I and 6J is shown below (revised Figure 6I).

      Author response image 11.

      Finally, we tested whether the inhibition of the glycolytic system and the decrease in ATP concentration due to PRMT1 inhibition could be rescued by the retroviral overexpression of PFKFB3. We found that, in HSCs overexpressing PFKFB3, PRMT1 inhibition did not decrease the ATP concentration (revised Figure 6J). Therefore, PFKFB3 overexpression mitigated the decrease in ATP concentration caused by PRMT1 inhibition. These data and related statements have been added to the revised manuscript (page 14, lines 427-428).

      Author response image 12.

      Reviewer #2:

      In the manuscript Watanuki et al. want to define the metabolic profile of HSCs in stress/proliferative (myelosuppression with 5-FU), and mitochondrial inhibition and homeostatic conditions. Their conclusions are that during proliferation HSCs rely more on glycolysis (as other cell types) while HSCs in homeostatic conditions are mostly dependent on mitochondrial metabolism. Mitochondrial inhibition is used to demonstrate that blocking mitochondrial metabolism results in similar features of proliferative conditions.

      The authors used state-of-the-art technologies that allow metabolic readout in a limited number of cells like rare HSCs. These applications could be of help in the field since one of the major issues in studying HSCs metabolism is the limited sensitivity of the "standard" assays, which make them not suitable for HSC studies.

      However, the observations do not fully support the claims. There are no direct evidence/experiments tackling cell cycle state and metabolism in HSCs. Often the observations for their claims are indirect, while key points on cell cycle state-metabolism, OCR analysis should be addressed directly.

      We sincerely appreciate the reviewer's constructive comments. Thank you for highlighting the importance of the highly sensitive metabolic assay developed in this study and the findings based on it. Meanwhile, the reviewer's comments have made us aware of areas where we can further improve this manuscript. In particular, in the revised manuscript, we have performed further studies to demonstrate the link between the cell cycle and metabolic state. Specifically, we further subdivided HSCs by the uptake of in vivo-administered 2-NBDG and performed cell cycle analysis. Next, HSCs after PBS or 5-FU treatment were analyzed by a Mito Stress test using the Seahorse flux analyzer, including ECAR and OCR, and a more direct relationship between the cell cycle state and the metabolic system was found. We believe that the reviewer's valuable suggestions have helped us clarify more directly the importance of the metabolic state of HSCs in response to cell cycle and stress that we wanted to show and emphasize the usefulness of the GO-ATeam2 system. Our response to "Recommendations For The Authors" is listed first, followed by our responses to all comments in "Public Review" as follows:

      (Recommendations For The Authors):

      In general, I believe it would be important:

      1. to directly associate cell cycle state with metabolic state. For example, by sorting HSC (+/- 5FU) based on their cell cycle state (exploiting the mouse model presented in the manuscript or by defining G0/G1/G2-S-M via Pyronin/Hoechst staining which allow to sort live cells) and follow the fate of radiolabeled glucose.

      Thank you for raising these crucial points. Unfortunately, it was difficult to perform the glucose tracer analysis by preparing HSCs with different cell cycle states as you suggested due to the amount of work involved. In particular, in the 5-FU group, more than 60 mice per group were originally required for an experiment, and further cell cycle-based purification would require many times that number of mice, which we felt was unrealistic under current technical standards. As an alternative, we administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than those with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs exiting the G0 phase and entering the G1 phase increased after 5-FU treatment. In both the PBS- and 5-FU-treated groups, these large differences in cell cycle status according to glucose uptake suggest a direct link between HSC proliferation and glycolysis activation. If a more sensitive type of glucose tracer analysis becomes available in the future, it may be possible to directly address the reviewer's comments. We see this as a topic for the future. The descriptions of the above findings and perspectives have been added to the Results and Discussion section (page 7, lines 208-214, page 20, lines 607-610).

      Author response image 13.

      1. Use other radio labeled substrates (fatty acid, glutamate)

      Thank you very much for your suggestion. While this is an essential point for future studies, we believe it is not the primary focus of the paper. We are planning another research project on tracer analysis using labeled fatty acids and glutamates, which we will report on in the near future. We have clearly stated in the Abstract and Introduction of the revised manuscript that the focus of this study is on changes in glucose metabolism when HSCs are stressed (page 3, lines 75 and 87; page 5, line 135).

      Instead, we added the following analyses of metabolic changes in fatty acids and glutamate using the GO-ATeam2 system. HSCs derived from GO-ATeam2 mice treated with PBS or 5-FU were used to measure changes in ATP concentrations after exposure to the fatty acid beta-oxidation (FAO) inhibitor etomoxir and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON). Etomoxir was used at 100 µM, a concentration that inhibits FAO without inhibiting mitochondrial electron transfer complex I, as previously reported 5. DON was used at 2 mM, a concentration that sufficiently inhibits the enzyme as the Ki for glutaminase is 6 µM. In this experiment, etomoxir alone, DON alone, or etomoxir and DON in combination did not decrease the ATP concentration of HSCs in the PBS and 5-FU groups (revised Figures S7J–M), suggesting that FAO and glutaminolysis were not essential for ATP production in HSCs in the short term. Thus, according to the analysis using the GO-ATeam2 system, HSCs exposed to acute stresses change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than switching to other energy sources. Since there are reports that FAO and glutaminolysis are required for HSC maintenance in the long term 5,6, compensatory pathways may be able to maintain ATP levels in the short term. A description of these points has been added to the Discussion (page 11, lines 332-344).

      Author response image 14.

      1. Include OCR analyses.

      In addition to the ECAR data of the Mito Stress test (original Figures 2G–H), OCR data were added to the revised manuscript (revised Figures 2H, S3D). Compared to c-Kit+ myeloid progenitors (LKS- cells), HSCs showed a similar increase in ECAR, while the decrease in OCR was relatively limited. A possible explanation for this is that glycolytic and mitochondrial metabolism are coupled in c-Kit+ myeloid progenitors, whereas they are decoupled in HSCs. This is also suggested by the glucose plus oligomycin experiment in Figures 5B, C, and S6A–D (orange lines). In summary, in HSCs, glycolytic and mitochondrial ATP production are decoupled, and ATP levels can be maintained by glycolytic ATP production alone, whereas in progenitors including GMPs, the two ATP production systems are constantly coupled, and glycolysis alone cannot maintain ATP concentration. We have added descriptions of these points in the Results and Discussion section (page 8, lines 240-243, page 18, lines 558-561).

      Author response image 15.

      Next, a Mito Stress test was performed using HSCs derived from PBS- or 5-FU-treated mice in the presence or absence of oligomycin (revised Figures 1G–H, S3A–B). Without oligomycin treatment, ECAR in 5-FU-treated HSCs was higher than in PBS-treated HSCs, and OCR was unchanged. Oligomycin treatment increased ECAR in both PBS- and 5-FU-treated HSCs, whereas OCR was unchanged in PBS-treated HSCs, but significantly decreased in 5-FU-treated HSCs. Changes in ECAR in response to oligomycin differed between HSC proliferation and differentiation: ECAR increased in 5-FU-treated HSCs but not in LKS- progenitors (original Figures 2G–H). This suggests a metabolic feature of HSCs in which the coupling of OXPHOS with glycolysis seen in LKS- cells is not essential in HSCs even after cell cycle entry. The results and discussion of this experiment have been added (page 7, lines 194-201 and page 18, lines 558-561).

      Author response image 16.

      1. Correlate proliferation-mitochondrial inhibition-metabolic state

      We agree that it is important to clarify this point. First, OXPHOS inhibition and proliferation similarly accelerate PFKFB3-dependent glycolytic ATP production (Figures 4G, I, and 5F–I). Meanwhile, oligomycin treatment rapidly decreases ATP in HSCs with or without 5-FU administration (Figure 4C). These results suggest that OXPHOS is a major source of ATP production both at a steady state and during proliferation, even though the analysis medium was pre-equilibrated to hypoxic conditions similar to those in vivo. This has been added to the Discussion section (page 17, lines 520-523).

      1. Tune down the claim on HSCs in homeostatic conditions since from the data it seems that HSCs rely more on anaerobic glycolysis.

      Thanks for the advice. The original Figures S2C, D, F, and G show that HSCs are dependent on the anaerobic glycolytic system even at a steady state, so we have toned down our claims (page 7, lines 192-194).

      1. For proliferative HSCs mitochondrial are key. When you block mitochondria with oligomycin there's the biggest drop in ATP.

      In the revised manuscript, we have tried to highlight the key findings that you have pointed out. First, we mentioned in the Discussion (page 17, lines 523-525) that previous studies suggested the importance of mitochondria in proliferating HSCs. Meanwhile, the GO-ATeam2 and glucose tracer analyses in this study newly revealed that the PFKFB3-driven glycolytic system is activated during the proliferative phase, as shown in Figure 4C. We also confirmed that mitochondrial ATP production is vital in proliferating HSCs, and we hope to clarify the balance between ATP-producing pathways and nutrient sources in future studies.

      1. To better clarify this point, the authors should do experiments in hypoxic conditions and compare them to oligomycin treatment, showing that mito-inhibition acts differently on HSCs (considering that all these drugs are toxic for mitochondria and rapidly induce stress responses, e.g., mitophagy).

      We apologize for any confusion caused by not clearly describing the experimental conditions. As pointed out by the reviewer, we also recognize the importance of experiments in a hypoxic environment. All GO-ATeam2 analyses were performed in a medium that had been sufficiently equilibrated under hypoxic conditions and were completed within minutes, so we believe that the medium did not become oxygenated (page S5-S6, lines 160-163 in the Methods). Given that the analyses were conducted under such hypoxic conditions, the substantial decrease in ATP after oligomycin treatment is intriguing (original Figures 4C, 5B, 5C). The p50 value of mitochondria (the partial pressure of oxygen at which respiration is half maximal) is 0.1 kPa, which is less than 0.1% of the oxygen concentration at atmospheric pressure 7. Thus, biochemically, it is consistent that OXPHOS can maintain sufficient activity even in a hypoxic environment like the bone marrow. We are currently embarking on a study to determine ATP concentration in physiological hypoxic conditions using in vivo imaging within the bone marrow, which we hope to report in a separate project. We have discussed these points, technical limitations, and perspectives in the Discussion section (page 20, lines 610-612).

      • In Figure 1 C, D, E and F, the comparison should be done as unpaired t test and the control group should not be 1 as the cells comes from different individuals.

      Thank you very much for pointing this out. We have reanalyzed and revised the figures (revised Figures 1C–F).

      Author response image 17.
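      The statistical point raised by the reviewer can be illustrated with a minimal sketch. It assumes Welch's unequal-variance form of the unpaired t test, which is one common choice and not necessarily the test used in the revised figures, and it uses hypothetical per-mouse values rather than data from the study. The control group keeps its own mouse-to-mouse variance instead of being fixed at 1.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unpaired Welch t test comparing samples a and b."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical relative metabolite levels, one value per mouse (NOT study data).
pbs = [0.8, 1.1, 0.9, 1.2]   # control values vary between individuals
fu5 = [1.5, 1.9, 1.6, 2.0]

t_stat, dof = welch_t(pbs, fu5)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

      Normalizing every control value to 1 would set the control variance to zero and overstate the significance of the difference, which is why the unnormalized per-individual values are compared here.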

      • In Figure S2A, the post-sorting bar of 6PG, R5P and S7P are missing.

      Metabolites below the detection threshold (post-sorting samples of 6PG, R5P, and S7P) are now indicated as N.D. (not detected) (revised Figure S2A).

      Author response image 18.

      • In the 2NBDG experiments, authors should add the appropriate controls, since it has been shown that 2NBDG cellular uptake does not correctly reflect glucose uptake (Sinclair LV, Immunometabolism 2020) (with cell type-dependent variations). Thus, inhibitors of glucose transporters should be added as controls (cytochalasin B; 4,6-O-ethylidene-α-D-glucose). It would be quite challenging to test this in vivo, but it would be sufficient to show it in vitro in the different HSPCs analyzed.

      We appreciate the essential technical point raised by the reviewer. In the revised manuscript, we performed a 2-NBDG assay with cytochalasin B and phloretin as negative controls. 2-NBDG uptake was higher in 5-FU-treated HSCs than in PBS-treated HSCs. This increase was inhibited by both cytochalasin B and phloretin. In PBS-treated HSCs, cytochalasin B did not downregulate 2-NBDG uptake, whereas phloretin did. Although cytochalasin B inhibits glucose transporters (GLUTs), it is also an inhibitor of actin polymerization. Therefore, its inhibitory effect on GLUTs may be weaker than that of phloretin. We have revised the figure (revised Figure S1L) and added the corresponding description (page 7, lines 207-208).

      Author response image 19.

      • S5C: authors should show the cell number for each population. If there's a decrease in % in Lin-, that will be reflected in all HSPCs. Comparing the proportion of the cells doesn't show the real impact on HSPCs.

      Thank you for your insightful point. In the revision, we compared the numbers, not percentages, of HSPCs and found no difference in the number of cells in the major HSPC fractions in Lin-. The figure has been revised (revised Figure S6C) and the corresponding description has been added (page 10, lines 296-299).

      Author response image 20.

      Minor:

      1. In S1 F-G, it is not indicated on which day after 5FU injection the analysis was done. I assume on day 6, but it should be indicated in the figure legend and/or text.

      Thank you for pointing this out. As you assumed, the analysis was performed on day 6. The description has been added to the legend of the revised Figure S1G.

      1. S1K is not described in the text. What are proliferative and quiescence-maintaining conditions? The analyses are done by flow using LKS SLAM markers after culture? How long was the culture?

      Thank you for your comments. First, the figure citation on line 250 was incorrect and has been corrected to Figure S1N. Regarding the proliferative and quiescence-maintaining conditions, we have previously reported on these 8. In brief, these are culture conditions that maintain HSC activity at a high level while allowing for the proliferation or maintenance of HSCs in quiescence, achieved by culturing under fatty acid-rich, hypoxic conditions with either high or low cytokine concentrations. Analysis was performed after one week of culture, with the HSC number determined by flow cytometry based on the LSK-SLAM marker. While these are mentioned in the Methods section, we have added a description in the main text to highlight these points for the reader (page 7, lines 214-217).

      1. In Figure 5G, why does the blue line (PFKFB3 inhibitor) go up in the end of the real-time monitoring? Does it mean that other compensatory pathway is turned on?

      As you have pointed out, we cannot rule out the possibility that other unknown compensatory ATP production pathways were activated. We have added a note in the Discussion section to address this (page 18, lines 555-556).

      1. In Figure S6H&J, the reduction is marginal. Does it mean that PKM2 is not important for ATP production in HSCs?

      The activity of the inhibitor is essential in the GO-ATeam2 analysis. The commercially available PKM2 inhibitors have a higher IC50 value (IC50 = 2.95 μM in this case). Nevertheless, the effect of reducing the ATP concentration was observed in progenitor cells, but not in HSCs. The report by Wang et al. 9 on the analysis using a PKM2-deficient model suggests a stronger effect on progenitor cells than on HSCs. Our results are similar to those of the previous report.

      (Specific comments)

      Specifically, there are several major points that raise concerns about the claims:

      1. The gating strategy to select HSCs with enlarged Sca1 gating is not convincing. I understand the rationale to have a sufficient number of cells to analyze, however this gating strategy should be applied also in the control group. From the FACS plot seems that there are more HSCs upon 5FU treatment (Figure S1b). How that is possible? Is it because of the 20% more of cycling cells at day 6? To prove that this gating strategy still represents a pure HSC population, authors should compare the blood reconstitution capability of this population with a "standard" gated population. If the starting population is highly heterogeneous then the metabolic readout could simply reflect cell heterogeneity.

      Thank you for pointing this out. First, we did not enlarge the Sca-1 gating in this study. We apologize for any confusion caused by the incomplete description. The gating of c-Kit is based on that shown by Umemoto et al (Figure EV1A) 2, who used 250 mg/kg 5-FU, so their c-Kit reduction is more pronounced than ours.

      We followed this study and compared c-Kit expression in the Lin-Sca-1+CD150+CD48-EPCR+ gates to BMMNCs on day 6 after 5-FU administration (150 mg/kg). The results are shown below.

      Author response image 21.

      Since the MFI of c-Kit was downregulated, we used gating that extended the c-Kit gate to lower expression regions on day 6 after 5-FU administration (revised Figure S1C).

      At other time points, LSK gating was the same as in the PBS-treated group, as noted in the Methods.

      The number of HSCs appears to be higher in the 5-FU group because most of the differentiated blood cells were lost due to 5-FU administration and the same number of cells as in the PBS group were analyzed by FACS, resulting in a relatively higher number of HSCs. The legend of Figure S1 now states that the number of HSCs appears to be increased because the same number of BMMNCs was acquired in both the PBS and 5-FU groups at the time of analysis (page S22, lines 596-598).

      Regarding cellular heterogeneity, from a metabolic point of view, the heterogeneity in HSCs is rather reduced by 5-FU administration. As shown in Figure S3A–C, simulations under stress conditions, such as after 5-FU administration or during OXPHOS inhibition, show that the flux variability in each enzymatic reaction is significantly reduced. GO-ATeam2 analysis after 5-FU treatment showed no increase in cell population variability. After 2-DG treatment, ATP concentrations in HSCs were widely distributed from 0 mM to 0.8 mM in the PBS group, while more than 80% of those in the 5-FU group were less than 0.4 mM (Figures 4B, D). HSCs may have a certain metabolic diversity at a steady state, but under stress conditions, they may switch to a more specialized metabolism with less cellular heterogeneity in order to adapt.

      1. S2 does not show major differences before and after sorting. However, a key metabolite like Lactate is decreased, which is also one of the most present. Wouldn't that mean that HSCs once they move out from the hypoxic niche, they decrease lactate production? Do they decrease anaerobic glycolysis? How can quiescent HSC mostly rely on OXPHOS being located in hypoxic niche?

      2. Since HSCs in the niche are located in hypoxic regions of the bone marrow, would that not mimic OxPhos inhibition (oligomycin)? Would that not mean that HSCs in the niche are more glycolytic (anaerobic glycolysis)?

      3. In Figure 5B, the orange line (Glucose+OXPHOS inhibition) remains stable, which means HSCs prefer to use glycolysis when OXPHOS is inhibited. Which metabolic pathway would HSCs use under hypoxic conditions? As HSCs resides in hypoxic niche, does it mean that these steady-state HSCs prefer to use glycolysis for ATP production? As mentioned before, mitochondrial inhibition can be comparable at the in vivo condition of the niche, where low pO2 will "inhibit" mitochondria metabolism.

      Thank you for the first half of comment 2 on the technical features of our approach. First, as you have pointed out, there is minimal variation and stable detection of many metabolites before and after sorting (Figure S2A), suggesting that isolation from the hypoxic niche and sorting stress do not significantly alter metabolite detection performance. This is consistent with a previous report by Jun et al. 10. Meanwhile, lactate levels were decreased by sorting. Therefore, if the activity of anaerobic glycolysis was suppressed in stressed HSCs, it may be difficult to detect these metabolic changes with our tracer analysis. However, in this study, increases in several glycolytic metabolites, including lactate, were detected in HSCs from 5-FU-treated mice compared with HSCs from PBS-treated mice that were similarly sorted and prepared, suggesting an increase in glycolytic activity. In other words, we may have been fortunate to detect the stress-induced activation of the glycolytic system despite the tendency of our analysis system to underestimate lactate levels. Given that damage to the bone marrow hematopoiesis tends to alleviate the low-oxygen status of the niche 11, we postulate that this upregulated aerobic glycolysis arises intrinsically in HSCs rather than from external conditions.

      The second half of comment 2, and comments 7 and 10, are essential and overlapping comments and will be answered together. Although genetic analyses have shown that HSCs produce ATP by anaerobic glycolysis in low-oxygen environments 9,12, our GO-ATeam2 analysis in this study confirmed that HSCs also generate ATP via mitochondria. This is also supported by Ansó's prior findings where the knockout of the Rieske iron–sulfur protein (RISP), a constituent of the mitochondrial electron transport chain, impairs adult HSC quiescence and bone marrow repopulation 13. Bone marrow is a physiologically hypoxic environment (9.9–32.0 mmHg 11). However, the p50 value of mitochondria (the partial pressure of oxygen at which respiration is half maximal) is below 0.1% oxygen concentration at atmospheric pressure (less than 1 mmHg) 7. This suggests that OXPHOS can retain sufficient activity even under physiologically hypoxic conditions. We are currently initiating efforts to discern ATP concentrations in vivo within the bone marrow under physiological hypoxia. This will be reported in a separate project in the future. Admittedly, when we began this research, we did not anticipate the significant mitochondrial reliance of HSCs. As we previously reported, the metabolic uncoupling of glycolysis and mitochondria 12 may enable HSCs to activate only glycolysis, and not mitochondria, under stress conditions such as post-5-FU administration, suggesting a unique metabolic trait of HSCs. We have included these technical limitations and perspectives in the Discussion section (page 17, lines 520-523).
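      The oxygen-tension argument above can be checked with a short unit conversion. This is a minimal sketch that takes the mitochondrial p50 as 0.1 kPa (the value given in the earlier response on hypoxic conditions) and the bone marrow range as 9.9–32.0 mmHg, both as quoted in the responses. The standard-atmosphere constants are the only values added here, and the variable names are illustrative.

```python
KPA_PER_ATM = 101.325   # one standard atmosphere in kPa
MMHG_PER_ATM = 760.0    # one standard atmosphere in mmHg


def kpa_to_mmhg(kpa):
    """Convert a pressure from kPa to mmHg."""
    return kpa * MMHG_PER_ATM / KPA_PER_ATM


p50_kpa = 0.1                    # mitochondrial p50 as quoted in the response
bm_po2_mmhg = (9.9, 32.0)        # bone marrow pO2 range as quoted in the response

p50_mmhg = kpa_to_mmhg(p50_kpa)
p50_percent_atm = 100 * p50_kpa / KPA_PER_ATM

# How many times higher the reported bone marrow pO2 is than the p50.
fold_low = bm_po2_mmhg[0] / p50_mmhg
fold_high = bm_po2_mmhg[1] / p50_mmhg

print(f"p50 = {p50_mmhg:.2f} mmHg ({p50_percent_atm:.3f}% of 1 atm)")
print(f"bone marrow pO2 is {fold_low:.0f}x to {fold_high:.0f}x the p50")
```

      Under these assumptions the p50 is about 0.75 mmHg, below both 1 mmHg and 0.1% of atmospheric pressure, and the reported bone marrow range is more than ten times higher. The sketch compares pressures only; it does not model the respiration rate at a given oxygen tension.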

      1. The authors performed challenging experiments to track radiolabeled glucose, which are quite remarkable. However, the data do not fully support the conclusions. Mitochondrial metabolism in HSCs can be supported by fatty acid and glutamate, thus authors should track the fate of other energy sources to fully discriminate the glycolysis vs mito-metabolism dependency. From the data on S2 and Fig. 1C-F, the authors can conclude that upon 5FU treatment HSCs increase glycolytic rate.

      2. FIG.2B-C: Increase of Glycolysis upon oligomycin treatment is common in many different cell types. As explained before, other radiolabeled substrates should be used to understand the real effect on mitochondria metabolism.

      Thank you for your suggestion. While this is essential for future studies, we believe it is not the primary focus of the paper. We are planning another research project on tracer analysis using labeled fatty acids and glutamates, which we will report on in the near future. We have clearly stated in the Abstract and Introduction of the revised manuscript that the focus of this study is on changes in glucose metabolism when HSCs are stressed (page 3, lines 75 and 87; page 5, line 135).

      Instead, we have added the following analyses of metabolic changes in fatty acids and glutamate using the GO-ATeam2 system: HSCs derived from GO-ATeam2 mice treated with PBS or 5-FU were used to measure changes in ATP concentrations after exposure to the fatty acid beta-oxidation (FAO) inhibitor etomoxir and the glutaminolysis inhibitor 6-diazo-5-oxo-L-norleucine (DON). Etomoxir was used at 100 µM, a concentration that inhibits FAO without inhibiting mitochondrial electron transfer complex I, as previously reported 5. DON was used at 2 mM, a concentration that sufficiently inhibits the enzyme as the Ki for glutaminase is 6 µM. In this experiment, etomoxir alone, DON alone, or etomoxir and DON in combination did not decrease the ATP concentration of HSCs in the PBS and 5-FU groups (revised Figures S7J–M), suggesting that FAO and glutaminolysis were not essential for ATP production in HSCs in the short term. Thus, according to the analysis using the GO-ATeam2 system, HSCs exposed to acute stresses change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than switching to other energy sources. Since there are reports that FAO and glutaminolysis are required for HSC maintenance in the long term 5,6, compensatory pathways may be able to maintain ATP levels in the short term. A description of these points has been added to the Discussion (page 17, lines 525-527).

      Author response image 22.

      Fatty acid β-oxidation activity was also measured in 5-FU-treated HSCs using the fluorescent probe FAOBlue and was unchanged compared to PBS-treated HSCs (revised Figure S7N).

      Author response image 23.

      Notably, the addition of 100 µM etomoxir plus glucose and Pfkfb3 inhibitors resulted in a rapid decrease in ATP concentration in HSCs (revised Figures S7O–P). This indicates that etomoxir partially mimics the effect of oligomycin, suggesting that at a steady state, OXPHOS is driven by FAO, but can be compensated by the acceleration of the glycolytic system by Pfkfb3. Meanwhile, the exposure of HSCs to Pfkfb3 inhibitors in addition to 2 mM DON did not reduce ATP (revised Figures S7O–P). This suggests that ATP production from glutaminolysis is limited in HSCs at a steady state.

      Author response image 24.

      These points suggest that OXPHOS is driven by fatty acids at a steady state, but unlike the glycolytic system, FAO is not further activated in HSCs after 5-FU treatment. The results of these analyses and related descriptions are included in the revised manuscript (page 11, lines 332-344).

      1. In Figure S1, 5-FU leads to the induction of cycling HSCs and in figure 1, 5-FU results in higher activation of glycolysis. Would it be possible to correlate these two phenotypes together? For example, by sorting NBDG+ cells and checking the cell cycle status of these cells?

      We appreciate the reviewer's insightful comments. We administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than HSCs with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs that exited the G0 phase and entered the G1 phase increased after 5-FU treatment. In both the PBS- and 5-FU-treated groups, these profound differences in cell cycle status according to glucose uptake suggest a direct link between HSC proliferation and glycolysis activation. Descriptions of the above findings and perspectives have been added to the Results and Discussion section (page 7, lines 208-214, page 20, lines 607-610).

      Author response image 25.

      1. Why are only ECAR measurements (and not OCR measurements) shown? In Fig.2G, why are HSCs compared with cKit+ myeloid progenitors, and not with MPP1? The ECAR increase observed in HSCs upon oligomycin treatment is shared with many other types of cells. However, cKit+ cells have a weird behavior. Upon oligo treatment cKit+ cells decrease ECAR, which is quite unusual. The data of both HSCs and cKit+ cells could be clarified by adding OCR curves. Moreover, it is recommended to run a glycolysis stress test profile to assess the dependency on glycolysis (Glucose, Oligomycin, 2DG).

      In addition to the ECAR data of the Mito Stress test (original Figures 2G–H), OCR data were added in the revised manuscript (revised Figures 2H, S3D). Compared to c-Kit+ myeloid progenitors (LKS- cells), HSCs exhibited a similar increase in ECAR, while the decrease in OCR was relatively limited. This may be because glycolytic and mitochondrial metabolism are coupled in c-Kit+ myeloid progenitors, whereas they are decoupled in HSCs. This is also suggested by the glucose plus oligomycin experiment in Figures 5B, C, and S6A–D (orange lines). In summary, in HSCs, glycolytic and mitochondrial ATP production are decoupled, and HSCs can maintain ATP levels by glycolytic ATP production alone, whereas in progenitors including GMPs, the two ATP production systems are constantly coupled, and glycolysis alone cannot maintain the ATP concentration. While we could not conduct a glycolysis stress test, we believe that Pfkfb3-dependent glycolytic activation, which is evident in the oligomycin+glucose+Pfkfb3i experiment, is only apparent in HSCs when subjected to glucose+oligomycin treatment (original Figures 5F–I). We have added descriptions of these points in the Results and Discussion sections (page 8, lines 240-243; page 18, lines 558-561).

      Author response image 26.

      FIG.3 A-C. As mentioned previously, the flux analyses should be integrated with data using other energy sources. If cycling HSCs are less dependent on OXPHOS, what happens if you inhibit OXPHOS in the 5-FU condition? Since the authors are linking OXPHOS inhibition and upregulation of glycolysis to increased proliferation, do HSCs proliferate more when treated with oligomycin?

      First, please see our response to comments 3 and 5 regarding the first part of this comment about the flux analysis of other energy sources. According to the analysis using the GO-ATeam2 system, stressed HSCs change the efficiency of glucose utilization (accelerated glycolytic ATP production) rather than that of other energy sources. The change in ATP concentration after OXPHOS inhibition for 5-FU-treated HSCs is shown in Figures 4C and E, suggesting that the activity of OXPHOS itself does not increase. HSCs after oligomycin treatment and HSCs after 5-FU treatment are similar in that they activate glycolytic ATP production. However, inhibition of OXPHOS did not induce the proliferation of HSCs (original Figure S1K). This suggests that proliferation activates glycolysis and not that activation of the glycolytic system induces proliferation. This similarity and dissimilarity of glycolytic activation upon proliferation and OXPHOS inhibition is discussed in the Discussion section (page 16-17, lines 505-515).

      1. FIG.4 shows that in vivo administration of radiolabeled glucose especially marks metabolites of the TCA cycle and glycolysis. The authors interpret this as enhanced anaerobic glycolysis, but I am not sure this is correct; if more glycolysis products go into the TCA cycle, it might mean that HSCs start engaging mitochondrial metabolism. What do the authors think about that?

      Thank you for pointing this out. We believe that the data are due to two differences in the experimental features between in vivo (Figure S5) and in vitro (Figures 1 and S2) tracer analysis. The first difference is that in in vivo tracer analysis, unlike in vitro, all cells can metabolize U-13C6-glucose. Another difference is that after glucose labeling in vivo, it takes approximately 120–180 minutes to purify HSCs to extract metabolites, and processing on ice may result in a gradual progression of metabolic reactions within HSCs. As a result, in vivo tracer analysis may detect an increased influx of labeled carbon derived from U-13C6-glucose into the TCA cycle over an extended period. However, it is difficult to interpret whether this influx of labeled carbon is derived from the direct influx of glycolysis or the re-uptake by HSCs of metabolites that have been metabolized to other metabolites in other cells. Meanwhile, as shown in Figure 4C using the GO-ATeam2 system, ATP production from mitochondria is not upregulated by 5-FU treatment. This suggests that even if the direct influx from glycolysis into the TCA cycle is increased, the rate of ATP production does not exceed that of glycolysis. Despite these technical caveats in interpretation, the results of in vivo and in vitro tracer analyses are considered essential. In particular, we consider the increased labeling of metabolites involved in glycolysis and nucleotide synthesis to be crucial. We have added a discussion of these points, including experimental limitations (page 17-18, lines 530-545).

      1. FIG.4: the experimental design is not clear. Are BMMNCs stained and then put in culture? Is it a 6-day culture, or are BMMNCs purified at day 6 post 5FU? FIG.4B-C: The differences between PBS and 5FU conditions are the most significant; however, the effect of oligomycin in both conditions is the most dramatic one. From this readout, it seems that HSCs are more dependent on mitochondria for energy production both upon 5FU treatment and in PBS conditions.

      We apologize for the incomplete description of the experimental details. The experiment involved dispensing BMMNCs freshly stained for surface antigens into the medium and immediately subjecting them to flow cytometry analysis. For post-5-FU treatment HSCs, mice were administered 5-FU (day 1), and freshly obtained BMMNCs were analyzed on day 6. The analysis of HSCs and progenitors was performed by gating each fraction within the BMMNCs (original Figure S5A). We have added these details to ensure that readers can grasp these aspects more clearly (page S5, lines 155-158).

      As pointed out by the reviewer, we understand that HSCs produce more ATP through OXPHOS. However, ATP production by glycolysis, although limited, is observed under steady-state conditions (post-PBS treatment HSC), and its reliance increases during the proliferation phase (post-5-FU treatment HSC) (original Figures 4B, D). Until now, discussions on energy production in HSCs have focused on either glycolysis or mitochondrial functions. However, with the GO-ATeam2 system, it has become possible for the first time to compare their contributions to ATP production and evaluate compensatory pathways. As a result, it became evident that while OXPHOS is the main source of ATP production, the reliance on glycolysis plastically increases in response to stress. This has led to a better understanding of HSC metabolism. These points are included in the Discussion as well (page 16, lines 479-488).

      1. FIG.6H should be extended with cell cycle analyses. There are no differences between 5FU and ctrl groups. If 5FU induces HSCs cycling and increases glycolysis I would expect higher 2-NBDG uptake in the 5FU group. How do the authors explain this?

      Thank you for your comments. In the original Figure 6H, we found that 2-NBDG uptake correlated with mPFKFB3 levels in both the 5-FU and PBS groups. mPfkfb3 levels remained low in the few HSCs with low 2-NBDG uptake in the 5-FU group.

      In the revised manuscript, to directly relate glucose utilization to the cell cycle, we administered 2-NBDG to mice and fractionated HSCs by 2-NBDG fluorescence level for cell cycle analysis. The results are shown below (revised Figure S1M). Notably, even in the PBS-treated group, HSCs with high 2-NBDG uptake were more proliferative than those with low 2-NBDG uptake and were comparable to HSCs after 5-FU treatment, although the overall population of HSCs that exited the G0 phase and entered the G1 phase increased after 5-FU treatment. The large differences in glucose utilization between cell cycle states observed in both the PBS- and 5-FU-treated groups suggest a direct link between HSC proliferation and glycolysis activation. Descriptions of the above findings have been added to the Results and Discussion sections (page 7, lines 208-214; page 20, lines 607-610).

      Author response image 27.

      1. In S7 the experimental design is not clear. What are quiescent vs proliferative conditions? What does it mean "cell number of HSC-derived colony"? Is it a CFU assay? Then you should show colony numbers. When HSCs proliferate, they need more energy thus inhibition of metabolism will impact proliferation. What happens if you inhibit mitochondrial metabolism with oligomycin?

      Regarding the proliferative and quiescence-maintaining conditions, we have previously reported on these (reference 8). In brief, these are culture conditions that maintain HSC activity at a high level while allowing for the proliferation or maintenance of HSCs in quiescence, achieved by culturing under fatty acid-rich, hypoxic conditions with either high or low cytokine concentrations. Analysis was performed after one week of culture, with the HSC number determined by flow cytometry based on the LSK-SLAM marker. While these are mentioned in the Methods section, we have added a description in the main text to highlight these points for the reader (page 7, lines 214-217).

      In vitro experiments with the oligomycin treatment of HSCs showed that OXPHOS inhibition activates the glycolytic system, but does not induce HSC proliferation (original Figure S1K). This suggests that proliferation activates glycolysis and not that activation of the glycolytic system induces proliferation. This similarity and dissimilarity of glycolytic activation upon proliferation and OXPHOS inhibition is discussed in the Discussion (page 16-17, lines 505-515).

      1. In FIG 7, since homing of HSCs is influenced by the cell cycle state, it would be important to show whether there is a difference in homing efficiency in the genetic model for PFKFB3 in HSCs.

      In response to the reviewer's comments, we knocked out PFKFB3 in HSPCs derived from Ubc-GFP mice, transplanted 200,000 HSPCs into recipients (C57BL/6 mice) post-8.5Gy irradiation, and harvested the bone marrow of recipients after 16 h to compare homing efficiency (revised Figure S10H). Even with the knockout of PFKFB3, no significant difference in homing efficiency was detected compared to the control group (Rosa knockout group). These results suggest that the short-term reduction in chimerism due to PFKFB3 knockout is not due to decreased homing efficiency or cell death by apoptosis (Figure 7K) but a transient delay in cell cycle progression. We have added descriptions regarding these findings in the Results and Discussion sections (page 15, lines 470-471, page 19, lines 576-578).

      Author response image 28.

      1. Yamamoto M, Kim M, Imai H, Itakura Y, Ohtsuki G. Microglia-Triggered Plasticity of Intrinsic Excitability Modulates Psychomotor Behaviors in Acute Cerebellar Inflammation. Cell Rep. 2019;28(11):2923-2938 e2928.

      2. Umemoto T, Johansson A, Ahmad SAI, et al. ATP citrate lyase controls hematopoietic stem cell fate and supports bone marrow regeneration. EMBO J. 2022:e109463.

      3. Seita J, Sahoo D, Rossi DJ, et al. Gene Expression Commons: an open platform for absolute gene expression profiling. PLoS One. 2012;7(7):e40321.

      4. Boyd S, Brookfield JL, Critchlow SE, et al. Structure-Based Design of Potent and Selective Inhibitors of the Metabolic Kinase PFKFB3. J Med Chem. 2015;58(8):3611-3625.

      5. Ito K, Carracedo A, Weiss D, et al. A PML–PPAR-δ pathway for fatty acid oxidation regulates hematopoietic stem cell maintenance. Nat Med. 2012;18(9):1350-1358.

      6. Oburoglu L, Tardito S, Fritz V, et al. Glucose and glutamine metabolism regulate human hematopoietic stem cell lineage specification. Cell Stem Cell. 2014;15(2):169-184.

      7. Gnaiger E, Mendez G, Hand SC. High phosphorylation efficiency and depression of uncoupled respiration in mitochondria under hypoxia. Proc Natl Acad Sci U S A. 2000;97(20):11080-11085.

      8. Kobayashi H, Morikawa T, Okinaga A, et al. Environmental Optimization Enables Maintenance of Quiescent Hematopoietic Stem Cells Ex Vivo. Cell Rep. 2019;28(1):145-158 e149.

      9. Wang YH, Israelsen WJ, Lee D, et al. Cell-state-specific metabolic dependency in hematopoiesis and leukemogenesis. Cell. 2014;158(6):1309-1323.

      10. Jun S, Mahesula S, Mathews TP, et al. The requirement for pyruvate dehydrogenase in leukemogenesis depends on cell lineage. Cell Metab. 2021;33(9):1777-1792 e1778.

      11. Spencer JA, Ferraro F, Roussakis E, et al. Direct measurement of local oxygen concentration in the bone marrow of live animals. Nature. 2014;508(7495):269-273.

      12. Takubo K, Nagamatsu G, Kobayashi CI, et al. Regulation of glycolysis by Pdk functions as a metabolic checkpoint for cell cycle quiescence in hematopoietic stem cells. Cell Stem Cell. 2013;12(1):49-61.

      13. Anso E, Weinberg SE, Diebold LP, et al. The mitochondrial respiratory chain is essential for haematopoietic stem cell function. Nat Cell Biol. 2017;19(6):614-625.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is an interesting study investigating the mechanisms underlying membrane targeting of the NLRP3 inflammasome and reporting a key role for the palmitoylation-depalmitoylation cycle of cys130 in NLRP3. The authors identify ZDHHC3 and APT2 as the specific ZDHHC and APT/ABHD enzymes that are responsible for the S-acylation and de-acylation of NLRP3, respectively. They show that the levels of ZDHHC3 and APT2, both localized at the Golgi, control the level of palmitoylation of NLRP3. The S-acylation-mediated membrane targeting of NLRP3 cooperates with polybasic domain (PBD)-mediated PI4P-binding to target NLRP3 to the TGN under steady-state conditions and to the disassembled TGN induced by the NLRP3 activator nigericin.

      However, the study has several weaknesses in its current form as outlined below.

      (1) The novelty of the findings concerning cys130 palmitoylation in NLRP3 is unfortunately compromised by recent reports on the acylation of different cysteines in NLRP3 (PMID: 38092000), including palmitoylation of the very same cys130 in NLRP3 (Yu et al https://doi.org/10.1101/2023.11.07.566005), which was shown to be relevant for NLRP3 activation in cell and animal models. What remains novel and intriguing is the finding that NLRP3 activators induce an imbalance in the acylation-deacylation cycle by segregating NLRP3 in late Golgi/endosomes from de-acylating enzymes confined in the Golgi. The interesting hypothesis put forward by the authors is that the increased palmitoylation of cys130 would finally contribute to the activation of NLRP3. However, the authors should clarify the trafficking pathway of acylated-NLRP3. This pathway should, in principle, coincide with that of TGN46 which constitutively recycles from the TGN to the plasma membrane and is trapped in endosomes upon treatment with nigericin. 

      We think the data presented in our manuscript are consistent with the majority of S-acylated NLRP3 remaining on the Golgi via S-acylation in both untreated and nigericin-treated cells. We have performed an experiment with Brefeldin A (BFA), a fungal metabolite that disassembles the Golgi without causing dissolution of early endosomes, that further supports the conclusion that NLRP3 predominantly resides on Golgi membranes pre- and post-activation. Treatment of cells with BFA prevents recruitment of NLRP3 to the Golgi in untreated cells and blocks the accumulation of NLRP3 on the structures seen in the perinuclear area after nigericin treatment (see new Supplementary Figure 4A-D). We do see some overlap of NLRP3 signal with TGN46 in the perinuclear area after nigericin treatment (see new Supplementary Figure 2E); however, this likely represents TGN46 at the Golgi rather than endosomes, given that the NLRP3 signal in this area is BFA sensitive. As with 2-BP and GFP-NLRP3C130S, GFP-NLRP3 spots also form in BFA / nigericin co-treated cells but not with untagged NLRP3. These spots also do not show any co-localisation with EEA1, suggesting that under these conditions, endosomes don’t appear to represent a secondary site of NLRP3 recruitment in the absence of an intact Golgi. However, we cannot completely rule out that some NLRP3 may be recruited to endosomes at some point during its activation.

      (2) To affect the S-acylation, the authors used 16 hrs treatment with 2-bromopalmitate (2BP). In Figure 1f, it is quite clear that NLRP3 in 2-BP treated cells completely redistributed in spots dispersed throughout the cells upon nigericin treatment. What is the Golgi like in those cells? In other words, does 2-BP alter/affect Golgi morphology? What about PI4P levels after 2-BP treatment? These are important missing pieces of data since both the localization of many proteins and the activity of one key PI4K in the Golgi (i.e. PI4KIIalpha) are regulated by palmitoylation.

      We thank the reviewer for highlighting this point and agree that it is possible the observed loss of NLRP3 from the Golgi might be due to an adverse effect of 2-BP on Golgi morphology or PI4P levels. We have tested the effect of 2-BP on the Golgi markers GM130, p230 and TGN46. 2-BP has marginal effects on Golgi morphology, with cis, trans and TGN markers all present at similar levels to untreated control cells (Supplementary Figure 2B-D). We also tested the effect of 2-BP on PI4P levels using mCherry-P4M, a PI4P biosensor. Surprisingly, as noted by the reviewer, despite recruitment of PI4K2A being dependent on S-acylation, PI4P was still present on the Golgi after 2-BP treatment, suggesting that a reduction in Golgi PI4P levels does not underlie loss of NLRP3 from the Golgi (Supplementary Figure 2A). The pool of PI4P still present on the Golgi following 2-BP treatment is likely generated by other PI4K enzymes that localise to the Golgi independently of S-acylation, such as PI4KIIIB. We have included these data in our manuscript as part of a new Supplementary Figure 2.

      (3) The authors argue that the spots observed with NLRP-GFP result from non-specific effects mediated by the addition of the GFP tag to the NLRP3 protein. However, puncta are visible upon nigericin treatment, as a hallmark of endosomal activation. How do the authors reconcile these data? Along the same lines, the NLRP3-C130S mutant behaves similarly to wt NLRP3 upon 2-BP treatment (Figure 1h). Are those NLRP3-C130S puncta positive for endosomal markers? Are they still positive for TGN46? Are they positive for PI4P?

      This is a fair point given the literature showing overlap of NLRP3 puncta formed in response to nigericin with endosomal markers and the similarity of the structures we see in terms of size and distribution to endosomes after 2BP + nigericin treatment. We have tested whether these puncta overlap with EEA1, TGN46 or PI4P (Supplementary Figure 2A, E-G). The vast majority of spots formed by GFP-NLRP3 co-treated with 2-BP and nigericin do not co-localise with EEA1, TGN46 or PI4P. This is consistent with these spots potentially being an artifact, although it has recently been shown that human NLRP3 unable to bind to the Golgi can still respond to nigericin (Mateo-Tórtola et al., 2023). These puncta might represent a conformational change cytosolic NLRP3 undergoes in response to stimulation, although our results suggest that this doesn’t appear to happen on endosomes.

      (4) The authors expressed the minimal NLRP3 region to identify the domain required for NLRP3 Golgi localization. These experiments were performed in control cells. It might be informative to perform the same experiments upon nigericin treatment to investigate the ability of NLRP3 to recognize activating signals. It has been reported that PI4P increases on Golgi and endosomes upon NG treatment. Hence, all the differences between the domains may be lost or preserved. In parallel, also the timing of such recruitment upon nigericin treatment (early or late event) may be informative for the dynamics of the process and of the contribution of the single protein domains.

      This is an interesting point which we thank the reviewer for highlighting. However, we think that each domain on its own is not capable of responding to nigericin as shown by the effect of mutations in helix115-125 or the PB region in the full-length NLRP3 protein. NLRP3HF, which still contains a functional PB region, isn’t capable of responding to nigericin in the same way as wild type NLRP3 (Supplementary Figure 6C-D). Similarly, mutations in the PB region of full length NLRP3 that leave helix115-125 intact show that helix115-125 is not sufficient to allow enhanced recruitment of NLRP3 to Golgi membranes after nigericin treatment (Supplementary Figure 9A). We speculate that helix115-125, the PB region and the LRR domain all need to be present to provide maximum affinity of NLRP3 for the Golgi prior to encounter with and S-acylation by ZDHHC3/7. Mutation or loss of any one of the PB region, helix115-125 or the LRR lowers NLRP3 membrane affinity, which is reflected by reduced levels of NLRP3 captured on the Golgi by S-acylation at steady state and in response to nigericin. 

      (5) As noted above for the chemical inhibitors (1) the authors should check the impact of altering the balance between acyl transferase and de-acylases on the Golgi organization and PI4P levels. What is the effect of overexpressing PATs on Golgi functions?

      We have checked the effect of APT2 overexpression on Golgi morphology and can show that it has no noticeable effect, ruling out an impact of APT on Golgi integrity as the reason for loss of NLRP3 from the Golgi in the presence of overexpressed APT2. We have included these images as Supplementary Figure 11H-J. 

      It is plausible that the effects of ZDHHC3 or ZDHHC7 on enhanced recruitment of NLRP3 to the Golgi may be via an effect on PI4P levels since, as mentioned above, both enzymes are involved in recruitment of PI4K2A to the Golgi and have previously been shown to enhance levels of PI4K2A and PI4P on the Golgi when overexpressed (Kutchukian et al., 2021). However, NLRP3 mutants with most of the charge removed from the PB region, which are presumably unable to interact with PI4P or other negatively charged lipids, are still capable of being recruited to the Golgi by excess ZDHHC3. This would suggest that the effect of overexpressed ZDHHC3 on NLRP3 is largely independent of changes in PI4P levels on the Golgi and instead driven by helix115-125 and S-acylation at Cys-130. The latter point is supported by the observation that NLRP3HF and NLRP3Cys130 are insensitive to ZDHHC3 overexpression.

      At the levels of HA-ZDHHC3 used in our experiments with NLRP3 (200 ng pEF-Bos-HA-ZDHHC3 / ca. 180,000 cells) we don’t see any adverse effect on Golgi morphology (Author response image 1), although it has been noted previously by others that higher levels of ZDHHC3 can have an impact on TGN46 (Ernst et al., 2018). ZDHHC3 overexpression surprisingly has no adverse effects on Golgi function and in fact enhances secretion from the Golgi (Ernst et al., 2018).

      Author response image 1.

      Overexpression of HA-ZDHHC3 does not impact Golgi morphology. A) Representative confocal micrographs of HeLaM cells transfected with 200 ng HA-ZDHHC3 fixed and stained with antibodies to STX5 or TGN46. Scale bars = 10 µm. 

      Reviewer #2 (Public Review):

      Summary:

      This paper examines the recruitment of the inflammasome seeding pattern recognition receptor NLRP3 to the Golgi. Previously, electrostatic interactions between the polybasic region of NLRP3 and negatively charged lipids were implicated in membrane association. The current study reports that reversible S-acylation of the conserved Cys-130 residue, in conjunction with upstream hydrophobic residues plus the polybasic region, act together to promote Golgi localization of NLRP3, although additional parts of the protein are needed for full Golgi localization. Treatment with the bacterial ionophore nigericin inhibits membrane traffic and prevents Golgi-associated thioesterases from removing the acyl chain, causing NLRP3 to become immobilized at the Golgi. This mechanism is put forth as an explanation for how NLRP3 is activated in response to nigericin.

      Strengths:

      The experiments are generally well presented. It seems likely that Cys-130 does indeed play a previously unappreciated role in the membrane association of NLRP3.

      Weaknesses:

      The interpretations about the effects of nigericin are less convincing. Specific comments follow.

      (1) The experiments of Figure 4 bring into question whether Cys-130 is S-acylated. For Cys130, S-acylation was seen only upon expression of a severely truncated piece of the protein in conjunction with overexpression of ZDHHC3. How do the authors reconcile this result with the rest of the story?

      Providing direct evidence of S-acylation at Cys-130 in the full-length protein proved difficult. We attempted to detect S-acylation of this residue by mass spectrometry. However, the presence of the PB region and multiple lysines / arginines directly after Cys-130 made this approach technically challenging, and we were unable to convincingly detect S-acylation at Cys-130 by mass spectrometry. Nevertheless, Cys-130 is clearly important for membrane recruitment, as its mutation abolishes the localisation of NLRP3 to the Golgi. It is feasible that it is the hydrophobic nature of the cysteine residue itself which supports localisation to the Golgi, rather than S-acylation of Cys-130. A similar role for cysteine residues present in SNAP-25 has been reported (Greaves et al., 2009). However, the rest of our data are consistent with Cys-130 in NLRP3 being S-acylated. We also refer to another recently published study which provides additional biochemical evidence that mutation of Cys-130 impacts the overall levels of NLRP3 S-acylation (Yu et al., 2024).

      (2) Nigericin seems to cause fragmentation and vesiculation of the Golgi. That effect complicates the interpretations. For example, the FRAP experiment of Figure 5 is problematic because the authors neglected to show that the FRAP recovery kinetics of nonacylated resident Golgi proteins are unaffected by nigericin. Similarly, the colocalization analysis in Figure 6 is less than persuasive when considering that nigericin significantly alters Golgi structure and could indirectly affect colocalization. 

      We agree that it is likely that the behaviour of other Golgi resident proteins is altered by nigericin. This is in line with a recent proteomics study showing that nigericin alters the amount of Golgi resident proteins associated with the Golgi (Hollingsworth et al., 2024) and other work demonstrating that changes in organelle pH can influence the membrane on / off rates of Rab GTPases (Maxson et al., 2023). However, Golgi levels of other peripheral membrane proteins that associate with the Golgi through S-acylation, such as N-Ras, appear unaltered (Author response image 2), indicating a degree of selectivity in the proteins affected. Our main point here is that NLRP3 is amongst those proteins whose behaviour on the Golgi is sensitive to nigericin and that this change in behaviour may be important to the NLRP3 activation process, although this requires further investigation and will form the basis of future studies.

      The reduction in co-localisation between NLRP3 and APT2, due to alterations in Golgi organisation and trafficking, was the point we were trying to make with this figure, and we apologise if this was not clear. We think that the changes in Golgi structure and function caused by nigericin potentially affect the ability of APT2 to encounter NLRP3 and de-acylate it. We have added a new paragraph to the results section to explain this more clearly. We recognise that our results supporting this hypothesis are at present limited, and we have toned down the language used in the results section to reflect the nature of these findings.

      Author response image 2.

      S-acylated peripheral membrane proteins show differential sensitivity to nigericin. A) Representative confocal micrographs of HeLaM cells coexpressing GFP-NRas and an untagged NLRP3 construct. Cells were left untreated or treated with 10 µM nigericin for 1 hour prior to fixation. Scale bars = 10 µm. B) Quantification of GFP-NRas or NLRP3 signal in the perinuclear region of cells treated with or without nigericin.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Does overnight 2-BP treatment potentially have indirect effects that could prevent NLRP3 recruitment? It would be useful here to show some sort of control confirming that the cells are not broadly perturbed.

      Please see our response to point (2) raised by reviewer #1 which is along similar lines. 

      (2) In Figure 5, "Veh" presumably is short for "Vehicle". This term should be defined in the legend.

      We have now corrected this.

      References

      Ernst, A.M., S.A. Syed, O. Zaki, F. Bottanelli, H. Zheng, M. Hacke, Z. Xi, F. Rivera-Molina, M. Graham, A.A. Rebane, P. Bjorkholm, D. Baddeley, D. Toomre, F. Pincet, and J.E. Rothman. 2018. S-Palmitoylation Sorts Membrane Cargo for Anterograde Transport in the Golgi. Dev Cell. 47:479-493 e477.

      Greaves, J., G.R. Prescott, Y. Fukata, M. Fukata, C. Salaun, and L.H. Chamberlain. 2009. The hydrophobic cysteine-rich domain of SNAP25 couples with downstream residues to mediate membrane interactions and recognition by DHHC palmitoyl transferases. Mol Biol Cell. 20:1845-1854.

      Hollingsworth, L.R., P. Veeraraghavan, J.A. Paulo, J.W. Harper, and I. Rauch. 2024. Spatiotemporal proteomic profiling of cellular responses to NLRP3 agonists. bioRxiv.

      Kutchukian, C., O. Vivas, M. Casas, J.G. Jones, S.A. Tiscione, S. Simo, D.S. Ory, R.E. Dixon, and E.J. Dickson. 2021. NPC1 regulates the distribution of phosphatidylinositol 4-kinases at Golgi and lysosomal membranes. EMBO J. 40:e105990.

      Mateo-Tórtola, M., I.V. Hochheiser, J. Grga, J.S. Mueller, M. Geyer, A.N.R. Weber, and A. Tapia-Abellán. 2023. Non-decameric NLRP3 forms an MTOC-independent inflammasome. bioRxiv:2023.2007.2007.548075.

      Maxson, M.E., K.K. Huynh, and S. Grinstein. 2023. Endocytosis is regulated through the pH-dependent phosphorylation of Rab GTPases by Parkinson’s kinase LRRK2. bioRxiv:2023.2002.2015.528749.

      Yu, T., D. Hou, J. Zhao, X. Lu, W.K. Greentree, Q. Zhao, M. Yang, D.G. Conde, M.E. Linder, and H. Lin. 2024. NLRP3 Cys126 palmitoylation by ZDHHC7 promotes inflammasome activation. Cell Rep. 43:114070.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers are concerned that our lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests that our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that I) influence TM insertion (V267T) and ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V267T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V267T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V27T could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/ or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/ or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/ or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/ or mechanical stability should necessarily correspond to their resistance to cotranslational and/ or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial.
Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To test this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble.

      In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both mutations reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/ or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least, under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/ or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data required further discussion and documentation, which we have provided in the revised version of the manuscript as described in the following.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations that express collections of variants that deviate from the expression of W107A to an extent that is significant enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; the WT immunostaining intensity was 9.5-fold over background by comparison. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We have included these and other signal-to-noise metrics for each experiment in the Results section of the revised manuscript.

      Beyond considerations related to intensity, we also previously noticed the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If the signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form an uncorrelated scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R² = 0.95, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR. We have included these discussion points in the Results section as well as scatter plots for replicate variant intensities within all three genetic backgrounds in Figure S3 of the revised manuscript.

      Author response image 2.
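
      As a minimal sketch of the replicate comparison described above, the snippet below computes Pearson's r and R² for paired per-variant intensities from two replicates. The function name and the intensity values are hypothetical placeholders chosen for illustration; they are not data from this study, and the sketch is not the analysis pipeline used for the manuscript.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-variant intensity values from two replicates (not real data)
replicate_1 = [120.0, 95.0, 210.0, 80.0, 150.0, 60.0]
replicate_2 = [115.0, 100.0, 198.0, 85.0, 160.0, 55.0]

r = pearson_r(replicate_1, replicate_2)
print(f"Pearson r = {r:.3f}, R^2 = {r * r:.3f}")
```

      If the assay could not resolve differences among variants, the two replicates would be expected to give an R² near zero rather than a value close to one.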

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results, we believe it is virtually certain that the secondary mutations characterized herein would form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation as W107A as a point of reference.
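
      The dependence on both ΔΔG and ΔG can be illustrated with a toy two-state calculation. The sketch below is not the model shown in Figure 7: it assumes additive ΔΔG values, a simple two-state fraction folded, a log-additive null expectation, and an optional constant detection floor. All of these, along with every numerical value and function name, are assumptions made for illustration only.

```python
import math

RT = 0.616  # kcal/mol at roughly 310 K

def fraction_folded(dg_unfold):
    """Two-state fraction folded; dg_unfold in kcal/mol, positive means stable."""
    return 1.0 / (1.0 + math.exp(-dg_unfold / RT))

def signal(dg_unfold, floor=0.0):
    """Hypothetical readout: folded fraction plus a constant detection floor."""
    return floor + (1.0 - floor) * fraction_folded(dg_unfold)

def log_epistasis(dg_wt, ddg_a, ddg_b, floor=0.0):
    """Deviation of the double mutant from the log-additive expectation.

    ddg_a and ddg_b are changes in unfolding free energy (negative means
    destabilizing) and are assumed to add in the double mutant.
    """
    s_wt = signal(dg_wt, floor)
    s_a = signal(dg_wt + ddg_a, floor)
    s_b = signal(dg_wt + ddg_b, floor)
    s_ab = signal(dg_wt + ddg_a + ddg_b, floor)
    return math.log(s_ab / s_wt) - math.log(s_a / s_wt) - math.log(s_b / s_wt)

# Same secondary mutation (-1 kcal/mol) on a mild versus a severe background
for label, ddg_background in [("mild", -0.5), ("severe", -4.0)]:
    eps = log_epistasis(dg_wt=1.0, ddg_a=ddg_background, ddg_b=-1.0, floor=0.1)
    print(f"{label} background: epistasis = {eps:+.2f}")
```

      Under these assumptions, the same secondary mutation yields a different epistasis value, and with a nonzero floor even a different sign, depending on how destabilizing the background mutation is. Changing `dg_wt` shifts the values again, which is the sense in which the outcome depends on the initial stability as well as on each ΔΔG.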

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree. Our findings suggest different mutations may not behave similarly, which we believe is a key finding of this work. We have emphasized this point in the Discussion section of the revised manuscript as follows:

      “These findings suggest the folding-mediated epistasis is likely to vary among different classes of destabilizing mutations in a manner that should also depend on folding efficiency and/ or the mechanism(s) of misfolding in the cell.”

      Some statistical aspects of the study could be improved:

      (1) It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in Figure S3 of the revised version of the manuscript.

      (2) The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We utilized paired Wilcoxon-Signed Rank Tests to evaluate the statistical significance of these observations and modified the description of these findings in the revised version of the results section as follows:

      “Variants bearing mutations within the C-terminal regions including ICL3, TMD6, and TMD7 fare consistently worse in the V276T background relative to WT (paired Wilcoxon-Signed Rank Test p-values of 0.0001, 0.02, and 0.005, respectively) (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form ICL2, TMD4, and ECL2 (paired Wilcoxon-Signed Rank Test p-values of 0.0005, 0.0001, and 0.004, respectively) (Fig. 4 C & F).”
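
      For readers unfamiliar with the test named above, the following is a minimal sketch of a paired Wilcoxon signed-rank test applied to per-variant scores measured in two genetic backgrounds. It uses a normal approximation with a tie correction and no continuity correction, which is an assumption of this sketch; the authors' exact settings are not stated here, and in practice a library routine such as `scipy.stats.wilcoxon` would typically be used. The scores are hypothetical placeholders.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation, two-sided).

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value

# Hypothetical scores for the same variants in two backgrounds (not real data)
wt_background = [0.82, 0.75, 0.91, 0.64, 0.70, 0.88, 0.79, 0.66]
mutant_background = [0.61, 0.70, 0.72, 0.50, 0.69, 0.60, 0.58, 0.49]

w_plus, w_minus, p = wilcoxon_signed_rank(wt_background, mutant_background)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```

      Because the test is paired, each variant serves as its own control, so the comparison asks whether scores shift consistently in one direction between backgrounds rather than whether the two distributions differ overall.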

      (3) The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants". Needs to be backed by relevant data and statistics, or at least a reference.

      We thank the reviewer for this reasonable suggestion. In the revised manuscript, we included the results of a Fisher’s Exact Test that confirms the statistical significance of this observation and modified the Results section to reflect this as follows:

      “Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD, Fisher’s Exact Test p = 0.0019). These findings suggest random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that depends on the specific folding defect (V276T vs. W107A) and topological context.”
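
      A two-sided Fisher's exact test of this kind of enrichment can be sketched with the standard library alone. The counts below are approximations back-calculated from the rounded percentages quoted above (about 64 of 98 and 113 of 251), and the table layout (the 98 variants versus the remaining 153) is an assumption of this sketch. The exact counts and layout used in the manuscript are not given here, so the resulting p-value should not be expected to match the reported value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_probability(a)
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)

# Approximate, back-calculated counts (assumptions, not the authors' table):
# rows = the 98 variants vs. the remaining 153; columns = TMD vs. non-TMD
p_value = fisher_exact_two_sided(64, 34, 49, 104)
print(f"two-sided p = {p_value:.2e}")
```

      Fisher's exact test suits this question because it compares proportions of a categorical property (TMD versus non-TMD) between two groups, whereas a signed-rank test compares paired continuous measurements.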

      Reviewer #1 (Recommendations for the Authors):

      As far as this reviewer is aware, the effect of the V267T variant on MP insertion has not been measured directly; its position corresponds to T277 in TMD6 of human GnRHR that has been measured for TM insertion, but given the clear lack of conservation (threonine vs valine) the mutation in TM6 could potentially have a different impact on the mouse homologue. Please clarify what the predicted delta TM for insertion is between human and mouse GnRHR is? Moreover, I would argue that single TM insertion by tethering to Lep is insufficient to understand MP insertion/folding, as neighbouring TM helices could help to drive TM6 insertion. Has ER microsome experiments for mouse GnRHR also been carried out in the context of neighbouring helices?

      We included measurements (and predictions) of the impact of the V276T substitution on the translocon-mediated membrane integration of the mouse TMD6 in the context of a chimeric Lep protein (see Fig. S1 & Table S1). Our results reveal that this substitution decreases the efficiency of TMD6 membrane integration by ~10%. Though imperfect, this prevailing biochemical assay remains popular for a variety of theoretical and technical reasons. Importantly, extensive experimental testing of this system has shown that these measurements report apparent equilibrium constants that are well-described by two-state equilibrium partitioning models (see DOIs 10.1038/nature03216 and 10.1038/nature06387). This observation provides a reasonable rationale to interpret these measurements using energetic models as we have in this work (see Table S1). From a technical perspective, the Lep system is also advantageous due to the fact that this protein is generally well expressed in the context of in vitro translation systems containing native membranes, which generally ensures a consistent signal to noise and dynamic range for membrane integration measurements. Nevertheless, the reviewers are correct that membrane integration efficiencies are likely distinct in the context of the native mGnRHR protein. For these reasons, we attempted to develop a glycosylation-based topology reporter prior to the posting and submission of this manuscript. However, all GnRHR reporters we tested were poorly expressed in vitro and the resulting 35S-labeled proteins only generated faint smears on our phosphorimaging screens that could not be interpreted. For these reasons, we chose to rely on the Lep measurements for these investigations.
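
      The two-state interpretation mentioned above can be written out as a short calculation: an integrated fraction f gives an apparent equilibrium constant K_app = f / (1 - f) and an apparent free energy ΔG_app = -RT ln(K_app). In the sketch below the integrated fractions, the temperature, and the function name are hypothetical values chosen for illustration; they are not the measurements reported in Fig. S1 or Table S1.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)

def apparent_dg(fraction_integrated, temperature_k=303.15):
    """Apparent free energy of membrane integration from a two-state model.

    fraction_integrated is the fraction of the segment that partitions into
    the membrane; negative values of the result favor integration.
    """
    if not 0.0 < fraction_integrated < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    k_app = fraction_integrated / (1.0 - fraction_integrated)
    return -GAS_CONSTANT * temperature_k * math.log(k_app)

# Hypothetical integrated fractions for a reference and a mutant segment
dg_reference = apparent_dg(0.60)
dg_mutant = apparent_dg(0.50)
ddg = dg_mutant - dg_reference
print(f"dG_app(reference) = {dg_reference:+.2f} kcal/mol")
print(f"dG_app(mutant)    = {dg_mutant:+.2f} kcal/mol")
print(f"ddG_app           = {ddg:+.2f} kcal/mol")
```

      Because the relationship is logarithmic, a modest change in the integrated fraction near 50% corresponds to only a fraction of a kcal/mol, which is one reason a consistent dynamic range matters for these measurements.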

      The lack of a more relevant topological reporter is one of many challenges we faced in our investigations of this unstable, poorly behaved protein. We share the reviewer’s frustrations concerning the speculative aspects of this work. Nevertheless, there is increasing appreciation for the fact that our perspectives on protein biophysics have been skewed by our continuing choice to focus on the relatively small set of model proteins that are compatible with our favored methodologies (doi: 10.1016/j.tibs.2013.05.001). We humbly suggest this work represents an example of how we can gain a deeper understanding of the limits of biochemical systems when we instead choose to study the unsavory bits of cellular proteomes. But this choice requires a willingness to make some reasonable assumptions and to lean on energetic/ structural modeling from time to time. Despite this limitation, we believe there is still tremendous value in this compromise.

      What is the experimental evidence the W107A variant affects the protein structure? Has its melting temperature with and without inverse agonist binding for WT vs the W107A variant been measured, for example? Even heat-FSEC of detergent-solubilised membranes would be informative to know how unstable the W107A variant is. If is very unstable in detergent, then it could be that recovery mutants are going to be unlikely as you are already starting with a poor construct showing poor folding/localisation.

      We again understand the rationale for this concern, but do not believe that thermal melting measurements are likely to report the same sorts of conformational transitions involved in cellular misfolding. Heating up a protein to the point in which membranes (or micelles) are disrupted and the proteins begin to form insoluble aggregates is a distinct physical process from those that occur during co- and post-translational folding within intact ER membranes at physiological temperatures (discussed further in the Response to the Reviews). Indeed, as the reviewer points out below, there seems to be little evidence that secretion is linked to thermal stability or various other metrics that others have attempted to optimize for the sake of purification and/ or structural characterization. Thus, we believe it would be just as speculative to suggest thermal aggregation represents a relevant metric for the propensity of membrane proteins to fold in the cell. The physical interpretation of membrane protein misfolding reactions remains contentious in our field due to the key fact that the denatured states of helical membrane proteins remain highly structured in a manner that is hard to generalize beyond the fact that the denatured states retain α-helical secondary structure (doi: 10.1146/annurev-biophys-051013-022926). This is in stark contrast to soluble proteins, where random coil reference states have proven to be generally useful for energetic interpretations of protein stability. For reference, our lab is currently working to leverage epistatic measurements like this to map the prevailing physiological denatured states of an integral membrane protein. Our current findings suggest that non-native electrostatic interactions form in the context of misfolded states. We hope that more information on the structural aspects of these states will help us to develop and interpret meaningful folding measurements within the membrane.

      Even in cases when quantitative folding measurements can be achieved, their relevance remains actively debated. As a point of reference, the corresponding author of this work previously worked on the stability and misfolding of another human α-helical membrane protein (PMP22). Like GnRHR, PMP22 is prone to misfolding in the secretory pathway and is associated with dozens of pathogenic mutations that cause protein misfolding. To understand how the thermodynamic stability of this protein is linked to secretion, the corresponding author purified PMP22, reconstituted it into n-Dodecyl-phosphocholine (DPC) micelles, and measured its resistance to denaturation by an anionic denaturing detergent (Lauryl Sarcosine, LS). The results were initially perplexing due to the fact that equilibrium unfolding curves manifested as an exponential decay (rather than a sigmoid) and relaxation kinetics appeared to be dominated by the rate constant for unfolding (doi: 10.1021/bi301635f). Unfortunately, these data could not be fit with existing folding models due to the lack of a folded protein baseline and the absence of a folding arm in the chevron plot. We eventually found that a full sigmoidal unfolding transition and refolding kinetics could be measured upon addition of 15% (v/v) glycerol. Our measurements revealed that the free energy of unfolding in DPC micelles was 0 kcal/mol (without glycerol). This shocking lack of WT stability made it impossible to directly measure the effects of destabilizing mutations that enhance misfolding; you can’t measure the unfolding of a protein that is already unfolded. We ultimately had to instead infer the energetic effects of such mutations from the thermodynamic coupling between cofactor binding and folding (doi: 10.1021/jacs.5b03743).
Finally, after demonstrating the resulting ΔΔGs correlated with both cellular trafficking and disease phenotype, we still faced justified scrutiny about the relevance of these measurements due to the fact that they were carried out in micelles. For these reasons, we do not feel that additional biophysical measurements will add much to this work until more is understood about the nature of misfolding reactions in the membrane and how to effectively recapitulate them in vitro. We also note that PMP22 is secreted with 20% efficiency in mammalian cell lines, which is 20-fold more efficient than human GnRHR under similar conditions (doi: 10.1016/j.celrep.2021.110046). Thus, we suspect equilibrium unfolding measurements of GnRHR are likely out of reach using previously described methods.

      Our strongest evidence that W107A destabilizes the protein is that it deletes a highly conserved structural contact and that this structural modification kills its secretion. The fact that this mutation clearly reduces the escape of GnRHR from ER quality control is a classic indicator of misfolding that represents the cell’s way of telling us that the mutation compromises the folding of the nascent protein in some way or another. Precisely how this mutation remodels the conformational ensemble of nascent GnRHR and how this relates to the free energy difference between the native and non-native portions of its conformational ensemble under cellular conditions is a much more challenging question that lies beyond the scope of this investigation (and likely beyond the scope of what’s currently possible). Indeed, there is an entire field dedicated to understanding such questions. Nevertheless, the difference in the epistatic interactions formed by W107A and V276T is at the very least consistent with our speculative interpretation that these two mutations vary in their misfolding mechanism and/ or in the extent to which they destabilize the protein. For these reasons, we feel the main conclusions of this manuscript are well-justified.

      Please clarify if the protein is glycosylated or not and, if it is, how would this requirement affect the conclusions of your analysis?

      As we noted in the Response to the Reviewers, which also constitutes a published portion of the final manuscript, this protein is indeed glycosylated. We were well aware of this aspect of the protein since the inception of this project and do not think this changes our interpretation at all. Most membrane proteins are glycosylated, and several groups have demonstrated in various ways that the secretion efficiency of glycoproteins is proportional to certain stability metrics for secreted soluble proteins and membrane proteins alike. Generally, mutations that enhance misfolding do not change the propensity of the nascent chain to undergo N-linked glycosylation, which occurs during translation before protein synthesis and/or folding is complete. Misfolded proteins typically carry lower weight glycans, which reflects their failure to advance from the ER to the Golgi, where N-linked glycans are modified and O-linked glycans are added. From our perspective, glycosyl modifications just ensure that nascent proteins are engaged by calnexin and other lectin chaperones involved in QC. They do not decouple folding from secretion efficiency. In the case of PMP22 (described above), we found that removal of its glycosylation site allows the nascent protein to bypass the lectin chaperones in a manner that enhances its plasma membrane expression eight-fold (doi: 10.1016/j.jbc.2021.100719). Similar to WT, the expression of several misfolded PMP22 variants also significantly increases upon removal of the glycosylation site. Nevertheless, their expression is still significantly lower than that of the un-glycosylated WT protein, and the expression patterns of the mutants relative to WT were quite similar across this panel of un-glycosylated proteins. Thus, while glycosylation certainly impacts secretion, it does not change its dependence on folding efficiency within the ER. There are many layers of partially redundant QC within the ER, and it seems that folding imposes a key bottleneck to secretion regardless of which QC proteins are involved. For these reasons, we do not think glycosylation (or other PTMs) should factor into our interpretation of these results.

      One caveat with the study is that there is a poor understanding of the factors that decide if the protein should be trafficked to the PM or not. Even secretory proteins not going through the calnexin/calreticulin cycle (as they have no N-linked glycans) might still get stuck in the ER, despite the fact they are functional. Could this be a technical issue of heterologous expression overloading the Sec system?

      While we agree that there is much to be learned about this topic, we disagree with the notion that our understanding of folding and secretion is insufficient to generally interpret the molecular basis of the observed trends. In collaboration with various other groups, the corresponding author of this paper has shown for several other proteins that the stability of the native topology and the native tertiary structure can constrain secretion efficiency (see dois: 10.1021/jacs.8b08243, 10.1021/jacs.5b03743, and 10.1016/j.jbc.2021.100423). Moreover, the Balch and Kelly groups demonstrated many years ago that relatively simple models for the coupling between folding and chaperone binding can recapitulate the observed effects of mutations on the secretion efficiency of various proteins (doi: 10.1016/j.cell.2007.10.025). Given the wide body of prevailing knowledge in this area, we believe it is entirely reasonable to assume that the conformational effects of these mutations have a dominant effect on plasma membrane expression.

      Whether or not some of the proteins retained in the ER are folded and/or functional is an interesting question, but is outside the scope of this work. Various lines of evidence concerning approaches to rescue misfolded membrane proteins suggest many of these variants are likely to retain residual function once they escape the ER, which may suggest there are pockets of foldable/folded proteins within the ER. But it seems generally clear that the efficiency of folding in the ER bottlenecks secretion regardless of whether or not the ER contains some fraction of folded/functional protein. We note that it is certainly possible, if not likely, that secretion efficiency is higher at lower expression levels (doi: 10.1074/jbc.AC120.014940). However, the mutational scanning platform used in this work was designed such that all variants are expressed from an identical promoter at the same location within the genome. Thus, for the purposes of these investigations, we believe it is entirely fair to draw “apples-to-apples” comparisons of their relative effects on plasma membrane expression.

      Please see Frances Arnold's paper on this point and their mutagenesis library of the channelrhodopsin (https://www.pnas.org/doi/10.1073/pnas.1700269114), which further found that 20% of mutations improved WT trafficking. Some general comparisons to this paper might be informative.

      We agree that it may be interesting to compare the results from this paper to those in our own. Indeed, we find that 20% of the point mutations characterized herein also enhance the expression of WT mGnRHR, as mentioned in the Results section. However, we think it might be a bit premature to suggest this is a more general trend in light of the fact that the channelrhodopsins engineered in those studies were not of eukaryotic origin and have likely resulted from distinct evolutionary constraints. We ultimately decided against adding more on this to our already lengthy discussion in order to maintain focus on the mechanisms of epistasis.

      Chris Tate and others have shown that there is a high frequency of finding stabilising point mutations in GPCRs and this is the premise of the StaR technology used to thermostabilise GPCRs in the presence of different ligands, i.e. agonists vs. inverse agonists. As far as I am aware, there is a poor correlation between expression levels and thermostability (measured by ligand binding to detergent-solubilised membranes). As such, it is possible that some of the mutants might be more stable than WT even though they have lower levels of PME.

      We believe the disconnect between thermostability and expression precisely speaks to our main point about the suitability of current membrane protein folding assays for the questions we address herein. The degradative activity of ER quality control has not necessarily selected for proteins that are resistant to thermal degradation and/or are suitable for macromolecular crystallography. For this reason, it is often not so difficult to engineer proteins with enhanced thermal stability. We do not believe this disconnect signals that quality control is insensitive to protein folding and stability, but rather that it is more likely to recognize conformational defects that are distinct from those involved in thermal degradation and/or aggregation. Indeed, recent work from the Fluman group, which builds on a wider body of previous observations, has shown that the exposure of polar groups within the membrane is a key factor that recruits degradation machinery (doi: 10.1101/2023.12.12.571171). It is hard to imagine that these sorts of conformational defects are the same as those involved in thermal aggregation.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe that by focusing more on the epistasis with V276T, and less on W107A, the paper could be strengthened significantly.

      We appreciate this sentiment. But we believe the comparison of these two mutants really drives home the point that destabilizing mutations are not equivalent with respect to the epistatic interactions they form.

      (2) In the abstract - please define the term epistasis in a simple way, to make it accessible to a general audience. For example - negative epistasis means that... this should be explicitly explained.

      We thank the reviewer for this suggestion. To meet eLife formatting, we had to cut down the abstract significantly. We simplified this as best we could in the following statement:

      “Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations.”

      We also define positive and negative epistasis in the results section as follows:

      “Positive Ɛ values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative Ɛ values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with Ɛ values near zero have additive effects on PME.”

      (3) The title is quite complex and might deter readers from outside the protein evolution field. Consider simplifying it.

      We thank the reviewer for this suggestion. We have simplified the title to the following:

      “Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants”

      (4) The paper could benefit from a simple figure explaining the different stages of membrane protein folding (stages 1+2) to make it more accessible to readers from outside the membrane protein field.

      This is a great suggestion. We incorporated a new schematic in the revised manuscript that outlines the nature of these processes (see Fig. 1A in the revised manuscript).

      (5) For the FACS-Seq experiment - it was not clear to me if and when all cells are pooled together. For example - are the 3 libraries mixed together already at the point of transfection, or are the transfected cells pooled together at any point before sorting? This could have some implications on batch effects and should, therefore, be explicitly mentioned in the main text.

      We thank the reviewer for this suggestion. We modified the description of the DNA library assembly to emphasize that the mutations were generated in the context of three mixed plasmid pools, which were then transfected into the cells and sorted independently:

      “We then generated a mixed array of mutagenic oligonucleotides that collectively encode this series of substitutions (Table S3) and used nicking mutagenesis to introduce these mutations into the V276T, W107A, and WT mGnRHR cDNAs (Medina-Cucurella et al., 2019), which produced three mixed plasmid pools.”

      (6) The following description in the text is quite confusing. It would be better to simplify it considerably or remove it: "scores (Ɛ) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods)."

      We thank the reviewer for this valuable feedback and have simplified the text as follows:

      “To compare epistatic trends in these libraries, we calculated epistasis scores (Ɛ) for the interactions that these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described epistasis model (product model, see Methods) (Olson et al. 2014).”
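
      As a rough illustration of the product model mentioned above, the sketch below computes an epistasis score from PME values that are assumed to be normalized to WT. The function name, the use of the natural log, and the example numbers are illustrative assumptions; the exact calculation is defined in the manuscript's Methods.

```python
import math

def epistasis_score(pme_a, pme_b, pme_ab):
    """Product-model epistasis score (assumed form).

    Inputs are hypothetical PME values relative to WT (WT = 1.0).
    The score is the log ratio of the observed double-mutant PME to the
    value expected if the two single-mutant effects simply multiply.
    """
    expected = pme_a * pme_b  # product model: single-mutant effects multiply
    return math.log(pme_ab / expected)

# Illustrative values only, not data from the study.
print(epistasis_score(0.5, 0.4, 0.2))  # double mutant matches the expectation
print(epistasis_score(0.5, 0.4, 0.4))  # higher PME than expected: positive score
print(epistasis_score(0.5, 0.4, 0.1))  # lower PME than expected: negative score
```

      Under this form, a score near zero corresponds to the additive (on the log scale) case described in the Results text quoted earlier.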

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths:

      Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses:

      The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are

      Our revision overhauls the interpretation of the results to prioritize the results we have high confidence in (specifically, PC 2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as PC 1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with R^2 values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavior measure, captured in a 3-minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      (1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though to quibble, it’s our measurement of odor preference which varies considerably over time. We are reasonably confident that more variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations in sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.

      (2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version we tried in some sense to avoid discussing these challenges head-on, instead trying to focus on what we thought were the conclusions justified by our experiments with sample sizes ranging from 20 to 60. Our revision is more candid about these challenges.

      That said, we believe the result we view as the most exciting (that PC2 of Ca++ responses predicts OCT-MCH preference) is robust. 1) It is based on a training set with 47 individuals and a test set composed of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (each PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence that is consistent with this result: PC2 of the BRP data predicts behavior (p = 0.03 from only one test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.
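
      The Bonferroni argument can be checked with a few lines of arithmetic. The sketch below is only an illustration of the threshold comparison; the helper name is hypothetical, and the p-values and number of tests are those quoted in the paragraph above.

```python
def passes_bonferroni(p_value, n_tests, alpha=0.05):
    """Return True if p_value is below the Bonferroni-adjusted threshold alpha / n_tests."""
    return p_value < alpha / n_tests

# Training and test p-values quoted above, corrected for the 5 PCs tested.
for label, p in [("training", 0.0063), ("test", 0.0069)]:
    print(label, passes_bonferroni(p, n_tests=5))
```

      With alpha = 0.05 and 5 tests, the adjusted threshold is 0.01, which both quoted p-values fall below.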

      In our revision, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.

      (3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, applying a Bonferroni correction for n=5 comparisons to either the training or test correlations from the Ca++ to OCT-MCH preference model remains significant at alpha=0.05.

      Our revision includes confidence intervals around ρ_signal for the PN PC2 OCT-MCH model, and for the ORN Brp-Short PC2 OCT-MCH model (lines 170-172, 238).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the current revision addresses the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn:

      i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We worked to ensure such words are used appropriately. “Predict” can often be appropriate in this context, as a model predicts true data values. “Explain” can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and have revised the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.

      No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure.

      Author response image 1.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” to find two glomeruli as predictors of behavior, when we were only able to image a small portion of the glomeruli.

      Reviewer #3 (Public Review):

      Churgin et al. seek to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH) preference. They find that the difference in density of fluorescently-tagged Brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-OCT and MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements.

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements, i.e. we make no claims to explain the complex behavioral components of odor choice, like locomotion, reversals at the odor boundary, etc.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses. (Note that we prefer not to call it “individual variability,” as this could refer to variability within individuals, not variability across individuals.) It is a separate question whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC 2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with an hours-to-day-long gap, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped, though we acknowledge that the detailed reviewer suggestions included the helpful suggestion to include uncertainty in the estimation of confidence intervals around the point estimate of the strength of correlation between latent behavioral and Ca++ response states – we have added these for the PN PC2 linear model (lines 170-172).

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we worked to ensure this is appropriately reflected in all word choice across the paper.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Detailed comments: Many of the problems can be identified starting from Figure 4, which summarizes the main claims. I will focus on that figure and its tributaries.

      Acknowledging that several of our inferences are weak compared to what we consider the main result (the relationship between PC2 of Ca++ and OCT-MCH preference), we have removed Figure 4. This makes the focus of the paper much clearer and appropriately puts the emphasis on the results that have strong statistical support.

      (1) The process of "inferring" correlation among the unobserved latent states for neural sensitivity and behavioral bias is unconventional and risky. The larger the assumed noise linking the latent to the observed variables (i.e. the smaller r_b and r_c) the bigger the inferred correlation rho from a given observed correlation R^2_cb. In this situation, the value of the inferred rho becomes highly dependent on what model one assumes that links latent to observed states. But the specific model drawn in Fig 4 suppl 1 is just one of many possible guesses. For example, models with nonlinear interactions could produce different inference.

      We agree with the reviewer’s notes of caution. To be clear, we do not intend for this analysis to be the main takeaway of the paper and have revised it to make this clear. The signal we are most confident in is the simple correlation between measured Ca++ PC2 and measured behavior. We have added more careful language saying that the attempt to infer the correlation between latent signals is one attempt at describing the data generation process (lines 166-172), and one possible estimate of an “underlying” correlation.
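
      The sensitivity the reviewer describes can be illustrated with the classical (Spearman) correction for attenuation, in which an observed correlation is divided by the square root of the product of the two measurement reliabilities. This is a generic textbook formula used here as a stand-in, not necessarily the model drawn in Fig 4 suppl 1; the function name and all numbers below are illustrative assumptions.

```python
import math

def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation (assumed stand-in model).

    r_observed: correlation between the two noisy measurements.
    reliability_x, reliability_y: test-retest correlations of each measurement,
    each in the interval (0, 1].
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Illustrative numbers only: the same observed correlation implies a larger
# latent correlation as the assumed reliabilities shrink.
for reliability in (0.9, 0.6, 0.3):
    print(reliability, round(disattenuated_correlation(0.3, reliability, reliability), 3))
```

      Note that this estimate is unbounded and can exceed 1 when the assumed reliabilities are low, which is one reason such inferences need wide uncertainty bounds.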

      (2) If one still wanted to go through with this inference process and set confidence bounds on rho, one needs to include all the uncertainties. Here the authors only include uncertainty in the value of R^2_c,b and they peg that at +/-20% (Line 1367). In addition there is plenty of uncertainty associated also with R^2_c,c and R^2_b,b. This will propagate into a wider confidence interval on rho.

      We have replaced the arbitrary +/- 20% window with a bootstrap over the pairs of (predicted preference by PN PC2, measured preference) points, which yields a bootstrap distribution of R^2_c,b that is, not surprisingly, considerably wider. Still, we think there is some value in this analysis as the 90% CI of ρ_signal under this model is 0.24-0.95. That is, including uncertainty about R^2_b,b and R^2_c,c in the model still implies a significant relationship between latent calcium and behavior signals.
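
      A paired bootstrap of this kind can be sketched as follows. This is a minimal percentile-bootstrap illustration using only the Python standard library; the function names, the number of resamples, and the synthetic data are assumptions and do not reproduce the authors' code or data.

```python
import random

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no variance
    return sxy * sxy / (sxx * syy)

def bootstrap_r2_ci(xs, ys, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap CI for R^2, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r2([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lo_idx = int(round((1 - level) / 2 * (n_boot - 1)))
    hi_idx = int(round((1 + level) / 2 * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]

# Synthetic (predicted, measured) pairs for illustration only.
data_rng = random.Random(1)
predicted = [data_rng.gauss(0, 1) for _ in range(40)]
measured = [0.4 * p + data_rng.gauss(0, 1) for p in predicted]
print(bootstrap_r2_ci(predicted, measured))
```

      Resampling whole pairs, rather than each variable separately, preserves the relationship between predicted and measured values within each fly.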

      (2.1) The uncertainty in R^2_cb is much greater than +/-20%. Take for example the highest correlation quoted in Fig 4: R^2=0.23 in the top row of panel A. This relationship refers to Fig 1L. Based on bootstrapping from this data set, I find a 90% confidence interval of CI=[0.002, 0.527]. That's an uncertainty of -100/+140%, not +/-20%. Moreover, this correlation is due entirely to the lone outlier on the bottom left. Removing that single fly abolishes any correlation in the data (R^2=0.04, p>0.3). With that the correlation of rho=0.64, the second-largest effect in Fig 4, disappears.

      We acknowledge that removal of the outlier in Fig 1L abolishes the correlation between predicted and measured OCT-AIR preference. We have thus moved that subfigure to the supplement (now Figure 1 – figure supplement 10B), note that we do not have robust statistical support of ORN PC1 predicting OCT-AIR preference in the results (lines 177-178), and place our emphasis on PN PC2’s capacity to predict OCT-MCH preference throughout the text.

      (2.2) Similarly with the bottom line of Fig 4A, which relies on Fig 1M. With the data as plotted, the confidence interval on R^2 is CI=[0.007, 0.201], again an uncertainty of -100/+140%. There are two clear outlier points, and if one removes those, the correlation disappears entirely (R^2=0.06, p=0.09).

      We acknowledge that removal of the two outliers in Fig 1M between predicted and measured OCT-AIR preference abolishes the correlation. We have also moved that subfigure to the supplement (now Figure 1 – figure supplement 10F) and do not claim to have robust statistical support of PN PC1 predicting OCT-AIR preference.

      (2.3) Similarly, the correlation R^2_bb of behavior with itself is weak and comes with great uncertainty (Fig 1 Suppl 1, panels B-E). For example, panel D figures prominently in computing the large inferred correlation of 0.75 between PN responses and OCT-MCH choice (Line 171ff). That correlation is weak and has a very wide confidence interval CI=[0.018, 0.329]. This uncertainty about R^2_bb should be taken into account when computing the likelihood of rho.

      We now include bootstrapping of the 3 hour OCT-MCH persistence data in our inference of 𝜌signal.

      (2.4) The correlation R^2_cc for the empirical repeatability of Ca signals seems to be obtained by a different method. Fig 4 suppl 1 focuses on the repeatability of calcium recording at two different time points. But Line 625ff suggests the correlation R^2_cc=0.77 all derives from one time point. It is unclear how these are related.

      Because our calcium model predictors utilize principal components of the glomerulus-odor responses (the mean Δf/f in the odor presentation window), we compute R2c,c through adding variance explained along the PCs, up to the point in which the component-wise variance explained does not exceed that of shuffled data (lines 609-620 in Materials and Methods). In this revision we now bootstrap the calcium data on the level of individual flies to get a bootstrap distribution of R2c,c, and propagate the uncertainty forward in the inference of 𝜌signal.
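
      The shuffled-data criterion mentioned above can be sketched roughly as follows. This is only one plausible reading: it counts the leading PCs whose variance explained exceeds the average obtained from column-shuffled data. The helper names, the independent per-column shuffle, the number of shuffles, and the synthetic matrix are all assumptions for illustration, and the sketch does not reproduce the R2c,c calculation itself.

```python
import numpy as np

def variance_explained(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def n_pcs_above_shuffle(X, n_shuffles=200, seed=0):
    """Count leading PCs whose variance explained exceeds the shuffled-data mean."""
    rng = np.random.default_rng(seed)
    observed = variance_explained(X)
    null = np.zeros_like(observed)
    for _ in range(n_shuffles):
        # Permute each column independently to destroy across-feature correlations
        Xs = np.column_stack([rng.permutation(col) for col in X.T])
        null += variance_explained(Xs)
    null /= n_shuffles
    above = observed > null
    # Keep only the leading run of PCs that beat the shuffled baseline
    n_keep = len(above) if above.all() else int(np.argmin(above))
    return n_keep, observed, null

# Synthetic stand-in: 60 "flies" x 10 features driven by 2 latent factors (hypothetical)
rng = np.random.default_rng(2)
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.5 * rng.normal(size=(60, 10))

n_keep, observed, null = n_pcs_above_shuffle(X)
print(n_keep, observed[:n_keep].sum())
```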

      (2.5) To summarize, two of the key relationships in Fig 1 are due entirely to one or two outlier points. These should not even be used for further analysis, yet they underlie two of the claims in Fig 4. The other correlations are weak, and come with great uncertainty, as confirmed by resampling. Those uncertainties should be propagated through the inference procedure described in Fig 4. It seems possible that the result will be entirely uninformative, leaving rho with a confidence interval that spans the entire available range [0,1]. Until that analysis is done, the claims of neuron-to-behavior correlation in this manuscript are not convincing.

      It is important to note that we never thought our analysis of the relationship between latent behavior and calcium signals should be interpreted as the main finding. Instead, the observed correlation between measured behavior and calcium is the take-away result. Importantly, it is also conservative compared to the inferred latent relationship, which in our minds was always a “bonus” analysis. Our revisions are now focused on highlighting the correlations between measured signals that have strong statistical support.

      As a response to these specific concerns, we have propagated uncertainty in all R2’s (calcium-calcium, behavior-behavior, calcium-behavior) in our new inference for 𝜌signal, yielding a new median estimate for PN PC 2 underlying OCT-MCH preference of 0.68, with a 90% CI of 0.24-0.95. (Lines 171-172 in results, Inference of correlation between latent calcium and behavior states section in Materials and Methods).

      (3) Other statistical methods:

      (3.1) The caption of Fig 4 refers to "model applied to train+test data". Does that mean the training data were included in the correlation measurement? Depending on the number of degrees of freedom in the model, this could have led to overfitting.

      We have removed Figure 4 and emphasize the key results in Figures 1 and 2: we see a statistically robust signal of PN PC 2 explaining OCT-MCH preference variation in both a training set and a testing set of flies (Fig 2 – figure supplement 1C-D).

      (3.2) Line 180 describes a model that performed twice as well on test data (31% EV) as it did on training data (15%). What would explain such an outcome? And how does that affect one's confidence in the 31% number?

      The test set recordings were conducted several weeks after the training set recordings, which were used to establish PN PC 2 as a correlate of OCT-MCH preference. The fact that the test data had a higher R2 likely reflects sampling error (these two correlation coefficients are not significantly different). Ultimately this gives us more confidence in our model, as the predictive capacity is maintained in a totally separate set of flies.

      (3.3) Multiple models get compared in performance before settling on one. For example, sometimes the first PC is used, sometimes the second. Different weighting schemes appear in Fig 2. Do the quoted p-values for the correlation plots reflect a correction for multiple hypothesis testing?

      For all calcium-behavior models, we restricted our analysis to 5 PCs, as the proportion of calcium variance explained by each of these PCs was higher than that explained by the respective PC of shuffled data — i.e., there were at most five significant PCs in that data. We thus performed at most 5 hypothesis tests for a given model. PN PC 2 explained 15% of OCT-MCH preference variation, with a p-value of 0.0063 – this p-value is robust to a conservative Bonferroni correction to the 5 hypotheses considered at alpha=0.05.

      The weight schemes in Figure 2 and Figure 1 – figure supplement 10 reflect our interpretations of the salient features of the PCs and are follow-up analysis of the single principal component hypothesis tests. Thus they do not constitute additional tests that should be corrected. We now state in the methods explicitly that all reported p-values are nominal (line 563).

      (3.4) Line 165 ff: Quoting rho without giving the confidence interval is misleading. For example, the rho for the presynaptic density model is quoted as 0.51, which would be a sizeable correlation. But in fact, the posterior on rho is almost flat, see caption of Fig 4 suppl 1, which lists the CI as [0.11, 0.85]. That means the experiments place virtually no constraint on rho. If the authors had taken no data at all, the posterior on rho would be uniform, and give a median of 0.5.

      We now provide a confidence interval around 𝜌signal for the PN PC 2 model (lines 170-172). But per above, and consistent with the new focus of this revision, we view the 𝜌signal inference as secondary to the simple, significant correlation between PN PC 2 and OCT-MCH preference.

      (4) As it stands now, this paper illustrates how difficult it is to come to a strong conclusion in this domain. This may be worth some discussion. This group is probably in a better position than any to identify what are the limiting factors for this kind of research.

      We thank the reviewer for this suggestion and have added discussion of the difficulties in detecting signals for this kind of problem. That said, we are confident in stating that there is a meaningful correlation between PC 2 of PN Ca++ responses and OCT-MCH behavior given our model’s performance in predicting preference in a test set of flies, and in the consistent signal in ORN Bruchpilot.

      Reviewer #3 (Recommendations for the Authors):

      Two major concerns, one experimental/technical and one conceptual:

      (1) I appreciate the difficulty of the experimental design and problem. However, the correlations reported throughout are based on neural measurements in only 5 glomeruli (~10% of the olfactory system) at early stages of olfactory processing.

      We acknowledge that only imaging 5 glomeruli is regrettable. We worked hard to develop image analysis pipelines that could reliably segment as many glomeruli as possible from almost all individual flies. In the end, we concluded that it was better to focus our analysis on a (small) core set of glomeruli for which we had high confidence in the segmentation. Increasing the number of analyzed glomeruli is high on the list of improvements for subsequent studies. Happily, we are confident that we are capturing a significant, biologically meaningful correlation between PC 2 of PN calcium (dominated by the responses in DC2 and DM2) and OCT-MCH preference.

      3-OCT and MCH activate many glomeruli in addition to the five studied, especially at the concentrations used. There is also limited odor-specificity in their response matrix: notably responses are more correlated in all glomeruli within an individual, compared to responses across individuals (they note this in lines 194-198, though I don't quite understand the specific point they make here). This is a sign of high experimental variability (typically the dynamic range of odor response within an individual is similar to the range across individuals) and makes it even more difficult to resolve underlying individual variation.

      We respectfully disagree with the reviewer’s interpretation here. There is substantial odor-specificity in our response matrix. This is evident in both the ORN and PN response matrices (and especially the PN matrix) as variation in the brightness across rows. Columns, which correspond to individuals, are more similar than rows, which correspond to odor-glomerulus pairs. The dynamic range within an individual (within a column, across rows) is indeed greater than the variation among individuals (within a row, across columns).

      As an (important) aside, the odor stimuli are very unusual in this study. Odors are delivered at extremely high concentrations (variably 10-25% sv, line 464, not exactly sure what "variably" means; is the stimulus intensity not constant?) as compared to even the highest concentrations used in >95% of other studies (usually <~0.1% sv delivered).

      We used these concentrations for a variety of reasons. First, following the protocol of Honegger and Smith (2020), we found that dilutions in this range produce a linear input-output relationship, i.e. doubling or halving one odorant yields proportionate changes in odor-choice behavior metrics. Second, such fold dilutions are standard for tunnel assays of the kind we used. Claridge-Chang et al. (2009) used 14% and 11% for MCH and OCT respectively, for instance. Finally, the specific dilution factor (i.e., within the range of 10-25%) was adjusted on a week-by-week basis to ensure that in an OCT-MCH choice, the mean preference was approximately 50%. This yields the greatest signal of individual odor preference. We have added this last point to the methods section where the range of dilutions is described (lines 442-445).

      A parsimonious interpretation of their results is that the strongest correlation they see (ORN PC1 predicts OCT v. air preference) arises because intensity/strength of ORN responses across all odors (e.g. overall excitability of ORNs) partially predicts behavioral avoidance of 3-OCT. However, the degree to which variation in odor-specific glomerular activation patterns can explain behavioral preference (3-OCT v. MCH) seems much less clear, and correspondingly the correlations are weaker and p-values larger for the 3-OCT v. MCH result.

      With respect, we disagree with this analysis. The correlation between ORN PC 1 and OCT v. air preference (R2 = 0.23) is quite similar to that of PN PC 2 and OCT vs MCH preference (R2 = 0.20). However, the former is dependent on a single outlying point, whereas the latter is not. The latter relationship is also backed up by the BRP imaging and modeling. Therefore in the revision we have de-emphasized the OCT v. air preference model and emphasized the OCT v. MCH preference models.

      (2) There is a broader conceptual concern about the degree of logical consistency in the authors' interpretation of how neural variability maps to behavioral variability. For instance, the two odors they focus on, 3-OCT and MCH, barely activate ORNs in 4 of the 5 glomeruli they study. Most of the correlation of ORN PC1 vs. behavioral choice for 3-OCT vs. air, then, must be driven by overall glomerular activation by other odors (but remains predictive since responses across odors appear correlated within an individual). This gives pause to the interpretation that 3-OCT-evoked ORN activity in these five glomeruli is the neural substrate for variability in the behavioral response to 3-OCT.

      Our interpretation of the ORN PC1 linear model is not that 3-OCT-evoked ORN activity is the neural substrate for variability – instead, it is the general responsiveness of an individual’s AL across multiple odors (this is our interpretation of the uniformly positive loadings in ORN PC1). It is true that OCT and MCH do not activate ORNs as strongly as other odorants – our analysis rests on the loadings of the PCs that capture all odor/glomerulus combinations available in our data. All that said, since a single outlier in Figure 1L dominates the relationship, we have de-emphasized these particular results in our revision.

      This leads to the most significant concern, which is that the paper does not provide strong evidence that odor-specific patterns of glomerular activation in ORNs and PNs underlie individual behavioral preference between different odors (that each drive significant levels of activity, e.g. 3-OCT v. MCH), or that the ORN-PN synapse is a major driver of individual behavioral variability. Lines 26-31 of the abstract are not well supported, and the language should be softened.

      We have modified the abstract to emphasize our confidence in PN calcium correlating with odor-vs-odor preference (removing the ORN & odor-vs-air language).

      Their conclusions come primarily from having correlated many parameters reduced from the ORN and PN response matrices against the behavioral data. Several claims are made that a given PC is predictive of an odor preference while others are not, however it does not appear that the statistical tests to support this are shown in the figures or text.

      For each linear model of calcium dynamics predicting preference, we restricted our analysis to the first 5 principal components. Thus, we do not feel that we correlated many parameters against the behavioral data. As mentioned below, the correlations identified by this approach comfortably survive a conservative Bonferroni correction. In this revision, a linear model with a single predictor – the projection onto PC 2 of PN calcium – is the result we emphasize in the text, and we report R2 between measured and predicted preference for both a training set of flies and for a test set of flies (Figure 1M and Figure 2 – figure supplement 1).

      That is, it appears that the correlation of models based on each component is calculated, then the component with the highest correlation is selected, and a correlation and p-value computed based on that component alone, without a statistical comparison between the predictive values of each component, or to account for effectively performing multiple comparisons. (Figure 1, k l m n o p, Figure 3, d f, and associated analyses).

      To reiterate, this was our process: 1) Collect a training data set of paired Ca++ recordings and behavioral preference scores. 2) Compute the first five PCs of the Ca++ data, and measure the correlation of each to behavior. 3) Identify the PC with the best correlation. 4) Collect a test data set with new experimental recordings. 5) Apply the model identified in step 3. For some downstream analyses, we combined test and training data, but only after confirming the separate significance of the training and test correlations.
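
      The five-step process above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it assumes that the PCA axes and the regression coefficients are fitted on the training flies only and then applied unchanged to the test flies. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pcs(X, n_pcs=5):
    """Return the column means and the first n_pcs principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_pcs]

def project(X, mean, axes):
    """Project data onto previously fitted principal axes."""
    return (X - mean) @ axes.T

def select_and_fit(X_train, y_train, n_pcs=5):
    """Steps 2-3: pick the PC that best correlates with behavior, fit a one-predictor line."""
    mean, axes = fit_pcs(X_train, n_pcs)
    scores = project(X_train, mean, axes)
    r2 = np.array([np.corrcoef(scores[:, k], y_train)[0, 1] ** 2
                   for k in range(n_pcs)])
    best = int(np.argmax(r2))
    slope, intercept = np.polyfit(scores[:, best], y_train, 1)
    return {"mean": mean, "axis": axes[best:best + 1], "pc": best,
            "slope": slope, "intercept": intercept, "train_r2": r2[best]}

def evaluate(model, X_test, y_test):
    """Step 5: apply the frozen model to new flies; return R^2 of predicted vs measured."""
    score = project(X_test, model["mean"], model["axis"])[:, 0]
    predicted = model["slope"] * score + model["intercept"]
    return np.corrcoef(predicted, y_test)[0, 1] ** 2

# Synthetic stand-in data (hypothetical): 20 response features per fly
rng = np.random.default_rng(3)

def simulate(n):
    X = rng.normal(size=(n, 20)) * np.linspace(3, 1, 20)  # unequal feature variances
    y = 0.5 * X[:, 1] + rng.normal(size=n)                # behavior tied to one feature
    return X, y

X_train, y_train = simulate(50)   # step 1: training recordings
X_test, y_test = simulate(20)     # step 4: separately collected test recordings
model = select_and_fit(X_train, y_train)
test_r2 = evaluate(model, X_test, y_test)
print(model["pc"], round(model["train_r2"], 3), round(test_r2, 3))
```

      Because the PC is chosen on the training set alone, the test-set R2 is not inflated by the selection step, which is the point of collecting the second data set.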

      The p-values associated with the PN PC 2 model predicting OCT-MCH preference are sufficiently low in each of the training and testing sets (0.0063 and 0.0069, respectively) to pass a conservative Bonferroni multiple hypothesis correction (one hypothesis for each of the 5 PCs) at an alpha of 0.05.
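
      The arithmetic behind this statement is easy to check. The sketch below applies a Bonferroni correction for 5 tests to the two p-values quoted above; the function name `bonferroni` is chosen for this example only.

```python
def bonferroni(p_values, n_tests, alpha=0.05):
    """Return Bonferroni-adjusted p-values and whether each passes at alpha."""
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]

# p-values quoted in the response for the training and test sets; 5 PCs tested
adjusted, passes = bonferroni([0.0063, 0.0069], n_tests=5)
print(adjusted, passes)
```

      The adjusted values are 0.0315 and 0.0345, both below 0.05. Equivalently, each raw p-value is below the per-test threshold of 0.05 / 5 = 0.01.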

      Additionally, the statistical model presented in Figure 4 needs significantly more explanation or should be removed; it's unclear how they "infer" the correlation, and the conclusions appear inconsistent with Figure 3 - Figure Supplement 2.

      We have removed Figure 4 and have improved upon our approach of inferring the strength of the correlation between latent calcium and behavior in the Methods, incorporating bootstrapping of all sources of data used for the inference (lines 622-628). At the same time, we now emphasize that this analysis is a bonus of sorts, and that the simple correlation between Ca++ and behavior is the main result.

      Suggestions:

      (1) If the authors want to make the claim that individual variation in ORN or PN odor representations (e.g. glomerular activation patterns) underlies differences in odor preference (MCH v. OCT), they should generalize the weak correlation between ORN/PN activity and behavior to additional glomeruli and pairs of odors, where both odors drive significant activity. Otherwise, the claims in the abstract should be tempered.

      We have modified the abstract to focus on the effect we have the highest confidence in: contrasting PN calcium activation of DM2 and DC2 predicting OCT-MCH preference.

      (2) One of the most valuable contributions a study like this could provide is to carefully quantify the amount of measurement variation (across trials, across hemispheres) in neural responses relative to the amount of individual variation (across individuals). Beyond the degree of variation in the amplitude of odor responses, the rank ordering of odor response strength between repeated measurements (to try to establish conditions that account for adaptation, etc.), between hemispheres, and between individuals is important. Establishing this information is foundational to this entire field of study. The authors take a good first step towards this in Figure 1J and Figure 1, supplement 5C, but the plots do not directly show variance, and the comparison is flawed because more comparisons go into the individual-individual crunch (as evidenced by the consistently smaller range of quartiles). The proper way to do this is by resampling.

      We do not know what the reviewer means by “individual-individual crunch,” unfortunately. Thus, it is difficult to determine why they think the analysis is flawed. We are also uncertain about the role of resampling in this analysis. The medians, interquartile ranges and whiskers in the panels referenced by the reviewer are not confidence intervals as might be determined by bootstrap resampling. Rather, these are direct statistics on the coding distances as measured – the raw values associated with these plots are visualized in Figure 1H.

      In our revision we updated the heatmaps in Figure 1 – figure supplement 3 to include recordings across the lobes and trials of each individual fly, and we have added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials, with associated rank-order correlation coefficients. Since the focus of this study was whether measured individual differences predict individual behavioral preference, a full characterization of the statistics of variation in calcium responses was not the focus, though it was the focus of a previous study (Honegger & Smith et al., 2019).

      To help the reader understand the data, we would encourage displaying data prior to dimensionality reduction - why not show direct plots of the mean and variance of the neural responses in each glomerulus across repeats, hemispheres, individuals?

      We added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials.

      A careful analysis of this point would allow the authors to support their currently unfounded assertion that odor responses become more "idiosyncratic" farther from the periphery (line 135-36); presumably they mean beyond just noise introduced by synaptic transmission, e.g. "idiosyncrasy" is reproducible within an individual. This is a strong statement that is not well-supported at present - it requires showing the degree of similarity in the representation between hemispheres is more similar within a fly than between flies in PNs compared to ORNs (see Hige... Turner, 2015).

      Here are the lines in question: “PN responses were more variable within flies, as measured across the left and right hemisphere ALs, compared to ORN responses (Figure 1 – figure supplement 5C), consistent with the hypothesis that odor representations become more idiosyncratic farther from the sensory periphery.”

      That responses are more idiosyncratic farther from the periphery is therefore not an “unfounded assertion.” It is clearly laid out as a hypothesis for which we can assess consistency in the data. We stand by our original interpretation: that several observations are consistent with this finding, including greater distance in coding space in PNs compared to ORNs, particularly across lobes and across flies. In addition, higher accuracy in decoding individual identity from PN responses compared to ORN responses (now appearing as Figure 1 – figure supplement 6A) is also consistent with this hypothesis.

      Still, to make confusion at this sentence less likely, we have reworded it as “suggesting that odor representations become more divergent farther from the sensory periphery.” (lines 139-140)

      (3) Figure 3 is difficult to interpret. Again, the variability of the measurement itself within and across individuals is not established up front. Expression of exogenous tagged brp in ORNs is also not guaranteed to reflect endogenous brp levels, so there is an additional assumption at that level.

      Figure 3 – figure supplement 1 Panels A-C display the variability of measurements (Brp volume, total fluorescence and fluorescence density) both within (left/right lobes) and across individuals (the different data points). We agree that exogenous tagged Brp levels will not be identical to endogenous levels. The relationship appears significant despite this caveat.

      Again there are statistical concerns with the correlations. For instance, the claim that "Higher Brp in DM2 predicted stronger MCH preference... " on line 389 is not statistically supported with p<0.05 in the ms (see Figure 3 G as the closest test, but even that is a test of the difference of DM2 and DC2, not DM2 alone).

      We have changed the language to focus on the pattern of the loadings in PC 2 of Brp-Short density and replaced “predict.” (lines 366-369).

      Can the authors also discuss what additional information is gained from the expansion microscopy in the figure supplement, and how it compares to brp density in DC2 using conventional methods?

      The expansion microscopy analysis was an attempt to determine what specific aspect of Brp expression was predictive of behavior, on the level of individual Brp puncta, as a finer look compared to the glomerulus-wide fluorescence signal in the conventional microscopy approach. Since this method did not yield a large sample size, at best we can say it provided evidence consistent with the observation from confocal imaging that Brp fluorescent density was the best measure in terms of predicting behavior.

      I would prefer to see the calcium and behavioral datasets strengthened to better establish the relationship between ORN/PN responses and behavior, and to set aside the anatomical dataset for a future work that investigates mechanisms.

      We are satisfied that our revisions put appropriate emphasis on a robust result relating calcium and behavior measurements: the relationship between OCT-MCH preference and idiosyncratic PN calcium responses. Finding that idiosyncratic Brp density has similar PC 2 loadings that also significantly predict behavior is an important finding that increases confidence in the calcium-behavior finding. We agree with the reviewer that these anatomical findings are secondary to the calcium-behavior analyses, but think they warrant a place in the main findings of the study. As the reviewer suggests, we are conducting follow-on studies that focus on the relationship between neuroanatomical measures and odor preference.

      (4) The mean imputation of missing data may have an effect on the conclusions that it is possible to draw from this dataset. In particular, as shown in Figure 1, supplemental figure 3, there is a relatively large amount of missing data, which is unevenly distributed across glomeruli and between the cell types recorded from. Strikingly, DC2 is missing in a large fraction of ORN recordings, while it is present in nearly all the PN recordings. Because DC2 is one of the glomeruli implicated in predicting MCH-OCT preference, this lack of data may be particularly likely to affect the evaluation of whether this preference can be predicted from the ORN data. Overall, mean imputation of glomerulus activity prior to PCA will artificially reduce the amount of variance contributed by the glomerulus. It would be useful to see an evaluation of which results of this paper are robust to different treatments of this missing data.

      We confirmed that the linear model of predicted OCT-MCH using PN PC2 calcium was minimally altered when we performed imputation via alternating least squares using the pca function with option ‘als’ to infill missing values on the calcium matrix 1000 times and taking the mean infilled matrix (see MATLAB documentation and Figure 1 – figure supplement 5 of Werkhoven et al., 2021). Fitted slope value for model using mean-infilled data presented in article: -0.0806 (SE = 0.028, model R2 = 0.15), fitted slope value using ALS-imputed model: -0.0806 (SE = 0.026, model R2 = 0.17).
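
      The reviewer's point that mean imputation shrinks a feature's variance can be illustrated with a small sketch. It uses synthetic data and a hypothetical `mean_impute` helper; it does not reproduce the ALS imputation from MATLAB's pca function.

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column with that column's mean over observed entries."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

# Synthetic stand-in: 50 "flies" x 5 features, with 30 of 50 values missing in column 0
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))
X[:30, 0] = np.nan

var_observed = np.nanvar(X[:, 0], ddof=1)        # variance over the 20 observed values
var_imputed = mean_impute(X)[:, 0].var(ddof=1)   # variance after filling with the mean
print(var_imputed / var_observed)
```

      Filling with the column mean leaves the sum of squared deviations unchanged while increasing the sample count, so the sample variance shrinks by the factor (n_observed - 1) / (n_total - 1), here 19/49. A feature with many missing entries therefore contributes less to the leading PCs than it otherwise would.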

      Additional comments:

      (1) On line 255 there is an unnecessary condition: "non-negative positive".

      Thank you – non-negative has been removed.

      (2) In Figure 4 and the associated analysis, selection of +/- 20% interval around the observed $R^2$ appears arbitrary. This could be based on the actual confidence interval, or established by bootstrapping.

      We have replaced the +/- 20% rule by bootstrapping the calculation of behavior-behavior R2, calcium-calcium R2, and calcium-behavior R2 and propagating the uncertainties forward (Inference of correlation between latent calcium and behavior states section in Materials and Methods).

      (3) On line 409 the claim is made "These sources of variation specifically implicate the ORN-PN synapse..." While the model recapitulates the glomerulus specific variation of activity under PN synapse density variation, it also occurs under ORN identity variation, which calls into question whether the synapse distribution itself is specifically implicated, or if any variation that is expected to be glomerulus specific would be equally implicated.

      We agree with this observation. We found that varying either the ORNs or the PNs that project to each glomerulus can produce patterns of PN response variation similar to what is measured experimentally. This is consistent with the idea that the ORN-PN synapse is a key site of behaviorally-relevant variation.

      (4) Line 214 "... we conclude that the relative responses of DM2 vs DC2 in PNs largely explains an individual's preference." is too strong of a claim, based on the fact that using the PC2 explains much more of the variance, while using the stated hypothesis noticeably decreases the predictive power ($R^2$ = 0.2 vs $R^2$ = 0.12).

      We have changed the wording here to “we conclude that the relative responses of DM2 vs DC2 in PNs compactly predict an individual’s preference.” (lines 192-193)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the mechanism by which PGE2 inhibits the release of insulin from pancreatic beta cells in response to glucose. The researchers used a combination of cell line experiments and studies in mice with genetic ablation of the Kv2.2 channel. Their findings suggest a novel pathway where PGE2 acts through EP2/EP4 receptors to activate PKA, which directly phosphorylates a specific site (S448) on the Kv2.2 channel, inhibiting its activity and reducing GSIS.

      Strengths:

      - The study elegantly demonstrates a potential pathway connecting PGE2, EP2/EP4 receptors, PKA, and Kv2.2 channel activity, using an embryonic cell line.

      - Additional experiments in INS1 and primary mouse beta cells with altered Kv2.2 function partially support the inhibitory role of PGE2 on GSIS through Kv2.2 inhibition.

      Weaknesses:

      - A critical limitation is the use of HEK293T cells, which are not pancreatic beta cells. Functional aspects can differ significantly between these cell types.

      - The study needs to address the apparent contradiction of PKA activating insulin secretion in beta cells, while also inhibiting GSIS through the proposed mechanism.

      - A more thorough explanation is needed for the discrepancies observed between the effects of PGE2 versus Kv2.2 knockdown/mutation on the electrical activity of beta cells and GSIS.

      Thank you for your positive evaluation and constructive feedback on our study. We appreciate the concern regarding the use of HEK293T cells, which are not pancreatic beta cells and may exhibit functional differences. In response, we have repeated our key experiments using INS1 cells and primary mouse beta cells, which are more representative of the native beta cell environment. These additional experiments confirm our hypothesis and further support the role of Kv2.2 in PGE2-induced inhibition of GSIS. In beta cells, glucose-induced PKA activation is highly localized. As a result, while some PKA pathways promote insulin secretion, others may inhibit it. To directly demonstrate that PGE2-induced PKA phosphorylation of Kv2.2 is involved in the inhibitory effect on GSIS, we overexpressed the S448A mutant Kv2.2 channel in INS-1(832/13) cells. Our results show that Kv2.2-S448A channels significantly attenuate the inhibitory effect of PGE2 on GSIS, further supporting the critical role of Kv2.2 phosphorylation at S448. These data have been added to the revised Figure 7C.

      Reviewer #2 (Public Review):

      The authors identified new target elements for prostaglandin E2 (PGE2) through which insulin release can be regulated in pancreatic beta cells under physiological conditions. In vitro extracellular exposure to PGE2 could directly and dose-dependently inhibit the potassium channel Kv2.2. In vitro pharmacology revealed that this inhibition occurs through the EP2/4 receptors, which activate protein kinase A (PKA). By screening specific sites of the Kv2.2 channel, the target phosphorylation site (S448) for PKA regulation was found. The physiological relevance of the described signaling cascade was investigated and confirmed in vivo, using a Kv2.2 knockdown mouse model.

      The strength of this manuscript is the novelty of the (EP2/4-PKA-Kv2.2 channel) molecular pathway described and the comprehensive methodological toolkit the authors have relied upon.

      The introduction is detailed and contains all the information necessary to place the claims in context. Although the dataset is comprehensive and a logical lead is consistently built, there is one important point to consider: to clarify that the described signaling pathway is characteristic of normal physiological conditions and thus differs from pathological changes. It would be useful to carry out basic experiments in a diabetes model (regardless of whether this is in mice or rats).

      Thank you for your positive evaluation and insightful comment. We have clarified in the Discussion section that our findings pertain specifically to physiological conditions. We acknowledge the importance of investigating the signaling pathway in a pathological context and plan to conduct experiments using a diabetes model in future studies to explore how this pathway may differ under such conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 3A-C: PKA activation regulates different functional aspects in beta cells and HEK293T cells. It is well known that PKA activation enhances insulin secretion in beta cells, therefore the mechanisms that allow the same pathway at the same time to inhibit GSIS are not clear and should be addressed by experiments in beta cells.

      Thank you for your insightful comment. Specificity and versatility in cAMP-PKA signaling are governed by the spatial localization and temporal dynamics of the signal. In beta cells, glucose-induced PKA activation is highly localized (Tengholm and Gylfe, 2017). As a result, while some PKA pathways promote insulin secretion, others may inhibit it. For example, a global increase in cAMP, such as through treatment with Db-cAMP, can simultaneously activate both stimulatory and inhibitory PKA pathways, reflecting a more integrated, complex response. In previous studies, 1 mM Db-cAMP was shown to enhance GSIS in INS-1 cells (Dezaki et al., 2011). We observed that 1 mM Db-cAMP increased GSIS, but lower concentrations (10 mM) decreased GSIS (as shown in Author response image 1). These findings suggest that not all PKA signaling events increase GSIS. To further investigate the role of PGE2-induced PKA phosphorylation of Kv2.2 in the inhibition of GSIS, we overexpressed the S448A mutant of Kv2.2 in INS-1 (832/13) cells. Our results showed that the Kv2.2-S448A mutant significantly attenuated the inhibitory effect of PGE2 on GSIS. These new data have been incorporated into the revised Figure 7C.

      Author response image 1.

      Effect of Db-cAMP on GSIS in INS-1 cells. Statistics for the effect of different concentrations of Db-cAMP on GSIS in INS-1(832/13) cells. One-way ANOVA with Bonferroni post hoc test. *p < 0.05; ***p < 0.001; ****p < 0.0001; n.s., not significant.

      (2) Figure 3G: One would expect that the phospho-mimetic mutation, S448D, will have an opposite effect to S448A and a similar effect as PGE2 or PKA activator in Figure 3B. There is no explanation by the authors for having the same effect in S448A and S448D.

      Thank you for your thoughtful comment. Indeed, the S448D mutation exhibited a similar effect to PGE2 on Kv2.2 channels, as we observed significantly smaller currents compared to wild-type Kv2.2 (Figure 3F). The S448D mutation mimics the phosphorylated state of S448, and since PGE2 regulates Kv2.2 channels by phosphorylating this residue, it has no further effect on the S448D mutant (Figure 3G). In contrast, the S448A mutation prevents phosphorylation at this site, which explains why PGE2 has no effect on the currents of S448A mutant Kv2.2 channels (Figure 3H). These results confirm that PGE2 modulates Kv2.2 channels specifically through phosphorylation of S448, as evidenced by the lack of effect on both the S448A and S448D mutants.

      (3) Figure 4E: Since both PGE2 and Kv2.2 KD inhibit the activity of the channel, it doesn't definitively prove whether PGE2 acts through Kv2.2 in INS-1 cells. A complementary experiment should be done in which overactivation of Kv2.2 rescues the effect of PGE2. For example, with the S448A form of the channel.

      We appreciate your comment and valuable suggestion. Knockdown of Kv2.2 abrogated the inhibitory effect of PGE2 on I<sub>K</sub> currents in INS-1 cells (Figure 4E and F), which strongly indicates that PGE2 acts through Kv2.2. While we agree that the suggested complementary experiment with Kv2.2 overactivation (e.g., using the S448A mutant) could provide additional insights, we believe the current data sufficiently support our conclusion, as the knockdown of Kv2.2 eliminates the observed PGE2 effect, providing direct evidence of the channel's involvement.

      (4) Figure 5C: This result requires further explanation. If PGE2 downregulates Kv2.2 activity and has an inhibitory effect on GSIS, why does Kv2.2 KD have the opposite effect?

      The knockdown of Kv2.2 (Fig. 5C) reduced action potential (AP) firing rates compared to the scramble control (Fig. 5B), which is expected because Kv2.2 is critical for maintaining AP firing. When Kv2.2 is knocked down, the reduced AP firing diminishes the system’s responsiveness to further modulation by PGE2. This is because PGE2 exerts its effects primarily through Kv2.2 channels. Therefore, in the Kv2.2 knockdown condition, PGE2 does not exert an additional inhibitory effect on AP firing rates, as the channels critical for its action are already impaired.

      (5) Figure 5D - The EP1-EP4 receptor antibodies should be validated at least in INS-1(832/13) cells using knockdowns.

      Thank you for your suggestion. We have validated the EP1-EP4 receptor antibodies in INS-1(832/13) cells using knockdown experiments. The validation results, including confirmation of specificity and knockdown efficiency, are provided in Supplemental Figure S2.

      (6) Figure 7B - These experiments don't necessarily prove that PGE2 acts directly through Kv2.2 inhibition. Using the S448A mutation in these experiments could prove this point.

      Thank you for this valuable suggestion. We have now overexpressed the S448A mutant Kv2.2 channels in INS-1(832/13) cells, and the results demonstrate that Kv2.2-S448A channels significantly reduce the inhibitory effect of PGE2 on GSIS. These new data have been incorporated into the revised Figure 7C.

      Reviewer #2 (Recommendations For The Authors):

      (1) Deficiencies and inaccuracies in the description of the methods (animal numbers, name of vendors, abbreviations) and the typos in the figures (axis label) require correction.

      Thank you for pointing this out. We have carefully reviewed the manuscript and the figures, making the necessary corrections to address the deficiencies in the methods section and the typos in the figure axis labels.

      (2) Reducing the number of figures (Figures 7/C-E: knockout mouse line test and Figure1/HEK cell experiments could be part of supplementary) and paragraphs would make the manuscript more compact and powerful. It would also ease its reading for non-experts.

      Thank you for your suggestion. We have moved Figures 7C-E to the supplementary data (Supplemental Figure S1) to streamline the main manuscript.

      (3) Multiple immunostainings for EP receptors in insulinoma cells or pancreatic islets would be representative.

      Because the antibodies (EP1, EP2, EP4) are all rabbit-derived, performing multiple immunostainings on the same samples is not feasible owing to potential cross-reactivity. However, the immunohistochemistry images demonstrate that each antibody labels more than 90% of the cells, indicating that β-cells express different subtypes of EP receptors simultaneously.

      (4) The antagonists chosen (AH6809, AH23848) are non-specific. Experiments should be re-run (at least some) under more stringent conditions.

      Thank you for your suggestion. AH6809 and AH23848 are well-documented, widely used antagonists in the literature. To further strengthen our findings, we have included additional, widely-used antagonists: the EP2-specific antagonist TG4155 and the EP4-specific antagonist GW627368. The results obtained with these new antagonists were consistent with those observed using AH6809 and AH23848. These updated data are now included in the revised Figure 4I and 4J.

      (5) It would be very helpful to indeed emphasise that this work is for physiological conditions and that it is (or is not) modified in diabetes. Maybe even irrelevant for diabetes (?). This needs to be clarified and supported by data even if one could assume the authors intend to have a follow-up entirely dedicated to pathological changes, perhaps.

      Thank you for this insightful comment. We have clarified in the Discussion that our findings are specific to physiological conditions. To address this point, we have added the following statement:

      "Importantly, our findings pertain to physiological conditions. While we demonstrate the inhibitory effects of PGE2 on Kv2.2 channels in normal β-cells, the role of this pathway under diabetic conditions remains to be investigated and will be the focus of future studies."

      Dezaki K, Damdindorj B, Sone H, Dyachok O, Tengholm A, Gylfe E, Kurashina T, Yoshida M, Kakei M, Yada T (2011) Ghrelin attenuates cAMP-PKA signaling to evoke insulinostatic cascade in islet beta-cells. Diabetes 60:2315-2324.

      Tengholm A, Gylfe E (2017) cAMP signalling in insulin and glucagon secretion. Diabetes Obes Metab 19 Suppl 1:42-53.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Responses to the reviewers

      We thank the editor and reviewers for their insightful feedback and valuable suggestions on our revised manuscript. In this reply, we provided further clarifications and made changes accordingly. Reviewers’ comments are in bold, and our responses are immediately below. Changes in the main text are presented in italics, accompanied by the specific line numbers in the revised manuscript where these changes can be found. Below, we respond to each reviewer’s comments in turn.

      Reviewer #1 (Public Review):

      Ps observed 24 objects and were asked which afforded particular actions (14 action types). Affordances for each object were represented by a 14-item vector, values reflecting the percentage of Ps who agreed on a particular action being afforded by the object. An affordance similarity matrix was generated which reflected similarity in affordances between pairs of objects. Two clusters emerged, reflecting correlations between affordance ratings in objects smaller than body size and larger than body size. These clusters did not correlate themselves. There was a trough in similarity ratings between objects ~105 cm and ~130 cm, arguably reflecting the body size boundary. The authors subsequently provide some evidence that this clear demarcation is not simply an incidental reflection of body size, but likely causally related. This evidence comes in the flavour of requiring Ps to imagine themselves as small as a cat or as large as an elephant and showing a predicted shift in the affordance boundary. The manuscript further demonstrates that ChatGPT (theoretically interesting because it's trained on language alone without sensorimotor information; trained now on words rather than images) showed a similar boundary.

      The authors also conducted a small MRI study task where Ps decide whether a probe action was affordable (graspable?) and created a congruency factor according to the answer (yes/no). There was an effect of congruency in posterior fusiform and superior parietal lobule for objects within body size range, but not outside. No effects in LOC or M1.

      The major strength of this manuscript in my opinion is the methodological novelty. I felt the correlation matrices were a clever method for demonstrating these demarcations, the imagination manipulation was also exciting, and the ChatGPT analysis provided excellent food for thought. These findings are important for our understanding of the interactions between action and perception, and hence for researchers from a range of domains of cognitive neuroscience.

      The major element that limits conclusions is that an MRI study with 12 P in this context can really only provide pilot data. Certainly the effects are not strong enough for 12 P to generate much confidence. The others of my concerns have been addressed in the revision.

      Reviewer #1 (Recommendations For The Authors):

      I think that the authors need to mention in the abstract that the MRI study constitutes a small pilot.

      Response: We appreciate the reviewer’s positive evaluation and constructive suggestions. In response to the concern about the limited number of participants in the fMRI study, we fully acknowledge the implications this has on the generalizability and robustness of our findings related to the congruency effect. To clarify, we have explicitly stated the preliminary nature of the MRI study in the abstract [line 22]: “A subsequent fMRI experiment offered preliminary evidence of affordance processing exclusively for objects within the body size range, but not for those beyond.”

      Reviewer #2 (Public Review):

      Summary

      In this work, the authors seek to test a version of an old idea, which is that our perception of the world and our understanding of the objects in it are deeply influenced by the nature of our bodies and the kinds of behaviours and actions that those objects afford. The studies presented here muster three kinds of evidence for a discontinuity in the encoding of objects, with a mental "border" between objects roughly of human body scale or smaller, which tend to relate to similar kinds of actions that are yet distinct from the kinds of actions implied by human-or-larger scale objects. This is demonstrated through observers' judgments of the kinds of actions different objects afford; through similar questioning of AI large-language models (LLMs); and through a neuroimaging study examining how brain regions implicated in object understanding make distinctions between kinds of objects at human and larger-than-human scales.

      Strengths 

      The authors address questions of longstanding interest in the cognitive neurosciences -- namely how we encode and interact with the many diverse kinds of objects we see and use in daily life. A key strength of the work lies in the application of multiple approaches. Examining the correlations among kinds of objects, with respect to their suitability for different action kinds, is novel, as are the complementary tests of judgments made by LLMs. The authors include a clever manipulation in which participants are asked to judge action-object pairs, having first adopted the imagined size of either a cat or an elephant, showing that the discontinuity in similarity judgments effectively moved to a new boundary closer to the imagined scale than the veridical human scale. The dynamic nature of the discontinuity hints that action affordances may be computed dynamically, "on the fly", during actual action behaviours with objects in the real world.

      Weaknesses 

      A limitation of the tests of LLMs may be that it is not always known what kinds of training material were used to build these models, leading to a possible "black box" problem. Further, presuming that those models are largely trained on previous human-written material, it may not necessarily be theoretically telling that the "judgments" of these models about action-object pairs show human-like discontinuities. Indeed, verbal descriptions of actions are very likely to mainly refer to typical human behaviour, and so the finding that these models demonstrate an affordance discontinuity may simply reflect those statistics, rather than providing independent evidence for affordance boundaries.

      The relatively small sample size of the brain imaging experiment, and some design features (such as the task participants performed, and the relatively narrow range of objects tested) provide some limits on the extent to which it can be taken as support for the authors' claims.

      Response: We thank the reviewer for the positive evaluation and the constructive comments. We agree that how LLMs work is a “black box”, and thus it is speculative to assume that they possess any human-like ability, because, as the reviewer pointed out, the finding that “these models demonstrate an affordance discontinuity may simply reflect those statistics.” Indeed, our manuscript has expressed a similar idea [line 338]: “We speculated that ChatGPT models may have formed the affordance boundary through a human prism ingrained within its linguistic training corpus.” That is, our intention was not to suggest that such information could replace sensorimotor-based interaction or achieve human-level capability, but rather to highlight that embodied interaction is necessary. Additionally, the scope of the present study does not extend to elucidating the mechanisms behind the affordance boundary observed in LLMs, whether it arises through statistical learning or actual comprehension. In the revised manuscript, we have clarified that the mechanisms underlying the observed affordance boundary in LLMs may be different from human cognitive processes, and we have called for future studies to explore this possibility [line 415]: “Nevertheless, caution should be taken when interpreting the capability of LLMs like ChatGPT, which are often considered “black boxes.” That is, our observation indicates that certain sensorimotor information is embedded within human language materials presumably through linguistic statistics, but it is not sufficient to assert that LLMs have developed a human-like ability to represent affordances. Furthermore, such information alone may be insufficient for LLMs to mimic the characteristics of the affordance perception in biological intelligence. Future studies are needed to elucidate such limitation.”

      Regarding the concern about the models’ results not “providing independent evidence for affordance boundaries”, our objective in employing LLMs was to explore if an affordance boundary could emerge from conceptual knowledge without direct sensorimotor experience, rather than to validate the existence of the affordance boundary per se.

      As for the concern about the limitations imposed by the small sample size and certain design features of our brain imaging experiment, please see our reply to Reviewer #1.

      Reviewer #3 (Public Review):

      Summary:

      Feng et al. test the hypothesis that human body size constrains the perception of object affordances, whereby only objects that are smaller than the body size will be perceived as useful and manipulable parts of the environment, whereas larger objects will be perceived as "less interesting components."

      To test this idea, the study employs a multi-method approach consisting of three parts:

      In the first part, human observers classify a set of 24 objects that vary systematically in size (e.g., ball, piano, airplane) based on 14 different affordances (e.g., sit, throw, grasp). Based on the average agreement of ratings across participants, the authors compute the similarity of affordance profiles between all object pairs. They report evidence for two homogenous object clusters that are separated based on their size with the boundary between clusters roughly coinciding with the average human body size. In follow-up experiments, the authors show that this boundary is larger/smaller in separate groups of participants who are instructed to imagine themselves as an elephant/cat.

      In the second part, the authors ask different large language models (LLMs) to provide ratings for the same set of objects and affordances and conduct equivalent analyses on the obtained data. Some, but not all, of the models produce patterns of ratings that appear to show similar boundary effects, though less pronounced and at a different boundary size than in humans.

      In the third part, the authors conduct an fMRI experiment. Human observers are presented with four different objects of different sizes and asked if these objects afford a small set of specific actions. Affordances are either congruent or incongruent with objects. Contrasting brain activity on incongruent trials against brain activity on congruent trials yields significant effects in regions within the ventral and dorsal visual stream, but only for small objects and not for large objects.
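      The similarity analysis summarized in the first part above can be illustrated with a minimal sketch. It assumes that each object's affordance profile is a vector of the proportion of participants endorsing each action, and that similarity between profiles is measured with a Pearson correlation; the function name and the toy 4 × 3 data are hypothetical and are not taken from the study, which used 24 objects and 14 actions.

```python
import numpy as np


def affordance_similarity(agreement: np.ndarray) -> np.ndarray:
    """Return an object-by-object similarity matrix.

    agreement: array of shape (n_objects, n_actions); each entry is the
    proportion of participants who judged that the object affords the action.
    Similarity is assumed here to be the Pearson correlation between
    the affordance profiles (rows) of two objects.
    """
    return np.corrcoef(agreement)


# Hypothetical toy data: two "small" objects and two "large" objects,
# rated on three actions (not the study's actual ratings).
agreement = np.array([
    [0.9, 0.8, 0.1],  # small object A
    [0.8, 0.9, 0.2],  # small object B
    [0.1, 0.2, 0.9],  # large object C
    [0.2, 0.1, 0.8],  # large object D
])

similarity = affordance_similarity(agreement)
print(np.round(similarity, 2))
```

      In such a matrix, a boundary of the kind described would appear as high similarity within each size cluster and low or negative similarity between clusters.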

      The authors interpret their findings as support for their hypothesis that human body size constrains object perception. They further conclude that this effect is cognitively penetrable, and only partly relies on sensorimotor interaction with the environment (and partly on linguistic abilities).

      Strengths:

      The authors examine an interesting and relevant question and articulate a plausible (though somewhat underspecified) hypothesis that certainly seems worth testing. Providing more detailed insights into how object affordances shape perception would be highly desirable. Their method of analyzing similarity ratings between sets of objects seems useful and the multi-method approach is original and interesting.

      Weaknesses:

      The study presents several shortcomings that clearly weaken the link between the obtained evidence and the drawn conclusions. Below I outline my concerns in no particular order:

      (1) It is not entirely clear to me what the authors are proposing and to what extent the conducted work actually speaks to this. For example, in the introduction, the authors write that they seek to test if body size serves not merely as a reference for object manipulation but also "plays a pivotal role in shaping the representation of objects." This motivation seems rather vague and it is not clear to me how it could be falsified.

      Overall, the lack of theoretical precision makes it difficult to judge the appropriateness of the approaches and the persuasiveness of the obtained results. I would strongly suggest clarifying the theoretical rationale and explaining in more detail how the chosen experiments allow them to test falsifiable predictions.

      (2) The authors used only a very small set of objects and affordances in their study and they do not describe in sufficient detail how these stimuli were selected. This renders the results rather exploratory and clearly limits their potential to discover general principles of human perception. Much larger sets of objects and affordances and explicit data-driven approaches for their selection would provide a more convincing approach and allow the authors to rule out that their results are just a consequence of the selected set of objects and actions.

      (3) Relatedly, the authors could be more thorough in ruling out potential alternative explanations. Object size likely correlates with other variables that could shape human similarity judgments and the estimated boundary is quite broad (depending on the method, either between 80 and 150 cm or between 105 to 130 cm). More precise estimates of the boundary and more rigorous tests of alternative explanations would add a lot to strengthen the authors' interpretation.

      (4) While I appreciate the manipulation of imagined body size, as a clever way to solidify the link between body size and affordance perception, I find it unfortunate that it is implemented in a between-subjects design, as this clearly leaves open the possibility of pre-existing differences between groups. I certainly disagree with the authors' statement that their findings suggest "a causal link between body size and affordance perception."

      (5) The use of LLMs in the current study is not clearly motivated and I find it hard to understand what exactly the authors are trying to test through their inclusion. As it currently stands, I find it hard to discern how the presence of perceptual boundaries in LLMs could constitute evidence for affordance-based perception.

      (6) Along the same lines, the fMRI study also provides little evidence to support the authors' claims. The use of congruency effects as a way of probing affordance perception is not well motivated. Importantly (and related to comment 2 above), the very small set of objects and affordances in this experiment heavily complicates any conclusions about object size being the crucial variable determining the occurrence of congruency effects.

      Overall, I consider the main conclusions of the paper to be far beyond the reported data. Articulating a clearer theoretical framework with more specific hypotheses as well as conducting more principled analyses on more comprehensive data sets could help the authors obtain stronger tests of their ideas.

      Response: We appreciate the insightful inquiries regarding our manuscript. Below, we explained the theoretical motivation and rationale of each part of our experiments.

      In response to the reviewer’s insights, we have modified the expression “plays a pivotal role in shaping the representation of objects” in the revised manuscript and have restated the general question of our study in the introduction. Our motivation is on the long-lasting debate over the representation versus direct perception of affordance, specifically examining the “representationalization” of affordance. That is, we tested whether object affordance simply covaried directly with continuous constraints such as object size, a perspective aligned with the representation-free (direct perception) view, or whether affordance became representationalized, adhering to the representation-based view, constrained by body size. Such representationalization would generate a categorization between objects that are affordable and the environment that exceeds affordance.

      To test these hypotheses, we first delineated the affordance of various objects. We agree with the reviewer that in this step a broader selection of objects and actions could mitigate the risk of our results being influenced by the specific selection of objects and actions. However, our results are unlikely to be biased, because our selection was guided by two key criteria, rather than being arbitrary. First, the objects were selected from the dataset in Konkle and Oliva's study (2011), which systematically investigated the impact of object size on object recognition, thus providing a well-calibrated range of sizes (i.e., from 14 cm to 7,618 cm) reflective of real-world objects. Second, the selected actions covered a wide range of daily human-object/environment interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics human action video dataset (Kay et al., 2017). Thus, this set of objects and actions is a representative sampling of typical human experiences.

      Upon demonstrating a trough in perceived affordance similarity, we recognized that the location of the affordance boundary fell within the range of human body size. We agree with the reviewer that this coincidence between body size and the location of the boundary is not, on its own, sufficient for a mechanistic explanation, because variables co-varying with object size might also generate it. Identifying a more precise location for the boundary is unlikely to rule out alternative explanations of this kind. To establish a causal link between body size and the affordance boundary, we opted for a direct manipulation of body size through imagination, while keeping all other variables constant across conditions. This approach allowed us to examine whether and how the affordance boundary shifts in response to body size changes.

      Regarding the between-subjects design of the imagination experiment, we wish to clarify that this design aimed to prevent carryover effects. Although a within-subjects design is indeed more sensitive in detecting manipulation effects by accounting for subject variability, it risks contamination across conditions. Specifically, transitioning immediately between different imagined body sizes poses a challenge, and sequential participation could induce undesirable response strategies, such as deliberately altering responses to the same objects in different conditions. The between-subjects design, although susceptible to participant variability (e.g., “pre-existing differences between groups” suggested by the reviewer), avoids such contamination. In addition, we employed random assignment of participants to different conditions (cat-size versus elephant-size).

      The body imagination experiment provided causal evidence of an embodied discontinuity, suggesting the boundary is tied to the agent’s motor capacity, rather than amodal sources. The LLMs experiment then sought to test a prediction from the embodied theories of cognition: the supramodality of object perception. Specifically, we used LLMs to assess whether the discretization of affordance perception is supramodally accessible beyond the sensorimotor domain through linguistic understanding. From this perspective, our LLM experiment was employed not to affirm affordance-based perception but to examine and support a prediction by the embodied theories of cognition.

      Finally, our preliminary fMRI study aimed to conceptually replicate the perceptual discontinuity and explore its neural correlates using a subset of objects and actions from the behaviour experiments. This approach was chosen to achieve stable neural responses and enhance study power, employing the congruency effect (congruent - incongruent) as a metric for affordance processing (e.g., Kourtis et al., 2018), which reflects facilitated responses when actions are congruent with objects’ affordances (e.g., Ellis & Tucker, 2000). Nevertheless, we recognize the limitation of a relatively small sample size; for details, please see our reply to Reviewer #1.

      In summary, our findings contribute to the discourse on computationalism’s concept of representation and on the influence of these representations, post-discretization, on processes beyond the sensorimotor domain. We hope that these additional explanations and revisions effectively address the concerns raised and demonstrate our commitment to enhancing the quality of our work in light of your valuable feedback. By acknowledging these limitations and directions for future research, we hope to further the discourse on affordance perception and embodied cognition.

      References

      Ellis, R., & Tucker, M. (2000). Micro‐affordance: The potentiation of components of action by seen objects. British Journal of Psychology, 91(4), 451-471.

      Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., ... & Zisserman, A. (2017). The kinetics human action video dataset. arXiv preprint arXiv:1705.06950.

      Konkle, T., & Oliva, A. (2011). Canonical visual size for real-world objects. Journal of Experimental Psychology: human perception and performance, 37(1), 23.

      Kourtis, D., Vandemaele, P., & Vingerhoets, G. (2018). Concurrent cortical representations of function-and size-related object affordances: an fMRI study. Cognitive, Affective, & Behavioral Neuroscience, 18, 1221-1232.


      The following is the authors’ response to the original reviews.

      Responses to the reviewers

      We deeply appreciate the reviewers’ comments. In response to the concerns raised, we have revised the manuscript accordingly. Below we address each of the reviewers’ comments in turn. Reviewers’ comments are in bold, and our responses are immediately below. Changes in the main text are presented in italics, followed by corresponding page and line numbers in the revised manuscript. We also highlighted tracks of change in the revised manuscript.

      Reviewer #1 (Public Review):

      (1) The main behavioural work appears well-powered (>500 Ps). This sample reduces to 100 for the imagination study, after removing Ps whose imagined heights fell within the human range (100-200 cm). Why 100-200 cm? 100 cm is pretty short for an adult. Removing 80% of data feels like conclusions from the imagination study should be made with caution.

      R1: We apologize for the confusion. We did not remove 80% of the participants; instead, a separate sample of participants was recruited for the imagination experiment. The size of this sample (100 participants) was indeed smaller than that of the first experiment (528 participants), because the first experiment was exploratory and was designed to be over-powered. In addition, inspection of the data of the first sample showed that the affordance pattern became stable after the first 50 participants. We explained this consideration in the revised manuscript:

      (p 21, ln 490) “…, another one hundred and thirty-nine participants from the same population were recruited from the same platform. We chose a smaller sample size for the imagination experiment compared to that for the object-action relation judgement task, because inspection of the data of the first sample showed that the affordance pattern became stable after the first 50 participants.”

      The average adult human height ranges from 140-170 cm for women and 150-180 cm for men (NCD-RisC, 2016). Accordingly, the criterion of 100-200 cm covered this range and was set to ensure that participants unambiguously imagined a body schema different from that of a human, as the tallest domestic cat is below 100 cm according to the Guinness World Records and an elephant is above 200 cm according to Crawley et al. (2017). We clarified these considerations in the revised manuscript:

      (p 21, ln 494) “To maximize the validity of the manipulation, data from participants whose imagined height fell within the average human size range (100cm - 200cm) were excluded from further analysis. Consequently, 100 participants (49 males, aged from 17 to 39 years, mean age = 23.2 years) remained in the analysis. This exclusion criterion was broader than the standard adult human height range of 140cm to 180cm (NCD-RisC, 2016). This approach ensured that our analysis focused on participants who unambiguously imagined a body schema different from humans, yet within the known height range of cats and elephants.”

      In addition, we reanalysed the data with a more conservative criterion of 140 cm to 180 cm, and the results held.
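
      As an illustration only (not the authors' analysis code), the two exclusion criteria discussed above can be sketched as a simple filter. The heights below are made-up values, the function name `keep_outside_range` is hypothetical, and treating the range boundaries as inclusive is an assumption.

```python
# Illustrative sketch only: the heights below are invented, not study data.
def keep_outside_range(heights_cm, low, high):
    """Return the heights that fall outside the closed interval [low, high]."""
    return [h for h in heights_cm if h < low or h > high]


# Hypothetical imagined body heights (cm) for a handful of participants.
imagined_heights = [25, 40, 95, 120, 150, 175, 190, 250, 300, 320]

# Exclusion range described in the manuscript (100-200 cm).
main = keep_outside_range(imagined_heights, 100, 200)
# Narrower exclusion range used in the reanalysis (140-180 cm).
conservative = keep_outside_range(imagined_heights, 140, 180)

print(len(main), len(conservative))
```

      Because the 140-180 cm range is nested inside the 100-200 cm range, every participant retained under the first criterion is also retained under the second.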

      (2) There are only 12 Ps in the MRI study, which I think should mean the null effects are not interpreted. I would not interpret these data as demonstrating a difference between SPL and LOC/M1, but rather that some analyses happened to fall over the significance threshold and others did not.

      R2: We would like to clarify that the null hypothesis of this fMRI study is the lack of two-way interaction between object size and object-action congruency, which was rejected by the observed significant interaction. That is, the interpretation of the present study did not rely on accepting any null effect.

      Having said this, we admit that the fMRI experiment is exploratory and the sample size is small (12 participants), which might lead to low power in estimating the affordance effect. In the revision, we acknowledge this issue explicitly:

      (p 16, ln 354) “…, supporting the idea that affordance is typically represented only for objects within the body size range. While it is acknowledged that the sample size of the fMRI study was small (12 participants), necessitating cautious interpretation of its results, the observed neural-level affordance discontinuity is notable. That is, qualitative differences in neural activity between objects within the affordance boundary and those beyond replicated our behavioral findings. This convergent evidence reinforced our claim that objects were discretized into two broad categories along the continuous size axis, with affordance only being manifested for objects within the boundary.”

      (3) I found the MRI ROI selection and definition a little arbitrary and not really justified, which rendered me even more cautious of the results. Why these particular sensory and motor regions? Why M1 and not PMC or SMA? Why SPL and not other parietal regions? Relatedly, ROIs were defined by thresholding pF and LOC at "around 70%" and SPL and M1 "around 80%", and it is unclear how and why these (different) thresholds were determined.

      R3: Our selection of these specific sensory and motor regions was based on prior literature reporting their distinct contributions to affordance perception (e.g., Borghi, 2005; Sakreida et al., 2016). The pFs was chosen as a representative region of the ventral visual stream, involved in object identification and classification, and the SPL was chosen as a representative region of the dorsal visual stream, involved in object perception and manipulation. The primary motor cortex (M1) has also been reported to be involved in affordance processing (e.g., McDannald et al., 2018), and we chose this region to probe the affordance congruency effect in the motor execution stage of the sense-think-act pathway. We did not choose the premotor cortex (PMC) or the supplementary motor area (SMA) because they have also been proposed to be involved in processes beyond motor execution (e.g., Hertrich et al., 2016; Kantak et al., 2012); if any effect were observed there, it could not be attributed exclusively to motor execution. As for the parietal regions, our choice of the SPL rather than the IPL/IPS is based on a meta-analysis of affordance processing areas in which only the SPL showed consistent activation for both stable and variable affordances (Sakreida et al., 2016). We chose the SPL to capture effects on either type of affordance. We explained these considerations in the revised manuscript:

      (p 14, ln 280) “In addition to the pFs and SPL, we also examined the congruency effect in the lateral occipital cortex (LO), which is involved in object representation (e.g., Grill-Spector et al., 2000; Konkle & Caramazza, 2013) and provides inputs to both the pFs and SPL (Hebart et al., 2018). Meanwhile, the primary motor cortex (M1), which receives inputs from the dorsal stream (Vainio & Ellis, 2020), is involved in affordance processing (e.g., McDannald et al., 2018) and action executions (Binkofski et al., 2002).”

      (p 29, ln 684) “We chose the pFs, LO, SPL, and M1 as ROIs based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016).”

      Regarding ROI thresholding, we apologize for the lack of clarity in reporting the thresholds in the original manuscript. The thresholds differed between ventral regions (from Zhen et al., 2015) and dorsal regions (from Fan et al., 2016) because they come from two different atlases. The former was constructed from probability maps of task-state fMRI activity during a localizer contrast with stationary images and the latter from a parcellation of the brain's functional connectivity; therefore, the numerical values in these two atlases are not comparable. To extract ROIs of comparable sizes, we selected a threshold of 55% for the pFs, 90% for the LO, 78% for the SPL, and 94% for the M1 in the original manuscript.

      To rule out the possibility that the results were distorted by the specific choice of thresholds, we re-ran the 2-by-2 repeated-measures ANOVA with a threshold of 80% for all ROIs (resulting in 456 voxels in the lpFs, 427 voxels in the rpFs, 1667 voxels in the lLO, 999 voxels in the rLO, 661 voxels in the lSPL, 310 voxels in the rSPL, 231 voxels in the lM1, and 327 voxels in the rM1). Our results remained qualitatively the same. A significant interaction between object type and congruency was observed in the pFs (F(1,11) = 24.87, p < .001, η² = .69) and SPL (F(1,11) = 14.62, p = .003, η² = .57). The simple effect analysis revealed the congruency effect solely for objects within the body size range (pFs: p = .003; SPL: p < .001), not for objects beyond (ps > .30). For the M1 and LO, neither significant main effects (ps > .11) nor significant interactions (ps > .20) were found.
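
      As a consistency check on the effect sizes reported above (.69 and .57), the following sketch recomputes them from the F statistics and degrees of freedom. It assumes the reported effect size is partial eta squared, which the text does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Interaction terms reported for the 80% threshold reanalysis, F(1, 11).
pfs = partial_eta_squared(24.87, 1, 11)
spl = partial_eta_squared(14.62, 1, 11)

print(round(pfs, 2), round(spl, 2))  # 0.69 0.57
```

      Under that assumption, both recomputed values agree with the reported figures to two decimal places.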

      We clarified our choice of thresholds in the methods section in the revised manuscript:

      (p 29, ln 686) “Eight ROIs depicted in Fig. 3b were constructed based on the overlap between the whole-brain map activated by both objects within and beyond and corresponding functional atlases (the pFs and LO from Zhen et al., 2015; the SPL and M1 from Fan et al., 2016). To achieve ROIs of similar sizes, we applied varying thresholds to each cortical area: for the pFs and LO, the atlases were thresholded at 55% and 90%, resulting in 266 voxels in the lpFs, 427 in the rpFs, 254 in the lLO and 347 in the rLO; for the SPL and M1, the atlases were thresholded at 78% and 94%, resulting in 661 voxels in the lSPL, 455 in the rSPL, 378 in the lM1, and 449 in the rM1. In the subsequent analysis, homologous areas spanning both cortical hemispheres were merged.”
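
      The ROI construction described in this passage (threshold an atlas map, keep only task-activated voxels, then merge homologous hemispheres) can be sketched as follows. This is a toy illustration, not the authors' pipeline: the maps are short flattened lists rather than 3-D volumes, all values are invented, `build_roi` is a hypothetical helper, and reading "thresholded at 55%" as "atlas value >= 0.55" is an assumption.

```python
# Toy sketch: a real analysis would operate on 3-D image volumes, not short lists.
def build_roi(atlas_map, active_mask, threshold):
    """Indices of voxels that are task-active and whose atlas value meets the threshold."""
    return {
        i
        for i, (value, is_active) in enumerate(zip(atlas_map, active_mask))
        if is_active and value >= threshold
    }


# Invented atlas values (0-1) for a left- and right-hemisphere area over the same voxel grid.
left_atlas = [0.10, 0.60, 0.95, 0.80, 0.50, 0.00]
right_atlas = [0.00, 0.20, 0.40, 0.85, 0.97, 0.70]
# Invented mask of voxels activated by both object types.
active = [True, True, True, False, True, True]

left_roi = build_roi(left_atlas, active, 0.55)
right_roi = build_roi(right_atlas, active, 0.55)
# Homologous areas across hemispheres are merged for the subsequent analysis.
merged_roi = left_roi | right_roi

print(len(left_roi), len(right_roi), len(merged_roi))
```

      Raising the threshold can only shrink an ROI, which is why a single common threshold yields different voxel counts across atlases whose values are on different scales.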

      (4) Discussion and theoretical implications. The authors discuss that the MRI results are consistent with the idea we only represent affordances within body size range. But the interpretation of the behavioural correlation matrices was that there was this similarity also for objects larger than body size, but forming a distinct cluster. I therefore found the interpretation of the MRI data inconsistent with the behavioural findings.

      R4: We speculated that the similarity in action perception among objects beyond the body size range may be due to these objects being similarly conceptualized as ‘environment’, in contrast to the objects within the body size range, which are categorized differently, namely as the ‘objects for the animal.’ Accordingly, in cortical regions involved in object processing, objects conceptualized as ‘environment’ are unlikely to show the congruency effect, unlike objects within the body size range. We have explained this point in the revised manuscript:

      (p 17, ln 370) “…which resonates the embodied influence on the formation of abstract concepts (e.g., Barsalou, 1999; Lakoff & Johnson, 1980) of objects and environment. Consistently, our fMRI data did not show the congruency effect for objects beyond the body size range, distinct from objects within this range, suggesting a categorization influenced by objects’ relative size to the human body.”

      (5) In the discussion, the authors outline how this work is consistent with the idea that conceptual and linguistic knowledge is grounded in sensorimotor systems. But then reference Barsalou. My understanding of Barsalou is the proposition of a connectionist architecture for conceptual representation. I did not think sensorimotor representation was privileged, but rather that all information communicates with all other to constitute a concept.

      R5: We are sorry for the confusion. We do not intend to argue that sensorimotor representation is privileged. Instead, we simply wish to emphasize its engagement in concepts. According to our understanding, Barsalou’s Perceptual Symbol Theory proposes that grounded concepts include sensorimotor information, and conceptual knowledge is grounded in the same neural system that supports action (Barsalou, 1999). This is consistent with our proposal that the affordance boundary locked to an animal’s sensorimotor capacity might give rise to a conceptual-ish representation of object-ness specific to the very animal. We have clarified this point in the introduction and in the discussion of conceptual knowledge and sensorimotor information:

      In the introduction (p 2, ln 59) “…, and the body may serve as a metric that facilitates meaningful engagement with the environment by differentiating objects that are accessible for interactions from those not. Further, grounded cognition theory (see Barsalou, 2008 for a review) suggests that the outputs of such differentiation might transcend sensorimotor processes and integrate into supramodal concepts and language. From this perspective, we proposed two hypotheses...”

      In the discussion (p 18, ln 392) “Indeed, it has been proposed that conceptual knowledge is grounded in the same neural system that supports action (Barsalou, 1999; Glenberg et al., 2013; Wilson & Golonka, 2013), thereby suggesting that sensorimotor information, along with other modal inputs, may be embedded in language (e.g., Casasanto, 2011; Glenberg & Gallese, 2012; Stanfield & Zwaan, 2001), as the grounded theory proposed (see Barsalou, 2008 for a review).”

      (6) More generally, I believe that the impact and implications of this study would be clearer for the reader if the authors could properly entertain an alternative concerning how objects may be represented. Of course, the authors were going to demonstrate that objects more similar in size afforded more similar actions. It was impossible that Ps would ever have responded that aeroplanes afford grasping and balls afford sitting, for instance. What do the authors now believe about object representation that they did not believe before they conducted the study? Which accounts of object representation are now less likely?

      R6: We thank the reviewer for this suggestion. The theoretical motivation of the present study is to explore whether, for continuous action-related physical features (such as object size relative to the agents), affordance perception introduces discontinuity and qualitative dissociation, i.e., to allow the sensorimotor input to be assigned into discrete states/kinds, as representations envisioned by the computationalists; alternatively, whether the activity may directly mirror the input, free from discretization/categorization/abstraction, as proposed by the Replacement proposal of some embodied theories on cognition.

      By addressing this debate, we hoped to shed light on the nature of representation in, and resulting from, vision-for-action processing. Our finding of affordance discontinuity suggests that sensorimotor input undergoes discretization, as implied by the computationalist idea of representation. Further, and not in contradiction to the claims of the embodied theories, these representations do shape processes outside the sensorimotor domain, but only after discretization.

      We have now explained our hypotheses and alternatives explicitly in the revised introduction and discussion:

      In the introduction (p 2, ln 45) “However, the question of how object perception is influenced by the relative size of objects in relation to the human body remains open. Specifically, it is unclear whether this relative size simply acts as a continuous variable for locomotion reference, or if it affects differentiating and organizing object representation based on their ensued affordances.”

      In the discussion (p 14, ln 295) “One long-lasting debate on affordance centers on the distinction between representational and direct perception of affordance. An outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998), which challenges the necessity of representation as posited by computationalism’s cognitive theories (e.g., Fodor, 1975). This hypothesis suggests that input is discretized/categorized and subjected to abstraction or symbolization, creating discrete stand-ins for the input (e.g., representations/states). Such representationalization would lead to a categorization between the affordable (the objects) and those beyond affordance (the environment), in contrast to the perspective offered by embodied theories. The present study probed this ‘representationalization’ of affordance by examining whether affordance perception introduces discontinuity and qualitative dissociation in response to continuous action-related physical features (such as object size relative to the agents), which allows sensorimotor input to be assigned into discrete states/kinds, in line with the representation-based view under the constraints of body size. Alternatively, it assessed whether activity directly mirrors the input, free from discretization/categorization/abstraction, in line with the representation-free view.

      First, our study found evidence demonstrating discretization in affordance perception. Then, through the body imagination experiment, we provided causal evidence suggesting that this discretization originates from sensorimotor interactions with objects rather than amodal sources, such as abstract object concepts independent of agent motor capability. Finally, we demonstrated the supramodality of this embodied discontinuity by leveraging the recent advances in AI. We showed that the discretization in affordance perception is supramodally accessible to disembodied agents such as large language models (LLMs), which lack sensorimotor input but can access linguistic materials built upon discretized representations. These results collectively suggest that sensorimotor input undergoes discretization, as implied in the computationalism’s idea of representation. Note that, these results are not contradictory to the claim of the embodied theories, as these representations do shape processes beyond the sensorimotor domain but after discretization.

      This observed boundary in affordance perception extends the understanding of the discontinuity in perception in response to the continuity of physical inputs (Harnad, 1987; Young et al., 1997).”

      Reviewer #1 (Recommendations For The Authors):

      a) I would recommend providing further justification for why 100-200 cm were used as the cut-offs reflecting acceptable imagined body size. Were these decisions preregistered anywhere? If so, please state.

      Ra: Please see R1.

      b) I would encourage the authors to call the MRI a small pilot study throughout, including in the abstract.

      Rb: We completely agree and have indicated the preliminary nature of this study in the revised version:

      (p 11, ln 236) “To test this speculation, we ran an fMRI experiment with a small number of participants to preliminarily investigate the neural basis of the affordance boundary in the brain by measuring neural activity in the dorsal and ventral visual streams when participants were instructed to evaluate whether an action was affordable by an object (Fig. 3a).”

      c) Please provide much further justification of ROI selection, why these thresholds were chosen, and therefore why they are different across regions.

      Rc: Please see R3.

      d) Further elucidation in the discussion would help the reader interpret the MRI data, which should always be interpreted also in light of the behavioural findings.

      Rd: Please see R4.

      e) The authors may wish to outline precisely what they claim concerning the nature of conceptual/linguistic representation. Is sensorimotor information privileged or just part of the distributed representation of concepts?

      Re: This is a great point. For details of corresponding revision, please see R5.

      f) There are some nods to alternative manners in which we plausibly represent objects (e.g. about what the imagination study tells us) but I think this theoretical progression should be more prominent.

      Rf: We thank the reviewer for this suggestion. For details of corresponding revision, please see R6.

      Reviewer #2 (Public Review):

      (1) A limitation of the tests of LLMs may be that it is not always known what kinds of training material was used to build these models, leading to a possible "black box" problem. Further, presuming that those models are largely trained on previous human-written material, it may not necessarily be theoretically telling that the "judgments" of these models about action-object pairs show human-like discontinuities. Indeed, verbal descriptions of actions are very likely to mainly refer to typical human behaviour, and so the finding that these models demonstrate an affordance discontinuity may simply reflect those statistics, rather than evidence that affordance boundaries can arise independently even without "organism-environment interactions" as the authors claim here.

      R1: We agree that how LLMs work is a “black box”, and thus it is speculative to assume that they possess any human-like ability, because, as the reviewer pointed out, “these models demonstrate an affordance discontinuity may simply reflect those statistics.” Indeed, our manuscript has expressed a similar idea: “We speculated that ChatGPT models may have formed the affordance boundary through a human prism ingrained within its linguistic training corpus. (p 16 ln 338)”. That is, we did not intend to claim that such information is sufficient to replace sensorimotor-based interaction, or to restore human-level capability, for which we indeed speculated that embodied interaction is necessary. In the revised manuscript, we have clarified our stance that the mechanism generating the observed affordance boundary in LLMs might be different from that in human cognition, and urged future studies to explore this possibility:

      (p 18, ln 413) “…, as well as alignment methods used in fine-tuning the model (Ouyang et al., 2022). Nevertheless, caution should be taken when interpreting the capabilities of LLMs like ChatGPT, which are often considered “black boxes.” That is, our observation indicates that some degree of sensorimotor information is embedded within human language materials presumably through linguistic statistics, but it is not sufficient to assert that LLMs have developed a human-like ability to represent affordances. Furthermore, such information alone may be insufficient for LLMs to mimic the characteristics of the affordance perception in biological intelligence. Future studies are needed to elucidate such limitation.”

      Indeed, because of this potential dissociation, our LLM study might bear novel implications for the development of AI agents. We elaborated on them in the revised discussion on LLMs:

      (p 19, ln 427) “…, represents a crucial human cognitive achievement that remains elusive for AI systems. Traditional AI (i.e., task-specific AI) has been confined with narrowly defined tasks, with substantial limitations in adaptability and autonomy. Accordingly, these systems have served primarily as tools for humans to achieve specific outcomes, rather than as autonomous agents capable of independently formulating goals and translating them into actionable plans. In recent years, significant efforts have been directed towards evolving traditional AI into more agent-like entities, especially in domains like navigation, object manipulation, and other interactions with the physical world. Despite these advancements, the capabilities of AI still fall behind human-level intelligence. On the other hand, embodied cognition theories suggest that sensorimotor interactions with the environment are foundational for various cognitive domains. From this point of view, endowing AI with human-level abilities in physical agent-environment interactions might provide an unreplaceable missing piece for achieving Artificial General Intelligence (AGI). This development would significantly facilitate AI’s role in robotics, particularly in actions essential for survival and goal accomplishment, a promising direction for the next breakthrough in AI (Gupta et al., 2021; Smith & Gasser, 2005).

      However, equipping a disembodied AI with the ability for embodied interaction planning within a specific environment remains a complex challenge. By testing the potential representationalization of action possibilities (affordances) in both humans and LLMs, the present study suggests a new approach to enhancing AI’s interaction ability with the environment. For instance, our finding of supramodal affordance representation may indicate a possible pathway for disembodied LLMs to engage in embodied physical interactions with their surroundings. From an optimistic view, these results suggest that LLM-based agents, if appropriately designed, may leverage affordance representations embedded in language to interact with the physical world. Indeed, by clarifying and aligning such representations with the physical constitutes of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may foster the sensorimotor interaction abilities of LLM-based agents. This progression could lead to achieving animal-level interaction abilities with the world, potentially sparking new developments in the field of embodied cognition theories.”

      (2) The authors include a clever manipulation in which participants are asked to judge action-object pairs, having first adopted the imagined size of either a cat or an elephant, showing that the discontinuity in similarity judgments effectively moved to a new boundary closer to the imagined scale than the veridical human scale. The dynamic nature of the discontinuity suggests a different interpretation of the authors' main findings. It may be that action affordance is not a dimension that stably characterises the long-term representation of object kinds, as suggested by the authors' interpretation of their brain findings, for example. Rather these may be computed more dynamically, "on the fly" in response to direct questions (as here) or perhaps during actual action behaviours with objects in the real world.

      R2: We thank the reviewer for pointing out the dynamic nature of affordance perception in our study. This feature indeed reinforces our attribution of the boundary to an affordance-based process rather than a conceptual or semantic process; the latter would predict that action possibilities are a fixed belief about the objects, instead of being dynamically determined by the features of the agent-object dyad. In addition, this dynamic nature does not contradict our interpretation of the observed boundary in affordance perception. Based on this observation, we speculated that continuous input was abstracted or representationalized into discontinuous categories, and that the boundary between these categories was drawn according to the motor capacity of the agent. The finding that the boundary adapts to the manipulation of body schema suggests that the abstraction/representationalization is dynamically updated according to the current belief about the motor capacity and body schema of the animal. We also agree that future studies are needed to examine the dynamics of the abstraction/representationalization of affordance, probably by investigating the evolution of affordance representation during ongoing actual interactions with novel objects or under manipulated motor capability. These points are now addressed in the revision:

      (p 17, ln 380) “Therefore, this finding suggests that the affordance boundary is cognitively penetrable, arguing against the directness of affordance perception (e.g., Gibson, 1979; Greeno, 1994; Prindle et al., 1980) or the exclusive sensorimotor origin of affordances (e.g., Gallagher, 2017; Thompson, 2010; Hutto & Myin, 2012; Chemero, 2013). Further, this finding that the boundary adapted to manipulation on body schema suggests that the abstraction/representationalization may be dynamically updated in response to the current motor capacity and body schema of the agent, suggesting that the affordance-based process is probably determined dynamically by the nature of the agent-object dyads, rather than being a fixed belief about objects. Future studies could explore the dynamics of affordance representationalization, probably by investigating how affordance representations evolve during active interactions with novel objects or under conditions of altered motor capabilities. Finally, our findings also suggest that disembodied conceptual knowledge pertinent to action likely modulates affordance perception.”

      Reviewer #2 (Recommendations For The Authors):

      a) As described, I think the authors could improve their discussion of the LLM work and consider more deeply possible different interpretations of their findings with those models. Are they really providing an independent data point about how objects may be represented, or instead is this a different, indirect way of asking humans the same questions (given the way in which these models are trained)?

      Ra: Please see R1.

      b) Some of the decisions behind the design of the fMRI experiment, and some of the logic of its interpretation, could be made clearer. Why those four objects per se? What kinds of confounds, such as familiarity, or the range of possible relevant actions per object, might need to be considered? Is there the possibility that relative performance on the in-scanner behavioural task may be in part responsible for the findings? Why were those specific regions of interest chosen and not others? The authors find that the dorsal and ventral regions make a univariate distinction between congruent and incongruent trials, but only for human-scale objects, but it was not clear from the framework that the authors adopted why that distinction should go in that direction (e.g. congruent > incongruent) nor why there shouldn't also be a distinction for the "beyond" objects? Finally, might some of these brain questions better be approached with an RSA or similar approach, as that would seem to better map onto the behavioural studies?

      Rb: We thank the reviewer for the detailed suggestions.

      Regarding the fMRI study, we have provided further justification on its rationale in the revised manuscript:

      (p 11, ln 231) “The distinct categories of reported affordances demarcated by the boundary imply that the objects on either side of the boundary may be represented differently in the brain. We thus speculated that the observed behavioral discontinuity is likely underpinned by distinct neural activities, which give rise to these discrete ‘representations’ separated by the boundary.”

      The objects used in the fMRI study were selected with the objective of the fMRI study in mind, which was to provide the neural basis for the affordance discontinuity found in the behavioural experiments. In other words, the fMRI study is not an exploratory experiment, but a validation experiment. To this end, we deliberately selected a small range of common objects to ensure that participants were sufficiently familiar with them, as confirmed through their oral reports. Furthermore, to ensure a fair comparison between the two categories of objects in terms of action possibility range, we predetermined an equal number of congruent and incongruent actions for each category. This arrangement was intended to eliminate any bias that might arise from differing numbers of action choices associated with each category. Therefore, the present object and action sets in the fMRI study, which were based on the behavioural experiments, are sufficient for its purpose.

      Regarding the possibility that the performance of the in-scanner behavioural task may be in part responsible for the findings, we analysed participants’ performance. Not surprisingly, participants demonstrated high consistency and accuracy in their responses:

      Mean (Congruent, Object Within) = 0.991, SD = 0.018;

      Mean (Incongruent, Object Within) = 0.996, SD = 0.007;

      Mean (Congruent, Object Beyond) = 0.996, SD = 0.004;

      Mean (Incongruent, Object Beyond) = 0.998, SD = 0.002

      in all conditions, suggesting constant active engagement with the task. Thus, the in-scanner behaviour is unlikely to account for the lack of a congruency effect for the ‘beyond’ objects observed in the brain.

      Regarding the selection of ROIs, our decision to focus on these specific sensory and motor regions was based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016). The pFs was chosen for its role in object identification and classification, while the SPL was chosen for its involvement in object manipulation. Additionally, the primary motor cortex (M1), which is known to be engaged in affordance processing (e.g., McDannald et al., 2018), was included to investigate the affordance congruency effect during the motor execution stage of the sense-think-act pathway. These considerations are detailed in the revised manuscript:

      (p 14, ln 280) “In addition to the pFs and SPL, we also examined the congruency effect in the lateral occipital cortex (LO), which is involved in object representation (e.g., Grill-Spector et al., 2000; Konkle & Caramazza, 2013) and provides inputs to both the pFs and SPL (Hebart et al., 2018). Meanwhile, the primary motor cortex (M1), which receives inputs from the dorsal stream (Vainio & Ellis, 2020), is involved in affordance processing (e.g., McDannald et al., 2018) and action executions (Binkofski et al., 2002).”

      (p 29, ln 684) “We chose the pFs, LO, SPL, and M1 as ROIs based on existing literature highlighting their distinct contributions to affordance perception (Borghi, 2005; Sakreida et al., 2016).”

      Regarding the congruency effect, we followed the established fMRI paradigm of using the congruency effect as a measure of affordance processing (e.g., Kourtis et al., 2018). The directionality of the contrast in our framework (congruent > incongruent) is grounded in the concept of affordance, in which the mere perception of a graspable object facilitates motor responses that are congruent with certain qualities of the object (e.g., Ellis & Tucker, 2000). In the interaction of congruency by object type, we observed a congruency effect only for objects within, not for objects beyond. We speculate that objects beyond the affordance boundary are generally beyond the motor capacities of the animal in question, being too large for it to manipulate, and thus no congruency effect was found. We have added these clarifications in the revised manuscript:

      (p 11, ln 244) “The congruency effect, derived from the contrast of Congruent versus Incongruent conditions, is a well-established measure of affordance processing (e.g., Kourtis et al., 2018).”

      (p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to the smaller objects. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

      Regarding the RSA analysis, we agree with the reviewer that RSA may offer a more direct comparison with similarities among objects. However, our primary objective in this fMRI study was to explore the neural basis of the affordance boundary observed in the behavioural study, rather than explaining the similarities in neural responses between different objects. For this reason, we did not conduct RSA analysis.

      c) Page 4 Re statistical evaluation of the discontinuity in judgments, the authors might consider a Bayesian approach, which would be stronger than using "all ps > 0.05" to argue that within-boundary similarities are consistent and high.

      Rc: We thank the reviewer for suggesting a Bayesian approach to the significance tests, which has now been added in the revised manuscript:

      In the results (p 4, ln 105) “This trough suggested an affordance boundary between size rank 4 and 5, while affordance similarities between neighboring ranks remained high (rs > 0.45) and did not significantly differ from each other (ps > 0.05, all 𝐵𝐹10 < 10) on either side of the boundary (Fig. 1d, left panel, green lines).”

      In the methods (p 25, ln 597) “Pearson and Filon’s (1898) Z, implemented in R package “cocor” (Diedenhofen & Musch, 2015) was used to evaluate the significance of these similarities (alpha level = .05, one-tail test). For significance tests, Bayesian statistical analyses were conducted using the web version of the “bayesplay” R package (Colling, 2021). Specifically, the data (likelihood) model was specified as a normal distribution, where the correlation coefficients were transformed to Fisher’s z. The null hypothesis was specified as a standard normal distribution centred at zero. Conversely, the alternative hypothesis was specified as a normal distribution centred at 2. Bayes factors (BF10) were calculated and interpreted using the classification scheme suggested by Wagenmakers et al. (2011), wherein a Bayes factor greater than 10 is considered strong evidence for accepting H1 over H0.”
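      As a rough illustration of the kind of Bayes factor described in the quoted methods, the sketch below uses only the Python standard library. It is not the bayesplay implementation. The function names, the prior standard deviation of 1 under H1, the standard error, and the example correlations are all assumptions made for illustration; only the Fisher z transform, the normal likelihood, and the prior centres (0 and 2) come from the text above.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(r)


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bf10_normal(obs: float, se: float,
                h1_mean: float = 2.0, h1_sd: float = 1.0,  # h1_sd is an assumption
                h0_mean: float = 0.0, h0_sd: float = 1.0) -> float:
    """Bayes factor BF10 for a normal likelihood with normal priors.

    With likelihood N(obs | theta, se) and prior N(theta | m, s), the
    marginal likelihood of the observation is N(obs | m, sqrt(se^2 + s^2)).
    """
    marginal_h1 = normal_pdf(obs, h1_mean, math.hypot(se, h1_sd))
    marginal_h0 = normal_pdf(obs, h0_mean, math.hypot(se, h0_sd))
    return marginal_h1 / marginal_h0


# Hypothetical example: difference between two neighbouring-rank similarities
# on the Fisher z scale, with an assumed standard error of 0.3.
z_diff = fisher_z(0.60) - fisher_z(0.50)
print(bf10_normal(z_diff, se=0.3))
```

      Under these assumptions, an observation near zero yields a BF10 below 1 (favouring H0), and a BF10 below 10 falls short of the "strong evidence" threshold cited in the methods.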

      d) Page 4 One question I had about the big objects is whether their internal similarity and dissimilarity to smaller objects, might largely arise if most of the answers about actions for those larger objects are just "no"? This depends on the set of possible actions that were considered: the authors chose 14 from a previous study but did not describe these further or consider possible strengths/limitations of this selection. This is a very important point that needs addressing - to what extent are these findings "fragile" in that they relate only to that specific selection of 14 action kinds?

      Rd: The action judgements for objects beyond body size were not mostly “no”; in fact, there was no significant difference between the average action possibilities for objects beyond (25%) and within (26%). Rather, the dissimilarity between objects within and those beyond likely arose from differences in the most plausible action sets associated with them. For example, the top three actions related to objects within are “grasp”, “hold” and “throw”, while those related to objects beyond are “sit”, “lift” and “stand”, as stated in our original manuscript: “A further analysis on the affordances separated by the boundary revealed that objects within human body size range were primarily subjected to hand-related actions such as grasping, holding and throwing. These affordances typically involve object manipulation with humans’ effectors. In contrast, objects beyond the size range of human body predominantly afforded actions such as sitting and standing, which typically require locomotion or posture change of the whole body around or within the objects (p 11 ln 229)”.

      Regarding the validity of action selection, the selection of the objects and affordances in this study was guided by two key criteria. First, the objects were selected from the dataset published in Konkle and Oliva's study (2011), which systematically investigates the effect of object size on object recognition. Therefore, the range of object sizes, from 14 cm to 7,618 cm, is well-calibrated and represents a typical array of object sizes found in the real world. Second, the actions were selected to cover a wide range of daily human-object/environment interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017). Thus, this set of objects and actions is sufficiently representative of typical human experience. In revision, we have clarified these two criteria in the methods section:

      (p 22, ln 517) “The full list of objects, their diagonal size, and size rankings were provided in Supplementary Table S6. The objects were selected from the dataset in Konkle and Oliva’s study (2011) to cover typic object sizes in the world (ranging from 14 cm to 7,618 cm), and actions related to these objects were selected to span a spectrum of daily humans-objects/environments interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017).”

      Having said this, we agree with the reviewer that a larger set of objects and actions will facilitate finer localization of the representational discontinuity, which can be addressed in future studies:

      (p 16, ln 344): “…, due to their impracticality for direct interactions. Future studies should incorporate a broader range of objects and a more comprehensive set of affordances for finer delineation of the representational discontinuity between objects and the environment.”

      e) Page 12 "no region showed the congruency effect for objects beyond the body size" in a whole brain analysis. What about a similar analysis for the human-scale objects? We must also keep in mind that with N=12 there may be relatively little power to detect such effects at the random-effects level, so this null finding may not be very informative.

      Re: We thank the reviewer for this advice. The whole brain analysis on the congruency effect for human-scale objects (objects within) has now been included in the supplementary materials (please see Author response figure 1d (New Supplementary Fig. S4d) and Author response table 1 (New Supplementary Table S5) below).

      Author response image 1.

      Significant brain activations of different contrasts in the whole-brain level analysis. a, the effect of object type, positive values (warm color) indicated higher activation for objects within than objects beyond and negative values (cold color) indicated the opposite. b, the effect of congruency, positive values indicated higher activation in congruent than incongruent condition. c, the effect of interaction between object type and congruency, positive values indicated the larger congruency effect for objects within than beyond. d, the congruency effect for objects within. All contrasts were corrected with cluster-level correction at p < .05. The detailed cluster-level results for each contrast map can be found in Supplementary Table S2 to S5.

      Author response table 1.

      Cortical regions showing significant congruency effect (congruent versus incongruent) for objects within, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected)

      Regarding the power of the fMRI study, we would like to clarify that the critical test of this fMRI study is the two-way interaction of congruency by object size, rather than the (null) congruency effect for objects beyond. Having said this, we agree that the sample size is small, which might limit the statistical power of the fMRI study. In the revision, we have now acknowledged this issue explicitly:

      (p 16, ln 354) “…supporting the idea that affordance is typically represented only for objects within the body size range. While it is acknowledged that the sample size of the fMRI study was small (12 participants), necessitating cautious interpretation of its results, the observed neural-level affordance discontinuity is notable. That is, qualitative differences in neural activity between objects within the affordance boundary and those beyond replicated our behavior findings. This convergent evidence reinforced our claim that objects were discretized into two broad categories along the continuous size axis, with affordance only being manifested for objects within the boundary.”

      f) Page 14 [the fMRI findings] "suggest that affordance perception likely requires perceptual processing and is not necessarily reflected in motor execution". This seems a large leap to make from a relatively basic experiment that tests only a small set of (arbitrarily chosen) objects and actions. It's important to keep in mind too that none of the studies here actually asked participants to interact with objects; that objects were shown as 2D images; and that the differences between real-world sizes of objects were greatly condensed by the way they are scaled for presentation on a computer screen (and such scaling is probably greater for the larger-than-human objects).

      Rf: The action-congruency judgement task is widely used in studies of affordance processing (e.g., Kourtis et al., 2018; Peelen & Caramazza, 2012), as is the practice of using 2D rather than 3D objects without actual interaction (e.g., Peelen & Caramazza, 2012; Matić et al., 2020). However, we are aware that alternative practices exist in the field, and we agree that it would be interesting for future studies to test whether actual interaction and 3D object presentation would change the affordance boundary observed in our study.

      Our inference “affordance perception likely requires perceptual processing and is not necessarily reflected in motor execution” was based on the fMRI finding that the congruency effect was present only in cortical regions thought to be engaged in perceptual processing, but not in M1, which is associated with motor execution. This significant two-way interaction pointed to the possibility that affordance processing may not necessarily manifest in motor execution.

      We acknowledge the scaling issue inherent in all laboratory experiments, but we doubt that it significantly influenced our results. In fact, it is a common practice in studies on object size to present objects of different physical sizes as constantly sized images on a screen (e.g., Konkle & Oliva, 2012; Huang et al., 2022). Moreover, scaling does not change the smoothness of object sizes, whereas the affordance boundary represents a singularity point that disrupts this smoothness. Finally, regarding the limited variety of objects and actions, please see Rd.

      g) Page 15 Why are larger objects "less interesting"? They have important implications for navigation, for example?

      Rg: We are sorry for the confusion. Our intention was to express that objects beyond the affordance boundary are generally beyond the motor capacities of the animal in question. As such, compared to smaller objects within the environment, these larger objects may not typically be considered as potential targets for manipulation. We have now corrected the wording in the revised text:

      (p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to smaller objects in the environment. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

      h) Page 15 At several places I wondered whether the authors were arguing against a straw man. E.g. "existing psychological studies...define objects in a disembodied manner..." but no citations are given on this point, nor do the authors describe previous theoretical positions that would make a strong counter-claim to the one advocated here.

      Rh: We are sorry for not presenting our argument clearly. Previous studies often define the object space based on object features alone, such as absolute size or function, without reference to the knowledge and the abilities of the agent (e.g., de Beeck et al., 2008; Konkle & Oliva, 2011). This perspective overlooks the importance of the features of the animal-object pairs. Gibson (1979) highlighted that an object’s affordance, which includes all action possibilities it offers to an animal, is determined by the object’s size relative to the animal’s size, rather than its real-world size. Under this embodied view, we argue that the object space is better defined by the features of the agent-object system, and this is the primary assumption and motivation of the present study. We have now clarified this point and added the references in the revision:

      (p 2, ln 35) “A contemporary interpretation of this statement is the embodied theory of cognition (e.g., Chemero, 2013; Gallagher, 2017; Gibbs, 2005; Wilson, 2002; Varela et al., 2017), which, diverging from the belief that size and shape are inherent object features (e.g., de Beeck et al., 2008; Konkle & Oliva, 2011), posits that human body scale (e.g., size) constrains the perception of objects and the generation of motor responses.”

      (p 17, ln 365) “Existing psychological studies, especially in the field of vision, define objects in a disembodied manner, primarily relying on their physical properties such as shape (e.g., de Beeck et al., 2008) and absolute size (e.g., Konkle & Oliva, 2011).”

      Reviewer #3 (Public Review):

      (1) Even after several readings, it is not entirely clear to me what the authors are proposing and to what extent the conducted work actually speaks to this. In the introduction, the authors write that they seek to test if body size serves not merely as a reference for object manipulation but also "plays a pivotal role in shaping the representation of objects." This motivation seems rather vague and it is not clear to me how it could be falsified.

      Similarly, in the discussion, the authors write that large objects do not receive "proper affordance representation," and are "not the range of objects with which the animal is intrinsically inclined to interact, but probably considered a less interesting component of the environment." This statement seems similarly vague and completely beyond the collected data, which did not assess object discriminability or motivational values.

      Overall, the lack of theoretical precision makes it difficult to judge the appropriateness of the approaches and the persuasiveness of the obtained results. This is partly due to the fact that the authors do not spell out all of their theoretical assumptions in the introduction but insert new "speculations" to motivate the corresponding parts of the results section. I would strongly suggest clarifying the theoretical rationale and explaining in more detail how the chosen experiments allow them to test falsifiable predictions.

      R1: We are sorry for the confusion about the theoretical motivation and rationale. Our motivation stems from the long-standing debate regarding the representation versus direct perception of affordance. That is, we tested whether object affordance would simply covary with continuous constraints such as object size, in line with the representation-free view, or whether affordance would be ‘representationalized’ under the constraint of body size, in line with the representation-based view. In revision, we have clarified the motivation and its relation to our approach:

      In the introduction (p 2, ln 45): “However, the question of how object perception is influenced by the relative size of objects in relation to the human body remains open. Specifically, it is unclear whether this relative size simply acts as a continuous variable for locomotion reference, or if it affects differentiating and organizing object representations based on their ensued affordances.”

      In the discussion (p 14, ln 295): “One long-lasting debate on affordance centers on the distinction between representational and direct perception of affordance. An outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998), which challenges the necessity of representation as posited by computationalism’s cognitive theories (e.g., Fodor, 1975). This hypothesis suggests that input is discretized/categorized and subjected to abstraction or symbolization, creating discrete stand-ins for the input (e.g., representations/states). Such representationalization would lead to a categorization between the affordable (the objects) and those beyond affordance (the environment). Accordingly, computational theories propose the emergence of affordance perception, in contrast to the perspective offered by embodied theories. The present study probed this ‘representationalization’ of affordance by examining whether affordance perception introduces discontinuity and qualitative dissociation in response to continuous action-related physical features (such as object size relative to the agents), which allows sensorimotor input to be assigned into discrete states/kinds, in line with the representation-based view under the constraints of body size. Alternatively, it assessed whether activity directly mirrors the input, free from discretization/categorization/abstraction, in line with the representation-free view.

      First, our study found evidence demonstrating discretization in affordance perception. Then, through the body imagination experiment, we provided causal evidence suggesting that this discretization originates from sensorimotor interactions with objects rather than amodal sources, such as abstract object concepts independent of agent motor capability. Finally, we demonstrated the supramodality of this embodied discontinuity by leveraging the recent advances in AI. We showed that the discretization in affordance perception is supramodally accessible to disembodied agents such as large language models (LLMs), which lack sensorimotor input but can access linguistic materials built upon discretized representations. These results collectively suggest that sensorimotor input undergoes discretization, as implied in the computationalism’s idea of representation. Note that, these results are not contradictory to the claim of the embodied theories, as these representations do shape processes beyond the sensorimotor domain but after discretization.

      The observed boundary in affordance perception extends the understanding of the discontinuity in perception in response to the continuity of physical inputs (Harnad, 1987; Young et al., 1997).”

      We are also sorry for the confusion about the expression “proper affordance representation”. We intended to express that the neural responses to objects beyond the boundary in the whole brain failed to reflect affordance congruency, and therefore did not show evidence of affordance processing. We have clarified this expression in the revised manuscript:

      (p 12, ln 265) “Taken together, the affordance boundary not only separated the objects into two categories based on their relative size to human body, but also delineated the range of objects that evoked neural representations associated with affordance processing.”

      Finally, we agree with the reviewer that expressions such as “not…inclined to interact” and “probably considered a less interesting component of the environment” may be misleading. Rather, we intended to express that objects beyond the affordance boundary are generally beyond the motor capacities of the animal in question: being too large for that animal to manipulate, they may not be typical targets for manipulation compared with smaller objects in the environment. We have revised these expressions in the manuscript and clarified their speculative nature:

      (p 16, ln 340) “In contrast, objects larger than that range typically surpass the animal’s motor capabilities, rendering them too cumbersome for effective manipulation. Consequently, these larger objects are less likely to be considered as typical targets for manipulation by the animal, as opposed to the smaller objects. That is, they are perceived not as the “objects” in the animal’s eye, but as part of the background environment, due to their impracticality for direct interactions.”

      (2) The authors used only a very small set of objects and affordances in their study and they do not describe in sufficient detail how these stimuli were selected. This renders the results rather exploratory and clearly limits their potential to discover general principles of human perception. Much larger sets of objects and affordances and explicit data-driven approaches for their selection would provide a far more convincing approach and allow the authors to rule out that their results are just a consequence of the selected set of objects and actions.

      R2: The selection of the objects and affordances in this study was guided by two key criteria. First, the objects were selected from the dataset published in Konkle and Oliva's study (2011), which systematically investigates the effect of object size on object recognition. Therefore, the range of object sizes, from 14 cm to 7,618 cm, is well-calibrated and represents a typical array of object sizes found in the real world. Second, the actions were selected to cover a wide range of daily human-object/environment interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017). Thus, this set of objects and actions is sufficiently representative of typical human experience. In revision, we have clarified these two criteria in the methods section:

      (p 22, ln 517) “The full list of objects, their diagonal sizes, and size rankings were provided in Supplementary Table S6. The objects were selected from the dataset in Konkle and Oliva’s study (2011) to cover typic object sizes in the world (ranging from 14 cm to 7,618 cm), and actions related to these objects were selected to span a spectrum of daily humans-objects/environments interactions, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), based on the Kinetics Human Action Video Dataset (Kay et al., 2017).”

      Having said this, we agree with the reviewer that a larger set of objects and actions will facilitate finer localization of the representational discontinuity, which can be addressed in future studies:

      (p 16, ln 344): “…, due to their impracticality for direct interactions. Future studies should incorporate a broader range of objects and a more comprehensive set of affordances for finer delineation of the representational discontinuity between objects and the environment.”

      (3) Relatedly, the authors could be more thorough in ruling out potential alternative explanations. Object size likely correlates with other variables that could shape human similarity judgments and the estimated boundary is quite broad (depending on the method, either between 80 and 150 cm or between 105 to 130 cm). More precise estimates of the boundary and more rigorous tests of alternative explanations would add a lot to strengthen the authors' interpretation.

      R3: We agree with the reviewer that correlation analyses alone cannot rule out alternative explanations, as any variable co-varying with object size might also affect affordance perception. Therefore, our study experimentally manipulated the imagined body size while keeping other variables constant across conditions. This approach provided evidence of a causal connection between body size and affordance perception, effectively ruling out alternative explanations. In revision, the rationale for experimentally manipulating imagined body size has been clarified:

      (p 7, ln 152): “One may argue that the location of the affordance boundary coincidentally fell within the range of human body size, rather than being directly influenced by it. To rule out this possibility, we directly manipulated participants’ body schema, referring to an experiential and dynamic functioning of the living body within its environment (Merleau-Ponty & Smith, 1962). This allowed us to examine whether the affordance boundary would shift in response to changes in the imagined body size. This experimental approach was able to establish a causal link between body size and affordance boundary, as other potential factors remained constant. Specifically, we instructed a new group of participants to imagine themselves as small as a cat (typical diagonal size: 77cm, size rank 4, referred to as the “cat condition”), and another new group to envision themselves as large as an elephant (typical diagonal size: 577 cm, size rank 7, referred to as the “elephant condition”) throughout the task (Fig. 2a).”

      Meanwhile, because these analyses are correlational, locating the boundary more precisely would not by itself help rule out alternative explanations. However, we agree that future studies are needed to incorporate a broader range of objects and a more comprehensive set of affordances. For details, please see R2.

      (4) Even though the division of the set of objects into two homogenous clusters appears defensible, based on visual inspection of the results, the authors should consider using more formal analysis to justify their interpretation of the data. A variety of metrics exist for cluster analysis (e.g., variation of information, silhouette values) and solutions are typically justified by convergent evidence across different metrics. I would recommend the authors consider using a more formal approach to their cluster definition using some of those metrics.

      R4: We thank the reviewer for the suggestion. We performed three analyses on this point, all of which consistently indicated the division of objects into two distinct groups along the object size axis.

      First, a hierarchical clustering analysis of the heatmaps revealed a two-main-cluster structure, which is now detailed in the revised methods section (p 25, ln 589) “A hierarchical clustering analysis was performed, employing the seaborn clustermap method with Euclidean distance and Complete linkage (Waskom, 2021).”
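      To make the clustering step concrete, here is a minimal standard-library sketch of agglomerative clustering with Euclidean distance and complete linkage. It is not the authors' code: in seaborn, the quoted settings would correspond to calling `seaborn.clustermap(data, metric="euclidean", method="complete")`, whereas the function below is a hypothetical stand-in written for illustration, and the affordance profiles are made-up toy values.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    Repeatedly merges the two clusters whose farthest members are closest,
    until n_clusters remain. Returns clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: cluster_distance(clusters[pq[0]],
                                                          clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy data (made up): rows are objects, columns are the proportion of
# participants endorsing each of four actions.
toy_profiles = [
    [0.9, 0.8, 0.1, 0.0],  # small objects: mostly hand-related actions
    [0.8, 0.9, 0.2, 0.1],
    [0.7, 0.7, 0.1, 0.2],
    [0.1, 0.0, 0.9, 0.8],  # large objects: mostly whole-body actions
    [0.2, 0.1, 0.8, 0.9],
    [0.0, 0.2, 0.7, 0.9],
]
print(complete_linkage(toy_profiles, n_clusters=2))
```

      With this toy input, the small and large objects end up in separate clusters, which is the kind of two-main-cluster structure described above.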

      Second, the similarity in affordances between neighbouring size ranks revealed the same two-main-cluster structure. In this analysis, each object was assigned a real-world size rank, and then Pearson’s correlation was calculated as the affordance similarity index for each pair of neighbouring size ranks to assess how similar the perceived affordances were between these ranks. Our results showed a clear trough in affordance similarity, with the lowest point approaching zero, while affordance similarities between neighbouring ranks on either side of the boundary remained high, confirming the observation that objects formed two groups based on affordance similarity.
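      The neighbouring-rank analysis can be sketched as follows. This is an illustrative reconstruction rather than the authors' script: the helper names and the per-rank affordance profiles are hypothetical, and the numbers are toy values chosen only to show where a trough would appear.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def neighbour_similarity(rank_profiles):
    """Affordance similarity for each pair of adjacent size ranks."""
    return [pearson(a, b) for a, b in zip(rank_profiles, rank_profiles[1:])]


def trough_index(similarities):
    """Index of the lowest similarity; index i separates rank i+1 from rank i+2."""
    return min(range(len(similarities)), key=similarities.__getitem__)


# Toy data (made up): one averaged affordance profile per size rank.
rank_profiles = [
    [0.9, 0.8, 0.1, 0.0],  # rank 1
    [0.8, 0.9, 0.2, 0.1],  # rank 2
    [0.1, 0.2, 0.8, 0.9],  # rank 3
    [0.0, 0.1, 0.9, 0.8],  # rank 4
]
similarities = neighbour_similarity(rank_profiles)
print(similarities, trough_index(similarities))
```

      In this toy case the lowest similarity falls between ranks 2 and 3, while similarities on either side stay high, mirroring the pattern reported for the boundary.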

      Finally, we analysed silhouette values for this clustering analysis, where $a_i$ represents the mean intra-cluster distance, and $b_i$ represents the mean nearest-cluster distance for each data point $i$. The silhouette coefficient is calculated as (Rousseeuw, 1987):

      $$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

      The silhouette analysis revealed that the maximum silhouette coefficient corresponded to a cluster number of two, further confirming the two-cluster structure (please see Author response table 2 below).
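      For readers who want to see how such a silhouette comparison works, the following standard-library sketch scores two candidate labellings of made-up data. It does not run k-means and does not reproduce the values in the response table; the function names, the toy profiles, and the labellings are assumptions for illustration, and singleton clusters are given a score of 0 by convention.

```python
import math
from collections import defaultdict


def silhouette_scores(points, labels):
    """Per-point silhouette s_i = (b_i - a_i) / max(a_i, b_i) (Rousseeuw, 1987).

    Requires at least two distinct cluster labels.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a_i: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data (made up): six objects described by four action endorsements.
profiles = [
    [0.9, 0.8, 0.1, 0.0], [0.8, 0.9, 0.2, 0.1], [0.7, 0.7, 0.1, 0.2],
    [0.1, 0.0, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9], [0.0, 0.2, 0.7, 0.9],
]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]
print(mean_silhouette(profiles, two_clusters),
      mean_silhouette(profiles, three_clusters))
```

      In this toy example the two-cluster labelling scores higher than the three-cluster one, which is the sense in which a maximum silhouette coefficient at k = 2 supports a two-cluster structure.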

      Author response table 2.

      The silhouette values of a k-means clustering when k (number of clusters) = 2 to 10

      (5) While I appreciate the manipulation of imagined body size, as a way to solidify the link between body size and affordance perception, I find it unfortunate that this is implemented in a between-subjects design, as this clearly leaves open the possibility of pre-existing differences between groups. I certainly disagree with the authors' statement that their findings suggest "a causal link between body size and affordance perception."

      R5: The between-subjects design in the imagination experiment was employed to prevent contamination between conditions. Specifically, after imagining oneself as a particular size, it can be challenging to immediately transition to envisioning a different body size. In addition, participating sequentially in two conditions that differ only in imagined body size may lead to undesirable response strategies, such as deliberately altering responses to the same objects across conditions. The reason for employing the between-subjects design is now clarified in the revised text (p 7, ln 161): “A between-subject design was adopted to minimize contamination between conditions. This manipulation was effective, as evidenced by the participants’ reported imagined heights in the cat condition being 42 cm (SD = 25.6) and 450 cm (SD = 426.8) in the elephant condition on average, respectively, when debriefed at the end of the task.”

      Further, to address the concern that “pre-existing differences between groups” could generate this result, we adhered to standard protocols such as random assignment of participants to conditions (cat-size versus elephant-size). Moreover, experimentally manipulating one variable (i.e., body schema) to observe its effect on another variable (i.e., affordance boundary) is the standard method for establishing causal relationships between variables. We are not aware of a better way to achieve this objective.

      (6) The use of LLMs in the current study is not clearly motivated and I find it hard to understand what exactly the authors are trying to test through their inclusion. As noted above, I think that the authors should discuss the putative roles of conceptual knowledge, language, and sensorimotor experience already in the introduction to avoid ambiguity about the derived predictions and the chosen methodology. As it currently stands, I find it hard to discern how the presence of perceptual boundaries in LLMs could constitute evidence for affordance-based perception.

      R6: The motivation for including LLMs is to test the supramodality of the embodied discontinuity found in the behavioral experiments, that is, whether this discontinuity is accessible beyond the sensorimotor domain. To do this, we leveraged recent advances in AI and tested whether the discretization observed in affordance perception is supramodally accessible to disembodied agents, such as large language models (LLMs), which lack access to sensorimotor input and only have access to linguistic materials built upon discretized representations. The theoretical motivation and rationale regarding the LLM study are now included in the introduction and discussion:

      In the introduction (p 2, ln 59) “…, and the body may serve as a metric that facilitates meaningful engagement with the environment by differentiating objects that are accessible for interactions from those not. Further, grounded cognition theory (see Barsalou, 2008 for a review) suggests that the outputs of such differentiation might transcend sensorimotor processes and integrate into supramodal concepts and language. From this perspective, we proposed two hypotheses...”

      In the introduction (p 3, ln 70) “Notably, the affordance boundary varied in response to the imagined body sizes and showed supramodality. It could also be attained solely through language, as evidenced by the large language model (LLM), ChatGPT (OpenAI, 2022).”

      For details in the discussion, please see R1.

      (7) Along the same lines, the fMRI study also provides very limited evidence to support the authors' claims. The use of congruency effects as a way of probing affordance perception is not well motivated. What exactly can we infer from the fact a region may be more active when an object is paired with an activity that the object doesn't afford? The claim that "only the affordances of objects within the range of body size were represented in the brain" certainly seems far beyond the data.

      R7: In our study, we followed the established fMRI research paradigm of employing the congruency effect as a measure of affordance processing (e.g., Kourtis et al., 2018). The choice of this paradigm has now been clarified in the revised manuscript (p 11, ln 244): “The congruency effect, derived from the contrast of Congruent versus Incongruent conditions, is a well-established measure of affordance processing (e.g., Kourtis et al., 2018).”

      The statement that “only the affordances of objects within the range of body size were represented in the brain” is based on the observed interaction of congruency by object size. In the revised text, we have weakened this statement to better align with the direct implications of the interaction effect (p 1 ln 22): “A subsequent fMRI experiment revealed evidence of affordance processing exclusively for objects within the body size range, but not for those beyond. This suggests that only objects capable of being manipulated are the objects capable of offering affordance in the eyes of an organism.”

      (8) Importantly (related to my comments under 2) above), the very small set of objects and affordances in this experiment heavily complicates any conclusions about object size being the crucial variable determining the occurrence of congruency effects.

      R8: The objective of the fMRI study was to provide the neural basis for the affordance discontinuity found in behaviour experiments. In other words, the fMRI study is not an exploratory experiment, and therefore, the present object and action sets, which are based on the behaviour experiments, are sufficient.

      (9) I would also suggest providing a more comprehensive illustration of the results (including the effects of CONGRUENCY, OBJECT SIZE, and their interaction at the whole-brain level).

      R9: We agree, and in the revision we have now included these analyses in the supplementary material (p 30, ln 711): “For the whole-brain analyses on the congruency effect, the object size effect, and their interaction, see Supplementary Fig. S4 and Table S2 to S5.” Please see Author response image 2 (New Supplementary Fig. S4) and Author response tables 3 to 5 (New Supplementary Table S2 to S4) below.

      Author response image 2.

      Significant brain activations of different contrasts in the whole-brain level analysis. a, the effect of object type, positive values (warm color) indicated higher activation for objects within than objects beyond and negative values (cold color) indicated the opposite. b, the effect of congruency, positive values indicated higher activation in congruent than incongruent condition. c, the effect of interaction between object type and congruency, positive values indicated the larger congruency effect for objects within than beyond. d, the congruency effect for objects within. All contrasts were corrected with cluster-level correction at p < .05. The detailed cluster-level results for each contrast map can be found in Supplementary Table S2 to S5.

      Author response table 3.

      Cortical regions reaching significance in the contrasts of (A) objects within versus object beyond and (B) objects beyond versus objects within, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

      Author response table 4.

      Cortical regions reaching significance in contrasts of (A) congruent versus incongruent and (B) incongruent versus congruent, whole-brain analysis (R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected).

      Author response table 5.

      Review Table 5 (New Supplementary Table S4). Cortical regions showing significant interaction between object type and congruency, whole-brain analysis (OW = Objects within, OB = Objects beyond; R = right hemisphere, L = left hemisphere; Z > 2.3, p = 0.05, cluster corrected)

      Reviewer #3 (Recommendations For The Authors):

      a) Clarify all theoretical assumptions already within the introduction and specify how the predictions are tested (and how they could be falsified).

      Ra: Please see R1.

      b) Explain how the chosen experimental approach relates to the theoretical questions under investigation (e.g., it is not clear to me how affordance similarity ratings can inform inference about which part of the environment is perceived as more or less manipulable).

      Rb: We thank the reviewer for the suggestion, and the theoretical motivation and rationale are now clarified. For details, please see R1.

      c) Include a much larger set of objects and affordances in the behavioural experiments (that is more generalizable and also permits a more precise estimation of the boundary), and use a more rigorous methodology to justify a particular cluster solution.

      Rc: Please see R2 for the limited variance of objects and actions, and R4 for more analyses on the boundary.

      d) Clearly motivate what the use of LLMs can contribute to the study of affordance perception.

      Rd: Please see R6.

      e) Clearly motivate why congruency effects are thought to index "affordance representation in the brain"

      Re: Please see R7.

      e) Include a much larger set of objects and affordances in the fMRI study.

      Re: Please see R7.

      f) Consider toning down the main conclusions based on the limitations outlined above.

      Rf: We have toned down the main conclusions accordingly.

      We are profoundly grateful for the insightful comments and suggestions provided by the three reviewers, which have greatly improved the quality of this manuscript.

      References

      Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22(4), 637-660.

      de Beeck, H. P. O., Torfs, K., & Wagemans, J. (2008). Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. Journal of Neuroscience, 28(40), 10111-10123.

      Borghi, A. M. (2005). Object concepts and action. Grounding cognition: The role of perception and action in memory, language, and thinking, 8-34.

      Colling, L. J. (2021). ljcolling/go-bayesfactor: (Version v0.9.0). Zenodo. doi: 10.5281/zenodo.4642331

      Crawley, J. A. H., Mumby, H. S., Chapman, S. N., Lahdenperä, M., Mar, K. U., Htut, W., ... & Lummaa, V. (2017). Is bigger better? The relationship between size and reproduction in female Asian elephants. Journal of Evolutionary Biology, 30(10), 1836-1845.

      Ellis, R., & Tucker, M. (2000). Micro‐affordance: The potentiation of components of action by seen objects. British Journal of Psychology, 91(4), 451-471.

      Fan, L., Li, H., Zhuo, J., Zhang, Y., Wang, J., Chen, L., ... & Jiang, T. (2016). The human brainnetome atlas: a new brain atlas based on connectional architecture. Cerebral Cortex, 26(8), 3508-3526.

      Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

      Gibson, J. J. (1979). The ecological approach to visual perception: Classic edition.

      Hertrich, I., Dietrich, S., & Ackermann, H. (2016). The role of the supplementary motor area for speech and language processing. Neuroscience & Biobehavioral Reviews, 68, 602-610.

      Huang, T., Song, Y., & Liu, J. (2022). Real-world size of objects serves as an axis of object space. Communications Biology, 5(1), 1-12.

      Kantak, S. S., Stinear, J. W., Buch, E. R., & Cohen, L. G. (2012). Rewiring the brain: potential role of the premotor cortex in motor control, learning, and recovery of function following brain injury. Neurorehabilitation and Neural Repair, 26(3), 282-292.

      Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., ... & Zisserman, A. (2017). The kinetics human action video dataset. arXiv preprint arXiv:1705.06950.

      Konkle, T., & Oliva, A. (2011). Canonical visual size for real-world objects. Journal of Experimental Psychology: human perception and performance, 37(1), 23.

      Kourtis, D., Vandemaele, P., & Vingerhoets, G. (2018). Concurrent cortical representations of function-and size-related object affordances: an fMRI study. Cognitive, Affective, & Behavioral Neuroscience, 18, 1221-1232.

      Matić, K., de Beeck, H. O., & Bracci, S. (2020). It's not all about looks: The role of object shape in parietal representations of manual tools. Cortex, 133, 358-370.

      McDannald, D. W., Mansour, M., Rydalch, G., & Bolton, D. A. (2018). Motor affordance for grasping a safety handle. Neuroscience Letters, 683, 131-137.

      NCD Risk Factor Collaboration (NCD-RisC). (2016). A century of trends in adult human height. Elife, 5, e13410.

      Peelen, M. V., & Caramazza, A. (2012). Conceptual object representations in human anterior temporal cortex. Journal of Neuroscience, 32(45), 15728-15736.

      Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

      Sakreida, K., Effnert, I., Thill, S., Menz, M. M., Jirak, D., Eickhoff, C. R., ... & Binkofski, F. (2016). Affordance processing in segregated parieto-frontal dorsal stream sub-pathways. Neuroscience & Biobehavioral Reviews, 69, 89-112.

      Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5), 615-628.

      Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: the case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432.

      Zhen, Z., Yang, Z., Huang, L., Kong, X. Z., Wang, X., Dang, X., ... & Liu, J. (2015). Quantifying interindividual variability and asymmetry of face-selective regions: a probabilistic functional atlas. NeuroImage, 113, 13-25.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1:

      This work by Leclercq and colleagues performed metabolomics on biospecimens collected from 96 patients diagnosed with several types of alcohol use disorders (AUD). The authors discovered strong alterations in circulating glycerophospholipids, bile acids, and some gut microbe-derived metabolites in AUD patients compared to controls. An exciting part of this work is that metabolomics was also performed in frontal cortex of post-mortem brains and cerebrospinal fluid of heavy alcohol users, and some of the same metabolites were seen to be altered in the central nervous system. This is an important study that will form the basis for hypothesis generation around diet-microbe-host interactions in alcohol use disorder. The work is done in a highly rigorous manner, and the rigorously collected human samples are a clear strength of this work. Overall, many new insights may be gained by this work, and it is poised to have a high impact on the field.

      Strengths:

      (1) The rigorously collected patient-derived samples.

      (2) There is high rigor in the metabolomics investigation.

      (3) Statistical analyses are well-described and strong.

      (4) An evident strength is the careful control of taking blood samples at the same time of the day to avoid alterations in meal- and circadian-related fluctuations in metabolites.

      Weaknesses:

      (1) Some validation in animal models of ethanol exposure compared to pair-fed controls would help strengthen causal relationships between metabolites and alterations in the CNS.

      (2) The classification of "heavy alcohol users" based on autopsy reports may not be that accurate.

      (3) Given that most people with alcohol use disorder choose drinking over eating, there needs to be more discussion of how dietary intake (secondary to heavy drinking) most likely has a significant impact on the metabolome.

      We thank this reviewer for his/her encouraging comments and for highlighting the fact that this study is important in the field to generate hypotheses around diet-microbe-host interactions in alcohol use disorder.

      Concerning weakness #1: Regarding validation in animal models of ethanol exposure, we were very careful in our discussion to avoid claiming that the study allowed us to test the causality of these factors. This was certainly not the objective of the present study. Testing causality would indeed probably necessitate animal models, but these models could only test the effects of a single metabolite at a time and could not capture the complexity of the changes occurring in AUD patients. The testing of individual metabolites would be a totally different topic. Hence, we do not feel comfortable conducting rodent experiments, for several reasons. First, AUD is a very complex pathology with physiological and psychological/psychiatric alterations that are obviously difficult to reproduce in animal models. Second, as mentioned by the reviewer, AUD pathology spontaneously leads to nutritional deficits, including significant reductions in carbohydrate, lipid, protein and fiber intake. We have recently published a paper in which we conducted detailed dietary anamneses and described the changes in food habits in AUD patients (Amadieu et al., 2021). As explained below, some blood metabolites that are significantly correlated with depression, anxiety and craving belong to the xanthine family, namely theobromine, theophylline, and paraxanthine, which derive from the metabolism of coffee, tea or chocolate (which are not part of the normal diet of mice or rats). Therefore, an experiment in an animal model of ethanol exposure compared to pair-fed controls would omit the important impact of nutrition on the blood metabolome and consequently would not mimic human AUD pathology. In addition, considering the European Directive 2010/63/EU (on the protection of animals used for scientific purposes), which aims at reducing (as well as refining and replacing) the use of animals in experiments, it is extremely difficult to justify, from an ethical point of view, the need to reproduce human results in an animal model that would not be able to mimic the nutritional, physiological and psychological alterations of alcohol use disorder.

      Concerning weakness #2: The classification of subjects into the group with a history of heavy alcohol use was not based solely on autopsy records, but also on medical history, i.e. a diagnosis of alcohol-related disease (ICD-10 codes F10.X, G31.2, G62.1, G72.1, I42.6, K70.0-K70.4, K70.9, and K86.0) or signs of heavy alcohol use in the clinical or laboratory findings (e.g., increased levels of gamma-glutamyl transferase, mean corpuscular volume, or carbohydrate-deficient transferrin), as stated in the methods section of the manuscript. In Finland, the medical records from the whole life of the subjects are available. We consider that receiving a diagnosis of an alcohol-related disease is a clear sign of a history of heavy alcohol use.

      Concerning weakness #3: As explained above, we agree with the reviewer that AUD is not only “drinking alcohol” but is also associated with a reduction in food intake that obviously influenced the metabolomics data presented in this study. We have therefore added to the results section some previously unpublished data on key nutrients modified by alcohol intake, and we refer to those data and their link with metabolomics in the discussion section:

      Results section page 8, Line 153-155. This sentence has been added:

      “The changes in metabolites belonging to the xanthine family during alcohol withdrawal could be explained by the changes in dietary intake of coffee, tea and chocolate (see Fig S5).”

      Discussion section: Page 11, Line 235-240.

      “Interestingly, the caffeine metabolites belonging to the xanthine family such as paraxanthine, theophylline and theobromine that were decreased at baseline in AUD patients compared to controls, increased significantly during alcohol withdrawal to reach the levels of healthy controls. Changes in dietary intake of coffee, tea and chocolate during alcohol withdrawal could explain these results”.

      In the conclusion, Page 16, Line 354-356, we clearly stated that: “LC-MS metabolomics plasma analysis allowed for the identification of metabolites that were clearly linked to alcohol consumption, and reflected changes in metabolism, alterations of nutritional status, and gut microbial dysbiosis associated with alcohol intake”

      Reference:

      Amadieu C, Leclercq S, Coste V, Thijssen V, Neyrinck AM, Bindels LB, Cani PD, Piessevaux H, Stärkel P, Timary P de, Delzenne NM. 2021. Dietary fiber deficiency as a component of malnutrition associated with psychological alterations in alcohol use disorder. Clinical Nutrition 40:2673–2682. doi:10.1016/j.clnu.2021.03.029

      Leclercq S, Cani PD, Neyrinck AM, Stärkel P, Jamar F, Mikolajczak M, Delzenne NM, de Timary P. 2012. Role of intestinal permeability and inflammation in the biological and behavioral control of alcohol-dependent subjects. Brain Behav Immun 26:911–918. doi:10.1016/j.bbi.2012.04.001

      Leclercq S, De Saeger C, Delzenne N, de Timary P, Stärkel P. 2014a. Role of inflammatory pathways, blood mononuclear cells, and gut-derived bacterial products in alcohol dependence. Biol Psychiatry 76:725–733. doi:10.1016/j.biopsych.2014.02.003

      Leclercq S, Matamoros S, Cani PD, Neyrinck AM, Jamar F, Stärkel P, Windey K, Tremaroli V, Bäckhed F, Verbeke K, de Timary P, Delzenne NM. 2014b. Intestinal permeability, gut-bacterial dysbiosis, and behavioral markers of alcohol-dependence severity. Proc Natl Acad Sci U S A 111:E4485–E4493. doi:10.1073/pnas.1415174111

      Voutilainen T, Kärkkäinen O. 2019. Changes in the Human Metabolome Associated With Alcohol Use: A Review. Alcohol and Alcoholism 54:225–234. doi:10.1093/alcalc/agz030

      Public Reviewer #2:

      The authors carried out the current studies with the justification that the biochemical mechanisms that lead to alcohol addiction are incompletely understood. The topic and question addressed here are impactful and indeed deserve further research. To this end, a metabolomics approach toward investigating the metabolic effects of alcohol use disorder and the effect of alcohol withdrawal in AUD subjects is valuable. However, it is primarily descriptive in nature, and these data alone do not meet the stated goal of investigating biochemical mechanisms of alcohol addiction. The current work's most significant limitation is the cross-sectional study design, though inadequate description and citation of the underlying methodological approaches also hampers interest. Most of the data are cross-sectional in the study design, i.e., alcohol use disorder vs controls. However, it is well established that there is a high degree of interpersonal variation with metabolism, and further, there is somewhat high intra-personal variation in metabolism over time. This means that the relatively small cohort of subjects is unlikely to reflect the broader condition of interest (AUD/withdrawal). The authors report a comparison of a later time-point after alcohol withdrawal (T2) vs. the AUD condition. However, without replicative time points from the control subjects it is difficult to assess how much of these changes are due to withdrawal vs the intra-personal variation described above.

      We agree with the reviewer. Our goal was not to investigate the biochemical mechanisms of AUD but rather to investigate how metabolomics could contribute to the psychological alterations of AUD. The goals of the study are defined at the end of the introduction (Page 4 – Lines 80-91), as follows:

      “The aims of this study are multiple. First, we investigated the impact of severe AUD on the blood metabolome by non-targeted LC-MS metabolomics analysis. Second, we investigated the impact of a short-term alcohol abstinence on the blood metabolome followed by assessing the correlations between the blood metabolome and psychological symptoms developed in AUD patients. Last, we hypothesized that metabolites significantly correlated with depression, anxiety or alcohol craving could potentially have neuroactive properties, and therefore the presence of those neuroactive metabolites was confirmed in the central nervous system using post-mortem analysis of frontal cortex and cerebrospinal fluid of persons with a history of heavy alcohol use. Our data bring new insights on xenobiotics- or microbial-derived neuroactive metabolites, which can represent an interesting strategy to prevent or treat psychiatric disorders such as AUD”.

      Due to the fact that the method section describing the study design is located at the end of the manuscript, we have decided to clarify the methodological approach in the first paragraph of the result section in order to show that in fact, we have performed a longitudinal study (which includes the same group of AUD, tested at two time points – at the beginning and at the end of alcohol withdrawal). This is stated as follows:

      Results section, Page 6, Line 97-99: “All patients were hospitalized for a 3-week detoxification program, and tested at two timepoints: T1 which represents the first day of alcohol withdrawal, and T2 which represents the last day of the detoxification program”.

      We propose to add a figure with a schematic representation of the protocol. We leave it to the editor to decide whether this figure can be added (as supplemental material).

      Author response image 1.

      Schematic representation of the protocol

      We agree with the reviewer that the correlational analysis (between blood metabolites and psychological symptoms) is conducted at one time point (T1) only, which has probably led to the confusion between a cross-sectional and a longitudinal study. In fact, we had a strong motivation to provide correlations at T1 instead of T2. T1, the time of admission, is the moment at which we can take into account the variability of the psychological scores. Indeed, after 3 weeks of abstinence (T2), the levels of depression, anxiety and alcohol craving decreased significantly (as shown in other studies from our group (Leclercq et al., 2014b, 2014a, 2012)) and remained low in AUD patients, with a much lower inter-individual variability, which makes the correlations less consistent.

      We agree with the reviewer that there is high intra- and inter-personal variability in the metabolomics data, which could be due to differences in previous meal intake within and between subjects. While AUD subjects were tested twice (at the beginning and at the end of a 3-week detoxification program), the control subjects were tested only once. Consequently, we did not take into account the intra-personal variability in the control group. The metabolomics changes observed in AUD patients between T1 and T2 are therefore due to alcohol withdrawal but also to intra-personal variability. This is a limitation of the study that we have now added in the discussion section, Page 16, Lines 354-357, as follows:

      “The selection of the control group is always challenging in alcohol research. Here, the healthy subjects were matched for sex, age and BMI but not for smoking status or nutritional intake. Alcohol addiction is a major cause of malnutrition in developed countries and tobacco smoking is more prevalent in alcohol users compared to healthy subjects. These two main confounding factors, although being an integral part of the alcoholic pathology, are known to influence the blood metabolome. Furthermore, another limitation is that the control group was tested only once, while the AUD patients were tested twice (T1 and T2). This means that we do not take into consideration the intra-personal variability of the metabolomics data when interpreting the results of alcohol withdrawal effects”.

      The limitation concerning the small sample size is already mentioned in the discussion section, as follows:

      “Large studies are usually required in metabolomics to observe small and medium size changes. Here, we included only 96 AUD patients, but they were all well characterized and received standardized therapies (for instance, vitB supplementation) during alcohol withdrawal”.

      Overall, there is not enough experimental context to interpret these findings into a biological understanding. For example, while several metabolites are linked with AUD and associated with microbiome or host metabolism based on existing literature, it's unclear from the current study what function these changes have concerning AUD, if any. The authors also argue that alcohol withdrawal shifts the AUD plasma metabolic fingerprint towards healthy controls (line 153). However, this is hard to assess based on the plots provided, since the change in direction shown by the orange data subset considers AUD T2 vs T1, whereas AUD T2 vs Control would represent the claimed shift. The authors would better support these claims by showing this comparison as well as showing all experimental groups (including control subjects) in their multi-dimensional model (e.g., PCA).

      We thank the reviewer for these comments. It is true that in this type of discovery-based approach causality cannot be inferred, nor do we claim so. The aim was to characterize the metabolic alterations in this population and the response to the withdrawal period, and to suggest potential candidate metabolites linked to psychological symptoms. Rigorous pre-clinical assays and validation trials in humans are required to prove the causality, if any, of the discussed metabolites.

      The original claim on line 153 was poorly constructed. Figure 2c is meant to visualize the influence of withdrawal on selected metabolites and also to show the effect of chronic alcohol intake on the selected metabolites at baseline. The description of Figure 2c has been modified in the results section from line 156 onwards: “Overall, Fig. 2c demonstrates that a number of identified metabolites altered in sAUD patients relative to control are affected by alcohol withdrawal. Apart from 4-pyridoxic acid, cotinine, and heme metabolites bilirubin and biliverdin, the shifts observed in the selected metabolites are generally in the opposite direction as compared to the baseline.”

      The authors attempt to extend the significance of their findings by assessing post-mortem brain tissues from AUD subjects. The finding that many of the metabolites changed in T2/T1 are also present in AUD brain tissues is interesting, but it does not strongly support the authors' claims that these metabolites are markers of AUD (line 173). Concerning the plasma cohort itself, it is unclear how the authors assessed compliance with alcohol withdrawal or whether the subjects' blood-alcohol levels were independently verified.

      We did not claim that the metabolites significantly correlated with the psychological symptoms - and present in central nervous system (frontal cortex or CSF) -  are “markers of AUD”. Line 173 did not refer to this idea, and the terms “markers of AUD” do not appear in the whole manuscript.

      Regarding compliance with alcohol cessation, we did not assess the blood ethanol level. The patients are hospitalized for a 3-week detoxification program; they are not allowed to drink alcohol and are under strict control of the nurses and medical staff of the unit. Consuming alcoholic beverages within the hospitalization unit is a reason for exclusion. However, we carefully monitored liver function during alcohol withdrawal. For the reviewers’ information, we have added below the evolution of liver enzymes (ALT, AST, gGT) during the 3-week detoxification program as indirect markers of alcohol abstinence.

      Author response image 2.

      Data are described as median ± SEM. AST, Aspartate transaminase; ALT, Alanine transaminase; gGT: gamma glutamyltranspeptidase. ** p<0.01 vs T1, *** p<0.001 vs T1

       

      The second area of concern is the need for more description of the analytical methodology, the lack of metabolite identification validation evidence, and related statistical questions. The authors cite reference #59 regarding the general methodology. However, this reference from their group is a tutorial/review/protocol-focused resource paper, and it needs to be clarified how specific critical steps were actually applied to the current plasma study samples given the range of descriptions provided in the citations. The authors report a variety of interesting metabolites, including their primary fragment intensities, which are appreciated (Supplementary Table 3), but no MS2 matching scores are provided for level 2 or 3 hits. Further, level 1 hits under their definition are validated by an in-house standard, but no supporting data are provided besides this categorization. Finally, a common risk in such descriptive studies is finding spurious associations, especially considering the many factors described in the current work. These include AUD, depression, anxiety, craving, withdrawal, etc. The authors describe the use of BH correction for multiple-hypothesis testing. However, this approach only accounts for the many possible metabolite association tests within each comparison (such as metabolites vs depression). It does not account for the multi-variate comparisons to the many behavior/clinical factors described above. The authors should employ one of several common strategies, such as linear mixed effects models, for these types of multi-variate assessments.

      The methodological details related to the sample processing, data acquisition, data pre-processing and metabolite identification have been provided in the supplementary materials and described below. Supplementary table 3 has been amended with characteristic MS2 fragments for both positive and negative ionization modes if data was available. Additionally, all annotations against the in-house library additions have been rechecked, identification levels corrected and EICs for all level 1 identifications are provided in the supplementary material.

      As described in the statistical analysis methods, BH correction was employed in the group-wise comparisons to shortlist the altered features for identification. Manual curation was then applied to the significant features, and the annotated metabolites were subjected to correlation analysis. In this discovery-based approach, the aim was to discover potential candidates linked with psychological symptoms for subsequent work to evaluate causality. Hence, the application of multi-variate analysis assessing biomarker candidates is not within the scope of this study.
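
      For readers less familiar with the procedure, the Benjamini-Hochberg (BH) adjustment referred to here can be sketched as follows. This is a minimal illustration in Python and not the authors' code; the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for five features in one group-wise comparison.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = bh_adjust(example_p)
```

      The adjustment controls the false discovery rate within a single family of tests (here, the features in one group-wise comparison). It does not by itself account for repeating that family across several clinical variables, which is the point raised in the review.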

      “LC-MS analysis. Plasma sample preparation and LC-MS measurement followed the parameters previously detailed in Klåvus et al (57). Samples were randomized and thawed on ice before processing. 100 µl of plasma was added to 400 µl of LC-MS grade acetonitrile, mixed by pipetting four times, followed by centrifugation at 700 g for 5 minutes at 4 °C. A quality control sample was prepared by pooling 10 µl of each sample together. Extraction blanks having only cold acetonitrile and devoid of sample were prepared following the same procedure as sample extracts. LC-MS grade acetonitrile, methanol, water, formic acid and ammonium formate (Riedel-de Haën™, Honeywell, Seelze, Germany) were used to prepare mobile phase eluents in reverse phase (Zorbax Eclipse XDB-C18, 2.1 × 100 mm, 1.8 μm, Agilent Technologies, Palo Alto, CA, USA) and hydrophilic interaction (Acquity UPLC® BEH Amide 1.7 μm, 2.1 × 100 mm, Waters Corporation, Milford, MA, USA) liquid chromatography separation. In reverse phase separation, the samples were analyzed by a Vanquish Flex UHPLC system (Thermo Scientific, Bremen, Germany) coupled to high-resolution mass spectrometry (Q Exactive Focus, Thermo Scientific, Bremen, Germany) in both positive and negative polarity with mass range from 120 to 1200, target AGC 1e6 and resolution 70,000 in full scan mode. Data dependent MS/MS data was acquired for both modes with target AGC 8e3 and resolution 17,500; precursor isolation window was 1.5 amu, normalized collision energies were set at 20, 30 and 40 eV and dynamic exclusion at 10.0 seconds. In hydrophilic interaction separation, the samples were analyzed by a 1290 LC system coupled to a 6540 UHD accurate mass Q-ToF spectrometer (Agilent Technologies, Waldbronn, Karlsruhe, Germany) using electrospray ionization (ESI, Jet Stream) in both positive and negative polarity with mass range from 50 to 1600 and scan rate of 1.67 Hz in full scan mode. Source settings were as in the protocol.
Data dependent MS/MS data was acquired separately using 10, 20 and 40 eV collision energy in subsequent runs. Scan rate was set at 3.31 Hz, precursor isolation width at 1.3 amu and target counts/spectrum at 20,000, with a maximum of 4 precursors per cycle, precursor exclusion after 2 spectra and release after 15.0 seconds. Detectors were calibrated prior to the sequence, and continuous mass axis calibration was performed throughout runs by monitoring reference ions from the infusion solution to operate at a high accuracy of < 2 ppm. Quality control samples were injected at the beginning of the analysis to equilibrate the system and after every 12 samples for quality assurance and drift correction in all modes. All data were acquired either in centroid mode by MassHunter Acquisition B.05.01 (Agilent Technologies) or in profile mode by Xcalibur 4.1 (Thermo Fisher Scientific) software.

      Metabolomics analysis of TSDS frontal cortex and CSF samples using the same 1290 LC system coupled with a 6540 UHD accurate mass Q-ToF spectrometer has been previously accomplished by Karkkainen et al (10).

      Peak picking and data processing. Raw instrumental data (*.raw and *.d files) were converted to ABF format using Reifycs Abf Converter (https://www.reifycs.com/AbfConverter). MS-DIAL (Version 4.70) was employed for automated peak picking and alignment with the parameters according to Klåvus et al., 2020 (57) separately for each analytical mode. For the 6540 Q-ToF mass data minimum peak height was set at 8,000 and for the Q Exactive Focus mass data minimum peak height was set at 850,000. In all modes, m/z values up to 1600 and all retention times were considered. For aligning the peaks across samples, retention time tolerance was 0.2 min and MS1 tolerance 0.015 Da, and the “gap filling by compulsion” option was selected. Alignment results across all modes and sample types were exported as peak areas into Microsoft Excel sheets for further data pre-processing.

      Pre-processing, including drift correction and quality assessment, was done using the notame package v.0.2.1 in R version 4.0.3, separately for each mode. Features present in less than 80% of the samples within all groups and with detection rate in less than 70% of the QC samples were flagged. All features were subjected to drift correction, where the features were log-transformed and a regularized cubic spline regression line was fitted for each feature against the quality control samples. After drift correction, QC samples were removed and missing values in the non-flagged features were imputed using random forest imputation. Finally, the preprocessed data from each analytical mode were merged into a single data matrix.

      Molecular feature characteristics (exact mass, retention time and MS/MS spectra) were compared against in-house standard library, publicly available databases such as METLIN, HMDB and LIPIDMAPS and published literature. Annotation of metabolites and the level of identification was based on the recommendations given by the Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI) (59): 1 = identified based on a reference standard, 2 = putatively annotated based on physicochemical properties or similarity with public spectral libraries, 3 = putatively annotated to a chemical class and 4 = unknown.”
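
      The feature-flagging step in the pre-processing paragraph quoted above can be illustrated with a short sketch. This is a hypothetical Python illustration, not the notame implementation (which is an R package); the function names, the use of `None` for missing values, and the decision to report the two criteria separately are all assumptions, since the text does not specify exactly how the criteria are combined.

```python
def detection_rate(values):
    """Fraction of entries that are not missing (None marks a missing value)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def flag_feature(sample_values_by_group, qc_values,
                 group_threshold=0.80, qc_threshold=0.70):
    """Evaluate the two detection criteria for one feature.

    Returns a dict with two booleans:
      low_in_all_groups: detection rate is below group_threshold in every group
      low_in_qc:         detection rate in QC samples is below qc_threshold
    """
    low_in_all_groups = all(
        detection_rate(values) < group_threshold
        for values in sample_values_by_group.values()
    )
    low_in_qc = detection_rate(qc_values) < qc_threshold
    return {"low_in_all_groups": low_in_all_groups, "low_in_qc": low_in_qc}


# Hypothetical peak areas for one feature; None marks a missing value.
result = flag_feature(
    {"patients": [1.0, None, None], "controls": [None, 2.0]},
    qc_values=[None, None, 3.0],
)
```

      The default thresholds (80% within groups, 70% in QC samples) are taken from the quoted methods; a caller would decide whether a feature is flagged when either criterion or both criteria are met.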

      Reference 59: Sumner LW, Amberg A, Barrett D, Beale MH, Beger R, Daykin CA, et al. Proposed minimum reporting standards for chemical analysis. Metabolomics. 2007;3:211–221.

      Recommendations for the authors:

      Reviewer #1:

      (1) There should be more discussion comparing and contrasting the differences between the 2 cohorts (ALCOHOLBIS versus GUT2BRAIN), instead of stressing the similarities.

      As indicated in the results section, we have verified that the ALCOHOLBIS cohort and GUT2BRAIN cohort are similar in terms of age, gender, smoking habits, drinking habits and severity of psychological symptoms. These similarities are important because they allow the metabolomics data from the two cohorts to be combined, which gives a larger sample size (n = 96) and more statistical power.

      (2) The identification of 97 heavy alcohol users based on hospital codes at autopsy may not be the most rigorous way to define those with AUD. More information is needed on how these 97 were classified as heavy alcohol users.

      The classification of subjects into the group with a history of heavy alcohol use was not based solely on the autopsy records. The classification was also based on medical history, which in Finland is available for the whole life of the subjects and includes diagnoses and laboratory findings. The subjects needed to have a diagnosis of alcohol-related disease, as stated in the methods section of the manuscript. However, since some of the diagnoses used relate to organ damage caused by heavy alcohol use, we do not claim that all of these subjects had alcohol dependence. Nevertheless, a history of heavy alcohol use is needed to develop organ damage associated with alcohol use. Therefore, we consider a diagnosis of alcohol-related disease to be a clear sign of a history of heavy alcohol use.

      (3) The fact that the control group mainly died of cardiovascular disease confounds the interpretations around alcohol impact metabolite levels. How much of the metabolomics differences are related to hyperlipidemia or other CVD risk factors in the controls?

      There are no healthy controls in post-mortem studies, since all subjects need to die from something to be included in the cohort. The challenge in studying AUD is that these subjects die relatively young. The only other group of individuals who die outside of hospital at a similar age to subjects with AUD are those with CVD. In Finland, autopsies are performed on all who die outside of hospital, and these are the main source of samples for post-mortem cohorts. Therefore, there is no other control group to which AUD subjects can be compared in these types of studies.

      As for the altered metabolites in the post-mortem samples, the phospholipids observed could be associated with CVD. However, alterations in phospholipids are also commonly associated with alcohol use and AUD (for a review see (Voutilainen and Kärkkäinen, 2019)), and this effect is also seen in the results from the clinical cohorts in this study (Figure 1). Therefore, it cannot be said that these phospholipid findings are due to the selection of the control group.

      (4) When examining metabolomics alterations, it is extremely important to understand what people are eating (i.e., providing a substrate). A major confounding issue here is that heavy alcohol users typically choose drinking over eating food. How much of the observed alterations in the plasma metabolome is due to the decreased food intake? Some validation in animal models of ethanol exposure compared to pair-fed controls would help strengthen causal relationships between metabolites and alterations in the circulation and CNS.

      Regarding validation in animal models of ethanol exposure, we were very careful in our discussion to avoid claiming that the study could test the causality of the factors. This was certainly not the objective of the present study. Testing causality would indeed probably require animal models, but these models could only test the effects of a single metabolite at a time and could not capture the complexity of the changes occurring in AUD patients. The testing of metabolites would be a totally different topic. Hence, we do not feel comfortable conducting rodent experiments, for several reasons. First, AUD is a very complex pathology with physiological and psychological/psychiatric alterations that are obviously difficult to reproduce in animal models. Second, as mentioned by the reviewer, AUD pathology spontaneously leads to nutritional deficits, including significant reductions in carbohydrate, lipid, protein and fiber intake. We have recently published a paper in which we conducted detailed dietary anamneses and described the changes in food habits in AUD patients (Amadieu et al., 2021). As explained below, some blood metabolites that are significantly correlated with depression, anxiety and craving belong to the xanthine family, namely theobromine, theophylline and paraxanthine, which derive from the metabolism of coffee, tea or chocolate (which are not part of the normal diet of mice or rats). Therefore, an experiment in an animal model of ethanol exposure compared to pair-fed controls would omit the important impact of nutrition on the blood metabolome and consequently would not mimic human AUD pathology.
In addition, if we take into consideration the European Directive 2010/63/EU (on the protection of animals used for scientific purposes), which aims at Reducing (Refining, Replacing) the number of animals used in experiments, it is extremely difficult to justify, from an ethical point of view, the need to reproduce human results in an animal model that would not be able to mimic the nutritional, physiological and psychological alterations of alcohol use disorder.

      As explained above, we do agree with the reviewer that AUD is not only “drinking alcohol” but is also associated with a reduction in food intake that obviously influenced the metabolomics data presented in this study. We have therefore added to the results section some data, not included in the previous version of the manuscript, on key nutrients modified by alcohol intake, and we refer to those data and their link with metabolomics in the discussion section:

      Results section page 8, Line 153-155. This sentence has been added:

      “The changes in metabolites belonging to the xanthine family during alcohol withdrawal could be explained by the changes in dietary intake of coffee, tea and chocolate (see Fig S5).”

      Discussion section: Page 11, Line 234-238.

      “Interestingly, the caffeine metabolites belonging to the xanthine family such as paraxanthine, theophylline and theobromine that were decreased at baseline in AUD patients compared to controls, increased significantly during alcohol withdrawal to reach the levels of healthy controls. Changes in dietary intake of coffee, tea and chocolate during alcohol withdrawal could explain these results”.

      In the conclusion, Page 16, Line 360-32, we clearly stated that: “LC-MS metabolomics plasma analysis allowed for the identification of metabolites that were clearly linked to alcohol consumption, and reflected changes in metabolism, alterations of nutritional status, and gut microbial dysbiosis associated with alcohol intake”

      Reference:

      Amadieu C, Leclercq S, Coste V, Thijssen V, Neyrinck AM, Bindels LB, Cani PD, Piessevaux H, Stärkel P, Timary P de, Delzenne NM. 2021. Dietary fiber deficiency as a component of malnutrition associated with psychological alterations in alcohol use disorder. Clinical Nutrition 40:2673–2682. doi:10.1016/j.clnu.2021.03.029

      Leclercq S, Cani PD, Neyrinck AM, Stärkel P, Jamar F, Mikolajczak M, Delzenne NM, de Timary P. 2012. Role of intestinal permeability and inflammation in the biological and behavioral control of alcohol-dependent subjects. Brain Behav Immun 26:911–918. doi:10.1016/j.bbi.2012.04.001

      Leclercq S, De Saeger C, Delzenne N, de Timary P, Stärkel P. 2014a. Role of inflammatory pathways, blood mononuclear cells, and gut-derived bacterial products in alcohol dependence. Biol Psychiatry 76:725–733. doi:10.1016/j.biopsych.2014.02.003

      Leclercq S, Matamoros S, Cani PD, Neyrinck AM, Jamar F, Stärkel P, Windey K, Tremaroli V, Bäckhed F, Verbeke K, de Timary P, Delzenne NM. 2014b. Intestinal permeability, gut-bacterial dysbiosis, and behavioral markers of alcohol-dependence severity. Proc Natl Acad Sci U S A 111:E4485–E4493. doi:10.1073/pnas.1415174111

      Voutilainen T, Kärkkäinen O. 2019. Changes in the Human Metabolome Associated With Alcohol Use: A Review. Alcohol and Alcoholism 54:225–234. doi:10.1093/alcalc/agz030

      Reviewer #2:

      (1) More methodological information about the laboratory processing of samples, instrumentation, and data analysis needs to be provided. Reference 59 needs to be more specific and include important methodological details for this project. Please provide an actual methods section for the mass-spectrometry-based metabolomics.

      The reviewer is correct that the methods should be described in detail but due to word limits, the description was moved to a supplementary file. Methodological details are provided in the answer to the final comment in the public reviews section and we kindly refer to that for the methodological details. Reference 57 (Klåvus et al) is a method paper and covers the whole untargeted metabolomics pipeline that is used in our work.

      (2) The VIP figures, e.g., Figure 1b and Figure 2b are not very informative and would be better represented in a supplementary table

      VIP scores for all annotated metabolites are provided in supplementary table 3 along with peak data and other values derived from statistical tests. Furthermore, we have removed the VIP panels from figures 1 and 2 and replaced them with an updated volcano plot that represents the VIP values in addition to the q and Cohen’s d values.

      (3) The findings on odd-chain lyso-lipids are interesting, and while these have been reported biologically, odd-chain lipids are uncommon and should be validated with authentic standards as available (please provide an XIC of the level 1 peak and standard if possible, e.g., LPC 17:0) or at least a supplementary figure on manual inspection of the negative mode MS2 spectrum showing the putative fatty acid chain fragment. The current assignments are based on positive mode lipid class fragments and accurate mass.

      We thank the reviewer for pointing this out; it is correct that the negative MS2 spectrum is essential for lipid identification. Although the current assignments show only positive fragments for many lipids, the fatty acid chain, if reported, has been confirmed from the negative mode MS2 spectrum. Supplementary table 3 with peak information has been augmented with fragment information from both negative and positive ionization modes where available. In addition, reference and experimental MS2 spectra have been provided as a separate supplemental file for level 1 identifications, including the odd-chain lyso-lipids LPC 15:0 and 17:0.

      (4) Please provide some supplementary information (MS1/MS2 if available) on the untargeted features of interest (up and down-regulated) from Figure 1C, especially the 5 encircled features. If any manual annotation of these features was attempted, please include a brief description in the results/discussion.

      All statistically significant features with MS2 data have been subjected to manual annotation and database searches using at least METLIN, HMDB and LipidMaps. Additionally, if the manual inspection failed to provide any identification, in silico fragmentation software MS-FINDER was used to calculate candidate molecular formula. The features were labeled as unknown if all efforts were unsuccessful. The peak characteristics of the key unknowns in Figure 1b have also been included in the supplemental table.

      A note of the manual inspection has been included in the result section line 129: “The top-ranked metabolites in Fig. 1b remained unknown regardless of manual curation.”

      Reviewer #3:

      I think this is an interesting paper with a very solid methodology and an abundance of results. I am not an expert on metabolomics, and I have spent some very interesting hours here, trying (but sometimes failing) to grasp this paper's content. This paper also needs to be closely read by a reviewer who knows the metabolomics field and can give feedback on the meaning of the results. I have focused purely on the AUD clinical side as this is where I may contribute. My main concern is conceptualizing the aims and what the authors want to investigate. As far as I understand, this is a study of the relationship between alcohol use and the metabolome, and in this respect, I think there are some issues.

      Just take the abstract, which talks (in the first sentence) about alcohol use disorder ("AUD"), a term that sometimes refers to harmful use of alcohol and alcohol addiction and sometimes to all F10 diagnoses (and is thus an inaccurate term). The following sentence talks about what leads to alcohol addiction (not dependence), in a mechanistic direction, and its last part talks about metabolomics being able to decipher metabolic events related to AUD. So, even in the first two sentences, it is confusing - is this about correlates, mechanisms, prevention, or treatment? The inaccuracy of terms continues in sentence 4. We have "chronic alcohol abuse" (?) and "severe alcohol use disorder (AUD)" (abbreviated for the second time). Later, only "alcohol abuse" is used and the abstract ends with something about these findings being interesting in "the management of [...] AUD". All this illustrates that there is a large mixture of concepts - what aspect of alcohol use or abuse are you looking at? Moreover, what is the intention: is it to find correlates, explanations, or targets for interventions? Without clarity in this respect, one can get lost in what all these interesting measures mean - how we should interpret them. This comment is made only for the abstract. However, it is equally valid and important for the introduction and discussion parts of the ms, where additional terms and formulations are introduced: "heavy alcohol use" (lines 86-7) and "prevent or treat psychiatric disorders such as AUD" (lines 90-1). This is then reflected in the discussion where the authors claim that what they have found is related to "chronic alcohol abuse" (line 188), "heavy alcohol drinkers" (line 191), and "AUD patients" (lines 199 and 202 and further on).

      We thank the reviewer for this useful comment and we apologize for the confusion. We agree that it is important to use the correct terms and definitions. All patients included in this study were diagnosed with severe AUD (for more information on the diagnosis, see the answer to the comments related to DSM-IV and DSM-5). This manuscript is consequently related to severe AUD, and other terms like “alcohol abuse” and “alcohol addiction” are therefore not appropriate. In the revised version of the manuscript, we have used severe AUD or the abbreviation sAUD. The figures and legends have been changed accordingly.

      In the first paragraph of the results section, ALCOHOLBIS and GUT2BRAIN are compared. It says they are similar on many measures, including craving, but different on some measures, again including craving. It is difficult to grasp this even if the authors try to explain (lines 101-2). This sentence also introduces some discussion in the results section by saying something normative about their finding and relating this to other research (references 12, 13, and 14).

      We would like to apologize for the confusion related to the first paragraph of the results section. We had indeed indicated that, while the ALCOHOLBIS cohort and the GUT2BRAIN cohort are highly similar in terms of biological and psychological features, a significant difference does exist in the compulsive component of the craving score. Indeed, the mean compulsion score is 11 ± 3 in the ALCOHOLBIS cohort and 14 ± 3 in the GUT2BRAIN cohort. In healthy controls, the mean compulsion score is 1.5 ± 1.5. Despite the statistically significant difference in craving between the two cohorts, we do not think that this difference is relevant in our context, since both scores (11 and 14) are high compared to the control group. To simplify the message, we have revised the first paragraph as follows:

      “Both groups of patients were similar in terms of age, gender, smoking and drinking habits and presented with high scores of depression, anxiety and alcohol craving at T1 (Table 1). These biological and psychological similarities allow us to combine both cohorts (and consequently increase sample size) and compare them to a group of healthy controls for metabolomics analysis”.

      In line 104 the abbreviation PCA is introduced but needs to be explained. Such objections could be made for many of the abbreviations used (sPLS-DA VIP, LPC, CSF, CNS, LPE, etc.), but of course, they may be made more difficult by the unusual way of stacking the different sections.

      We thank the reviewer for pointing these out. Most abbreviations are written out in the figure legends or method section, but the organization of the different sections indeed makes this less evident. The abbreviations pointed out are now spelled out in the results section when they are first used.

      Furthermore, they say that the severity of AUD was "evaluated by a psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria, fourth edition (DSM-IV) (ALCOHOLBIS cohort) or fifth edition (DSM-5)" (GUT2BRAIN cohort): This makes sense for DSM-5 but needs to be explained more for DSM-IV. They also need to say what levels were included.

      We thank the reviewer for this very appropriate remark that deserves some explanations.

      While the patients of the GUT2BRAIN cohort were enrolled in 2018-2019, when the DSM-5 was applicable, the patients from the ALCOHOLBIS cohort were recruited many years before. The protocol for the ALCOHOLBIS cohort was written, and approved by the ethical committee, before 2013, when the DSM-IV was the latest version of the DSM.

      We therefore totally agree with the reviewer that our sentence “the severity of AUD was "evaluated by a psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria, fourth edition (DSM-IV) (ALCOHOLBIS cohort) or fifth edition (DSM-5)" (GUT2BRAIN cohort)” is not correct. Indeed, DSM-IV (before 2013) described two distinct disorders, alcohol abuse and alcohol dependence, while the DSM-5 integrates the two DSM-IV disorders into a single disorder called alcohol use disorder with mild (2 or 3 symptoms), moderate (4 or 5 symptoms) and severe (6 or more symptoms) sub-classifications.

      In the present study, we enrolled patients who received a diagnosis of alcohol dependence (DSM-IV criteria) or severe alcohol use disorder (DSM-5 criteria).

      We have changed the paragraph related to this issue into this new one:

      “The severity of AUD was evaluated by a psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria, fourth edition (DSM-IV) (ALCOHOLBIS cohort) or fifth edition (DSM-5) (GUT2BRAIN cohort). Patients evaluated with the DSM-IV received the diagnosis of “alcohol dependence”, while the patients evaluated with the DSM-5 received the diagnosis of “severe alcohol use disorder” (6 or more criteria). To simplify, we used the term “sAUD” (for severe alcohol use disorder) that includes both diagnoses (sAUD and alcohol dependence)”.
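
      The DSM-5 severity bands cited in this response (mild: 2 or 3 symptoms; moderate: 4 or 5; severe: 6 or more) can be written as a small lookup. The sketch below is only an illustration in Python; the function name and the "no diagnosis" label for fewer than 2 symptoms are assumptions made for the example, and the DSM-IV "alcohol dependence" diagnosis follows separate criteria that are not modeled here.

```python
def dsm5_aud_severity(symptom_count):
    """Map a DSM-5 AUD symptom count to the severity band cited in the response."""
    if symptom_count < 0:
        raise ValueError("symptom_count must be non-negative")
    if symptom_count >= 6:
        return "severe"
    if symptom_count >= 4:
        return "moderate"
    if symptom_count >= 2:
        return "mild"
    # Fewer than 2 symptoms: below the threshold for an AUD diagnosis
    # (label chosen for this example).
    return "no diagnosis"


# Patients in the GUT2BRAIN cohort would fall in the "severe" band.
band = dsm5_aud_severity(7)
```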

      I am unsure about the shared first co-authorship and the shared last co-authorship request, but I leave this up to the editors and the journal policies. Also, the order of the different parts may be correct (the M+M placed last) but is unusual for many journals. This is also up to the journal to decide.

      As mentioned in the guidelines to authors, the method section should be included at the end of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      These experiments are some of the first to assess the role of dopamine release and the activity of D1 and D2 MSNs in pair bond formation in Mandarin voles. This is a novel and comprehensive study that presents exciting data about how the dopamine system is involved in pair bonding. The authors provide very detailed methods and clearly presented results. Here they show dopamine release in the NAc shell is enhanced when male voles encounter their pair bonded partner 7 days after cohabitation. In addition, D2 MSN activity decreases whereas D1 MSN activity increases when sniffing the pair-bonded partner.

      The authors do not provide justification for why they only use males in the current study. Without discussing sex as a biological variable, these data can only inform readers about one sex (and pair-bonded animals by definition involve 2 sexes). In addition, the authors do not use an isosbestic control wavelength in photometry experiments. Although they do use EGFP control mice, which show no effects of these interventions, a within-subject control such as an isosbestic excitation wavelength could give more confidence in these data and rule out motion artefacts within subjects.

      We agree with your suggestion that the mechanisms underlying pair bonding in females should also be investigated. In general, natal philopatry among mammals is female biased in the wild (Greenwood, 1983; Brody and Armitage, 1985; Ims, 1990; Solomon and Jacquot, 2002); social mammals are rarely characterized by exclusively male natal philopatry (Solomon and Jacquot, 2002). Males often disperse from the natal area to a new place. Thus, male rodents may play a dominant role in the formation and maintenance of mating relationships. This is one reason we investigated pair bonding in males first. Certainly, female mate selection, and sexual receptivity or refusal based on olfactory cues from males, also affect the formation and maintenance of pair bonding (Hoglen and Manoli, 2022). This is why future research should focus on the mechanisms underlying pair bond formation in females. This has been added to the limitations in the discussion.

      In photometry experiments, rAAV-D1/D2-GCaMP6m, a D1/D2 genetically encoded fluorescent calcium sensor, was injected into the NAc shell. The changes in fluorescence signals during these social interactions were collected and digitized. To assess the specificity of the fluorescence response to social stimuli, changes in fluorescence signals during non-social behavioral bouts (such as freezing, exploration of the environment, grooming and rearing) were also recorded and analyzed. The results showed that dopamine release or D1/D2 MSN activity displayed no significant changes after 3 or 7 days of cohabitation during non-social behaviors such as freezing, exploring, grooming and rearing. In addition, GCaMP6m is a genetically encoded calcium indicator. Changes in its fluorescence signal reflect changes in intracellular calcium ion concentration. Using an EGFP virus as a control makes it possible to determine whether the fluorescence signal observed in the experiment is generated by the specific response of GCaMP6m to calcium or whether other non-specific factors lead to fluorescence changes. If there is no similar fluorescence change in the EGFP control group, this more strongly supports that the signal detected by GCaMP6m is a calcium-related specific signal. Other studies have also used an EGFP control group in photometry experiments (Yamaguchi et al., 2020; Qu et al., 2024; Zhan et al., 2024). Therefore, the changes in fluorescence signals observed in the present study reflect neuronal activity during specific social behaviors and were not affected by motion artefacts.

      There is an existing literature (cited in this manuscript) from Aragona et al., (particularly Aragona et al., 2006) which has highlighted key differences in the roles of rostral versus caudal NAc shell dopamine in pair bond formation and maintenance. Specifically, they report that dopamine transmission promoting pair bonding only occurs in the rostral shell and not the caudal shell or core regions. Given that the authors have targeted more caudally a discussion of how these results fit with previous work and why there may be differences in these areas is warranted.

      Thank you for this thoughtful comment. In the report by Aragona et al. (2006), the coordinates of the bilateral 26-gauge guide cannulae targeting the NAc were 1.6 mm rostral, ± 1 mm bilateral, and 4.5 mm ventral (for shell) or 3.5 mm ventral (for core) from bregma. In the present study, the coordinates of virus injection were AP: +1.5, ML: ±0.99, DV: −4.2 (for NAc shell). Thus, the virus injection sites in our study were close to the rostral shell. However, because viral expression is diffuse, some neurons at the rostrocaudal border and in the caudal shell were also infected by the virus, so we did not distinguish between subregions of the NAc shell. In the future, we will use AAV13, a viral strategy that can target and manipulate precise local neural populations, to address this issue. The NAc is a complex brain structure with distinct regions that have different functions. A previous study suggested that GABAergic substrates of positive and negative types of motivated behavior in the nucleus accumbens shell are segregated along a rostrocaudal gradient (Reynolds and Berridge, 2001). However, another study found that food intake is significantly enhanced by administering μ-selective opioid agonists into the NAc, especially its shell region (Znamensky et al., 2001). Also, μ-opioid stimulation increases the motivation to eat (“wanting”) both in the NAc shell and throughout the entire NAc, as well as in several limbic or striatal structures beyond. For DAMGO stimulation of eating, the “wanting” substrates extend anatomically beyond the rostrodorsal shell and throughout the entire shell (including the caudal shell). Furthermore, DAMGO stimulates eating in the NAc shell and core, as well as in the neostriatum, amygdala and other structures (Gosnell et al., 1986; Gosnell and Majchrzak, 1989; Peciña and Berridge, 2000; Zhang and Kelley, 2000; Echo et al., 2002; Peciña and Berridge, 2005, 2013; Castro and Berridge, 2014).
In pair bond formation and maintenance, the rostral shell is the specific subregion of the NAc important for DA regulation of partner preference (Aragona et al., 2006). In conclusion, it appears that the changes in real time dopamine release and activities and electrophysiological properties of D1R, D2R MSNs in the NAc shell after pair bond formation may have primarily targeted to the rostral shell in our study, which is consistent with the report from Aragona et al.

      The authors could discuss the differences between pair bond formation and pair bond maintenance more deeply.

      Thank you for your suggestion. We have expanded the discussion of the differences between pair bond formation and pair bond maintenance.

      Dopamine and the different types of dopamine receptors in the NAc may play different roles in the regulation of pair bond formation and maintenance. The chemogenetic manipulation revealed that VP-projecting D2 MSNs are necessary for, and more important in, pair bond formation than VP-projecting D1 MSNs. This is consistent with previous pharmacological experiments showing that blocking D2R with its specific antagonist, while leaving D1R unblocked, can prevent the formation of a pair bond in prairie voles (Gingrich et al., 2000). This indicates that D2R is crucial for the initial formation of the pair bond. D2R is involved in the reward aspects of mating. In female prairie voles, D2R in the NAc is important for partner preference formation. Activation of D2R may help condition the brain to assign a positive valence to the partner's cues during mating, facilitating the development of a preference for a particular mate. In addition, cohabitation caused DA release; the high-affinity, Gi-coupled D2R was activated first, which inhibited D2 MSN activity and promoted pair bond formation. Then, after 7 days of cohabitation, when the pair bond was already established, the markedly increased release of dopamine activated the Gs-coupled D1R, which has a low affinity for dopamine; this increased D1 MSN activity and maintained the partner preference. While D1R is also present and involved in the overall process, its role in the initial formation of the pair bond is not as dominant as that of D2R (Aragona et al., 2006). However, it still participates in the neurobiological processes related to pair bond formation. For example, in male mandarin voles, after 7 days of cohabitation with females, D1R activity in the NAc shell was affected during pair bond formation. The extracellular DA concentration was higher when they sniffed their partner than when they sniffed a stranger, and this increase in DA release led to an increase in D1R activity in the NAc shell. 
In prairie voles, dopamine D1 receptors seem to be essential for pair bond maintenance. Neonatal treatment with D1 agonists can impair partner preference formation later in life, suggesting an organizational role for D1 in maintaining the bond (Aragona et al., 2006). In pair-bonded male prairie voles, D1R is involved in inducing aggressive behavior toward strangers, which helps to maintain the pair bond by protecting it from potential rivals. In the NAc shell, D1 agonist decreases the latency to attack same-sex conspecifics, while D1 antagonism increases it (Aragona et al., 2006). In summary, D2R is more crucial for pair bond formation, being involved in reward association and necessary for the initial development of the pair bond. D1R, on the other hand, is more important for pair bond maintenance, being involved in aggression and mate guarding behaviors and having an organizational role in maintaining the pair bond over time. We therefore suggest that D2 MSNs are more predominantly involved in the formation of a pair bond compared with D1 MSNs.

      The authors have successfully characterised the involvement of dopamine release, changes in D1 and D2 MSNs, and projections to the VP in pair bonding voles. Their conclusions are supported by their data and they make a number of very reasonable discussion points acknowledging various limitations.

      Reviewer #2 (Public review):

      Summary:

      Using in vivo fiber-photometry, the authors first establish that DA release when contacting their partner mouse increases with days of cohabitation while this increase is not observed when contacting a stranger mouse. Similar effects are found in D1-MSNs and D2-MSNs with the D1-MSN responses increasing and D2-MSN responses decreasing with days of cohabitation. They then use slice physiology to identify underlying plasticity/adaptation mechanisms that could contribute to the changes in D1/D2-MSN responses. Last, to address causality the authors use chemogenetic tools to selectively inhibit or activate NAc shell D1 or D2 neurons that project to the ventral pallidum. They found that D2 inhibition facilitates bond formation while D2 excitation inhibits bond formation. In contrast, both D1-MSN activation and inhibition inhibit bond formation.

      Strengths:

      The strength of the manuscript lies in combining in vivo physiology to demonstrate circuit engagement and chemogenetic manipulation studies to address circuit involvement in pair bond formation in a monogamous vole.

      Weaknesses:

      Comment: Weaknesses include that a large set of experiments within the manuscript are dependent on using short promoters for D1 and D2 receptors in viral vectors. As the authors acknowledge this approach can lead to ectopic expression and the presented immunohistochemistry supports this notion. It seems to me that the presented quantification underestimates the degree of ectopic expression that is observed by eye when looking at the presented immunohistochemistry. However, given that Cre transgenic animals are not available for Microtus mandarinus and given the distinct physiological and behavioral outcomes when imaging and manipulating both viral-targeted populations this concern is minor.

      Thank you for your comment. The viruses used in the present study were purchased from brainVTA. The D1/D2 receptor promoter sequences were predicted and amplified for validation by the company. Each promoter was constructed and packaged in an AAV vector (rAAV-D2-mCherry-WPRE-bGH_polyA is shown as an example in Author response image 1A). The D1/D2 promoter sequences are shown in Author response image 1B-C. In addition, the D1 and D2 receptor promoter viruses used in this paper have been used in several published papers with high specificity (Zhao et al., 2019; Ying et al., 2022). In our paper, FISH verification showed a high proportion of co-localization between viral expression and receptor mRNA, which also indicates high viral specificity (Figure S15, S16).

      Author response image 1.

      (A)   Gene carrier of rAAV-D2-mCherry-WPRE-bGH_polyA. (B-C) Gene sequence of D1 promoter and D2 promoter.

      The slice physiology experiments provide some interesting outcomes but it is unclear how they can be linked to the in vivo physiological outcomes and some of the outcomes don't match intuitively (e.g. cohabitation enhances excitatory/inhibitory balance in D2-MSNs but the degree of contact-induced inhibition is enhanced in D2-MSN).

      Thank you for your comment. The present study found that the frequencies of sEPSC and sIPSC in NAc shell D2 MSNs were significantly enhanced after the formation of a pair bond. The excitatory/inhibitory balance of D2 MSNs was enhanced after cohabitation. These results are not consistent with the findings from fiber photometry of calcium signals. One study showed that NAc D2 MSNs were linked to both ‘liking’ (food consumption) and ‘wanting’ (food approach) but with opposing actions; high D2 MSN activity signaled ‘wanting’, and low D2 MSN activity enhanced ‘liking’. D2 MSNs are faced with a tradeoff between increasing ‘wanting’ by being more active or allowing ‘liking’ by remaining silent (Guillaumin et al., 2023). Therefore, the increases in the frequencies of sEPSC and sIPSC in D2 MSNs may reflect these two processes, liking and wanting, respectively. We suggest that hedonia and motivation might influence D2 MSN activity differently during cohabitation and contribute to the processing of pair bond formation in a more dynamic and complex way than previously expected.

      Moreover, the frequencies of sEPSC and sIPSC were significantly reduced in the NAc shell D1 MSNs after pair bonding, whereas the intrinsic excitability increased after cohabitation with females.

      The bidirectional modifications (reduced synaptic inputs vs. increased excitability) observed in D1 MSNs might result from homeostatic regulation. The overall synaptic transmission may produce no net changes, given that reductions in both excitatory and inhibitory synaptic transmission of D1 MSNs were observed. Also, increases in the intrinsic excitability of D1 MSNs would result in an overall excitation gain on D1 MSNs.

      One interesting finding is that the relationship between D2-MSN and pair bond formation is quite clear (inhibition facilitates while excitation inhibits pair bond formation). In contrast, the role of D1-MSNs is more complicated since both excitation and inhibition disrupt pair bond formation. This is not convincingly discussed.

      Considering the reviewer’s suggestion, the discussion has been added in the revised manuscript.

      In the present study, DREADD approaches were used to inhibit or excite NAc MSNs projecting to the VP, and D1 and D2 NAc MSNs projecting to the VP were found to play different roles in the formation of a pair bond. Chemogenetic inhibition of VP-projecting D2 MSNs promoted partner preference formation, while activation of VP-projecting D2 MSNs inhibited it (Figure 6). Chemogenetic activation of D2 MSNs produced an effect on partner preference opposite to that of DA acting on D2 MSNs, whereas inhibition of these neurons produced the same effect as DA acting on D2 MSNs. DA binding with D2R is coupled with Gi and produces an inhibitory effect (Lobo and Nestler, 2011). It is generally assumed that activation of D2R produces aversive and negative reinforcement. These results were consistent with the reduced D2 MSN activity upon sniffing their partner in the fiber photometry test and with the increased frequency and amplitude of sIPSC in the present study. Our results also agree with previous studies showing that chemogenetic inhibition of NAc D2 MSNs is sufficient to enhance reward-oriented motivation in a motivational task (Carvalho Poyraz et al., 2016; Gallo et al., 2018). Inhibition of D2 MSNs during self-administration enhanced the response and motivation to obtain cocaine (Bock et al., 2013). This also suggests that the mechanisms underlying attachment to a partner and drug addiction are similar.

      In addition, in the present study, the formation of partner preference was inhibited after either activation or inhibition of VP-projecting D1 MSNs, which is not consistent with the conventional understanding of prairie vole behavior. DA binding with D1R is coupled with Gs and produces an excitatory effect (Lobo and Nestler, 2011), and activation of D1R produces reward and positive reinforcement (Hikida et al., 2010; Tai et al., 2012; Kwak and Jung, 2019). For example, activation of D1 MSNs enhances cocaine-induced conditioned place preference (Lobo et al., 2010). In addition, D1R activation by DA promotes D1 MSN activation, which promotes reinforcement. However, a recent study found that NAc-ventral mesencephalon D1 MSNs promote reward and positive reinforcement learning; in contrast, NAc-VP D1 MSNs led to aversion and negative reinforcement learning (Liu et al., 2022). This is consistent with our finding that activation of the NAc-VP D1 MSN pathway reduced time spent side-by-side and impaired partner preference after 7 days of cohabitation. In contrast to inhibition of D2 MSNs, we found that inhibition of the D1 MSNs did not elicit a corresponding increase in partner preference. One possible explanation is that almost all D1 MSNs projecting to the VTA/substantia nigra (SN) send collaterals to the VP (Pardo-Garcia et al., 2019). For example, optogenetically stimulating VP axons may inadvertently cause effects in the VTA/SN through the antidromic activation of axon collaterals (Yizhar et al., 2011). Therefore, chemogenetic inhibition of D1 MSNs may also inhibit DA neurons in the VTA, subsequently inhibiting the formation of a pair bond.

      Dopamine and the different types of dopamine receptors in the NAc may play different roles in the regulation of pair bond formation and maintenance. The chemogenetic manipulation revealed that VP-projecting D2 MSNs are necessary for, and more important in, pair bond formation than VP-projecting D1 MSNs. This is consistent with previous pharmacological experiments showing that blocking D2R with its specific antagonist, while leaving D1R unblocked, can prevent the formation of a pair bond in prairie voles (Gingrich et al., 2000). This indicates that D2R is crucial for the initial formation of the pair bond. D2R is involved in the reward aspects of mating. In female prairie voles, D2R in the NAc is important for partner preference formation. Activation of D2R may help condition the brain to assign a positive valence to the partner's cues during mating, facilitating the development of a preference for a particular mate. In addition, cohabitation caused DA release; the high-affinity, Gi-coupled D2R was activated first, which inhibited D2 MSN activity and promoted pair bond formation. Then, after 7 days of cohabitation, when the pair bond was already established, the markedly increased release of dopamine activated the Gs-coupled D1R, which has a low affinity for dopamine; this increased D1 MSN activity and maintained the partner preference. While D1R is also present and involved in the overall process, its role in the initial formation of the pair bond is not as dominant as that of D2R (Aragona et al., 2006). However, it still participates in the neurobiological processes related to pair bond formation. For example, in male mandarin voles, after 7 days of cohabitation with females, D1R activity in the NAc shell was affected during pair bond formation. The extracellular DA concentration was higher when they sniffed their partner than when they sniffed a stranger, and this increase in DA release led to an increase in D1R activity in the NAc shell. 
In prairie voles, dopamine D1 receptors seem to be essential for pair bond maintenance. Neonatal treatment with D1 agonists can impair partner preference formation later in life, suggesting an organizational role for D1 in maintaining the bond (Aragona et al., 2006). In pair-bonded male prairie voles, D1R is involved in inducing aggressive behavior toward strangers, which helps to maintain the pair bond by protecting it from potential rivals. In the NAc shell, D1 agonist decreases the latency to attack same-sex conspecifics, while D1 antagonism increases it (Aragona et al., 2006). In summary, D2R is more crucial for pair bond formation, being involved in reward association and necessary for the initial development of the bond. D1R, on the other hand, is more important for pair bond maintenance, being involved in aggression and mate guarding behaviors and having an organizational role in maintaining the bond over time. We therefore suggest that D2 MSNs are more predominantly involved in the formation of a pair bond compared with D1 MSNs.

      It seemed a missed opportunity that physiological readout is limited to males. I understand though that adding females may be beyond the scope of this manuscript.

      Thank you for your valuable comment. Reviewer 1 raised the same concern, and we responded as follows.

      In general, natal philopatry among mammals in the wild is female-biased (Greenwood, 1983; Brody and Armitage, 1985; Ims, 1990; Solomon and Jacquot, 2002); social mammals are rarely characterized by exclusively male natal philopatry (Solomon and Jacquot, 2002). Males often disperse from their natal area to a new place. Thus, male rodents may play a dominant role in the formation and maintenance of mating relationships. This is one reason we investigated pair bonding in males first. Certainly, female mate selection, and sexual receptivity or refusal in response to olfactory cues from males, also affect the formation and maintenance of pair bonds (Hoglen and Manoli, 2022). This is why future research should focus on the mechanisms underlying pair bond formation in females. This point has been added to the limitations in the discussion.

      Reviewer #3 (Public review):

      Summary:

      The manuscript is evaluating changes in dopamine signaling in the nucleus accumbens following pair bonding and exposure to various stimuli in mandarin voles. In addition, the authors present chemogenetic data that demonstrate excitation and inhibition of D1 and D2 MSN affect pair bond formation.

      Strengths:

      The experimental designs are strong. The approaches are innovative and use cutting-edge methods.

      The manuscript is well written.

      Weaknesses:

      The statistical results are not presented, and not all statistical analyses are appropriate.

      Additionally, some details of methods are absent.

      As you suggested, we added the detailed information in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Remove references to 'extreme significance' - p is set as a threshold and the test is either significant or not.

      Thanks for your suggestion. We have removed 'extreme significance' in the revised manuscript.

      (2) The second half of the abstract is a little confusing; the use of activation/inhibition makes it difficult to read and follow. This could be re-worded for clarity.

      We apologize for the confusion. We have reorganized the sentence as follows.

      In addition, chemogenetic inhibition of ventral pallidum-projecting D2 MSNs in the NAc shell enhanced pair bond formation, while chemogenetic activation of VP-projecting D2 MSNs in the NAc shell inhibited pair bond formation.

      Reviewer #2 (Recommendations for the authors):

      (1) In many instances repeated measures are presented from the same mice (e.g. Figures 1F, I; S1BC). Repeated measures for each mouse should be connected with a line in the figures. This will allow the reader to visually compare the repeated measures for each animal.

      Thank you for your careful consideration. As the reviewer suggested, the figures have been changed.

      (2) It is unclear to me how the time point 0 for sniffing was determined. How is the time point 0 for side-by-side contact determined?

      Sniffing is an olfactory investigation behavior, defined as the animal using its nose to inspect any portion of the stimulus mouse’s body, including the tail. Time point 0 for sniffing was the onset of the sniffing behavior. Side-by-side behavior is defined as significant physical contact with a social object while huddling in a quiescent state. Time point 0 for side-by-side behavior was the onset of the side-by-side behavior.

      (3) Figure 1-3: For the fiber photometry data 7 events (sniffs) are shown in the heat maps. Are these the first 7 sniffs? What went into the quantification? It seems that DA and D1/D2 responses are habituating. This could be analyzed and would need to be discussed.

      The heat maps (Figures 1-3) show the mean fluorescence signal change of each subject (n = 7 voles) upon sniffing the partner, a stranger, or an object in the experiment, not the fluorescence signal changes of individual sniffing events in one vole. The quantification of the changes in mean fluorescence signal across all subjects is shown in Figures 1F, 1I, 2F, 2I, 3F, and 3I.
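
      The distinction drawn in this response, one heat-map entry per animal rather than one per sniff, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' analysis code: the function names, the window lengths, and the toy traces are all hypothetical, and real photometry signals would first be converted to ΔF/F. Each behavior onset is treated as time point 0, windows around every onset are cut from one animal's trace, and those windows are averaged into a single response for that animal.

```python
from statistics import mean

def align_to_onsets(trace, onsets, pre, post):
    """Cut windows [onset - pre, onset + post) around each behavior onset (time 0)."""
    windows = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(trace):
            continue  # skip events too close to the recording edges
        windows.append(trace[start:stop])
    return windows

def subject_mean_response(trace, onsets, pre, post):
    """Average all event-aligned windows of one animal into a single response."""
    windows = align_to_onsets(trace, onsets, pre, post)
    if not windows:
        raise ValueError("no complete event window for this subject")
    # zip(*windows) groups the samples that share the same time offset
    return [mean(samples) for samples in zip(*windows)]

def heatmap_rows(subjects, pre, post):
    """Return one averaged response per subject, not one per sniffing event."""
    return [subject_mean_response(s["trace"], s["onsets"], pre, post)
            for s in subjects]

# Hypothetical toy data: two animals, sample indices of sniff onsets.
subjects = [
    {"trace": [0, 0, 1, 2, 0, 0, 3, 4, 0, 0], "onsets": [2, 6]},
    {"trace": [0, 1, 1, 0, 0], "onsets": [1, 4]},  # second onset is too near the end
]
rows = heatmap_rows(subjects, pre=1, post=2)
```

      With this layout, the number of entries equals the number of animals, so any apparent trend across entries reflects differences between animals rather than habituation across successive sniffs.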

      (4) Generally, it is very difficult to obtain cell type selectivity using short promoters in viruses (the authors acknowledge this). Which D1 and D2 promoter sequences were used for obtaining specificity? The degree of ectopic expression looks much higher than the quantification (e.g. in Fig. 3b, 6C, 7C, S14A, C). Is this due to thresholding?

      The viruses used in the present study were purchased from brainVTA. The D1/D2 receptor promoter sequences were predicted and amplified for validation. Each promoter was constructed and packaged in an AAV vector (rAAV-D2-mCherry-WPRE-bGH_polyA is shown as an example in Author response image 1A). The D1/D2 promoter sequences are shown in Author response image 1B-C. In addition, the D1 and D2 receptor promoter viruses used in this paper have been used in several published papers with high specificity (Zhao et al., 2019; Ying et al., 2022). In Figure 6C, the first image is the merged fluorescence image taken under different fluorescence channels with the 20× objective. The second and third images were taken under the 40× objective from the field marked by the white box in the first image, and were merged to give the fourth image. Because of the different exposure times and intensities, the fluorescence images taken at 40× are clearer than the image taken at 20×. As an example, the labeled cells in Figure 6C are marked in Author response image 2. In our paper, FISH verification showed a high proportion of co-localization between viral expression and receptor mRNA, indicating high viral specificity (Figure S15, S16). Certainly, the number of positive neurons may depend on visibility (thresholding); only visible cells were counted. The cell counts in Author response image 2B and 2C are similar to the quantification in Figure 6C.

      Author response image 2.

      (A) Immunohistological image showing co-localization of hM3Dq- mCherry-anti expression (green), D2R-mRNA (red), and DAPI (blue) in the NAc shell. Scale bar: 100 μm. (B) The cell counts and the determination of colocalization of the 20× immunohistochemistry images. The marked neurons were counted with white dots. (C) The cell counts and the determination of colocalization of the 40× immunohistochemistry images. The marked neurons were counted with white dots.

      (5) Figure 6D/7D: the time scale seems to be off for both traces (40 seconds). For the hM3D Gq experiment, only one trace is shown. It would be more convincing to provide an input-output curve from several mice and to statistically compare the curves.

      Thank you for your careful consideration. As the reviewer suggested, a figure showing resting membrane potentials before and after CNO exposure from several voles was added to the revised manuscript.

      (6) The presence of GIRK channels in MSNs has been a long debate and hM4D Gi activation may mostly act at the level of terminals by inhibiting neurotransmitter release. For demonstrating hyperpolarization of the soma showing the resting membrane potential before and after drug CNO exposure would be more convincing.

      Thank you for your careful consideration. As the reviewer suggested, a figure showing the resting membrane potential before and after CNO exposure was added to the revised manuscript.

      (7) It is unclear to me how far the slice physiology informs the in vivo physiology (e.g. cohabitation enhances excitatory/inhibitory balance in D2-MSNs but the degree of contact-induced inhibition is enhanced in D2-MSN; D2-MSNs become less responsive to DA in the slice yet but at the time of enhanced DA release D2-MSN activity is also strongly reduced).

      The present study found that the frequencies of sEPSC and sIPSC in NAc shell D2 MSNs were significantly enhanced after the formation of a pair bond. The excitatory/inhibitory balance of D2 MSNs was enhanced after cohabitation. These results are not consistent with the findings from fiber photometry of calcium signals. One study showed that NAc D2 MSNs were linked to both ‘liking’ (food consumption) and ‘wanting’ (food approach) but with opposing actions; high D2 MSN activity signaled ‘wanting’, and low D2 MSN activity enhanced ‘liking’. D2 MSNs are faced with a tradeoff between increasing ‘wanting’ by being more active or allowing ‘liking’ by remaining silent (Guillaumin et al., 2023). Therefore, the increases in the frequencies of sEPSC and sIPSC in D2 MSNs may reflect these two processes, liking and wanting, respectively. We suggest that hedonia and motivation might influence D2 MSN activity differently during cohabitation and contribute to the processing of pair bond formation in a more dynamic and complex way than previously expected.

      Moreover, the frequencies of sEPSC and sIPSC were significantly reduced in the NAc shell D1 MSNs after pair bonding, whereas the intrinsic excitability increased after cohabitation with females.

      The bidirectional modifications (reduced synaptic inputs vs. increased excitability) observed in D1 MSNs might result from homeostatic regulation. The overall synaptic transmission may produce no net changes, given that reductions in both excitatory and inhibitory synaptic transmission of D1 MSNs were observed. Also, increases in the intrinsic excitability of D1 MSNs would result in an overall excitation gain on D1 MSNs.

      (8) One interesting finding is that the relationship between D2-MSN and pair bond formation is quite clear (inhibition facilitates while excitation inhibits pair bond formation). In contrast, the role of D1-MSNs is more complicated since both excitation and inhibition disrupt pair bond formation.

      The discussion of this would benefit from another attempt.

      As the reviewer suggested, this discussion was added to the revised manuscript.

      In the present study, DREADD approaches were used to inhibit or excite NAc MSNs projecting to the VP, and D1 and D2 NAc MSNs projecting to the VP were found to play different roles in the formation of a pair bond. Chemogenetic inhibition of VP-projecting D2 MSNs promoted partner preference formation, while activation of VP-projecting D2 MSNs inhibited it (Figure 6). Chemogenetic activation of D2 MSNs produced an effect on partner preference opposite to that of DA acting on D2 MSNs, whereas inhibition of these neurons produced the same effect as DA acting on D2 MSNs. DA binding with D2R is coupled with Gi and produces an inhibitory effect (Lobo and Nestler, 2011). It is generally assumed that activation of D2R produces aversive and negative reinforcement. These results were consistent with the reduced D2 MSN activity upon sniffing their partner in the fiber photometry test and with the increased frequency and amplitude of sIPSC in the present study. Our results also agree with previous studies showing that chemogenetic inhibition of NAc D2 MSNs is sufficient to enhance reward-oriented motivation in a motivational task (Carvalho Poyraz et al., 2016; Gallo et al., 2018). Inhibition of D2 MSNs during self-administration enhanced the response and motivation to obtain cocaine (Bock et al., 2013). This also suggests that the mechanisms underlying attachment to a partner and drug addiction are similar.

      In addition, in the present study, the formation of partner preference was inhibited after either activation or inhibition of VP-projecting D1 MSNs, which is not consistent with the conventional understanding of prairie vole behavior. DA binding with D1R is coupled with Gs and produces an excitatory effect (Lobo and Nestler, 2011), and activation of D1R produces reward and positive reinforcement (Hikida et al., 2010; Tai et al., 2012; Kwak and Jung, 2019). For example, activation of D1 MSNs enhances cocaine-induced conditioned place preference (Lobo et al., 2010). In addition, D1R activation by DA promotes D1 MSN activation, which promotes reinforcement. However, a recent study found that NAc-ventral mesencephalon D1 MSNs promote reward and positive reinforcement learning; in contrast, NAc-VP D1 MSNs led to aversion and negative reinforcement learning (Liu et al., 2022). This is consistent with our finding that activation of the NAc-VP D1 MSN pathway reduced time spent side-by-side and impaired partner preference after 7 days of cohabitation. In contrast to inhibition of D2 MSNs, we found that inhibition of the D1 MSNs did not elicit a corresponding increase in partner preference. One possible explanation is that almost all D1 MSNs projecting to the VTA/substantia nigra (SN) send collaterals to the VP (Pardo-Garcia et al., 2019). For example, optogenetically stimulating VP axons may inadvertently cause effects in the VTA/SN through the antidromic activation of axon collaterals (Yizhar et al., 2011). Therefore, chemogenetic inhibition of D1 MSNs may also inhibit DA neurons in the VTA, subsequently inhibiting the formation of a pair bond.

      Dopamine and the different types of dopamine receptors in the NAc may play different roles in the regulation of pair bond formation and maintenance. The chemogenetic manipulation revealed that VP-projecting D2 MSNs are necessary for, and more important in, pair bond formation than VP-projecting D1 MSNs. This is consistent with previous pharmacological experiments showing that blocking D2R with its specific antagonist, while leaving D1R unblocked, can prevent the formation of a pair bond in prairie voles (Gingrich et al., 2000). This indicates that D2R is crucial for the initial formation of the pair bond. D2R is involved in the reward aspects of mating. In female prairie voles, D2R in the NAc is important for partner preference formation. Activation of D2R may help condition the brain to assign a positive valence to the partner's cues during mating, facilitating the development of a preference for a particular mate. In addition, cohabitation caused DA release; the high-affinity, Gi-coupled D2R was activated first, which inhibited D2 MSN activity and promoted pair bond formation. Then, after 7 days of cohabitation, when the pair bond was already established, the markedly increased release of dopamine activated the Gs-coupled D1R, which has a low affinity for dopamine; this increased D1 MSN activity and maintained the partner preference. While D1R is also present and involved in the overall process, its role in the initial formation of the pair bond is not as dominant as that of D2R (Aragona et al., 2006). However, it still participates in the neurobiological processes related to pair bond formation. For example, in male mandarin voles, after 7 days of cohabitation with females, D1R activity in the NAc shell was affected during pair bond formation. The extracellular DA concentration was higher when they sniffed their partner than when they sniffed a stranger, and this increase in DA release led to an increase in D1R activity in the NAc shell. 
In prairie voles, dopamine D1 receptors seem to be essential for pair bond maintenance. Neonatal treatment with D1 agonists can impair partner preference formation later in life, suggesting an organizational role for D1 in maintaining the bond (Aragona et al., 2006). In pair-bonded male prairie voles, D1R is involved in inducing aggressive behavior toward strangers, which helps to maintain the pair bond by protecting it from potential rivals. In the NAc shell, D1 agonist decreases the latency to attack same-sex conspecifics, while D1 antagonism increases it (Aragona et al., 2006). In summary, D2R is more crucial for pair bond formation, being involved in reward association and necessary for the initial development of the bond. D1R, on the other hand, is more important for pair bond maintenance, being involved in aggression and mate guarding behaviors and having an organizational role in maintaining the bond over time. We therefore suggest that D2 MSNs are more predominantly involved in the formation of a pair bond compared with D1 MSNs.

      (9) For the chemogenetic inhibition/excitation experiment please specify the temporal relationship between CNO injection and the behavioral testing. Are the DREADDs activated during the preference testing or are we only looking at the consequences of DREADD activation during cohabitation? This would impact the interpretation of the results.

      Following the reviewer's suggestion, we have clarified the timing of CNO injection relative to behavioral testing. In the chemogenetic experiments, male voles were injected with CNO (1 mg/kg, i.p.) or saline once per day during the 7-day cohabitation period. On days 3 and 7 of cohabitation, the partner preference tests (3 h) were conducted 3 h after injection. Anton Pekcec and colleagues (Jendryka et al., 2019) found that, in mice, free CNO levels in CSF and cortex tissue dropped surprisingly sharply after i.p. injection, and CNO could no longer be detected after 60 min. However, the associated biological effects are reported to last 6-24 h after CNO treatment (Farzi et al., 2018; Desloovere et al., 2019; Paretkar and Dimitrov, 2019). For example, René Hen and colleagues (Anacker et al., 2018) showed that chemogenetic inhibition of adult-born neurons in the vDG, using DREADDs for 10 days, promotes susceptibility to social defeat stress, whereas increasing neurogenesis confers resilience to chronic stress. Moreover, Ming-Ming Zhang et al. (Zhang et al., 2022) revealed that selective activation or inhibition of the IC-BLA projection pathway strengthens or weakens the intensity of observational pain when CNO (1 mg/kg) was injected i.p. into the infected mice on days 1, 3, 5, and 7 after virus expression. Furthermore, in the study by James P. Herman and colleagues (Nawreen et al., 2020), chronic inhibition of IL PV INs reduced passive and increased active coping behavior in the FST. Therefore, we believe that 7 days of CNO injections can produce chronic effects on MSNs and alter the formation of partner preferences.

      (10) Discussion: "The observed increase in DA release resulted in suppression of D2 neurons in the NAc shell". "In contrast, the rise in DA release increases D1 activity selectively in response to their partner after extended cohabitation." These statements would need to be weakened as causality is not shown here.

      Thanks for your rigorous consideration. We have reorganized the discussion in the revised manuscript.

      “The observed increase in DA release resulted in alterations in activities of D2 and D1 neurons in the NAc shell selectively in response to their partner after extended cohabitation.”

      (11) It would help if the order of supplementary figures would match their order of figures appearance in the result section.

      Thanks for your suggestion. We reorganized the order of appearance in the revised manuscript.

      (12) This may be beyond the focus of the study but it would be very interesting to know whether the physiological responses to partner contact are similarly observed in females.

      Thanks for your concern. Regrettably, we did not record the physiological responses of females to partner contact. We predict that females may show similar response patterns to their partner. In the future, we will extend this research to the mechanisms of partner preference in female voles.

      Reviewer #3 (Recommendations for the authors):

      The manuscript is evaluating changes in dopamine signaling in the nucleus accumbens following pair bonding and exposure to various stimuli in mandarin voles. The manuscript is generally well written. The experiment designs seem strong, although there are missing details to fully evaluate them. The statistics are not completed correctly, and the statistical values are not reported, making them even harder to evaluate. There are a lot of potential strengths in this research. However, my review is limited because I am limited in how to evaluate data interpretation when statistical analyses are not clear. I provide details below.

      Major

      (1) Statistics should be provided in the Results section. It is not clear how to evaluate the authors' interpretations without presenting the statistical data. What stats are being reported about viral expression in cells on lines 192-194? What posthocs? There is only one condition, so I assume the statistic was a one-sample t-test. The authors should report the t-value, df, and p-value. No post-hoc is needed. There are many issues like this, which makes reviewing this manuscript very difficult. If the statistics were not conducted properly and reported clearly, I do not have confidence that I can evaluate the author's interpretation of the results.

      Thanks for your suggestion. We report the t-value, df, and p-value in the Results section.

      (2) Statistical tests should be labeled correctly. ANOVAs (found in figure caption) for Figure 1 data are not repeated measures. Rather, they are one-way ANOVA (with stimulus as a within-subject variable).

      We used one-way ANOVA to analyze the changes in fluorescence signals in Figures 1-3. In the experiment, the changes in fluorescence signals of every subject were collected upon sniffing the partner, an unknown female, and an object. Because each subject was measured under all three stimuli, we used one-way repeated-measures ANOVA to analyze the data.
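
The distinction discussed in this exchange can be illustrated with a minimal sketch of a one-way repeated-measures ANOVA, in which a single factor (stimulus) is measured within every subject. This is not the authors' analysis code: the function name `rm_anova_oneway` and the toy numbers are hypothetical, and the sketch reports uncorrected degrees of freedom without any sphericity correction.

```python
import numpy as np
from scipy import stats


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: array of shape (n_subjects, k_conditions); each row holds one
    subject's measurements under every condition (e.g. partner, stranger, object).
    Returns (F, df_condition, df_error, p).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand_mean = data.mean()

    # Variability explained by the within-subject factor (condition means).
    ss_condition = n * ((data.mean(axis=0) - grand_mean) ** 2).sum()
    # Variability between subjects, removed from the error term.
    ss_subject = k * ((data.mean(axis=1) - grand_mean) ** 2).sum()
    ss_total = ((data - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_condition - ss_subject

    df_condition = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_condition / df_condition) / (ss_error / df_error)
    p_value = stats.f.sf(f_value, df_condition, df_error)
    return f_value, df_condition, df_error, p_value


# Hypothetical toy data: 3 subjects x 3 stimuli (not taken from the study).
toy = [[1, 2, 3], [2, 3, 5], [3, 4, 4]]
f_value, df1, df2, p_value = rm_anova_oneway(toy)
print(f"F({df1}, {df2}) = {f_value:.2f}, p = {p_value:.4f}")
```

Reporting the F value together with both degrees of freedom and the p-value, as printed here, is the format the reviewer requests for the Results section.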

      (3) The protocol for behavioral assessment and stimulus presentation during fiber photometry recording is not clear. For example, the authors mention on line 662 that voles ate carrots during some of the recording sessions, but nothing else is described about the recording session. What was the order of stimulus presentation? What was the object provided? Why is eating carrots analyzed separately from object, partner, and stranger exposure?

      Response: We apologize for the confusion. A detailed description has been added. After 3 and 7 days of cohabitation, males were exposed to their partner or an unfamiliar female (each exposure lasted for 30 min) in random order in a clean social interaction cage. The changes in fluorescence signals during these social interactions with their partner, an unfamiliar vole of the opposite sex, or an object (Rubik's Cube) were collected and digitized by CamFiberPhotometry software (ThinkerTech). To rule out the possibility that the difference in fluorescence signals was caused by a difference in virus expression at different time points, we used the same experimental strategy in new male mandarin voles and measured the fluorescence signal changes upon eating carrot after 3 and 7 days of cohabitation (the male mandarin voles were fasted for four hours before the test). Since sniffing (object, partner, and stranger) and eating carrot were not tested in the same males, we analyzed sniffing and eating carrot separately.

      (4) Supplement figures would be better as figures instead of tables. Many effects are hard to interpret.

      As you suggested, we added the information from Supplementary Table 1 to the Results.

      (5) Citations should be included to note when pair bonding occurs in mandarin voles.

      As you suggested, we added the citation in the revised manuscript.

      Minor

      (1) Add a citation for the statement that married people live longer than unmarried people (Lines 51-52).

      As you suggested, we added the citation in the revised manuscript.

      (2) There is a table labeling viral vectors, but the table is not titled properly or referenced in the methods section.

      Thanks for your careful checking. We revised the table title, and the table is now cited in the revised manuscript.

      (3) Sentences on lines 608-610 and 610-612 seem redundant.

      This sentence was corrected.

      (4) This is a rather subjective statement "Carrots are voles' favorite food."

      We reorganized the sentence in the revised manuscript.

      "Carrots are voles' daily food."

      Anacker C, Luna VM, Stevens GS, Millette A, Shores R, Jimenez JC, Chen B, Hen R (2018) Hippocampal neurogenesis confers stress resilience by inhibiting the ventral dentate gyrus. Nature 559:98-102.

      Aragona BJ, Liu Y, Yu YJ, Curtis JT, Detwiler JM, Insel TR, Wang Z (2006) Nucleus accumbens dopamine differentially mediates the formation and maintenance of monogamous pair bonds. Nature neuroscience 9:133-139.

      Bock R, Shin JH, Kaplan AR, Dobi A, Markey E, Kramer PF, Gremel CM, Christensen CH, Adrover MF, Alvarez VA (2013) Strengthening the accumbal indirect pathway promotes resilience to compulsive cocaine use. Nature neuroscience 16:632-638.

      Brody AK, Armitage KB (1985) The effects of adult removal on dispersal of yearling yellow-bellied marmots. Canadian Journal of Zoology 63:2560-2564.

      Carvalho Poyraz F, Holzner E, Bailey MR, Meszaros J, Kenney L, Kheirbek MA, Balsam PD, Kellendonk C (2016) Decreasing Striatopallidal Pathway Function Enhances Motivation by Energizing the Initiation of Goal-Directed Action. The Journal of neuroscience : the official journal of the Society for Neuroscience 36:5988-6001.

      Castro DC, Berridge KC (2014) Opioid hedonic hotspot in nucleus accumbens shell: mu, delta, and kappa maps for enhancement of sweetness "liking" and "wanting". The Journal of neuroscience : the official journal of the Society for Neuroscience 34:4239-4250.

      Desloovere J, Boon P, Larsen LE, Merckx C, Goossens MG, Van den Haute C, Baekelandt V, De Bundel D, Carrette E, Delbeke J, Meurs A, Vonck K, Wadman W, Raedt R (2019) Long-term chemogenetic suppression of spontaneous seizures in a mouse model for temporal lobe epilepsy. Epilepsia 60:2314-2324.

      Echo JA, Lamonte N, Ackerman TF, Bodnar RJ (2002) Alterations in food intake elicited by GABA and opioid agonists and antagonists administered into the ventral tegmental area region of rats. Physiology & behavior 76:107-116.

      Farzi A, Lau J, Ip CK, Qi Y, Shi YC, Zhang L, Tasan R, Sperk G, Herzog H (2018) Arcuate nucleus and lateral hypothalamic CART neurons in the mouse brain exert opposing effects on energy expenditure. eLife 7.

      Gallo EF, Meszaros J, Sherman JD, Chohan MO, Teboul E, Choi CS, Moore H, Javitch JA, Kellendonk C (2018) Accumbens dopamine D2 receptors increase motivation by decreasing inhibitory transmission to the ventral pallidum. Nature communications 9:1086.

      Gingrich B, Liu Y, Cascio C, Wang Z, Insel TR (2000) Dopamine D2 receptors in the nucleus accumbens are important for social attachment in female prairie voles (Microtus ochrogaster). Behavioral neuroscience 114:173-183.

      Gosnell BA, Majchrzak MJ (1989) Centrally administered opioid peptides stimulate saccharin intake in nondeprived rats. Pharmacology, biochemistry, and behavior 33:805-810.

      Gosnell BA, Levine AS, Morley JE (1986) The stimulation of food intake by selective agonists of mu, kappa and delta opioid receptors. Life sciences 38:1081-1088.

      Greenwood PJ (1983) Mating systems and the evolutionary consequences of dispersal. The ecology of animal movement:116-131.

      Guillaumin MCC, Viskaitis P, Bracey E, Burdakov D, Peleg-Raibstein D (2023) Disentangling the role of NAc D1 and D2 cells in hedonic eating. Molecular psychiatry 28:3531-3547.

      Hikida T, Kimura K, Wada N, Funabiki K, Nakanishi S (2010) Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron 66:896-907.

      Hoglen NEG, Manoli DS (2022) Cupid's quiver: Integrating sensory cues in rodent mating systems. Frontiers in neural circuits 16:944895.

      Ims RA (1990) Determinants of natal dispersal and space use in grey-sided voles, Clethrionomys rufocanus : a combined field and laboratory experiment. Oikos 57:106-113.

      Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, Nissen W, Pekcec A (2019) Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Scientific reports 9:4522.

      Kwak S, Jung MW (2019) Distinct roles of striatal direct and indirect pathways in value-based decision making. eLife 8.

      Liu Z, Le Q, Lv Y, Chen X, Cui J, Zhou Y, Cheng D, Ma C, Su X, Xiao L, Yang R, Zhang J, Ma L, Liu X (2022) A distinct D1-MSN subpopulation down-regulates dopamine to promote negative emotional state. Cell Res 32:139-156.

      Lobo MK, Nestler EJ (2011) The striatal balancing act in drug addiction: distinct roles of direct and indirect pathway medium spiny neurons. Front Neuroanat 5:41.

      Lobo MK, Covington HE, 3rd, Chaudhury D, Friedman AK, Sun H, Damez-Werno D, Dietz DM, Zaman S, Koo JW, Kennedy PJ, Mouzon E, Mogri M, Neve RL, Deisseroth K, Han MH, Nestler EJ (2010) Cell type-specific loss of BDNF signaling mimics optogenetic control of cocaine reward. Science (New York, NY) 330:385-390.

      Nawreen N, Cotella EM, Morano R, Mahbod P, Dalal KS, Fitzgerald M, Martelle S, Packard BA, Franco-Villanueva A, Moloney RD, Herman JP (2020) Chemogenetic Inhibition of Infralimbic Prefrontal Cortex GABAergic Parvalbumin Interneurons Attenuates the Impact of Chronic Stress in Male Mice. eNeuro 7.

      Pardo-Garcia TR, Garcia-Keller C, Penaloza T, Richie CT, Pickel J, Hope BT, Harvey BK, Kalivas PW, Heinsbroek JA (2019) Ventral Pallidum Is the Primary Target for Accumbens D1 Projections Driving Cocaine Seeking. The Journal of neuroscience : the official journal of the Society for Neuroscience 39:2041-2051.

      Paretkar T, Dimitrov E (2019) Activation of enkephalinergic (Enk) interneurons in the central amygdala (CeA) buffers the behavioral effects of persistent pain. Neurobiology of disease 124:364-372.

      Peciña S, Berridge KC (2000) Opioid site in nucleus accumbens shell mediates eating and hedonic 'liking' for food: map based on microinjection Fos plumes. Brain research 863:71-86.

      Peciña S, Berridge KC (2005) Hedonic hot spot in nucleus accumbens shell: where do mu-opioids cause increased hedonic impact of sweetness? The Journal of neuroscience : the official journal of the Society for Neuroscience 25:11777-11786.

      Peciña S, Berridge KC (2013) Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered 'wanting' for reward: entire core and medial shell mapped as substrates for PIT enhancement. The European journal of neuroscience 37:1529-1540.

      Qu Y, Zhang L, Hou W, Liu L, Liu J, Li L, Guo X, Li Y, Huang C, He Z, Tai F (2024) Distinct medial amygdala oxytocin receptor neurons projections respectively control consolation or aggression in male mandarin voles. Nature communications 15:8139.

      Reynolds SM, Berridge KC (2001) Fear and feeding in the nucleus accumbens shell: rostrocaudal segregation of GABA-elicited defensive behavior versus eating behavior. The Journal of neuroscience : the official journal of the Society for Neuroscience 21:3261-3270.

      Solomon NG, Jacquot JJ (2002) Characteristics of resident and wandering prairie voles, Microtus ochrogaster. Canadian Journal of Zoology 80:951-955.

      Tai LH, Lee AM, Benavidez N, Bonci A, Wilbrecht L (2012) Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nature neuroscience 15:1281-1289.

      Yamaguchi T, Wei D, Song SC, Lim B, Tritsch NX, Lin D (2020) Posterior amygdala regulates sexual and aggressive behaviors in male mice. Nature neuroscience 23:1111-1124.

      Ying L, Zhao J, Ye Y, Liu Y, Xiao B, Xue T, Zhu H, Wu Y, He J, Qin S, Jiang Y, Guo F, Zhang L, Liu N, Zhang L (2022) Regulation of Cdc42 signaling by the dopamine D2 receptor in a mouse model of Parkinson's disease. Aging cell 21:e13588.

      Yizhar O, Fenno LE, Davidson TJ, Mogri M, Deisseroth K (2011) Optogenetics in neural systems. Neuron 71:9-34.

      Zhan S, Qi Z, Cai F, Gao Z, Xie J, Hu J (2024) Oxytocin neurons mediate stress-induced social memory impairment. Current biology : CB 34:36-45.e34.

      Zhang M, Kelley AE (2000) Enhanced intake of high-fat food following striatal mu-opioid stimulation: microinjection mapping and fos expression. Neuroscience 99:267-277.

      Zhang MM et al. (2022) Glutamatergic synapses from the insular cortex to the basolateral amygdala encode observational pain. Neuron 110:1993-2008.e1996.

      Zhao J, Ying L, Liu Y, Liu N, Tu G, Zhu M, Wu Y, Xiao B, Ye L, Li J, Guo F, Zhang L, Wang H, Zhang L (2019) Different roles of Rac1 in the acquisition and extinction of methamphetamine-associated contextual memory in the nucleus accumbens. Theranostics 9:7051-7071.

      Znamensky V, Echo JA, Lamonte N, Christian G, Ragnauth A, Bodnar RJ (2001) gamma-Aminobutyric acid receptor subtype antagonists differentially alter opioid-induced feeding in the shell region of the nucleus accumbens in rats. Brain research 906:84-91.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, and patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping, where gateway regions introduce sensory information to the global synergistic workspace, where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals with Gaussian assumptions, and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input48.”

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.
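
The coupled autoregressive example above can be sketched in a few lines. The parameterisation below is an assumption made for illustration (coupling matrix `[[a, a], [a, a]]`, unit-variance noise with correlation `c`) and may differ from the model in Mediano et al. (2019). The decomposition is also a simplified single-target one, using the minimum-mutual-information definition of redundancy with the joint future as target, so it is not the full Phi-ID used in the manuscript. All function names are hypothetical.

```python
import numpy as np


def simulate_var(a, c, n_steps=5000, seed=0):
    """Simulate x_t = A x_{t-1} + e_t for two nodes.

    Assumed form: A = [[a, a], [a, a]] (stationary for |a| < 0.5) and
    Gaussian noise with unit variances and correlation c (|c| < 1).
    """
    rng = np.random.default_rng(seed)
    coupling = np.array([[a, a], [a, a]])
    noise_cov = np.array([[1.0, c], [c, 1.0]])
    noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=n_steps)
    x = np.zeros((n_steps, 2))
    for t in range(1, n_steps):
        x[t] = coupling @ x[t - 1] + noise[t]
    return x


def gaussian_mi(data, idx_a, idx_b):
    """Mutual information (in nats) between two column groups, assuming Gaussianity."""
    cov = np.cov(data, rowvar=False)

    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]

    idx_a, idx_b = list(idx_a), list(idx_b)
    return 0.5 * (logdet(idx_a) + logdet(idx_b) - logdet(idx_a + idx_b))


def mmi_decomposition(x):
    """Redundancy and synergy of the two pasts about the joint future."""
    # Columns: past of X, past of Y, future of X, future of Y.
    lagged = np.hstack([x[:-1], x[1:]])
    joint = gaussian_mi(lagged, [0, 1], [2, 3])
    mi_x = gaussian_mi(lagged, [0], [2, 3])
    mi_y = gaussian_mi(lagged, [1], [2, 3])
    return {
        "tdmi": joint,                          # time-delayed mutual information
        "redundancy": min(mi_x, mi_y),          # minimum marginal information
        "synergy": joint - max(mi_x, mi_y),     # joint information beyond the best single source
    }


result = mmi_decomposition(simulate_var(a=0.4, c=0.2))
print(result)
```

Sweeping `a` and `c` over a grid with these functions gives a rough feel for how coupling and common noise move the two quantities, under the assumptions stated above.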

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, is extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set to explore them in? For example, in the work by Newman, Beggs, and Sherrill (née Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order interactions in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y, whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.
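
The four-variable structure described above can be written compactly. The following is a hedged summary in notation chosen here for illustration (the symbols $R$, $U^X$, $U^Y$, $S$ and $I_\partial$ are assumptions, not taken from the manuscript): the time-delayed mutual information between the joint past and the joint future is split into atoms indexed by how information is carried in the past and how it is carried in the future.

```latex
I\left(X_{t-\tau}, Y_{t-\tau};\, X_t, Y_t\right)
  = \sum_{\alpha,\,\beta \,\in\, \{R,\; U^X,\; U^Y,\; S\}} I_\partial^{\alpha \to \beta},
\qquad
\begin{aligned}
  &\text{persistent redundancy: } I_\partial^{R \to R},\\
  &\text{persistent synergy: }    I_\partial^{S \to S},\\
  &\text{transfer from } X \text{ to } Y\text{: } I_\partial^{U^X \to U^Y}.
\end{aligned}
```

With four possible modes in the past and four in the future, the sum contains sixteen atoms, of which the response singles out the two persistent ones and contrasts them with transfer.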

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic differs from other ways of identifying edges that are significantly different between two groups or conditions, because it does not consider edges in isolation, but only as part of a single connected component.

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could be helpful to be implemented in other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that, probably, the RSN are quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.
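
      For readers less familiar with the participation coefficient, the sketch below computes it from a weighted interaction matrix and a module assignment, using the standard definition P_i = 1 - sum_m (k_im / k_i)^2, where k_im is the summed weight of region i's interactions with module m and k_i is its total strength. This is an illustrative sketch under stated assumptions: the function name and toy data are hypothetical, non-negative weights are assumed, and we do not claim that this matches the exact implementation used for the manuscript.

```python
import numpy as np

def participation_coefficient(weights, modules):
    """Participation coefficient of each node for non-negative edge weights.

    P_i = 1 - sum_m (k_im / k_i)^2. P_i is 0 when all of a node's weight
    falls within one module and approaches 1 when it is spread evenly
    across many modules.
    """
    weights = np.asarray(weights, dtype=float)
    modules = np.asarray(modules)
    strength = weights.sum(axis=1)
    squared_share = np.zeros_like(strength)
    for m in np.unique(modules):
        k_im = weights[:, modules == m].sum(axis=1)
        # Avoid division by zero for isolated nodes.
        share = np.divide(k_im, strength,
                          out=np.zeros_like(strength), where=strength > 0)
        squared_share += share ** 2
    pc = 1.0 - squared_share
    pc[strength == 0] = 0.0  # isolated nodes participate in nothing
    return pc

# Toy redundancy matrix: four regions in two modules (e.g. two RSNs).
redundancy = np.array([[0, 1, 0, 0],
                       [1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [0, 0, 1, 0]], dtype=float)
rsn_labels = [0, 0, 1, 1]
pc = participation_coefficient(redundancy, rsn_labels)
```

      In the toy matrix, regions 1 and 2 split their redundant interactions equally between the two modules and obtain P = 0.5, whereas regions 0 and 3 interact with a single module and obtain P = 0. Under the logic described above, a workspace region with a high value of this quantity for its redundant interactions would be a candidate broadcaster.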

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia 120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness 121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean?? Pretty uninformative. The matrix of redundant connections looks a lot of black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?.

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • Caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". I didn't get it clear how do you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, is identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and indicate what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We summarized the main changes:

      (1) In the Introduction part, we give a general definition of habitat fragmentation to avoid confusion, as reviewers #1 and #2 suggested.

      (2) We clarify the two aspects of the observed “extinction”, namely “true dieback” and “emigration”, as reviewers #2 and #3 suggested.

      (3) In the Methods part, we 1) clarify the reason for testing the temporal trend in colonization/extinction dynamics and describe how to select islands as reviewer #1 suggested; 2) describe how to exclude birds from the analysis as reviewer #2 suggested.

      (4) In the Results part, we modified and rearranged Figures 4-6 as reviewers #1, #2 and #3 suggested.

      (5) In the Discussion part, we 1) discuss the multiple aspects of the metric of isolation for future research as reviewer #3 suggested; 2) provide concrete evidence about the relationship between habitat diversity or heterogeneity and island area and 3) provide a wider perspective about how our results can inform conservation practices in fragmented habitats as reviewer #2 suggested.

      eLife Assessment

      This important study enhances our understanding of how habitat fragmentation and climate change jointly influence bird community thermophilization in a fragmented island system. The evidence supporting some conclusions is incomplete, as while the overall trends are convincing, some methodological aspects, particularly the isolation metrics and interpretation of colonization/extinction rates, require further clarification. This work will be of broad interest to ecologists and conservation biologists, providing crucial insights into how ecosystems and communities react to climate change.

      We sincerely extend our gratitude to you and the esteemed reviewers for acknowledging the importance of our study and for raising these concerns. We have clarified the rationale behind our analysis of temporal trends in colonization and extinction dynamics, as well as the choice of distance to the mainland as the isolation metric. Additionally, we further discuss the multiple aspects of the metric of isolation for future research and provide concrete supporting evidence about the relationship between habitat diversity or heterogeneity and island area.

      Incorporating these valuable suggestions, we have thoroughly revised our manuscript, ensuring that it now presents a more comprehensive and nuanced account of our research. We are confident that these improvements will further enhance the impact and relevance of our work for ecologists and conservation biologists alike, offering vital insights into the resilience and adaptation strategies of communities facing the challenges of climate change.

      Reviewer #1 (Public Review):

      Summary:

      This study reports on the thermophilization of bird communities in a network of islands with varying areas and isolation in China. Using data from 10 years of transect surveys, the authors show that warm-adapted species tend to gradually replace cold-adapted species, both in terms of abundance and occurrence. The observed trends in colonisations and extinctions are related to the respective area and isolation of islands, showing an effect of fragmentation on the process of thermophilization.

      Strengths:

      Although thermophilization of bird communities has been already reported in different contexts, it is rare that this process can be related to habitat fragmentation, despite the fact that it has been hypothesized for a long time that it could play an important role. This is made possible thanks to a really nice study system in which the construction of a dam has created this incredible Thousand Islands lake. Here, authors do not simply take observed presence-absence as granted and instead develop an ambitious hierarchical dynamic multi-species occupancy model. Moreover, they carefully interpret their results in light of their knowledge of the ecology of the species involved.

      Response: We greatly appreciate your recognition of our study system and the comprehensive approach and careful interpretation of results. 

      Weaknesses:

      Despite the clarity of this paper on many aspects, I see a strong weakness in the authors' hypotheses, which obscures the interpretation of their results. Looking at Figure 1, and in many sentences of the text, a strong baseline hypothesis is that thermophilization occurs because of an increasing colonisation rate of warm-adapted species and extinction rate of cold-adapted species. However, there does not need to be a temporal trend! Any warm-adapted species that colonizes a site has a positive net effect on CTI; similarly, any cold-adapted species that goes extinct contributes to thermophilization.

      Thank you very much for these thoughtful comments. The understanding depends on the time frame of the study and specifically, whether the system is at equilibrium. We think your claim is based on this background: if the system is not at equilibrium, then CTI can shift simply by having differential colonization (or extinction) rates for warm-adapted versus cold-adapted species. We agree with you in this case.

      On the other hand, if a community is at equilibrium, then there will be no net change in CTI over time. Imagine we have an archipelago where the average colonization of warm-adapted species is larger than the average colonization of cold-adapted species, then over time the archipelago will reach an equilibrium with stable colonization/extinction dynamics where the average CTI is stable over time. Once it is stable, then if there is a temporal trend in colonization rates, the CTI will change until a new equilibrium is reached (if it is reached).

      For our system, the question then is whether we can assume that the system is or has ever been at equilibrium. If it is not at equilibrium, then CTI can shift simply by having differential colonization (or extinction) rates for warm-adapted versus cold-adapted species. If the system is at equilibrium (at the beginning of the study), then CTI will only shift if there is a temporal change or trend in colonization or extinction rates.
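
      The argument above can be made concrete with a small deterministic sketch. It assumes a simple colonization-extinction recursion for each species' expected occupancy, psi(t+1) = psi(t)(1 - e) + (1 - psi(t))c, whose equilibrium is c / (c + e), and takes the expected CTI as the occupancy-weighted mean of species' thermal indices. All rates, thermal indices and names are hypothetical, and this toy is not the occupancy model fitted in the study.

```python
import numpy as np

def expected_cti(psi0, colonization, extinction, sti, years, col_trend=None):
    """Expected CTI per year under a colonization-extinction recursion.

    psi0         : initial occupancy probability per species
    colonization : baseline colonization probability per species
    extinction   : extinction probability per species
    sti          : species temperature index per species
    col_trend    : optional yearly change in colonization per species
    """
    psi = np.asarray(psi0, dtype=float).copy()
    c = np.asarray(colonization, dtype=float)
    e = np.asarray(extinction, dtype=float)
    sti = np.asarray(sti, dtype=float)
    trend = np.zeros_like(c) if col_trend is None else np.asarray(col_trend, dtype=float)
    cti = []
    for t in range(years):
        # Occupancy-weighted mean thermal index of the community.
        cti.append(float((psi * sti).sum() / psi.sum()))
        c_t = np.clip(c + trend * t, 0.0, 1.0)
        psi = psi * (1.0 - e) + (1.0 - psi) * c_t
    return np.array(cti)

# Hypothetical two-species system: one cold-adapted, one warm-adapted.
sti = [10.0, 20.0]
c, e = np.array([0.1, 0.3]), np.array([0.2, 0.2])
psi_star = c / (c + e)  # equilibrium occupancy under constant rates

away_from_eq = expected_cti([0.5, 0.5], c, e, sti, years=30)
at_eq = expected_cti(psi_star, c, e, sti, years=30)
at_eq_with_trend = expected_cti(psi_star, c, e, sti, years=30,
                                col_trend=[0.0, 0.02])
```

      Under these assumptions, CTI rises in the first scenario purely because the warm-adapted species has the higher colonization rate and the system starts away from equilibrium; it stays constant in the second scenario; and it rises in the third only because the colonization rate of the warm-adapted species changes over time.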

      Habitat fragmentation can affect biomes for decades after dam formation. The “relaxation effect” (Gonzalez, 2000) refers to the fact that the continent acts as a potential species pool for island communities. Under relaxation, some species will be filtered out over time, mainly through the selective extinction of species that are highly sensitive to fragmentation. Meanwhile, for a 100-hectare patch, it takes about ten years to lose 50% of bird species; the smaller the patch area, the shorter the time required (Ferraz et al., 2003; Haddad et al., 2015). This study was conducted 50 to 60 years after the formation of the TIL, so the system has a high probability of having reached “equilibrium” through the “relaxation effect” (Si et al., 2014). We have no way of knowing for certain whether our system is at “equilibrium”. Thus, testing for changing rates of colonization-extinction over time is actually a much stronger test of thermophilization, which makes our inference more robust.

      We add a note to the legend of Figure 1 on Lines 781-786:

      “CTI can also change simply due to differential colonization-extinction rates by thermal affinity if the system is not at equilibrium prior to the study. In our study system, we have no way of knowing whether our island system was at equilibrium at the onset of the study; thus, focusing on changing rates of colonization-extinction over time presents a much stronger test of thermophilization.”

      We hope this statement can make it clear. Thank you again for this meaningful question.

      Another potential weakness is that fragmentation is not clearly defined. Generally, fragmentation sensu lato involves both loss of habitat area and changes in the spatial structure of habitats (i.e. fragmentation per se). Here, both area and isolation are considered, which may be slightly confusing for the readers if not properly defined.

      Thank you for reminding us of that. Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We have clarified the general definition in the Introduction on Lines 61-63:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      Reviewer #2 (Public Review):

      Summary:

      This study addresses whether bird community reassembly in time is related to climate change by modelling a widely used metric, the community temperature index (CTI). The authors first computed the temperature index of 60 breeding bird species thanks to distribution atlases and climatic maps, thus obtaining a measure of the species realized thermal niche.

      These indices were aggregated at the community level, using 53 survey transects of 36 islands (repeated for 10 years) of the Thousand Islands Lake, eastern China. Any increment of this CTI (i.e. thermophilization) can thus be interpreted as a community reassembly caused by a change in climate conditions (given no confounding correlations).
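
      As a minimal illustration of this aggregation step, the sketch below computes CTI for one community either from occurrences (the unweighted mean STI of the species present) or from abundances (the abundance-weighted mean STI). The function name and the example counts are hypothetical, and the exact aggregation used in the manuscript may differ in detail.

```python
import numpy as np

def community_temperature_index(abundance, sti, weighted=True):
    """CTI of one community from per-species abundance and STI.

    weighted=True  : abundance-weighted mean STI
    weighted=False : mean STI of the species that are present
    """
    abundance = np.asarray(abundance, dtype=float)
    sti = np.asarray(sti, dtype=float)
    weights = abundance if weighted else (abundance > 0).astype(float)
    if weights.sum() == 0:
        return float("nan")  # empty community: CTI is undefined
    return float((weights * sti).sum() / weights.sum())

# Hypothetical transect: three species, the second one absent.
abundance = [10, 0, 2]
sti = [12.0, 15.0, 20.0]
cti_abundance = community_temperature_index(abundance, sti, weighted=True)
cti_occurrence = community_temperature_index(abundance, sti, weighted=False)
```

      The two variants can diverge when abundances shift without any species being gained or lost, which is one reason an abundance-based CTI may respond over a period in which an occurrence-based CTI does not.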

      Using a mix of Bayesian and frequentist mixed-effect models, the authors show an increment of CTI at the island level, driven by both extinction (or emigration) of cold-adapted species and colonization of newly adapted warm-adapted species. Less isolated islands displayed higher colonization and extinction rates, confirming that dispersal constraints (created by habitat fragmentation per se) on colonization and emigration are the main determinants of thermophilization. The authors also had the opportunity to test for habitat amount (here island size). They show that the lack of microclimatic buffering resulting from less forest amount (a claim backed by understory temperature data) exacerbated the rates of cold-adapted species extinction while fostering the establishment of warm-adapted species.

      Overall these findings are important to range studies as they reveal the local change in affinity to the climate of species comprising communities while showing that the habitat fragmentation VS amount distinction is relevant when studying thermophilization. As is, the manuscript lacks a wider perspective about how these results can be fed into conservation biology, but would greatly benefit from it. Indeed, this study shows that in a fragmented reserve context, habitat amount is very important in explaining trends of loss of cold-adapted species, hinting that it may be strategic to prioritize large habitats to conserve such species. Areas of diverse size may act as stepping stones for species shifting range due to climate change, with small islands fostering the establishment of newly adapted warm-adapted species while large islands act as refugia for cold-adapted species. This study also shows that the removal of dispersal constraints with low isolation may help species relocate to the best suitable microclimate in a heterogenous reserve context.

      Thank you very much for your valuable feedback. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. In particular, you provided constructive suggestions and examples on how to extend the results to conservation guidance, which is something the manuscript should not ignore. We have added a paragraph to the end of the Discussion, stating how our results can inform conservation, on Lines 339-347:

      ‘Overall, our findings have important implications for conservation practices. Firstly, we confirmed the role of isolation in limiting range shifting. Better connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the best suitable microclimate. Second, small patches can foster the establishment of newly adapted warm-adapted species while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can act as stepping stones or shelters in a warming climate depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.’

      Strength:

      The strength of the study lies in its impressive dataset of bird resurveys, that cover 10 years of continued warming (as evidenced by weather data), 60 species in 36 islands of varying size and isolation, perfect for disentangling habitat fragmentation and habitat amount effects on communities. This distinction allows us to test very different processes mediating thermophilization; island area, linked to microclimatic buffering, explained rates for a variety of species. Dispersal constraints due to fragmentation were harder to detect but confirms that fragmentation does slow down thermophilization processes.

      This study is a very good example of how the expected range shift at the biome scale of the species materializes in small fragmented regions. Specifically, the regional dynamics the authors show are analogous to what processes are expected at the trailing and colonizing edge of a shifting range: warmer and more connected places display the fastest turnover rates of community reassembly. The authors also successfully estimated extinction and colonization rates, allowing a more mechanistic understanding of CTI increment, being the product of two processes.

      The authors showed that regional diversity and CTI computed only by occurrences do not respond in 10 years of warming, but that finer metrics (abundance-based, or individual islands considered) do respond. This highlights the need to consider a variety of case-specific metrics to address local or regional trends. Figure Appendix 2 is a much-appreciated visualization of the effect of different data sources on Species thermal Index (STI) calculation.

      The methods are long and diverse, but they are documented enough so that an experienced user with the use of the provided R script can follow and reproduce them.

      Thank you very much for your profound Public Review. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. 

      Weaknesses:

      While the overall message of the paper is supported by data, the claims are not uniformly backed by the analysis. The trends of island-specific thermophilization are very credible (Figure 3), however, the variable nature of bird observations (partly compensated by an impressive number of resurveys) propagate a lot of errors in the estimation of species-specific trends in occupancy, abundance change, and the extinction and colonization rates. This materializes into a weak relationship between STI and their respective occupancy and abundance change trends (Figure 4a, Figure 5, respectively), showing that species do not uniformly contribute to the trend observed in Figure 3. This is further shown by the results presented in Figure 6, which present in my opinion the topical finding of the study. While a lot of species rates response to island areas are significant, the isolation effect on colonization and extinction rates can only be interpreted as a trend as only a few species have a significant effect. The actual effect on the occupancy change rates of species is hard to grasp, and this trend has a potentially low magnitude (see below).

      Thank you very much for pointing out this shortcoming. The R² between STI and the respective occupancy trends is relatively small (R² = 0.035), but the R² between STI and the respective abundance change trends is larger by the standards of ecological research (R² = 0.123). The R² values between STI and the colonization rate trends (R² = 0.083) and extinction rate trends (R² = 0.053) are also relatively small. A low R² indicates that we cannot make reliable predictions using the current model, and that factors other than STI may influence the species-specific occupancy trend. Nonetheless, it is important to note that the standardized coefficient estimates are not minor and the trend is significant, indicating that the species-specific response is at least related to STI.

      The number of species that have significant interaction terms for isolation (Figure 6) is indeed low. Although there is uncertainty in the estimation of relationships, there are also consistent trends in response to habitat fragmentation of colonization of warm-adapted species and extinction of cold-adapted species. This is especially true for the effect of isolation, where on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate. We now better highlight these results in the Results and Discussion.

      While being well documented, the myriad of statistical methods used by the authors hampers the interpretation of the figures, as the posterior means presented in Figure 4b and Figure 6 need to be back-transformed with the inverse logit (logit^-1) and fed into the equation of the respective model to make sense of them. I suggest a rewording of the caption to limit its dependence on the method section for interpretation.

      Thank you for this suggestion. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where the logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable, so interpretation is actually quite straightforward: positive values indicate positive influence while negative values indicate negative influence. Because the goal of Figure 6 is to display the negative/positive effect, we didn’t back-transform them. Following your advice, we thus modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3, to move Figure 5 to Figure 4c). The modified title and legends of Figure 5 are on Lines 817-820:

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects...”

      By using a broad estimate of the realized thermal niche, a common weakness of thermophilization studies is the inability to capture local adaptation in species' physiological or behavioral response to a rise in temperature. The authors however acknowledge this limitation and provide specific examples of how species ought to evade high temperatures in this study region.

      We appreciate your recognition. This is a common problem in STI studies. We hope that future studies can take more detail about the microclimate of species’ true habitat across regions into account when calculating STI. Although challenging, focusing on a smaller portion of a species’ distribution range may make this more achievable.

      Reviewer #3 (Public Review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test a general increase in the occurrence and abundance of warm-dwelling species and vice versa for cold-dwelling species using the widely used Community Temperature Index (CTI), as well as the relationship between island fragmentation in terms of island area and isolation from the mainland on extinction and colonization rates of cold- and warm-adapted species. They found that indeed there was thermophilization happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clearly for the occurrence-based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling decreased more strongly on smaller islands, which is - according to the authors - due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue, that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regards to island isolation, they show that also both thermophilization processes (increase of warm and decrease of cold-adapted species) were stronger on islands closer to the mainland, due to closer sources to species populations of either group on the mainland as compared to limited dispersal (i.e. range shift potential) in more isolated islands.

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only a few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well-balanced method of simplifying this to the most important factors in question (CTI change, extinction, and colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      We very much appreciate your positive and constructive comments and suggestions. Thank you for your recognition of the scientific question, the modeling approach and the conclusions.

      Weaknesses:

      The metric of island isolation based on the distance to the mainland seems a bit too oversimplified as in real life the study system rather represents an island network where the islands of different sizes are in varying distances to each other, such that smaller islands can potentially draw from the species pools from near-by larger islands too - rather than just from the mainland. Thus a more holistic network metric of isolation could have been applied or at least discussed for future research. The fact, that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint at a more complex pattern going on in real-life than was assumed for this study.

      Thank you for this meaningful question. Isolation can be measured in different ways in the study region. We chose the distance to the mainland as a measure of isolation based on a previous study in our system, which provided evidence that the colonization rate and extinction rate of breeding bird species were best fitted using distance to the nearest mainland over other distance-based measures (distance to the nearest landmass, distance to the nearest bigger landmass) (Si et al., 2014). Moreover, their results produced almost identical patterns of the relationship between isolation and colonization/extinction rate (Si et al., 2014). That’s why we only selected “Distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The plants on all islands were cleared out about 60 years ago due to dam construction, with all bird species coming from the mainland as the original species pool through a process called “relaxation”. This could be the reason why distance to the nearest mainland is the best predictor.

      We agree with you that it’s still necessary to consider more aspects of “isolation” at least in discussion for future research. In our Discussion, we address these on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      Further, the link between larger areas and higher habitat diversity or heterogeneity could be presented by providing evidence for this relationship. The authors do make a reference to a paper done in the same study system, but a more thorough presentation of it would strengthen this assumption further.

      Thank you very much for this question. We now add more details about the relationship between habitat diversity and heterogeneity based on a related study in the same system. The observed number of species significantly increased with increasing island area (slope = 4.42, R2 = 0.70, p < .001), as did the rarefied species richness per island (slope = 1.03, R2 = 0.43, p < .001), species density (slope = 0.80, R2 = 0.33, p = .001) and the rarefied species richness per unit area (slope = 0.321, R2 = 0.32, p = .001). We added this supporting evidence on Lines 317-321:

      “We thus suppose that habitat heterogeneity could also mitigate the loss of these relatively cold-adapted species as expected. Habitat diversity, including the observed number of species, the rarefied species richness per island, species density and the rarefied species richness per unit area, all increased significantly with island area instead of isolation in our system (Liu et al., 2020)”

      Despite the general clear patterns found in the paper, there were some idiosyncratic responses. Those could be due to a multitude of factors which could be discussed a bit better to inform future research using a similar study design.

      Thank you for these suggestions. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1: I disagree that there should be a temporal trend in colonisation/extinction dynamics.

      Thank you again for these thoughtful comments. We have explained this in detail in the response to the Public Review.

      (2) L 485-487: As explained before I disagree. I don't see why there needs to be a temporal trend in colonization and extinction.

      Thank you again for these thoughtful comments. Because we can’t guarantee that the study system has reached equilibrium, changing rates of colonization and extinction over time are actually a much stronger test of thermophilization. A more detailed statement can be found in the response to the Public Review.

      (3) L 141: which species' ecological traits?

      Sorry for the confusion. The traits included continuous variables (dispersal ability, body size, body mass and clutch size) and categorical variables (diet, active layer, residence type). Specifically, we tested the correlation between STI and dispersal ability, body size, body mass and clutch size using the Pearson correlation test. We also tested the difference in STI between trait groups using the Wilcoxon signed-rank test for the three categorical variables: diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor). There was no significant difference between any two groups for any of the three categorical variables (p > 0.2). We added these on Lines 141-145:

      “No significant correlation was found between STI and species’ ecological traits; specifically, the continuous variables of dispersal ability, body size, body mass and clutch size (Pearson correlations for each, |r| < 0.22), and the categorical variables of diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor)”

      (4) L 143: CTIoccur and CTIabun were not defined before.

      Because CTIoccur and CTIabun are first defined in the Methods (section 4.4), we changed the sentence to a more general statement here on Lines 147-150:

      “At the landscape scale, considering species detected across the study area, occurrence-based CTI (CTIoccur; see section 4.4) showed no trend (posterior mean temporal trend = 0.414; 95% CrI: -12.751, 13.554) but abundance-based CTI (CTIabun; see section 4.4) showed a significant increasing trend.”

      (5) Figure 4: what is the dashed vertical line? I assume the mean STI across species?

      Sorry for the unclear description. The vertical dashed line indicates the median value of STI for 60 species, as a separation of warm-adapted species and cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”

      (6) Figure 6: in the legend, replace 'points in blue' with 'points in blue/orange' or 'solid dots' or something similar.

      Thank you for this suggestion. We changed it to “points in blue/orange” on Line 823.

      (7) L 176-176: unclear why the interaction parameters are particularly important for explaining the thermophilization mechanism: if e.g. colonization rate of warm-adapted species is constantly higher in less isolated islands, (and always higher than the extinction rate of the same species), it means that thermophilization is increased in less isolated islands, right?

      Thank you for this question. This is also related to the question about “Why use temporal trends in colonization/extinction rate to test for thermophilization mechanisms”. Changing rates of colonization and extinction over time are actually a much stronger test of thermophilization (for more details, see our response to the Public Review and to Recommendations 1 and 2).

      Based on this, the two main driving processes of thermophilization mechanism include the increasing colonization rate of warm-adapted species and the increasing extinction rate of cold-adapted species with year. The interaction effect between island area (or isolation) and year on colonization rate (or extinction rate) can tell us how habitat fragmentation mediates the year effect. For example, if the interaction term between year and isolation is negative for a warm-adapted species that increased in colonization rate with year, it indicates that the colonization rate increased faster on less isolated islands. This is a signal of a faster thermophilization rate on less-isolated islands.
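
      To make this reasoning concrete, a minimal sketch follows. It assumes a simplified linear predictor of the form logit(rate) = b0 + b_year*year + b_iso*isolation + b_int*year*isolation, which is only a stand-in for the MSOM and not the model itself. The function name and the coefficient values are hypothetical and chosen purely for illustration. Under this assumption, the year slope at a given standardized isolation value is b_year + b_int*isolation, so a negative interaction term gives a steeper increase on islands that are less isolated than average.

```python
def year_slope(beta_year: float, beta_interaction: float, isolation_z: float) -> float:
    """Slope of logit(rate) with respect to year at a standardized isolation value.

    Assumes the simplified predictor
    logit(rate) = b0 + b_year*year + b_iso*iso + b_int*year*iso,
    whose derivative with respect to year is b_year + b_int*iso.
    """
    return beta_year + beta_interaction * isolation_z


# Hypothetical coefficients for one warm-adapted species (not values from the study).
beta_year = 0.5          # colonization rate increases with year
beta_interaction = -0.25  # negative year x isolation interaction

# Standardized isolation: -1 is one SD nearer the mainland, +1 is one SD farther.
slope_near = year_slope(beta_year, beta_interaction, isolation_z=-1.0)
slope_far = year_slope(beta_year, beta_interaction, isolation_z=1.0)

print(f"year slope on a less isolated island: {slope_near}")
print(f"year slope on a more isolated island: {slope_far}")
```

      With these illustrative numbers, the logit-scale colonization rate rises faster with year on the less isolated island, which is the pattern we interpret as faster thermophilization on less-isolated islands.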

      (8) L201-203: this is only little supported by the results that actually show that there is NO significant interaction for most species.

      Thank you for this comment. Although most species showed a non-significant interaction effect, the overall trend is relatively consistent, especially for the effect of isolation. To emphasize the “trend” rather than a “significant effect”, we slightly modified this sentence to use more rigorous wording on Lines 205-208:

      “We further found that habitat fragmentation influences two processes of thermophilization: colonization rates of most warm-adapted species tended to increase faster on smaller and less isolated islands, while the loss rates of most cold-adapted species tended to be exacerbated on less isolated islands.”

      (9) Section 2.3: can't you have a population-level estimate? I struggled a bit to understand all the parameters of the MSOM (because of my lack of statistical/mathematical proficiency) so I cannot provide more advice here.

      Thank you for this suggestion. We understand that you are referring to an overall estimate across all species for each variable. From the MSOM, we can get a standardized estimate of every variable (year, area, isolation, interaction) for each species separately. Because the divergent or consistent responses among species are what we are interested in, we didn’t go further and calculate a population-level estimate.

      (10) L 291: a dot is missing.

      Done. Thank you for your correction.

      (11) L 305, 315: a space is missing

      Done

      (12) L 332: how were these islands selected?

      Thank you for this question. The 36 islands were selected according to a gradient of island area and isolation, spreading across the whole lake region. The selection ensured that there is no significant correlation between island area and isolation (Pearson correlation coefficient r = -0.21, p = 0.21). The 7 biggest of the 36 islands are also the only islands larger than 30 ha in the whole lake region. We have modified this in the Methods on Lines 360-363:

      “We selected 36 islands according to a gradient of island area and isolation with a guarantee of no significant correlation between island area and isolation (Pearson r = -0.21, p = 0.21). For each island, we calculated island area and isolation (measured in the nearest Euclidean distance to the mainland) to represent the degree of habitat fragmentation.”
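
      As an illustration of this check, the following minimal sketch computes a Pearson correlation coefficient between island area and isolation in plain Python. The function name and the four area/isolation values are hypothetical placeholders, not our island data, and the sketch returns only r (the p-value reported above would require an additional significance test).

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four islands (placeholders, not study data).
area = [1.0, 2.0, 3.0, 4.0]
isolation = [2.0, 1.0, 4.0, 3.0]

r = pearson_r(area, isolation)
print(f"Pearson r between area and isolation: {r:.2f}")
```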

      (13) L 334: "Distance to the mainland" was used as a metric of isolation, but elsewhere in the text you argue that the observed thermophilization is due to interisland movements. It sounds contradictory. Why not include the average or shortest distance to the other islands?

      Thank you very much for raising this comment. Yes, “Distance to the mainland” was the only metric we used for isolation. We carefully checked the manuscript to find where “interisland movement” comes from and causes the misunderstanding. It must come from Discussion 3.1 (on Lines 217-221): “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to inter-island occurrence dynamics, rather than exogenous community turnover.”

      Sorry, the word “inter-island” is not exactly what we wanted to express here. We wanted to express that “the thermophilization was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region”. We have changed the sentence in the Discussion on Lines 217-221:

      “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region.”

      In addition, we would like to explain why we use distance to the mainland. We chose the distance to the mainland as a measure of isolation based on a previous study in our system, which provided evidence that the colonization rate and extinction rate of breeding bird species were best fitted using distance to the nearest mainland over other distance-based measures (distance to the nearest landmass, distance to the nearest bigger landmass) (Si et al., 2014). Moreover, their results produced almost identical patterns of the relationship between isolation and colonization/extinction rate (Si et al., 2014). That’s why we only selected “Distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The plants on all islands were cleared out about 60 years ago due to dam construction, with all bird species coming from the mainland as the original species pool through a process called “relaxation”. This may be the reason why distance to the nearest mainland is the best predictor.

      In the Discussion, we added the following text about the other measures of isolation on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (14) L 347: you write 'relative' abundance but this measure is not relative to anything. Better write something like "we based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys".

      Thank you for this suggestion, we have changed the sentence on Lines 377-379:

      “We based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys.”

      (15) L 378: shouldn't the formula for CTIoccur be (equation in latex format):

      CTI_{occur, j, t} = \frac{\sum_{i=1}^{N_{j,t}} STI_{i}}{N_{j,t}}

      Where Nj,t is the total number of species surveyed in the community j in year t

      Thank you very much for this careful check, we have revised it on Lines 415, 417:

      “where Nj,t is the total number of species surveyed in the community j in year t.”
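
      For illustration, the occurrence-based formula above can be sketched in a few lines of Python: CTIoccur for community j in year t is the unweighted mean STI of the N_{j,t} species recorded there. The function name, the species labels and the STI values below are hypothetical placeholders and not data from the study.

```python
def cti_occur(sti_by_species: dict[str, float], species_present: set[str]) -> float:
    """Occurrence-based CTI: mean STI over the species recorded in one community-year."""
    if not species_present:
        raise ValueError("community has no recorded species")
    # Sum the STI of each recorded species, then divide by the species count N_{j,t}.
    total_sti = sum(sti_by_species[species] for species in species_present)
    return total_sti / len(species_present)


# Hypothetical STI values in degrees Celsius (placeholder species labels).
sti = {"sp_a": 18.0, "sp_b": 20.0, "sp_c": 25.0}

# Species recorded on one island in one year (hypothetical).
present = {"sp_a", "sp_c"}

cti = cti_occur(sti, present)
print(f"CTIoccur = {cti}")
```

      Each recorded species contributes equally here, regardless of how many individuals were counted, which is what distinguishes the occurrence-based index from the abundance-based one.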

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 76: "weakly"

      Done. Thank you for your correction.

      (2) Line 98: I suggest a change to this sentence: "For example, habitat fragmentation renders habitats to be too isolated to be colonized, causing sedentary butterflies to lag more behind climate warming in Britain than mobile ones"

      Thank you for this modification, we have changed it on Lines 99-101.

      (3) Line 101: remove either "higher" or "increasing"

      Done, we have removed “higher”. Thank you for this advice.

      (4) Line 102: "benefiting from near source of"

      Done.

      (5) Line 104: "emigrate"

      Done.

      (6) Introduction: I suggest making it more explicit what process you describe under the word "extinction". At first read, I thought you were only referring to the dieback of individuals, but you also included emigration as an extinction process. It also needs to be reworded in Fig 1 caption.

      Thank you for this suggestion. Yes, we can’t distinguish in our system between local extinction and emigration. The observed “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in order: first emigration and then, for individuals that can neither emigrate nor withstand the conditions, true local dieback. As you said, this should also be stated in the legend of Figure 1. We have modified the legend on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, and if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      (7) I also suggest differentiating habitat fragmentation (distances between islands) and habitat amount (area) as explained in Fahrig 2013 (Rethinking patch size and isolation effects: the habitat amount hypothesis) and her latter paper. This will help the reader what lies behind the general trend of fragmentation: fragmentation per se and habitat amount reduction.

      Thank you for this suggestion! Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We now give a general definition of habitat fragmentation on Lines 61-63:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (8) Line 136: is the "+-" refers to the standard deviation or confidence interval, I suggest being explicit about it once at the start of the results.

      Thank you for the reminder. The "+-" refers to the standard deviation (SD). The modified sentence is now on Lines 135-139:

      “The number of species detected in surveys on each island across the study period averaged 13.37 ± 6.26 (mean ± SD) species, ranging from 2 to 40 species, with an observed gamma diversity of 60 species. The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (9) Line 143: please specify the unit of thermophilization.

      The unit of thermophilization rate is the change in degrees per unit year, because predictor variables were z-transformed in all analyses to make their effects comparable. We have added this on Line 151:

      “When measuring CTI trends for individual islands (expressed as °/ unit year)”

      (10) Line 289: check if no word is missing from the sentence.

      The sentence is: “In our study, a large proportion (11 out of 15) of warm-adapted species increasing in colonization rate and half (12 out of 23) of cold-adapted species increasing in extinction rate were changing more rapidly on smaller islands.”

      We have already defined the species included in testing the third prediction in both the Methods and the Results: 15 warm-adapted species that increased in colonization rate and 23 cold-adapted species that increased in extinction rate. We have therefore removed this redundant information and rewritten the sentence as below on Lines 300-302:

      “In our study, the colonization rate of a large proportion of warm-adapted species (11 out of 15) and the extinction rate of half of cold-adapted species (12 out of 23) were increasing more rapidly on smaller islands.”

      (11) Line 319: I really miss a concluding statement of your discussion, your results are truly interesting and deserve to be summarized in two or three sentences, and maybe a perspective about how it can inform conservation practices in fragmented settings.

      Thank you for this profound suggestion both in Public Review and here. We have added a paragraph to the end of the Discussion, stating how our results can inform conservation, on Lines 339-347:

      “Overall, our findings have important implications for conservation practices. First, we confirmed the role of isolation in limiting range shifting. Better connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the most suitable microclimate. Second, small patches can foster the establishment of newly adapted warm-adapted species while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can act as stepping stones or shelters in a warming climate depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.”

      (12) Line 335: I suggest " ... the islands has been protected by forbidding logging, ..."

      Thanks for this wonderful suggestion. Done. The new sentence is now on Lines 365-366:

      “Since lake formation, the islands have been protected by forbidding logging, allowing natural succession pathways to occur.”

      (13) Line 345: this speed is unusually high for walking, check the speed.

      Sorry for the carelessness, it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (14) Line 351: you could add a sentence explaining why that choice of species exclusion was made. Was made from the start of the monitoring program or did you exclude species afterward?

      We excluded them afterward. We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants). These species were recorded during monitoring: some were on the shore of an island or flying high above it, and some nocturnal species were spotted only by accident.

      We now describe in more detail how species were excluded on Lines 379-387:

      “We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants) from our record. First, our surveys were conducted during the day, so some nocturnal and crepuscular species, such as owls and nightjars, were excluded because the survey design was inadequate for them. Second, wagtails, kingfishers, and water birds such as ducks and herons were excluded because we were only interested in forest birds. Third, birds such as swallows and eagles, which were usually flying or soaring in the air rather than staying on islands, were also excluded as it was difficult to determine which islands they belonged to. Following these operations, 60 species were finally retained.”

      (15) Line 370: I suggest adding the range and median of STI.

      Thanks for this good suggestion. The range, mean±SD of STI were already in the Results part, we added the median of STI there as well. The new sentence is now in Results part on Lines 137-139:

      “The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (16) Figure 4.b: Is it possible to be more explicit about what that trend is? the coefficient of the regression Logit(ext/col) ~ year + ...... ?

      Thank you for this advice. Your understanding is right: we can interpret it as the coefficient of the ‘year’ effect in the model. More specifically, the ‘year’ effect or temporal trend here is the ‘posterior mean’ of the posterior distribution of ‘year’ in the MSOM (Multi-species Occupancy Model), in the context of the Bayesian framework. We modified this sentence on Lines 811-813:

      “Each point in (b) represents the posterior mean estimate of year in colonization, extinction or occupancy rate for each species.”

      (17) Figure 6: is it possible to provide an easily understandable meaning of the prior presented in the Y axis? E.g. "2 corresponds to a 90% probability for a species to go extinct at T+1", if not, please specify that it is the logit of a probability.

      Thank you for this question both in Public Review and here. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where the logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable. So, positive values indicate positive influence while negative values indicate negative influence. Because the goal of Figure 6 is to display the negative/positive effect, we didn’t back-transform them. Following your advice, we thus modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3, to move Figure 5 to Figure 4c). The modified title and legends of Figure 5 are on Lines 817-820:

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects.”
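
      The logit-scale reading described above can be made concrete with a short sketch. It is only an illustration of the back-transform that was deliberately not applied in the figure; the function names and the baseline probability are hypothetical and are not taken from the MSOM code. Because the predictors were standardized, an effect here corresponds to a one-standard-deviation change in the predictor.

```python
import math


def inv_logit(x: float) -> float:
    """Map a logit-scale value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shifted_probability(baseline_prob: float, effect: float) -> float:
    """Probability obtained after adding a logit-scale effect to a baseline probability."""
    baseline_logit = math.log(baseline_prob / (1.0 - baseline_prob))
    return inv_logit(baseline_logit + effect)


# A logit of 0 is a probability of 0.5; a logit of 2 is about 0.88 rather than 0.90.
print(round(inv_logit(0.0), 3))
print(round(inv_logit(2.0), 3))

# Hypothetical baseline extinction probability of 0.2 (an assumption for illustration):
# a positive effect raises the probability, a negative effect lowers it.
print(round(shifted_probability(0.2, 0.5), 3))
print(round(shifted_probability(0.2, -0.5), 3))
```

      The size of the change in probability depends on the baseline, which is one reason the sign of a logit-scale parameter is easier to compare across species than a back-transformed value.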

      (18) Line 773: points in blue only are significant? I suggest "points in color".

      Thank you for your reminder. Points in blue and orange are all significant. We have revised the sentence on Line 823:

      “Points in blue/orange indicate significant effects.”

      These are all small suggestions that may help you improve the readability of the final manuscript. I warmly thank you for the opportunity to review this impressive study.

      We appreciate your careful review and profound suggestions. We believe these modifications will improve the final manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I have a few minor suggestions for paper revision for your otherwise excellent manuscript. I wish to emphasize that it was a pleasure to read the manuscript and that I especially enjoyed a very nice flow throughout the ms from a nicely rounded introduction that led well into the research questions and hypotheses all the way to a good and solid discussion.

      Thank you very much for your review and recognition. We have carefully checked all recommendations and addressed them in the manuscript.

      (1) L 63: space before the bracket missing and I suggest moving the reference to the end of the sentence (directly after habitat fragmentation does not seem to make sense).

      Thank you very much for this suggestion. The missing space was added, and the reference has been moved to the end of the sentence. We also added a general definition of habitat fragmentation. The new sentence is on Lines 61-64:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (2) L 102: I suggest to write "benefitting ..." instead.

      Done.

      (3) L 103: higher extinction rates (add "s").

      Done.

      (4) L 104: this should probably say "emigrate" and "climate warming".

      Done.

      (5) L 130-133: this is true for emigration (more isolated islands show slower emigration). But what about increased local extinction, especially for small and isolated islands? Especially since you mentioned later in the manuscript that often emigration and extinction are difficult to identify or differentiate. Might be worth a thought here or somewhere in the discussion?

      Thank you for this good question. I would like to answer it in two parts:

      Yes, we can’t distinguish between true local extinction and emigration. The observed local “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in order: first “emigration” and then, if individuals can neither emigrate nor withstand the conditions, “real local dieback”. Over 10 years, cold-adapted species on remote islands would have to tolerate the conditions before real extinction because of dispersal limitation, while on less isolated islands the same species could easily emigrate and find more suitable habitat. Consequently, it is harder for us to observe “extinction” of species on more isolated islands, and easier to observe “fake extinction” of species on less isolated islands due to emigration. As a result, the observed extinction rate is expected to increase more sharply for species on less remote islands, and more moderately for the same species on remote islands.

      We have modified the legend of Figure 1 on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      Besides, you asked “But what about increased local extinction, especially for small and isolated islands?” I think you are referring to the “high extinction rate per se on remote islands”. We want to test the “trend” of extinction rate on a temporal scale, rather than the extinction rate per se on a spatial scale. Even if species have a high extinction rate on remote islands, that rate can still change more slowly over time.

      I hope these answers solve the problem.

      (6) L 245: I think this is the first time the acronym appears in the ms (as the methods come after the discussion), so please write the full name here too.

      Thank you for pointing this out. I realized “Thousand Island Lake” appears for the first time in the last paragraph of the Introduction, so we added “TIL” there on Lines 108-109:

      “Here, we use 10 years of bird community data in a subtropical land-bridge island system (Thousand Island Lake, TIL, China, Figure 2) during a period of consistent climatic warming.”

      (7) L 319: this section could end with a summary statement on idiosyncratic responses (i.e. some variation in the responses you found among the species) and the potential reasons for this, such as e.g. the role of other species traits or interactions, as well as other ways to measure habitat fragmentation (see main comments in public review).

      Thank you for this suggestion both in Public Review and here. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      We only emphasize “habitat loss” here, because idiosyncratic responses mainly come from the mediating effect of habitat loss. For the mediating effect of isolation, the response is relatively consistent (see Page 8, Lines 183-188): “In particular, the effect of isolation on temporal dynamics of thermophilization was relatively consistent across cold- and warm-adapted species (Figure 5a, b); specifically, on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate”.

      (8) L 333: what about the distance to other islands? it's more of a network than a island-mainland directional system (Figure 2). You could address this aspect in the discussion.

      Thank you for this good question again. Isolation can be measured in different ways in the study region. We chose distance to the mainland because it was the best predictor of colonization and extinction rate of breeding birds in the study region, and produced results similar to those of the other distance-based measures, including distance to the nearest landmass and distance to the nearest larger landmass (Si et al., 2014). We still agree with you that it’s necessary to consider more aspects of “isolation”, at least in the discussion for future research. In the Discussion, we addressed this on Lines 292-299. For more details, refer to the response to Public Review.

      (9) Figure 2: Is B1 one of the sampled islands? It is clearly much larger than most other islands and I think it could thus serve as an important population source for many of the adjacent smaller islands? Thus, the nearest neighbor distance to B1 could be as important in addition to the distance to the mainland?

      Yes, B1 is one of the sampled islands and is also the biggest island. In previous research in our study system, we tried distance to the nearest landmass, to the nearest larger landmass, and to the nearest mainland; they produced similar results (for more details, refer to the response to Public Review). We agree with you that the nearest neighbor distance to B1 could be a potentially important measure, but this needs further research. In our Discussion, we address this on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (10) L 345: 20km/h walking seems impressively fast? I assume this is a typo.

      Sorry for the carelessness; it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (11) L 485: I had difficulties fully understanding the models that were fitted here and could not find them in the codes you provided (which were otherwise very well documented!). Could you explain this modeling step in a bit more detail?

      Thank you for your recognition! According to Line 485 in the online PDF version (Methods part 4.6.3), it says: “An increasing colonization trend of warm-adapted species and increasing extinction trend of cold-adapted species are two main expected processes that cause thermophilization (Fourcade et al., 2021). To test our third prediction about the mediating effect of habitat fragmentation, we selected warm-adapted species that had an increasing trend in colonization rate (positive year effect in colonization rate) and cold-adapted species that had an increasing extinction rate (positive year effect in extinction rate)…..”

      We carefully checked the code at the Figshare link and found that the MSOM JAGS code had not been uploaded. We are very sorry for that. It can now be found in the document [MOSM.R] at https://figshare.com/s/7a16974114262d280ef7. We hope the code, together with the description of the modeling process in Section 4.5 of the Methods, helps clarify the whole modeling process. In addition, we would like to explain how we determined the temporal trend in colonization or extinction of each species, as referred to on Line 485. Let’s take the model of species-specific extinction rate as an example:

      In this model, “Island” was a random effect, “Year” is added as a random slope, thus allowing “year effect” (that is: the temporal trend) of extinction rate of species to vary with “island”. Further, the interaction effect between island variables (isolation, area) was added to test if the “year effect” was related to island area or isolation.
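
      The structure described in the paragraph above can be written out as a rough sketch. This is only one reading of the verbal description, not the equation from the manuscript or the JAGS code: the symbols, indices (species i, island j, year t), and the normal random effects are assumptions made for illustration.

```latex
% Sketch under stated assumptions: \epsilon_{i,j,t} is the extinction probability
% of species i on island j in year t; all predictors are standardized.
\begin{aligned}
\operatorname{logit}(\epsilon_{i,j,t}) ={}& \alpha_{i} + u_{i,j}
  + \left(\beta^{\text{year}}_{i} + v_{i,j}\right)\text{Year}_{t}
  + \beta^{\text{area}}_{i}\,\text{Area}_{j}
  + \beta^{\text{iso}}_{i}\,\text{Isolation}_{j} \\
 &+ \beta^{\text{year}\times\text{area}}_{i}\,\text{Year}_{t}\,\text{Area}_{j}
  + \beta^{\text{year}\times\text{iso}}_{i}\,\text{Year}_{t}\,\text{Isolation}_{j}, \\
u_{i,j} \sim{}& \mathcal{N}(0, \sigma_{u}^{2}), \qquad
v_{i,j} \sim \mathcal{N}(0, \sigma_{v}^{2}).
\end{aligned}
```

      Under this reading, the posterior mean of the year coefficient is the species-level temporal trend, and the two interaction coefficients indicate whether that trend changes with island area or isolation.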

      Because we are only interested in warm-adapted species that have a positive temporal trend in colonization and cold-adapted species that have a positive temporal trend in extinction, which are the two main processes underlying thermophilization, we chose warm-adapted species that have a positive year effect in colonization and cold-adapted species that have a positive year effect in extinction. We hope this explanation and the JAGS code help clarify this part.

      Hope these explanations can make it clearer.

      (12) Figure 1: to me, it would be more intuitive to put the landscape configuration in the titles of the panels b, c, and d instead of "only" the mechanisms. E.g. they could be: a) fragmented islands with low climate buffering; b) small islands with low habitat heterogeneity; c) isolated islands with dispersal limitations?

      It is also slightly confusing that the bird communities are above "island" in the middle of the three fragmented habitats - which all look a bit different in terms of tree species and structure which makes the reader first think that it has something to do with the "new" species community. so maybe worth rethinking how to illustrate the three fragmented islands?

      We would like to thank you for your nice proposition. Firstly, it’s a good idea to put the landscape configuration in the title of the panels b, c, d. The new title (a) is “Fragmented islands with low climate buffering”, title (b) is “Small islands with low habitat heterogeneity”, and title (c) is “Isolated patches with dispersal limitations”.

      Second, we realized that putting the “bird community” above “island” in the middle of the three patches is a bit confusing. Actually, we wanted to show bird communities only on that one island in the middle. The other two patches are only there to represent a fragmented background. To avoid misunderstanding, we added a sentence in the legend of Figure 1 on Lines 778-780:

      “The three distinct patches signify a fragmented background and the community in the middle of the three patches was selected to exhibit colonization-extinction dynamics in fragmented habitats.”

      (13) Figure 4: please add the description of the color code for panel a.

      Sorry for the unclear description. The vertical dashed line indicates the median value of STI for 60 species, as a separation of warm-adapted species and cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”

      (14) Figure 5: You could consider adding this as panel c to Figure 4 as it depicts the same thing as in 4a but for CTI-abundance.

      Thank you for this advice. We have moved the original Figure 5 to Figure 4c. Previous Figure 6 thus turned into Figure 5. All corresponding citations in the main text were checked to adapt to the new index. The new figure is now on Lines 801-815.

      References

      Ferraz, G., Russell, G. J., Stouffer, P. C., Bierregaard Jr, R. O., Pimm, S. L., & Lovejoy, T. E. (2003). Rates of species loss from Amazonian forest fragments. Proceedings of the National Academy of Sciences, 100(24), 14069-14073. doi:10.1073/pnas.2336195100

      Fourcade, Y., WallisDeVries, M. F., Kuussaari, M., van Swaay, C. A., Heliölä, J., & Öckinger, E. (2021). Habitat amount and distribution modify community dynamics under climate change. Ecology Letters, 24(5), 950-957. doi:10.1111/ele.13691

      Gaüzère, P., Princé, K., & Devictor, V. (2017). Where do they go? The effects of topography and habitat diversity on reducing climatic debt in birds. Global Change Biology, 23(6), 2218-2229. doi:10.1111/gcb.13500

      Gonzalez, A. (2000). Community relaxation in fragmented landscapes: the relation between species richness, area and age. Ecology Letters, 3(5), 441-448. doi:10.1046/j.1461-0248.2000.00171.x

      Haddad, N. M., Brudvig, L. A., Clobert, J., Davies, K. F., Gonzalez, A., Holt, R. D., . . . Collins, C. D. (2015). Habitat fragmentation and its lasting impact on Earth’s ecosystems. Science Advances, 1(2), e1500052. doi:10.1126/sciadv.1500052

      Richard, B., Dupouey, J. L., Corcket, E., Alard, D., Archaux, F., Aubert, M., . . . Macé, S. (2021). The climatic debt is growing in the understorey of temperate forests: Stand characteristics matter. Global Ecology and Biogeography, 30(7), 1474-1487. doi:10.1111/geb.13312

      Si, X., Pimm, S. L., Russell, G. J., & Ding, P. (2014). Turnover of breeding bird communities on islands in an inundated lake. Journal of Biogeography, 41(12), 2283-2292. doi:10.1111/jbi.12379

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The core idea is to combine the Vector Autoregressive model that is often used to infer Granger-causal connectivity in brain data with an encoding model that maps the features of a sensory stimulus to that brain data. The authors do a nice job of explaining the framework. And then they demonstrate its utility through some simulations and some analysis of real intracranial EEG data recorded from subjects as they watched movies. They infer from their analyses that the functional connectivity in these brain recordings is essentially unaltered during movie watching, that accounting for the driving movie stimulus can protect one against misidentifying brain responses to the stimulus as functional connectivity, and that recurrent brain activity enhances and prolongs the putative neural responses to a stimulus.

      Overall, I thought this was an interesting manuscript with some rich and intriguing ideas. That said, I had some concerns also - one potentially major - with the inferences drawn by the authors on the analyses that they carried out.

      Main comments:

      (1) My primary concern with the way the manuscript is written right now relates to the inferences that can be drawn from the framework. In particular, the authors want to assert that, by incorporating an encoding model into their framework, they can do a better job of accounting for correlated stimulus-driven activity in different brain regions, allowing them to get a clearer view of the underlying innate functional connectivity of the brain. Indeed, the authors say that they want to ask "whether, after removing stimulus-induced correlations, the intrinsic dynamic itself is preserved". This seems a very attractive idea indeed. However, it seems to hinge critically on the idea of fitting an encoding model that fully explains all of the stimulus-driven activity. In other words, if one fits an encoding model that only explains some of the stimulus-driven response, then the rest of the stimulus-driven response still remains in the data and will be correlated across brain regions and will appear as functional connectivity in the ongoing brain dynamics - according to this framework. This residual activity would thus be misinterpreted. In the present work, the authors parameterize their stimulus using fixation onsets, film cuts, and the audio envelope. All of these features seem reasonable and valid. However, they surely do not come close to capturing the full richness of the stimuli, and, as such, there is surely a substantial amount of stimulus-driven brain activity that is not being accounted for by their "B" model and that is being absorbed into their "A" model and misinterpreted as intrinsic connectivity. This seems to me to be a major limitation of the framework. Indeed, the authors flag this concern themselves by (briefly) raising the issue in the first paragraph of their caveats section. But I think it warrants much more attention and discussion.

      We agree. One can never be sure that all stimulus induced correlation is accounted for. We now formulate our question more cautiously: 

      “We will ask here whether, after removing some of the stimulus-induced correlations, the intrinsic dynamic is similar between stimulus and rest conditions.”

      We also highlight that one may expect the opposite result of what we found: 

      “A general observation of these studies is that a portion of the functional connectivity is preserved between rest and stimulus conditions, while some aspects are altered by the perceptual task [12,16], sometimes showing increased connectivity during the stimulus [15].”

      We have added a number of additional features (acoustic edges, fixation novelty, and motion) and more carefully characterize how much “connectivity” each one explains in the neural data: 

      “Removing any of the input features increased the effect size of recurrent connections compared to a model with all features (Fig. S4). We then cumulatively added each feature to the VARX model. Effect size monotonically decreases with each feature added (Fig. 3F). Decreases of effect size are significant when adding film cuts (ΔR=-3.6×10⁻⁶, p<0.0001, N=26, FDR correction, α=0.05) and the sound envelope (ΔR=-3.59×10⁻⁶, p=0.002, N=26, FDR correction, α=0.05). Thus, adding more input features progressively reduces the strength of recurrent “connections”.”

      We also added more data to the analysis comparing movies vs rest. We now use 4 different movie segments instead of 1 and find reduced recurrent connectivity during movies: 

      “The number of significant recurrent connections in A were significantly reduced during movie watching compared to rest (Fig. 4C, fixed effect of stimulus: beta = -3.8×10⁻³, t(17) = -3.9, p<0.001), as is the effect size R (Fig. 4D, fixed effect of stimulus: beta = -2.5×10⁻⁴, t(17) = -4.1, p<0.001).”

      The additional analysis is described in the Methods section:

      “To compare recurrent connectivity between movies and the resting-state, we compute VARX models in four different movie segments of 5 minutes in length to match the length of the resting state recording. We use the first and second half of ‘Despicable Me English’, the first half of ‘Inscapes’ and one of the ‘Monkey’ movies. 18 patients include each of these recordings. For each recording in each patient we compute the fraction of significant channels (p<0.001) and average the effect size R across all channel pairs, excluding the diagonal. We test the difference between movies and resting-state with linear mixed-effect models with stimulus as fixed effect (movie vs rest), and patient as random effect, using MATLAB’s fitlme() routine.”

      We had already seen this trend of decreasing connectivity during movie watching before, and reported on it cautiously as “largely unaltered”. We updated the Abstract correspondingly from “largely unaltered” to “reduced”: 

      “We also find that the recurrent connectivity during rest is reduced during movie watching.”

      We mentioned this possibility in the Discussion before, namely, that additional input features may reduce recurrent connectivity in the model, and therefore show a difference. We discuss this result now as follows: 

      “The stimulus features we included in our model capture mostly low-level visual and auditory input. It is possible that regressing out a richer stimulus characterization would have removed additional stimulus-induced correlation. While we do not expect that this would change the overall effect of a reduced number of “connections” during movie watching compared to resting state, the interpretation of changes in specific connections will be affected by the choice of features. For example, in sensory cortices, higher recurrent connectivity in the LFP during rest would be consistent with the more synchronized state we saw in rest, as reflected by larger oscillatory activity. Synchronization in higher-order cortices, however, is expected to be more strongly influenced by semantic content of external input.”

      In the Discussion we expand on what might happen if additional stimulus features were to be included into the model:  

      “Previous literature often does not distinguish between intrinsic dynamics and extrinsic effects. By factoring out some of the linear effects of the external input we conclude here that recurrent connectivity is reduced on average. From our prior work [49], we know that the stimulus features we included here capture a substantial amount of variance across the brain in intracranial EEG. Arguably, however, the video stimuli had rich semantic information that was not captured by the low-level features used here. Adding such semantic features could have further reduced shared variance, and consequently further reduced average recurrent connectivity in the model.”

      “Similarities and differences between rest and movie watching conditions reported previously do not draw a firm conclusion as to whether overall “functional connectivity” is increased or reduced. Results seem to depend on the time scale of neural activity analyzed, and the specific brain networks [12,16,63]. However, in fMRI, the conclusion seems to be that functional connectivity during movies is stronger than during rest [15], which likely results from stimulus induced correlations. The VARX model can remove some of the effects of these stimuli, revealing that average recurrent connectivity may be reduced rather than increased during stimulus processing.”

      And in the conclusion we now write: 

      “The model revealed a small but significant decrease of recurrent connectivity when watching movies.”

      (2) Related to the previous comment, the authors make what seems to me to be a complex and important point on page 6 (of the pdf). Specifically, they say "Note that the extrinsic effects captured with filters B are specific (every stimulus dimension has a specific effect on each brain area), whereas the endogenous dynamic propagates this initial effect to all connected brain areas via matrix A, effectively mixing and adding the responses of all stimulus dimensions. Therefore, this factorization separates stimulus-specific effects from the shared endogenous dynamic." It seems to me that the interpretation of the filter B (which is analogous to the "TRF") for the envelope, say, will be affected by the fact that the matrix A is likely going to be influenced by all sorts of other stimulus features that are not included in the model. In other words, residual stimulus-driven correlations that are captured in A might also distort what is going on in B, perhaps. So, again, I worry about interpreting the framework unless one can guarantee a near-perfect encoding model that can fully account for the stimulus-driven activity. I'd love to hear the authors' thoughts on this. (On this issue - the word "dominates" on page 12 seems very strong.)

      This is an interesting point we had not thought about. After some theoretical considerations and some empirical testing we conclude that the effect of missing inputs is relevant, but can be easily anticipated. 

      We have added the following to the Results section, explaining and empirically demonstrating the effects of adding features and signals to the model:

      “As with conventional linear regression, the estimate in B for a particular input and output channel is not affected by which other signals are included in or , provided those other inputs are uncorrelated. We confirmed this here empirically by removing dimensions from (Fig. S11A), and by adding uncorrelated input to (Fig. S11B, adding fixation onset does not affect the estimate for auditory envelope responses). In other words, to estimate B, we do not require all possible stimulus features and all brain activity to be measured and included in the model. In contrast, B does vary when correlated inputs are added to (Fig. S11C, adding acoustic edges changes the auditory envelope response). Evidently the auditory envelope and acoustic edges are tightly coupled in time, whereas fixation onset is not. When a correlated input is missing (acoustic edges) then the other input (auditory envelope) absorbs the correlated variance, thus capturing the combined response of both.”
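
      The regression argument in the passage above can be checked with a small simulation. This is a minimal sketch under simplifying assumptions: scalar coefficients stand in for the filters in B, and the variable names (`envelope`, `fixation`, `edges`) are simulated stand-ins rather than the recorded features. Omitting an uncorrelated input leaves the envelope estimate essentially unchanged, while omitting a correlated input lets the envelope absorb the shared variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated inputs (assumptions for illustration, not real stimulus features).
envelope = rng.standard_normal(n)                       # input of interest
fixation = rng.standard_normal(n)                       # independent of envelope
edges = 0.8 * envelope + 0.6 * rng.standard_normal(n)   # correlated with envelope
noise = rng.standard_normal(n)


def ols(inputs, y):
    """Least-squares coefficients of y regressed on the given input columns."""
    X = np.column_stack(inputs)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Case 1: the second input is uncorrelated with the envelope.
y1 = 1.0 * envelope + 0.5 * fixation + noise
b_full_uncorr = ols([envelope, fixation], y1)[0]
b_omit_uncorr = ols([envelope], y1)[0]

# Case 2: the second input is correlated with the envelope.
y2 = 1.0 * envelope + 0.5 * edges + noise
b_full_corr = ols([envelope, edges], y2)[0]
b_omit_corr = ols([envelope], y2)[0]

print("uncorrelated input: full vs omitted", b_full_uncorr, b_omit_uncorr)
print("correlated input:   full vs omitted", b_full_corr, b_omit_corr)
```

      In the second case the envelope coefficient moves from about 1.0 towards 1.0 + 0.5 × 0.8 = 1.4 when the correlated input is left out, which is the combined response of both inputs.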

      (3) Regarding the interpretation of the analysis of connectivity between movies and rest... that concludes that the intrinsic connectivity pattern doesn't really differ. This is interesting. But it seems worth flagging that this analysis doesn't really account for the specific dynamics in the network that could differ quite substantially between movie watching and rest, right? At the moment, it is all correlational. But the dynamics within the network could be very different between stimulation and rest I would have thought.

      As discussed above, with more data and additional stimulus features we now see detectable changes in the connectivity. The example in Figure 4G also shows that specific connections may change in different directions, while overall the strength of connections slightly decreases during movie watching compared to rest. We added the following to the results:

      “While the effect size decreases on average, there is some variation across different brain areas (Fig. 4E-G).”

      But even if the connectivity were unchanged, the activity on this network can be different with varying inputs. We actually also saw that there were changes in the variability of activity (Figs. 6 and S13) that may point to non-linear effects. It seems that injecting the input will cause an overall change in power, which can be explained by a relatively simple non-linear gain adaptation. These effects are already discussed at some length in the paper. 

      (4) I didn't really understand the point of comparing the VARX connectivity estimate with the sparse-inverse covariance method (Figure 2D). What was the point of this? What is a reader supposed to appreciate from it about the validity or otherwise of the VARX approach?

      We added the following motivation and clarification on this topic: 

      “To test the descriptive validity [43] of the VARX model we follow the approach of recovering structural connectivity from functional activity in simulation [44]. Specifically, we will compare the recurrent connectivity A derived from brain activity simulated assuming a given structural connectivity, i.e. we ask, can the VARX model recover the underlying structural connectivity, at least in a simulated whole-brain model with known connectivity? … For comparison, we also used the sparse-inverse covariance method to recover connectivity from the correlation matrix (functional connectivity). This method is considered state-of-the-art as it is more sensitive than other methods in detecting structural connections [48].”

      (5) I think the VARX model section could have benefitted a bit from putting some dimensions on some of the variables. In particular, I struggled a little to appreciate the dimensionality of A. I am assuming it has to involve both time lags AND electrode channels so that you can infer Granger causality (by including time) between channels. Including a bit more detail on the dimensionality and shape of A might be helpful for others who want to implement the VARX model.

      Your assumption is correct. We added the following to make this easier for readers: 

      “Therefore, A  has dimensions B has dimensions , where are the dimensions of and respectively.”
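
      For readers who want to see the shapes concretely, the following sketch simulates a small VARX system and recovers A and B by ordinary least squares. It is an illustration under stated assumptions, not the authors' implementation: the names `d_y`, `d_x`, `n_a`, `n_b`, the axis ordering (lag, output channel, input channel), and all numerical values are hypothetical choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)
d_y, d_x = 3, 2   # hypothetical numbers of brain channels and stimulus features
n_a, n_b = 2, 3   # hypothetical numbers of lags for A and B
T = 5000

# A[k] maps y(t-k-1) to y(t); B[k] maps x(t-k) to y(t).
A = np.zeros((n_a, d_y, d_y))
A[0] = 0.4 * np.eye(d_y)
A[0, 1, 0] = 0.3           # channel 0 drives channel 1 at lag 1
A[1] = -0.1 * np.eye(d_y)
B = 0.5 * rng.standard_normal((n_b, d_y, d_x))

# Simulate y(t) = sum_k A[k] y(t-k-1) + sum_k B[k] x(t-k) + e(t).
x = rng.standard_normal((T, d_x))
y = np.zeros((T, d_y))
start = max(n_a, n_b - 1)
for t in range(start, T):
    y[t] = 0.1 * rng.standard_normal(d_y)
    for k in range(n_a):
        y[t] += A[k] @ y[t - k - 1]
    for k in range(n_b):
        y[t] += B[k] @ x[t - k]

# Stack lagged outputs and inputs as regressors and solve one least-squares problem.
rows = np.arange(start, T)
past_y = [y[rows - k - 1] for k in range(n_a)]   # each block is (N, d_y)
past_x = [x[rows - k] for k in range(n_b)]       # each block is (N, d_x)
X = np.hstack(past_y + past_x)                   # (N, n_a*d_y + n_b*d_x)
W, *_ = np.linalg.lstsq(X, y[rows], rcond=None)  # (n_a*d_y + n_b*d_x, d_y)

# Each block of W holds the transpose of one lag matrix.
A_hat = W[: n_a * d_y].reshape(n_a, d_y, d_y).transpose(0, 2, 1)
B_hat = W[n_a * d_y:].reshape(n_b, d_x, d_y).transpose(0, 2, 1)

print(A_hat.shape, B_hat.shape)
print(np.abs(A_hat - A).max(), np.abs(B_hat - B).max())
```

      Because A carries a lag axis as well as two channel axes, a directed (Granger-style) influence from one channel to another is read from the corresponding entry across all lags, for example `A_hat[:, 1, 0]` for the influence of channel 0 on channel 1 in this layout.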

      (6) A second issue I had with the inferences drawn by the authors was a difficulty in reconciling certain statements in the manuscript. For example, in the abstract, the authors write "We find that the recurrent connectivity during rest is largely unaltered during movie watching." And they also write that "Failing to account for ... exogenous inputs, leads to spurious connections in the intrinsic "connectivity".

      Perhaps this segment of the abstract needed more explanation. To enhance clarity we have also changed the ordering of the findings. Hopefully this is more clear now: 

      “This model captures the extrinsic effect of the stimulus and separates that from the intrinsic effect of the recurrent brain dynamic. We find that the intrinsic dynamic enhances and prolongs the neural responses to scene cuts, eye movements, and sounds. Failing to account for these extrinsic inputs, leads to spurious recurrent connections that govern the intrinsic dynamic. We also find that the recurrent connectivity during rest is reduced during movie watching.”

      Reviewer #2 (Public review):

      Summary:

      The authors apply the recently developed VARX model, which explicitly models intrinsic dynamics and the effect of extrinsic inputs, to simulated data and intracranial EEG recordings. This method provides a directed method of 'intrinsic connectivity'. They argue this model is better suited to the analysis of task neuroimaging data because it separates the intrinsic and extrinsic activity. They show: that intrinsic connectivity is largely unaltered during a movie-watching task compared to eyes open rest; intrinsic noise is reduced in the task; and there is intrinsic directed connectivity from sensory to higher-order brain areas.

      Strengths:

      (1) The paper tackles an important issue with an appropriate method.

      (2) The authors validated their method on data simulated with a neural mass model.

      (3) They use intracranial EEG, which provides a direct measure of neuronal activity.

      (4) Code is made publicly available and the paper is written well.

      Weaknesses:

      It is unclear whether a linear model is adequate to describe brain data. To the authors' credit, they discuss this in the manuscript. Also, the model presented still provides a useful and computationally efficient method for studying brain data - no model is 'the truth'.

      We fully agree and have nothing much to add to this, except to highlight the benefit of a linear model even as explanation for non-linear phenomena: 

      “The [noise-quenching] effect we found here can be explained by a VARX model with the addition of a divisive gain adaptation mechanism … The noise-quenching result and its explanation via gain adaptation shows the benefit of using a parsimonious linear model, which can suggest nonlinear mechanisms as simple corrections from linearity.”

      Appraisal of whether the authors achieve their aims:

      As a methodological advancement highlighting a limitation of existing approaches and presenting a new model to overcome it, the authors achieve their aim. Generally, the claims/conclusions are supported by the results.

      The wider neuroscience claims regarding the role of intrinsic dynamics and external inputs in affecting brain data could benefit from further replication with another independent dataset and in a variety of tasks - but I understand if the authors wanted to focus on the method rather than the neuroscientific claims in this manuscript.

      We fully agree. We added the following to the Discussion section:

      “Future studies should test if our findings replicate in independent iEEG datasets, including active tasks, and whether they generalize to other neuroimaging modalities.”

      Impact:

      The authors propose a useful new approach that solves an important problem in the analysis of task neuroimaging data. I believe the work can have a significant impact on the field.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Did you mean "less" or "fewer" in the following sentence "..larger values lead to overfitting, i.e. less significant connections..."?

      We mean fewer. Thanks for catching this. 

      (2) I didn't see any equations showing how the regularization parameter lambda is incorporated into the framework.

      We prefer to leave the math and details of the algorithm to an earlier paper that has now been published. Instead we added the following clarification: 

      “The VARX models were fitted to data with the matlab version of the code [31] using conventional L2-norm regularization. The corresponding regularization parameter was set to 𝜆=0.3.”
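
      How an L2 penalty enters a least-squares fit can be illustrated with a scalar autoregressive example. This is a sketch under assumptions: a single channel with one lag, and the penalty 𝜆 added directly to the normal equations. How 𝜆 is scaled in the authors' code is not stated here, so the numbers are not comparable to the 𝜆=0.3 quoted above.

```python
def ridge_ar1(y, lam):
    """Estimate a in y[t] = a * y[t-1] + e[t], minimizing
    sum_t (y[t] - a*y[t-1])**2 + lam * a**2 (closed-form solution)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / (den + lam)


y = [1.0, 0.5, 0.25, 0.125]      # noiseless AR(1) series with a = 0.5
a_ols = ridge_ar1(y, lam=0.0)    # ordinary least squares
a_ridge = ridge_ar1(y, lam=0.3)  # shrunk toward zero by the penalty
```

      The penalty only changes the denominator, so the regularized estimate is always pulled toward zero relative to the unregularized one.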

      (3) I think some readers of this might struggle to understand the paragraph beginning "Connectivity plots are created with nilearn's plot_connectome() function...". It's all quite opaque for the uninitiated.

      Agreed. We now write more simply: 

      “Connectivity plots in Fig. 4 were created with routines from the nilearn toolbox [51].”

      (4) The paragraph beginning "The length of responses for Figure 5..." is also very opaque and could do with being explained more fully. Or this text could be removed from the methods and incorporated into the relevant results section where you actually discuss this analysis.

      Thank you for flagging this. We expand on the details in the Methods as follows: 

      “The length of responses for each channel in B and H to external inputs in Fig. 5 is computed with Matlab's findpeaks() function. This function returns the full-width at half of the peak maximum minus baseline. Power in each channel is computed as the squares of the responses averaged over the time window that was analyzed (0-0.6s).”
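
      The two measures can be sketched in plain Python. This is not Matlab's findpeaks(): it is a minimal analogue that assumes a single dominant peak, a known baseline (default 0), and linear interpolation between samples; the function names are hypothetical.

```python
def half_max_width(h, fs, baseline=0.0):
    """Full width (in seconds) at half of (peak - baseline) for the largest peak.

    h: sampled response, fs: sampling rate in Hz.
    Crossings are located by linear interpolation between samples.
    """
    peak = max(range(len(h)), key=lambda i: h[i])
    if h[peak] <= baseline:
        return 0.0
    level = baseline + 0.5 * (h[peak] - baseline)

    # walk left from the peak while the previous sample is still above the half level
    i = peak
    while i > 0 and h[i - 1] > level:
        i -= 1
    left = float(i) if i == 0 else (i - 1) + (level - h[i - 1]) / (h[i] - h[i - 1])

    # walk right the same way
    j = peak
    while j < len(h) - 1 and h[j + 1] > level:
        j += 1
    right = float(j) if j == len(h) - 1 else j + (h[j] - level) / (h[j] - h[j + 1])

    return (right - left) / fs


def mean_power(h):
    """Mean of the squared response over the analysis window."""
    return sum(v * v for v in h) / len(h)


response = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]  # triangular pulse
width_s = half_max_width(response, fs=100.0)  # samples are 10 ms apart
power = mean_power(response)
```

      For the triangular pulse the half level is crossed at samples 2 and 6, giving a width of four samples, i.e. 0.04 s at 100 Hz.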

      (5) I think adding some comments to the text or caption related to Figures 3C and 3D would be helpful so readers can understand these numbers a bit better. One seems to be the delta log p value and the other is the delta ratio. What does positive or negative mean? Readers might appreciate a little more help.

      We expanded it as follows, hopefully this helps: 

      “C) Difference of log p-values for the VARX model without minus with inputs (panel A - B). Both models are fit to the same data. D) Thresholding panels A and B at p<0.0001 gives a fraction of significant connections. Here we show the fraction of significant channels for models with and without input. Each line is a patient, with color indicating increase or decrease. E) Mean over all channels for VARX models with and without inputs. Each line is a patient.”

      (6) It is not clear what the colors mean in Figures 4 E, F, G.

      We updated the color scheme for those figure panels and carefully explained it in the caption. Please see the manuscript for updated figure 4.   

      (7) It might be nice to slightly unpack what you mean by the "variability of the internal dynamic" and why it can be equated with the power of the innovation process.

      In the methods we added the following clarification right after defining the VARX model: 

      “The innovation process captures the internal variability of the model. Without it, repeating the same input would always result in a fixed deterministic output .”

      In the results section we added the following: 

      “As a metric of internal variability we measured the power of the intrinsic innovation process , which captures the unobserved “random” brain activity which leads to variations in the responses.”

      (8) Typos etc.

      a) "... has been attributed to variability of ongoing dynamic"

      b) The manuscript refers to a Figure 3G, but there is no Figure 3G.

      c) n_a = n_a = 1. Is that a typo?

      d) fiction

      Thank you for catching these. We fixed them. 

      Reviewer #2 (Recommendations for the authors):

      (1) I'm curious about the authors' opinions on the conditions studied. Naively, eyes open rest and passive movie watching seem like similar conditions - were the authors expecting to see a difference with VARX? Do the authors expect that they would see bigger differences when there is a larger difference in sensory input, e.g. eyes closed rest vs movie watching? Given the authors are arguing the need to explicitly model external inputs, a real data example contrasting two very different external inputs might better demonstrate the model's utility.

      Thank you for this suggestion. We added an analysis of eyes-closed rest recordings, available in 8 patients (Fig. S8). The difference between movie and rest is indeed more pronounced than for eyes open rest. The result is described in the methods:

      “In a subset of patients with eyes-closed resting state we find the same effect, that is qualitatively more pronounced (Fig. S8).”

      This complements our updated finding of a difference between movie and eyes-open rest, which is now significant after adding more data to this analysis. The results have been updated as follows:

      “The number of significant recurrent connections in A was significantly reduced during movie watching compared to rest (Fig. 4C, fixed effect of stimulus: beta = -3.8*10<sup>-3</sup>, t(17) = -3.9, p<0.001), as is the effect size R (Fig. 4D, fixed effect of stimulus: beta = -2.5*10<sup>-4</sup>, t(17) = -4.1, p<0.001).”

      The abstract has been updated accordingly:

      “We also find that the recurrent connectivity during rest is reduced during movie watching.”

      (2) It would also have been interesting to see how the proposed model compares to DCM - however, I understand if the authors wanted to focus on their model rather than a comparison with other models.

      We did not try DCM for a number of reasons. 1) It does not allow for delays in the model dynamic (i.e. the entire time course of the response has to be captured by the recurrent dynamic of a single time step A). 2) It is computationally prohibitive and would not allow us to analyze large channel counts. 3) The available code is custom made for fMRI or EEG analysis with very specific signal generation models that do not obviously apply to iEEG. We added the following to the Discussion of DCM:  

      “Similar to the VARX model, DCM includes intrinsic and extrinsic effects A and B. However, the modeling is limited to first-order dynamics (i.e. n<sub>a</sub>=n<sub>b</sub>=1). Thus, prolonged responses have to be entirely captured with a first-order recurrent A. … In contrast, here we have analyzed up to 300 channels per subject across the brain, which would be prohibitive with DCM. By analyzing a large number of recordings we were able to draw more general conclusions about whole-brain activity.”

      (3) I believe improving the consistency of the terminology used would improve the manuscript:

      a) Intrinsic dynamics vs intrinsic connectivity vs recurrent connectivity:

      - The term 'intrinsic dynamic' is first introduced in paragraph 3 of the introduction. An explicit definition of what is meant by this term would benefit the manuscript.

      - Sometimes the terminology changes to 'intrinsic connectivity' or 'recurrent connectivity'. An explicit definition of these terms (if they refer to different things) would also benefit the manuscript.

      We had used the terms “intrinsic” and “recurrent” interchangeably. We now try to mostly say “intrinsic dynamic” when we talk about the more general phenomenon of recurrent brain dynamic, while using “recurrent connectivity” when we refer to the model parameters A. 

      We now provide a definition at the start of the Abstract: 

      “Sensory stimulation of the brain reverberates in its recurrent neural networks. However, current computational models of brain activity do not separate immediate sensory responses from this intrinsic dynamic. We apply a vector-autoregressive model with external input (VARX), combining the concepts of “functional connectivity” and “encoding models”, to intracranial recordings in humans. This model captures the extrinsic effect of the stimulus and separates that from the intrinsic effect of the recurrent brain dynamic.”

      And at the start of the introduction: 

      “The primate brain is highly interconnected between and within brain areas. … We will refer to the dynamic driven by this recurrent architecture as the intrinsic dynamic of the brain.”

      b) Intrinsic vs Endogenous and Extrinsic vs Exogenous:

      - Footnote 1 defines the 'intrinsic' and 'extrinsic' terminology.

      - However, there are instances where the authors switch back to endogenous/exogenous.

      - Methods section: "Overall system response", paragraph 2.

      - Results section: "Recurrent dynamic enhances and prolongs stimulus responses".

      - Conclusions section.

      With a foot in both neuroscience and systems identification, it’s a hard habit to break. Thanks for catching it. We searched and replaced all instances of endogenous and exogenous.  

      (4) Methods:

      a) The model equation would be clearer if the convolution was written out fully. (I had to read reference 1 to understand the model.).

      We now spell out the full equation and hope it's not too cumbersome to read:  

      “For the i-th signal channel the recurrence of the VARX model is given by: 
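
      Written out under the usual VARX convention, and assuming d<sub>y</sub> recorded channels, d<sub>x</sub> inputs, recurrent lags 1 to n<sub>a</sub> and input lags 0 to n<sub>b</sub>-1 (these symbols and index ranges are assumptions of this sketch, not taken from the manuscript), such a recurrence takes the form:

```latex
y_i(t) = \sum_{j=1}^{d_y} \sum_{k=1}^{n_a} A_{ij}(k)\, y_j(t-k)
       + \sum_{j=1}^{d_x} \sum_{k=0}^{n_b-1} B_{ij}(k)\, x_j(t-k)
       + e_i(t)
```

      Here e<sub>i</sub>(t) denotes the innovation of channel i, the first double sum is the recurrent (intrinsic) part and the second is the feed-forward (extrinsic) part.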

      b) How is an individual dimension omitted in the reduced model, are the values in the y, x set to zero?

      No, it is actually removed from the linear prediction. We added: 

      “… omitted from the prediction …”

      c) "The p-value quantifies the probability that a specific connection in A or B is zero" - for each of n_a/n_b filters?

      d) It should be clarified that D is a vector.

      We hope the following clarification addresses both these questions: 

      “The p-value quantifies the probability that a specific connection in either A or B is zero. Therefore, D, P and R<sup>2</sup> all have dimensions d<sub>y</sub> × d<sub>y</sub> or d<sub>y</sub> × d<sub>x</sub> for A or B respectively.”
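
      The quantities D, P and R<sup>2</sup> can be illustrated with a likelihood-ratio sketch for a single connection. It assumes Gaussian errors, a deviance D = T·ln(σ<sup>2</sup><sub>reduced</sub>/σ<sup>2</sup><sub>full</sub>) compared against a chi-square distribution whose degrees of freedom equal the number of omitted filter taps, and a generalized R<sup>2</sup> = 1 − exp(−D/T). These are textbook formulas for nested linear models and are assumed here rather than taken from the manuscript; the closed-form chi-square tail below holds for even degrees of freedom only.

```python
import math


def deviance(var_reduced, var_full, T):
    """Likelihood-ratio deviance between nested Gaussian models fit to T samples."""
    return T * math.log(var_reduced / var_full)


def chi2_sf_even(x, df):
    """Upper tail P(X > x) of a chi-square variable; exact for even df."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def generalized_r2(D, T):
    """Effect size derived from the deviance."""
    return 1.0 - math.exp(-D / T)


T = 1000                       # number of time samples (hypothetical)
D = deviance(1.02, 1.00, T)    # error variance without vs. with the connection
p = chi2_sf_even(D, df=2)      # two omitted taps, e.g. n_a = 2
r2 = generalized_r2(D, T)
```

      One value of D, P and R<sup>2</sup> results per connection, which is why these quantities carry one entry per channel pair rather than one per filter tap.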

      (5) Results:

      a) Stimulus-induced reduction of noise in the intrinsic activity: would be good to define the frequency range for theta and beta in paragraph 2.

      Added. 

      b) Neural mass model simulation:

      - A brief description of what was simulated is needed.

      We basically ran the sample code of the neurolib library. With that in mind maybe the description we already provide is sufficient:  

      “We used the default model simulation of the neurolib python library (using their sample code for the “ALNModel”), which is a mean-field approximation of adaptive exponential integrate-and-fire neurons. This model can generate simulated mean firing rates in 80 brain areas based on connectivity and delay matrices determined with diffusion tensor imaging (DTI). We used 5 min of “resting state” activity (no added stimulus, simulated at 0.1ms resolution, subsequently downsampled to 100Hz).”
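
      The sampling bookkeeping in this description can be sketched as follows. The downsampling method is not stated in the quoted text; block averaging is used here only as one simple choice, and the function name is hypothetical.

```python
def block_average(signal, factor):
    """Downsample by averaging consecutive blocks of `factor` samples."""
    n_blocks = len(signal) // factor
    return [sum(signal[b * factor:(b + 1) * factor]) / factor for b in range(n_blocks)]


dt_us = 100                     # simulation step: 0.1 ms = 100 microseconds
fs_sim = 1_000_000 // dt_us     # simulated sampling rate in Hz
fs_out = 100                    # target sampling rate in Hz
factor = fs_sim // fs_out       # simulated samples per output sample
n_out = 5 * 60 * fs_out         # output samples per brain area for 5 min

demo = block_average([1.0, 3.0, 5.0, 7.0], 2)
```

      With these numbers the simulation runs at 10 kHz, is reduced by a factor of 100, and yields 30,000 samples for each of the 80 areas.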

      - It's not clear to me why the A matrix should match the structural connectivity.

      We added the following introduction to make the purpose of this simulation clear:

      “To test the descriptive validity [43] of the VARX model we follow the approach of recovering structural connectivity from functional activity in simulation. [44] Specifically, we will compare the “connectivity” A derived from brain activity simulated assuming a given structural connectivity, i.e. we ask, can the VARX model recover the underlying structural connectivity, at least in a simulated whole-brain model with known connectivity?”

      - It would be interesting to see the inferred A matrix.

      We added a Supplement figure for this and the following: 

      “The VARX model was estimated with n<sub>a</sub>=2, and no input. The resulting estimate for A is dominated by the diagonal elements that capture the autocorrelation within brain areas (Fig. S1).”

      - How many filters were used here?

      No input filters were used for this simulation:

      “We used 5 min of “resting state” activity (no added stimulus, simulated at 0.1ms resolution, subsequently downsampled to 100Hz).”

      c) Intracranial EEG:

      - It's not clear how overfitting was measured and how the selection of the number of filters (n_a and n_b) was done.

      We have removed the statement about overfitting. Mostly the word is used in the context of testing on a separate dataset, which we did not do here. So this “overfitting” can be confusing. Instead we used the analytic p-value as indication that a larger model order is not supported by the data. We write this now as follows: 

      “Increasing the number of delays n<sub>a</sub>, increases estimated effect size R (Fig. S3A,B), however, larger values lead to fewer significant connections (Fig. S3C). Significance (p-value) is computed analytically, i.e. non-parametrically, based on deviance. Values around n<sub>a</sub>=6 time delays appear to be the largest model order supported by this statistical analysis.”

      d) Figure 1:

      - Typo: "auto-regressive"

      Fixed. Thanks for catching that. 

      - LFP and BHA in C are defined much later in the text, would be useful to define these in the caption.

      - Shouldn't B (the VARX model parameter) be a 2x3 matrix for different time lags?

      Hopefully the following clarifications address both these points: 

      “C) Example of neural signal y(t) recorded at a single location in the brain. We will analyze local field potentials (LFP) and broad-band high frequency activity (BHA) in separate analyses.  D) Examples of filters B for individual feed-forward connections between an extrinsic input and a specific recording location in the brain.”

      (6) Discussion:

      I could not find Muller et al 2016 listed in the references.

      Added. Thanks for catching that omission. 

      Additional edits prompted by reviewers, but not in the context of any particular comment.

      While reviewers did not raise the following point, we felt the need to clarify the terminology in the Methods to make sure there is no misunderstanding of the proposed interpretation of the model: 

      “We will refer to the filters in matrices A and B as recurrent and feed-forward “connections”, but avoid the use of the word “causal” which can be misleading.”

      In addressing questions about Figure 4, we noticed that there is quite a bit of variability across patients, so the analysis for Figures 4 and 7, which combines data across patients, now accounts for a random effect of patient (previously we used mean values for repeated measures). We added the following to the Methods to explain this:

      “To compare recurrent connectivity between movies and the resting-state (in Fig. 4), we compute VARX models in four different movie segments of 5 minutes length to match the length of the resting state recording. We use the first and second half of ‘Despicable Me English’, the first half of ‘Inscapes’ and one of the ‘Monkey’ movies. 18 patients include each of these recordings. For each recording in each patient we compute the fraction of significant channels (p<0.001) and average the effect size R across all channel pairs, excluding the diagonal. We test the difference between movies and resting-state with linear mixed-effect models with stimulus as fixed effect (movie vs rest), and patient as random effect (to account for the repeated measures for the different video segments), using matlab’s fitlme() routine. For the analysis of asymmetry of recurrent connectivity (in Fig. 4) we also used a mixed-effect model with T1w/T2w ratio as fixed effect and patients as random effect (to account for the repeated measures in multiple brain locations).”

      All analyses were rerun with more data (eyes closed resting) and 2 additional patients that have become available since the first submission. Therefore all figures and statistics have been updated throughout the paper. Other than the difference between movies and resting state which was trending before and is now significant, no results changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 0: In this paper, the authors develop a comprehensive program to investigate the organization of chromosome structures at 100 kb resolution. It is extremely well executed. The authors have thought through all aspects of the problem. The resulting software will be most useful to the community. Interestingly they capture many experimental observations accurately.

      I have very few complaints.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: The number of parameters in the energy function is very large. Is there any justification for this? Could they simplify the functions?

      We extend our gratitude to the reviewer for their insightful remarks. The parameters within our model can be categorized into two groups: those governing chromosome-chromosome interactions and those governing chromosome-nuclear landmark interactions.

      In terms of chromosome-chromosome interactions, the parameter count is relatively modest compared to the vast amount of Hi-C data available. For instance, while the whole-genome Hi-C matrix at the 100 kb resolution encompasses approximately 30321<sup>2</sup> contacts, our model comprises merely six parameters for interactions among different compartments, along with 1000 parameters for the ideal potential. As outlined in the supporting information, the ideal potential is contingent upon sequence separation, with 1000 chosen to encompass bead separations of up to 100 Mb. While it is theoretically plausible to reduce the number of parameters by assuming interactions cease beyond a certain sequence separation, determining this scale a priori presents a challenge.

      During the parameterization process, we observed that interchromosomal contacts predicted solely based on compartmental interactions inadequately mirrored Hi-C data. Consequently, we introduced 231 additional parameters to more accurately capture interactions between distinct pairs of autosomes. These interactions may stem from factors such as non-coding RNA or proteins not explicable by simple, non-specific compartmental interactions.

      Regarding parameters concerning chromosome-nuclear landmark interactions, we have 30321 parameters for speckles and 30321 for the nuclear lamina. To streamline the model, we opted to assign a unique parameter to each chromatin bead. However, it is conceivable that many chromatin beads share a similar mechanism for interacting with nuclear lamina or speckles, potentially allowing for a common parameter assignment. Nonetheless, implementing such simplification necessitates a deeper mechanistic understanding of chromosome-nuclear landmark interactions, an aspect currently lacking.

      As our comprehension of nuclear organization progresses, the interpretability of parameter counts may improve, facilitating their reduction.
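
      The counts quoted in this response can be tallied in a few lines. Two readings are inferences from the text rather than statements in it: that the 231 interchromosomal parameters correspond to one per pair of the 22 autosomes, and that the whole-genome Hi-C matrix has 30321 × 30321 entries (one row and column per 100 kb bin). Variable names are illustrative.

```python
from math import comb

n_bins = 30321                 # 100 kb bins, one parameter each per landmark type
n_compartment = 6              # compartment-compartment interaction parameters
n_ideal = 1000                 # ideal potential, one value per bead separation
n_autosomes = 22
n_interchrom = comb(n_autosomes, 2)   # one parameter per distinct autosome pair
n_speckle = n_bins
n_lamina = n_bins

n_chrom_chrom = n_compartment + n_ideal + n_interchrom
n_landmark = n_speckle + n_lamina
n_hic_entries = n_bins ** 2           # size of the whole-genome contact matrix
max_separation_mb = n_ideal * 100 / 1000  # 1000 beads x 100 kb, in Mb
```

      On this tally the chromosome-chromosome terms use about 1.2 thousand parameters against roughly 9 × 10<sup>8</sup> matrix entries, while the landmark terms account for most of the total.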

      Comment 2: What would the modification be if the resolution is increased?

      To increase the resolution of chromatin, we can in principle keep the same energy function as defined in Eq. S6. In this case, we only need to carry out further parameter optimization.

      However, transitioning to higher resolutions may unveil additional features not readily apparent at 100kb. Notably, chromatin loops with an average size of 200kb or smaller have been identified in high-resolution Hi-C data [1]. To effectively capture these loops, new terms in the energy function must be incorporated. For instance, Qi and Zhang [2] employed additional contact potentials between CTCF sites to account for loop formation. Alternatively, an explicit loop-extrusion process could be introduced to model loop formation more accurately.

      Comment 3: They should state that the extracted physical values are scale-dependent. For example, viscosity.

      We thank the reviewer for the comment and would like to clarify that our model does not predict the viscosity. The nucleoplasmic viscosity was set to 1 Pa·s to produce a diffusion coefficient that reproduces the experimental value. The exact value of the nucleoplasmic viscosity is still rather controversial, and our selected value falls in the range of reported experimental values, from 10<sup>−1</sup> Pa·s to 10<sup>2</sup> Pa·s.

      We have modified the main text to clarify the calculation of the diffusion coefficient.

      “The exponent and the diffusion coefficient D<sub>α</sub> = (27±11)×10<sup>−4</sup> μm<sup>2</sup>·s<sup>−α</sup> both match well with the experimental values [cite], upon setting the nucleoplasmic viscosity as 1 Pa·s (see Supporting Information Section: Mapping the reduced time unit to real time for more details).”

      Reviewer 2:

      Comment 0: In this work, Lao et al. develop an open-source software (OpenNucleome) for GPU-accelerated molecular dynamics simulation of the human nucleus accounting for chromatin, nucleoli, nuclear speckles, etc. Using this, the authors investigate the steady-state organization and dynamics of many of the nuclear components.

      We thank the reviewer for the summary of our work.

      Comment 1: The authors could introduce a table having every parameter and the optimal parameter value used. This would greatly help the reader.

      We would like to point out that model parameters are provided in Tables S1, S2, S3, S4, and Fig. S7. In these tables, we also provide details on how the parameters were determined.

      Given the large number of parameters for the ideal potential (1000), we opted to plot it rather than listing out all the numbers. We added three new figures to plot the interaction parameters between chromosomes, between chromosomes and speckles, and between chromosomes and the nuclear lamina. Numerical values can be found online in the GitHub repository (parameters).

      Comment 2: How many total beads are simulated? Do all beads have the same size?

      The total number of coarse-grained beads is 70542, including 60642 chromatin beads, 300 nucleolus beads, 1600 speckle beads, and 8000 nuclear lamina beads. The radius of the chromatin, nucleolus, and speckle beads is 0.25, while that of the lamina beads is 0.5. More information on the size and number of the beads is given in the Section: Components of the whole nucleus model.

      Comment 3: In Equation S17, what do the 3rd and 4th powers mean? What necessitates them?

      The potential defined in Equation S17 follows the definition of the class2 bond in the LAMMPS package (LAMMPS docs). Compared to a typical harmonic potential, the presence of higher-order terms produces a sharper increase in the energy at large distances (Author response image 1). This essentially reduces the fluctuation of bond length in simulations.

      Author response image 1.

      Comparison between the Class2 potential (defined in Eq. S17) and the Harmonic potential (K(r − r<sub>0</sub>)<sup>2</sup>, with K = 20 and r<sub>0</sub> = 0.5).
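
      The effect of the higher-order terms can be sketched numerically. The class2 bond form in LAMMPS is a polynomial in (r − r<sub>0</sub>) with quadratic, cubic and quartic terms; the harmonic parameters below are those of the caption, while the cubic and quartic coefficients are placeholder values, not the coefficients of Eq. S17.

```python
def harmonic(r, K=20.0, r0=0.5):
    """Harmonic bond energy K * (r - r0)**2."""
    return K * (r - r0) ** 2


def class2(r, K2=20.0, K3=20.0, K4=20.0, r0=0.5):
    """LAMMPS class2 bond form: K2*d**2 + K3*d**3 + K4*d**4 with d = r - r0.

    K3 and K4 are placeholder values, not the coefficients of Eq. S17.
    """
    d = r - r0
    return K2 * d ** 2 + K3 * d ** 3 + K4 * d ** 4


r = 1.0                       # bond stretched by 0.5 beyond r0
e_harm = harmonic(r)          # 20 * 0.25
e_class2 = class2(r)          # adds the cubic and quartic terms
```

      With positive higher-order coefficients the class2 energy exceeds the harmonic one for any stretch beyond r<sub>0</sub>, and the gap widens with distance, which penalizes large bond extensions more strongly.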

      Comment 4: What do the X-axis and Y-axis numbers in Figure 5A and 5B mean? What are their units?

      We apologize for the lack of clarity in our original figure. In Fig. 5A, the X and Y axes depict the simulated and experimental radius of gyration (Rg) for individual chromosomes, as indicated in the title of the figure. Similarly, in Fig. 5B, the X and Y axes depict the simulated and experimental radial position of individual chromosomes.

      We have converted the chromosome Rg values into reduced units and labeled the corresponding axes in the updated figure (Fig. 5). The normalized radial position is unitless and its detailed definition is included in the supporting information Section: Computing simulated normalized chromosome radial positions. We updated the figure caption to provide an explicit reference to the SI text.

      Reviewer 3:

      Comment 0: In this work, the authors present the development of OpenNucleome, a software for simulating the structure and dynamics of the human nucleus. It provides a detailed model of nuclear components such as chromosomes and nuclear bodies, and uses GPU acceleration for better performance based on the OpenMM package. The work also shows the model’s accuracy in comparisons with experimental data and highlights the utility in the understanding of nuclear organization. While I consider this work a good tool for the genome architecture scientific community, I have some comments and questions that could further clarify the usage of this tool and help potential users. I also have a few questions that would help to clarify the technique and results and some suggestions for references.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: Could the authors elaborate on what they consider to be ’well-established and easily adoptable modeling tools’?

      By well established, we meant models that have been extensively validated and verified, and are highly regarded by the community.

      By easily adoptable, we meant tools that are well documented and can be learned relatively easily by new groups without help from the developers.

      We have revised the text to clarify our meaning.

      “Despite the progress made in computational modeling, the absence of well-documented software with easy-to-follow tutorials poses a challenge.”

      Comment 2: Recognizing the value of a diverse range of tools in the community, the Open-MiChroM tool is also an open-source platform built on top of OpenMM. The documentation shows various modeling approaches and many tutorials that contain different approaches besides the MiChroM energy function. How does OpenNucleome compare in terms of facilitating cross-validation and user accessibility? The two tools seem to be complementary, which is a gain to the field. I recommend adding one or two sentences on the matter. Also, while navigating the OpenNucleome GitHub, I have not found the tutorials mentioned in the text. I also see a barrier in the process of generating the necessary input files. I would suggest expanding the tutorials and documentation to help potential users.

      We thank the reviewer for the excellent comments. We agree that while many of the tutorials were included in the original package, they were not clearly documented. We have revised them extensively and now present:

      • A tutorial for optimizing chromosome-chromosome interactions.

      • A tutorial for optimizing chromosome-nuclear landmark interactions.

      • A tutorial for building initial configurations.

      • A tutorial for relaxing the initial configurations.

      • A tutorial for selecting the initial configurations.

      • A tutorial for setting up and performing Langevin dynamics simulations.

      • A tutorial for setting up and performing Brownian dynamics simulations.

      • A tutorial for setting up and performing simulations with a deformed nucleus.

      • A tutorial for analyzing simulation trajectories.

      • A tutorial for introducing new features to the model.

      These tutorials and our well-documented and open source code (https://zhanggroup-mitchemistry.github.io/OpenNucleome) should significantly promote user accessibility. Our inclusion of Python scripts for analyzing simulation trajectories allows users to compute various quantities for evaluating and comparing model quality.

      We added a new paragraph in the Section: Conclusions and Discussion of the main text to compare OpenNucleome with existing software for genome modeling.

      “Our software enhances the capabilities of existing genome simulation tools [cite]. Specifically, OpenNucleome aligns with the design principles of Open-MiChroM [cite], prioritizing open-source accessibility while expanding simulation capabilities to the entire nucleus. Similar to software from the Alber lab [cite], OpenNucleome offers high-resolution genome organization that faithfully reproduces a diverse range of experimental data. Furthermore, beyond static structures, OpenNucleome facilitates dynamic simulations with explicit representations of various nuclear condensates, akin to the model developed by [citet].”

      Comment 3: Lastly, I would appreciate it if the authors could expand their definition of ’standardized practices’.

      We apologize for any confusion caused. By “standardized practices,” we refer to the fact that different groups often employ their own procedures for structural modeling. These procedures differ in the representation of chromosomes, the nuclear environment, and the algorithms for parameter optimization. The absence of a consensus on optimal practices for genome modeling can be daunting for newcomers to the field.

      We have revised the text to the following to avoid confusion:

      “Many research groups develop their own independent software, which complicates cross-validation and hinders the establishment of best practices for genome modeling [3–5].”

      Comment 4: On page 7, the authors refer to the SI Section: Components of the whole nucleus model for further details. Could the authors provide more information on the simulated density of nuclear bodies? Is there experimental data available that details the ratio of chromatin to other nuclear components, which was used as a reference in the simulation?

      We thank the reviewer for the comment. Imaging studies have provided quantitative measures of the size and number of various nuclear bodies. For example, there are 2 ∼ 5 nucleoli per nucleus, with a typical size RNo ≈ 0.5 μm [6–10]. In their review, Spector and Lamond [11] reported that there are 20 ∼ 50 speckles, with a typical size RSp ≈ 0.3 μm. We used these numbers to guide our simulation of nuclear bodies. This information is given in the Section: Chromosomes as beads on the string polymers of the supporting information.

      The chromatin density is fixed by the average size of a chromatin bead and the nucleus size. We chose the size of chromatin beads based on imaging studies, as detailed in the Subsection: Mapping chromatin bead size to real unit of the supporting information. Upon fixing the bead size, the chromatin volume is determined.
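      As a rough illustration of this point, the sketch below computes a chromatin volume fraction from a bead count, a bead diameter, and a nucleus radius. It assumes non-overlapping spherical beads in a spherical nucleus; the function name and all numerical inputs are hypothetical placeholders, not values taken from the model.

```python
def chromatin_volume_fraction(n_beads, bead_diameter, nucleus_radius):
    """Fraction of a spherical nucleus occupied by n_beads spherical beads.

    Assumes non-overlapping beads; both lengths must share one unit.
    """
    # The common factor 4*pi/3 cancels between numerator and denominator.
    bead_volume = n_beads * (bead_diameter / 2.0) ** 3
    nucleus_volume = nucleus_radius ** 3
    return bead_volume / nucleus_volume


# Hypothetical inputs, for illustration only.
phi = chromatin_volume_fraction(n_beads=60_000, bead_diameter=0.4, nucleus_radius=13.0)
print(f"chromatin volume fraction = {phi:.3f}")
```

      Once the bead size and nucleus size are chosen, the fraction follows directly, which is the sense in which the density is fixed rather than fitted.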

      Comment 5: In the statement, ’the ideal potential is only applied for beads from the same chromosome to approximate the effect of loop extrusion by Cohesin molecules for chromosome compaction and territory formation,’ it would be helpful if the authors could clarify the scope of this potential. Specifically, the code indicates that the variable ’dend ideal’ is set at 1000, suggesting an interaction along a 100Mb polymer chain at a resolution of 100Kb per bead. Could the authors elaborate on their motivation for the Cohesin complex’s activity having a significant effect over such long distances within the polymer chain?

      We thank the reviewer for the insightful comment. The reviewer is correct that the ideal potential was introduced to capture chromosome folding beyond the interactions between compartments, including loop extrusion. In practice, we parameterized the ideal potential such that the simulated average contact probabilities as a function of sequence separation match the experimental values. We agree that beyond a specific sequence separation, the impact of loop extrusion on chromosome folding should be negligible because of Cohesin dissociation. Correspondingly, the interaction potential should be zero at large sequence separations.

      However, the precise separation scale cannot be known a priori, so we chose 100 Mb as a conservative estimate. As can be seen from Fig. S7, our parameterization scheme indeed produced interaction parameters that are mainly zero at large sequence separations. Interestingly, the scale at which the potential approaches 0 (∼ 500 kb) agrees with the estimated length traveled by Cohesin molecules before dissociation [12].

      Comment 6: On pages 8 and 9, the authors discuss the optimization process. However, in reviewing the code and documentation available on the GitHub page, I could not find specific sections related to the optimization procedure described in the paper. In this context, I have a few questions: Could the authors provide more details or direct me to the parts of the documentation and the text/SI that address the optimization procedure used in their study? Additional clarification on the cost/objective function employed during the optimization process would be highly beneficial, as this was not readily apparent in the text.

      We thank the reviewer for the comment. We revised the SI to include the definition of the cost function for the Adam optimizer.

      “During the optimization process, our aim was to minimize the disparity between experimental findings and simulated data. To achieve this, we defined the cost function as follows:

      where the index i iterates over all the constraints defined in Eq. S28.”

      The detailed optimization procedure was included in the SI as quoted below

      “The details of the algorithm for parameter optimization are as follows

      (1) Starting with a set of values for the parameters, we performed 50 independent 3-million-step long MD simulations to obtain an ensemble of nuclear configurations. The first 500K steps of each trajectory are discarded as equilibration. We collected the configurations at every 2000 simulation steps from the rest of the simulation trajectories to compute the ensemble averages defined on the left-hand side of Eq. S13.

      (2) Check the convergence of the optimization by calculating the percentage of error

      defined as . The summation over i includes all the average contact probabilities defined in Eq. S28.

      (3) If the error is less than a tolerance value etol, the optimization has converged, and we stop the simulations. Otherwise, we update the parameters, α, using the Adam optimizer [13]. With the new parameter values, we return to step one and restart the iteration.”
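      The three steps above can be sketched as a short loop. This is a minimal illustration, not the code in the optimization folder: the MD ensemble is replaced by a hypothetical toy model in which each simulated contact probability is exp(-α) plus sampling noise, the error metric is an assumed relative absolute deviation, and the gradient is that of an assumed squared-difference cost. Only the Adam update itself follows the published algorithm [13].

```python
import numpy as np


def adam_step(params, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def percent_error(simulated, experimental):
    """ASSUMED metric: summed absolute deviation relative to experiment."""
    return float(np.sum(np.abs(simulated - experimental)) / np.sum(np.abs(experimental)))


def optimize(run_ensemble, grad_fn, experimental, params, e_tol=0.05, max_iter=300):
    """Iterate: sample an ensemble, check the error, apply an Adam update."""
    params = np.array(params, dtype=float)
    m, v = np.zeros_like(params), np.zeros_like(params)
    err = np.inf
    for t in range(1, max_iter + 1):
        simulated = run_ensemble(params)                # step (1): ensemble averages
        err = percent_error(simulated, experimental)    # step (2): convergence check
        if err < e_tol:                                 # step (3): stop or update
            break
        grad = grad_fn(params, simulated, experimental)
        params, m, v = adam_step(params, grad, m, v, t)
    return params, err


# Hypothetical toy stand-in for the MD simulations (not the real model).
rng = np.random.default_rng(0)
experimental = np.array([0.5, 0.2, 0.1])


def toy_ensemble(params):
    # Noisy "ensemble average": contact probability falls as alpha rises.
    return np.exp(-params) + rng.normal(scale=0.002, size=params.shape)


def toy_gradient(params, simulated, experimental):
    # Gradient of 0.5 * sum((simulated - experimental)**2), ignoring the noise term.
    return (simulated - experimental) * (-np.exp(-params))


fitted, final_error = optimize(toy_ensemble, toy_gradient, experimental, np.zeros(3))
print(fitted, final_error)
```

      In a real run, `run_ensemble` would launch the independent MD simulations and return the averaged constraints, which is by far the most expensive part of each iteration.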

      Previously, the optimization code was included as part of the analysis folder. To avoid confusion and improve readability, a separate folder named optimization has been created. This folder provides the Adam optimization of chromosome-chromosome interactions (chr-chr optimization) and chromosome-nuclear landmarks interactions (chr-NL optimization).

      Comment 7: What was the motivation for choosing the Adam algorithm for optimization? Adam is designed for training on stochastic objective functions. Could the authors elucidate on the ’stochastic’ aspect of their function to be optimized? Why the Adam algorithm was considered the most appropriate choice for this application?

      We thank the reviewer for the comment. As defined in Eq. R1, the cost function measures the difference between the simulated constraints and the corresponding experimental values. The estimation of simulated values, by averaging over an ensemble of chromosome configurations, is inherently noisy and stochastic. Exact ensemble averages can only be achieved with unlimited samples obtained from infinitely long simulations.

      In the past, we have used Newton’s method for parameterization, and the detailed algorithm can be found in the SI of Ref. 14. However, we found that Adam is more efficient because it is a first-order method. Newton’s method, on the other hand, is a second-order method and requires estimation of the Hessian matrix. When the number of constraints is large, as in our case, the computational cost of estimating the Hessian matrix can be significant. Another advantage of the Adam algorithm lies in its adjustment of the learning rate during the optimization, which further speeds up convergence.
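      Both arguments can be made concrete with a small calculation. The sketch below is illustrative only: it assumes statistically independent configurations, treats a single contact as a Bernoulli variable with a hypothetical probability, and counts the unique entries of a symmetric Hessian for a hypothetical number of parameters.

```python
import numpy as np


def standard_error(p, n_samples):
    """Standard error of a contact probability p estimated from n independent frames."""
    return float(np.sqrt(p * (1.0 - p) / n_samples))


def hessian_entries(n_params):
    """Number of unique entries in a symmetric n x n Hessian."""
    return n_params * (n_params + 1) // 2


rng = np.random.default_rng(1)
p_true = 0.1  # hypothetical contact probability
for n in (100, 10_000):
    estimate = rng.binomial(n, p_true) / n  # one noisy ensemble average
    print(n, round(estimate, 4), round(standard_error(p_true, n), 4))

# Entries a second-order method must estimate, versus 1000 gradient components.
print(hessian_entries(1000))
```

      The sampling noise shrinks only as the inverse square root of the number of frames, so the objective remains stochastic for any finite simulation, while the Hessian grows quadratically with the number of parameters.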

      Comment 8: The authors mention that examples of setting up simulations, parameter optimization, and introducing new features are provided in the GitHub repository. However, I was unable to locate these examples. Could the authors guide me to these specific resources or consider adding them if they are not currently available?

      We thank the reviewer for the comment. We have improved the GitHub repository and all the tutorials can be found using the links provided in Response to Comment 2.

      Comment 9: Furthermore, the paper states that ’a configuration file that provides the position of individual particles in the PDB file format is needed to initialize the simulations.’ It would be beneficial for new users if the authors could elaborate on how this file is generated. And all other input files in general. Detailing the procedures for a new user to run their system using OpenNucleome would be helpful.

      We thank the reviewer for the comment. The procedure for generating initial configurations was explained in the SI Section: Initial configurations for simulations and quoted below.

      “We first created a total of 1000 configurations for the genome by sequentially generating the conformation of each one of the 46 chromosomes as follows. For a given chromosome, we start by placing the first bead at the center (origin) of the nucleus. The position of each following bead, i, was determined from the (i − 1)-th bead as $\mathbf{r}_i = \mathbf{r}_{i-1} + 0.5\,\mathbf{v}$, where $\mathbf{v}$ is a normalized random vector and 0.5 was selected as the bond length between neighboring beads. To produce globular chromosome conformations, we rejected vectors, $\mathbf{v}$, that led to bead positions with distance from the center larger than 4σ. Upon creating the conformation of a chromosome i, we shift its center of mass to a value $r_i^{\mathrm{com}}$ determined as follows. We first compute a mean radial distance, with the following equation

      where $D_i$ is the average value of the Lamin B DamID profile for chromosome i. $D_{hi}$ and $D_{lo}$ represent the highest and lowest average DamID values of all chromosomes, and 6σ and 2σ represent the upper and lower bounds in radial positions for chromosomes. As shown in Fig. S6, the average Lamin B DamID profiles are highly correlated with normalized chromosome radial positions as reported by DNA MERFISH [cite], supporting their use as a proxy for estimating normalized chromosome radial positions. We then select $r_i^{\mathrm{com}}$ as a uniformly distributed random variable within a range set by this mean radial distance. Without loss of generality, we randomly chose the directions for shifting all 46 chromosomes.

      We further relaxed the 1000 configurations to build more realistic genome structures. Following an energy minimization process, one-million-step molecular dynamics (MD) simulations were performed starting from each configuration. Simulations were performed with the following energy function

      where UGenome is defined as in Eq. S7. UG-La is the excluded volume potential between chromosomes and lamina, i.e., only the second term in Eq. S24. Parameters in UGenome were from a preliminary optimization. The end configurations of the MD simulations were collected to build the final configuration ensemble (FCE).”
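      The construction step quoted above (before relaxation) can be sketched as follows. This is a minimal illustration, not the tutorial code: lengths are in units of σ, the bond length of 0.5 is assumed to be in the same unit, the mapping from DamID signal to mean radial distance is assumed to be a linear interpolation between 2σ and 6σ, and all function names are hypothetical. The random window around the mean radial distance and the subsequent energy minimization and MD relaxation are not reproduced.

```python
import numpy as np


def random_unit_vector(rng):
    """Uniformly distributed direction on the unit sphere."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def grow_chromosome(n_beads, rng, bond=0.5, r_max=4.0):
    """Random walk from the origin; steps that leave a sphere of radius r_max are rejected."""
    pos = np.zeros((n_beads, 3))
    for i in range(1, n_beads):
        while True:
            trial = pos[i - 1] + bond * random_unit_vector(rng)
            if np.linalg.norm(trial) <= r_max:
                pos[i] = trial
                break
    return pos


def mean_radial_distance(d_i, d_lo, d_hi, r_lo=2.0, r_hi=6.0):
    """ASSUMED linear map from average DamID signal to a radial distance in [r_lo, r_hi]."""
    return r_lo + (r_hi - r_lo) * (d_i - d_lo) / (d_hi - d_lo)


def place_chromosome(pos, radial_distance, rng):
    """Shift the center of mass to the given distance along a random direction."""
    com = pos.mean(axis=0)
    return pos - com + radial_distance * random_unit_vector(rng)


rng = np.random.default_rng(0)
chain = grow_chromosome(200, rng)
target = mean_radial_distance(d_i=0.3, d_lo=0.0, d_hi=1.0)  # hypothetical DamID values
placed = place_chromosome(chain, target, rng)
```

      Rejecting steps that leave the 4σ sphere keeps each chain globular, and a step pointing back toward the center is always available, so the rejection loop terminates.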

      The tutorial for preparing initial configurations can be found at this link.

      Comment 10: In the section discussing the correlation between simulated and experimental contact maps, as referenced in Figure 4A and Figure S2, the authors mention a high degree of correlation. Could the authors specify the exact value of this correlation and explain the method used for its computation? Considering that comparing two Hi-C matrices involves a large number of data points, it would be helpful to know if all data points were included in this analysis.

      We have updated Fig 4A and S2 to include Pearson correlation coefficients next to the contact maps. The reviewer is correct in that all the non-redundant data points of the contact maps are included in computing the correlation coefficients.

      For improved clarity, we added a new section in the supporting information to detail the calculations. The section is titled Computing Pearson correlation coefficients between experimental and simulated contact maps, and the relevant text is quoted below.

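      “We computed the Pearson correlation coefficients (PCC) between experimental and simulated contact maps in Fig. 4A and Fig. S2 as

      $$\mathrm{PCC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

      where $x_i$ and $y_i$ represent the experimental and simulated contact probabilities, $\bar{x}$ and $\bar{y}$ are their means, and n is the total number of data points. Only non-redundant data points, i.e., half of the pairwise contacts, are used in the PCC calculation.”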
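      A minimal sketch of this calculation on two symmetric contact maps is given below. The function name is hypothetical, and excluding the main diagonal (`k=1`) is an assumption about what counts as a non-redundant pair.

```python
import numpy as np


def contact_map_pcc(exp_map, sim_map, k=1):
    """Pearson correlation over the upper triangle of two symmetric contact maps.

    k=1 skips the main diagonal, so each pair (i, j) with i < j is counted once.
    """
    iu = np.triu_indices_from(exp_map, k=k)
    x, y = exp_map[iu], sim_map[iu]
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


# Hypothetical random maps, for illustration only.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
a = (a + a.T) / 2
b = rng.random((6, 6))
b = (b + b.T) / 2
print(contact_map_pcc(a, b))
```

      Using only the upper triangle avoids counting each symmetric entry twice, which would leave the coefficient unchanged but would double the apparent number of data points.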

      Comment 11: In addition, the author said: ”Moreover, the simulated and experimental average contact probabilities between pairs of chromosomes agree well, and the Pearson correlation coefficient between the two datasets reaches 0.89.” How does this correlation behave when not accounting for polymer compaction or scaling? An analysis presenting the correlation as a function of genomic distance would be interesting.

      Author response image 2.

      Pearson correlation coefficient between experimental and simulated contact probabilities as a function of the sequence separation within specific chromosomes. For each chromosome, we first gathered a set of experimental contacts alongside a matching set of simulated ones for genomic pairs within a particular separation range. The Pearson correlation coefficient at the corresponding sequence separation was then determined using Equation R4. We limited the calculations to half of the chromosome length to ensure the availability of sufficient data.

      We thank the reviewer for the comment. The analysis presenting the correlation as a function of genomic distance (sequence separation) for each chromosome is shown in Figure S12 and also included in the SI. While the correlation coefficients decrease at larger separations, values around 0.5 are quite reasonable and comparable to results obtained using Open-MiChroM.

      We also computed the correlation of whole genome contact maps after excluding intra-chromosomal contacts. The PCC decreased from 0.89 to 0.4. Again, the correlation coefficient is quite reasonable considering that these contacts are purely predicted by the compartmental interactions and were not directly optimized.
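      The separation-resolved analysis described in this response can be sketched as below. This is an illustration under simplifying assumptions: each separation is a single diagonal of one intra-chromosomal map rather than a separation range, the cutoff is half the map size, and the function name is hypothetical.

```python
import numpy as np


def pcc_by_separation(exp_map, sim_map, max_sep=None):
    """Pearson correlation between two contact maps, one value per sequence separation."""
    n = exp_map.shape[0]
    if max_sep is None:
        max_sep = n // 2  # stay within half the chromosome length
    result = {}
    for s in range(1, max_sep + 1):
        x = np.diagonal(exp_map, offset=s)  # all pairs (i, i + s)
        y = np.diagonal(sim_map, offset=s)
        if x.std() == 0 or y.std() == 0:
            continue  # correlation is undefined for constant data
        result[s] = float(np.corrcoef(x, y)[0, 1])
    return result
```

      Because every diagonal holds pairs at one fixed separation, the average decay of contact probability with distance cannot inflate the coefficient, which is why these values are lower than the whole-map correlation.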

      Comment 12: I recommend using the web-server that is familiar to the authors to benchmark the OpenNucleome tool/model: ”3DGenBench: A Web-Server to Benchmark Computational Models for 3D Genomics.” Nucleic Acids Research, vol. 50, no. W1, July 2022, pp. W4-12.

      We appreciate the reviewer’s suggestion. Unfortunately, the website was not active at the time of the revision. However, as detailed in the Response to Comment 11, we used one of the popular metrics to exclude the polymer compaction effect and evaluate the agreement between simulation and experiments.

      Comment 13: Regarding the comparison of simulation results with microscopy data from reference 34. Given their different resolutions and data point/space groupings, how do the authors align these datasets? Could the authors describe how they performed this comparison? How were the radial positions calculated in both the simulations and experiments? Since the data from reference 34 indicates a non-globular shape of the nucleus; how did this factor into the calculation of radial distributions?

      We thank the reviewer for the comment and apologize for the confusion. First, the average properties we examined, including radial positions and interchromosomal contacts, were averaged over all genomic loci. Therefore, they are independent of data resolution.

      Secondly, instead of calculating the absolute radial positions, which are subject to variations in nucleus shape and size, we defined the normalized radial positions. They measure the ratio between the distance from the nucleus center to the chromosome center and the distance from the nucleus center to the lamina. This definition was frequently used in prior imaging studies to measure chromosome radial positions.

      The calculation of the simulated normalized radial positions and the experimental normalized radial positions are discussed in the Section: Computing simulated normalized chromosome radial positions

      “For a given chromosome i, we first determined its center of mass position, denoted as $C_i$. Starting from the center of the nucleus, O, we extend the vector $\mathbf{v}_{OC_i}$ to identify its intersection point with the nuclear lamina, $P_i$. The normalized chromosome radial position is then defined as $r_i = \lVert \mathbf{v}_{OC_i} \rVert / \lVert \mathbf{v}_{OP_i} \rVert$, where $\lVert \cdot \rVert$ represents the L2 norm.”

      and Section: Computing experimental normalized chromosome radial positions.

      “We followed the same procedure outlined in Section: Computing simulated normalized chromosome radial positions to compute the experimental values. To determine the center of the nucleus using DNA MERFISH data, we used the minimum volume enclosing ellipsoid (MVEE) algorithm [15] to fit an ellipsoid for each genome structure. The optimal ellipsoid, defined as $(\mathbf{x} - \mathbf{c})^{T} A (\mathbf{x} - \mathbf{c}) = 1$, is obtained by minimizing $\log\det(A^{-1})$ subject to the constraint that $(\mathbf{x}_i - \mathbf{c})^{T} A (\mathbf{x}_i - \mathbf{c}) \le 1$ for all i. $\mathbf{x}_i$ corresponds to the list of chromatin positions determined experimentally.”
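      The ratio of distances described above has a simple closed form when the lamina is an ellipsoid. The sketch below assumes the lamina is written as (x − c)ᵀ A (x − c) = 1 with center c and a positive-definite matrix A, as an ellipsoid fit would return; the fit itself is not implemented here, and the function names are hypothetical.

```python
import numpy as np


def lamina_intersection(com, center, A):
    """Point where the ray from the center through com crosses the ellipsoid surface."""
    d = com - center
    t = 1.0 / np.sqrt(d @ A @ d)  # scale factor that places center + t*d on the surface
    return center + t * d


def normalized_radial_position(com, center, A):
    """Distance center->com divided by distance center->lamina along the same ray."""
    com = np.asarray(com, dtype=float)
    center = np.asarray(center, dtype=float)
    if np.allclose(com, center):
        return 0.0  # the ray direction is undefined at the center itself
    p = lamina_intersection(com, center, A)
    return float(np.linalg.norm(com - center) / np.linalg.norm(p - center))


# Hypothetical ellipsoid with semi-axes 10, 5, 5 centered at the origin.
A = np.diag([1 / 10.0 ** 2, 1 / 5.0 ** 2, 1 / 5.0 ** 2])
print(normalized_radial_position([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], A))
```

      Because the result is a ratio along a single ray, it is insensitive to the overall size of the nucleus and adapts to non-spherical shapes, which is the motivation for normalizing.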

      Comment 14: In the sentence: ”It is evident that telomeres exhibit anomalous subdiffusive motion.” I recommend mentioning the work ”Di Pierro, Michele, et al., ”Anomalous Diffusion, Spatial Coherence, and Viscoelasticity from the Energy Landscape of Human Chromosomes.” Proceedings of the National Academy of Sciences, vol. 115, no. 30, July 2018, pp. 7753-58.”.

      We have revised the sentence to include the citation as follows.

      “In line with previous research [cite], telomeres display anomalous subdiffusive motion. When fitted with the equation $\langle r^2(t) \rangle = D_\alpha t^{\alpha}$, these trajectories yield a spectrum of α values, with a peak around 0.59.”
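      A minimal sketch of how an exponent α can be extracted from a single trajectory is shown below. It assumes a power-law form MSD = D·t^α fitted by linear regression in log-log space and a time-averaged MSD; the function names and the synthetic data are hypothetical, and the authors' fitting procedure may differ.

```python
import numpy as np


def time_averaged_msd(traj, max_lag):
    """Time-averaged mean squared displacement of one trajectory with shape (frames, 3)."""
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1)) for lag in lags
    ])
    return lags, msd


def fit_anomalous_exponent(lag_times, msd):
    """Fit MSD = D * t**alpha by least squares on log(MSD) versus log(t)."""
    slope, intercept = np.polyfit(np.log(lag_times), np.log(msd), 1)
    return float(slope), float(np.exp(intercept))


# Synthetic subdiffusive curve with hypothetical parameters.
t = np.arange(1, 51, dtype=float)
alpha, d_alpha = fit_anomalous_exponent(t, 0.02 * t ** 0.59)
print(alpha, d_alpha)
```

      An exponent below 1 indicates subdiffusion, and repeating the fit for each telomere trajectory gives the spectrum of α values mentioned in the text.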

      Comment 15: Regarding the observation that ’chromosomes appear arrested and no significant changes in their radial positions are observed over timescales comparable to the cell cycle,’ could the authors provide more details on the calculations or analyses that led to this conclusion? Specifically, information on the equilibration/relaxation time of chromosome territories relative to rearrangements within a cell cycle would be interesting.

      Our conclusion here was mostly based on the time trace of normalized radial positions shown in Figure 6A of the main text. Over the timescale of an entire cell cycle (24 hours), the radial positions show little to no change, which supports glassy dynamics of chromosomes. We further determined the mean squared displacement (MSD) for chromosome centers of mass. As shown in the left panel of Fig. S12, the MSDs are much smaller than the average size of chromosomes (see Rg values in Fig. 5A), supporting arrested dynamics.

      We further computed the auto-correlation function of the normalized chromosome radial position as

      where t indexes over the trajectory frames and r̄ is the mean position. As shown in Fig. S12, the positions are not completely decorrelated over 10 hours, again supporting slow dynamics. It would be interesting to examine the relaxation timescale more closely in future studies.
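      One common way to compute such an auto-correlation function is sketched below. The normalization by the variance, so that the value at zero lag is 1, is an assumption and may differ from the definition used in the SI; the function name is hypothetical.

```python
import numpy as np


def autocorrelation(r, max_lag):
    """Normalized auto-correlation of a 1D time series for lags 0..max_lag.

    Assumes the series is not constant (non-zero variance).
    """
    r = np.asarray(r, dtype=float)
    dr = r - r.mean()            # fluctuation around the mean position
    var = np.mean(dr ** 2)
    n = len(r)
    return np.array([np.mean(dr[: n - lag] * dr[lag:]) / var for lag in range(max_lag + 1)])


# Hypothetical alternating signal: perfectly anti-correlated at lag 1.
signal = np.array([1.0, -1.0] * 50)
print(autocorrelation(signal, 2))
```

      A slow decay of this function toward zero means that a chromosome retains memory of its radial position, which is the behavior taken here as a signature of arrested dynamics.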

      Comment 16: The authors also comment on the SI ”Section: Initial configurations for simulations provides more details on preparing the 1000 initial configurations.” and related to reference 34 mentioning that ”the average Lamin B DamID profiles are highly correlated with chromosome radial positions as reported by DNA MERFISH”. How do the authors account for situations where homologous chromosomes are neighbors or have an interacting interface? Ref. 34 indicates that distinguishing between these scenarios can be challenging, potentially leading to ’invalid distributions’ that are filtered out. Clarification on how such cases were handled in the simulations would be helpful.

      We would like to first clarify that when comparing with experimental data, we averaged over the homologous chromosomes to obtain haploid data. We added the following text to the manuscript to emphasize this point:

      “Given that the majority of experimental data were analyzed for the haploid genome, we adopted a similar approach by averaging over paternal and maternal chromosomes to facilitate direct comparison. More details on data analysis can be found in the Supporting Information Section: Details of simulation data analysis.”

      Furthermore, we used the processed DNA MERFISH data from the Zhuang lab, which unambiguously assigns a chromosome ID to each data point. Therefore, the issue mentioned by the reviewer is not present in the processed data. In our simulations, since we keep track of the explicit connection between genomic segments, the trace of individual chromosomes can be determined for any configuration. Therefore, there is no ambiguity in the simulation data.

      Comment 17: When discussing the interaction with nuclear lamina and nuclear envelop deformation, I suggest mentioning the following studies: The already cited ref 52 and ”Contessoto, Vin´ıcius G., et al. ”Interphase Chromosomes of the Aedes Aegypti Mosquito Are Liquid Crystalline and Can Sense Mechanical Cues.” Nature Communications, vol. 14, no. 1, Jan. 2023, p. 326.”

      We updated the text to include the suggested reference.

      “Numerous studies have highlighted the remarkable influence of nuclear shape on the positioning of chromosomes and the regulation of gene expression [16, 17].”

      Comment 18: The authors state that ’Tutorials in the format of Python Scripts with extensive documentation are provided to facilitate the adoption of the model by the community.’ However, as I mentioned, the documentation appears to be limited, and the available tutorials could benefit from further expansion. I suggest that the authors consider enhancing these resources to better assist users in adopting and understanding the model.

      As detailed in the Response to Comment 2, we have updated the GitHub repository to better document the included Jupyter notebooks and tutorials.

      Comment 19: In the Methods section, the authors discuss using Langevin dynamics for certain simulations and Brownian dynamics for others. Could the authors provide more detailed reasoning behind the choice of these different dynamics for different aspects of the simulation? Furthermore, it would be insightful to know how the results might vary if only one of these dynamics was utilized throughout the study. Such clarification would help in understanding the implications of these methodological choices on the outcomes of the simulations.

      We thank the reviewer for the comment. As detailed in the supporting information Section: Mapping the Reduced Time Unit to Real Time, the Brownian dynamics simulations provide a rigorous mapping to the biological timescale. By choosing a specific value for the nucleoplasmic viscosity, we determined the time unit in simulations as τ = 0.65 s. With this time conversion, the simulated diffusion coefficients of telomeres match well with experimental values. Therefore, Brownian dynamics simulations are recommended for computing time-dependent quantities, and the large damping coefficient mimics the complex nuclear environment well.

      On the other hand, the large damping coefficient slows down the configuration relaxation of the system significantly. For computing equilibrium statistical properties, it is useful to use a small coefficient and the Langevin integrator with large time steps to facilitate conformational relaxation.
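      To give a sense of scale for the Brownian dynamics mapping, the sketch below converts between reduced time and real time using τ = 0.65 s. The helper names and the reduced time step in the example are hypothetical; the actual step size is set in the simulation scripts.

```python
import math

TAU_SECONDS = 0.65  # reduced time unit reported for the Brownian dynamics mapping


def reduced_time_to_hours(t_reduced, tau=TAU_SECONDS):
    """Convert a duration in reduced units to hours."""
    return t_reduced * tau / 3600.0


def hours_to_reduced_time(hours, tau=TAU_SECONDS):
    """Convert a duration in hours to reduced time units."""
    return hours * 3600.0 / tau


def steps_needed(hours, dt_reduced, tau=TAU_SECONDS):
    """Integration steps needed to cover a duration, for a given reduced time step."""
    return math.ceil(hours_to_reduced_time(hours, tau) / dt_reduced)


cell_cycle = hours_to_reduced_time(24.0)            # one 24-hour cell cycle
print(cell_cycle, steps_needed(24.0, dt_reduced=0.01))  # dt_reduced is a hypothetical value
```

      A full cell cycle corresponds to roughly 1.3 × 10^5 reduced time units, which illustrates why a lightly damped Langevin integrator with larger steps is preferable when only equilibrium statistics are needed.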

      References

      [1] Rao, S. S.; Huntley, M. H.; Durand, N. C.; Stamenova, E. K.; Bochkov, I. D.; Robinson, J. T.; Sanborn, A. L.; Machol, I.; Omer, A. D.; Lander, E. S.; et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014, 159, 1665–1680.

      [2] Qi, Y.; Zhang, B. Predicting three-dimensional genome organization with chromatin states. PLoS computational biology 2019, 15, e1007024.

      [3] Yildirim, A.; Hua, N.; Boninsegna, L.; Zhan, Y.; Polles, G.; Gong, K.; Hao, S.; Li, W.; Zhou, X. J.; Alber, F. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nature Structural & Molecular Biology 2023, 1–14.

      [4] Junior, A. B. O.; Contessoto, V. G.; Mello, M. F.; Onuchic, J. N. A scalable computational approach for simulating complexes of multiple chromosomes. Journal of molecular biology 2021, 433, 166700.

      [5] Fujishiro, S.; Sasai, M. Generation of dynamic three-dimensional genome structure through phase separation of chromatin. Proceedings of the National Academy of Sciences 2022, 119, e2109838119.

      [6] Caragine, C. M.; Haley, S. C.; Zidovska, A. Nucleolar dynamics and interactions with nucleoplasm in living cells. Elife 2019, 8, e47533.

      [7] Brangwynne, C. P.; Mitchison, T. J.; Hyman, A. A. Active liquid-like behavior of nucleoli determines their size and shape in Xenopus laevis oocytes. Proceedings of the National Academy of Sciences 2011, 108, 4334–4339.

      [8] Farley, K. I.; Surovtseva, Y.; Merkel, J.; Baserga, S. J. Determinants of mammalian nucleolar architecture. Chromosoma 2015, 124, 323–331.

      [9] Qi, Y.; Zhang, B. Chromatin network retards nucleoli coalescence. Nature Communications 2021, 12, 6824.

      [10] Caragine, C. M.; Haley, S. C.; Zidovska, A. Surface fluctuations and coalescence of nucleolar droplets in the human cell nucleus. Physical review letters 2018, 121, 148101.

      [11] Spector, D. L.; Lamond, A. I. Nuclear speckles. Cold Spring Harbor perspectives in biology 2011, 3, a000646.

      [12] Banigan, E. J.; Mirny, L. A. Loop extrusion: theory meets single-molecule experiments. Current opinion in cell biology 2020, 64, 124–138.

      [13] Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

      [14] Zhang, B.; Wolynes, P. G. Topology, structures, and energy landscapes of human chromosomes. Proceedings of the National Academy of Sciences 2015, 112, 6062–6067.

      [15] Moshtagh, N.; et al. Minimum volume enclosing ellipsoid. Convex optimization 2005, 111, 1–9.

      [16] Brahmachari, S.; Contessoto, V. G.; Di Pierro, M.; Onuchic, J. N. Shaping the genome via lengthwise compaction, phase separation, and lamina adhesion. Nucleic Acids Res. 2022, 50, 1–14.

      [17] Contessoto, V. G.; Dudchenko, O.; Aiden, E. L.; Wolynes, P. G.; Onuchic, J. N.; Di Pierro, M. Interphase chromosomes of the Aedes aegypti mosquito are liquid crystalline and can sense mechanical cues. Nature Communications 2023, 14, 326.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive and constructive evaluations. Based upon the reviewers’ helpful comments, we have performed complementary experiments. In particular, we additionally show that:

      • a complete analysis of CXCR1/2 binding chemokines in the secretions of tissular CD8+ T cells reinforces the key role of CXCL8 in CD8+ T cell-induced fibrocyte chemotaxis (new panel D in Figure 2)

      • a direct contact between fibrocytes and CD8+ T cells triggers CD8+ T cell cytotoxicity against primary basal bronchial epithelial cells (new Figure 6)

      • the interaction between CD8+ T cells and fibrocytes is bidirectional, with CD8+ T cells triggering the development of fibrocyte immune properties (new Figure 7)

      • the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition was estimated to be about 2.5 years using the simulations. Interfering with chemotaxis and adhesion processes by inhibiting CXCR1/2 and CD54, respectively, was not sufficient to reverse the COPD condition, as predicted by the mathematical model (new Figure 9)

      • the massive proliferation effect induced by fibrocytes is specific to CD8+ T cells and not CD4+ T cells (new Figure 3-figure supplement 2), and that fibrocytes moderately promote the death of unactivated CD8+ T cells in direct co-culture (new Figure 3-figure supplement 3)

      We have graphically summarized our findings (new Figure 10), which suggest the existence of a positive feedback loop playing a role in the vicious cycle that promotes COPD. A new table describing patient characteristics for basal bronchial epithelial cell purification has also been added (new Supplementary File 9), and Supplementary Files 7 and 8 have been updated to take the new experiments into account.

      The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD041402.  

      Reviewer #1 (Recommendations For The Authors):

      The experimental approaches are all rationally designed and the data clearly presented, with appropriate analyses and sample sizes. I could find no technical or interpretative concerns. The interrelationship between the observational data (histology) with the quantitative live cell imaging and the follow-on functional investigations is especially laudable. The data nicely unifies several years of accumulated data regarding the (separate) participation of CD8 T cells and fibrocytes in COPD.

      We thank the reviewer for his/her comments.

      I have only minor comments:

      1) Line 79: The observation that T cells may influence fibrocyte differentiation/function was initially made some years earlier by Abe et al (J Immunol 2001; 7556), and should be cited in addition to the follow-on work of Niedermeyer.

      This reference has been added to acknowledge this seminal work.

      2) Line 632: Corticosteroids originate from the cortex of the adrenal gland. Budesonide and fluticasone are glucocorticoids, not corticosteroids.

      This mistake has been corrected in the discussion of the revised manuscript (see line 802 in the revised manuscript).

      3) Given the state of T cell immunotherapies, cytokine/chemokine antagonists, and emerging fibrocyte-targeted drugs, can the authors possibly speculate as to desired pathways to target therapeutically?

      Chemokine receptor-based therapies, such as CXCR4 blockade, could be used to inhibit fibrocyte recruitment into the lungs. We have very recently shown that the CXCR4 antagonist plerixafor alleviates bronchial obstruction and reduces peri-bronchial fibrocyte density (Dupin et al., 2023). Because CXCR4 expression in human fibrocytes is dependent on mTOR signaling and is inhibited by rapamycin in vitro (Mehrad et al., 2009), alternative strategies consisting of targeting fibrocytes via mTOR have been proposed. This target has proven effective in bronchiolitis obliterans, idiopathic pulmonary fibrosis, and thyroid-associated ophthalmopathy, using rapamycin (Gillen et al., 2013; Mehrad et al., 2009), sirolimus (Manjarres et al., 2023) or an insulin-like growth factor-1 (IGF-I) receptor blocking antibody (Douglas et al., 2020; Smith et al., 2017). Inhibiting mTOR is also expected to have effects on CD8+ T cells, ranging from an immunostimulatory effect by activation of memory CD8+ T-cell formation to an immunosuppressive effect by inhibition of T cell proliferation (Araki et al., 2010). Lastly, chemokine receptor-based therapies could also include strategies to inhibit CD8+ T cell-induced fibrocyte chemotaxis, such as dual CXCR1-CXCR2 blockade. We were able to test this latter strategy in our mathematical model (see response to point 6 of reviewer 2).

      Immunotherapies directly targeting the interaction between fibrocytes and CD8+ T cells could also be considered, such as CD86 or CD54 blockade. The use of abatacept and belatacept, which interfere with T cell co-stimulation, is effective in patients with rheumatoid arthritis (Pombo-Suarez & Gomez-Reino, 2019) and in kidney-transplant recipients (Vincenti et al., 2016), respectively. Targeting the IGF-I receptor with teprotumumab in the context of thyroid-associated ophthalmopathy also improved disease outcomes, possibly by altering fibrocyte-T cell interactions (Bucala, 2022; Fernando et al., 2021).

      We also tested this CD86 and CD54 blocking strategy for COPD treatment by simulations, see response to point 6 of reviewer 2.

      However, such therapies should be used with caution as they may favour adverse events such as infections, particularly in the COPD population (Rozelle & Genovese, 2007). Additionally, the fibrocyte-lymphocyte interaction has recently been shown to promote anti-tumoral immunity via the PD1-PDL1 immunological synapse (Afroj et al., 2021; Mitsuhashi et al., 2023). Therefore, care should be taken in the selection of patients to be treated and/or the timing of treatment administration with regard to the increased risk of lung cancer in COPD patients.

      The discussion section has been altered accordingly.

      4) The authors may want to consider mentioning (and citing) recent insight into the immune-mediated fibrosis in thyroid-associated ophthalmopathy

      These important publications are now cited in a dedicated paragraph about possible therapeutic interventions (see answer to point 3, and discussion in the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      1) The rationale for the selection of chemokines overexpressed by CD8+ T cells in COPD is based on literature data of n=2 patients per group. This is limited and risky. I am less concerned about false positives given the selection of chemokines and the available literature but am worried about the possibility that many chemokines may not have been selected based on insufficient power to do meaningful stats on this comparison. For example, many other CXCR1/2 binding CXCL chemokines exist and these could contribute to the migration effect in Fig 2C as well. Given the currently available single-cell resources it should be possible to extend these observations and to investigate CXCL chemokine expression in COPD CD8 T cells to the benefit of Fig 2A in full detail.

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1).

      Author response image 1.

      Expression of CXC chemokines in lung CD8+ CD103+ and CD8+ CD103- T cells from patients with COPD (n=18 independent samples) in comparison with healthy control subjects (n=29 independent samples) under resting conditions by Single-Cell RNA sequencing analysis (GEO accession GSE136831). The heatmaps show the normalized expression of genes (horizontal axes) encoding CXC chemokines. PF4=CXCL4, PPBP= CXCL7.

      The latter results are at odds with those from the transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, which showed a significant level of chemokine expression (Hombrink et al., 2016) and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear. They could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ cells) and/or methodological differences (differences in sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, which show an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs compared with those of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase in migration induced by secretions of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not to other CXCR1/2-binding CXCL chemokines, and links CXCL8 measurements to functional experiments. This clarification has now been added in the results section of the revised version.

      2) Equally, it would strengthen the work if multiplex ELISA assays could be provided on the supernatants used in Fig 2D to provide a more comprehensive view of CXCR1/2 binding chemokines.

      In order to have a complete view of CXCR1/2-binding chemokines, we have now performed supplementary ELISA assays to measure the concentrations of CXCL1, 3, 5, 6 and 7, in addition to the measurements of CXCL2 and CXCL8 already presented in the previous version of the manuscript (Figure 2D). Results of these new assays are now presented in the revised version of Figure 2. Concentrations of CXCL1, 3, 5, 6 and 7 were unchanged between the control and COPD conditions.

      3) In the functional analyses, I missed information on the activation of the fibrocytes. Equally, the focus on CD8 T cells was mainly on proliferation in the functional work. RNAseq analyses on the cells, comparing CD8 T cells and fibrocytes, alone and in co-culture to each other would help to identify interaction patterns in comprehensive detail. Such an experiment would bolster the significance of the studies by providing impact analysis not only on the T cells beyond proliferation but by expanding on the effect of the interaction on the fibrocyte as well.

      Regarding the activation state of fibrocytes, we apologize if this was not clear: in our in vitro co-culture experiments, we chose not to activate the fibrocytes. This setting is in agreement with previous findings, demonstrating an antigen-independent T cell proliferation effect driven by fibrocytes (Nemzek et al., 2013), and it is now explicitly written in the results of the revised manuscript.

      Regarding the focus of the functional analyses:

      First, we have extended the analysis of the consequences of the interaction beyond CD8+ T cell proliferation. In particular, having shown that fibrocytes promote CD8+ T cell expression of cytotoxic molecules such as granzyme B, we decided to investigate the cytotoxic capacity of CD8+ T cells against primary basal bronchial epithelial cells (see new Supplementary File 9 in the revised manuscript for patient characteristics).

      Direct co-culture with fibrocytes increased total and membrane expression of the cytotoxic degranulation marker CD107a, which was only significant in non-activated CD8+ T cells (see new Figure 6A-E in the revised manuscript). A parallel increase of cytotoxicity against primary epithelial cells was observed in the same condition (see new Figure 6F-H in the revised manuscript). This demonstrates that following direct interaction with fibrocytes, CD8+ T cells have the ability to kill target cells such as bronchial epithelial cells. This is now included in the results section of the revised manuscript.

      Second, we have now performed proteomic analyses on fibrocytes, alone or co-cultured for 6 days with either non-activated or activated CD8+ T cells (see new Figure 7A in the revised manuscript). Among the ten pathways most significantly activated in co-cultured vs mono-cultured fibrocytes, the most strongly upregulated genes were those of the dendritic cell maturation box, the multiple sclerosis signaling pathway, the neuroinflammation signaling pathway and the macrophage classical signaling pathway, irrespective of the activation state of CD8+ T cells (see new Figure 7B in the revised manuscript). The changes were broadly similar in the two CD8+ T cell activation conditions, with some upregulations more pronounced in the activated condition. They were mostly driven by up-regulation of a core set of Major Histocompatibility Complex class I (HLA-B, C, F) and II (HLA-DMB, DPA1, DPB1, DRA, DRB1, DRB3) molecules, and of co-stimulatory and adhesion molecules (CD40, CD86 and CD54). Another notable proteomic signature was the increased expression of the IFN signaling mediators IKBE and STAT1, and of the IFN-responsive genes GBP2, GBP4 and RNF213. We also observed a strong downregulation of CD14, suggesting fibrocyte differentiation, and an upregulation of matrix metalloproteinase-9 (MMP9) in the non-activated condition only. Altogether, these changes suggest that the interaction between CD8+ T cells and fibrocytes promotes the development of fibrocyte immune properties, which could subsequently impact the activation of CD4+ T cells.

      The up-regulated pathways identified in the proteomic profile of fibrocytes co-cultured with CD8+ T cells are very consistent with a shift towards a proinflammatory phenotype rather than towards a reparative role. The activation of IFN-γ signaling could be triggered by CD8+ T cell secretion of IFN upon fibrocyte interaction, suggesting the existence of a positive feedback loop (see new Figure 10). Additionally, the priming of fibrocytes by CD8+ T cells could also induce CD4+ T cell activation.

      4) I suggest rewording the abstract to capture the main storyline and wording more. The abstract is good, but I see so many novelties in the paper that are not well sold in the abstract, particularly the modelling aspects.

      As suggested by the reviewer, we revised the abstract, as shown below and in the revised manuscript. The changes are indicated in red:

      Revised abstract:

      Bronchi of chronic obstructive pulmonary disease (COPD) are the site of extensive cell infiltration, allowing persistent contacts between resident cells and immune cells. Tissue fibrocytes interaction with CD8+ T cells and its consequences were investigated using a combination of in situ, in vitro experiments and mathematical modeling. We show that fibrocytes and CD8+ T cells are found in vicinity in distal airways and that potential interactions are more frequent in tissues from COPD patients compared to those of control subjects. Increased proximity and clusterization between CD8+ T cells and fibrocytes are associated with altered lung function. Tissular CD8+ T cells from COPD patients promote fibrocyte chemotaxis via the CXCL8-CXCR1/2 axis. Live imaging shows that CD8+ T cells establish short-term interactions with fibrocytes, that trigger CD8+ T cell proliferation in a CD54- and CD86-dependent manner, pro-inflammatory cytokines production, CD8+ T cell cytotoxic activity against bronchial epithelial cells and fibrocyte immunomodulatory properties. We defined a computational model describing these intercellular interactions and calibrated the parameters based on our experimental measurements. We show the model’s ability to reproduce histological ex vivo characteristics, and observe an important contribution of fibrocyte-mediated CD8+ T cell proliferation in COPD development. Using the model to test therapeutic scenarios, we predict a recovery time of several years, and the failure of targeting chemotaxis or interacting processes. Altogether, our study reveals that local interactions between fibrocytes and CD8+ T cells could jeopardize the balance between protective immunity and chronic inflammation in bronchi of COPD patients.

      5) The probabilistic model appears to suggest that reduced CD8 T cell death may also explain the increase in the pathology in COPD. Did the authors find that fibrocytes reduce cell death of the CD8 T cells?

      Taking advantage of the staining of CD8+ T cells with the death marker Zombie NIR™, we have quantified CD8+ T cell death in our co-culture assay. The presence of fibrocytes in the indirect co-culture assay did not affect CD8+ T cell death (see new Figure 3-figure supplement 3A-B in the revised manuscript). In direct co-culture, the death of CD8+ T cells was significantly increased in the non-activated condition but not in the activated condition (see new Figure 3-figure supplement 3C-D in the revised manuscript). Of note, these results are in agreement with a recent study showing the existence of CD8+ T cell-population-intrinsic mechanisms regulating cellular behavior, with induction of apoptosis to avoid an excessive increase in T cell population (Zenke et al., 2020). This is taken into account in our mathematical model by an increased probability p_(dC+) of dying when a CD8+ T cell is surrounded by many other T cells in its neighborhood. It also suggests that the reduced CD8+ T cell death evidenced in tissues from patients with COPD (Siena et al., 2011) might not be due to the specific interplay between fibrocyte and CD8+ T cells, but rather to a global pro-survival environment in COPD lungs.

      These new data have been described in the results section.
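
      The density-dependent death rule mentioned above (a higher probability of dying when a CD8+ T cell is surrounded by many other T cells) can be illustrated with a minimal sketch. The step-function form, the function name and all numerical values are assumptions chosen for illustration only; they are not the calibrated rule or parameters of the published model.

```python
import random


def cd8_death_probability(n_neighbor_t_cells: int,
                          p_basal: float = 0.01,
                          p_crowding: float = 0.05,
                          crowding_threshold: int = 3) -> float:
    """Per-step death probability of one CD8+ T cell.

    Hypothetical rule: the basal probability is raised by `p_crowding`
    once the number of T cells in the neighbourhood exceeds
    `crowding_threshold`. All default values are placeholders.
    """
    if n_neighbor_t_cells < 0:
        raise ValueError("neighbour count cannot be negative")
    if n_neighbor_t_cells > crowding_threshold:
        return min(1.0, p_basal + p_crowding)
    return p_basal


def cell_dies(n_neighbor_t_cells: int, rng: random.Random) -> bool:
    """Draw one Bernoulli trial with the density-dependent probability."""
    return rng.random() < cd8_death_probability(n_neighbor_t_cells)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    crowded = sum(cell_dies(6, rng) for _ in range(10_000))
    isolated = sum(cell_dies(0, rng) for _ in range(10_000))
    print(f"deaths when crowded: {crowded}, when isolated: {isolated}")
```

      With such a rule, crowded cells die more often than isolated ones, which caps the local growth of the T cell population without requiring any fibrocyte-specific survival effect.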

      6) Following the modeling in Figure 6, curiosity came to mind, which is how long it would take for the pathology to disappear if a drug would be applied to the patient. How much should the interactions be reduced and how long would it take to reach clinical benefit? Could such predictions be made? I understand that this may be outside the main message of the manuscript but perhaps this could be included in the discussion.

      This is a very interesting question, which we have addressed by performing additional simulations to investigate the outcomes of possible therapeutic interventions. First, we applied COPD dynamics for 20 years to generate the COPD state, which provides the basis for treatment implementation. Then, we applied COPD dynamics for 7 years to mimic the placebo condition (see new Figure 9A in the revised manuscript, and below), and compared it to control dynamics (“Total inhibition”), which mimic an ideal treatment able to restore all cellular processes. As expected, under this ideal treatment the populations of fibrocytes and CD8+ T cells, as well as the density of mixed clusters, decreased. These numbers reached levels similar to those of healthy subjects after approximately 2.5 years, and this time point can therefore be considered as the time at which the steady state is reached (Figure 9B-E).

      Monitoring of the different processes revealed that these effects were mainly due to a reduction in fibrocyte-induced CD8+ T cell duplication, and a transient or more prolonged increase in basal fibrocyte and CD8+ T cell death (Figure 9C-D).

      Then, three possible realistic treatments were considered (Figure 9A). We tested the effect of directly inhibiting the interaction between fibrocytes and CD8+ T cells by blocking CD54. This was implemented in the model by altering the increase in the probability that a CD8+ T cell divides when a fibrocyte is in its neighbourhood, as shown by the co-culture results (Figure 4). We also chose to reflect the effect of a dual CXCR1/2 inhibition by setting the displacement function of fibrocytes to be similar to that of control dynamics, in agreement with the in vitro experiments (Figure 2E). Blocking CD54 only slightly reduced the density of CD8+ T cells compared to the placebo condition, and had no effect on fibrocyte and mixed cluster densities (Figure 9B). CXCR1/2 inhibition was slightly more potent than CD54 inhibition in reducing CD8+ T cells, and it also significantly decreased the density of mixed clusters (Figure 9B). As expected, this occurred through a reduction of fibrocyte-induced duplication, which was affected more strongly by CXCR1/2 blockade than by CD54 blockade (Figure 9C-E). Combining both therapies (CD54 and CXCR1/2 inhibition) did not substantially enhance the effects (Figure 9B-E). In all the conditions tested, the size of the fibrocyte population remained unchanged, suggesting that other processes such as fibrocyte death or infiltration should be targeted to expect broader effects.

      The results section has been altered accordingly.
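
      One way to picture how such treatment scenarios can be encoded is as selective restoration of model parameters from their COPD value to their control value. The sketch below is only an illustration under stated assumptions: the parameter names and numbers are hypothetical placeholders, and CD54 blockade is assumed here to fully restore the fibrocyte-induced division parameter, which is a simplification of "altering" it.

```python
# All parameter names and values are hypothetical placeholders,
# not the calibrated parameters of the published model.
CONTROL = {
    "fibrocyte_chemotaxis": 0.2,   # bias of fibrocyte displacement towards CD8+ T cells
    "cd8_division_boost": 0.01,    # extra CD8+ division probability next to a fibrocyte
    "cd8_basal_death": 0.02,
    "fibrocyte_basal_death": 0.02,
}
COPD = {
    "fibrocyte_chemotaxis": 0.6,
    "cd8_division_boost": 0.05,
    "cd8_basal_death": 0.01,
    "fibrocyte_basal_death": 0.01,
}


def restore(base: dict, reference: dict, keys) -> dict:
    """Return a copy of `base` where the listed parameters take their `reference` value."""
    keys = list(keys)
    unknown = set(keys) - set(base)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    treated = dict(base)
    for key in keys:
        treated[key] = reference[key]
    return treated


SCENARIOS = {
    # Placebo: COPD dynamics left untouched.
    "placebo": restore(COPD, CONTROL, []),
    # Assumed effect of CD54 blockade: division boost back to control level.
    "anti_CD54": restore(COPD, CONTROL, ["cd8_division_boost"]),
    # Dual CXCR1/2 inhibition: fibrocyte displacement as in control dynamics.
    "anti_CXCR1_2": restore(COPD, CONTROL, ["fibrocyte_chemotaxis"]),
    # Combination of both targeted treatments.
    "combined": restore(COPD, CONTROL, ["cd8_division_boost", "fibrocyte_chemotaxis"]),
    # Ideal treatment: every process restored to control.
    "total_inhibition": restore(COPD, CONTROL, COPD.keys()),
}

if __name__ == "__main__":
    for name, params in SCENARIOS.items():
        changed = sorted(k for k in params if params[k] != COPD[k])
        print(f"{name}: restored parameters = {changed}")
```

      Each scenario would then be run from the same simulated COPD state, so that differences in outcome can be attributed to the restored parameters alone.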

      Using the simulations, we were also able to estimate the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition. This time of approximately 2.5 years could not have been predicted from in vitro experiments, and indicates that a treatment aiming at restoring these cellular processes should be continued for several years to obtain significant changes.
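
      A characteristic time of this kind can be read off a simulated trajectory as the first time after which the quantity of interest stays close to its healthy level. The sketch below shows one possible criterion; the function name, the 5% relative tolerance and the toy trajectory are assumptions for illustration, not the procedure used for the published figures.

```python
import math


def time_to_steady_state(times, values, target, rel_tol=0.05):
    """Return the first time from which all values stay within
    `rel_tol` of `target`, or None if the trajectory never settles.

    Hypothetical criterion: a later excursion outside the tolerance
    band resets the candidate settling time.
    """
    settled_at = None
    for t, v in zip(times, values):
        if abs(v - target) <= rel_tol * abs(target):
            if settled_at is None:
                settled_at = t
        else:
            settled_at = None
    return settled_at


if __name__ == "__main__":
    # Toy trajectory: a cell density relaxing exponentially from a
    # diseased level (200) towards a healthy level (100), sampled every 6 months.
    times = [0.5 * i for i in range(15)]
    values = [100 + 100 * math.exp(-2 * t) for t in times]
    print(time_to_steady_state(times, values, target=100))
```

      Requiring the trajectory to remain inside the tolerance band, rather than merely touch it once, avoids reporting a transient crossing as a resolution.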

      We have also investigated the outcomes of more realistic treatments, modifying specifically processes such as chemotaxis or targeting directly the intercellular interactions. The modification of parameters controlling these processes only slightly affected the final state, suggesting that such treatments may be more effective when used in combination with other drugs e.g. those affecting fibrocyte infiltration and/or death.

      The discussion section has been altered accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1) Broader assessment of cell types in the lung: Staining for other cell types such as dendritic cells, CD4 cells, and interstitial macrophages, and comparing their proximity to fibrocytes with that of CD8 cells would better justify the CD8 focus.

      We agree with the reviewer that multiple stainings would have better justified the focus on CD8+ T cells. However, it is difficult to distinguish fibrocytes, dendritic cells and interstitial macrophages on the basis of immunohistochemistry, as we and others previously showed (Dupin et al., 2019; Mitsuhashi et al., 2015; Pilling et al., 2009). On the other hand, the study of Afroj et al. indicated a possible interaction between fibrocytes and CD8+ T cells in a cancer context, with the induction of CD8+ T cell proliferation (Afroj et al., 2021). This T cell-costimulatory function of fibrocytes towards CD8+ T cells was further confirmed in a very recent study, together with the antitumor effects of PD-L1 and VEGF blockade (Mitsuhashi et al., 2023). These data, along with the specific implication of CD8+ T cells in COPD, which relies mainly on their abundance in COPD bronchi (O’Shaughnessy et al., 1997), their overactivation state (Roos-Engstrand et al., 2009), their cytotoxic phenotype (Freeman et al., 2010; Wang et al., 2020) and the protection against lung inflammation and emphysema induced by their depletion (Maeno et al., 2007), justified the CD8 focus.

      To further justify this focus, we have now performed co-cultures of fibrocytes and CD4+ T cells, which indicate that the massive fibrocyte-mediated proliferation was specific to CD8+ T cells (see answer to comment 3 below). This is in agreement with the results obtained with the simulations, showing that considering fibrocytes and CD8+ T cells only was sufficient to reproduce the spatial patterns in the bronchi of healthy and COPD patients. Altogether, we think that focusing on the CD8+ T cell-fibrocyte interplay was pertinent in the context of COPD. This obviously does not exclude the possibility of other interactions, which could be the focus of other studies.

      2) Transcriptomic analysis: Using n=2 and only showing the chemokines as well as selected adhesion receptor data narrows the focus but does not provide broader insights into the interactions. Using a more robust sample size and performing a comprehensive pathway analysis would represent an unbiased analysis to determine the most dysregulated pathways. Importantly, the authors could use a single-cell RNA-seq dataset to broadly assess the transcriptomes of several cell types in the lung (such as the data from Sauler et al., Characterization of the COPD alveolar niche using single-cell RNA sequencing).

      This very pertinent suggestion has also been raised by reviewer 2, see our answer to comment 1 of reviewer 2, and below:

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1, in the answer to comment 1 of reviewer 2).

      These latter results are at odds with those from the transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, which showed a significant level of chemokine expression (Hombrink et al., 2016) and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear. They could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ cells) and/or methodological differences (differences in sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, which show an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs compared with those of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase in migration induced by secretions of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not to other CXCR1/2-binding CXCL chemokines, and links CXCL8 measurements to functional experiments. This clarification has now been added in the text of the revised version.

      3) Inclusion of control/comparison cell types in co-culture studies would help establish that CD8 cells are more relevant for interactions with fibrocytes than for example CD4 cells.

      We have now performed co-cultures of fibrocytes and CD4+ T cells, with the same settings as for CD8+ T cells. The results from these experiments show that fibrocytes did not have any significant effect on CD4+ T cell death, regardless of their activation state (see new Figure 3-figure supplement 2A-C in the revised manuscript, and below). Fibrocytes were able to promote CD4+ T cell proliferation in the activated condition but not in the non-activated condition (see new Figure 3-figure supplement 2A-D in the revised manuscript). Altogether, this indicates that although the fibrocyte-mediated effect on proliferation is not specific to CD8+ T cells, the amplitude of the effect is much larger on CD8+ T cells than on CD4+ T cells.

      These new data have been added in the results section.

      4) In vitro analysis of cells from non-COPD patients would also help assess whether the circulating cells from COPD patients have a level of baseline activation which promotes the vicious cycle but may not exist in healthy cells.

      Regarding circulating cells, the present study relies on the COBRA cohort (COhort of BRonchial obstruction and Asthma), which includes only asthma and COPD patients, and therefore does not grant access to healthy subjects’ blood samples (Pretolani et al., 2017). Unfortunately, we have no other ongoing study with healthy subjects that would allow us to retrieve blood for research, and fibrocytes can only be grown from freshly drawn blood samples. We agree with the reviewer that it is a limitation of our study, which is now acknowledged at the end of the discussion section.  

      References

      Afroj, T., Mitsuhashi, A., Ogino, H., Saijo, A., Otsuka, K., Yoneda, H., Tobiume, M., Nguyen, N. T., Goto, H., Koyama, K., Sugimoto, M., Kondoh, O., Nokihara, H., & Nishioka, Y. (2021). Blockade of PD-1/PD-L1 Pathway Enhances the Antigen-Presenting Capacity of Fibrocytes. The Journal of Immunology, 206(6), 1204‑1214. https://doi.org/10.4049/jimmunol.2000909

      Araki, K., Youngblood, B., & Ahmed, R. (2010). The role of mTOR in memory CD8+ T-cell differentiation. Immunological Reviews, 235(1), 234‑243. https://doi.org/10.1111/j.0105-2896.2010.00898.x

      Bucala, R. J. (2022). Targeting fibrocytes in autoimmunity. Proceedings of the National Academy of Sciences, 119(5), e2121739119. https://doi.org/10.1073/pnas.2121739119

      Douglas, R. S., Kahaly, G. J., Patel, A., Sile, S., Thompson, E. H. Z., Perdok, R., Fleming, J. C., Fowler, B. T., Marcocci, C., Marinò, M., Antonelli, A., Dailey, R., Harris, G. J., Eckstein, A., Schiffman, J., Tang, R., Nelson, C., Salvi, M., Wester, S., … Smith, T. J. (2020). Teprotumumab for the Treatment of Active Thyroid Eye Disease. The New England Journal of Medicine, 382(4), 341‑352. https://doi.org/10.1056/NEJMoa1910434

      Dupin, I., Henrot, P., Maurat, E., Abohalaka, R., Chaigne, S., Hamrani, D. E., Eyraud, E., Prevel, R., Esteves, P., Campagnac, M., Dubreuil, M., Cardouat, G., Bouchet, C., Ousova, O., Dupuy, J.-W., Trian, T., Thumerel, M., Begueret, H., Girodet, P.-O., … Berger, P. (2023). CXCR4 blockade alleviates pulmonary and cardiac outcomes in early COPD (p. 2023.03.10.529743). bioRxiv. https://doi.org/10.1101/2023.03.10.529743

      Dupin, I., Thumerel, M., Maurat, E., Coste, F., Eyraud, E., Begueret, H., Trian, T., Montaudon, M., Marthan, R., Girodet, P.-O., & Berger, P. (2019). Fibrocyte accumulation in the airway walls of COPD patients. The European Respiratory Journal, 54(3). https://doi.org/10.1183/13993003.02173-2018

      Fernando, R., Caldera, O., & Smith, T. J. (2021). Therapeutic IGF-I receptor inhibition alters fibrocyte immune phenotype in thyroid-associated ophthalmopathy. Proceedings of the National Academy of Sciences, 118(52), e2114244118. https://doi.org/10.1073/pnas.2114244118

      Freeman, C. M., Han, M. K., Martinez, F. J., Murray, S., Liu, L. X., Chensue, S. W., Polak, T. J., Sonstein, J., Todt, J. C., Ames, T. M., Arenberg, D. A., Meldrum, C. A., Getty, C., McCloskey, L., & Curtis, J. L. (2010). Cytotoxic potential of lung CD8+ T cells increases with COPD severity and with in vitro stimulation by IL-18 or IL-15. Journal of Immunology (Baltimore, Md.: 1950), 184(11), 6504‑6513. https://doi.org/10.4049/jimmunol.1000006

      Gillen, J. R., Zhao, Y., Harris, D. A., LaPar, D. J., Stone, M. L., Fernandez, L. G., Kron, I. L., & Lau, C. L. (2013). Rapamycin Blocks Fibrocyte Migration and Attenuates Bronchiolitis Obliterans in a Murine Model. The Annals of Thoracic Surgery, 95(5), 1768‑1775. https://doi.org/10.1016/j.athoracsur.2013.02.021

      Hombrink, P., Helbig, C., Backer, R. A., Piet, B., Oja, A. E., Stark, R., Brasser, G., Jongejan, A., Jonkers, R. E., Nota, B., Basak, O., Clevers, H. C., Moerland, P. D., Amsen, D., & van Lier, R. A. W. (2016). Programs for the persistence, vigilance and control of human CD8+ lung-resident memory T cells. Nature Immunology, 17(12). https://doi.org/10.1038/ni.3589

      Maeno, T., Houghton, A. M., Quintero, P. A., Grumelli, S., Owen, C. A., & Shapiro, S. D. (2007). CD8+ T Cells are required for inflammation and destruction in cigarette smoke-induced emphysema in mice. Journal of Immunology (Baltimore, Md.: 1950), 178(12), 8090‑8096. https://doi.org/10.4049/jimmunol.178.12.8090

      Manjarres, D. C. G., Axell-House, D. B., Patel, D. C., Odackal, J., Yu, V., Burdick, M. D., & Mehrad, B. (2023). Sirolimus suppresses circulating fibrocytes in idiopathic pulmonary fibrosis in a randomized controlled crossover trial. JCI Insight. https://doi.org/10.1172/jci.insight.166901

      Mehrad, B., Burdick, M. D., & Strieter, R. M. (2009). Fibrocyte CXCR4 regulation as a therapeutic target in pulmonary fibrosis. The International Journal of Biochemistry & Cell Biology, 41(8‑9), 1708‑1718. https://doi.org/10.1016/j.biocel.2009.02.020

      Mitsuhashi, A., Goto, H., Saijo, A., Trung, V. T., Aono, Y., Ogino, H., Kuramoto, T., Tabata, S., Uehara, H., Izumi, K., Yoshida, M., Kobayashi, H., Takahashi, H., Gotoh, M., Kakiuchi, S., Hanibuchi, M., Yano, S., Yokomise, H., Sakiyama, S., & Nishioka, Y. (2015). Fibrocyte-like cells mediate acquired resistance to anti-angiogenic therapy with bevacizumab. Nature Communications, 6(1). https://doi.org/10.1038/ncomms9792

      Mitsuhashi, A., Koyama, K., Ogino, H., Afroj, T., Nguyen, N. T., Yoneda, H., Otsuka, K., Sugimoto, M., Kondoh, O., Nokihara, H., Hanibuchi, M., Takizawa, H., Shinohara, T., & Nishioka, Y. (2023). Identification of fibrocyte cluster in tumors reveals the role in antitumor immunity by PD-L1 blockade. Cell Reports, 112162. https://doi.org/10.1016/j.celrep.2023.112162

      Nemzek, J. A., Fry, C., & Moore, B. B. (2013). Adoptive transfer of fibrocytes enhances splenic T-cell numbers and survival in septic peritonitis. Shock (Augusta, Ga.), 40(2), 106‑114. https://doi.org/10.1097/SHK.0b013e31829c3c68

      O’Shaughnessy, T. C., Ansari, T. W., Barnes, N. C., & Jeffery, P. K. (1997). Inflammation in bronchial biopsies of subjects with chronic bronchitis: Inverse relationship of CD8+ T lymphocytes with FEV1. American Journal of Respiratory and Critical Care Medicine, 155(3), 852‑857. https://doi.org/10.1164/ajrccm.155.3.9117016

      Pilling, D., Fan, T., Huang, D., Kaul, B., & Gomer, R. H. (2009). Identification of markers that distinguish monocyte-derived fibrocytes from monocytes, macrophages, and fibroblasts. PloS One, 4(10), e7475. https://doi.org/10.1371/journal.pone.0007475

      Pombo-Suarez, M., & Gomez-Reino, J. J. (2019). Abatacept for the treatment of rheumatoid arthritis. Expert Review of Clinical Immunology, 15(4), 319‑326. https://doi.org/10.1080/1744666X.2019.1579642

      Pretolani, M., Soussan, D., Poirier, I., Thabut, G., Aubier, M., COBRA Study Group, & COBRA cohort Study Group. (2017). Clinical and biological characteristics of the French COBRA cohort of adult subjects with asthma. The European Respiratory Journal, 50(2), 1700019. https://doi.org/10.1183/13993003.00019-2017

      Roos-Engstrand, E., Ekstrand-Hammarström, B., Pourazar, J., Behndig, A. F., Bucht, A., & Blomberg, A. (2009). Influence of smoking cessation on airway T lymphocyte subsets in COPD. COPD, 6(2), 112‑120. https://doi.org/10.1080/15412550902755358

      Rozelle, A. L., & Genovese, M. C. (2007). Efficacy results from pivotal clinical trials with abatacept. Clinical and Experimental Rheumatology, 25(5 Suppl 46), S30-34.

      Sauler, M., McDonough, J. E., Adams, T. S., Kothapalli, N., Barnthaler, T., Werder, R. B., Schupp, J. C., Nouws, J., Robertson, M. J., Coarfa, C., Yang, T., Chioccioli, M., Omote, N., Cosme, C., Poli, S., Ayaub, E. A., Chu, S. G., Jensen, K. H., Gomez, J. L., … Rosas, I. O. (2022). Characterization of the COPD alveolar niche using single-cell RNA sequencing. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-28062-9

      Siena, L., Gjomarkaj, M., Elliot, J., Pace, E., Bruno, A., Baraldo, S., Saetta, M., Bonsignore, M. R., & James, A. (2011). Reduced apoptosis of CD8+ T-lymphocytes in the airways of smokers with mild/moderate COPD. Respiratory Medicine, 105(10), 1491‑1500. https://doi.org/10.1016/j.rmed.2011.04.014

      Smith, T. J., Kahaly, G. J., Ezra, D. G., Fleming, J. C., Dailey, R. A., Tang, R. A., Harris, G. J., Antonelli, A., Salvi, M., Goldberg, R. A., Gigantelli, J. W., Couch, S. M., Shriver, E. M., Hayek, B. R., Hink, E. M., Woodward, R. M., Gabriel, K., Magni, G., & Douglas, R. S. (2017). Teprotumumab for Thyroid-Associated Ophthalmopathy. The New England Journal of Medicine, 376(18), 1748‑1761. https://doi.org/10.1056/NEJMoa1614949

      Vincenti, F., Rostaing, L., Grinyo, J., Rice, K., Steinberg, S., Gaite, L., Moal, M.-C., Mondragon-Ramirez, G. A., Kothari, J., Polinsky, M. S., Meier-Kriesche, H.-U., Munier, S., & Larsen, C. P. (2016). Belatacept and Long-Term Outcomes in Kidney Transplantation. The New England Journal of Medicine, 374(4), 333‑343. https://doi.org/10.1056/NEJMoa1506027

      Wang, X., Zhang, D., Higham, A., Wolosianka, S., Gai, X., Zhou, L., Petersen, H., Pinto-Plata, V., Divo, M., Silverman, E. K., Celli, B., Singh, D., Sun, Y., & Owen, C. A. (2020). ADAM15 expression is increased in lung CD8+ T cells, macrophages, and bronchial epithelial cells in patients with COPD and is inversely related to airflow obstruction. Respiratory Research, 21(1), 188. https://doi.org/10.1186/s12931-020-01446-5

      Zenke, S., Palm, M. M., Braun, J., Gavrilov, A., Meiser, P., Böttcher, J. P., Beyersdorf, N., Ehl, S., Gerard, A., Lämmermann, T., Schumacher, T. N., Beltman, J. B., & Rohr, J. C. (2020). Quorum Regulation via Nested Antagonistic Feedback Circuits Mediated by the Receptors CD28 and CTLA-4 Confers Robustness to T Cell Population Dynamics. Immunity, 52(2), 313-327.e7. https://doi.org/10.1016/j.immuni.2020.01.018

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of oxytocin (OT) neurons in the paraventricular nucleus (PVN) and their projections to the medial prefrontal cortex (mPFC) in regulating pup care and infanticide behaviors in mandarin voles. The researchers used techniques like immunofluorescence, optogenetics, OT sensors, and peripheral OT administration. Activating OT neurons in the PVN reduced the time it took pup-caring male voles to approach and retrieve pups, facilitating pup-care behavior. However, this activation had no effect on females. Interestingly, this same PVN OT neuron activation also reduced the time for both male and female infanticidal voles to approach and attack pups, suggesting PVN OT neuron activity can promote pup care while inhibiting infanticide behavior. Inhibition of these neurons promoted infanticide. Stimulating PVN->mPFC OT projections facilitated pup care in males and in infanticide-prone voles, activation of these terminals prolonged latency to approach and attack. Inhibition of PVN->mPFC OT projections promoted infanticide. Peripheral OT administration increased pup care in males and reduced infanticide in both sexes. However, some results differed in females, suggesting other mechanisms may regulate female pup care.

      Strengths:

      This multi-faceted approach provides converging evidence, strengthens the conclusions drawn from the study, and makes them very convincing. Additionally, the study examines both pup care and infanticide behaviors, offering insights into the mechanisms underlying these contrasting behaviors. The inclusion of both male and female voles allows for the exploration of potential sex differences in the regulation of pup-directed behaviors. The peripheral OT administration experiments also provide valuable information for potential clinical applications and wildlife management strategies.

      Weaknesses:

      While the study presents exciting findings, there are several weaknesses that should be addressed. The sample sizes used in some experiments, such as the Fos study and optogenetic manipulations, appear to be small, which may limit the statistical power and generalizability of the results. Effect sizes are not reported, making it difficult to evaluate the practical significance of the findings. The imaging parameters and analysis details for the Fos study are not clearly described, hindering the interpretation of these results (i.e., was the entire PVN counted?). Also, does the Fos colocalization align with previous studies that look at PVN Fos and maternal/ paternal care? Additionally, the study lacks electrophysiological data to support the optogenetic findings, which could provide insights into the neural mechanisms underlying the observed behaviors. 

      Some previous studies (He et al., 2019; Mei, Yan, Yin, Sullivan, & Lin, 2023) also used small sample sizes in their morphological analyses, and such samples may still be representative. We agree with the reviewer that larger sample sizes would give greater statistical power and generalizability, and we will address this in future studies. As the reviewer suggested, we have added effect sizes (d, η², and odds ratio) to both the source data and the main text. We have added the objective magnification to the figure legend, and the imaging parameters and analysis details for the Fos study to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides, so that each slide held 6 slices spaced 160 µm apart. The PVN area was delineated based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive, and double-labeled (merged) neurons were counted. Our Fos and OT colocalization results are consistent with previous studies. In virgin male prairie voles, OT and Fos co-labeled neurons in the PVN increased after exposure to conspecific pups and the experience of paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-Fos co-labeled neurons in the PVN increased significantly after the animals became parents, which may reflect the shift from virgin to parent (Kelly, Hiura, Saunders, & Ophir, 2017). To support the optogenetic findings, we used c-Fos expression as a marker of neuronal activity and found significant increases or decreases in c-Fos-positive neurons after optogenetic activation or inhibition, respectively (Supplementary Data Fig. 1). We also found, using OT1.0 sensors, that optogenetic inhibition of OT neurons reduced OT release. These two experiments indicate that the optogenetic manipulations in the present study are valid and that the results of the optogenetic experiments are reliable (Supplementary Data Fig. 5).

      The study has several limitations that warrant further discussion. Firstly, the potential effects of manipulating OT neurons on the release of other neurotransmitters (or the influence of other neurochemicals or brain regions) on pup-directed behaviors, especially in females, are not fully explored. Additionally, it is unclear whether back-propagation of action potentials during optogenetic manipulations causes the same behavioral effect as direct stimulation of PVN OT cells. Moreover, the authors do not address whether the observed changes in behavior could be explained by overall increases or decreases in locomotor activity.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, OT neurons may also release other neurochemicals, so other neurochemicals may have been released during the optogenetic manipulations. In one of our previous studies, activation of OT neuron projections from the PVN to the VTA, as well as to the NAc, also altered pup-directed behaviors, which may have been accompanied by dopamine release (He et al., 2021). In addition, back-propagation of action potentials during optogenetic manipulation may cause the same behavioral effects as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated in future studies. For the optogenetic experiments, we followed previous research (Mei et al., 2023; Murugan et al., 2017), and we also verified the reliability of the methods in our own study. To exclude effects of locomotor activity on pup-directed behaviors, we also examined the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change locomotor activity (Supplementary Data Fig. 6).

      The authors do not specify the percentage of PVN->mPFC neurons labeled that were OT-positive, nor do they directly compare the sexes in their behavioral analysis (or if they did, it is not clear statistically). While the authors propose that the sex difference in pup-directed behaviors is due to females having greater OT expression, they do not provide evidence to support this claim from their labeling data. It is also uncertain whether more OT neurons were manipulated in females compared to males. The study could benefit from a more comprehensive discussion of other factors that could influence the neural circuit under investigation, especially in females.

      The AAV11-Ef1a-EGFP virus infects fibers and travels retrogradely to the cell body, so it can be used for retrograde tracing of neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red), which appear yellow. We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4). In addition, as the reviewers suggested, we compared the numbers of OT neurons, the numbers of activated OT neurons (OT and Fos double-labeled neurons), and the level of OT release between males and females. We found that females had more activated OT neurons (Figure 1 d, g) and released higher levels of OT into the mPFC (Figure 4 d, e) than males. This comparison has been added to the results and discussion. We did not analyze whether more OT neurons were manipulated in females than in males, which is indeed a limitation of this study.

      As the reviewers suggested, we have also discussed other factors that could influence the neural circuit under investigation. In addition to OT neurons, OTR neurons may regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin neurons and oxytocin receptor-expressing neurons (Okabe et al., 2017). Other brain regions, such as the preoptic area (POA), may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies, greater c-Fos expression in the POA, and significantly increased OT concentrations in the POA, and the facilitation of alloparental behavior by repeated pup exposure occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that PVN OT is involved in pup care by male voles (He et al., 2021). That study suggests that OT projections from the PVN to the ventral tegmental area (VTA), as well as DA projections from the VTA to the nucleus accumbens (NAc), are involved in pup care by male voles. Inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021). The effects of these factors on pup-directed responses should also be considered in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This series of experiments studied the involvement of PVN OT neurons and their projection to the mPFC in pup-care and attack behavior in virgin male and female Mandarin voles. Using Fos visualization, optogenetics, fiber photometry, and IP injection of OT the results converge on OT regulating caregiving and attacks on pups. Some sex differences were found in the effects of the manipulations.

      Strengths:

      Major strengths are the modern multi-method approaches and involving both sexes of Mandarin vole in every experiment.

      Weaknesses:

      Weaknesses include the lack of some specific details in the methods that would help readers interpret the results. These include:

      (1) No description of diffusion of centrally injected agents.

      Thank you for this comment. Only individuals with appropriate viral expression and optical fiber placement were included in the statistical analysis; all others were excluded. For the optogenetic experiments, the virus (AAV2/9-mOXT-hCHR2(H134R)–mCherry-ER2-WPRE-pA or rAAV-mOXT-eNpHR3.0-mCherry-WPRE-hGH-pA) was designed and constructed to infect only OT neurons, which limited its spread. For the fiber photometry experiments, expression of the OT1.0 sensor was largely restricted to the mPFC, and individuals with incorrect optical fiber placement were excluded from the statistical analysis. The spread of the optogenetic viruses and the OT1.0 sensor is shown in the supplemental figure (Supplementary Data Fig. 7).

      (2) Whether all central targets were consistent across animals included in the data analyses. This includes that is not stated if the medial prelimbic mPFC target was in all optogenetic study animals as shown in Figure 4 and if that is the case, there is no discussion of that subregion's function compared to other mPFC subregions.

      As shown in Figure 4 and in the schematic diagram of the optogenetic experiment, the sites of virus infection and fiber placement were consistent across all animals included in the data analysis; animals that did not meet this criterion were excluded. In the present study, viruses were injected into the prelimbic cortex (PrL). The PrL and infralimbic (IL) regions of the mPFC play different roles in different social interaction contexts (Bravo-Rivera, Roman-Ortiz, Brignoni-Perez, Sotres-Bayon, & Quirk, 2014; Moscarello & LeDoux, 2013). One study has shown that the PrL region of the mPFC contributes to active avoidance in situations where conflict needs to be mitigated, but also contributes to the retention of conflict responses for reward (Capuzzo & Floresco, 2020). This may suggest that the suppression of infanticide by PVN to mPFC OT projections is a behavioral consequence of active conflict avoidance. In a study on pain in rats, OT neuron projections from the PVN to the PrL were found to increase the responsiveness of cell populations in the PrL, suggesting that OT may act by altering the local excitation-inhibition (E/I) balance in the PrL (Liu et al., 2023). A study of anxiety-related behaviors in male rats suggests that the anxiolytic effects of OT in the mPFC are specific to the PrL, and are not seen in the infralimbic or anterior cingulate regions, and that they are achieved primarily by engaging GABAergic neurons, which ultimately modulate downstream anxiety-related brain regions, including the amygdala (Sabihi, Dong, Maurer, Post, & Leuner, 2017). This finding points to possible downstream pathways for further research.

      (3) How groups of pup-care and infanticidal animals were created since there was no obvious pretest mentioned so perhaps there was the testing of a large number of animals until getting enough subjects in each group.  

      Before the experiments, we exposed the animals to pups. Subjects exhibited pup care, infanticide, or neglect; we grouped them according to their behavioral responses to pups and excluded individuals that neglected pups.

      (4) The apparent use of a 20-minute baseline data collection period for photometry that started right after the animals were stressed from handling and placement in the novel testing chamber.

      In fiber photometric experiments, all experimental animals were required to acclimatize to the environment for at least 20 minutes prior to the experiment as described in the Methods section. The time 0 in Fig. 4 represents the point in time when a behavior or a segment of behavior started and is not the actual time 0 at which the test was started.

      (5) A weakness in the results reporting is that it's unclear what statistics are reported (2 x 2 ANOVA main effect of interaction results, t-test results) and that the degrees of freedom expected for the 2 X 2 ANOVAs in some cases don't appear to match the numbers of subjects shown in the graphs; including sample sizes in each group would be helpful because the graph panels are very small and data points overlap.

      Thank you for your suggestion. We have stated the statistical methods and the sample size of each experimental group in the figure legends.

      The additional context that could help readers of this study is that the authors overlook some important mPFC and pup caregiving and infanticide studies in the introduction which would help put this work in better context in terms of what is known about the mPFC and these behaviors. These previous studies include Febo et al., 2010; Febo 2012; Pereira and Morrell, 2011 and 2020; and a very relevant study by Alsina-Llanes and Olazábal, 2021 on mPFC lesions and infanticide in virgin male and female mice. The introduction states that nothing is known about the mPFC and infanticide. In the introduction and discussion, stating the species and sex of the animals tested in all the previous studies mentioned would be useful. The authors also discuss PVN OT cell stimulation findings seen in other rodents, so the work seems less conceptually novel. Overall, the findings add to the knowledge about OT regulation of pup-directed behavior in male and female rodents, especially the PVN-mPFC OT projection.

      We are very grateful to the reviewer for providing these valuable references, which we have cited in the introduction and discussion. We agree with the reviewer that our statement that nothing is known about the mPFC and infanticide was incorrect. It should instead state that it remains unclear whether mPFC OT projections are involved in paternal care and infanticide. A study in mother rats indicated that inactivation or inhibition of neuronal activity in the mPFC largely reduced pup retrieval and grouping (Febo, Felix-Ortiz, & Johnson, 2010). A subsequent study of firing patterns in the mPFC of mother rats suggested that sensory-motor processing occurs in the mPFC and may affect decisions about maternal care of pups (Febo, 2012). A study in new mother rats that examined different regions of the mPFC (anterior cingulate (Cg1), PrL, IL) identified an involvement of the IL cortex in biased preference decision-making in favour of the offspring (Pereira & Morrell, 2020). A study on maternal motivation in rats suggests that in the early postpartum period the IL and Cg1 subregions of the mPFC are the motivating circuits for pup-specific biases, while the PrL subregion is recruited and contributes to the expression of maternal behaviors in the late postpartum period (Pereira & Morrell, 2011).

      Reviewer #3 (Public Review):

      Summary:

      Here Li et al. examine pup-directed behavior in virgin Mandarin voles. Some males and females tend towards infanticide, others tend towards pup care. c-Fos staining showed more oxytocin cells activated in the paraventricular nucleus (PVN) of the hypothalamus in animals expressing pup care behaviors than in infanticidal animals. Optogenetic stimulation of PVN oxytocin neurons (with an oxytocin-specific virus to express the opsin transgene) increased pup-care, or in infanticidal voles increased latency towards approach and attack.

      Suppressing the activity of PVN oxytocin neurons promoted infanticide. The use of a recent oxytocin GRAB sensor (OT1.0) showed changes in medial prefrontal cortex (mPFC) signals as measured with photometry in both sexes. Activating mPFC oxytocin projections increased latency to approach and attack in infanticidal females and males (similar to the effects of peripheral oxytocin injections), whereas in pup-caring animals only males showed a decrease in approach. Inhibiting these projections increased infanticidal behaviors in both females and males and had no effect on pup caretaking.

      Strengths:

      Adopting these methods for Mandarin voles is an impressive accomplishment, especially the valuable data provided by the oxytocin GRAB sensor. This is a major achievement and helps promote systems neuroscience in voles.

      Weaknesses:

      The study would be strengthened by an initial figure summarizing the behavioral phenotypes of voles expressing pup care vs infanticide: the percentages and behavioral scores of individual male and female nulliparous animals for the behaviors examined here. Do the authors have data about the housing or life history/experiences of these animals? How bimodal and robust are these behavioral tendencies in the population?

      As noted in our response to Reviewer 2, animals generally exhibit three types of behavioral response toward pups; data on the proportions of these behavioral types in the population will be included in another study from our lab. The reviewer's suggestion to score the behaviors is a valuable idea that will help us characterize these behaviors more fully. Mandarin voles were captured from the wild in Henan, China. The experimental subjects were F2 generation voles reared in the Experimental Animal Centre of Shaanxi Normal University. In our observations, pup care and infanticide behaviors were stable across several pup exposures. This was especially clear for pup care; for infanticide, we did not conduct further pup exposures in order to protect the pups.

      Optogenetics with the oxytocin promoter virus is a nice advance here. More details about their preparation and methods should be in the main text, and not simply relegated to the methods section. For optogenetic stimulation in Figure 2, how were the stimulation parameters chosen? There is a worry that oxytocin neurons can co-release other factors- are the authors sure that oxytocin is being released by optogenetic stimulation as opposed to other transmitters or peptides, and acting through the oxytocin receptor (as opposed to a vasopressin receptor)?

      As the reviewer suggested, more detailed information about virus construction and the choice of optogenetic stimulation parameters has been added to the revised manuscript. For details of the construction of the CHR2 and mCherry viruses used for optogenetic manipulation, see a previous study in which an rAAV expressing Venus was driven by a 2.6 kb region upstream of OT exon 1, which is conserved across mammalian species (Knobloch et al., 2012). For the eNpHR 3.0 virus, expression of the vector is driven by the mouse OXT promoter, a 1 kb region upstream of exon 1 of the OXT gene, which has been shown to induce cell type-specific expression in OXT cells (Peñagarikano et al., 2015). For details of the construction of the OT1.0 sensor, see the work of Professor Li's group (Qian et al., 2023). The mapping of the viral vectors and OT1.0 sensor is shown below.

      The optogenetic stimulation parameters were based on a previous study (He et al., 2021). Because our original description of these parameters was not sufficiently detailed, we have added further information to the Methods. In the pup-directed pup care behavioral test, light stimulation lasted 11 min. The parameters for optogenetic manipulation of PVN OT neurons were ~3 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, and those for PVN OT neurons projecting to the mPFC were ~10 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, to cover the entire interaction. We performed fiber photometry experiments to determine the role of OT in these behaviors, and these results and the optogenetic results support each other. In addition, we confirmed the effect of optogenetic manipulation on OT release by combining optogenetic inhibition with OT1.0 sensors (Supplementary Data Fig. 2). OT has previously been shown to act specifically on OTR in the mPFC-PL (Sabihi et al., 2017). Our study focuses on oxytocin neurons and oxytocin release; more research is needed to build a more complete picture of how OTR and other factors in the mPFC are involved in these behaviors.
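The quoted stimulation parameters imply a specific pulse train, which can be made explicit with a little arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes that the 11 min period is filled with whole 8 s ON / 2 s OFF cycles and that pulses are delivered only during ON epochs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StimProtocol:
    """Pulse-train parameters as quoted in the response (names are hypothetical)."""
    freq_hz: float = 20.0        # pulse frequency within an ON epoch
    pulse_ms: float = 20.0       # width of each light pulse
    on_s: float = 8.0            # duration of each ON epoch
    off_s: float = 2.0           # duration of each OFF epoch
    total_s: float = 11 * 60.0   # total stimulation period (11 min)


def summarize(p: StimProtocol) -> dict:
    """Derive cycle counts, pulse counts, and light-on time from the protocol."""
    cycle_s = p.on_s + p.off_s
    n_cycles = int(p.total_s // cycle_s)          # assumption: whole cycles only
    pulses_per_on = round(p.freq_hz * p.on_s)     # pulses in one ON epoch
    total_pulses = n_cycles * pulses_per_on
    light_on_s = total_pulses * p.pulse_ms / 1000.0
    return {
        "n_cycles": n_cycles,
        "pulses_per_on_epoch": pulses_per_on,
        "total_pulses": total_pulses,
        "duty_within_on": p.freq_hz * p.pulse_ms / 1000.0,
        "light_on_s": light_on_s,
    }


if __name__ == "__main__":
    for key, value in summarize(StimProtocol()).items():
        print(f"{key}: {value}")
```

Under these assumptions the light is on for 40% of each ON epoch, which is the kind of figure a reader might want when comparing the protocol with other studies.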

      Author response image 1.

      Author response image 2.

       

      Given that they are studying changes in latency to approach/attack, having some controls for motion when oxytocin neurons are activated or suppressed might be nice. Oxytocin is reported to be an anxiolytic and a sedative at high levels.

      As noted in our response to Reviewer 1, to exclude effects of locomotor activity on pup-directed behaviors, we also examined the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change locomotor activity (Supplementary Data Fig. 6).

      The OT1.0 sensor is also amazing, these data are quite remarkable. However, photometry is known to be susceptive to motion artifacts and I didn't see much in the methods about controls or correction for this. It's also surprising to see such dramatic, sudden, and large-scale suppression of oxytocin signaling in the mPFC in the infanticidal animals - does this mean there is a substantial tonic level of oxytocin release in the cortex under baseline conditions?

      The fiber photometry system used in the present study automatically corrects for motion artifacts by simultaneously recording signals excited by a 405 nm light source. As shown in the formula below, z-scores were calculated and presented, so increases and decreases in the OT signal are expressed relative to baseline. When the baseline is smooth, a decrease in the signal is generally amplified by this calculation. In our experiments combining optogenetic inhibition with OT1.0 sensors, we found a measurable level of OT release at baseline, which leaves room for the signal recorded by the OT1.0 sensor to decrease.
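A minimal sketch of one common way to perform this kind of correction, offered as an assumption rather than as the authors' exact calculation: the 405 nm reference trace is fitted to the sensor trace by linear least squares, the fitted trace is used to compute a relative fluorescence change, and that change is z-scored against a baseline window. The function name, argument names, and the choice of baseline window are all hypothetical.

```python
import numpy as np


def motion_corrected_zscore(sig, ref, baseline):
    """Z-score a photometry trace after reference-channel correction.

    sig      : sensor-channel fluorescence trace
    ref      : 405 nm reference trace recorded simultaneously
    baseline : slice selecting the baseline samples (hypothetical choice)
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Least-squares fit of the reference to the signal: sig ~ a * ref + b
    a, b = np.polyfit(ref, sig, 1)
    fitted = a * ref + b
    # Relative change with the shared (motion-related) component removed
    dff = (sig - fitted) / fitted
    # Express the trace in units of baseline standard deviation
    mu = dff[baseline].mean()
    sd = dff[baseline].std(ddof=1)
    return (dff - mu) / sd
```

Because the result is scaled by the baseline standard deviation, a quiet baseline makes any later deviation look large in z-score units, which matches the point made above about decreases being amplified when the baseline is smooth.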

      Figure 5 is difficult to parse as-is, and relates to an important consideration for this study: how extensive is the oxytocin neuron projection from PVN to mPFC?

      The AAV11-Ef1a-EGFP virus infects fibers and travels retrogradely to the cell body, so it can be used for retrograde tracing of neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red), which appear yellow. We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).

      In Figures 6 and 7, the authors use the phrase 'projection terminals'; however, to my knowledge, there have not been terminals (i.e., presynaptic formations opposed to a target postsynaptic site) observed in oxytocin neuron projections into target central regions.

      Following your suggestion, we have replaced ‘terminals’ with ‘fibers’ to describe these projections more accurately.

      Projection-based inhibition as in Figure 7 remains a controversial issue, as it is unclear if the opsin activation can be fast enough to reduce the fast axonal/terminal action potential. Do the authors have confirmation that this works, perhaps with the oxytocin GRAB OT sensor?

      Thank you for your suggestion. We measured OT release with OT1.0 sensors while the OT neuron projections in the mPFC were optogenetically inhibited. Optogenetic inhibition of OT neuron fibers in the mPFC significantly reduced OT release, which validates the projection-based inhibition method (Supplementary Data Fig. 5).

      As females and males had similar GRAB OT1.0 responses in mPFC, why would the behavioral effects of increasing activity be different between the sexes?

      In the present study, females released higher levels of OT into the mPFC than males during the different behaviors (Figure 4 d, e). In addition, females already approached and retrieved pups more rapidly than males before optogenetic activation, which may explain why this manipulation had no effect in females.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Check for spelling and grammar errors throughout.

      We thank the reviewer for this suggestion; we have checked and revised the article.

      (2) Report effect sizes for all significant findings to allow evaluation of practical significance.

      As the reviewer suggested, we have added effect sizes (d, η², and odds ratio) to both the source data and the main text.
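For readers unfamiliar with the three effect-size measures named here, the following is a minimal sketch of their textbook definitions. It is not the authors' analysis code: the function names are hypothetical, Cohen's d uses the pooled standard deviation, and η² is shown for a one-way design, whereas a factorial ANOVA would report it per effect.

```python
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


def eta_squared(*groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    return ss_between / ss_total


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b : counts with / without the outcome in group 1
    c, d : counts with / without the outcome in group 2
    """
    return (a * d) / (b * c)
```

The odds ratio is the natural choice for categorical outcomes such as whether an animal attacked a pup, while d and η² suit continuous measures such as latencies.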

      (3) Provide detailed information on the imaging parameters and analysis methods used in the Fos study.

      The imaging parameters and analysis details for the Fos study have also been added to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides, so that each slide held 6 slices spaced 160 µm apart. The PVN area was delineated based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive, and double-labeled (merged) neurons were counted.
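The 160 µm spacing follows from the sectioning scheme if consecutive sections are distributed across the slides in rotation, which is an assumption here since the response does not spell out the order. A short sketch with hypothetical names:

```python
THICKNESS_UM = 40      # section thickness quoted in the response
N_SLIDES = 4           # number of slides
SLICES_PER_SLIDE = 6   # sections mounted on each slide


def slide_layout(thickness_um=THICKNESS_UM, n_slides=N_SLIDES,
                 per_slide=SLICES_PER_SLIDE):
    """Map each slide index to the depths (in µm) of the sections it holds.

    Assumes round-robin mounting: section i goes to slide i % n_slides.
    """
    layout = {s: [] for s in range(n_slides)}
    for i in range(n_slides * per_slide):
        layout[i % n_slides].append(i * thickness_um)
    return layout


if __name__ == "__main__":
    for slide, depths in slide_layout().items():
        print(f"slide {slide + 1}: {depths}")
```

With this scheme, adjacent sections on any one slide are separated by 4 × 40 µm = 160 µm, and each slide samples the full rostrocaudal extent covered by the 24 sections.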

      (4) Compare the Fos colocalization results with previous studies examining PVN Fos and maternal/paternal care to contextualize the findings.

      Our Fos and OT colocalization results are consistent with previous studies. In virgin male prairie voles, OT and Fos co-labeled neurons in the PVN increased after exposure to conspecific pups and the experience of paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-Fos co-labeled neurons in the PVN increased significantly after the animals became parents, which may reflect the shift from virgin to parent (Kelly et al., 2017).

      (5) Discuss the limitations of the study, such as the potential effects of manipulating OT neurons on the release of other transmitters or the influence of other neurochemicals or brain regions on pup-directed behaviors, especially in females.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, OT neurons may also release other neurochemicals, so other neurochemicals may have been released during the optogenetic manipulations. In one of our previous studies, activation of OT neuron projections from the PVN to the VTA, as well as to the NAc, also altered pup-directed behaviors, which may have been accompanied by dopamine release (He et al., 2021). In addition, back-propagation of action potentials during optogenetic manipulation may cause the same behavioral effects as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated in future studies.

      (6) Address the possibility of back-propagation of action potentials in the optogenetic manipulations causing the same behavioral effects as PVN OT cell stimulation.

      We agree with the reviewer that optogenetic manipulation may induce back-propagation of action potentials, which could produce the same behavioral effects as OT cell stimulation. We will address this issue in future studies.

      (7) Investigate whether changes in locomotor behavior could explain the observed effects on pup-directed behaviors.

      To exclude effects of locomotor activity on pup-directed behaviors, we also examined the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change locomotor activity (Supplementary Data Fig. 6).

      (8) Report the percentage of PVN->mPFC neurons labeled that were OT-positive.

      The AAV11-Ef1a-EGFP virus infects fibers and travels retrogradely to the cell body, so it can be used for retrograde tracing of neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red), which appear yellow. We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).

      (9)  Directly compare the sexes in the behavioral analysis and discuss any potential sex differences.

      We agree with the reviewer's suggestion and have added comparisons between the two sexes and a discussion of the relevant results.

      (10) If available, report and discuss the OT expression levels and the number of OT neurons manipulated in each sex.

      In the present study, we counted the number of OT cells but did not measure OT expression levels by WB or qPCR. In addition, we report the percentages of total OT-positive neurons infected by the CHR2(H134R) and eNpHR3.0 viruses (Supplementary Data Fig. 7), but we do not know how many cells were actually manipulated during the optogenetic experiments.

      (11) Expand the discussion to include what could be regulating or interacting with the OT circuit under investigation, particularly in females where the effects were less pronounced.

      As the reviewers suggested, we have added the relevant discussion. In addition to OT neurons, OTR neurons may also regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin- and oxytocin receptor-expressing neurons (Okabe et al., 2017). Other brain regions, such as the preoptic area (POA), may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies and greater c-Fos expression in the POA; concentrations of OT in the POA were also significantly increased, and the facilitation of alloparental behavior by repeated exposure to pups occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that OT of the PVN is involved in the care of pups by male voles (He et al., 2021). This study suggests that PVN to ventral tegmental area (VTA) OT projections, as well as VTA to nucleus accumbens (NAc) DA projections, are involved in the care of pups by male voles. Inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021).

      Reviewer #2 (Recommendations For The Authors):

      A few additional things the authors may want to consider:

      (1) I don't understand the subject numbers in the peripheral OT study data shown in Figure 8. Panels p and q have 69 females shown and 50 males. Was there a second, much larger, IP injection study conducted that was different than the subjects shown in panels a-o that had ~5 subjects per treatment group per sex?

      We apologize for the confusion. More animals were used to test the effects of OT on infanticide behaviors in our pre-test. These data, combined with data from the formal pharmacological experiment, are presented in Fig. 8p, q. After OT treatment, changes in detailed and specific behaviors were collected in only several animals. We have clarified this in the revised manuscript.

      (2) The authors suggest higher baseline OT release in the female mPFC, which makes sense and helps explain some of their results. It seems that the data in Figure 1 show what is probably no sex difference in OT cell numbers in the PVN of Mandarin voles, which is unlike the old studies in mice or rats. If readers look at the data in Figure 1 showing what seems to be no sex difference in OT cell number, the authors' argument in the discussion about mPFC OT release levels higher in females would be inconsistent with their own data shown. The authors have the brain sections they need to help support or undermine this argument in the discussion, so maybe it would be useful to analyze the OT cell numbers across the PVN and report it in this paper or briefly mention it in the discussion.

      We compared the numbers of OT neurons, the numbers of activated OT neurons (OT and Fos double-labeled neurons), and the level of OT release between males and females. We found that females had more activated OT neurons (Figure 1d, g) and released higher levels of OT into the mPFC (Figure 4d, e) than males. This has been added to the results and discussion. The inconsistency of the OT cell numbers with previous studies may be due to the method of cell counting, as we did not count all slides consecutively.

      (3) The discussion suggests visual cues are involved in mPFC OT release relevant for pup care or infanticide, but this is a very odd claim for nocturnal animals that live and nest with their pups in underground burrows.

      We apologize for the confusion. Here, we cited the finding in mice that activation of PVN OT neurons induced by visual stimulation promoted pup care in order to support our finding that the activity of OT cells of the PVN is involved in pup care, rather than to illustrate the role of visual stimulation in voles. We have clarified this in the revised manuscript.

      (4) The lack of decrease in mPFC OT release in the 2nd and 3rd approaches to pups is probably because the release was so high after the 1st approach that it didn't have time to drop before the subsequent approaches. The authors don't state how long those between-approach intervals were on average to help readers interpret this result.

      As described in our methods, we left about 60 s between behavioral tests to allow the signal to return to the baseline level.

      (5) Do PVN-mPFC OT somata collateralize to other brain sites? Could mPFC terminal stimulation activate entire PVN cells and every site they project to? A caveat could be mentioned in the discussion if there's support for this from other optogenetic and PVN OT cell projection studies.

      We verified the OT projections from the PVN to the mPFC to validate the optogenetic manipulation of this pathway, but did not investigate whether the OT neurons projecting from the PVN to the mPFC also project collaterally to other brain regions. It is suggested that mPFC terminal stimulation activates only PVN OT cells projecting to the mPFC; whether other OT neurons were activated remains unclear.

      (6) I don't see an ethics statement related to the experiments obviously having to involve pup injury or death. Nothing is said in methods about what happened after adult subjects attacked pups. I assumed the tests were quickly terminated and pups euthanized.

      If pups were attacked, we removed them immediately to avoid unnecessary injuries, and injured pups were euthanized.

      (7) The authors could be more specific about what psychological diseases they refer to in the abstract and elsewhere that are relevant to this study. Depression? Rare cases of psychosis? Even within the already rare parental psychosis, infanticide is tragic but rare.

      Infanticide is caused by a variety of factors; mental illness, especially depression and psychosis, is often a very high risk factor among them (Milia & Noonan, 2022; Naviaux, Janne, & Gourdin, 2020). In humans, infanticide has been used to refer to the killing, neglect or abuse of newborn babies and older children (Jackson, 2006). Here, we believe that research on the neural mechanisms of infanticide can also contribute to the understanding and treatment of attacks on children, physical and verbal abuse, and direct killing of babies.

      (8) Figure 8 - in one case the "*" is a chi-square result , correct?

      Thank you for your careful checking. In Figure 8p, q, we applied the chi-square test and have added this to the legend.

      Reviewer #3 (Recommendations For The Authors):

      The only other thing is a typo on line 135: the authors mean 'stimulation' instead of 'simulation'.

      Corrected.

      References

      Bravo-Rivera, C., Roman-Ortiz, C., Brignoni-Perez, E., Sotres-Bayon, F., & Quirk, G. J. (2014). Neural structures mediating expression and extinction of platform-mediated avoidance. J Neurosci, 34(29), 9736-9742. doi:10.1523/jneurosci.0191-14.2014

      Capuzzo, G., & Floresco, S. B. (2020). Prelimbic and Infralimbic Prefrontal Regulation of Active and Inhibitory Avoidance and Reward-Seeking. J Neurosci, 40(24), 4773-4787. doi:10.1523/jneurosci.0414-20.2020

      Febo, M. (2012). Firing patterns of maternal rat prelimbic neurons during spontaneous contact with pups. Brain Res Bull, 88(5), 534-542. doi:10.1016/j.brainresbull.2012.05.012

      Febo, M., Felix-Ortiz, A. C., & Johnson, T. R. (2010). Inactivation or inhibition of neuronal activity in the medial prefrontal cortex largely reduces pup retrieval and grouping in maternal rats. Brain Res, 1325, 77-88. doi:10.1016/j.brainres.2010.02.027

      He, Z., Young, L., Ma, X. M., Guo, Q., Wang, L., Yang, Y., . . . Tai, F. (2019). Increased anxiety and decreased sociability induced by paternal deprivation involve the PVN-PrL OTergic pathway. Elife, 8. doi:10.7554/eLife.44026

      He, Z., Zhang, L., Hou, W., Zhang, X., Young, L. J., Li, L., . . . Tai, F. (2021). Paraventricular Nucleus Oxytocin Subsystems Promote Active Paternal Behaviors in Mandarin Voles. J Neurosci, 41(31), 6699-6713. doi:10.1523/jneurosci.2864-20.2021

      Jackson, M. (2006). Infanticide. The Lancet, 367(9513), 809. doi:10.1016/S0140-6736(06)68323-2

      Kelly, A. M., Hiura, L. C., Saunders, A. G., & Ophir, A. G. (2017). Oxytocin Neurons Exhibit Extensive Functional Plasticity Due To Offspring Age in Mothers and Fathers. Integr Comp Biol, 57(3), 603-618. doi:10.1093/icb/icx036

      Kenkel, W. M., Paredes, J., Yee, J. R., Pournajafi-Nazarloo, H., Bales, K. L., & Carter, C. S. (2012). Neuroendocrine and behavioural responses to exposure to an infant in male prairie voles. J Neuroendocrinol, 24(6), 874-886. doi:10.1111/j.1365-2826.2012.02301.x

      Knobloch, H. S., Charlet, A., Hoffmann, L. C., Eliava, M., Khrulev, S., Cetin, A. H., . . . Grinevich, V. (2012). Evoked axonal oxytocin release in the central amygdala attenuates fear response. Neuron, 73(3), 553-566. doi:10.1016/j.neuron.2011.11.030

      Liu, Y., Li, A., Bair-Marshall, C., Xu, H., Jee, H. J., Zhu, E., . . . Wang, J. (2023). Oxytocin promotes prefrontal population activity via the PVN-PFC pathway to regulate pain. Neuron, 111(11), 1795-1811.e1797. doi:10.1016/j.neuron.2023.03.014

      Mei, L., Yan, R., Yin, L., Sullivan, R. M., & Lin, D. (2023). Antagonistic circuits mediating infanticide and maternal care in female mice. Nature, 618(7967), 1006-1016. doi:10.1038/s41586-023-06147-9

      Milia, G., & Noonan, M. (2022). Experiences and perspectives of women who have committed neonaticide, infanticide and filicide: A systematic review and qualitative evidence synthesis. J Psychiatr Ment Health Nurs, 29(6), 813-828. doi:10.1111/jpm.12828

      Moscarello, J. M., & LeDoux, J. E. (2013). Active avoidance learning requires prefrontal suppression of amygdala-mediated defensive reactions. J Neurosci, 33(9), 3815-3823. doi:10.1523/jneurosci.2596-12.2013

      Murugan, M., Jang, H. J., Park, M., Miller, E. M., Cox, J., Taliaferro, J. P., . . . Witten, I. B. (2017). Combined Social and Spatial Coding in a Descending Projection from the Prefrontal Cortex. Cell, 171(7), 1663-1677.e1616. doi:10.1016/j.cell.2017.11.002

      Naviaux, A. F., Janne, P., & Gourdin, M. (2020). Psychiatric Considerations on Infanticide: Throwing the Baby out with the Bathwater. Psychiatr Danub, 32(Suppl 1), 24-28. 

      Okabe, S., Tsuneoka, Y., Takahashi, A., Ooyama, R., Watarai, A., Maeda, S., . . . Kikusui, T. (2017). Pup exposure facilitates retrieving behavior via the oxytocin neural system in female mice. Psychoneuroendocrinology, 79, 20-30. doi:10.1016/j.psyneuen.2017.01.036

      Peñagarikano, O., Lázaro, M. T., Lu, X. H., Gordon, A., Dong, H., Lam, H. A., . . . Geschwind, D. H. (2015). Exogenous and evoked oxytocin restores social behavior in the Cntnap2 mouse model of autism. Sci Transl Med, 7(271), 271ra278. doi:10.1126/scitranslmed.3010257

      Pereira, M., & Morrell, J. I. (2011). Functional mapping of the neural circuitry of rat maternal motivation: effects of site-specific transient neural inactivation. J Neuroendocrinol, 23(11), 1020-1035. doi:10.1111/j.1365-2826.2011.02200.x

      Pereira, M., & Morrell, J. I. (2020). Infralimbic Cortex Biases Preference Decision Making for Offspring over Competing Cocaine-Associated Stimuli in New Mother Rats. eNeuro, 7(4). doi:10.1523/eneuro.0460-19.2020

      Qian, T., Wang, H., Wang, P., Geng, L., Mei, L., Osakada, T., . . . Li, Y. (2023). A genetically encoded sensor measures temporal oxytocin release from different neuronal compartments. Nat Biotechnol, 41(7), 944-957. doi:10.1038/s41587-022-01561-2

      Sabihi, S., Dong, S. M., Maurer, S. D., Post, C., & Leuner, B. (2017). Oxytocin in the medial prefrontal cortex attenuates anxiety: Anatomical and receptor specificity and mechanism of action. Neuropharmacology, 125, 1-12. doi:10.1016/j.neuropharm.2017.06.024

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Potential bleed-over across frequencies in the spectral domain is a major concern for all of the results in this paper. The fact that alpha power, 36Hz and 40Hz frequency-tagged amplitude and 4Hz intermodulation frequency power is generally correlated with one another amplifies this concern. The authors are attaching specific meaning to each of these frequencies, but perhaps there is simply a broadband increase in neural activity when anticipating an auditory target compared to a visual target?

      We appreciate the reviewer’s insightful comment regarding the potential bleed-over across frequencies in the spectral domain. We fully acknowledge that the trade-off between temporal and frequency resolution is a challenge, particularly given the proximity of the frequencies we are examining.

      To address this concern, we performed additional analyses to investigate whether there is indeed a broadband increase in neural activity when anticipating an auditory target as compared to a visual target, as opposed to distinct frequency-specific effects. Our results show that the bleed-over between frequencies is minimal and does not significantly affect our findings. Specifically, we repeated the analyses using the same filter and processing steps for the 44 Hz frequency. At this frequency, we did not observe any significant differences between conditions.

      These findings suggest that the effects we report are indeed specific to the 40 Hz frequency band and not due to a general broadband increase in neural activity. We hope this addresses the reviewer’s concern and strengthens the validity of our frequency-specific results. We have now added this analysis to the methods section of our manuscript.

      Line 730: To confirm that 4 Hz is a sufficient distance between tagging frequencies, we repeated the analysis for 43.5 to 44.5 Hz. We found no indication of frequency bleed-over, as the effects observed at 40 Hz were not present at 44 Hz (see SUPPL Fig. 11).
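      The logic of this control can be illustrated with a toy signal. The sketch below is not the authors' filter-plus-Hilbert pipeline: it swaps in a plain single-frequency Fourier projection, and the sampling rate, window length, and component amplitudes are all hypothetical values chosen for illustration. It only shows that components at 36 Hz and 40 Hz need not produce a spurious estimate at 44 Hz when the analysis window is long enough.

```python
import math


def amplitude_at(signal, freq_hz, fs_hz):
    """Estimate the amplitude of the sinusoidal component at freq_hz
    by projecting the signal onto a cosine and a sine at that frequency."""
    n = len(signal)
    re = 0.0
    im = 0.0
    for i, x in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * i / fs_hz
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n


# Hypothetical recording parameters (assumptions, not taken from the study).
FS_HZ = 1000.0
DURATION_S = 2.0
n_samples = int(FS_HZ * DURATION_S)

# Toy signal: a 36 Hz "visual" tag and a weaker 40 Hz "auditory" tag.
signal = [
    1.0 * math.sin(2.0 * math.pi * 36.0 * i / FS_HZ)
    + 0.5 * math.sin(2.0 * math.pi * 40.0 * i / FS_HZ)
    for i in range(n_samples)
]

# Amplitude estimates at the two tagging frequencies and the 44 Hz control.
amplitudes = {f: amplitude_at(signal, f, FS_HZ) for f in (36.0, 40.0, 44.0)}
print(amplitudes)
```

      In this toy case the 44 Hz estimate is close to zero because all three frequencies complete a whole number of cycles within the window; shorter windows or real, noisy data would show more leakage, which is why an empirical control such as the one described above is still needed.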

      We do not, however, specifically argue against the possibility of a broadband increase in sensory processing when anticipating an auditory compared to a visual target. But even a broadband increase would directly contradict the alpha inhibition hypothesis, which posits that an increase in alpha completely disengages the whole cortex. We have now made this clearer in the text.

      Line 491: As auditory targets were significantly more difficult than visual targets in our first study and of comparable difficulty in our second study, these results strongly speak to a vigilance increase of sensory processing independent of modality and an inability to selectively disengage one sensory modality in anticipation of a demanding task. This view is consistent with previous work in which visual SSEPs elicited by irrelevant background stimulation increased with task load in an auditory discrimination task (Jacoby et al., 2012).

      (2) Moreover, 36Hz visual and 40Hz auditory signals are expected to be filtered in the neocortex. Applying standard filters and Hilbert transform to estimate sensory evoked potentials appears to rely on huge assumptions that are not fully substantiated in this paper. In Figure 4, 36Hz "visual" and 40Hz "auditory" signals seem largely indistinguishable from one another, suggesting that the analysis failed to fully demix these signals.

      We appreciate the reviewer’s insightful concern regarding the filtering and demixing of the 36 Hz visual and 40 Hz auditory signals, and we share the same reservations about the reliance on standard filters and the Hilbert transform method.

      To address this, we would like to draw attention to SUPPL Fig. 11, which demonstrates that a 4 Hz difference is sufficient to effectively demix the signals using our chosen filtering and Hilbert transform approach. We argue that the reason the 36 Hz visual and 40 Hz auditory signals show similar topographies lies not in incomplete demixing but rather in the possibility that this condition difference reflects sensory integration, rather than signal contamination.

      This interpretation is further supported by our findings with the intermodulation frequency at 4 Hz, which also suggests cross-modal integration. Furthermore, source localization analysis revealed that the strongest condition differences were observed in the precuneus, an area frequently associated with sensory integration processes. We have now expanded on this in the discussion section to better clarify this point.

      Line 578: Previous research has shown that simultaneous frequency-tagging at multiple frequencies can evoke a response at the intermodulation frequency (f1 – f2), which in multimodal settings is thought to reflect cross-modal integration (Drijvers et al., 2021). This concept aligns closely with our findings, where increased vigilance in the sensory system, prompted by anticipation of a difficult auditory target, resulted in an increase in the intermodulation frequency. Similarly, our data shows that visual signal enhancement was localized in the precuneus, further supporting the role of this region in sensory integration (Al-Ramadhani et al., 2021; Xie et al., 2019).
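      Why a response at f1 – f2 points to an interaction between the two tagged signals can be seen from a standard product-to-sum identity. The sketch below assumes, purely for illustration, that the interaction is multiplicative; the text above does not specify the form of the nonlinearity.

```latex
\sin(2\pi f_1 t)\,\sin(2\pi f_2 t)
  = \tfrac{1}{2}\Big[\cos\big(2\pi (f_1 - f_2)\,t\big)
                   - \cos\big(2\pi (f_1 + f_2)\,t\big)\Big]
```

      With f1 = 40 Hz and f2 = 36 Hz, the product contains components at 4 Hz and 76 Hz, whereas a purely linear sum of the two signals contains only 36 Hz and 40 Hz.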

      (3) The asymmetric results in the visual and auditory modalities preclude a modality-general conclusion about the function of alpha. However, much of the language seems to generalize across sensory modalities (e.g., use of the term 'sensory' rather than 'visual').

      We agree that in some cases we have not made a sufficient distinction between visual and sensory. We have now made sure that, when using ‘sensory’, we either describe overall theories that are not visual-exclusive or refer to the possibility of a broad sensory increase. However, when directly discussing our results and their interpretation, we now use ‘visual’.

      (4) In this vein, some of the conclusions would be far more convincing if there was at least a trend towards symmetry in source-localized analyses of MEG signals. For example, how does alpha power in primary auditory cortex (A1) compare when anticipating auditory vs visual target? What do the frequency tagged visual and auditory responses look like when just looking at primary visual cortex (V1) or A1?

      We thank the reviewer for this important suggestion and have added a virtual channel analysis. We were, however, not interested in alpha power in primary auditory cortex, as we were specifically interested in the posterior alpha, which is usually increased when expecting an auditory compared to a visual target (and used to be interpreted as a blanket inhibition of the visual cortex). We have now improved the clarity of this point in the manuscript.

      We have, however, followed the reviewer’s suggestion of a virtual channel analysis, showing that the condition differences are not observable in primary visual cortex for the 36 Hz visual signal or in primary auditory cortex for the 40 Hz auditory signal. Our data clearly show that there is an alpha condition difference in V1, while there is no condition difference for 36 Hz in V1 or for 40 Hz in Heschl’s gyrus.

      Line 356: Additionally, we replicated this effect with a virtual channel analysis in V1 (see SUPPL Fig. 12)

      Line 403: Furthermore, a virtual channel analysis in V1 and Heschl’s gyrus confirmed that there were no condition differences in primary visual and auditory areas (see SUPPL Fig. 12).

      (5) Blinking would have a huge impact on the subject's ability to ignore the visual distractor. The best thing to do would be to exclude from analysis all trials where the subjects blinked during the cue-to-target interval. The authors mention that in the MEG experiment, "To remove blinks, trials with very large eye-movements (> 10 degrees of visual angle) were removed from the data (See supplement Fig. 5)." This sentence needs to be clarified, since eye-movements cannot be measured during blinking. In addition, it seems possible to remove putative blink trials from EEG experiments as well, since blinks can be detected in the EEG signals.

      We agree with the reviewer that this point was phrased in a confusing way. From the MEG data, we removed eyeblinks using ICA. For the supplementary Fig. 5 analysis, we used the eye-tracking data to make sure that participants were in fact fixating the centre of the screen. For this analysis, we removed trials with blinks (which appear in the eye-tracker data as very high-amplitude movements, i.e., large eye-movements in degrees of visual angle; the figure below shows a blink in the MEG data and the corresponding eye-tracker data in degrees of visual angle). We have now clarified this in the methods section.

      As for the concern that participants could close their eyes to ignore visual distractors: in both experiments we observe a highly significant distractor cost in accuracy for visual distractors, which we hope will convince the reviewer that our visual distractors were working as intended.

      Author response image 1.

      Illustration of eye-tracker data for a trial without and a trial with a blink. All data points recorded during this trial are plotted. A, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. No blink is visible. B, eye-tracker data transformed into degrees of visual angle for the trial depicted in A. C, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. A clear blink is visible. D, eye-tracker data transformed into degrees of visual angle for the trial depicted in C.

      Line 676: To confirm that participants had focused on the fixation cross during the cue-to-target interval, we incorporated eye-tracking into our MEG-experiment (EyeLink 1000 Plus). Correct trials of the second block were analysed for vertical and horizontal eye-movements. To exclude blinks from this analysis, trials with very large eye-movements (> 10 degrees of visual angle) were removed from the eye-tracking data (See suppl Fig. 5).
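      The exclusion rule described above can be sketched as a simple threshold on gaze excursion. This is a minimal illustration, not the authors' code: the trial format (a list of horizontal/vertical gaze offsets from fixation, in degrees of visual angle), the function names, and the per-axis criterion are assumptions; only the 10-degree threshold comes from the text.

```python
def max_excursion_deg(trial):
    """Largest horizontal or vertical gaze offset from fixation in one trial.

    `trial` is assumed to be a list of (x_deg, y_deg) samples, already
    converted to degrees of visual angle relative to the fixation cross.
    """
    return max(max(abs(x), abs(y)) for x, y in trial)


def drop_blink_trials(trials, threshold_deg=10.0):
    """Keep only trials whose gaze never moves more than threshold_deg."""
    return [t for t in trials if max_excursion_deg(t) <= threshold_deg]


# Hypothetical data: the second trial contains a blink-like excursion.
trials = [
    [(0.1, -0.2), (0.3, 0.1), (0.0, 0.2)],
    [(0.2, 0.0), (35.0, -40.0), (0.1, 0.1)],
]
kept = drop_blink_trials(trials)
print(len(kept), "of", len(trials), "trials kept")
```

      A per-axis criterion is used here because horizontal and vertical eye-movements are analysed separately in the text; a Euclidean distance from fixation would be an equally plausible reading.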

      (6) It would be interesting to examine the neutral cue trials in this task. For example, comparing auditory vs visual vs neutral cue conditions would be indicative of whether alpha was actively recruited or actively suppressed. In addition, comparing spectral activity during cue-to-target period on neutral-cue auditory correct vs incorrect trials should mimic the comparison of auditory-cue vs visual-cue trials. Likewise, neutral-cue visual correct vs incorrect trials should mimic the attention-related differences in visual-cue vs auditory-cue trials.

      We have analysed the neutral cue trials in the EEG dataset (see suppl. Fig. 1). There were no significant differences to auditory or visual cues, but descriptively alpha power was higher for neutral cues compared to visual cues and lower for neutral cues compared to auditory cues. While this may suggest that for visual trials alpha is actively suppressed and for auditory trials actively recruited, we do not feel comfortable making this claim, as the neutral condition may not reflect a completely neutral state. The neutral task can still be difficult, especially because of the uncertainty of the target modality.

      As for the analysis of incorrect versus correct trials, we appreciate the idea, but unfortunately the accuracy rate was quite high so that the number of incorrect trials is insufficient to perform a reliable analysis.

      (7) In the abstract, the authors state that "This implies that alpha modulation does not solely regulate 'gain control' in early sensory areas but rather orchestrates signal transmission to later stages of the processing stream." However, I don't see any supporting evidence for the latter claim, that alpha orchestrates signal transmission to later stages of the processing stream. If the authors are claiming an alternative function to alpha, this claim should be strongly substantiated.

      We thank the reviewer for pointing out that we have not sufficiently explained our case. The first point refers to gain control as elucidated by the alpha inhibition hypothesis, which claims that increases in alpha disengage an entire cortical area. Since we have confirmed through source analysis that the alpha increase in our data originates from primary visual cortex, this should lead to decreased visual processing. The increase in 36 Hz visual processing therefore directly contradicts the alpha inhibition hypothesis. We propose an alternative explanation for the functionality of alpha activity in this task. Through pulsed inhibition, information packages of relevant visual information could be transmitted down the processing stream, thereby enhancing relevant visual signal transmission. We argue that the fact that the enhanced visual 36 Hz signal correlated with visual alpha power on a trial-by-trial basis, and originated not from primary visual cortex but from areas known for sensory integration, supports our claim.

      We have now tried to make this point clearer by rephrasing our manuscript. Additionally, we have also now further clarified this point in our discussion.

      Line 527: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity covaries over trials with SSEP magnitude in higher order sensory areas. If alpha activity exerted gain control in early visual regions, increased alpha activity would have to lead to a decrease in SSEP responses. In contrast, we observe that increased alpha activity originating from early visual cortex is related to enhanced visual processing. Source localization confirmed that this enhancement was not originating from early visual areas, but from areas associated with later stages of the processing stream such as the precuneus, which has been connected to sensory integration (Al-Ramadhani et al., 2021; Xie et al., 2019). While we cannot completely rule out alternative explanations, it seems plausible to assume that inhibition of other task-irrelevant communication pathways leads to prioritised and thereby enhanced processing over relevant pathways. In line with previous literature (Morrow et al., 2023; Peylo et al., 2021; Zhigalov & Jensen, 2020b), we therefore suggest that alpha activity limits task-irrelevant feedforward communication, thereby enhancing processing capabilities in relevant downstream areas (see Fig. 1A).

      Reviewer #1 (Recommendations for the authors): Minor Concerns:

      (1) I suggest adding more details about the task in the Results and/or Figure 1 legend. Specifically, when describing the task, I think it would help the readers if the authors specified what the participants had to do to get a trial correct (e.g., press left / down / right arrow if the tone pitch was low (500Hz) / medium (1000Hz) / high (2000Hz).)

      (2) Please clarify whether the Gabor patch was drifting.

      (3) Figure 2C-D: I suggest clarifying in the X-tick labels that + and - trials are in separate blocks (e.g., put 'Block1 visual-' instead of 'visual-').

      We followed the suggestions of the reviewer detailed in points 1-3, which indeed greatly improve the clarity and readability of these parts.

      (4) "Interestingly, auditory distractors reduced reaction times to visual targets, which could be explained by a generally faster processing of auditory targets (Jain et al., 2015), possibly probing faster responses in visual tasks (Naue et al., 2011)." - Please elaborate on how faster processing of auditory targets could lead to the probing of faster responses in visual tasks. Further, if I understand correctly, this should result in a speed-accuracy trade-off, which is not observed in the MEG experiments. If there is a learning effect due to the blocked structure in the MEG experiments, why is it not observed on auditory trials?

      We thank the reviewer for suggesting clarifying this paragraph. We have now rephrased this part and added additional information.

      Concerning the reviewer’s theory, intersensory facilitation can occur in the absence of a speed-accuracy trade-off, as it can affect the motor execution after a decision has been made. Nevertheless, learning effects could also have led to this result in the MEG experiment. Our difficulty calibration did not lead to comparable accuracies in block 1, where auditory targets were now less difficult than visual targets. With the addition of distractors in block 2, accuracy for auditory targets decreased, while it increased for visual targets. Indeed, one interpretation could be that there was a learning effect for visual targets, which was not prevalent for auditory targets. However, the speed increase when visual targets are coupled with auditory distractors is prevalent in both experiments. Accordingly, we find the intersensory facilitation account more likely.

      line 148: Interestingly, auditory distractors reduced reaction times to visual targets, which could be explained by a generally faster processing of auditory targets (Jain et al., 2015). As such, the auditory distractor possibly caused intersensory facilitation (Nickerson., 1973), whereby reaction times to a target can be facilitated when accompanied by stimuli of other sensory modalities, even if they are irrelevant or distracting.

      (5) Please briefly describe the cluster permutation analysis in the results section.

      We have now added a brief description of the cluster permutation analysis we performed in the results section.

      Line 166: We then applied cluster permutation analysis, whereby real condition differences were tested against coincidental findings by randomly permuting the condition labels of the data and testing for condition differences 1000 times (Maris & Oostenveld, 2007).
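      The idea behind this test can be sketched in a simplified, one-dimensional form. The code below is an illustration under stated assumptions, not the authors' implementation: it treats the data as paired (one time course per participant and condition), permutes condition labels within participants by randomly flipping the sign of each participant's difference, and uses a hypothetical cluster-forming threshold of |t| > 2. The function names and the toy data are invented for the example.

```python
import math
import random


def paired_t(diffs):
    """One-sample t-value of per-participant condition differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return 0.0  # degenerate case, treated as "no effect" in this sketch
    return mean / math.sqrt(var / n)


def max_cluster_mass(diffs, t_threshold):
    """Largest summed |t| over a run of adjacent, same-sign supra-threshold
    time points. `diffs` is a participants x time list of lists."""
    best, run, prev_sign = 0.0, 0.0, 0
    for t_idx in range(len(diffs[0])):
        t = paired_t([row[t_idx] for row in diffs])
        sign = 1 if t > t_threshold else -1 if t < -t_threshold else 0
        if sign != 0 and sign == prev_sign:
            run += abs(t)      # extend the current cluster
        elif sign != 0:
            run = abs(t)       # start a new cluster
        else:
            run = 0.0          # below threshold: no cluster
        prev_sign = sign
        best = max(best, run)
    return best


def cluster_permutation_p(cond_a, cond_b, t_threshold=2.0, n_perm=1000, seed=0):
    """Return the observed max cluster mass and its permutation p-value."""
    rng = random.Random(seed)
    diffs = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cond_a, cond_b)]
    observed = max_cluster_mass(diffs, t_threshold)
    exceed = 0
    for _ in range(n_perm):
        # Swapping the two labels within a participant flips the sign
        # of that participant's difference time course.
        flipped = []
        for row in diffs:
            s = rng.choice((1, -1))
            flipped.append([s * d for d in row])
        if max_cluster_mass(flipped, t_threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: 12 participants, 10 time points, an effect at time points 3-6.
data_rng = random.Random(1)
cond_b = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(12)]
cond_a = [
    [b + data_rng.gauss(0, 0.5) + (2.0 if 3 <= t <= 6 else 0.0)
     for t, b in enumerate(row)]
    for row in cond_b
]
observed, p_value = cluster_permutation_p(cond_a, cond_b)
print(observed, p_value)
```

      Comparing each permutation's largest cluster with the observed one is what controls for multiple comparisons across time points; a full analysis would additionally cluster over sensors and frequencies.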

      (6) Figure 4A legend: "auditory steady-state evoked potential (ASSEP) averaged over 6 central electrodes displaying the highest 40 Hz power (Fz, FC1, FC2, F11, F2, FCz)." - I suggest marking these 6 electrodes in the scalp map on the figure panel.

      We have followed the suggestion of the reviewer and marked the electrodes/sensors used to illustrate the steady-state responses.

      (7) Lines 281-283: "It was highly significant for the visual 36 Hz response (Fig. 5A, middle columns, p = .033; t(19) = 2.29; BF(10) = 1.91) but did not reach significance for the visual 40 Hz response (Fig. 5B, middle column; p = 0.20; t(19) = 1.32; BF(10) = 0.49)." - Was "visual 40Hz response" a typo? I believe 40Hz pertains to auditory, not visual?

      We thank the reviewer for pointing out this error and agree that the phrasing was sometimes confusing. We have now used the terms VSSEP and ASSEP to make things clearer throughout the manuscript.

      L. 224-229: The median split was highly significant for the 36 Hz VSSEP response (Fig. 5A, middle columns, p = .033; t<sub>(19)</sub> = 2.29; BF<sub>(10)</sub> = 1.91) but did not reach significance for the 40 Hz ASSEP response (Fig. 5B, middle column; p = 0.20; t<sub>(19)</sub> = 1.32; BF<sub>(10)</sub> = 0.49).

      Reviewer #2 (Public review):

      Brickwedde et al. investigate the role of alpha oscillations in allocating intermodal attention. A first EEG study is followed up with an MEG study that largely replicates the pattern of results (with small to be expected differences). They conclude that a brief increase in the amplitude of auditory and visual stimulus-driven continuous (steady-state) brain responses prior to the presentation of an auditory - but not visual - target speaks to the modulating role of alpha that leads them to revise a prevalent model of gating-by-inhibition.

      Overall, this is an interesting study on a timely question, conducted with methods and analysis that are state-of-the-art. I am particularly impressed by the authors' decision to replicate the earlier EEG experiment in MEG following the reviewer's comments on the original submission. Evidently, great care was taken to accommodate the reviewers' suggestions.

      We thank the reviewer for the positive feedback and expression of interest in the topic of our manuscript.

      Nevertheless, I am struggling with the report for two main reasons: It is difficult to follow the rationale of the study, due to structural issues with the narrative and missing information or justifications for design and analysis decisions, and I am not convinced that the evidence is strong, or even relevant enough for revising the mentioned alpha inhibition theory. Both points are detailed further below.

      We have now revised major parts of the introduction and results in line with the reviewer’s suggestions, hoping that our rationale is now easier to follow and that our evidence will now be more convincing. We have separated our results section into the first study (EEG) and the second study (MEG), to enhance the rationale of our design choices and readability. We have clarified all mentioned ambiguous parts in our methods section. Additionally, we have revised the introduction to now explain more clearly what results to expect under the alpha inhibition theory in contrast to our alternative account.

      Strength/relevance of evidence for model revision: The main argument rests on 1) a rather sustained alpha effect following the modality cue, 2) a rather transient effect on steady-state responses just before the expected presentation of a stimulus, and 3) a correlation between those two. Wouldn't the authors expect a sustained effect on sensory processing, as measured by steady-state amplitude irrespective of which of the scenarios described in Figure 1A (original vs revised alpha inhibition theory) applies? Also, doesn't this speak to the role of expectation effects due to consistent stimulus timing? An alternative explanation for the results may look like this: Modality-general increased steady-state responses prior to the expected audio stimulus onset are due to increased attention/vigilance. This effect may be exclusive (or more pronounced) in the attend-audio condition due to higher precision in temporal processing in the auditory sense or, vice versa, too smeared in time due to the inferior temporal resolution of visual processing for the attend-vision condition to be picked up consistently. As expectation effects will build up over the course of the experiment, i.e., while the participant is learning about the consistent stimulus timing, the correlation with alpha power may then be explained by a similar but potentially unrelated increase in alpha power over time.

      We thank the reviewer for raising these insightful questions and suggestions.

      It is true that our argument rests on a rather sustained alpha effect, a rather transient effect on steady-state responses, and a correlation between the two. However, this connection would not be expected under the alpha inhibition hypothesis, which states that alpha activity would inhibit a whole cortical area (when irrelevant to the task), exerting “gain control”. This notion directly contradicts our results of the “irrelevant” visual information a) being transmitted at all and b) increasing.

      However, it has been shown in various reports (see for instance Dugué et al., 2011; Haegens et al., 2011; Spaak et al., 2012) that alpha activity exerts pulsed inhibition, so we proposed an alternative theory of an involvement in signal transmission. In this case, the cyclic inhibition would serve as an ordering system, which only allows high-priority information to pass, resulting in a higher signal-to-noise ratio. We do not make a claim about how fast or when these signals are transmitted in relation to alpha power. For instance, it could be that alpha power increases as a preparatory state even before the signal is actually transmitted. Zhigalov and Jensen (2020, Hum. Brain Mapp.) have shown that in V1, frequency-tagging responses were up- and down-regulated with attention, independent of alpha activity.

      However, we do believe that the trial-by-trial correlation between visual alpha power and visual 36 Hz frequency-tagging increases (see Fig. 5 and 10 in our manuscript), a relationship which has not been found in V1 by us and others (see Suppl. Fig. 12 and Zhigalov & Jensen, 2020, Hum. Brain Mapp.), suggests a strong connection. Furthermore, the fact that the alpha modulation originates from early visual areas and occurs prior to any frequency-tagging changes, while the increase in frequency-tagging can be observed in areas which are later in the processing stream (such as the precuneus), is strongly indicative of an involvement of alpha power in the transmission of this signal. We cannot fully exclude alternative accounts and mechanisms which affect both alpha power and frequency-tagging responses.

      The alternative account described by the reviewer does not contradict our theory, as we argue that the alpha power modulation reflects an expectation effect (and the idea that it could be related to the resolution of auditory versus visual processing is very interesting!). It is also possible that this expectation is, as the reviewer suggests, related to attention/vigilance and might result in a modality-general signal increase. By way of support, we observed an increase in the frequency-tagging response in sensory integration areas. Accordingly, we argue that the alternative explanation provided by the reviewer contradicts the alpha inhibition hypothesis, but not necessarily our alternative theory.

      We have now revised the discussion and are confident our case is now stronger and easier to follow. Additionally, we mentioned the possibility of alternative explanations as well as the possibility that alpha networks fulfil different roles in different locations/task environments.

      Line 523: Here we propose that alpha activity, rather than modulating early primary sensory processing, exhibits its inhibitory effects at later stages of the processing stream (Antonov et al., 2020; Gundlach et al., 2020; Zhigalov & Jensen, 2020a; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021). Our data provides evidence in favour of this view, as we can show that early sensory alpha activity covaries over trials with SSEP magnitude in higher order sensory areas. If alpha activity exerted gain control in early visual regions, increased alpha activity would have to lead to a decrease in SSEP responses. In contrast, we observe that increased alpha activity originating from early visual cortex is related to enhanced visual processing. Source localization confirmed that this enhancement was not originating from early visual areas, but from areas associated with later stages of the processing stream such as the precuneus, which has been connected to sensory integration (Al-Ramadhani et al., 2021; Xie et al., 2019). While we cannot completely rule out alternative explanations, it seems plausible to assume that inhibition of other task-irrelevant communication pathways leads to prioritised and thereby enhanced processing over relevant pathways. In line with previous literature (Morrow et al., 2023; Peylo et al., 2021; Zhigalov & Jensen, 2020b), we therefore suggest that alpha activity limits task-irrelevant feedforward communication, thereby enhancing processing capabilities in relevant downstream areas (see Fig. 1A).

      References:

      Dugué, L., Marque, P., & VanRullen, R. (2011). The phase of ongoing oscillations mediates the causal relation between brain excitation and visual perception. Journal of Neuroscience, 31(33), 11889–11893. https://doi.org/10.1523/JNEUROSCI.1161-11.2011

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences, 108(48), 19377–19382. https://doi.org/10.1073/PNAS.1117190108

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-Specific Entrainment of Gamma-Band Neural Activity by the Alpha Rhythm in Monkey Visual Cortex. Current Biology, 22(24), 2313–2318. https://doi.org/10.1016/J.CUB.2012.10.020

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human Brain Mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Structural issues with the narrative and missing information: Here, I am mostly concerned with how this makes the research difficult to access for the reader. I list some major points, followed by more specific points below:

      In the introduction the authors pit the original idea about alpha's role in gating against some recent contradictory results. If it's the aim of the study to provide evidence for either/or, predictions for the results from each perspective are missing. Also, it remains unclear how this relates to the distinction between original vs revised alpha inhibition theory (Fig. 1A). Relatedly, if this revision is an outcome rather than a postulation for this study, it shouldn't be featured in the first figure.

      We agree with the reviewer that we have not sufficiently clarified our goal as well as how different functionalities of alpha oscillations would lead to different outcomes. We have revised the introduction and restructured the results part and hope that it is now easier to follow. The results part now follows study 1 (EEG) and study 2 (MEG) chronologically, so that results can more easily be differentiated and our design choices for the second study can be explained better.

      Line 50: Recent evidence challenged a direct connection between alpha activity and visual information processing in early visual cortex. As such, both visual steady-state responses and alpha power were modulated by attention, but did not covary when investigating individual trials (Zhigalov & Jensen, 2020). Unfortunately, very few studies have investigated direct connections between alpha activity, attention and sensory signals, especially over trials. Furthermore, results seem to depend on timing of alpha activity in relation to sensory responses as well as stimulus type and outcome measure (Morrow et al., 2023).

      Accordingly, the objective of the current study is to test the alpha inhibition hypothesis compared to an alternative theory. Based on the alpha inhibition hypothesis, alpha modulation is connected to ‘gain control’ in early visual areas through modulation of excitability (Foxe & Snyder, 2011; Jensen & Mazaheri, 2010; Van Diepen et al., 2019).  In contrast, we propose that inhibitory effects of alpha modulation are exhibited at later stages of the processing stream (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020a; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1B; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      Line 80: The aim of our study was to directly test the alpha inhibition hypothesis by investigating if cue-induced modulation of alpha activity coincides with the suppression of frequency-tagging responses in task-irrelevant modalities.

      Line 99: In brief, while we observed the expected cue-induced early-visual alpha modulation, the amplitude of auditory and visual SSEP/SSEFs as well as their intermodulation frequency increased just prior to the onset of the auditory target, contradicting the alpha inhibition hypothesis. The difference between conditions of visual SSEP/SSEFs originated from sensory integration areas and correlated with early sensory alpha activity on a trial-by-trial basis, speaking to an effect of alpha modulation on signal transmission rather than inhibition of early visual areas.

      The analysis of the intermodulation frequency makes a surprise entrance at the end of the Results section without an introduction as to its relevance for the study. This is provided only in the discussion, but with reference to multisensory integration, whereas the main focus of the study is focussed attention on one sense. (Relatedly, the reference to "theta oscillations" in this section seems unclear without a reference to the overlapping frequency range, and potentially more explanation.) Overall, if there's no immediate relevance to this analysis, I would suggest removing it.

      We thank the reviewer for pointing this out and have now added information about this frequency to the introduction. We believe that the intermodulation frequency analysis is important, as it potentially supports the notion that condition differences in the visual-frequency tagging response are related to downstream processing rather than overall visual information processing in V1. We would therefore prefer to leave this analysis in the manuscript.

      Line 75: Furthermore, when applying two different frequencies for two different sensory modalities, their intermodulation frequency (f1-f2) has been suggested to reflect cross-modal integration (Drijvers et al., 2021). Due to distinct responses, localisation and attention-dependence, frequency-tagging provides an optimal tool to study sensory signal processing and integration over time.
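
      To illustrate what an intermodulation frequency is, the following minimal sketch uses the 36 Hz (visual) and 40 Hz (auditory) tagging frequencies named in the text. It is not the authors' analysis: the sampling rate, the one-second window, and the use of plain multiplication as a stand-in for a nonlinear interaction are all illustrative assumptions. The point is only that a component at f1-f2 appears when two tagged signals interact nonlinearly, and not when they are merely added.

```python
import cmath
import math

FS = 500                    # sampling rate in Hz (illustrative assumption)
N = 500                     # one second of samples, so integer frequencies are exact DFT bins
F_VIS, F_AUD = 36.0, 40.0   # tagging frequencies named in the text

def amplitude_at(signal, freq, fs=FS):
    """Amplitude of the sinusoidal component of `signal` at `freq` (single-bin DFT)."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
                for k, x in enumerate(signal))
    return 2 * abs(coeff) / n

times = [k / FS for k in range(N)]
vis = [math.sin(2 * math.pi * F_VIS * t) for t in times]
aud = [math.sin(2 * math.pi * F_AUD * t) for t in times]

# Linear superposition: the two signals coexist but do not interact.
linear_sum = [v + a for v, a in zip(vis, aud)]
# Toy nonlinearity (assumption): multiplication creates sum and difference frequencies.
interaction = [v * a for v, a in zip(vis, aud)]

f_im = abs(F_AUD - F_VIS)   # intermodulation frequency f1 - f2
amp_sum = amplitude_at(linear_sum, f_im)
amp_int = amplitude_at(interaction, f_im)

print(f"intermodulation frequency: {f_im} Hz")
print(f"amplitude at {f_im} Hz, linear sum:  {amp_sum:.3f}")
print(f"amplitude at {f_im} Hz, interaction: {amp_int:.3f}")
```

      With these tagging frequencies the difference term lies at 4 Hz, the lower edge of the conventional theta band (roughly 4 to 8 Hz), which may be the overlap the reviewer alludes to.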

      Reviewer #2 (Recommendations for the authors):

      As detailed in several points below, I found that I didn't get the information I needed to fully understand design/analysis decisions. In some cases, this may just be a case of re-organising the manuscript, in others crucial info should be added:

      Specific issues:

      Page 2, line 51: How does recent evidence contradict this? Please explain.

      We have added a section that describes the results contradicting the alpha inhibition hypothesis.

      Line 50: Recent evidence challenged a direct connection between alpha activity and visual information processing in early visual cortex. As such, both visual steady-state responses and alpha power were modulated by attention, but did not covary when investigating individual trials (Zhigalov & Jensen, 2020).

      Page 3, line 78-80: "... also interested in relationships [...] on a trial-by-trial basis" - why? Please motivate.

      We thank the reviewer for highlighting this section, which we feel was not very well phrased. We have rewritten this whole paragraph and hope that our motivation for this study is now clear.

      Line 50: Recent evidence challenged a direct connection between alpha activity and visual information processing in early visual cortex. As such, both visual steady-state responses and alpha power were modulated by attention, but did not covary when investigating individual trials (Zhigalov & Jensen, 2020). Unfortunately, very few studies have investigated direct connections between alpha activity, attention and sensory signals, especially over trials. Furthermore, results seem to depend on timing of alpha activity in relation to sensory responses as well as stimulus type and outcome measure (Morrow et al., 2023).

      Page 4, line 88-92: "... implementing a blocked design" - unclear why? This is explained to some extent in the next few lines but remains unclear without knowing outcomes of the EEG experiment with more detail. Overall, it seems like this methodological detail may be better suited for a narrative in the Results section, that follows a more chronological order from the findings of the EEG experiment to the design of the MEG study.

      More generally, and maybe I missed it, I couldn't find a full account of why a block design was chosen and what the added value was. I believe that re-organising the Results section would allow precisely stating how that was an improvement over the EEG experiment.

      In line with the reviewer’s suggestion, we have now restructured the results section. The first section of the study 2 results now explains our design choices with direct reference to the results of the EEG experiment.

      Line 298: To test the robustness of our results and to employ additional control analyses, we replicated our experiment using MEG (see Fig. 7A). While an increase in visual information processing parallel to an increase in alpha modulation already contradicts the notion of alpha inhibition exerting “gain control”, affecting the whole visual cortex, our claim that alpha modulation instead affects visual information at later processing stages still required further validation. As such, our goal was to perform source analyses showing that alpha modulation originating from primary visual areas affected visual information at later processing stages (e.g. not in primary visual cortex). Additionally, to exclude that the uncertainty over possible distractors affected our results, we employed a block design, where block 1 consisted only of trials without distractors and in block 2 targets were always accompanied by a distractor. Furthermore, we aligned the visual and auditory task to be more similar, both of them now featuring frequency-discrimination, which related to sound pitch (frequency) in the auditory condition and stripe-frequency of the Gabor patch in the visual condition. Lastly, to make sure our effects were driven by sensory modality-differences rather than task-difficulty differences, we included a short calibration phase. Prior to the experiment, the difficulty of pitch sounds and Gabor patch frequency was calibrated for each individual, ascertaining a success rate between 55% and 75%.

      The point above also applies to lines 95-97 where it's unclear what "aligning the visual with the auditory task" means. Also, what would be the predictions for "more nuanced interactions [...]"

      We agree that this phrasing was more than confusing and in the process of restructuring our results section, we have now revised this passage (see cited text from our manuscript to the point just above).

      Page 9, line 207-209: One of the few mentions of the "ambivalent" condition (attention to audio+vision?). To what end was that condition added to the experiment originally? The explanation that this condition was dropped from analysis because it did not show significant results does not seem methodologically sound.

      We thank the reviewer for pointing this out, as we had changed the name from ambivalent to non-specific, but this word had slipped our attention. The condition was added to the experiment as a control, which enables us to verify that our cues as well as our distractors work as intended. While interesting to analyse (and we did not drop it completely, the condition comparisons are in the supplementary material), we felt that further analysis of this condition would not contribute to addressing our research question. To be specific, the prerequisite to analysing the effect of alpha modulation is a significant effect of alpha modulation in the first place. We have now clarified the rationale for this condition, as well as our reasoning for omitting it from correlation and source analysis.

      Line 173: When presenting unspecified cues, alpha power changes were not significant, but descriptively larger compared to visual target conditions and lower compared to auditory target conditions (see suppl Fig. 2). However, as significant alpha modulation was a prerequisite to test our hypotheses, we excluded this condition from further analysis.

      Page 9, line 209-212: "condition differences in alpha were only significant in block 2 [...] therefore we performed the [...] analysis [...] only for the second half of the experiment." This sounds like double-dipping. Maybe just an issue of phrasing?

      We thank the reviewer for pointing out that it may appear like ‘double dipping’. The reasoning was the same as the point above, we require a significant alpha modulation to test the effect of alpha modulation on further processing. We have revised this part to be clearer.

      Line 345: In line with previous studies (van Diepen & Mazaheri, 2017), condition differences in alpha activity were only significant in block 2, where distractors were present. As alpha modulation was a prerequisite to test our hypotheses, we performed the following analyses solely with data from block 2 (see Fig. 8).

      Page 12, line 281: Bayes factors are used here (and elsewhere), in addition to NHST. May be worthwhile to mention that briefly before use and give an intro sentence on its use, value and interpretation, and why these are added sometimes but not for all tests reported.

      We agree that we did not introduce this at all and have now added a section, which explains the inclusion as well as the interpretation of the Bayes factor.

      Line 218: To estimate the robustness of these results, we additionally conducted median split analyses between trials with high and low alpha power for each participant, as well as averaged the correlation coefficient of each participant and calculated a one-sample t-test against 0. For each analysis we provided the Bayes Factor, which estimates the strength of support for or against the null hypothesis (BF > 3.2 is considered as substantial evidence and BF > 10 is considered as strong evidence; Kass & Raftery, 1995).
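
      The robustness checks described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy data layout (per-participant lists of single-trial alpha power and SSEP amplitude), and the "below substantial" label are hypothetical. Only the thresholds of 3.2 and 10 are taken from the text; applying them to the inverse Bayes factor for evidence favouring the null is an assumption of this sketch. The Bayes factor itself is not computed here.

```python
import math
import statistics

def median_split_difference(alpha, ssep):
    """Mean SSEP on high-alpha trials minus mean SSEP on low-alpha trials.

    Trials are split at the participant's median alpha power; trials equal
    to the median are assigned to the low group (an arbitrary choice here).
    """
    cutoff = statistics.median(alpha)
    high = [s for a, s in zip(alpha, ssep) if a > cutoff]
    low = [s for a, s in zip(alpha, ssep) if a <= cutoff]
    return statistics.mean(high) - statistics.mean(low)

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - popmean) / standard_error, n - 1

def label_bayes_factor(bf10):
    """Label a Bayes factor with the thresholds quoted in the text (3.2, 10)."""
    favoured = "alternative" if bf10 >= 1 else "null"
    bf = max(bf10, 1 / bf10)   # strength of evidence for the favoured side
    if bf > 10:
        return "strong", favoured
    if bf > 3.2:
        return "substantial", favoured
    return "below substantial", favoured

# Hypothetical toy data: participant -> (alpha power per trial, SSEP per trial).
participants = {
    "p01": ([1.0, 2.0, 3.0, 4.0], [1.0, 1.2, 1.5, 1.7]),
    "p02": ([2.0, 1.0, 4.0, 3.0], [0.8, 0.9, 1.1, 1.3]),
    "p03": ([3.0, 4.0, 1.0, 2.0], [1.4, 1.3, 1.0, 1.2]),
}

split_effects = [median_split_difference(a, s) for a, s in participants.values()]
correlations = [pearson_r(a, s) for a, s in participants.values()]

t_split, df_split = one_sample_t(split_effects)   # median-split effect vs 0
t_corr, df_corr = one_sample_t(correlations)      # mean correlation vs 0
print(f"median split: t({df_split}) = {t_split:.2f}")
print(f"correlation:  t({df_corr}) = {t_corr:.2f}")
print(label_bayes_factor(1.91), label_bayes_factor(0.49))
```

      Read against the thresholds above, both Bayes factors reported for the median split (1.91 and 0.49) fall short of 3.2 in either direction.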

      Throughout the Results section, it's not always clear which results are from the EEG or from the MEG study. Adopting the recommendation in point c) may help with that.

      According to the reviewer’s recommendation, we have restructured our results section and first present the EEG study and afterwards the MEG study.

      Similarly, it seems pivotal to add "visual" and "auditory" when mentioning the 36/40-Hz steady-state responses (or stimulation) to help the reader.

      We agree that visual/auditory 36 Hz / 40 Hz frequency-tagging responses, expecting visual/auditory target becomes lengthy and confusing very quickly. We therefore decided to introduce the abbreviation of visual steady-state evoked potentials/fields (VSSEP/VSSEF) and auditory steady-state evoked potentials/fields (ASSEP/ASSEF).

      Figure 5 - showing the same cluster as "early" and "late" in the margin for the MEG data is potentially confusing.

      We thank the reviewer for pointing this out and have now adapted the figure to just show one cluster, as we only found this one cluster in our MEG analysis.

      Reviewer #3 (Public review):

      This paper seems very strong, particularly given that the follow-up MEG study both (a) clarifies the task design and separates the effect of distractor stimuli into other experimental blocks, and (b) provides source-localization data to more concretely address whether alpha inhibition is occurring at or after the level of sensory processing, and (c) replicates most of the EEG study's key findings.

      We thank the reviewer for their positive feedback and evaluation of our work.

      There are some points that would be helpful to address to bolster the paper. First, the introduction would benefit from a somewhat deeper review of the literature, not just reviewing when the effects of alpha seem to occur, but also addressing how the effect can change depending on task and stimulus design (see review by Morrow, Elias & Samaha, 2023).

      We thank the reviewer for this suggestion and agree. We have now added a paragraph to the introduction that refers to missing correlation studies and the impact of task design.

      Line 53: Unfortunately, very few studies have investigated direct connections between alpha activity, attention and sensory signals, especially over trials. Furthermore, results seem to depend on timing of alpha activity in relation to sensory responses as well as stimulus type and outcome measure (Morrow et al., 2023).

      Additionally, the discussion could benefit from more cautionary language around the revision of the alpha inhibition account. For example, it would be helpful to address some of the possible discrepancies between alpha and SSEP measures in terms of temporal specificity, SNR, etc. (see Peylo, Hilla, & Sauseng, 2021). The authors do a good job speculating as to why they found differing results from previous cross-modal attention studies, but I'm also curious whether the authors think that alpha inhibition/modulation of sensory signals would have been different had the distractors been within the same modality or whether the cues indicated target location, rather than just modality, as has been the case in so much prior work?

      We thank the reviewer for suggesting these interesting discussion points and have included a paragraph in our discussion that clarifies these issues.

      Line 543: It should be noted, the comparison between modulation in alpha activity and in SSEP/SSEFs is difficult, especially concerning timing. This is largely owed to differences in signal-to-noise due to trial averaging in the frequency versus the time domain and temporal and frequency lag in the estimation of alpha activity (Peylo et al., 2021). It is further noteworthy, that the majority of evidence for the alpha inhibition hypothesis focused on the effect of pre-target alpha modulation on behaviour and target-related potentials (Morrow et al., 2023). However, in our data alpha modulation occurs clearly ahead of SSVEP/SSVEF modulation on a scale that could not be simply explained by temporal or frequency smearing. Additionally, significant trial-by-trial correlations, which occur in the frequency domain for both signal types, underline the strong relationship between both measurements.

      Interestingly, we could show that the magnitude of the correlation between alpha power and visual information processing varied between conditions, suggesting a dynamic and adaptive regime. This notion supports the view that alpha oscillations represent a mechanism rather than a specific function, which can fulfil different roles depending on task demand and network location, which has been confirmed in a recent study revealing functionally distinct alpha networks (Clausner et al., 2024). As such, it is conceivable that alpha oscillations can in some cases inhibit local processing, while in other cases, depending on network location, connectivity and demand, alpha oscillation can facilitate signal transmission. In different contexts, utilizing unimodal targets and distractors, spatial cueing, or covert attention, different functional processes could be involved (Morrow et al., 2023). Future research should intensify efforts to disentangle these effects, investigating localized alpha networks intracranially or through combinations of fMRI, EEG and MEG, to clearly measure their effects on sensory processing and behaviour.

      Overall, the analyses and discussion are quite comprehensive, and I believe this paper to be an excellent contribution to the alpha-inhibition literature.

      Reviewer #3 (Recommendations for the authors):

      Overall, the paper is well-written, and the analyses and interpretations are strong. I think that the end of the introduction would feel more complete and read more easily if you outlined all of your main hypotheses (not just trials signaling an auditory stimulus, but visual trials too, and what about distractor trials? This could help justify changes to task design in the MEG study), and then the key findings that motivated the follow-up design, which you then discuss (as opposed to introducing a new aim in this paragraph).

      We thank the reviewer for this positive evaluation. Based on feedback and suggestions from all reviewers, we have revised the structure of the manuscript. The introduction now states more clearly which results would be expected under the alpha inhibition theory and how our results contradict this. The results section has now been divided into two studies, which will make the rationale for our follow-up design easier to follow.

      Line 80: The aim of our study was to directly test the alpha inhibition hypothesis by investigating if cue-induced modulation of alpha activity coincides with the suppression of frequency-tagging responses in task-irrelevant modalities.

      Line 96: In brief, while we observed the expected cue-induced early-visual alpha modulation, the amplitude of auditory and visual SSEP/SSEFs as well as their intermodulation frequency increased just prior to the onset of the auditory target, contradicting the alpha inhibition hypothesis. The difference between conditions of visual SSEP/SSEFs originated from sensory integration areas and correlated with early sensory alpha activity on a trial-by-trial basis, speaking to an effect of alpha modulation on signal transmission rather than inhibition of early visual areas.

      Minor issues:

      L84 - "is" should be "was"

      L93 - "allows" should be "allowed"

      L113 - I think "changed" would suffice

      Fig 1A (text within figure on top) - "erea" should be "area" and caption title should include "of" (Illustration of the...)

      L213 - time window could be clarified

      Fig 4 -captions inconsistently capitalize words and use ) and , following the caption letters

      L253-255 - given you are looking at condition differences, do you mean the response was larger before an auditory target than before a visual target? It currently reads as if you mean that it was larger in that window right before the target as opposed to other time windows

      L368 - "behaviorally" should be "behavioral"

      L407-408 - I think auditory SSEP/SSVEFs should be auditory or visual SSEP/SSEFs, unless you are specifically only talking about auditory SSEPs and visual SSEFs

      L411 - also uses SSVEFs

      L413 - "frequently, or in the case of..."

      L555 - "predicting" should be predicted? Or do you mean only cues that correctly predicted the target?

      We are very grateful to the reviewer for pointing out these mistakes, all of which we have remedied in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point responses to the reviewers' comments:

      All three reviewers found our analysis of focal adhesion-associated oncogenic pathways (Figs 3 and S3) to be inconsistent (Reviewer 1), not convincing/consistent (Reviewer 2, #2), and too variable and not well supported (Reviewer 3, #2). This was probably the basis for the eLife assessment, which stated: “However, the study is incomplete because the downstream molecular activities of PLECTIN that mediate the cancer phenotypes were not fully evaluated.” We agree with the reviewers that the degree of attenuation of the FAK, MAP/Erk, and PI3K/AKT signaling pathways differs depending on the cell line used (Huh7 and SNU-475) and the mode of inactivation (CRISPR/Cas9-generated plectin KO, functional KO (∆IFBD), and organoruthenium-based inhibitor plecstatin-1). However, we do not share the reviewers' skepticism about the unconvincing nature of the data presented.

      Several previous studies have shown that plectin inactivation invariably leads to dysregulation of cell adhesions and associated signaling pathways in various cell systems. The molecular mechanisms driving these changes are not fully understood, but the most convincingly supported scenarios are uncoupling of keratin filaments (hemidesmosomes; (Koster et al., 2004)) and vimentin filaments (focal adhesions; (Burgstaller et al., 2010; Gregor et al., 2014)) from adhesion sites in conjunction with altered actomyosin contractility (Osmanagic-Myers et al., 2015; Prechova et al., 2022; Wang et al., 2020). This results in altered morphometry (Wang et al., 2020), dynamics (Gregor et al., 2014), and adhesion strength (Bonakdar et al., 2015) of adhesions. These changes are accompanied by reduced mechanotransduction capacity and attenuation of downstream signaling such as FAK, Src, Erk1/2, and p38 in dermal fibroblasts (Gregor et al., 2014); decrease in pFAK, pSrc, and pPI3K levels in prostate cancer cells (Wenta et al., 2022); increase in pErk and pSrc in keratinocytes (Osmanagic-Myers et al., 2006); decrease in pERK1/2 in HCC cells (Xu et al., 2022) and head and neck squamous carcinoma cells (Katada et al., 2012).  

      Consistent with these published findings, we show that upon plectin inactivation, the HCC cell line SNU-475 exhibits aberrant cytoskeletal organization (vimentin and actin; Figs 4A-D, S4A-F), altered number, topography and morphometry of focal adhesions (Figs 4A, E-G, S4H,I), and ineffective transmission of traction forces (Fig 4H,I). Similar, although not quantified, phenotypes are present in Huh7 with inactivated plectin (data not shown). It is worth noting that even robust cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (%central FA, Fig 4A,E) phenotypes differ significantly between different modes of plectin inactivation and would certainly do so if compared between cell lines. These phenotypes are heterogeneous but not inconsistent. Interestingly, both SNU-475 and Huh7 plectin-inactivated cells show similar functional consequences, such as a prominent decrease in migration speed (Fig 5B). This suggests that while specific aspects of cytoarchitecture are differentially affected in different cell lines, the functional consequences of plectin inactivation are shared between HCC cell lines.

      It is therefore not surprising that the activation status of downstream effectors, resulting from different degrees of cytoskeletal and focal adhesion reconfiguration, is not identical (or even comparable) between cell lines and treatment conditions. Furthermore, we compare highly epithelial (keratin- and almost no vimentin-expressing) Huh7 cells with highly dedifferentiated (low keratin- and high vimentin-expressing) SNU-475 cells, which differ significantly in their cytoskeleton, adhesions, and signaling networks. Alternative approaches to plectin inactivation are not expected to result in the same degree of dysregulation of specific signaling pathways. Effects of adaptation (CRISPR/Cas9-generated KOs and ∆IFBDs), engagement of different binding domains (CRISPR/Cas9-generated ∆IFBDs), and pleiotropic modes of action (plecstatin-1) are expected.

      In our study, we provide the reader with an unprecedentedly complex comparison of adhesion-associated signaling between WT and plectin-inactivated HCC cell lines. First, we compared the proteomes of WT, KO and PST-treated WT SNU-475 cells using MS-based shotgun proteomics and phosphoproteomics (Fig 3A-C). Second, we extensively and quantitatively immunoblotted the major molecular denominators of MS-identified dysregulated pathways (such as “FAK signaling”, “ILK signaling”, and “Integrin signaling”) with the following results. Data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 1.

      In addition, we show dysregulated expression (mostly downregulation) of focal adhesion constituents ITGβ1 and αv, talin, vinculin, and paxillin, which nicely complements the fewer and larger focal adhesions in plectin-inactivated HCC cells. In light of these results, we believe that our statement that “Although these alterations were not found systematically in both cell lines and conditions (reflecting thus presumably their distinct differentiation grade and plectin inactivation efficacy), collectively these data confirmed plectin-dependent adhesome remodeling together with attenuation of oncogenic FAK, MAPK/Erk, and PI3K/Akt pathways upon plectin inactivation” (see pages 8-9) is fully supported. Furthermore, in support of the results of MS-based (phospho)proteomic and immunoblot analyses we show strong correlation between plectin expression and the signatures of “Integrin pathway” (R<sup>2</sup>=0.15, p= 2x10<sup>-45</sup>), “FAK pathway” (R<sup>2</sup>=0.11, p= 2x10<sup>-34</sup>), “PI3K Akt/mTOR signaling” (R<sup>2</sup>=0.06, p= 2x10<sup>-20</sup>) or “Erk pathway” (R<sup>2</sup>=0.10, p= 6x10<sup>-30</sup>) in HCC samples from 1268 patients (Fig S7-2C and S7-3).

      In conclusion, we show that plectin is required for proper/physiological adhesion-associated signaling pathways in HCC cells. The HCC adhesome and associated pathways are dysregulated upon plectin inactivation and we show context-dependent varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways. In our view, presenting context-dependent variability in expression/activation of pathway molecular denominators is a trade-off for our intention to address this aspect of plectin inactivation in the complexity of different cell lines, tissues, and modes of inactivation. We prefer this complex approach to presenting “more convincing” black-and-white data assessed in a single cell line (Qi et al., 2022) or upon plectin inactivation by a single approach (compare with otherwise excellent studies such as (Xu et al., 2022) or (Buckup et al., 2021)). In fact, unlike the reviewers, we consider this complexity (and the resulting heterogeneity of the data) to be a strength rather than a weakness of our study.

      Reviewer 1:

      (1) The authors suggest that plectin controls oncogenic FAK, MAPK/Erk, and PI3K/Akt signaling in HCC cells, representing the mechanisms by which plectin promotes HCC formation and progression. However, the effect of plectin inactivation on these signaling was inconsistent in Huh7 and SNU-475 cells (Figure 3D), despite similar cell growth inhibition in both cell lines (Figure 2G). For example, pAKT and pERK were only reduced by plectin inhibition in SNU-475 cells but not in Huh7 cells.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of molecular denominators of signaling pathways reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. We expect that functional consequences (such as reduced migration and anchorage-independent proliferation) arise from a combination of changes in individual pathways. The sum of often subtle changes will result in comparable effects not only on cell growth, but also on migration or transmission of traction forces. For a more detailed comment, please see our response to all Reviewers on the first three pages of this letter.

      We believe that our data show that both pAkt and pErk are attenuated upon plectin inactivation in both Huh7 and SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 2.

      (2) In addition, pFAK was not changed by plectin inhibition in both cells, and the ratio of pFAK/FAK was increased in both cells.

      We agree with the reviewer that pFAK/FAK levels are either comparable or slightly higher upon plectin inactivation. However, we believe that our data convincingly show that FAK expression is downregulated in both Huh7 and SNU-475 cells. In our opinion, this results in an overall attenuation of FAK signaling (see percentage for Normalized pFAK × Normalized FAK), which is expectedly more pronounced in migratory SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 3.

      Given these results, we feel that our statement that “inhibition of plectin attenuates FAK signaling” (pages 8-9) is well supported.
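      To illustrate the arithmetic behind the "Normalized pFAK × Normalized FAK" percentage, the following minimal sketch multiplies two percent-of-WT values. It assumes that "Normalized pFAK" denotes the pFAK/FAK ratio and "Normalized FAK" the total FAK level, both expressed relative to untreated WT; this reading, the function name, and the input values are illustrative assumptions and are not taken from the data tables.

```python
def combined_fak_signal(norm_pfak_pct: float, norm_fak_pct: float) -> float:
    """Multiply two percent-of-WT values and return the product as % of WT.

    norm_pfak_pct: pFAK/FAK ratio, as a percentage of untreated WT (assumed meaning).
    norm_fak_pct:  total FAK level, as a percentage of untreated WT (assumed meaning).
    """
    return norm_pfak_pct * norm_fak_pct / 100.0


# Hypothetical values for illustration only (not from the manuscript):
# a slightly elevated pFAK/FAK ratio combined with reduced total FAK.
norm_pfak = 110.0  # % of WT
norm_fak = 60.0    # % of WT

combined = combined_fak_signal(norm_pfak, norm_fak)
print(f"Combined FAK signal: {combined:.0f}% of WT")
```

      Under these assumed inputs, a pFAK/FAK ratio above WT still yields a combined value below WT, because the drop in total FAK outweighs the rise in the ratio.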

      (3) Thus, it is hard to convince me that plectin promotes HCC formation and progression by regulating these signalings.

      Previous studies have shown that dysregulation of cell adhesions and attenuation of adhesion-associated FAK, MAPK/Erk, and PI3K/Akt signaling have inhibitory effects on HCC formation and progression. We show that plectin is required for the proper/physiological functioning of adhesion-associated signaling pathways in selected HCC cells. The HCC adhesome and associated pathways are dysregulated upon plectin inactivation and we show context-dependent varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways. We support these conclusions by providing the reader with proteomic and phosphoproteomic comparisons of adhesion-associated signaling between WT and plectin-inactivated HCC cell lines (Figs 3B,C and S3A,B). We further validate our findings by extensive and quantitative immunoblotting analysis (Figs 3D and S3C). In addition, we show a strong correlation between plectin expression and the signatures of “Integrin pathway” (R<sup>2</sup>=0.15, p= 2x10<sup>-45</sup>), “FAK pathway” (R<sup>2</sup>=0.11, p= 2x10<sup>-34</sup>), “PI3K Akt/mTOR signaling” (R<sup>2</sup>=0.06, p= 2x10<sup>-20</sup>) or “Erk pathway” (R<sup>2</sup>=0.10, p= 6x10<sup>-30</sup>) in HCC samples from 1268 patients (Fig S7E).

      Our data and conclusions are fully consistent with previously published studies in HCC cells. For instance, even a mild decrease in FAK levels leads to a significant reduction in colony size (see effects of KD (Gnani et al., 2017), effects of FAK inhibitor and sorafenib in xenografts (Romito et al., 2021), or effects of inhibitors in soft agars and xenografts (Wang et al., 2016)). Similar effects were observed upon partial Akt inhibition (compare with Akt inhibitors in soft agars (Cuconati et al., 2013; Liu et al., 2020)). Of course, we cannot rule out synergistic plectin-dependent effects mediated via adhesion-independent mechanisms. Identifying these mechanisms and distinguishing the contributions of the various consequences of cytoskeletal dysregulation to the phenotypes described in this manuscript would be experimentally challenging, and we feel that these studies go beyond the scope of our current study.

      As we feel that the adhesion-independent mechanisms were not sufficiently discussed in the original manuscript, we have removed the original sentence “Given the well-established oncogenic activation of these pathways in human cancer(33), our study identifies a new set of potential therapeutic targets.” (page 15) from the Discussion and added the following text: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15). See also our response to Reviewer 2, #4 and Reviewer 3, #3 and #4.

      (4) The authors claimed that Plectin inactivation inhibits HCC invasion and metastasis using in vitro and in vivo models. However, the results from in vivo models were not as compelling as the in vitro data. The lung colonization assay is not an ideal in vivo model for studying HCC metastasis and invasion, especially when Plectin inhibition suppresses HCC cell growth and survival. Using an orthotopic model that can metastasize into the lung or spleen could be much more convincing for an essential claim.

      We agree with the reviewer that the orthotopic in vivo model would be an ideal setting to address HCC metastasis experimentally. There are several published models of HCC extrahepatic metastasis, including an orthotopic model of lung metastasis (Fan et al., 2012; Voisin et al., 2024; You et al., 2016), but to our knowledge, none of these orthotopic models are commonly used in the field. In contrast, the administration of tumor cells via the tail vein of mice is a standard, well-established approach of first choice for modelling lung metastasis in a variety of tumor types (e.g. (Hiratsuka et al., 2011; Jakab et al., 2024; Lu et al., 2020)), including HCC (Jin et al., 2017; Lu et al., 2020; Tao et al., 2015; Zhao et al., 2020). 

      Furthermore, we do not believe that the use of an orthotopic model would provide a comparable advantage in terms of plectin-mediated effects on metastatic growth compared to tail vein delivery of tumor cells. Importantly, the lung colonization model used in our study allows for the injection of a defined number of HCC cells into the bloodstream, thus eliminating the effect of the primary tumor size on the number of metastasizing cells. To distinguish between effects of plectin inhibition on HCC cell growth/survival and dissemination, we carefully evaluated both the number and volume of lung metastases (Figs 6I and S6C-F). The observed reduction in the number of metastases (Figs 6I and S6D) reflects the initiation/early phase of metastasis formation, which is strongly influenced by the adhesion, migration, and invasion properties of the HCC cells and corresponds well with the phenotypes described after plectin inactivation in vitro (Figs 4H,I; 5; 6A-E; S5; and S6A,B). The reduction in the volume of metastases (Figs 6I and S6E) reflects the effects of plectin inhibition on HCC cell growth and metastatic outgrowth and corresponds well with the in vitro data shown in Figs 2G,H and S2F,G.

      (5) Also, in Figure 6H, histology images of lungs from this experiment need to be shown to understand plectin's effect on metastasis better.

      We are grateful to the reviewer for bringing our attention to the lung colonization assay results presented. The description of the experiments in the text of the original manuscript was incorrect. The animals monitored by in vivo bioluminescence imaging (shown in Fig 6H) are the same as the mice from which cleared whole lung lobes were analyzed by lattice light sheet fluorescence microscopy (shown in Fig. 6I). The corrected description is now provided in the revised manuscript as follows: “To identify early phase of metastasis formation, we next monitored the HCC cell retention in the lungs using in vivo bioluminescence imaging (Fig. 6H). This experimental cohort was expanded for WT-injected mice which were administered PST…” (page 11).

      Therefore, lungs from all animals shown in Fig 6H,I were CUBIC-cleared and analyzed by lattice light sheet fluorescence microscopy. As requested by Reviewer 2, Recommendation #1, we provide in the revised manuscript (Fig S6F) “whole slide scan results for all the groups”, which could help “to understand plectin's effect on metastasis better”. To address the reviewer's concern, we also post-processed cleared and visualized lungs for hematoxylin staining and immunolabeled them for HNF4α. A representative image is shown as panel A in Author response image 1. Post-processing of CUBIC-cleared and immunolabeled lung lobes resulted in partial tissue destruction and some samples were lost. In addition, as the entire experimental setup was designed for the early phase of metastasis formation, only small Huh7 foci were formed (compared to the larger metastases that developed within 13 weeks after inoculation shown in panel B). As the IHC for HNF4α provides significantly lower sensitivity compared to the immunofluorescence images provided in the manuscript, we were only able to identify a few HNF4α-positive foci. Overall, we consider our immunofluorescence images to be qualitatively and quantitatively superior to IHC sections. However, if the reviewer or the editor considers it beneficial, we are prepared to show our current data as a part of the manuscript.

      Author response image 1.

      (A) HNF4α staining of lung tissue after CUBIC clearing from mice inoculated with WT Huh7 cells, collected at the BLI timepoint when a positive signal in the chest area was detected. This timepoint was then selected for the comparison of initial stages of lung colonization. (B) H&E and HNF4α staining of lung tissue from mice inoculated with WT Huh7 cells in the survival experiment. Scale bars, 50 µm.

      (6) Figure 6G, it is unclear how many mice were used for this experiment. Did these mice die due to the tumor burdens in the lungs?

      The number of animals is given in the legend to Fig 6G (page 34; N = 14 (WT), 13 (KO)). Large Huh7 metastases were identified in the lungs of animals that could be analyzed post-mortem by IHC (see panel B in the figure above). No large metastases were found in other organs examined, such as the liver, kidney and brain. It is therefore highly likely that these mice died as a result of the tumor burden in the lungs. A similar conclusion was drawn from the results of the lung colonization model in the previous studies (Jin et al., 2017; Zhao et al., 2020).

      (7) The whole paper used inhibition strategies to understand the function of plectin. However, the expression of plectin in Huh7 cells is low (Figure 1D). It might be more appropriate to overexpress plectin in this cell line or others with low plectin expression to examine the effect on HCC cell growth and migration.

      For this study, we selected two model HCC cell lines – Huh7 and SNU-475. Our intention was to investigate the role of plectin in “well-differentiated” (Huh7) and “poorly differentiated” (SNU-475) HCC cells, thus including early and advanced stages of HCC development (as categorized before (Boyault et al., 2007; Yuzugullu et al., 2009a); see also our description and rationale on page 6). As anticipated, less migratory “epithelial-like” Huh7 cells are characterized by relatively high E-cadherin, low vimentin, and low plectin expression levels (Fig 1D). In contrast, migratory “mesenchymal-like” SNU-475 cells are characterized by relatively low E-cadherin, high vimentin, and high plectin expression levels (Fig 1D). Therefore, the majority of analyses were performed in both relatively low plectin-expressing Huh7 and high plectin-expressing SNU-475 cells. It is noteworthy that inactivation of plectin had similar (although less pronounced) inhibitory effects on growth and migration in both Huh7 and SNU-475 cells.

      We agree with the reviewer that “It might be more appropriate to overexpress plectin in this cell line or others with low plectin expression to examine the effect on HCC cell growth and migration”. In fact, we have received similar suggestions since we started publishing our studies on plectin. There are two reasons that preclude successful overexpression experiments. First, there are about 14 known isoforms of plectin (Prechova et al., 2023). Although previous studies have analyzed the phenotypic rescue potential of some plectin isoforms using transient transfection (e.g. (Burgstaller et al., 2010; Osmanagic-Myers et al., 2015; Prechova et al., 2022)), the isoform variability precludes rescue/overexpression experiments if the causative isoform is not known. Second, plectin is a giant cytoskeletal crosslinker protein of more than 4,500 amino acids with binding sites for intermediate filaments, F-actin, and microtubules. Overexpression of the approximately 500 kDa-large crosslinker invariably leads to the collapse of cytoskeletal networks in every cell type we have tested so far. See also our response to Reviewer 3, #2.

      Reviewer 2:

      (1) The annotation of mouse numbers is confusing. In Figures 2A B D E F, it should be the same experiment, but the N numbers in A are 6 and 5. In E and F they are 8 and 3. Similarly, in Figure 2H, in the tumor size curve, the N values are 4,4,5,6. In the table, N values are 8,8,10,11 (the authors showed 8,7,8,7 tumors that formed in the picture). 

      We are grateful to the reviewer for bringing our attention to the inconsistency in the number of animals in DEN-induced hepatocarcinogenesis. Results from two independent cohorts are presented in the manuscript. The first cohort was used for MRI screening (Fig 2A-C) and at the second screening timepoint of 44 weeks, approximately 75% of animals died during anesthesia. Therefore, the second cohort of Ple<sup>ΔAlb</sup> and Ple<sup>fl/fl</sup> mice was used for macroscopic confirmation and histology (Figs 2D-F and S2A). We agree with the reviewer that the original presentation of the data may be misleading; therefore, we have rephrased the sentence describing macroscopic confirmation and histology (Figs 2D-F and S2A) as follows: “Decreased tumor burden in the second cohort of Ple<sup>ΔAlb</sup> mice was confirmed macroscopically…” (page 7).

      For the experiments shown in Fig 2H, mice were injected in both hind flanks. We have added this information to the figure legend along with the correct number of tumors.

      (2) In Figure 3D and Figure S3C, the changes in most of the proteins/phosphorylation sites are not convincing/consistent. These data are not essential for the conclusion of the paper and WB is semi-quantitative. Maybe including more plots of the proteins from proteomic data could strengthen their detailed conclusions about the link between Plectin and the FAK, MAPK/Erk, PI3K/Akt pathways as shown in 3E.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of pathway molecular denominators reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. See also the detailed response to all reviewers (on the first three pages of this letter) and the responses to Reviewer 1, #1 and #2, Reviewer 3, #4.

      Our immunoblot analysis is based on NIR fluorescent secondary antibodies, which were detected and quantified using an Odyssey imaging system (LI-COR Biosciences). This approach allows a wider linear detection range than chemiluminescence without signal loss and is considered to provide quantitative immunoblot detection (Mathews et al., 2009; Pillai-Kastoori et al., 2020) (see also manufacturer's website: https://www.licor.com/bio/applications/quantitative-western-blots/).

      Following the reviewer's recommendation, we have carefully reviewed our proteomic and phosphoproteomic data. There are no further MS-based data (other than those already presented in the manuscript) to support the association of plectin with the FAK, MAPK/Erk, PI3K/Akt pathways.

      (3) Figure S7A and B, The pictures do not show any tumor, which is different from Figure 7A and B (and from the quantification in S7A lower right). Is it just because male mice were used in Figure 7 and female mice were used in Figure S7? Is there literature supporting the sex difference for the Myc-sgP53 model?

      As indicated in the Figure legends and in the corresponding text in the Results section (page 12), Fig 7A,B shows Myc;sgTp53-driven hepatocarcinogenesis in male mice, whereas Fig S7C,D shows results from the female cohort. In general, the HDTVI-induced HCC onset and progression differ considerably between individual experiments, and it is therefore crucial to compare data within an experimental cohort (as we have done for Ple<sup>ΔAlb</sup> and Ple<sup>fl/fl</sup> mice). Nevertheless, we cannot exclude the influence of sexual dimorphism on the results presented. The existence of sexual dimorphism in liver cancer is supported by a substantial body of evidence derived from various studies (e.g. (Bigsby and Caperell-Grant, 2011; Bray et al., 2024)). To date, no reports have specifically addressed sexual dimorphism in Myc;sgTp53 HDTVI-induced liver cancer. This is likely due to the fact that the vast majority of studies using this model have only presented data for one sex. However, a study using an HDTVI-administered combination of c-MET and mutated beta-catenin oncogenes to induce HCC in mice observed elevated levels of alpha-fetoprotein (AFP) in males when compared to females (Bernal et al., 2024). The study suggests that estrogen may have a protective effect in female mice, as ovariectomized females had AFP levels comparable to those observed in males. Our data suggest that female hormones may have a similar effect in the Myc;sgTp53 HDTVI-induced liver cancer model.

      (4) Figure 2F, S2A, Ple<sup>ΔAlb</sup> mice more frequently formed larger tumors, as reflected by overall tumor size increase. The interpretation of the authors is "possibly implying reduced migration or increased cohesion of plectin-depleted cells". It is quite arbitrary to make this suggestion in the absence of substantial data or literature to support this theory.

      We agree with the reviewer that our statement “Notably, Ple<sup>ΔAlb</sup> mice more frequently formed larger tumors, as reflected by overall tumor size increase (Fig. 2F; Figure 2—figure supplement 1A), possibly implying reduced migration or increased cohesion of plectin-depleted cells(25).” (page 7) is rather speculative. As we did not address the formation of larger tumors in Ple<sup>ΔAlb</sup> mice further in the current study, we wanted to provide the readers with some, even speculative, hypotheses. In support of our hypothesis, we cite our own publication (#26; Jirouskova et al., J Hepatol., 2018), where we show that plectin inactivation in Ple<sup>ΔAlb</sup> livers results in upregulation of the epithelial marker E-cadherin. Previous studies have shown that a similar increase in E-cadherin expression levels reflects mesenchymal-to-epithelial transition (e.g. (Adhikary et al., 2014; Auersperg et al., 1999; Wendt et al., 2011)) and is often associated with reduced cancer cell migration/invasion. This is consistent with our finding that “migrating plectin-disabled SNU-475 cells exhibited more cohesive, epithelial-like features while progressing collectively. By contrast, WT SNU-475 leader cells were more polarized and found to migrate into scratch areas more frequently than their plectin-deficient counterparts (Figure 5—figure supplement 1B). Consistent with this observation, individually seeded SNU-475 cells less frequently assumed a polarized, mesenchymal-like shape upon plectin inactivation in both 2D and 3D environments (Fig. 5C). Moreover, plectin-inactivated SNU-475 cells exhibited a decrease in N-cadherin and vimentin levels when compared to WT counterparts (Figure 5—figure supplement 1C).” (page 10).

      In conclusion, we have shown that plectin-deficient hepatocytes express higher levels of E-cadherin and hepatocyte-derived SNU-475 cells express less N-cadherin and vimentin. In addition, we show that SNU-475 cells exhibited more cohesive, epithelial-like features in scratch-wound experiments. To address the reviewer's concern and to further support our statement about the increased cohesiveness of plectin-deficient HCC cells, we have included the citation of the recent study #27 (Xu et al., 2022). Using the MHCC97H and MHCC97L HCC cell lines, this study shows that plectin downregulation “inhibits HCC cell migration and epithelial mesenchymal transformation”, which is fully consistent with our hypothesis. To mitigate the impression of an unsubstantiated statement, we also discuss adhesion-independent plectin-mediated mechanisms in the revised Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (5) Mutation or KO PLEC has been shown to cause severe diseases in humans and mice, including skin blistering, muscular dystrophy, and progressive familial intrahepatic cholestasis. Please elaborate on the potential side effects of targeting Plectin to treat HCC.

      Indeed, mutation or ablation of plectin has been implicated in many diseases (collectively known as plectinopathies). These multisystem disorders include an autosomal dominant form of epidermolysis bullosa simplex (EBS), limb-girdle muscular dystrophy, aplasia cutis congenita, and an autosomal recessive form of EBS that may be associated with muscular dystrophy, pyloric atresia, and/or congenital myasthenic syndrome. Several mutations have also been associated with cardiomyopathy and malignant arrhythmias. Progressive familial intrahepatic cholestasis has also been reported. In genetic mouse models, loss of plectin leads to skin fragility, extensive intestinal lesions, instability of the biliary epithelium, and progressive muscle wasting (for more details see (Vahidnezhad et al., 2022)). 

      It is therefore important to evaluate potential side effects; in this respect, plectin inactivation presents challenges comparable to those of other anti-HCC targets. For instance, Sorafenib, the most widely used chemotherapy in recent decades, targets numerous serine/threonine and tyrosine kinases (RAF1, BRAF, VEGFR 1, 2, 3, PDGFR, KIT, FLT3, FGFR1, and RET) that are critical for proper non-pathological functions (Strumberg et al., 2007; Wilhelm et al., 2006; Wilhelm et al., 2004). The combinatorial therapy of atezolizumab and bevacizumab also targets PD-L1 in conjunction with VEGF, which plays an essential role in bone formation (Gerber et al., 1999), hematopoiesis (Ferrara et al., 1996), or wound healing (Chintalgattu et al., 2003). To allow readers to read a comprehensive account of the pathological consequences of plectin inactivation, we included two additional citations (Prechova et al., 2023; Vahidnezhad et al., 2022) and rephrased the Introduction section as follows: “…multiple reports have linked plectin with tumor malignancy(12) and other pathologies (Prechova et al., 2023; Vahidnezhad et al., 2022), mechanistic insights…” (page 4-5).

      Reviewer 3:

      (1) The rationale for using Huh7 cells in the manuscript is not well explained as it has the lowest Plectin expression levels.

      For this study, we selected two model HCC cell lines - Huh7 and SNU-475. Our intention was to address the role of plectin in “well-differentiated” (Huh7) and “poorly differentiated” (SNU-475) HCC cells, thus including early and advanced stages of HCC development (as categorized before (Boyault et al., 2007; Yuzugullu et al., 2009b); see also our description and reasoning on page 6). The Huh7 cell line is also a well-established and widely used model suitable for both in vitro and in vivo settings (e.g. (Du et al., 2024; Fu et al., 2018; Si et al., 2023; Zheng et al., 2018)).

      As anticipated, less migratory “epithelial-like” Huh7 cells are characterized by relatively high E-cadherin, low vimentin, and low plectin expression levels (Fig 1D). In contrast, migratory “mesenchymal-like” SNU-475 cells are characterized by relatively low E-cadherin, high vimentin, and high plectin expression levels (Fig 1D). Therefore, the majority of analyses were performed in both relatively low plectin-expressing Huh7 and high plectin-expressing SNU-475 cells. It is noteworthy that inactivation of plectin had similar (although less pronounced) inhibitory effects on the phenotypes in both Huh7 and SNU-475 cells. We believe that these findings highlight the importance of plectin in HCC growth and metastasis, as plectin inactivation has inhibitory effects on both early (low plectin) and advanced (high plectin) stages of HCC.

      (2) The KO cell experiments should be supplemented with overexpression experiments.

      We agree with the reviewer that it would be helpful to complement our plectin inactivation experiments by overexpressing plectin in the HCC cell lines used in this study. In fact, we have received similar suggestions since we started to publish our studies on plectin. Two reasons preclude successful overexpression experiments. First, there are about 14 known isoforms of plectin (Prechova et al., 2023). Although previous studies have analyzed the phenotypic rescue potential of some plectin isoforms using transient transfection (e.g. (Burgstaller et al., 2010; Osmanagic-Myers et al., 2015; Prechova et al., 2022)), the isoform variability precludes rescue/overexpression experiments if the causative isoform is not known. Second, plectin is a giant cytoskeletal crosslinker protein of more than 4,500 amino acids with binding sites for intermediate filaments, F-actin, and microtubules. Overexpression of the approximately 500 kDa crosslinker invariably leads to the collapse of cytoskeletal networks in every cell type we have tested so far. See also our response to Reviewer 1, #7.

      (3) There is significant concern that while ablation of Ple led to reduced tumor number, these mice had larger tumors. The data indicate that Plectin may have distinct roles in HCC initiation versus progression. The data are not well explained and do not fully support that Plectin promotes hepatocarcinogenesis.

      In the DEN-induced HCC model, MRI screening revealed fewer tumors and a reduced tumor volume at 32 and 44 weeks post-induction (Fig 2A-C). The larger tumors observed in Ple<sup>ΔAlb</sup> compared to Ple<sup>fl/fl</sup> livers (Figs 2F and S2A) represent only a subset of macroscopic tumors visually identified at necropsy. Larger Ple<sup>ΔAlb</sup> tumors were not observed in the Myc;sgTp53 HDTVI-induced HCC model (data not shown). In contrast, plectin deficiency reduced the size of xenografts formed in NSG mice (Fig 2H), and agar colonies grown from Huh7 and SNU-475 cells with inactivated plectin were also smaller (Fig S2F). In all in vivo and in vitro approaches presented in the manuscript, plectin inactivation reduced the number of colonies/xenografts/tumors. As hepatocarcinogenesis is a multistep process including initiation, promotion, and progression (Pitot, 2001), we feel confident in concluding that plectin inactivation inhibits hepatocarcinogenesis and we consider this conclusion to be fully supported by the data presented in the manuscript.

      However, we agree with the reviewer that larger macroscopic Ple<sup>ΔAlb</sup> tumors in the DEN-induced HCC model are intriguing. As we do not see similar effects (or even trends) in other approaches used in this study, we cannot exclude a contribution of the plectin-deficient environment in Ple<sup>ΔAlb</sup> livers during long-term (44 weeks) tumor formation and growth. In our previous study (Jirouskova et al., 2018), we showed that plectin deficiency in Ple<sup>ΔAlb</sup> livers leads to biliary tree malformations, collapse of bile ducts and ductules, and mild ductular reaction. We could speculate that Ple<sup>ΔAlb</sup> livers suffer from continuous bile leakage into the parenchyma, which would exacerbate all models of long-term pathology.

      As we did not further address the formation of larger tumors in Ple<sup>ΔAlb</sup> mice in the current study, we offered the reader the hypothesis that large tumors could “…possibly implying reduced migration or increased cohesion of plectin-depleted cells25.” In support of our hypothesis, we cite our own publication (#26; Jirouskova et al., J Hepatol., 2018), where we show that plectin inactivation in Ple<sup>ΔAlb</sup> livers results in upregulation of the epithelial marker E-cadherin. Previous studies have shown that a similar increase in E-cadherin expression levels reflects mesenchymal-to-epithelial transition (e.g. (Adhikary et al., 2014; Auersperg et al., 1999; Wendt et al., 2011)) and is often associated with reduced cancer cell migration/invasion. This is consistent with our finding that “migrating plectin-disabled SNU475 cells exhibited more cohesive, epithelial-like features while progressing collectively. By contrast, WT SNU-475 leader cells were more polarized and found to migrate into scratch areas more frequently than their plectin-deficient counterparts (Figure 5—figure supplement 1B). Consistent with this observation, individually seeded SNU-475 cells less frequently assumed a polarized, mesenchymal-like shape upon plectin inactivation in both 2D and 3D environments (Fig. 5C). Moreover, plectin-inactivated SNU-475 cells exhibited a decrease in N-cadherin and vimentin levels when compared to WT counterparts (Figure 5—figure supplement 1C).” (page 10).

      In conclusion, we have shown that plectin-deficient hepatocytes express higher levels of E-cadherin and that hepatocyte-derived SNU-475 cells express less N-cadherin and vimentin. In addition, we show that SNU-475 cells exhibited more cohesive, epithelial-like features in scratch-wound experiments. To address the reviewer's concern and to further support our claim of increased cohesiveness of plectin-deficient HCC cells, we included a citation of a recent study (27). Using the MHCC97H and MHCC97L HCC cell lines, this study shows that plectin downregulation “inhibits HCC cell migration and epithelial mesenchymal transformation” and is therefore fully consistent with our hypothesis. To mitigate the impression of an unsubstantiated statement, we also discuss adhesion-independent plectin-mediated mechanisms in the revised Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (4) Figure 3 showed that Plectin does not regulate p-FAK/FAK expression. Therefore, the statement that Plectin regulates the FAK pathway is not valid. Furthermore, there are too many variables in terms of p-AKT and p-ERK expression, making the conclusion not well supported.

      We agree with the reviewer that pFAK/FAK levels are either comparable or slightly higher upon plectin inactivation. However, we believe that our data convincingly show that FAK expression is downregulated in both Huh7 and SNU-475 cells. In our opinion, this results in an overall attenuation of FAK signaling (see percentage for Normalized pFAK × Normalized FAK), which is, as expected, more pronounced in migratory SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 4.

      Given these results, we believe that our statement that “inhibition of plectin attenuates FAK signaling” (pages 8-9) is well supported.
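
      The reasoning above, that a lower total FAK level can reduce the overall phospho-FAK signal even when the pFAK/FAK ratio is unchanged or slightly raised, can be illustrated with a minimal sketch. It assumes that "Normalized pFAK × Normalized FAK" means the product of two values that are each expressed as a percentage of untreated WT. The function name and the input numbers are hypothetical and are not taken from the manuscript or its tables.

```python
def overall_signal(ratio_pct: float, total_pct: float) -> float:
    """Combine a phospho/total ratio and a total-protein level.

    Both inputs are percentages of the untreated WT value (WT = 100).
    The result is the combined signal, again as a percentage of WT.
    This product is an assumed reading of "Normalized pFAK x Normalized FAK".
    """
    return ratio_pct * total_pct / 100.0


# Hypothetical illustration only (NOT data from the study):
# the pFAK/FAK ratio is slightly above WT, but total FAK is reduced.
ratio_pct = 110.0   # pFAK/FAK ratio, % of untreated WT
total_pct = 60.0    # total FAK level, % of untreated WT

combined = overall_signal(ratio_pct, total_pct)
print(f"Combined FAK signal: {combined:.0f}% of WT")
```

      With these illustrative inputs the combined signal falls below the WT level although the ratio itself is above it, which is the pattern the response describes.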

      We believe that our data show that both pAkt and pErk are attenuated upon plectin inactivation in both Huh7 and SNU-475 cells. The following data (presented in Figs 3D and S3C) are shown as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 5.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of pathway molecular denominators reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. See also the detailed response to all Reviewers (on the first three pages of this letter) and the responses to Reviewer 1, #1 and #2 and Reviewer 2, #4.

      (5) The studies of plecstatin-1 in HCC should be expanded to a panel of human HCC cells with various Plectin expression levels in terms of cell growth and cell migration. The IC50 values should be determined and correlated with Plectin expression.

      Following the reviewer's suggestion, we have included graphs showing IC50 values for Huh7 (low plectin) and SNU-475 (high plectin) cells as Fig S2E. As expected, the IC50 values are higher for SNU-475 cells. Corresponding parts of the Figure legends have been changed. We refer to new data in the Results section as follows: “If not stated otherwise, we applied PST in the final concentration of 8 µM, which corresponds to the 25% of IC50 for Huh7 cells (Figure 2—figure supplement 1E).” (page 7). We also provide details of the IC50 determination in the revised Supplement Materials and methods section (pages 5-6).

      (6) One of the major issues is the mechanistic studies focusing on Plectin regulating HCC migration/metastasis, whereas the in vivo mouse studies focus on HCC formation (Figures 3 and 7). These are distinct processes and should not be mixed.

      In our study, we investigated the role of plectin in the development and dissemination of HCC. Using DEN- and Myc;sgTp53 HDTVI-induced HCC models (Figs 2A-F, S2A, 7A-C, and S7A-D), we show the effects of plectin inactivation on HCC formation in vivo. These studies are complemented by xenografts (Figs 2H and S2G) and in vitro colony formation assay (Figs 2G and S2F). Using an in vivo lung colonization assay (Figs 6G-I and S6C-F), we show the effects of plectin inactivation on the metastatic potential of HCC cells. In complementary in vitro studies, we show how plectin deficiency affects migration (Figs 5 and S5) and invasion (Figs 6A-E and S6A,B). 

      Our mechanistic studies show that plectin inactivation leads to dysregulation of cytoskeletal networks, adhesions, and adhesion-associated signaling. We believe that we have provided substantial experimental data suggesting that the proposed mechanisms play a role in plectin-mediated inhibition of both HCC development and dissemination. Of course, we cannot rule out additional, adhesion-independent mechanisms for HCC formation. To clarify this, we have revised the Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (7) Figure 7B showed that Ple KO mice were treated with PST, but the data are not presented in the manuscript. Tumor cell proliferation and apoptosis rates should be analyzed as well.

      We do not show any effects of PST in Ple<sup>ΔAlb</sup> mice. As stated in the Fig 7B legend: “Myc;sgTp53 HCC was induced in Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and PST-treated Ple<sup>fl/fl</sup> (Ple<sup>fl/fl</sup>+PST) male mice as in (A). Shown are representative images of Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and Ple<sup>fl/fl</sup>+PST livers from mice with fully developed multifocal HCC sacrificed 6 weeks post-induction.”.

      Following the reviewer's recommendation, we include the analysis of proliferation and apoptosis rates as revised Fig S7A,B. Please note that no differences in apoptosis and proliferation rates were found between experimental conditions. Due to the additional data, the original Fig S7 – 1 has been split into revised Fig S7 – 1 and Fig S7 – 2.

      (8) The status of FAK, AKT, and ERK pathway activation was not analyzed in mouse liver samples. In Figure 7D, most of the adjusted p-values are not significant.

      We are aware that the majority of FDR-corrected p-values shown in Fig 7D are not significant. In fact, we deliberated with our colleagues from the laboratory of Prof. Samuel Meier-Menches (Department of Analytical Chemistry, University of Vienna), who conducted all the proteomic studies presented in this manuscript, on whether to present such "weak" data. Following a lengthy discussion, we decided to include them despite anticipating criticism from the reviewers. The rationale for including these data is that, despite the lack of statistical significance, the findings are consistent with those of MS/immunoblot analyses of HCC cells (Figs 3 and S3) and patient data (Figs 7E, S7-2). The lack of statistical significance is a consequence of the limited number of animals included in the Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and PST-treated Ple<sup>fl/fl</sup> cohorts, which has resulted in a high degree of variability in the MS results. We agree with the reviewer that the inclusion of immunoblot analysis would provide further support for our conclusions. However, we do not have any remaining liver tissue that could be analyzed.

      (9) There is no evidence to support that PST is capable of overcoming therapy resistance in HCC. For example, no comparison with the current standard care was provided in the preclinical studies.

      We are grateful to the reviewer for bringing our attention to the incorrect statement in the Abstract: “…we show that plectin inhibitor plecstatin-1 (PST) is well-tolerated and capable of overcoming therapy resistance in HCC”. To address the reviewer's concern, we rephrased the Abstract as follows: “…we show that plectin inhibitor plecstatin-1 (PST) is well-tolerated and potently inhibits HCC progression”.

      Recommendations for the authors: 

      Reviewer 2 (Recommendations for the authors):

      (1) In Figures 6I and S6C, it would be better to show the whole slide scan result for all the groups.

      Following the reviewer's recommendation, we include the whole slide scan result for all the groups as revised Fig S6F.

      (2) In Figures S7C and D, what do the highlighted/colored dots represent? They are not mentioned in the figure legend or the results.

      Following the reviewer's recommendation, we include the explanation in the revised Figure legends (page 30).

      (3) In Figure 2H, the experiment schedule showed "6w Huh7 t.v.i.", but should it be subcutaneous injection?

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The schematic has been corrected. We have also noticed an error in the table summarizing the number of tumors formed (N) and have corrected the values for the WT+PST and KO conditions.

      (4) Supplemental Materials and Methods, Xenograft tumorigenesis, Error: 2.5×10<sup>6</sup> Huh7 cells in 250 ml PBS mice were administered subcutaneously in the left and right hind flanks. It probably should be "250ul".

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The corresponding part of the Materials and Methods section has been corrected (page 2).

      (5) In Figure legend Supplementary Figure 6 C,D,E : "Representative magnified images from lung lobes with GFP-positive WT, KO, and WT+PST SNU-475 nodules". There is no picture for the WT+PST SNU-475 group.

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The corresponding part of the Figure legend (“WT+PST SNU-475”) has been deleted (page 27).

      (6) In the Figure legend for Figure 6H, "Representative BLI images of WT, KO, and PST-treated WT (WT+PST) SNU-475 cells-bearing mice are shown". Should it be Huh7, not SNU-475?

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The description of the cell line has been corrected (page 34).

      (7) The statement that current therapies rely on multikinase inhibitors is no longer correct.

      We are grateful to the reviewer for bringing our attention to the incorrect statement. To address the reviewer's concern, we rephrased the original part of Discussion section: “Current therapies for HCC rely on multikinase inhibitors (such as sorafenib) that provide only moderate survival benefit(60,61) due to primary resistance and the plasticity of signaling networks(62)” as follows: “Current systemic therapies for advanced HCC rely on a combination of multikinase inhibitor (such as sorafenib) or anti-VEGF /VEGF inhibitor (such as bevacizumab) treatment with immunotherapy(59). Multikinase inhibitors provide only moderate survival benefit(60,61) due to primary resistance and the plasticity of signaling networks(62), and only a subset of patients benefits from addition of immunotherapy in HCC treatment(63)” (page 15).

      References

      Adhikary, A., S. Chakraborty, M. Mazumdar, S. Ghosh, S. Mukherjee, A. Manna, S. Mohanty, K.K. Nakka, S. Joshi, A. De, S. Chattopadhyay, G. Sa, and T. Das. 2014. Inhibition of epithelial to mesenchymal transition by E-cadherin up-regulation via repression of slug transcription and inhibition of E-cadherin degradation: dual role of scaffold/matrix attachment region-binding protein 1 (SMAR1) in breast cancer cells. The Journal of biological chemistry. 289:25431-25444.

      Auersperg, N., J. Pan, B.D. Grove, T. Peterson, J. Fisher, S. Maines-Bandiera, A. Somasiri, and C.D. Roskelley. 1999. E-cadherin induces mesenchymal-to-epithelial transition in human ovarian surface epithelium. Proc Natl Acad Sci U S A. 96:6249-6254.

      Bernal, A., M. McLaughlin, A. Tiwari, F. Cigarroa, and L. Sun. 2024. Abstract 772: Investigation of gender disparity in liver tumor formation using a hydrodynamic tail vein injection mouse model. Cancer Research. 84:772-772.

      Bigsby, R.M., and A. Caperell-Grant. 2011. The role for estrogen receptor-alpha and prolactin receptor in sex-dependent DEN-induced liver tumorigenesis. Carcinogenesis. 32:1162-1166.

      Bonakdar, N., A. Schilling, M. Sporrer, P. Lennert, A. Mainka, L. Winter, G. Walko, G. Wiche, B. Fabry, and W.H. Goldmann. 2015. Determining the mechanical properties of plectin in mouse myoblasts and keratinocytes. Exp Cell Res. 331:331-337.

      Boyault, S., D.S. Rickman, A. de Reynies, C. Balabaud, S. Rebouissou, E. Jeannot, A. Herault, J. Saric, J. Belghiti, D. Franco, P. Bioulac-Sage, P. Laurent-Puig, and J. Zucman-Rossi. 2007. Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets. Hepatology. 45:42-52.

      Bray, F., M. Laversanne, H. Sung, J. Ferlay, R.L. Siegel, I. Soerjomataram, and A. Jemal. 2024. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 74:229-263.

      Buckup, M., M.A. Rice, E.C. Hsu, F. Garcia-Marques, S. Liu, M. Aslan, A. Bermudez, J. Huang, S.J. Pitteri, and T. Stoyanova. 2021. Plectin is a regulator of prostate cancer growth and metastasis. Oncogene. 40:663-676.

      Burgstaller, G., M. Gregor, L. Winter, and G. Wiche. 2010. Keeping the vimentin network under control: cell-matrix adhesion-associated plectin 1f affects cell shape and polarity of fibroblasts. Mol Biol Cell. 21:3362-3375.

      Chintalgattu, V., D.M. Nair, and L.C. Katwa. 2003. Cardiac myofibroblasts: a novel source of vascular endothelial growth factor (VEGF) and its receptors Flt-1 and KDR. J Mol Cell Cardiol. 35:277-286.

      Cuconati, A., C. Mills, C. Goddard, X. Zhang, W. Yu, H. Guo, X. Xu, and T.M. Block. 2013. Suppression of AKT anti-apoptotic signaling by a novel drug candidate results in growth arrest and apoptosis of hepatocellular carcinoma cells. PLoS One. 8:e54595.

      Du, Y.Q., B. Yuan, Y.X. Ye, F.L. Zhou, H. Liu, J.J. Huang, and Y.F. Wei. 2024. Plumbagin Regulates Snail to Inhibit Hepatocellular Carcinoma Epithelial-Mesenchymal Transition in vivo and in vitro. J Hepatocell Carcinoma. 11:565-580.

      Fan, Z.C., J. Yan, G.D. Liu, X.Y. Tan, X.F. Weng, W.Z. Wu, J. Zhou, and X.B. Wei. 2012. Real-time monitoring of rare circulating hepatocellular carcinoma cells in an orthotopic model by in vivo flow cytometry assesses resection on metastasis. Cancer Res. 72:2683-2691.

      Ferrara, N., K. Carver-Moore, H. Chen, M. Dowd, L. Lu, K.S. O'Shea, L. Powell-Braxton, K.J. Hillan, and M.W. Moore. 1996. Heterozygous embryonic lethality induced by targeted inactivation of the VEGF gene. Nature. 380:439-442.

      Fu, Q., Q. Zhang, Y. Lou, J. Yang, G. Nie, Q. Chen, Y. Chen, J. Zhang, J. Wang, T. Wei, H. Qin, X. Dang, X. Bai, and T. Liang. 2018. Primary tumor-derived exosomes facilitate metastasis by regulating adhesion of circulating tumor cells via SMAD3 in liver cancer. Oncogene. 37:6105-6118.

      Gerber, H.P., T.H. Vu, A.M. Ryan, J. Kowalski, Z. Werb, and N. Ferrara. 1999. VEGF couples hypertrophic cartilage remodeling, ossification and angiogenesis during endochondral bone formation. Nat Med. 5:623-628.

      Gnani, D., I. Romito, S. Artuso, M. Chierici, C. De Stefanis, N. Panera, A. Crudele, S. Ceccarelli, E. Carcarino, V. D'Oria, M. Porru, E. Giorda, K. Ferrari, L. Miele, E. Villa, C. Balsano, D. Pasini, C. Furlanello, F. Locatelli, V. Nobili, R. Rota, C. Leonetti, and A. Alisi. 2017. Focal adhesion kinase depletion reduces human hepatocellular carcinoma growth by repressing enhancer of zeste homolog 2. Cell Death Differ. 24:889-902.

      Gregor, M., S. Osmanagic-Myers, G. Burgstaller, M. Wolfram, I. Fischer, G. Walko, G.P. Resch, A. Jorgl, H. Herrmann, and G. Wiche. 2014. Mechanosensing through focal adhesion-anchored intermediate filaments. FASEB J. 28:715-729.

      Hiratsuka, S., S. Goel, W.S. Kamoun, Y. Maru, D. Fukumura, D.G. Duda, and R.K. Jain. 2011. Endothelial focal adhesion kinase mediates cancer cell homing to discrete regions of the lungs via E-selectin up-regulation. Proc Natl Acad Sci U S A. 108:3725-3730.

      Jakab, M., K.H. Lee, A. Uvarovskii, S. Ovchinnikova, S.R. Kulkarni, S. Jakab, T. Rostalski, C. Spegg, S. Anders, and H.G. Augustin. 2024. Lung endothelium exploits susceptible tumor cell states to instruct metastatic latency. Nat Cancer. 5:716-730.

      Jin, H., C. Wang, G. Jin, H. Ruan, D. Gu, L. Wei, H. Wang, N. Wang, E. Arunachalam, Y. Zhang, X. Deng, C. Yang, Y. Xiong, H. Feng, M. Yao, J. Fang, J. Gu, W. Cong, and W. Qin. 2017. Regulator of Calcineurin 1 Gene Isoform 4, Down-regulated in Hepatocellular Carcinoma, Prevents Proliferation, Migration, and Invasive Activity of Cancer Cells and Metastasis of Orthotopic Tumors by Inhibiting Nuclear Translocation of NFAT1. Gastroenterology. 153:799-811 e733.

      Jirouskova, M., K. Nepomucka, G. Oyman-Eyrilmez, A. Kalendova, H. Havelkova, L. Sarnova, K. Chalupsky, B. Schuster, O. Benada, P. Miksatkova, M. Kuchar, O. Fabian, R. Sedlacek, G. Wiche, and M. Gregor. 2018. Plectin controls biliary tree architecture and stability in cholestasis. J Hepatol. 68:1006-1017.

      Katada, K., T. Tomonaga, M. Satoh, K. Matsushita, Y. Tonoike, Y. Kodera, T. Hanazawa, F. Nomura, and Y. Okamoto. 2012. Plectin promotes migration and invasion of cancer cells and is a novel prognostic marker for head and neck squamous cell carcinoma. J Proteomics. 75:1803-1815.

      Koster, J., S. van Wilpe, I. Kuikman, S.H. Litjens, and A. Sonnenberg. 2004. Role of binding of plectin to the integrin beta4 subunit in the assembly of hemidesmosomes. Mol Biol Cell. 15:1211-1223.

      Liu, H., Q. Chen, D. Lu, X. Pang, S. Yin, K. Wang, R. Wang, S. Yang, Y. Zhang, Y. Qiu, T. Wang, and H. Yu. 2020. HTBPI, an active phenanthroindolizidine alkaloid, inhibits liver tumorigenesis by targeting Akt. FASEB J. 34:12255-12268.

      Lu, H.H., S.Y. Lin, R.R. Weng, Y.H. Juan, Y.W. Chen, H.H. Hou, Z.C. Hung, G.A. Oswita, Y.J. Huang, S.Y. Guu, K.H. Khoo, J.Y. Shih, C.J. Yu, and H.C. Tsai. 2020. Fucosyltransferase 4 shapes oncogenic glycoproteome to drive metastasis of lung adenocarcinoma. EBioMedicine. 57:102846.

      Mathews, S.T., E.P. Plaisance, and T. Kim. 2009. Imaging systems for westerns: chemiluminescence vs. infrared detection. Methods in molecular biology (Clifton, N.J.). 536:499-513.

      Osmanagic-Myers, S., M. Gregor, G. Walko, G. Burgstaller, S. Reipert, and G. Wiche. 2006. Plectin-controlled keratin cytoarchitecture affects MAP kinases involved in cellular stress response and migration. J Cell Biol. 174:557-568.

      Osmanagic-Myers, S., S. Rus, M. Wolfram, D. Brunner, W.H. Goldmann, N. Bonakdar, I. Fischer, S. Reipert, A. Zuzuarregui, G. Walko, and G. Wiche. 2015. Plectin reinforces vascular integrity by mediating crosstalk between the vimentin and the actin networks. J Cell Sci. 128:4138-4150.

      Pillai-Kastoori, L., A.R. Schutz-Geschwender, and J.A. Harford. 2020. A systematic approach to quantitative Western blot analysis. Analytical biochemistry. 593:113608.

      Pitot, H.C. 2001. Pathways of progression in hepatocarcinogenesis. Lancet (London, England). 358:859-860.

      Prechova, M., Z. Adamova, A.L. Schweizer, M. Maninova, A. Bauer, D. Kah, S.M. Meier-Menches, G. Wiche, B. Fabry, and M. Gregor. 2022. Plectin-mediated cytoskeletal crosstalk controls cell tension and cohesion in epithelial sheets. J Cell Biol. 221.

      Prechova, M., K. Korelova, and M. Gregor. 2023. Plectin. Curr Biol. 33:R128-R130.

      Qi, L., T. Knifley, M. Chen, and K.L. O'Connor. 2022. Integrin alpha6beta4 requires plectin and vimentin for adhesion complex distribution and invasive growth. J Cell Sci. 135.

      Romito, I., M. Porru, M.R. Braghini, L. Pompili, N. Panera, A. Crudele, D. Gnani, C. De Stefanis, M. Scarsella, S. Pomella, S. Levi Mortera, E. de Billy, A.L. Conti, V. Marzano, L. Putignani, M. Vinciguerra, C. Balsano, A. Pastore, R. Rota, M. Tartaglia, C. Leonetti, and A. Alisi. 2021. Focal adhesion kinase inhibitor TAE226 combined with Sorafenib slows down hepatocellular carcinoma by multiple epigenetic effects. J Exp Clin Cancer Res. 40:364.

      Si, T., L. Huang, T. Liang, P. Huang, H. Zhang, M. Zhang, and X. Zhou. 2023. Ruangan Lidan decoction inhibits the growth and metastasis of liver cancer by downregulating miR-9-5p and upregulating PDK4. Cancer Biol Ther. 24:2246198.

      Strumberg, D., J.W. Clark, A. Awada, M.J. Moore, H. Richly, A. Hendlisz, H.W. Hirte, J.P. Eder, H.J. Lenz, and B. Schwartz. 2007. Safety, pharmacokinetics, and preliminary antitumor activity of sorafenib: a review of four phase I trials in patients with advanced refractory solid tumors. Oncologist. 12:426-437.

      Tao, Q.F., S.X. Yuan, F. Yang, S. Yang, Y. Yang, J.H. Yuan, Z.G. Wang, Q.G. Xu, K.Y. Lin, J. Cai, J. Yu, W.L. Huang, X.L. Teng, C.C. Zhou, F. Wang, S.H. Sun, and W.P. Zhou. 2015. Aldolase B inhibits metastasis through Ten-Eleven Translocation 1 and serves as a prognostic biomarker in hepatocellular carcinoma. Mol Cancer. 14:170.

      Vahidnezhad, H., L. Youssefian, N. Harvey, A.R. Tavasoli, A.H. Saeidian, S. Sotoudeh, A. Varghaei, H. Mahmoudi, P. Mansouri, N. Mozafari, O. Zargari, S. Zeinali, and J. Uitto. 2022. Mutation update: The spectra of PLEC sequence variants and related plectinopathies. Human mutation. 43:1706-1731.

      Voisin, L., M. Lapouge, M.K. Saba-El-Leil, M. Gombos, J. Javary, V.Q. Trinh, and S. Meloche. 2024. Syngeneic mouse model of YES-driven metastatic and proliferative hepatocellular carcinoma. Dis Model Mech. 17.

      Wang, D.D., Y. Chen, Z.B. Chen, F.J. Yan, X.Y. Dai, M.D. Ying, J. Cao, J. Ma, P.H. Luo, Y.X. Han, Y. Peng, Y.H. Sun, H. Zhang, Q.J. He, B. Yang, and H. Zhu. 2016. CT-707, a Novel FAK Inhibitor, Synergizes with Cabozantinib to Suppress Hepatocellular Carcinoma by Blocking Cabozantinib-Induced FAK Activation. Mol Cancer Ther. 15:2916-2925.

      Wang, W., A. Zuidema, L. Te Molder, L. Nahidiazar, L. Hoekman, T. Schmidt, S. Coppola, and A. Sonnenberg. 2020. Hemidesmosomes modulate force generation via focal adhesions. J Cell Biol. 219.

      Wendt, M.K., M.A. Taylor, B.J. Schiemann, and W.P. Schiemann. 2011. Down-regulation of epithelial cadherin is required to initiate metastatic outgrowth of breast cancer. Mol Biol Cell. 22:2423-2435.

      Wenta, T., A. Schmidt, Q. Zhang, R. Devarajan, P. Singh, X. Yang, A. Ahtikoski, M. Vaarala, G.H. Wei, and A. Manninen. 2022. Disassembly of alpha6beta4-mediated hemidesmosomal adhesions promotes tumorigenesis in PTEN-negative prostate cancer by targeting plectin to focal adhesions. Oncogene. 41:3804-3820.

      Wilhelm, S., C. Carter, M. Lynch, T. Lowinger, J. Dumas, R.A. Smith, B. Schwartz, R. Simantov, and S. Kelley. 2006. Discovery and development of sorafenib: a multikinase inhibitor for treating cancer. Nat Rev Drug Discov. 5:835-844.

      Wilhelm, S.M., C. Carter, L. Tang, D. Wilkie, A. McNabola, H. Rong, C. Chen, X. Zhang, P. Vincent, M. McHugh, Y. Cao, J. Shujath, S. Gawlak, D. Eveleigh, B. Rowley, L. Liu, L. Adnane, M. Lynch, D. Auclair, I. Taylor, R. Gedrich, A. Voznesensky, B. Riedl, L.E. Post, G. Bollag, and P.A. Trail. 2004. BAY 43-9006 exhibits broad spectrum oral antitumor activity and targets the RAF/MEK/ERK pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis. Cancer Res. 64:7099-7109.

      Xu, R., S. He, D. Ma, R. Liang, Q. Luo, and G. Song. 2022. Plectin Downregulation Inhibits Migration and Suppresses Epithelial Mesenchymal Transformation of Hepatocellular Carcinoma Cells via ERK1/2 Signaling. Int J Mol Sci. 24.

      You, A., M. Cao, Z. Guo, B. Zuo, J. Gao, H. Zhou, H. Li, Y. Cui, F. Fang, W. Zhang, T. Song, Q. Li, X. Zhu, H. Yin, H. Sun, and T. Zhang. 2016. Metformin sensitizes sorafenib to inhibit postoperative recurrence and metastasis of hepatocellular carcinoma in orthotopic mouse models. J Hematol Oncol. 9:20.

      Yuzugullu, H., K. Benhaj, N. Ozturk, S. Senturk, E. Celik, A. Toylu, N. Tasdemir, M. Yilmaz, E. Erdal, K.C. Akcali, N. Atabey, and M. Ozturk. 2009a. Canonical Wnt signaling is antagonized by noncanonical Wnt5a in hepatocellular carcinoma cells. Molecular Cancer. 8:90.

      Yuzugullu, H., K. Benhaj, N. Ozturk, S. Senturk, E. Celik, A. Toylu, N. Tasdemir, M. Yilmaz, E. Erdal, K.C. Akcali, N. Atabey, and M. Ozturk. 2009b. Canonical Wnt signaling is antagonized by noncanonical Wnt5a in hepatocellular carcinoma cells. Mol Cancer. 8:90.

      Zhao, J., Y. Hou, C. Yin, J. Hu, T. Gao, X. Huang, X. Zhang, J. Xing, J. An, S. Wan, and J. Li. 2020. Upregulation of histamine receptor H1 promotes tumor progression and contributes to poor prognosis in hepatocellular carcinoma. Oncogene. 39:1724-1738.

      Zheng, H., Y. Yang, C. Ye, P.P. Li, Z.G. Wang, H. Xing, H. Ren, and W.P. Zhou. 2018. Lamp2 inhibits epithelial-mesenchymal transition by suppressing Snail expression in HCC. Oncotarget. 9:30240-30252.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of why diabetes is a risk factor for more severe Covid-19 disease. The authors offer solid evidence that cathepsin L is more active in diabetic individuals, that this higher activity is recapitulated at the cellular level in the presence of high glucose, and that high glucose leads to higher cathepsin L maturation. While not all aspects of the relationship between diabetes and cathepsin L (e.g., effects of metabolic acidosis) have been investigated, the work should be of interest to researchers in diabetes, virology, and immunology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by He et al. investigates the relationship of an increased susceptibility of diabetes patients to COVID-19. The paper raises the possibility that hyperglycemia-induced cathepsin L maturation could be one of the driving forces in this pathology, suggesting that an increased activity of CTSL leads to accelerated virus infection rates due to an elevated processing of the SARS-CoV-2 spike protein.

      In a clinical case-control study, the team found that the severity of corona infections was higher in diabetic patients, and their CTSL levels correlated well with the progression of the disease. They further showed an increase in CTSL activity in long-term as well as acute hyperglycemia. SARS-CoV-2 increasingly infected cells that were cultured in serum from diabetic patients; the same was observed using high glucose medium. No effect was observed in the medium with increased concentrations of insulin. CTSL knockout abolished the glucose-dependent increase in infection.

      Increased glucose levels did not correlate with an increase in CTSL transcription. Rather He et al. could show that high glucose levels led to CTSL translocation from the ER into the lysosome. It was the glucose-dependent processing of the protease to its active form which promoted infection.

      Strengths:

      It is a complete study starting from a clinical observation and ending on the molecular mechanism. A strength is certainly the wide selection of experiments. The clinical study to investigate the effect of glucose on CTSL concentrations in healthy individuals sets the stage for experiments in cell culture, animal models, and human tissue. The effect of CTSL knockout cell lines on glucose-induced SARS-CoV2 infection rates is convincing. Finally, the team used a combination of Western blots and confocal microscopy to identify the underlying molecular mechanisms. The authors manage to keep the diabetic condition at the center of their study and therefore extend on previous knowledge of glucose-induced CTSL activation and their consequences for COVID-19 infections. By doing so, they create a novel connection between CTSL involvement in SARS-CoV2 infections and diabetes.

      Weaknesses:

      (1) The authors suggest that hyperglycemia as a symptom of diabetes leads to an increased infection rate in those patients. Throughout their study, the team focuses on two select symptoms of a diabetic condition, hyperglycemia and hyperinsulinemia. The team acknowledges in the discussion that there could be various other reasons. Hyperglycemia can lead to metabolic acidosis and a shift in blood pH. As CTSL activity is highly dependent on pH, it would have been crucial to include this parameter in the study.

      We sincerely appreciate your valuable comment. We agree that hyperglycemia can lead to metabolic acidosis and alter blood pH. However, the normal range for blood pH in humans is relatively narrow, typically ranging from 7.35 to 7.45. In our study, we ensured that blood pH remained within this normal range for both diabetic and healthy control samples. To address your concern, we conducted experiments to investigate CTSL activity in response to pH fluctuations within this physiological range. The updated Fig. 4a now presents these findings, demonstrating consistent CTSL activity despite pH variations. Statistical analysis was performed using one-way ANOVA with Tukey’s post hoc test to ensure robustness. We have also amended the figure legend and provided corresponding descriptions in the final edition manuscript (line 15-18, page 7).

      Author response image 1.

      (2) The study rarely differentiates between cellular and extracellular CTSL activity. A more detailed explanation for the connection between the intracellular CTSL and serum CTSL in diabetic individuals, presumably via lysosomal exocytosis, could be helpful with regard to the final model to give a more complete picture.

      Thank you for your insightful comments. Previous studies have elucidated the process by which lysosomal CTSL is transported via vesicles and subsequently secreted from the cell membrane through exocytosis (references 1-5). To provide a more comprehensive understanding, we have incorporated this information into Fig. 6h (page 32 of the final edition manuscript). This addition aims to enhance clarity regarding the connection between intracellular and serum CTSL activity in diabetic individuals, particularly through lysosomal exocytosis.

      Author response image 2.

      References:

      (1) Reddy A et al. Plasma membrane repair is mediated by Ca(2+)-regulated exocytosis of lysosomes. Cell. 2001 Jul 27;106(2):157-69. doi: 10.1016/s0092-8674(01)00421-4. PMID: 11511344.

      (2) Hasanagic M et al. Different Pathways to the Lysosome: Sorting out Alternatives. Int Rev Cell Mol Biol. 2015;320:75-101. doi: 10.1016/bs.ircmb.2015.07.008. Epub 2015 Aug 19. PMID: 26614872.

      (3) Reiser J et al. Specialized roles for cysteine cathepsins in health and disease. J Clin Invest. 2010 Oct;120(10):3421-31. doi: 10.1172/JCI42918. Epub 2010 Oct 1. PMID: 20921628; PMCID: PMC2947230.

      (4) Jaiswal JK et al. Membrane proximal lysosomes are the major vesicles responsible for calcium-dependent exocytosis in nonsecretory cells. J Cell Biol. 2002 Nov 25;159(4):625-35. doi: 10.1083/jcb.200208154. Epub 2002 Nov 18. PMID: 12438417; PMCID: PMC2173094.

      (5) Coutinho MF et al. Mannose-6-phosphate pathway: a review on its role in lysosomal function and dysfunction. Mol Genet Metab. 2012 Apr;105(4):542-50. doi: 10.1016/j.ymgme.2011.12.012. Epub 2011 Dec 23. PMID: 22266136.

      (3) In the early result section, an effect of hyperglycemia on total CTSL concentrations is described, but the data is not very convincing. Over the course of the manuscript, the hypothesis shifts increasingly towards an increase in protease trans-localization and processing to the active form rather than a change in total protease amounts. The overall importance of CTSL concentrations remains questionable.

      Thank you for your insightful feedback. We have addressed your concerns regarding the impact of hyperglycemia on CTSL concentrations. Fig. 2h-j illustrate the effect of acute hyperglycemia on both CTSL concentration and activity in 15 healthy male volunteers over a 160-minute period. During this short timeframe, CTSL concentration remained stable, as evidenced by consistent RNA results from cells exposed to varying glucose levels (Supplementary Fig.1). However, there was a significant increase in CTSL activity, indicating that glucose elevation rapidly triggers CTSL maturation through propeptide cleavage. This activation process occurs more rapidly than CTSL protein synthesis. In summary, acute hyperglycemia specifically elevates CTSL activity, while chronic hyperglycemia may impact both CTSL activity and concentration (Fig. 2a-d). Additionally, Tournu C, et al. (1998) (reference 1) and Shi Q, et al. (2022) (reference 2) have reported that increased glucose metabolism promotes the maturation and secretion of CTSL and other proteases. These findings align with our evidence that hyperglycemia drives CTSL maturation, as discussed at line 10-25, page 12 in the final edition manuscript.

      References:

      (1) Tournu C et al. Glucose controls cathepsin expression in Ras-transformed fibroblasts. Arch Biochem Biophys. 1998 Dec 1;360(1):15-24. doi: 10.1006/abbi.1998.0916. PMID: 9826424.

      (2) Shi Q et al. Increased glucose metabolism in TAMs fuels O-GlcNAcylation of lysosomal Cathepsin B to promote cancer metastasis and chemoresistance. Cancer Cell. 2022 Oct 10;40(10):1207-1222.e10. doi: 10.1016/j.ccell.2022.08.012. Epub 2022 Sep 8. PMID: 36084651.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors hypothesized that individuals with diabetes have elevated blood CTSL levels, which facilitates SARS-CoV-2 infection. The authors conducted in vitro experiments, revealing that elevated glucose levels promote SARS-CoV-2 infection in wild-type cells. In contrast, CTSL knockout cells show reduced susceptibility to high glucose-promoted effects. Additionally, the authors utilized lung tissue samples obtained from both diabetic and non-diabetic patients, along with db/db diabetic and control mice. Their findings indicate that diabetic conditions lead to an elevation in CTSL activity in both humans and mice.

      Strengths:

      The authors have effectively met their research objectives, and their conclusions are supported by the data presented. Their findings suggest that high glucose levels promote CTSL maturation and translocation from the endoplasmic reticulum to the lysosome, potentially contributing to diabetic comorbidities and complications.

      Weaknesses:

      (1) In Figure 1e, the authors measured plasma levels of COVID-19 related proteins, including ACE2, CTSL, and CTSB, in both diabetic and non-diabetic COVID-19 patients. Notably, only CTSL levels exhibited a significant increase in diabetic patients compared to non-diabetic patients, and these levels varied throughout the course of COVID-19. Given that the diabetes groups encompass both male and female patients, it is essential to ascertain whether the authors considered the potential impact of gender on CTSL levels. The diabetes groups comprised a higher percentage of male patients (61.3%) compared to the non-diabetes group, where males constituted only 38.7%.

      Thank you for your insightful feedback. In response to your concerns regarding the potential impact of gender on CTSL levels in diabetic and non-diabetic COVID-19 patients, we conducted analyses to address this issue. While our initial study involved 62 COVID-19 patients, with 31 having diabetes and 31 without, matching based on gender and age, we acknowledged the challenge of obtaining balanced gender distribution in both groups due to the difficulty of collecting blood samples from COVID-19 patients. To mitigate potential gender bias resulting from small sample sizes, we conducted a supplementary clinical study involving 122 non-COVID-19 volunteers, including 61 individuals with diabetes and 61 without. The percentage of males in the diabetes group was 50.8%, while in the healthy group, males constituted 44.3% (P value = 0.468), indicating no significant gender bias. We have incorporated this information into the discussion section on line 4-13, page 11 in the final edition manuscript, to provide clarity on this aspect of our study.
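      The sex-balance comparison reported above for the supplementary cohort can be checked with a short calculation. The following is a minimal sketch and not the authors' analysis code: it assumes that the reported percentages correspond to 31 of 61 males in the diabetes group and 27 of 61 in the healthy group, and that an uncorrected Pearson chi-square test was used (the response does not name the test). The function name `pearson_chi2_2x2` is hypothetical. Under these assumptions the result agrees with the reported P value of 0.468.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts inferred from the reported percentages (not stated in the response):
# diabetes group: 31 male, 30 female (50.8% male); healthy group: 27 male, 34 female (44.3% male).
chi2, p = pearson_chi2_2x2(31, 30, 27, 34)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # prints "chi2 = 0.526, p = 0.468"
```

      The same function can be applied to any other 2x2 table of counts; with small groups, a continuity-corrected or exact test may be preferred.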

      (2) Lines 145-149: "The results showed that WT Huh7 cell cultured in high glucose medium exhibited a much higher infective rate than those in low glucose medium. However, CTSL KO Huh7 cells maintained a low infective rate of SARS-CoV-2 regardless of glucose or insulin levels (Fig. 3f-h). Therefore, hyperglycemia enhanced SARS-CoV-2 infection dependent on CTSL." However, this evidence may be insufficient to support the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. The human hepatoma cell line Huh7 might not be an ideal model to validate the authors' hypothesis regarding high blood glucose promoting SARS-CoV-2 infection through CTSL.

      Thank you for your valuable feedback. We have addressed the concerns regarding the sufficiency of evidence supporting the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. Specifically, we have revised the expression to state, “Therefore, hyperglycemia enhanced SARS-CoV-2 infection through CTSL.” as suggested, in line 9, page 7 in the final edition manuscript. Additionally, we acknowledge the potential involvement of other bioactive factors, such as 1,5-anhydro-D-glucitol (1,5-AG), in mediating SARS-CoV-2 infection in patients with diabetes, as outlined in the discussion section from line 13-21, page 13 in the final edition manuscript.

      Regarding the choice of the human hepatoma cell line Huh7 as a model for investigating hyperglycemia-induced CTSL maturation and SARS-CoV-2 infection, we recognize the importance of tissue specificity and the liver’s significance as a target organ for COVID-19. Despite potential limitations, such as generalization of liver function abnormalities and lack of tissue specificity in SARS-CoV-2 impact, Huh7 cells offer practical advantages as a mature cell model for studying SARS-CoV-2 infection, including accessibility, susceptibility to infection, and stable proliferation (reference 1-3). We have elaborated on these considerations in the discussion section at line 19-23, page 11 in the final edition manuscript, to provide context for our choice of experimental model.

      References:

      (1) Gupta A et al. Extrapulmonary manifestations of COVID-19. Nat Med. 2020 Jul;26(7):1017-1032. doi: 10.1038/s41591-020-0968-3. Epub 2020 Jul 10. PMID: 32651579.

      (2) Nie X et al. Multi-organ proteomic landscape of COVID-19 autopsies. Cell. 2021 Feb 4;184(3):775-791.e14. doi: 10.1016/j.cell.2021.01.004. Epub 2021 Jan 9. PMID: 33503446; PMCID: PMC7794601.

      (3) Ciotti M et al. The COVID-19 pandemic. Crit Rev Clin Lab Sci. 2020 Sep;57(6):365-388. doi: 10.1080/10408363.2020.1783198. Epub 2020 Jul 9. PMID: 32645276.

      (3) The Abstract and Introduction sections lack effective organization.

      Thank you for your valuable comments. We have rewritten the Abstract and Introduction sections and incorporated the updated descriptions in the final edition manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) When referring to diabetes, does this exclusively include diabetes type 2?

      Thank you for your inquiry. In our study, the term “diabetes” encompasses the condition of hyperglycemia in a broad sense, rather than specifically indicating type 1 diabetes (T1DM) or type 2 diabetes (T2DM). This broader definition aligns with the scope of our research objectives and findings, particularly observed in the cell experiments conducted. We have clarified this point in the revised discussion section, from line 6-9, page 12 in the final edition manuscript, to provide additional context for readers.

      (2) The titles of the individual paragraphs are not very strong and descriptive. More precise titles help to structure the paper better for the reader.

      Thank you for your valuable comments. We have rewritten the title of each section to make it more precise for readers and incorporated the updated descriptions in the manuscript.

      (3) Fig.3c, adding a 0 nM insulin control would be nice.

      Thank you for your suggestion. We have revised Fig.3c according to your advice. The revised figure was located at page 29 in the final edition manuscript. The corresponding figure legend has also been revised.

      Author response image 3.

      (4) Fig.3e non-infection control would be nice.

      Thank you for your suggestion. We have incorporated your feedback by adding a non-infection control in Fig. 3e. In this revised figure, we included a measurement of SARS-CoV-2 pseudovirus infection assessed through the fluorescence captured by a reader. Cells infected by the pseudovirus exhibited activation of the firefly luciferase, resulting in the release of fluorescence. Conversely, non-infected control cells showed no fluorescence, with the reader recording a value of zero. The updated figure can now be found on page 29 in the final edition manuscript, and we have adjusted the corresponding figure legend accordingly.

      Author response image 4.

      (5) In Figure 5, the processing of CTSL in cells (b-c) strongly differs from processing in tissue (d-e) focusing on amounts of dc-mCTSL. Do you have an explanation for this? Overall, blots are hard to judge by eye and it would be nice to include blots with shorter exposure.

      Thank you for your insightful feedback. The differences observed in the processing of CTSL between cells (Fig. 5b) and tissues (Fig. 5d-e) may be attributed to the complexities inherent in tissue samples, which can impact the clarity of the images. Furthermore, in human tissue samples, it is pertinent to consider that patients in the diabetes group had their blood glucose levels controlled within or near the normal range prior to lung surgery. As a result, the evidence supporting CTSL maturation in human lung tissue blotting images may be less compelling. We have addressed this aspect in the revised results section (lines 10-13, page 9). Additionally, we will consider including blots with shorter exposure to enhance visual clarity in future studies.

      (6) Considering Fig2B and Figure S1, the evidence of an effect of hyperglycemia or high glucose medium on total CTSL protein concentration is not very strong. In my opinion, this claim in the results section for Fig2 should be revisited.

      Thank you for your valuable suggestion. We have revisited the section in question and made appropriate revisions. The original sentence has been modified to accurately reflect the findings: "We found that plasma CTSL activity was strongly positively correlated with chronic hyperglycemia indicated by HbA1c and was significantly higher in diabetic patients than in euglycemic individuals (Fig. 2a, c). Additionally, plasma CTSL concentration showed a positive trend with chronic hyperglycemia indicated by HbA1c (Fig. 2b, d)". These changes have been incorporated into the revised results section (lines 12-16, page 5).

      (7) Overall, data hinting to increased CTSL activity is stronger than protein amount. This being said, in hyperglycemia, blood pH can be affected (metabolic acidosis). As CTSL has higher activity at low pH, could the increase in activity be caused by a drop in pH? Can you include this aspect in your manuscript? For example, is there a pH difference in serum of nondiabetic vs diabetic patients?

      Thank you for your valuable input. We have already addressed the potential impact of pH changes on CTSL activity in our response to Weakness No. 1. As indicated, although hyperglycemia can lead to metabolic acidosis and changes in blood pH, the pH levels observed in our study remained within the normal range (7.35 to 7.45). Therefore, we conducted experiments to investigate CTSL activity in response to changes in pH, which showed consistent activity levels within this range. This information has been included in our revised manuscript (line 15-18, page 7).

      Reviewer #2 (Recommendations For The Authors):

      (1) The Abstract and Introduction sections lack effective organization. The manuscript's style resembles that of Cell Journal rather than aligning with the customary format of eLife.

      Thank you for your valuable comments. The Abstract and Introduction sections have been reorganized to be more precise for readers, and the changes have been included in our revised manuscript. Additionally, we have meticulously updated the manuscript's style to align with the standard format of eLife, especially the key resources table in the Materials and Methods section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable manuscript attempts to identify the brain regions and cell types involved in habituation to dark flash stimuli in larval zebrafish. Habituation being a form of learning widespread in the animal kingdom, the investigation of neural mechanisms underlying it is an important endeavor. The authors use a combination of behavioral analysis, neural activity imaging, and pharmacological manipulation to investigate brain-wide mechanisms of habituation. However, the data presented are incomplete and do not show a convincing causative link between pharmacological manipulations, neural activity patterns, and behavioral outcomes.

      We thank the reviewers and editors for their careful reading and reviews of our work. We are grateful that they appreciate the value in our experimental approach and results. We acknowledge what we interpret as the major criticism, that in our original manuscript we focused too heavily on the hypothesized role of GABAergic neurons in driving habituation. This hypothesis will remain only indirectly supported until we can identify a GABAergic population of neurons that drives habituation. Therefore, we have revised our manuscript, decreasing the focus on GABA, and rather emphasizing the following three points:

      1) By performing the first Ca2+ imaging experiments during dark flash habituation, we identify multiple distinct functional classes of neurons which have different adaptation profiles, including non-adapting and potentiating classes. These neurons are spread throughout the brain, indicating that habituation is a complex and distributed process.

      2) By performing a pharmacological screen for dark flash habituation modifiers, we confirm habituation behaviour manifests from multiple distinct molecular mechanisms that independently modulate different behavioural outputs. We also implicate multiple novel pathways in habituation plasticity, some of which we have validated through dose-response studies.

      3) By combining pharmacology and Ca2+ imaging, we did not observe a simple relationship between the behavioural effects of a drug treatment and functional alterations in neurons. This observation further supports our model that habituation is a multidimensional process, for which a simple circuit model will be insufficient.

      We would like to point out that, in our opinion, there appears to be a factual error in the final sentence of the eLife assessment:

      “However, the data presented are incomplete and do not show a convincing causative link between pharmacological manipulations, neural activity patterns, and behavioral outcomes”.

      We believe that a “convincing causative link” between pharmacological manipulations and behavioural outcomes has been clearly demonstrated for PTX, Melatonin, Estradiol and Hexestrol through our dose response experiments. Similarly a link between pharmacology and neural activity patterns has also been directly demonstrated. As mentioned in (3), we acknowledge that our data linking neural activity and behaviour is more tenuous, as will be more explicitly reflected in our revised manuscript.

      Nevertheless, we maintain that one of the primary strengths of our study is our attempt to integrate analyses that span the behavioural, pharmacological, and neural activity-levels.

      In our revised manuscript, we have substantially altered the Abstract and Discussion, removed the Model figure (previously Figure 8), and changed the title from:

      “Inhibition drives habituation of a larval zebrafish visual response”

      to:

      “Functional and pharmacological analyses of visual habituation learning in larval zebrafish”

      Text changes from the initial version are visible as track changes in the word document: “LamireEtAl_2022_eLifeRevisions.docx”

      Reviewer #1 (Public Review):

      This manuscript addresses the important and understudied issue of circuit-level mechanisms supporting habituation, particularly in pursuit of the possible role of increases in the activity of inhibitory neurons in suppressing behavioral output during long-term habituation. The authors make use of many of the striking advantages of the larval zebrafish to perform whole brain, single neuronal calcium imaging during repeated sensory exposure, and high throughput screening of pharmacological agents in freely moving, habituating larvae. Notably, several blockers/antagonists of GABAA(C) receptors completely suppress habituation of the O-bend escape response to dark flashes, suggesting a key role for GABAergic transmission in this form of habituation. Other substances are identified that strikingly enhance habituation, including melatonin, although here the suggested mechanistic insight is less specific. To add to these findings, a number of functional clusters of neurons are identified in the larval brain that have divergent activity through habituation, with many clusters exhibiting suppression of different degrees, in line with adaptive filtration during habituation, and a single cluster that potentiates during habituation. Further assessment reveals that all of these clusters include GABAergic inhibitory neurons and excitatory neurons, so we cannot take away the simple interpretation that the potentiating cluster of neurons is inhibitory and therefore exerts an influence on the other adapting (depressing) clusters to produce habituation. Rather, a variety of interpretations remain in play.

      Overall, there is great potential in the approach that has been used here to gain insight into circuit-level mechanisms of habituation. There are many experiments performed by the authors that cannot be achieved currently in other vertebrate systems, so the manuscript serves as a potential methodological platform that can be used to support a rich array of future work. While there are several key observations that one can take away from this manuscript, a clear interpretation of the role of GABAergic inhibitory neurons in habituation has not been established. This potential feature of habituation is emphasized throughout, particularly in the introduction and discussion sections, meaning that one is obliged as a reader to interrogate whether the results as they currently stand really do demonstrate a role for GABAergic inhibition in habituation. Currently, the key piece of evidence that may support this conclusion is that picrotoxin, which acts to block some classes of GABA receptors, prevents habituation. However, there are interpretations of this finding that do not specifically require a role for modified GABAergic inhibition. For instance, by lowering GABAergic inhibition, an overall increase in neural activity will occur within the brain, in this case below a level that could cause a seizure. That increase in activity may simply prevent learning by massively increasing neural noise and therefore either preventing synaptic plasticity or, more likely, causing indiscriminate synaptic strengthening and weakening that occludes information storage. Sensory processing itself could also be disrupted, for instance by altering the selectivity of receptive fields. Alternatively, it could be that the increase in neural activity produced by the blockade of inhibition simply drives more behavioral output, meaning that more excitatory synaptic adaptation is required to suppress that output. 

      The authors propose two specific working models of the ways in which GABAergic inhibition could be implemented in habituation. An alternative model, in which GABAergic neurons are not themselves modified but act as a key intermediary between Hebbian assemblies of excitatory neurons that are modified to support memory and output neurons, is not explored. As yet, these or other models in which inhibition is not required for habituation have not been fully tested.

      This manuscript describes a really substantial body of work that provides evidence of functional clusters of neurons with divergent responses to repeated sensory input and an array of pharmacological agents that can influence the rate of a fundamentally important form of learning.

      We thank the reviewer for their careful consideration of our work, and we agree that multiple models of how habituation occurs remain plausible. As discussed above and below in more detail, we have revised our manuscript to better reflect this. We hope the reviewer will agree that this has improved the manuscript.

      Reviewer #2 (Public Review):

      In this study, Lamire et al. use a calcium imaging approach, behavioural tests, and pharmacological manipulations to identify the molecular mechanisms behind visual habituation. Overall, the manuscript is well-written but difficult to follow at times. They show a valuable new drug screen paradigm to assess the impact of pharmacological compounds on the behaviour of larval zebrafish. The results are convincing, but the description of the work is sometimes confusing and lacking details.

      We thank the reviewer for identifying areas where our description lacked details. We apologize for these omissions and have attempted to add relevant details as described below. We note that all of the analysis code is available online, though we appreciate that navigating and extracting data from these files is not straightforward.

      The volumetric calcium imaging of habituation to dark flashes is valuable, but the mix of responses to visual cues that are not relevant to the dark flash escape, such as the slow increase back to baseline luminosity, lowers the clarity of the results. The link between the calcium imaging results and free-swimming behaviour is not especially convincing, however, that is a common issue of head-restrained imaging with larval zebrafish.

      We agree with the reviewer that the design of our stimulus, and specifically the slow increase back to baseline luminosity, is perhaps confusing for the interpretation of some of the response profiles of neurons. We originally chose this stimulus type (rather than a square wave of 1s of darkness, for example) in order to better highlight the responses of the larvae to the onset of darkness (rather than the response to abruptly returning to full brightness). We therefore believe that the slow return to baseline is an important feature of the stimulus, which better separates activity related to the fast offset from activity related to light onset. And since all of the foundational behavioural data (Randlett et al., Current Biology 2019), and pharmacological data, used this stimulus type, we did not change it for the Ca2+ imaging experiments. Our use of relatively slow nuclear-targeted GCaMP indicators also means that the temporal resolution of our imaging experiments is relatively poor, and therefore we felt that using a stimulus that highlighted light offset might be best.

      We also fully acknowledge in the Results section that the behaviour of the head embedded fish is not the same as that of free-swimming fish, and that therefore establishing a direct link between these types of experiments is complicated. This is an unavoidable caveat in the head-embedded style experiments. To further emphasize this, we have also added a paragraph to the discussion where this is acknowledged explicitly.

      “We also found that the same pharmacological treatments that result in strong alterations to habituation behaviour in freely swimming larvae ([fig:5]), resulted in relatively subtle and complex functional alterations in the circuit ([fig:6]). Making direct comparisons between freely-swimming behaviour and head-fixed Ca2+ imaging is always challenging due to the differences in behaviour observed in the two contexts, and therefore our failure to identify a clear logic in these experiments may have technical explanations that will require approaches to measure neural activity from unrestrained and freely-behaving animals to resolve. Alternatively, these results are again consistent with the idea that habituation is a multidimensional and perhaps highly non-linear phenomenon in the circuit, which cannot be captured by a simple model.”

      The strong focus on GABA seems unwarranted based on the pharmacological results, as only Picrotoxinin gives clear results, while the other antagonists do not give consistent results. On the other hand, the melatonin receptor agonists and oestrogen receptor agonists give more consistent results, including more convincing dose effects.

      We agree that our manuscript focused too strongly on GABA and have toned this down. We are currently performing genetic experiments aimed at identifying the Melatonin, Estrogen and GABA receptors that function during habituation, which we think will be necessary to move beyond pharmacology and the necessary caveats that such experiments bring.

      The pharmacological manipulation of the habituation circuits mapped in the first part does not arrive at any satisfying conclusion, which is acknowledged by the authors. These results do reinforce the disconnect between the calcium imaging and the behavioural experiments and undercut somewhat the proposed circuit-level model.

      We agree with this criticism and have toned down the focus on GABA specifically in the circuit, and have removed the speculative model previously in Figure 8.

      Overall, the authors did identify interesting new molecular pathways that may be involved in habituation to dark flashes. Their screening approach, while not novel, will be a powerful way to interrogate other behavioural profiles. The authors identified circuit loci apparently involved in habituation to dark flashes, and the potentiation and no adaptation clusters have not been previously observed as far as I know.

      The data will be useful to guide follow-up experiments by the community on the new pathway candidates that this screen has uncovered, including behaviours beyond dark flash habituation.

      We again thank the reviewer both for their support of our approach and for pointing out where our conclusions were not well supported by our data.

      Reviewer #3 (Public Review):

      To analyze the circuit mechanisms leading to the habituation of the O-bend responses upon repeated dark flashes (DFs), the authors performed 2-photon Ca2+ imaging in larvae expressing nuclear-targeted GCaMP7f pan-neuronally, spanning the majority of the midbrain, hindbrain, pretectum, and thalamus. They found that while the majority of neurons across the brain depress their responsiveness during habituation, a smaller population of neurons in the dorsal regions of the brain, including the torus longitudinalis, cerebellum, and dorsal hindbrain, showed the opposite pattern, suggesting that motor-related brain regions contain non-depressed signals, and therefore likely contribute to habituation plasticity.

      Further analysis using affinity propagation clustering identified 12 clusters that differed both in their adaptation to repeated DFs, as well as the shape of their response to the DF.

      Next by the pharmacological screening of 1953 small molecule compounds with known targets in conjunction with the high-throughput assay, they found that 176 compounds significantly altered some aspects of measured behavior. Among them, they sought to identify the compounds that 1) have minimal effects on the naive response to DFs, but strong effects during the training and/or memory retention periods, 2) have minimal effects on other aspects of behaviors, 3) show similar behavioral effects to other compounds tested in the same molecular pathway, and identified the GABAA/C Receptor antagonists Bicuculline, Amoxapine, and Picrotoxinin (PTX). As partial antagonism of GABAAR and/or GABACR is sufficient to strongly suppress habituation but not generalized behavioral excitability, they concluded that GABA plays a very prominent role in habituation. They also identified multiple agonists of both Melatonin and Estrogen receptors, indicating that hormonal signaling may also play a prominent role in habituation response.

      To integrate the results of the Ca2+ imaging experiments with the pharmacological screening results, the authors compared the Ca2+ activity patterns after treatment with vehicle, PTX, or Melatonin in the tethered larvae. The behavioral effects of PTX and Melatonin were much smaller compared with the very strong behavioral effects in freely-swimming animals, but the authors assumed that the difference was significant enough to continue further experiments. Based on the hypothesis that Melatonin and GABA cooperate during habituation, they expected PTX and Melatonin to have opposite effects. This was not the case in their results: for example, the size of the 12(Pot, M) neuron population was increased by both PTX and Melatonin, suggesting that pharmacological manipulations that affect habituation behavior manifest in complex functional alterations in the circuit, making it difficult to capture these effects with a simple model.

      Since the 12(Pot, M) neurons potentiate their responses and thus could act to progressively depress the responses of other neuronal classes, they examined the overlap of these neurons with GABAergic neurons. However, GABAergic neurons in the habituating circuit are not characterized by their Adaptation Profile, suggesting that global manipulations of GABAergic signaling through PTX have complex manifestations in the functional properties of neurons.

      Overall, the authors have performed an admirably large amount of work both in whole-brain neural activity imaging and pharmacological screening. However, they are not successful in integrating the results of both experiments into an acceptably consistent interpretation due to the incongruency of the results of different experiments. Although the authors present some models for interpretation, it is not easy for me to believe that this model would help the readers of this journal to deepen the understanding of the mechanisms for habituation in DF responses at the neural circuit level.

      This reviewer would rather recommend the authors divide this manuscript into two and publish two papers by adding some more strengthening data for each part such as cellular manipulations, e.g. ablation to prove the critical involvement of 12(Pot, M) neurons in habituation.

      We thank the reviewer for their careful consideration of our manuscript, and we agree that our emphasis on a particular model of DF habituation, namely the potentiation of GABAergic synapses, was overly speculative. We hope they will agree that our revised manuscript better reflects the results from our experiments, and we have tried to more specifically emphasize the incongruency in our behavioural and Ca2+ imaging data after pharmacological treatment, which we agree shows that a simple model is insufficient to capture both of these sets of observations.

      We have opted not to split the paper into two, since we feel that the collective message of this paper and its approach of combining molecular and functional analysis will be of interest. Moreover, we feel that the molecular and functional analyses feed off each other and provide a level of complementarity that would be lost if the manuscript were split, even if the message in this particular case is rather complex.

      Reviewer #1 (Recommendations For The Authors):

      There is much to commend about this manuscript. The advantages of studying habituation in the zebrafish larva are very clearly demonstrated, including the wonderful calcium imaging across the brain and the relatively high throughput screening of large numbers of different pharmacological agents. The habituation to dark flashes in freely moving larvae is also striking and the very large effect size serves the screening beautifully. Thus, if we take the really substantial amount of work of a very high standard that has been done here, there is clearly potential for an important new contribution to the literature. However, as you will see from my public review, I am of the opinion that a specific role for the modification of GABAergic inhibitory systems has not yet been established through this work. While the potential role for GABAergic inhibitory neurons in habituation, either as the key modifiable element or as an intermediary between memory and motor output, is an attractive theory with many strengths, your study as it currently stands does not categorically demonstrate that one of those two options holds. For instance, the more traditional view, that adaptive filtration is mediated by weakened synaptic connectivity between excitatory sensory systems and excitatory motor output or reduced intrinsic excitability in those same neurons, could still be in operation here. By lowering GABAergic influence over post-synaptic targets with picrotoxin, it is possible that motor output remains highly active, and even lower activity or synaptic drive from those excitatory sensory systems that feed into the output may still reliably produce behavioral output. Alternatively, it could be the formation of a memory of the familiar stimulus is disrupted by reduced inhibition that alters sensory coding either by introducing noise or reducing the selectivity of receptive fields. I believe that there are several options to address these concerns:

      1) You could change the emphasis of the manuscript so that it is less focused on inhibition and instead emphasizes the categorization of clusters of neurons that have divergent responses during habituation, including either strong suppression to potentiation. To this, you add a high throughput screening system with a wide range of different agents being tested, several of which produce a significant effect on habituation in either direction. These observations in themselves provide powerful building blocks for future work.

      2) If GABAergic neurons play a key role in habituation in this paradigm, then picrotoxin is having its effect by blocking receptors on excitatory neurons. Thus, it seems that selectively imaging GABAergic neurons before and after the application of these drugs is not likely to reveal the contribution of GABAergic synaptic influence on excitatory targets. More important is to get a stronger sense of how the GABAergic neurons change their activity throughout habituation and then influence the downstream target neurons of those GABAergic neurons (some of which may themselves be inhibitory and participating in disinhibition). For instance, you could interrogate whether anti-correlations in activity levels exist between presynaptic inhibitory neurons and putative post-synaptic targets. This analysis could be further bolstered by removing that relationship in the presence of Picrotoxin, thereby demonstrating a direct influence of inhibition from a GABAergic presynaptic partner on a postsynaptic target. While this would constitute a lot more work, it is likely to yield greater insight into a specific role for GABAergic neurons in habituation, and I suspect much of that information is in the existing datasets.

      3) To really reveal causal roles for inhibition in this form of habituation, it seems to me that there needs to be some selective intervention in GABAergic neuronal activity, ideally bidirectionally, to transiently interrupt or enhance habituation. Optogenetic or chemogenetic stimulation/inactivation is one option in this regard, which I imagine would be challenging to implement and certainly involves a lot of further work, particularly if you are then going to target specific subpopulations of GABAergic neurons. I appreciate that this option seems way beyond the scope of a review process and would probably constitute a follow-up study.

      We agree with the reviewer that we have not “categorically demonstrated” that GABAergic inhibitory neurons drive habituation by increasing their influence on the circuit, and appreciate the suggestions for how to reformulate our manuscript to better reflect this. We have opted to follow suggestion (1), and have considerably changed the focus of the manuscript.

      The additional analysis suggested in (2) is very interesting, but since we cannot identify which cells are inhibitory in our imaging experiments with picrotoxinin treatment, nor which are pre- or post-synaptic, we feel that this analysis would be very unconstrained. Also, if GABA is acting as an inhibitory neurotransmitter, it is expected to drive anticorrelations between pre- and postsynaptic neurons through inhibition. Therefore, blockade of GABA signaling by PTX would be expected to result in increased correlations, regardless of the hypothesized role of these neurons during habituation. Our current efforts are aimed at identifying critical neurons driving habituation plasticity, and we will perform such analysis once we have mechanisms for identifying these neurons.

      Finally, we agree that (3) is the obvious and only way to demonstrate causation here, and this is what we are working towards. However, since we currently have no means of genetically targeting these neurons, we are not able to perform these suggested experiments today.

      I have some additional concerns that I would really appreciate you addressing:

      1) The behavioral habituation is striking in the freely moving larvae, but very hard to monitor in the larvae that are immobilized for calcium imaging. Are there steps that could be taken in the long run to improve direct observation of the habituation effect in these semi-stationary fish? For instance, is it possible to observe eye movements or some more subtle behavioral readout than the O-bend reflex? I apologize if this is a naïve question, but I am not entirely familiar with this specific experimental paradigm.

      In the Dark Flash paradigm, we do not have readouts beyond the “O-bend” response itself, which is characterized by a large-angle bend of the tail and turning maneuver. We have not observed other, more subtle behavioural responses, such as eye or fin movements, for example. If we were able to identify alternative behavioural outputs that were more robustly performed during head-embedded preparations, this would indeed be an advantage, allowing us to more directly interpret the Ca2+ imaging results with respect to behaviour.

      2) The dark flash as a stimulus to which the larvae habituate is obviously used as a powerful and ethologically relevant stimulus. However, it does leave an element of traditional habituation paradigms out, which is a novel stimulus that can be used to immediately re-instate the habituated response (otherwise known as dishabituation). Is there a way that you can imagine implementing that with zebrafish larvae, for instance through systematically altering a visual feature, such as spatial frequency or orientation? This would be a powerful development in my view as it would not only allow you to rule out motor or sensory fatigue as an underlying cause of reduced behavior but also it would provide an extra feature that strengthens your assessment of neuronal response profiles in candidate populations of inhibitory and excitatory neurons.

      We agree that identifying a dishabituating stimulus would be very powerful for our experiments. For short-term habituation of the acoustic startle response, Wolman et al demonstrated that dishabituation occurs after a touch stimulus (Wolman et al., PNAS, 2011; https://doi.org/10.1073/pnas.1107156108). We attempted to dishabituate the O-Bend response with tap and touch stimuli, and this unfortunately did not occur. Our understanding of dishabituation is that this generally requires a second stimulus that elicits the same behaviour as the habituated stimulus (e.g. both acoustic and touch-stimuli elicit the Mauthner-dependent C-bend response). In zebrafish the only stimulus that has been identified that elicits the O-bend is a dark-flash. This lack of an appropriate alternative stimulus is perhaps why we have been unsuccessful in identifying a dishabituating stimulus.

      3) You have written about the concept of 'short' and 'long' response shapes when using calcium imaging as a proxy for neural activity, surmising that the short response shape may reflect transient bursting. Although calcium imaging obviously has many advantages, this feature reveals one notable limitation of calcium imaging in contrast to electrophysiology, in that the time course of the signal is considerably longer and does not allow you with confidence to fully detect the response profile of neurons. Is there some kind of further deconvolution process that you could implement to improve the fidelity of your calcium imaging to the occurrence of action potentials? The burstiness of neurons is obviously important as it can indicate a particular type of neuron (for instance fast-spiking inhibitory neurons) or it might reveal a changing influence on post-synaptic neurons. For instance, bursting can be a response to inhibition due to the triggering of T-type calcium channels in response to hyperpolarization.

      One of the major limitations of Ca2+ imaging is the lack of temporal resolution. Our particular approach, using nuclear-targeted H2B-GCaMP indicators, further reduces our temporal resolution. Deconvolution approaches can be used in some instances to approximate spike rate, since the rise-time of Ca2+ indicators can be relatively fast. However, we chose to image larger volumes at the expense of scan rate, so our imaging is performed at only 2 Hz. Therefore, deconvolution and spike-rate estimation are not appropriate. Considering these limitations, we would argue that the fact that we can observe differences in kinetics between the 'short' and 'long' response shapes indicates that these neurons likely have very different underlying response kinetics, which we hope to confirm by electrophysiology once we have established ways of targeting these neurons for recordings.

      4) I note that among the many substances you screened with is MK801. An obvious candidate mechanism in habituation is the NMDA receptor, given the importance of this receptor for so many forms of learning and bidirectional synaptic plasticity. If I am to understand correctly, this NMDA receptor blocker actually enhances habituation in the zebrafish larvae, similar to melatonin. That is a very surprising observation, which is worth looking into further or at least discussed in the manuscript. The finding would, at least, be consistent with the idea that plasticity is not occurring at excitatory synapses and could potentially bolster the argument that plasticity of inhibitory synapses is at play in this particular form of habituation.

      This is a very important point. We were also particularly interested in MK801, which has been shown to inhibit other forms of habituation, like short-term acoustic habituation (Wolman et al., PNAS, 2011; https://doi.org/10.1073/pnas.1107156108). In our experiments we did see that fish become even less responsive to dark flashes when treated with MK-801 (SSMD fingerprint data: Prob-Train = -0.39, Prob-Test = -1.58) which would indicate that MK-801 promotes dark flash habituation, similar to Melatonin. However, we also observed that MK-801 caused a decrease in the performance in the other visual assay we tested: the optomotor response (OMR-Perf = -0.93), indicating that MK-801 causes a generalized decrease in visual responses, perhaps by acting on circuits within the retina. Therefore, based on these experiments with global drug applications, we cannot determine if MK-801 influences the plasticity process in dark-flash habituation, and this is why we did not pursue it further in this project.
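
      For readers unfamiliar with the SSMD (strictly standardized mean difference) values quoted above, the following is a minimal sketch of the textbook definition for two independent groups: the difference in group means divided by the square root of the summed variances. It is an illustration only; the exact estimator used to build the fingerprints is not restated in this response, and the example values are hypothetical rather than data from the screen.

```python
from math import sqrt
from statistics import mean, variance


def ssmd(treated, control):
    """SSMD for two independent groups: difference in means divided by
    the square root of the summed sample variances."""
    return (mean(treated) - mean(control)) / sqrt(variance(treated) + variance(control))


# Hypothetical per-larva response probabilities (not data from the screen).
treated = [0.2, 0.3, 0.4]
control = [0.5, 0.6, 0.7]
score = ssmd(treated, control)
print(round(score, 2))
```

      A negative score, as in this toy example, means the treated group responds less than the vehicle controls, which is the sense in which the negative Prob-Train and Prob-Test values above indicate reduced responsiveness.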

      Anyway, I hope that you take these suggestions as constructive and, in the spirit that they are intended, as possible routes for improving an already very interesting manuscript.

      We are very grateful for your suggestions, which we feel have helped us to improve our manuscript substantially.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the manuscript is well-written, but confusing at times. The results are not always presented in a consistent way, and I found myself having to dig in the raw data or code to find answers. There is a certain disconnect between the free-swimming results, and the calcium imaging, which is somewhat inevitable based on other published work. But I am unsure of what they each bring to the other, as the results from Fig.6 do not match at all the changes observed in the behavioural assays, it almost feels like two separate studies and the inconsistencies make the model appear unlikely.

      We agree that there is a disconnect at the behavioural level in our free-swimming and head-embedded imaging experiments. However, this does not necessarily mean that the activity we observe during the imaging experiments cannot be informative about processes that are also occurring in freely-swimming fish. For example, it is possible that the dark-flash circuit is responding and habituating similarly in the head-embedded and freely-swimming preparations, but that in the head-embedded context there is an additional blockade on motor output that massively decreases the propensity of the fish to initiate any movements. In such a case, the “disconnect between the free-swimming results, and the calcium imaging” would indicate that the relationship between neural activity and habituation behaviour is rather complex.

      Without a method to record activity from freely swimming fish at our disposal, we cannot determine this one way or the other.

      We hope that we now acknowledge these concerns appropriately in the discussion:

      “We also found that the same pharmacological treatments that result in strong alterations to habituation behaviour in freely swimming larvae ([fig:5]), resulted in relatively subtle and complex functional alterations in the circuit ([fig:6]). Making direct comparisons between freely-swimming behaviour and head-fixed Ca2+ imaging is always challenging due to the differences in behaviour observed in the two contexts, and therefore our failure to identify a clear logic in these experiments may have technical explanations that will require approaches to measure neural activity from unrestrained and freely-behaving animals to resolve. Alternatively, these results are again consistent with the idea that habituation is a multidimensional and perhaps highly non-linear phenomenon in the circuit, which cannot be captured by a simple model.”

      I am not convinced by the results surrounding GABA, from the inconsistent GABA receptor antagonist profile to the post hoc identification of GABAergic neurons as it is currently done in the manuscript. I think that the current focus on GABA does a disservice to the manuscript. However, the novel findings surrounding the potential role of Melatonin, and Estrogen, in habituation are quite interesting.

      We agree that we focused too heavily on our hypothesized role for GABA in our original manuscript, and we hope that the reviewer agrees that our updated manuscript is an improvement. We also thank the reviewer for their interest in our Melatonin and Estrogen results, for which follow up studies are ongoing to characterize the effects of these hormones and their receptors on habituation.

      There is an assumption that all the adaptation profiles are related to the DF (although that is somewhat alleviated in the discussions of the ON responses) and not to the luminosity changes. But there is no easy way to deconvolve those two in the current experiments. I would like the timing of the fluorescence rise to be quantified relative to the dark flash stimulus onset; spike inference methods could potentially help to give a better idea of the timing of those responses. Based on the behavioural responses that were <500 ms in Randlett O et al, eLife, 2019, we would expect only the fastest DF responses to be linked to the behaviour.

      We agree that we are unable to disambiguate responses to the dark flash that initiate the O-bend response, and those that are related to only changes in luminosity. As discussed above, our Ca2+ imaging approach is severely limited in temporal resolution and therefore spike inference methods are not appropriate.

      Major comments

      Fig.1: There seems to be a very variable lag between the motor events and DF responses; furthermore, it does not seem that the motor responses follow a similar habituation rate as in 1Bi. Although this only shows the smoothed 'movement cluster' from the rastermap, it could hide individual variability. It would be important to know what the 'escape' rate was in the embedded experiment, as Fig.1 sup.1 seems to indicate there was little to no habituation. It would also be necessary to know which motor events are considered linked to the DF stimulus, and how that was decided. Was there a movement intensity threshold and lag limit in the response?

      We interpret this concern as relating to the data presented in Figure 6A, where we quantify the habituation rate in the head-embedded experiments. As we have discussed, both above and in the manuscript, we saw very strongly muted responses to DFs in the head-embedded preparation, but we neglected to describe our method of quantifying the responses. We have added the following description to the methods:

      “To quantify responses to the dark flash stimuli we used motion artifacts in the imaging data to identify frames associated with movements ([fig:1]-[fig:S1]). Motion artifact was quantified using the “corrXY” parameter from suite2p, which reflects the peak of phase correlation comparing each acquired frame and reference image used for motion correction. The “motion power” was quantified as the standard deviation of a 3-frame rolling window, which was smoothed in time using a Savitzky-Golay filter (window length = 15 frames, polyorder = 2). A response to a dark flash was defined as a “motion power” signal greater than 3 (z-score) occurring within 10 seconds of the dark-flash onset, and was used to quantify habituation in the head-embedded preparation ([fig:6]A).”
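
      The quantification described in this methods passage can be sketched as follows. This is a minimal standard-library illustration, not the analysis code itself. Several details are assumptions: the rolling window is centred, the z-score is taken over the whole recording, the Savitzky-Golay filter uses mirrored edges with closed-form quadratic weights (in practice a library routine such as scipy.signal.savgol_filter would normally be used), and the trace and flash onsets are synthetic. The 2 Hz default is the imaging rate mentioned elsewhere in this response.

```python
from statistics import mean, pstdev


def rolling_std(trace, window=3):
    """Standard deviation in a centred rolling window (shorter at the edges)."""
    half = window // 2
    return [pstdev(trace[max(0, i - half): i + half + 1]) for i in range(len(trace))]


def savgol_quadratic(trace, window=15):
    """Savitzky-Golay smoothing (polyorder 2) using closed-form centre weights.

    Edges are handled by mirroring the trace, which is an assumption.
    """
    m = window // 2
    norm = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    weights = [3 * (3 * m * m + 3 * m - 1 - 5 * k * k) / norm for k in range(-m, m + 1)]
    padded = trace[m:0:-1] + trace + trace[-2:-m - 2:-1]
    return [sum(w * padded[i + j] for j, w in enumerate(weights)) for i in range(len(trace))]


def zscore(values):
    """Z-score over the whole recording (assumed normalisation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def detect_responses(corr_xy, flash_onsets, frame_rate_hz=2.0,
                     threshold=3.0, response_window_s=10.0):
    """Return one boolean per dark flash: did motion power exceed the threshold?"""
    motion_power = zscore(savgol_quadratic(rolling_std(corr_xy, 3), 15))
    n_frames = int(response_window_s * frame_rate_hz)
    return [any(p > threshold for p in motion_power[onset:onset + n_frames])
            for onset in flash_onsets]


# Synthetic corrXY trace: stable registration with one movement artifact.
corr_xy = [1.0 + 0.001 * ((i * 7) % 5 - 2) for i in range(600)]
for frame in (200, 202, 204):
    corr_xy[frame] = 0.5  # registration quality drops while the fish moves
flash_onsets = [198, 400]  # frame indices of two hypothetical dark flashes
responses = detect_responses(corr_xy, flash_onsets)
print(responses)
```

      In this toy trace only the first flash is followed by a movement artifact, so only the first flash is scored as a response.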

      Line 94: This seems to be a strong claim based on the sparse presence of non-habituating, or potentiating, neurons in downstream regions. However, these neurons appear to be extremely rare, and as mentioned in my comment above, the behavioural habituation appears minimal. These neurons could encode the luminosity and be part of other responses, such as light-seeking in Karpenko S et al, eLife, 2020 or escape directionality in Heap et al, Neuron, 2018. Furthermore, dimming information has been shown to have parallel processing pathways in Robles E et al, JCN, 2020; so it would make sense that not all the observed responses in this manuscript would be involved in behavioural habituation to dark flashes.

      We agree that without functional interventions, we do not know which of the neurons we have categorized are specifically involved in the dark flash response habituation. It is possible that the non-adapting and potentiating neurons are involved in other behaviours. We have therefore removed this statement.

      Line 103: It appears that several of those responses are to the changes in luminosity and not the DF itself, especially the ON and sustained responses. Based on the previous DF habituation study from Randlett O et al, eLife, 2019, the latency of the response is below 0.5 s. So the behaviour-relevant responses must only include the shortest-latency ones, as discussed above.

      We appreciate the point that the reviewer is making here, but we are less clear about what the difference between a “changes in luminosity” response and a “dark flash” response is, since a dark flash consists of a change in luminosity. We take it that the reviewer means the difference between a luminance stimulus that elicits an O-bend and one that does not. In order to disambiguate the two, one would likely need to use stimuli in which the luminosity changes but which do not elicit O-bends.

      Perhaps due to the limited temporal resolution of our Ca2+ imaging data, we do not see a clear difference in the onset of the stimulus response for any of the functional clusters that would help us to determine which neurons are more relevant to the acute DF response.

      Fig.2B. It is very difficult to make out the actual average z-scored fluorescence, a supplementary figure would help by making these bigger. A plot to quantify the maximum response would also be useful to judge how it changes between the first few and few last DF. Another plot to give the time between the onset of the responses and the onset of the DF stimulus is also needed to judge which cluster may be relevant to the DF escapes observed in the free-swimming experiments.

      We agree with the reviewer that interpreting these datasets is challenging. We did include the actual average z-scored fluorescence in Figure 6—figure supplement 1, panel D. This figure also includes a comparison with the predicted Ca2+ response to the dark flash (the stimulus convolved with the approximate GCaMP response kernel), which shows that all OFF-responding neuronal classes show very similar rise time response kinetics, and thus this analysis does not help to judge whether a cluster is more or less relevant to O-bend responses in the free-swimming experiments. We appreciate that there are differences in opinion about the best way to present the data, but we have opted to leave our original presentation.
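
      To make the phrase “the stimulus convolved with the approximate GCaMP response kernel” concrete, here is a minimal sketch of how such a predicted response can be built. All numerical values other than the 2 Hz imaging rate are hypothetical placeholders: the decay constant, the duration of darkness, the ramp length, and the single-exponential kernel shape are assumptions for illustration and are not taken from the manuscript.

```python
import math


def exponential_kernel(tau_s, frame_rate_hz, duration_s):
    """Unit-area exponential decay kernel sampled at the imaging rate."""
    n = int(duration_s * frame_rate_hz)
    kernel = [math.exp(-(i / frame_rate_hz) / tau_s) for i in range(n)]
    total = sum(kernel)
    return [k / total for k in kernel]


def causal_convolve(signal, kernel):
    """Convolve so each output frame depends only on current and past input."""
    out = []
    for t in range(len(signal)):
        out.append(sum(w * signal[t - j] for j, w in enumerate(kernel) if t - j >= 0))
    return out


FRAME_RATE_HZ = 2.0  # imaging rate stated in this response
TAU_S = 3.0          # hypothetical indicator decay constant
DARK_S = 1.0         # hypothetical duration of full darkness
RAMP_S = 20.0        # hypothetical duration of the slow return to baseline

# "Darkness" regressor: 0 at baseline, 1 at full darkness, linear ramp back.
baseline = [0.0] * 20
dark = [1.0] * int(DARK_S * FRAME_RATE_HZ)
ramp_frames = int(RAMP_S * FRAME_RATE_HZ)
ramp = [1.0 - (i + 1) / ramp_frames for i in range(ramp_frames)]
stimulus = baseline + dark + ramp + [0.0] * 40

kernel = exponential_kernel(TAU_S, FRAME_RATE_HZ, 15.0)
predicted = causal_convolve(stimulus, kernel)
onset = len(baseline)
peak = max(range(len(predicted)), key=predicted.__getitem__)
print(onset, peak)
```

      Because the kernel smears the stimulus in time, the predicted peak lags the stimulus onset by several frames, which illustrates why similar rise kinetics across clusters are hard to separate at this sampling rate.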

      Line 130: Is a correlation below 0.1 meaningful or significant? It does not seem like this cluster would be a motor or decision cluster.

      Our goal with this correlational analysis to motor signals was to identify whether certain clusters of DF-responsive neurons were more associated with motor output, and therefore may be more downstream in the sensori-motor cascade. Cluster 4 showed the highest median correlation across the population of cells. Whether a median correlation of ~0.1 is “meaningful” is impossible for us to answer, but it is highly “significant” in the statistical sense, as is evident from the 99.99999% confidence intervals plotted. We note that these cells were not selected based on their correlation to the motor signals, but only to the dark flash stimulus. There are “motor” clusters that show much higher correlations to the motor signals, as is evident in Figure 1G.

      Line 165: Did the changes observed for Pimozide fall below the significance threshold, were lethal, or were the results not repeated? It does not appear in source data 2.

      Pimozide was lethal in our screen and therefore does not appear in the source data file. Indeed, in our previous experiments with Pimozide we had already established that a 10 µM dose is lethal, and that the maximal effective dose we tried was 1 µM, as reported in (Randlett et al., Current Biology, 2019).

      We have clarified this in the text:

      “While the false negative rate is difficult to determine since so little is known about the pharmacology of the system, we note that of the three small molecules we previously established to alter dark flash habituation that were included in the screen, Clozapine, Haloperidol and Pimozide, the first two were identified among our hits while Pimozide was lethal at the 10 µM screening concentration.”

      Fig.1B and Fig.3B are the same data, which is awkward and should be explicitly stated. But the legends do not match in terms of the rest period. Which is correct? It is also important to note the other behavioural assays in the 'rest' period.

      We thank the reviewer for pointing out this discrepancy in the legend. We have corrected the typo in the figure legend of Figure 3B:

      “Habituation results in a progressive decrease in responsiveness to dark flashes repeated at 1-minute intervals, delivered in 4 training blocks of 60 stimuli, separated by 1hr of rest (from 0:00-7:00).”

      We have also added a statement that the data is the same as that in Figure 1B.
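
      As a quick arithmetic check of the corrected legend, the schedule it describes can be enumerated as below. This is a sketch under the assumption that each flash falls at the start of its 1-minute interval and that rest periods occur only between blocks; the function name and parameters are illustrative.

```python
def dark_flash_schedule(n_blocks=4, flashes_per_block=60, interval_min=1, rest_min=60):
    """Return flash onset times in minutes from the start of training."""
    block_length = flashes_per_block * interval_min
    onsets = []
    for block in range(n_blocks):
        block_start = block * (block_length + rest_min)
        onsets.extend(block_start + i * interval_min for i in range(flashes_per_block))
    return onsets


onsets = dark_flash_schedule()
total_min = onsets[-1] + 1  # end of the final 1-minute interval
print(len(onsets), total_min / 60)
```

      Four 60-minute training blocks plus three 1-hour rests give 240 flashes over 7 hours, which matches the 0:00-7:00 range stated in the legend.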

      Figure 3-4: SSMD fingerprint, there is no description of the different behavioural parameters. What they represent is left to the reader's inference. There is no mention of SpontDisp in the GitHub for example, so it is hard to know how these different parameters were measured. Even referring to the previous manuscript on habituation (Randlett O et al, eLife, 2019) does not shed light on most of them; for example, I suppose TwoMvmt represents the 'double responses' from the previous manuscript. Furthermore, there are inconsistencies between 3C and 4B, some minor (SpontDisp becomes SpntDisp), but Curve-Tap has disappeared, for example, and I suspect became BendAmp-Tap. A more thorough description of these measures, and making the naming scheme consistent, are essential for readers to know what they are looking at.

      We again thank the reviewer for their careful assessment of our data, and we apologize for this sloppiness. We have gone through and made the naming of these parameters consistent in both figures, and have added another supplementary table that describes in more detail what each parameter is, and how it relates to the analysis code (Figure3_sourcedata3_SSMDFingerprintParameters.xls). This was an essential missing piece of information from our original manuscript.
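
      For readers unfamiliar with the metric behind these fingerprints, the following is a minimal sketch of how one fingerprint entry could be computed, assuming SSMD denotes the strictly standardized mean difference between two independent groups (treated versus control larvae). The function name `ssmd` and the example values are hypothetical, and the exact estimator used in the analysis code is not stated in this response.

```python
import numpy as np


def ssmd(treated, control):
    """Strictly standardized mean difference for two independent groups.

    Assumed estimator: (mean_t - mean_c) / sqrt(var_t + var_c),
    using sample variances (ddof=1).
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    mean_diff = treated.mean() - control.mean()
    spread = np.sqrt(treated.var(ddof=1) + control.var(ddof=1))
    return mean_diff / spread


# Hypothetical values of one behavioural parameter for two groups.
example = ssmd([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

      A negative value indicates that the parameter is lower in the treated group than in controls; one such value per behavioural parameter would make up a fingerprint.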

      Line 206: While this prioritization makes sense, how was it implemented, how was the threshold decided and which were they? A table, or supplementary figure, would help to clarify the reason behind the choices. Fig.4C being cropped only around the response probability makes it impossible to judge if the criteria were respected, as the main heatmap is too small. For example, the choice of GABA receptor antagonists is somewhat puzzling, as besides PTX it does not seem that the other compounds had strong effects, with Amoxapine for example having seemingly as much effect on Naive and Train, with little in Test. And Bicuculline gave negative SSMD for prob in the three cases. The dose-response for PTX does lend credence to its effect, but I would have liked the other compounds, especially bicuculline. The melatonin results, for example, are much more convincing and interesting in our opinion.

      While in hindsight it may have been possible to do the hit prioritization in a systematic way using thresholding and ranking, we did this manually by inspecting the clustered fingerprints. We have clarified this in the text: “This manual prioritization led to the identification of the GABAA/C Receptor antagonists…”

      While we agree that it is not possible to judge how well we performed this prioritization based on the images presented, we note that we do provide the full fingerprint data in the supplementary data, from which the reader is welcome to draw their own conclusions.

      We have not performed further experiments with amoxapine, so we cannot comment further on this. We did perform additional experiments with bicuculline, in which we saw effects similar to those of PTX, where habituation was inhibited. However, the effects are weaker and more variable than what we observe with PTX, and bicuculline also inhibits the initial responses of the larvae, causing their Naive response to be lower. Therefore we did not include it in our manuscript. We include these data here in Author response image 1 to reassure the Reviewer that picrotoxinin is not the only GABA Receptor antagonist for which we see inhibitory effects on habituation.

      Author response image 1.

      Fig.6: Why was the melatonin concentration used only 1um instead of 10um on the screen?

      Based on dose response experiments (Figure 5B, and others not shown), we found that the effect of Melatonin on habituation saturates at about 1uM, and therefore we used this dose.

      Line 277: As the correlation with motor output is marginal at best, and the authors recognize the lack of behaviour in tethered animals, I would be careful about such speculation. Especially since the other changes are complex and go in all directions.

      While we appreciate the reviewer's caution, we feel that our statement is appropriately hedged using “might be”. We have also removed the statement “and thus is most closely associated with behavioural initiation”.

      We now state:

      “However, opposite effects of PTX and Melatonin were observed for 4_L^{strgD} neurons ([fig:6]C), which we found to be most strongly correlated with motor output ([fig:2]F). Therefore, this class might be most critical for habituation of response Probability.”

      Fig.7: I am not sure how convincing these results are. 7F may have been more convincing, but to be thorough the authors would need to register the Gad1b identity to the calcium imaging and use their outline to extract the neuron's fluorescence. As it is, in the tectum, it is hard to be sure that all the identified neurons are indeed Gad1b positive, as that population is intermingled with other neuronal populations. The authors should consider the approach of Lovett-Barron M et al, Nat Neuro, 2020. Alternatively, the authors can tone down the language used in this section to match the confidence level of the association they propose.

      Figure 7A-E are what can be considered “virtual colocalization” analyses, where we compare the localization of data acquired in different experiments using image registration to common atlas coordinates. We agree that these results alone will never be very strong evidence for the identification of individual cells. The MultiMAP approach of Lovett-Barron is powerful, though it assumes that registration accuracy will be subcellular, which in practice may often not be the case. We believe that a better approach is to label the cells of interest during the Ca2+ imaging experiment itself, as we did in Figure 7F and G. The challenge in this experiment is binarizing the ROIs and thus deciding what is and is not a Gad1b-positive cell. In our opinion, the fact that these two independent experiments came to the same conclusion regarding Clusters 10 and 11 is good evidence that these cell types are likely predominantly GABAergic.

      As discussed above, we have re-written the manuscript to tone down our claims about the role of GABA and GABAergic neurons in habituation, which we hope the reviewer will agree better reflects the limitations of the data in Figure 6 and 7.

      Line 317: Based on the somewhat inconsistent results of the other GABA antagonists, I would be careful. Picrotoxin has been reported to antagonize other receptors besides GABA, see Das P et al, Neuropharma, 2003. So the results may be explained by a complex set of effects on multiple pathways with PTX.

      Off-target effects are an important concern with any pharmacological experiment, and perhaps especially in zebrafish, where receptors and targets can be quite divergent from those in mammals, in which most drug targets have been characterized. We have added this sentiment to the discussion:

      “We cannot rule out the possibility that off-targets of PTX, or subtle non-specific changes in excitatory/inhibitory balance alter habituation behaviour.”

      Line 400-403, 430: There are some conflicting statements regarding the potential role of clusters 1 and 2 in DF habituation. Do the authors think they play a role in the behaviour measured in this manuscript? Could they clarify what they mean?

      We see how our original statement in line 429 about the presence of cluster 1 and 2 neurons in the TL implied a role in dark flash habituation. This was not our intent, and we have removed “which also contains high concentrations of on-responding neurons”.

      Our thoughts on these neurons are now stated in the discussion as:

      “We also observed classes exhibiting an On-response profile ( and ). These neurons fire at the ramping increase in luminance after the DF, making it unlikely that they play a role in aspects of acute DF behaviour we measured here. These neurons exist in both non-adapting and depressing forms suggesting a yet unidentified role in behavioural adaptation to repeated DFs.”

      Minor comments

      Line 73 (and elsewhere): Why use adaptation instead of habituation (also in the adaptation profile)? Do you suspect your observations do not reflect habituation, but a sensory adaptation mechanism?

      We have used the convention that “habituation” refers to observations at the behavioural level, while “depression” and “potentiation” refer to observations at the neuronal level. We use the term “adaptation” to refer to neuronal adaptations of either sign (depression or potentiation), as in line 73.

      We believe that our observations reflect neuronal adaptations that underlie habituation behaviour.

      Line 71: It is debatable that the strongest learning happens in the first block, the difference between the first and last response seems to grow larger with each successive block. What do the authors mean by 'strongest'

      We agree that “strongest” was ambiguous. We have changed this to “initial”:

      “We focused on a single training block of 60 DFs to identify neuronal adaptations that occur during the initial phase of learning”

      Fig.1F: there is no rastermap call in the GitHub repository, was the embedding done in the GUI? If so, it should also be shared for reproducibility's sake.

      Yes, Fig.1F was created using the suite2p GUI, as we have now clarified in the methods:

      “The clustered heatmap image of neural activity ([fig:3]F) was generated using the suite2p GUI using the “Visualize selected cells” function, and sorting the neurons using the rastermap algorithm”

      The image is available in the “Figure1 - Ca2Imaging.svg” file available here: https://github.com/owenrandlett/lamire_2022/tree/main/LamireEtAl_2022

      Line 101: while true that AffinityPropagation does not require input on the number of clusters, preference can influence the number of clusters. It seems that at least two values were tested in the search for the clusters, can the authors comment on how many clusters the other preference value converged (or failed to converge) on?

      Indeed, as with any clustering approach, the resultant clusters are highly dependent on the input parameters, in this case the “preference”, as well as “damping” and the choice of affinity metric. By varying these parameters one can arrive at anywhere between 2 and hundreds of clusters.

      It is for this reason that we feel that the anatomical analysis of these clusters is very important. It rests on the assumption that neurons of differing functional types will have different localizations in the brain, as we explained in the Results:

      “While these results indicate the presence of a dozen functionally distinct neuron types, such clustering analyses will force categories upon the data irrespective of if such categories actually exist. To determine if our cluster analyses identified genuine neuron types, we analyzed their anatomical localization ([fig:2]C-E). Since our clustering was based purely on functional responses, we reasoned that anatomical segregation of these clusters would be consistent with the presence of truly distinct types of neurons.”

      We also acknowledge in the Results that the clustering approach has limitations:

      “These results highlight a diversity of functional neuronal classes active during DF habituation. Whether there are indeed 12 classes of neurons, or if this is an over- or under-estimate, awaits a full molecular characterization. Independent of the precise number of neuronal classes, we proceed under the hypothesis that these clusters define neurons that play distinct roles in the DF response and/or its modulation during habituation learning”

      Fig.2. My understanding is that the cluster numbers are arbitrary unless there is a meaning to them, which then should be explained. I would recommend grouping the clusters per functional category as in Fig.6 to make it easier for the reader.

      Cluster number reflects the ordering in the hierarchical clustering tree shown in Figure 2B. We feel that this is the most logical representation of their functional similarity. We have clarified this in the Methods:

      “We then used the Affinity Propagation clustering from scikit-learn, with “affinity” computed as the Pearson product-moment correlation coefficients (corrcoef in NumPy), preference=-9, and damping=0.9, and clustered using Hierarchical clustering (cluster.hierarchy in SciPy). Cluster number was assigned based on the ordering of the hierarchical clustering tree.”
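
      The quoted Methods passage can be illustrated with a short sketch. Only the affinity metric (NumPy `corrcoef`), `preference=-9`, and `damping=0.9` come from the text above. The synthetic traces, the helper name `renumber_by_tree`, the choice to build the tree from per-cluster mean traces with average linkage and a correlation metric, and the `max_iter` and `random_state` settings are assumptions made for illustration, not the authors' analysis code.

```python
import numpy as np
from scipy.cluster import hierarchy
from sklearn.cluster import AffinityPropagation


def renumber_by_tree(traces, labels):
    """Renumber clusters 1..K by their left-to-right order in a dendrogram
    built from the mean trace of each cluster (an assumed procedure)."""
    ids = np.unique(labels[labels >= 0])
    if ids.size < 2:  # fewer than two clusters: nothing to order
        return labels.copy()
    means = np.stack([traces[labels == k].mean(axis=0) for k in ids])
    tree = hierarchy.linkage(means, method="average", metric="correlation")
    new_labels = np.full_like(labels, -1)
    for rank, pos in enumerate(hierarchy.leaves_list(tree), start=1):
        new_labels[labels == ids[pos]] = rank
    return new_labels


# Synthetic stand-in data: 3 response templates x 20 noisy "neurons" each.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50, endpoint=False)
templates = np.stack(
    [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)]
)
traces = np.repeat(templates, 20, axis=0) + 0.1 * rng.standard_normal((60, 50))

# Affinity = Pearson correlation between every pair of traces (rows).
affinity = np.corrcoef(traces)

# Parameters quoted in the Methods; the affinity matrix is passed precomputed.
ap = AffinityPropagation(
    affinity="precomputed", preference=-9, damping=0.9,
    max_iter=1000, random_state=0,
)
labels = ap.fit_predict(affinity)
ordered = renumber_by_tree(traces, labels)
```

      Changing `preference` in such a sketch changes how many exemplars are selected, which is the sensitivity discussed in the response about line 101.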

      Fig.3 SSMD fingerprint, it would be much easier for the readers if the list of parameters was clearer and rotated 90 degrees. Maybe in a supplementary figure to show what each represents.

      We agree that the SSMD fingerprint is very difficult to interpret. As discussed above, we have now included a supplementary table (Figure3_sourcedata2_SSMDFingerprintParameters.xlsx) where we have clarified what each parameter represents.

      Fig.4: The use of the same colours across the clustering methods is confusing, especially after the use of colours for the SSMD fingerprint in Fig.3. and at the bottom of 4A. Fig.4A for example could have been colour coded according to the most affected behaviour in the fingerprint at the bottom.

      Fig.4B the coloured text is difficult to read, especially for the lighter colours.

      We agree that our use of color is not perfect, but we have attempted to use them consistently: for example when referring to a functional cluster, or a drug manipulation. We don’t think that there is a sufficient number of distinguishable colors for us to never use the same color twice.

      Fig.4C if the goal is to show similarity, the relevant drugs could be placed adjacent to each other. One could also report the Euclidean distance, or compute how correlated the different fingerprints are within one pharmacological target space.

      The goal of Fig 4C is to highlight where Bicuculline, Amoxapine, Picrotoxinin, Melatonin, Ethinyl Estradiol and Hexestrol lie within the clustered heatmap of the behavioural fingerprints (Fig 4A), and demonstrate how the probability of response to dark flashes is modulated by these drugs. In our analyses, “similarity” is a function of the clustering distance.

      Fig.6D 'Same data as M, ...' I assume should be 'Same data as C,...'

      Indeed, thank you for pointing out this error that we have corrected.

      Fig. 7 How many GCaMP6s double transgenic larvae were imaged?

      6 fish were imaged, as is stated in the legend to Fig 7G.

      Line 407: all is repeated.

      We apologize, but we do not see what is repeated at line 407. Can you please clarify?

      Line 481: Would testing spontaneous activity after training for 7h be unbiased, could there be fatigue effects?

      We tested for fatigue effects in our previous study, comparing larvae that received the training for 7hrs and those that did not, and we saw no deficits in spontaneous activity, tap response, or OMR performance (Figure S1, Randlett et al., Current Biology, 2019).

      Line 610: There are some inconsistencies between the authors' contributions in the manuscript and the one provided to eLife.

      Thank you, we will double check this in the resubmission forms. The authors' contributions in the manuscript are correct.

      Reviewer #3 (Recommendations For The Authors):

      I would rather recommend the authors divide this manuscript into two and publish two papers by adding some more strengthening data for each part such as cellular manipulations, e.g. ablation to prove the critical involvement of 12(Pot, M) neurons in habituation.

      We thank the reviewer for their suggestion, but have opted not to split the paper into two. We feel that the collective message of this paper and its approach combining molecular and functional analysis will be of interest, and we believe the incongruencies in our results reflect the complexity inherent within the system.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Major comments:

      1) The authors conclude that the bone growth defects are chondrocyte-specific, highlighting no changes in the IGF pathway. However, other bone cells such as mesenchymal progenitors, osteoblasts, osteocytes, and marrow stromal cells are also lateral plate mesoderm derived and likely have roles in the bone growth phenotypes (a). Additionally, while the size decrease of the proliferative zone was stated, no actual proliferation assays such as BrdU were conducted (b). With the elements being of such small size in the mutants, the defects are likely to be found at the earliest stages of limb development at E11.5-E13.5 and may be due to mesenchymal to chondrocyte transitions or defects in osteoblast lineage development (c). Overall, the skeletal characterization is not rigorous and does not identify even a likely cellular mechanism. Further, a molecular mechanism by which SMN functions in mesenchymal progenitors, chondrocytes, or osteoblast lineage cells has not been assessed (d).

      (a, c) As the reviewer commented, it is important to evaluate whether there is any problem in embryonic development between mesenchymal cell condensation in the limb bud and formation of the primary ossification center. However, when Hensel et al. evaluated bone growth at P3 in severe SMA mice, the growth defect was not very large, with a control femur length of 3.5 mm and a mutant femur length of 3.2 mm. It therefore seems that, even when SMN is deficient, there is no major problem with endochondral bone formation in the embryonic period (Hensel et al., 2020).

      In this study, SMN protein quantification showed that the SMN2 1-copy mutant with the bone growth defect has a reduction in SMN protein similar to that of the severe SMA mouse model. When Hensel et al. performed an in vitro ossification test on primary osteoblasts from another severe SMA mouse model (Taiwanese severe SMA), they found no significant difference compared to controls. In femurs at P3 from severe SMA mice, they found no difference in bone voxel density or bone thickness (Hensel et al., 2020). In our data, bone thickness did not differ (Figure 1 and Figure 1 – figure supplement 2), and BMD was actually greater. Thus, osteoblast and osteocyte function does not appear to be impaired by the absence of SMN. When we examined cortical osteoblasts in our new Figure 1-figure supplement 2, there did not appear to be a significant difference in density.

      Furthermore, it is unlikely that BMSCs contributed to the bone growth we observed up to 2 weeks of age. The Lepr+Cxcl12+ BMSC population, which constitutes 94% ± 4% of CFU-F colonies formed by bone marrow cells (Zhou et al., 2014), is Prrx1-positive and is known to be capable of osteogenesis in vivo, but it was only shown to differentiate into osteoblasts and form new bone in adults over 8 weeks of age. In the Lepr-cre; tdTomato; Col2.3-GFP mouse model, few cells expressing the osteoblast marker Col2.3-GFP are found before 2 months, and only about 3% of femur trabecular and cortical osteocytes express tdTomato at 2 months (Zhou et al., 2014). In the Cxcl12-CreER; tdTomato; Col2.3-GFP mouse model, the researchers did not find tdTomato positivity in osteoblasts and osteocytes even after administration of tamoxifen at P3 and analysis 1 year later (Matsushita et al., 2020).

      We, therefore, concluded that the bone growth abnormalities observed in SMN2 1-copy mutants are due to problems in endochondral ossification caused by chondrocyte defects and not due to other Prrx1-lineage skeletal cells.

      (b) According to the reviewer's suggestion, we evaluated cell proliferation in the new Figure 1J-L by performing immunostaining for the Ki67 proliferation marker in growth plates.

      (d) Following the reviewer's comment, we extended the mechanistic analysis and found reduced chondrocyte-derived IGF signaling and hypertrophic marker expression (new Figure 2). We evaluated the density of osteoblasts and osteoclasts, which can affect bone mineralization. We highlighted the limited impact of BMSCs on bone growth in the first two weeks of life. In a previous study, SMN-deleted osteoblasts did not show any issues with ossification (Hensel et al., 2020). In fact, osteoblast density in the SMN2 1-copy mutant was not different from the control, indicating that the skeletal abnormalities can largely be attributed to deficiencies in endochondral ossification caused by chondrocytes. Chondrocytes are the local source of IGF, and our mutants exhibit phenotypes similar to those of mouse models with reduced IGF: downregulated expression of Igf1 and Igfbp3, downregulated IGF-induced hypertrophic gene expression, and reduced AKT phosphorylation, proliferation, and growth plate zone length. SMN-deleted chondrocytes therefore probably show these phenotypes because of decreased IGF secretion. We have now added new Figure 2A-C and E.

      2) Is the liver the only organ/tissue that supplied IGF to the chondrocytes or are other lateral plate mesoderm-derived cells potential suppliers? It's not possible to pin SMN deletion in chondrocytes as intrinsic ignoring the other bone cell types that it is depleted from in the Prrx1Cre genetic model.

      Recently, Oichi et al. reported, using in situ hybridization and p-AKT staining, that the local IGF source in the growth plate is chondrocytes (Oichi et al., 2023). When we measured IGF in chondrocytes isolated from articular cartilage, the expression of Igf1 and Igfbp3 was markedly reduced in chondrocytes with SMN deletion compared to controls (new Figure 2E), suggesting that intrinsic SMN expression in chondrocytes plays an important role in the growth plate.

      3) Why is SMN protein being isolated from FAPs to assess levels in the null/SMN2 single copy/double copy mutants when the bone defects are supposed to be a chondrocyte-specific phenotype? This protein expression needs to be confirmed in chondrocytes themselves, and or other Prrx1Cre lineaged skeletal cells.

      Following the reviewer’s suggestion, we attempted to evaluate protein levels in chondrocytes of the SMN2 1-copy mutant. However, we were unable to obtain sufficient numbers of chondrocytes because mutant chondrocytes proliferate poorly in culture compared to controls: we could obtain only ~10^4 viable cells from one SMN2 1-copy mutant mouse. Therefore, our only options for confirming SMN deletion in chondrocytes were DNA and RNA analyses. Because the amount of SMN protein in Prrx1-lineage FAPs correlates with the expression level of full-length SMN mRNA (Figure 2H-J), we expect that SMN protein in chondrocytes is likewise depleted, given their poor full-length SMN mRNA expression (Figure 2H).

      4) Figure 2E should have example images of each type of NMJ characterization.

      We revised our figure by adding the example images in new Figure 3E.

      5) What are the overall NMJ numbers in the normal formation period? Are these constant into the juvenile period when the authors say the deterioration occurs?

      We appreciate the reviewer's constructive comments, and it would be interesting to see if we could see a difference in the total number of NMJs. However, there is one NMJ in every myofiber, and each muscle has hundreds to thousands of myofibers. The technical difficulty of confocal imaging an entire muscle, which can be several millimeters across, precludes experiments that count every NMJ and show a difference. It may be possible to do so by combining clearing and confocal line scanning techniques. In our analysis of the NMJ, the formation of the NMJ in the mutant appears to be normal. Additionally, the number of myofibers seems to be the same, and there may be no difference in the total NMJ number.

      6) For transplantation experiments the authors sorted YFP or TOMATO+ cells from the Prrx1Cre mice muscles, but refer to them as FAPs. It is known that other cells including tenocyte-like cells, pericytes, and vascular smooth muscle cells are identified by this reporter line. Staining for TOMATO colocalization with PDGFRA would help to clarify this.

      In the ‘Hindlimb fibro-adipogenic progenitors isolation’ section of the Methods, the sorted 7AAD–Lin–Vcam–Sca1+ population is referred to as FAPs. For FAP transplantation, we likewise used YFP+ or TOMATO+ FAPs (7AAD–Lin–Vcam–Sca1+). The ‘FAPs transplantation’ method section did not specify the FAP population in detail; this has been fixed in the revised Methods. Sca1 (Ly6a), like Pdgfra, is an effective marker for identifying FAPs within Prrx1-lineage cells (Leinroth et al., 2022).

      7) The authors only compare the SMN2 single copy mutant transplantation to contralateral to show rescue, but how does this compare to overall wt morphology?

      According to the reviewer’s constructive comment, we compared them with wild-type morphology (new Figure 7A-D).

      8) The asterisks of TOMATO+ in Figure 6A are confusing. FAPs do not usually clump together to form such large plaques and are normally much thinner tendrils. What is the reason for this?

      As the reviewer states, FAPs have a fibroblast-like morphology with elongated, thin tendrils. The image in Figure 6A shows a Z-slice through the cell body of a FAP, where the nucleus is located, and it therefore appears blunt. We attach images of Tomato+ FAPs in which the cell bodies appear plaque-like.

      Author response image 1.

      Tomato+ FAPs in muscle

      9) Would transplantation of healthy FAPs after NMJ maturation in SMN mutants still rescue the phenotype? Assessment of this is key for therapy intervention timelines moving forward.

      It would be very interesting to see whether transplantation of healthy FAPs improves the phenotype after NMJ maturation, but this experiment is technically difficult because we found that FAPs do not engraft effectively when injected into naive adult muscle. Transplantation into adult muscle is possible if accompanied by an injury, but the injury leads to the formation of new NMJs. Thus, a transplantation experiment after NMJ maturation does not seem feasible with standard methods. If we discover a method to efficiently restore SMN in FAPs, or identify a factor that mediates the influence of FAPs on the NMJ, then we may be able to conduct this experiment.

      Reference

      Hensel, N., Brickwedde, H., Tsaknakis, K., Grages, A., Braunschweig, L., Lüders, K. A., Lorenz, H. M., Lippross, S., Walter, L. M., Tavassol, F., Lienenklaus, S., Neunaber, C., Claus, P., & Hell, A. K. (2020). Altered bone development with impaired cartilage formation precedes neuromuscular symptoms in spinal muscular atrophy. Human Molecular Genetics, 29(16), 2662–2673. https://doi.org/10.1093/hmg/ddaa145

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayahsi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Matsushita, Y., Nagata, M., Kozloff, K. M., Welch, J. D., Mizuhashi, K., Tokavanich, N., Hallett, S. A., Link, D. C., Nagasawa, T., Ono, W., & Ono, N. (2020). A Wnt-mediated transformation of the bone marrow stromal cell identity orchestrates skeletal regeneration. Nature Communications, 11(1). https://doi.org/10.1038/s41467-019-14029-w

      Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Zhou, B. O., Yue, R., Murphy, M. M., Peyer, J. G., & Morrison, S. J. (2014). Leptin-receptor-expressing mesenchymal stromal cells represent the main source of bone formed by adult bone marrow. Cell Stem Cell, 15(2), 154–168. https://doi.org/10.1016/j.stem.2014.06.008

      Reviewer #2

      Major comments:

      1) Regarding bone deficits - CT analysis of bones should be more comprehensive than Figure 1A shows. How about cross-sections? (a) Are bone phenotypes also age-dependent? (b) PCR was done only for SMA and related proteins (such as IGF). IGF protein in the blood and relevant organs should be studied. Why not include biomarkers of osteoblasts or/and osteoclasts and their regulators? (c)

      (a) We appreciate the reviewer’s constructive comment. We added longitudinal section views in new Figure 1A and a description of trabecular bone volume and the secondary ossification center in the main text.

      (b) Age-dependent evaluation is an important point. By adulthood, the difference between the SMN2 1-copy mutant and the control is much larger, and even at birth there is a slight difference, although not as large as at 2 weeks of age. We focused our phenotyping on bone growth at 2 weeks of age, a time when new bone formation by BMSCs is less influential, when bone growth is primarily driven by endochondral ossification of chondrocytes, and before the defect in the NMJ is primarily manifested.

      (c) As the reviewer comments, it is important that IGF is evaluated in tissues other than liver. However, the liver is most likely the source of systemic IGF, as shown by the liver-specific deletion of Igf1 combined with knockout of Igfals, a protein that forms the IGF ternary complex and is predominantly expressed in the liver. This resulted in a 90% drop in serum IGF levels and a phenotype of shortened femur length and growth plates in the double KO mice (Yakar et al., 2002).

      The local IGF source in the growth plate is chondrocytes, as confirmed by Igf1 in situ hybridization and p-AKT staining (Oichi et al., 2023). The in situ hybridization data show that bone marrow and bone do not express Igf1 at all, and that only perichondrium and chondrocytes in the resting zone express Igf1 mRNA. Therefore, chondrocytes appear to be the only supplier of IGF among LPM-derived cells, and in the new Figure 2 we measured IGF pathway expression and AKT phosphorylation in chondrocytes. We have confirmed that the expression of Igf1/Igfbp3 is reduced in chondrocytes with SMN deletion.

      We could not set up an experiment to assess serum IGF levels during our revision period, because of the administrative procedures required to purchase new apparatus and the limits of our research funds. However, as previously stated, there is no difference in the liver expression of Igf1 and Igfals, and the liver accounts for 90% of serum IGF levels. Therefore, we do not anticipate significant variations in serum IGF levels.

      Osteoblasts and osteoclasts were evaluated by section staining because of sampling difficulties for PCR. We assessed the state of osteoblasts and osteoclasts in new Figure 1-figure supplement 2.

      2) What is the relationship between deficits of bone deficits and muscle deficits or even NMJ deficits? Are they inter-related? Is skeletal muscle development also defective in Smn∆MPC mice? Can NMJ deficits result from bone deficits? Or vice versa?

      Unfortunately, these questions are very difficult to resolve in our study using the Prrx1-cre model. Regarding skeletal muscle development, the myofiber number was not significantly different in our mouse models. A study has shown that inactivating noggin, a BMP antagonist expressed in condensed cartilage and immature chondrocytes, results in severe skeletal defects without affecting the early stages of muscle differentiation (Tylzanowski et al., 2006). Therefore, bone may not have a significant impact on the early development of muscle, although later in postnatal development it may contribute to motor performance issues. The relationship between bone and the NMJ has not been studied. The impact of bone defects on motor skill may result in muscle weakness and NMJ problems. In our study, we showed rescue of the NMJ deficit by transplantation of FAPs, and decreased IGF in chondrocytes, a key source of local IGF. This suggests that the functions of FAPs in the NMJ and of chondrocytes in the bone deficit are crucial, rather than the influence of one tissue on the other.

      3) Regarding the rescue experiment, the interpretation of the data should be careful. Evidently, healthy FAPs (td-Tomato positive) were transplanted into TA muscles of 10 days-old SMN2 1-copy SmnΔMPC mice, and NMJs were looked at P56. The control was contralateral TA that was injected with the vehicle. As described above, the data had huge SEM and were difficult to interpret or believe. The control perhaps was wrong if FAPs act by releasing "chemicals" because FAPs from one leg may go to other muscles via blood. Second, if FAPs act via contact, the data shown did not support this. Two red FAPs were shown in Figure 6, one of which was superimposed with a nerve track to one of the three NMJs. This NMJ however did not show any difference to the other two, which did not support a contact mechanism. These rescue data were not convincing.

      We appreciate the reviewer’s critical comment, but the reviewer appears to have confused the minimum and maximum range bars in the box-and-whisker plot with the SEM error bar in the bar graph. We apologize for the insufficient description of the figure legends section. We revised them. New Figure 7C, which is a bar graph, has a sufficiently short SEM error bar. In contrast, box-and-whisker plots B and D depict the minimum and maximum range, instead of the SEM, and they are significantly different with a p-value of less than 0.001. If FAPs affect the NMJ via a paracrine factor or ECM with a short range of action, they may rescue the NMJ defect in a non-contact-dependent manner, without affecting the contralateral muscle. Also, the FAPs are heterogeneous, so if only a certain subpopulation rescues, the tomato+ FAP in the figure may not be the rescuing cells.
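      The distinction drawn above between min/max whiskers and SEM error bars can be illustrated with a minimal sketch. The values below are hypothetical and are not data from the study; the helper names `sem` and `whisker_range` are introduced only for this illustration.

```python
import math
import statistics

# Hypothetical measurements (arbitrary units); NOT data from the study.
values = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 17.0, 13.0]

def sem(xs):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

def whisker_range(xs):
    """Full min-to-max span, as drawn by min/max whiskers in a box plot."""
    return max(xs) - min(xs)

print(f"SEM:           {sem(values):.2f}")
print(f"min-max range: {whisker_range(values):.2f}")
```

      For the same sample, the min/max span is several times larger than the SEM, so whiskers that show the range can look "enormous" if they are read as SEM bars.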

      4) For most experiments, the "n" numbers were too small. 3-5 mice were used for bone characterization. For the NMJ, most experiments were done with 3 mice. It was unclear how many NMJs were looked at. Perhaps due to small n numbers, the SEM values were enormous (for example, in Figure 6).

      As with the response to the previous comment, this is due to confusion between box-and-whisker plots and bar graphs, and our data was determined to be significant using the appropriate statistical method.

      5) Also for experimental design, some experiments included four genotypes of mice (Fig. 1 J,K) whereas some had only three (Fig.1 A, B, C, D and Fig.3) and others had two (many other figures).

      In the first experiments to confirm the phenotypes, we tested the 2-copy mutant, but it was not significantly different from the wild type; in subsequent experiments, we therefore mainly tested only the 1-copy mutant.

      6) What was the reason why mixed muscles were used for NMJ characterization (TA versus EDL)? Why not pick a type I-fiber muscle and a type II-fiber muscle?

      We appreciate the constructive comment from the reviewer. We first conducted a phenotype analysis on the TA muscle. For electrophysiological recording, the EDL muscle was used because it is technically suited to an intact nerve-muscle preparation. Additionally, for TEM imaging, the EDL was a suitable muscle in which to locate NMJ positions before TEM processing. The TA and EDL muscles are adjacent and have similar fiber-type compositions. It would be important to examine muscles of different fiber types, but when we first identified the phenotype, various types of limb muscles showed similar defects, so we focused on specific muscles.

      7) The description of mouse strains was confusing. SMN2 transgenic mice (with different copies) were not described in the methods.

      We apologize for the insufficient description in the Methods section. The mice were generated by crossing with mice homozygous for the SMN2 allele (SMN2+/+): heterozygous mice carrying only one SMN2 allele are SMN2 1-copy mice (SMN2+/0), and homozygous mice are SMN2 2-copy mice (SMN2+/+). We revised the ‘Animals’ section of the Methods in our manuscript.

      Reference Oichi, T., Kodama, J., Wilson, K., Tian, H., Imamura Kawasawa, Y., Usami, Y., Oshima, Y., Saito, T., Tanaka, S., Iwamoto, M., Otsuru, S., & Enomoto-Iwamoto, M. (2023). Nutrient-regulated dynamics of chondroprogenitors in the postnatal murine growth plate. Bone Research, 11(1). https://doi.org/10.1038/s41413-023-00258-9

      Tylzanowski, P., Mebis, L., and Luyten, F. P. (2006). The noggin null mouse phenotype is strain dependent and haploinsufficiency leads to skeletal defects. Dev. Dyn. 235, 1599–1607. doi: 10.1002/dvdy.20782

      Yakar, S., Rosen, C. J., Beamer, W. G., Ackert-Bicknell, C. L., Wu, Y., Liu, J. L., Ooi, G. T., Setser, J., Frystyk, J., Boisclair, Y. R., & LeRoith, D. (2002). Circulating levels of IGF-1 directly regulate bone growth and density. Journal of Clinical Investigation, 110(6), 771–781. https://doi.org/10.1172/JCI0215463

      Reviewer #3

      1) The authors used Prrx1Cre mouse with floxed Smn exon7(Smnf7) mouse carrying multiple (one or two) copies of the human SMN2 gene. Is it expressed both in chondrocytes and mesenchymal progenitors in the limb?

      We appreciate the reviewer's comment. We analyzed the Cre-mediated deletion of Smn in chondrocytes and FAPs using genomic PCR and qRT-PCR, as depicted in new Figure 2. The SMN2 allele, which is expressed throughout the body, can rescue Smn knockout mouse lethality (Monani et al., 2000). Indeed, the short limb length and lethality observed in SMN2 0-copy mutants were mitigated by the presence of multiple copies of SMN2. Therefore, both chondrocytes and FAPs may express SMN2 transcripts from the transgenic SMN2 allele.

      2) Page 10 regarding Fig.2E, please show pretzel-like structure. In Figure 2E, plaque, perforated, open, and branched are shown; however, the pretzel is not shown. The same issue is for the Fig. 3D explanation in the text on page 12.

      We appreciate the reviewer's constructive feedback. We included illustrative figures of all types of NMJ characterization, and the branched type is identical to the pretzel type. Therefore, we have replaced ‘branched’ with ‘pretzel’ in our text and revised Figure 3E by incorporating the example images.

      3) The explanation of the electrophysiology for Fig.4 in the text on pages 12 and 15 (RRP) is not so convincing for the readers. It is advisable to add TEM data for transplantation if it is not technically difficult.

      We appreciate the reviewer's critical feedback. Because we did not measure RRP directly, we removed the speculation about a possible RRP difference. Although observing the active zone and docked synaptic vesicles with TEM would help quantify RRP, it is technically difficult to obtain images of sufficient quality to distinguish the active zones with our current TEM imaging technique.

      4) The authors used the word FAP for 7AAD(-)Lin(-)Vcam(-)Sca1(+). It is recommended to show the expression of PDGFR alpha. Furthermore, as the authors stated in the text, mesenchymal progenitors (FAPs) are heterogeneous. Please discuss this point further. Other reports show at least 6 subpopulations using single-cell analyses (Cell Rep. 2022).

      In that report, Ly6a (Sca1) is a good marker for FAPs, as is Pdgfra (Leinroth et al., 2022). All 6 subpopulations expressed Ly6a. One of the subpopulations was found to be associated with the NMJ: it expressed Hsd11b1, Gfra1, and Ret, is located adjacent to the NMJ, and responds to denervation, indicating an increased possibility of interaction with NMJ organization. In future studies, we aim to determine which subpopulations are crucial for NMJ maturation by transplanting them into mutants for rescue.

      5) How do authors determine the number of FAP cells for transplantation?

      The FAP transplantation was performed as described in our previously reported study (Kim et al., 2022).

      Reference Kim, J. H., Kang, J. S., Yoo, K., Jeong, J., Park, I., Park, J. H., Rhee, J., Jeon, S., Jo, Y. W., Hann, S. H., Seo, M., Moon, S., Um, S. J., Seong, R. H., & Kong, Y. Y. (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight, 7(10). https://doi.org/10.1172/jci.insight.158380

      Leinroth, A. P., Mirando, A. J., Rouse, D., Kobayashi, Y., Tata, P. R., Rueckert, H. E., Liao, Y., Long, J. T., Chakkalakal, J. V., & Hilton, M. J. (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports, 39(6), 110785. https://doi.org/10.1016/j.celrep.2022.110785

      Monani, U. R., Sendtner, M., Coovert, D. D., Parsons, D. W., Andreassi, C., Le, T. T., Jablonka, S., Schrank, B., Rossol, W., Prior, T. W., Morris, G. E., & Burghes, A. H. M. (2000). The human centromeric survival motor neuron gene (SMN2) rescues embryonic lethality in Smn(-/-) mice and results in a mouse with spinal muscular atrophy. Human Molecular Genetics, 9(3), 333–339. https://doi.org/10.1093/hmg/9.3.333

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Detecting unexpected epistatic interactions among multiple mutations requires a robust null expectation - or neutral function - that predicts the combined effects of multiple mutations on phenotype, based on the effects of individual mutations. This study assessed the validity of the product neutrality function, where the fitness of double mutants is represented as the multiplicative combination of the fitness of single mutants, in the absence of epistatic interactions. The authors utilized a comprehensive dataset on fitness, specifically measuring yeast colony size, to analyze epistatic interactions.

      The study confirmed that the product function outperformed other neutral functions in predicting the fitness of double mutants, showing no bias between negative and positive epistatic interactions. Additionally, in the theoretical portion of the study, the authors applied a well-established theoretical model of bacterial cell growth to simulate the growth rates of both single and double mutants under various parameters. The simulations further demonstrated that the product function was superior to other functions in predicting the fitness of hypothetical double mutants. Based on these findings, the authors concluded that the product function is a robust tool for analyzing epistatic interactions in growth fitness and effectively reflects how growth rates depend on the combination of multiple biochemical pathways.

      Strengths:

      By leveraging a previously published extensive dataset of yeast colony sizes for single- and double-knockout mutants, this study validated the relevance of the product function, commonly used in genetics to analyze epistatic interactions. The finding that the product function provides a more reliable prediction of double-mutant fitness compared to other neutral functions offers significant value for researchers studying epistatic interactions, particularly those using the same dataset.

      Notably, this dataset has previously been employed in studies investigating epistatic interactions using the product neutrality function. The current study's findings affirm the validity of the product function, potentially enhancing confidence in the conclusions drawn from those earlier studies. Consequently, both researchers utilizing this dataset and readers of previous research will benefit from the confirmation provided by this study's results.

      Weaknesses:

      This study exhibits several significant logical flaws, primarily arising from the following issues: a failure to differentiate between distinct phenotypes, instead treating them as identical; an oversight of the substantial differences in the mechanisms regulating cell growth between prokaryotes and eukaryotes; and the adoption of an overly specific and unrealistic set of assumptions in the mutation model. Additionally, the study fails to clearly address its stated objective: investigating the mechanistic origin of the multiplicative model. Although it discusses conditions under which deviations occur, it falls short of achieving its primary goal. Moreover, the paper includes misleading descriptions and unsubstantiated reasoning, presented without proper citations, as if they were widely accepted facts. Readers should consider these issues when evaluating this paper. Further details are discussed below.

      (1) Misrepresentation of the dataset and phenotypes

      The authors analyze a dataset on the fitness of yeast mutants, describing it as representative of the Malthusian parameter of an exponential growth model. However, they provide no evidence to support this claim. They assert that the growth of colony size in the dataset adheres to exponential growth kinetics; in contrast, it is known to exhibit linear growth over time, as indicated in [Supplementary Note 1 of https://doi.org/10.1038/nmeth.1534]. Consequently, fitness derived from colony size should be recognized as a different metric and phenotype from the Malthusian parameter. Equating these distinct phenotypes and fitness measures constitutes a fundamental error, which significantly compromises the theoretical discussions based on the Malthusian parameter in the study.

      The reviewer is correct in pointing out that colony-size measurements are distinct from exponential growth kinetics. We acknowledge that our original text implied that the dataset directly measured the exponential growth rate (Malthusian parameter), when in fact it was measuring yeast colony expansion rates on solid media. Colony growth under these conditions often follows a biphasic pattern in that there is typically an initial microscopic phase where cells can grow exponentially, but as the colony expands further then the growth dynamics become more linear (Meunier and Choder 1999). We have revised our text to state clearly what the experiment measured.

      However, while colony size does not exhibit exponential growth kinetics, several studies have argued that the rate of colony expansion is related to the exponential growth rate of cells growing in non-limiting nutrient conditions in liquid culture. This is because colony growth is dominated by cells at the colony boundaries that have access to nutrients and are in exponential growth. Cells in the colony interior lack nutrients and therefore contribute little to colony growth. This has been shown both in theoretical and experimental studies, finding that the linear growth rate of the colony is directly linked to the single-cell exponential growth rate (Pirt 1967; Gray and Kirwan 1974; Korolev et al. 2012; Gandhi et al. 2016; Meunier and Choder 1999). In particular, the above studies suggest that the linear colony growth rate is directly proportional to the square root of the exponential growth rate. Therefore, one would expect that the validity of the product model for one fitness measure implies its validity for the other measure. In addition, colony size was found to be highly correlated with the exponential growth rate of cells in non-limiting nutrients in liquid culture (Baryshnikova et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). For these reasons, we treated the colony size and exponential growth rate as interchangeable in our original manuscript. 

      To address the important point raised by the reviewer, we now explain more clearly in the text what the analyzed data on colony size show and why we believe it is reflective of the exponential growth rate. Finally, we note that our results supporting the product neutrality function are consistent with the work of (Mani et al. 2008), which used smaller datasets based on liquid culture growth rates (Jasnos and Korona 2007; Onge et al. 2007).

      The text in Section 2.3 now reads:

      “Having verified empirically that the Product neutrality function is supported by the latest data for cell proliferation, we now turn our attention to its origins. Addressing this question requires some mechanistic model of biosynthesis. However, most mechanistic models of growth apply directly to single cells in rich nutrient conditions, which may not directly apply to the SGA measurements of colony expansion rates. In particular, colony growth has been shown to follow a biphasic pattern (Meunier et al. 1999). A first exponential phase is followed by a slower linear phase as the colony expands. Previous modeling and empirical work indicates that this second linear expansion rate reflects the underlying exponential growth of cells in the periphery of the colony (Pirt 1967; Gray et al. 1974; Gandhi et al. 2016; Baryshnikova, Costanzo, S. Dixon, et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). More precisely, mathematical models show the linear colony-size expansion rate is directly proportional to the square root of the exponential growth rate under non-limiting conditions. Intuitively, this relationship arises because colony growth is dominated by the expansion of the population of cells in an annulus at the colony border that are exposed to rich nutrient conditions. These cells expand at a rate similar to the exponential rate of cells growing in a rich nutrient liquid culture. In contrast, the cells in the interior of the colony experience poor nutrient conditions, grow very slowly, and do not contribute to colony growth.

      This intimate relationship between both proliferation rates allows us to explore the origin of the Product neutrality function in mechanistic models of cell growth. Indeed, if colony-based fitnesses follow a Product model, then

      where the superscript c indicates colony-based values for the fitness W and the growth rate λ. Taking into account the relationship between single-cell exponential growth rates and colony growth rates, we can write

      where the superscript l denotes liquid cultures. Combining these expressions, we obtain

      In other words, from the perspective of the Product neutrality function, fitnesses based on colony expansion rates are equivalent to fitnesses based on single-cell exponential growth rates. The prevalence of the Product neutrality model—both in the SGA data and in previous studies on datasets from liquid cultures (Jasnos et al. 2007; Onge et al. 2007; Mani et al. 2008)—encourages the exploration of its origin in mechanistic models of cell growth.”
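      The displayed equations of the quoted passage are not reproduced above. One reconstruction consistent with the surrounding sentences is sketched below. It assumes that fitness is a rate normalized to the wild-type rate and that the colony rate equals a constant times the square root of the liquid-culture rate; the subscripts wt, 1, 2, 12 and the constant α are notation introduced here and may differ from the authors' symbols.

```latex
% Product model for colony-based fitness (fitness = rate normalized to wild type)
W^{c}_{12} = W^{c}_{1}\, W^{c}_{2},
\qquad
W^{c}_{i} = \frac{\lambda^{c}_{i}}{\lambda^{c}_{\mathrm{wt}}}

% Assumed square-root relation between colony and liquid-culture rates
\lambda^{c}_{i} = \alpha \sqrt{\lambda^{l}_{i}}
\;\Longrightarrow\;
W^{c}_{i} = \sqrt{\frac{\lambda^{l}_{i}}{\lambda^{l}_{\mathrm{wt}}}} = \sqrt{W^{l}_{i}}

% Substituting into the colony-based product model and squaring both sides
\sqrt{W^{l}_{12}} = \sqrt{W^{l}_{1}}\,\sqrt{W^{l}_{2}}
\;\Longrightarrow\;
W^{l}_{12} = W^{l}_{1}\, W^{l}_{2}
```

      The constant α cancels in every fitness ratio, which is why the product form carries over between the two fitness measures under this assumption.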

      (2) Misapplication of prokaryotic growth models

      The study attempts to explain the mechanistic origin of the multiplicative model observed in yeast colony fitness using a bacterial cell growth model, particularly the Scott-Hwa model. However, the application of this bacterial model to yeast systems lacks valid justification. The Scott-Hwa model is heavily dependent on specific molecular mechanisms such as ppGpp-mediated regulation, which plays a crucial role in adjusting ribosome expression and activity during translation. This mechanism is pivotal for ensuring the growth-dependency of the ribosome fraction in the proteome, as described in [https://doi.org/10.1073/pnas.2201585119]. Unlike bacteria, yeast cells do not possess this regulatory mechanism, rendering the direct application of bacterial growth models to yeast inappropriate and potentially misleading. This fundamental difference in regulatory mechanisms undermines the relevance and accuracy of using bacterial models to infer yeast colony growth dynamics.

      If the authors intend to apply a growth model with macroscopic variables to yeast double-mutant experimental data, they should avoid simply repurposing a bacterial growth model. Instead, they should develop and rigorously validate a yeast-specific growth model before incorporating it into their study.

      There is nothing prokaryote-specific in the Scott-Hwa model. It does not include the ppGpp mechanism for regulating the ribosome fraction, which does not exist in eukaryotes. The general features of the model, such as the proportionality between the ribosome fraction and the growth rate, have indeed been validated in yeast (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022). Performing a detailed physiological analysis of budding yeast across varying growth conditions in order to build a more extensive model is beyond the scope of this work. Finally, we note that the Weiße model, which we also analyzed, is also generic and has replicated empirical measurements from both bacteria and yeast (Weiße et al. 2015).

      To clarify this point in the text, we have added the following to Section 2.3: 

      “Experimental measurements in other organisms suggest that the observations leading to this model, including that the cellular ribosome fraction increases with growth rate, are in fact generic and also seen in the yeast S. cerevisiae (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022).”

      (3) Overly specific assumptions in the theoretical model

      The theoretical model in question assumes that two mutations affect only independent parameters of specific biochemical processes, an overly restrictive premise that undermines its ability to broadly explain the occurrence of the multiplicative model in mutations. Additionally, experimental evidence highlights significant limitations to this approach. For example, in most viable yeast deletion mutants with reduced growth rates, the expression of ribosomal proteins remains largely unchanged, in direct contradiction to the predictions of the Scott-Hwa model, as indicated in [https://doi.org/10.7554/eLife.28034]. This discrepancy emphasizes that the Scott-Hwa model and its derivatives do not reliably explain the growth rates of mutants based on current experimental data, suggesting that these models may need to be reevaluated or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.

      In the data from the Barkai lab referenced by the reviewer (reproduced below), we see that the ribosomal transcript fraction is in fact proportional to growth rate in response to gene deletions, in contradiction to the reviewer’s interpretation. However, it is notable that the ribosomal transcript fraction is somewhat higher for a given growth rate if that growth rate is generated by a mutation rather than by a suboptimal nutrient condition. We know that the very simple Scott-Hwa model is not a perfect representation of the cell. Nevertheless, it does recapitulate important aspects of growth physiology, and we therefore thought it useful to analyze its response to mutations and compare those responses to the different neutrality functions. We never claimed the Scott-Hwa model was a perfect model and fully agree with the referee’s statement above that “... these models may need to be reevaluated, or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.” Indeed, we say as much in our discussion where we wrote:

      “While we focused on coarse-grained models for their simplicity and mechanistic interpretability, they might be too simple to effectively model large double-mutant datasets and the resulting double-mutant fitness distributions. We therefore expect the combination of high throughput genetic data with the analysis of larger-scale models, for instance based on Flux Balance Analysis, Metabolic Control Analysis, or whole-cell modeling, to lead to important complementary insights regarding the regulation of cell growth and proliferation.”

      To further clarify this point, we now discuss and cite the Barkai lab data for gene deletions (see Figure 2 of Metzl-Raz et al. 2017).

      (4) Lack of clarity on the mechanistic origin of the multiplicative model

      The study falls short of providing a definitive explanation for its primary objective: elucidating the "mechanistic origin" of the multiplicative model. Notably, even in the simplest case involving the Scott-Hwa model, the underlying mechanistic basis remains unexplained, leaving the central research question unresolved. Furthermore, the study does not clearly specify what types of data or models would be required to advance the understanding of the mechanistic origin of the multiplicative model. This omission limits the study's contribution to uncovering the biological principles underlying the observed fitness patterns.

      We appreciate the reviewer’s interest in a more complete mechanistic explanation for the product model of fitness. The primary goal of this study was to explore the validity of the Product model from the perspective of coarse-grained models of cell growth, and to extract mechanistic insights where possible. We view our work as a first step toward a deeper understanding of how double-mutant fitnesses combine, rather than a final, all-encompassing theory. As the referee notes, we are limited by the current state of the field, which has an incomplete understanding of cell growth. 

      Nonetheless, our analysis does propose concrete, mechanistically informed explanations. For example, we highlight how growth-optimizing feedback—such as cells’ ability to reallocate ribosomes or adjust proteome composition—naturally leads to multiplicative rather than additive or minimal fitness effects. We also link the empirical deviations from pure multiplicative behavior to differences in how specific pathways re-balance under perturbation, and we suggest that a product-like rule emerges when multiple interconnected processes each partially limit cell growth.

      In the discussion, we clarify what additional data and models we think will be required to advance this question. Namely, we propose extending our approach through larger-scale, more detailed modeling frameworks – that may include explicit modeling of ppGpp or TOR activities in bacteria or eukaryotic cells, respectively. We also emphasize the importance of refining the measurement of cell growth rates to uncover subtle deviations from the product rule that could yield greater mechanistic insight. By integrating high-throughput genetic data with next-generation computational models, it should be possible to home in on the specific biological principles (e.g., metabolic bottlenecks, resource reallocation) that underlie the multiplicative neutrality function.

      Reviewer #2 (Public review):

      The paper deals with the important question of gene epistasis, focusing on asking what is the correct null model for which we should declare no epistasis.

      In the first part, they use the Synthetic Genetic Array dataset to claim that the effects of a double mutation on growth rate are well predicted by the product of the individual effects (much more than e.g. the additive model). The second (main) part shows this is also the prediction of two simple, coarse-grained models for cell growth.

      I find the topic interesting, the paper well-written, and the approach innovative.

      One concern I have with the first part is that they claim that:

      "In these experiments, the colony area on the plate, a proxy for colony size, followed exponential growth kinetics. The fitness of a mutant strain was determined as the rate of exponential growth normalized to the rate in wild type cells."

      There are many works on "range expansions" showing that colonies expand at a constant velocity, the speed of which scales as the square root of the growth rate (these are called "Fisher waves", predicted in the 1940s, and there are many experimental works on them, e.g. https://www.pnas.org/doi/epdf/10.1073/pnas.0710150104). If that's the case, the area of the colony should be proportional to growth_rate × time^2, rather than exp(growth_rate × time), so the fitness they might be using here could be the log(growth_rate) rather than growth_rate itself? That could potentially have a big effect on the results.

      We thank the reviewer for their thoughtful remarks. As they rightly pointed out, a large body of literature supports that colonies expand at constant velocity both from a theoretical and experimental standpoint. 

      As discussed in the answer to the first question of Reviewer 1, this body of work also suggests that the linear expansion rate of the colony front is directly related to the single-cell exponential growth rate of the cells at the periphery. Hence, although the macroscopic colony growth may not be exponential in time, measuring colony size (or radial expansion) across different genotypes still provides a consistent and meaningful proxy for comparing their underlying growth capabilities. 

      In particular, these studies suggest (consistently with Fisher-wave theory) that the linear growth rate of the colony 𝐾 is proportional to the square root of the exponential growth rate 𝜆. Under the assumption that the product model is valid for a given double mutant and for the exponential growth rate, we would have that

      The associated wave-front velocities would then be predicted to be

      In other words, if the product model is valid for fitness measures based on exponential growth rates, it should also be valid for fitness measures based on linear colony growth rates. 
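      The argument above can be checked numerically with a minimal sketch. It assumes the square-root scaling K = c·√λ stated in the response; the prefactor `c`, the function names, and the growth rates are hypothetical and chosen only for illustration.

```python
import math

def colony_velocity(growth_rate, c=1.0):
    """Front velocity K = c * sqrt(growth_rate); c is an assumed constant prefactor."""
    return c * math.sqrt(growth_rate)

def relative_fitness(rate, wild_type_rate):
    """Fitness as a rate normalized to the wild-type rate."""
    return rate / wild_type_rate

def product_model_gap(lam_wt, lam_1, lam_2, c=1.0):
    """Impose the product model on exponential growth rates and return the
    deviation from the product model for the velocity-based fitnesses."""
    # Product model on exponential rates: lam_12/lam_wt = (lam_1/lam_wt)*(lam_2/lam_wt)
    lam_12 = lam_1 * lam_2 / lam_wt
    k_wt = colony_velocity(lam_wt, c)
    w_1 = relative_fitness(colony_velocity(lam_1, c), k_wt)
    w_2 = relative_fitness(colony_velocity(lam_2, c), k_wt)
    w_12 = relative_fitness(colony_velocity(lam_12, c), k_wt)
    return w_12 - w_1 * w_2

# Hypothetical growth rates (per hour), for illustration only.
gap = product_model_gap(lam_wt=0.40, lam_1=0.30, lam_2=0.20)
print(f"deviation from product model on velocities: {gap:.2e}")
```

      The deviation vanishes up to floating-point error for any positive rates and any prefactor, because the prefactor cancels in each ratio and the square root of a product is the product of the square roots.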

      We now include this discussion in the revised version of Section 2.3.

      Additional comments/questions:

      (1) What is the motivation for the model where the effect of two genes is the minimum of the two?

      The motivation for the minimal model is the notion that there might be a particular process that is rate-limiting for growth due to a mutation. In this case, a mutation in process X makes it really slow and process Y proceeds in parallel and has plenty of time to finish its job before cell division takes place. In this case, even a mutation to process Y might not slow down growth because there is an excess amount of time for it to be completed. Thus, the double mutant might then be anticipated to have the growth rate associated with the single mutation to process X. We now add a similar description when we introduce the different neutrality functions in Section 2.1.
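      The neutrality functions discussed here can be written compactly, as in the sketch below. The product and minimum forms follow directly from the descriptions above; the additive form W1 + W2 − 1 is the commonly used definition and is assumed here rather than taken from the manuscript. The function names and fitness values are illustrative.

```python
def product_neutral(w1, w2):
    """Product model: single-mutant fitnesses multiply."""
    return w1 * w2

def additive_neutral(w1, w2):
    """Additive model (assumed form): fitness defects relative to wild type (W = 1) add."""
    return w1 + w2 - 1.0

def minimum_neutral(w1, w2):
    """Minimum model: the more severe (rate-limiting) mutation sets the fitness."""
    return min(w1, w2)

def epistasis(w12_observed, w1, w2, neutral):
    """Deviation of the observed double-mutant fitness from a neutral expectation."""
    return w12_observed - neutral(w1, w2)

# Hypothetical single-mutant fitnesses relative to wild type.
w1, w2 = 0.8, 0.5
for f in (product_neutral, additive_neutral, minimum_neutral):
    print(f.__name__, round(f(w1, w2), 3))
```

      Under the minimum model, a second mutation in a process that is not rate-limiting leaves the fitness unchanged, which matches the scenario described for processes X and Y.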

      (2) How seriously should we take the Scott-Hwa model? Should we view it as a toy model to explain the phenomenon or more than that? If the latter, then since the number of categories in the GO analysis is much more than two (47?) in many cases the analysis of the experimental data would take pairs of genes that both affect one process in the Scott-Hwa model - and then the product prediction should presumably fail? The same comment applies to the other coarse-grained model.

      From our perspective, models like the Scott-Hwa model constitute the simplest representation of growth based on data that is not trivial. Moreover, the Scott-Hwa model is able to incorporate interactions between two different biological processes. We believe models, like the Scott-Hwa and Weiße models, should be viewed as more than mere toy models because they have been backed up by some empirical data, such as that showing the ribosome fraction increases with growth rate. However, the Scott-Hwa model is inherently limited by its low dimensionality and relative simplicity. We do not claim that such models can provide a full picture of the cell. As argued in the main text, we have chosen to focus on such models because of their tractability and in the hope of extracting general principles. We nonetheless agree with the reviewer that they do not have the capacity to represent interactions between genes in the same biological process. We now note this limitation in the text. 

      (3) There are many works in the literature discussing additive fitness contributions, including Kaufmann's famous NK model as well as spin-glass-type models (e.g. Guo and Amir, Science Advances 2019, Reddy and Desai, eLife 2021, Boffi et al., eLife 2023) These should be addressed in this context.

      We thank the reviewer for pointing out this part of the literature. We do believe these works constitute a relevant body of work tackling the emergence of epistasis patterns from a theoretical grounding, and now reference and discuss them in the text. 

      (4) The experimental data is for deletions, but it would be interesting to know the theoretical model's prediction for the expected effects of beneficial mutations and how they interact since that's relevant (as mentioned in the paper) for evolutionary experiments. Perhaps in this case the question of additive vs. multiplicative matters less since the fitness effects are much smaller.

      This is an interesting question. Since mutations increasing the growth rate generated by gene deletions or other systematic perturbations are rare, we did not focus on them. Of course, as the reviewer notes, in the case of evolution experiments, these fitness enhancing mutations are selected for. To address the reviewer's question, we can first consider the Scott-Hwa model. In this case, the analytical solution remains valid in the case of fitness enhancing mutations so that the fitness of the double mutant will be the product neutrality function multiplied by an additional interaction term (see Figure 3). The mathematical derivation predicts that the double mutant fitness can potentially grow indefinitely. Indeed, the denominator can be equal to zero in some cases. In simulations, we see that the observation for deleterious mutations does not seem to hold for beneficial mutations (new supplementary Figure S5 shown below). Indeed, no model seems to replicate double mutant fitnesses much better than any other. This suggests that the growth-optimizing feedback we discuss in section 2.3 may have compound effects that ultimately make double-mutant fitnesses much larger than any model predicts.

      We recognize this may be an important point, and discuss it in detail in the revised section 2.3 as well as in the discussion.
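
      To make the additive-versus-multiplicative comparison concrete, the following minimal Python sketch computes the expected double-mutant fitness under the product, additive, and minimum null models, using the definitions common in the genetic-interaction literature (e.g., Mani et al. 2008) with fitness expressed relative to wild type (= 1). It is illustrative only: the function names and the example fitness values are assumptions, and it does not implement the Scott-Hwa model or the simulations described above.

```python
def expected_double_fitness(w_a, w_b, model="product"):
    """Expected double-mutant fitness under a null (no-interaction) model.

    Fitnesses are relative to wild type, i.e. wild type = 1.0.
    """
    if model == "product":
        return w_a * w_b
    if model == "additive":
        return w_a + w_b - 1.0
    if model == "min":
        return min(w_a, w_b)
    raise ValueError(f"unknown model: {model}")


def epistasis(w_a, w_b, w_ab, model="product"):
    """Deviation of the observed double-mutant fitness from the null expectation."""
    return w_ab - expected_double_fitness(w_a, w_b, model)


# Hypothetical single-mutant fitness pairs: small deleterious effects,
# large deleterious effects, and small beneficial effects.
for w_a, w_b in [(0.98, 0.97), (0.7, 0.6), (1.05, 1.08)]:
    prod = expected_double_fitness(w_a, w_b, "product")
    add = expected_double_fitness(w_a, w_b, "additive")
    print(f"w_a={w_a}, w_b={w_b}: product={prod:.4f}, additive={add:.4f}, gap={prod - add:.4f}")
```

      The gap between the product and additive expectations equals (1 - w_a)(1 - w_b), so it shrinks quadratically as single-mutant effects become small, which is the intuition behind the reviewer's remark.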

      Baryshnikova, Anastasia, Michael Costanzo, Scott Dixon, Franco J. Vizeacoumar, Chad L. Myers, Brenda Andrews, and Charles Boone. 2010. “Synthetic Genetic Array (SGA) Analysis in Saccharomyces Cerevisiae and Schizosaccharomyces Pombe.” Methods in Enzymology 470 (March):145–79.

      Elsemman, Ibrahim E., Angelica Rodriguez Prado, Pranas Grigaitis, Manuel Garcia Albornoz, Victoria Harman, Stephen W. Holman, Johan van Heerden, et al. 2022. “Whole-Cell Modeling in Yeast Predicts Compartment-Specific Proteome Constraints That Drive Metabolic Strategies.” Nature Communications 13 (1): 801.

      Gandhi, Saurabh R., Eugene Anatoly Yurtsev, Kirill S. Korolev, and Jeff Gore. 2016. “Range Expansions Transition from Pulled to Pushed Waves as Growth Becomes More Cooperative in an Experimental Microbial Population.” Proceedings of the National Academy of Sciences of the United States of America 113 (25): 6922–27.

      Gray, B. F., and N. A. Kirwan. 1974. “Growth Rates of Yeast Colonies on Solid Media.” Biophysical Chemistry 1 (3): 204–13.

      Jasnos, Lukasz, and Ryszard Korona. 2007. “Epistatic Buffering of Fitness Loss in Yeast Double Deletion Strains.” Nature Genetics 39 (4): 550–54.

      Korolev, Kirill S., Melanie J. I. Müller, Nilay Karahan, Andrew W. Murray, Oskar Hallatschek, and David R. Nelson. 2012. “Selective Sweeps in Growing Microbial Colonies.” Physical Biology 9 (2): 026008.

      Mani, Ramamurthy, Robert P. St Onge, John L. Hartman 4th, Guri Giaever, and Frederick P. Roth. 2008. “Defining Genetic Interaction.” Proceedings of the National Academy of Sciences of the United States of America 105 (9): 3461–66.

      Metzl-Raz, Eyal, Moshe Kafri, Gilad Yaakov, Ilya Soifer, Yonat Gurvich, and Naama Barkai. 2017. “Principles of Cellular Resource Allocation Revealed by Condition-Dependent Proteome Profiling.” eLife 6 (August). https://doi.org/10.7554/elife.28034.

      Meunier, J. R., and M. Choder. 1999. “Saccharomyces Cerevisiae Colony Growth and Ageing: Biphasic Growth Accompanied by Changes in Gene Expression.” Yeast (Chichester, England) 15 (12): 1159–69.

      Miller, James H., Vincent J. Fasanello, Ping Liu, Emery R. Longan, Carlos A. Botero, and Justin C. Fay. 2022. “Using Colony Size to Measure Fitness in Saccharomyces Cerevisiae.” PloS One 17 (10): e0271709.

      Onge, Robert P. St, Ramamurthy Mani, Julia Oh, Michael Proctor, Eula Fung, Ronald W. Davis, Corey Nislow, Frederick P. Roth, and Guri Giaever. 2007. “Systematic Pathway Analysis Using High-Resolution Fitness Profiling of Combinatorial Gene Deletions.” Nature Genetics 39 (2): 199–206.

      Pirt, S. J. 1967. “A Kinetic Study of the Mode of Growth of Surface Colonies of Bacteria and Fungi.” Journal of General Microbiology 47 (2): 181–97.

      Weiße, Andrea Y., Diego A. Oyarzún, Vincent Danos, and Peter S. Swain. 2015. “Mechanistic Links between Cellular Trade-Offs, Gene Expression, and Growth.” Proceedings of the National Academy of Sciences of the United States of America 112 (9): E1038–47.

      Xia, Jianye, Benjamin J. Sánchez, Yu Chen, Kate Campbell, Sergo Kasvandik, and Jens Nielsen. 2022. “Proteome Allocations Change Linearly with the Specific Growth Rate of Saccharomyces Cerevisiae under Glucose Limitation.” Nature Communications 13 (1): 2819.

      Zackrisson, Martin, Johan Hallin, Lars-Göran Ottosson, Peter Dahl, Esteban Fernandez-Parada, Erik Ländström, Luciano Fernandez-Ricaud, et al. 2016. “Scan-O-Matic: High-Resolution Microbial Phenomics at a Massive Scale.” G3 (Bethesda, Md.) 6 (9): 3003–14.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work revealed an important finding that the blood-brain barrier (BBB) functionality changes with age and is more pronounced in males. The authors applied a non-invasive, contrast-agent-free approach of MRI called diffusion-prepared arterial spin labeling (DP-pCASL) to a large cohort of healthy human volunteers. DP-pCASL works by tracking the movement of magnetically labeled water (spins) in blood as it perfuses brain tissue. It probes the molecular diffusion of water, which is sensitive to microstructural barriers, and characterizes the signal coming from fast-moving spins as blood and slow-moving spins as tissue, using different diffusion gradients (b-values). This differentiation is then used to assess the water exchange rates (kw) across the BBB, which acts as a marker for BBB functionality. The main finding of the authors is that kw decreases with age, and in some brain regions, kw decreases faster in males. The neuroprotective role of the female sex hormone, estrogen, on BBB function is discussed as one of the explanations for this finding, supported by literature. The study also shows that BBB function remains stable until the early 60s and remarkably decreases thereafter.

      Strengths:

      The two main strengths of the study are the MRI method used and the amount of data. The authors employed a contrast-agent-free MRI method called ASL, which offers the opportunity to repeat such experiments multiple times without any health risk - a significant advantage of ASL. Since ASL is an emerging field that requires further exploration and testing, a study evaluating blood-brain barrier functionality is of great importance. The authors utilized a large dataset of healthy humans, where volunteer data from various studies were combined to create a substantial pool. This strategy is effective for statistically evaluating differences in age and gender.

      Weaknesses:

      R1.0: Gender-related differences are only present in some brain regions, not in the whole brain or gray matter - which is usually the assumption unless stated otherwise. From the title, this was not clear. Including simulations could increase readers' understanding related to model fitting and the interdependence of parameters, if present. The discussion follows a clear line of argument supported by literature; however, focusing solely on AQP4 channels and missing a critical consideration of other known/proven changes in transport mechanisms through the BBB and their effects substantially weakens the discussion. 

      Thanks for your insightful feedback and suggestions. We have made the following changes to the manuscript:

      (1) The title has been modified to highlight the sex differences in specific brain regions: “Age-Related Decline in Blood-Brain Barrier Function is More Pronounced in Males than Females in Parietal and Temporal Regions.”

      (2) To study the potential impact of prolonged ATT seen in males on estimated kw, we simulated kw distribution for females by adjusting ATT by +60 ms to match males' ATT. This led to marginally higher kw values (Supplemental Figure S2), suggesting that the kw difference between males and females is not a direct result of prolonged ATT. Additionally, we have added a section titled “Data and Code Availability Statements” in the revised manuscript to indicate that we are willing to share the reconstruction toolbox with interested groups. The toolbox is a standalone MATLAB-based program (no license required) to generate kw, CBF, and ATT maps, which can run on Windows or Mac computers.

      (3) We agree with the reviewer that BBB water exchange can be facilitated by other transport mechanisms, as we mentioned in the introduction: “Water exchange across the BBB occurs at a relatively high level and is mediated by passive diffusion, active co-transport through the endothelial membrane, and facilitated diffusion through the dedicated water channel, aquaporin-4 (AQP4), at the end-feet of astrocytes.” We emphasized our findings related to AQP4 based on the technical properties of DP-pCASL, which is more sensitive to the exchange occurring across astrocyte end-feet. We also acknowledge that different techniques can be helpful to study other components of BBB water exchange, and we have added the following discussion to the updated manuscript: “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method. These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging. In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. 
      The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states. Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements.”

      Reviewer #1 (Recommendations For The Authors): 

      R1.1 The manuscript is well-organized and presents arguments in a logical order. The visual representation of results in the form of figures is sufficient (see style suggestions below). 

      Thanks for your suggestions on improving the figures. We have updated the figures for better visualization (please see our responses to R1.5, R1.6, R1.7, and R1.8).

      R1.2 It would be beneficial if the model/toolbox could be made publicly available so that fellow researchers from the community could apply and test it in their research. 

      We have added a section, “Data and code availability statements”, to the revised manuscript to indicate that we are willing to share the toolbox with interested groups (L529 in the annotated manuscript). The toolbox is a standalone MATLAB-based program (no license required) that generates kw, CBF, and ATT maps and runs on Windows or Mac computers. Indeed, we have been sharing our reconstruction toolbox with over 50 collaboration sites. The following screenshots are examples of the three steps performed by the toolbox (shared by one collaborator):

      Author response image 1.

      Step 1: Loading raw data and calculating the T1 map

      Author response image 2.

      Step 2: Motion correction and skull stripping

      Author response image 3.

      Step 3: kw, CBF and ATT quantification (nii files will be saved)

      R1.3 Line 46 states that the technique is novel, but it has been introduced and used before (Shao, et al. MRM 2019). It sure is innovative but the term novel is too strong and may confuse the readers that it is something new introduced in this manuscript.

      Thanks for the suggestion. We agree that the term ‘novel’ may cause confusion about the technique, and we have removed it in the revised manuscript (L48, L50).

      R1.4 Line 395, kw was generated using PLD = 1.8s with b = 0, 50 s/mm2. Is only one-time point enough for estimating kw? To me, it is not clear how robust is the kw estimation with only one PLD.

      According to the single-pass approximation (SPA) model (1), kw can be accurately estimated when the PLD is longer than the ATT. We recruited cognitively normal participants in this study and found the longest ATT to be 1526.7±117.4 and 1468.1±166.9 ms in aged (62-92 years) males and females, respectively. A PLD of 1.8 s was chosen to balance the SNR of the data and the accuracy of the model fitting, which should be sufficient for this study. However, for future studies involving diseased populations with prolonged ATT, a longer PLD should be used, or a multi-PLD protocol could be helpful to improve the robustness of quantification accuracy.

      We have added a limitation statement in the revised manuscript (L407): "A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2)."
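
      As a rough check on the margin between the PLD and the reported ATT values, the following minimal sketch computes how far the 1800 ms PLD sits above the longest mean ATT in each aged group. The normal-distribution step is an illustrative assumption that is not stated in the response, and the function and variable names are hypothetical.

```python
from statistics import NormalDist

PLD_MS = 1800.0  # single post-labeling delay used in the study

# Longest ATT (mean, SD) in ms for the aged (62-92 years) groups, as reported above.
longest_att = {"aged males": (1526.7, 117.4), "aged females": (1468.1, 166.9)}


def pld_margin(pld_ms, att_mean_ms, att_sd_ms):
    """Return (margin in ms, margin in SD units, fraction of ATT values above the PLD).

    The fraction assumes ATT is normally distributed; this is an
    illustrative assumption and is not stated in the source.
    """
    margin = pld_ms - att_mean_ms
    z = margin / att_sd_ms
    frac_above = 1.0 - NormalDist(att_mean_ms, att_sd_ms).cdf(pld_ms)
    return margin, z, frac_above


for group, (mean, sd) in longest_att.items():
    margin, z, frac = pld_margin(PLD_MS, mean, sd)
    print(f"{group}: margin={margin:.1f} ms, z={z:.2f}, P(ATT > PLD)={frac:.3f}")
```

      Under these assumptions the PLD lies roughly two standard deviations above the longest mean ATT in both groups, which is consistent with the statement that a single PLD of 1800 ms should be sufficient for this cohort but may not be for populations with prolonged ATT.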

      R1.5 Suggestion: Figure 3A, colormap for kw appears suboptimal. Regional differences are hard to see.

      Thanks for the suggestion. We have narrowed the range of the color scale from [0, 200] to [70, 160] to highlight the regional differences in the updated Figure 3:

      We prefer to keep the same blue colormap that we and our collaborators have been using in publications, to maintain consistency. We have also acknowledged the limited spatial resolution of the kw maps in the updated manuscript (L412): “To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently development of a motion-compensated diffusion weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in the future studies (e.g. 3.5 mm3 isotropic maps in 10 mins) (2)”.

      R1.6 Suggestion: use same/similar colormaps for the same parameters (kw, ATT, CBF) to help the reader follow across Figures 3, 4, and 5.

      Thanks for your suggestion. We agree that using the same colormap would make it easier for readers to follow. However, Figures 4 and 5 were created to show the age- and sex-dependent changes, so we used warm and cold colors to indicate decreases and increases, respectively. We have clarified the choice of colormap in the figure captions (L260, L284): “The effects of decrease or increase were represented by warm colors (yellow to red) and cold (gray to blue) colors, respectively.”

      R1.7 Suggestion: please be consistent with the ordering of parameters in Figures 3, 4, and 5.

      Thanks for the suggestion. We have updated Figure 3 to show the kw, CBF, and ATT results consistently, in that order from left to right:

      R1.8 Suggestion: use the same scaling (e.g.[|1.9|, |11 |] for Fig. 4, [|1.9|, |4|] for Figure 5) to enhance comparability across parameters in the subfigures.

      Thanks for the suggestion. We agree that the same scaling would enhance comparability across parameters, and we have updated the color scales for Figure 5 to use a maximal |T| of 4:

      However, the range of maximal |T| was relatively large for Figure 4 (i.e., 5 for kw, 11 for CBF, and 7 for ATT), and using the same color scale might oversaturate the regional responses or diminish the visibility of regional differences. Therefore, we prefer to keep the original color scales for Figure 4.

      R1.9 In Figure 5, the interaction of age with sex in kw parameter seems to be more on one side of the brain. What could be the reasons for possible lateralization? 

      We agree with the reviewer that the lateralization of the age and sex interaction effects is an interesting finding. While we do not have a clear explanation at present, we suspect it may relate to aging-related asymmetrical vascular burdens. Giannakopoulos et al. reported that vascular scores, indicating higher vascular burden, were significantly higher in the left hemisphere across all Clinical Dementia Rating scores. Moreover, the predominance of Alzheimer’s disease and vascular pathology in the right hemisphere correlated with significantly higher Clinical Dementia Rating scores (3). We added the following to the updated manuscript to discuss this potential mechanism (L370): “… We also observed an asymmetric effect on left and right brain hemispheres, which might be associated with asymmetrically developed vascular burdens in aging (3).”

      R1.10 A comparison between the present study and DCE MRI as well as other ASL methods evaluating BBB function with age is missing. ASL techniques probing transverse relaxation and DCE MRI have reported increased kw with age in humans as well as in animal models. What could be the reasons? 

      We agree with the reviewer that BBB water exchange measured by other methods should be sufficiently discussed, especially regarding their age-related changes. We added the following discussion in the updated manuscript (L415): “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6). In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13).”

      R1.11 Line 163/164, a rapid decrease of CBF in males in the region of the hippocampus is reported. It would be beneficial to discuss this in discussion further (has this been reported before, possible reasons, etc). 

      Thanks for the suggestion. We agree that the accelerated CBF decline in the hippocampus of males is an important finding, and we have added the following discussion to the revised manuscript (L300): "Furthermore, we found a more pronounced age-related decline in CBF in the hippocampus of males compared to females (Fig. 2, Supplemental Table S2). To the best of our knowledge, no study has previously reported this accelerated hippocampal CBF decline in males. This finding may be linked to the accelerated hippocampal volume loss in males, as reported in a study analyzing 19,793 generally healthy UK Biobank participants (14). Lower hippocampal perfusion has been associated with poor memory performance (15, 16), suggesting that males might be more vulnerable to potential cognitive decline (17)."

      R1.12 Lines 198-202 describe a simulation done to test the dependence of kw on ATT. This is important and could be explained more in detail. Adding simulation results (numeric or figure) to supplementary materials would increase reproducibility and understanding for others. 

      We apologize for not referencing the simulation results in the main text. We simulated the kw distribution for females by adjusting ATT by +60 ms to match males’ ATT, which led to marginally higher kw values. These results are shown in Supplemental Figure S2C (yellow):

      We have now referenced the simulation results in the updated manuscript (L206).

      R1.13 No limitations of the presented work are mentioned. A critical perspective would increase the scientific impact on future research decisions and implementation of this method by others. 

      Thanks for the suggestion, we agree the limitations need to be acknowledged. We have added a limitation paragraph in the revised manuscript (L406): "Limitations of the study and future directions: There are a few limitations of this study. A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2). To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently development of a motion-compensated diffusion weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in the future studies (e.g. 3.5 mm3 isotropic maps in 10 mins) (2). Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6). 
      In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological stages (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13). Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies. Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to the unavailability or limited availability in some cohorts.
      We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable."

      Reviewer #2 (Public Review):

      Summary: 

      This study used a novel diffusion-weighted pseudo-continuous arterial spin labelling (pCASL) technique to simultaneously explore age- and sex-related differences in brain tissue perfusion (i.e., cerebral blood flow (CBF) & arterial transit time (ATT) - a measure of CBF delivery to brain tissue) and blood-brain barrier (BBB) function, measured as the water exchange (kw) across the BBB. While age- and sex-related effects on CBF are well known, this study provides new insights to support the growing evidence of these important factors in cerebrovascular health, particularly in BBB function. Across the brain, the decline in CBF and BBB function (kw) and elevation in ATT were reported in older adults, after the age of 60, and more so in males compared to females. This was also evident in key cognitive regions including the insular, prefrontal, and medial temporal regions, stressing the consideration of age and sex in these brain physiological assessments. 

      Strengths: 

      Simultaneous assessment of CBF with BBB along with transit time and at the voxel-level helped elucidate the brain's vulnerability to age and sex-effects. It is apparent that the investigators carefully designed this study to assess regional associations of age and sex with attention to exploring potential non-linear effects. 

      Weaknesses: 

      R2.0 It appears that no brain region showed concurrent CBF and BBB dysfunction (kw), based on the results reported in the main manuscript and supplemental information. Was an association analysis between CBF and kw performed? There is a potential effect of the level of formal education on CBF (PMID: 12633147; 15534055), which could have been considered and accounted for as well, especially for a cohort with stated diversity (age, race, sex). 

      Thank you for your positive feedback and comments on the potential associations between BBB kw and other physiological parameters (e.g., CBF) and socioeconomic factors (e.g., education). We have made the following changes to the updated manuscript:

      (1) We conducted additional linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6. We found that BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be influenced by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.

      (2) One limitation of this study is the lack of information on participants’ geographical, cultural, physical characteristics, and socioeconomic factors. While we included race as a covariate to account for potential variations observed in previous research, race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes. We have acknowledged this limitation by adding the following discussion in the updated manuscript: “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research. However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health. For example, education has been shown to be highly relevant to regional CBF changes in AD. Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”
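
      The regression design described in point (1), regional kw against regional CBF with sex as a covariate and fitted separately in the two age groups, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the function names, the 0/1 sex coding, and all numeric values are assumptions, and no significance testing is included.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_kw_vs_cbf(records):
    """Fit kw ~ intercept + CBF + sex; returns [intercept, cbf_slope, sex_effect]."""
    rows = [[1.0, rec["cbf"], rec["sex"]] for rec in records]
    return ols(rows, [rec["kw"] for rec in records])


# Synthetic participants only; every value below is made up for illustration.
random.seed(0)
records = []
for _ in range(200):
    age = random.uniform(8, 92)
    sex = random.choice([0.0, 1.0])  # 0 = female, 1 = male (arbitrary coding)
    cbf = random.gauss(55.0, 8.0)
    kw = 150.0 - 0.5 * cbf - 3.0 * sex + random.gauss(0.0, 2.0)
    records.append({"age": age, "sex": sex, "cbf": cbf, "kw": kw})

# Fit the two age groups separately, mirroring the 8-61 and 62-92 year split.
for label, lo, hi in [("8-61 years", 8, 62), ("62-92 years", 62, 93)]:
    group = [r for r in records if lo <= r["age"] < hi]
    intercept, slope, sex_effect = fit_kw_vs_cbf(group)
    print(f"{label}: n={len(group)}, kw-CBF slope={slope:.3f}, sex effect={sex_effect:.3f}")
```

      Replacing the CBF column with ATT gives the corresponding kw-ATT regression. In practice a statistics package that also reports standard errors and p-values would be preferable to this hand-rolled solver.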

      Reviewer #2 (Recommendations For The Authors): 

      General comments: 

      I commend the authors on a very well-written and laid-out study. General remarks have been provided in the short assessment and public review sections. 

      We would like to thank the reviewer for the insightful suggestions and overall positive feedback. We have substantially revised and improved our manuscript, and point-by-point responses can be found in the following sections and in the annotated manuscript.

      Specific comments: 

      Results: 

      R2.1 Line 127: "since race may influence the changes in perfusion and kw with aging, it was included as a covariate". It is not clear how race - a simplistic term for ethnicity or to be more specific ancestry has been shown to influence changes in perfusion? Is it known for a fact that for example, older Black people have lower/higher CBF or kw compared to Asians or Asians to Caucasian Americans? Can this be extrapolated to Japanese Brazilians having different patterns of regional CBF to Caucasian or Black Brazilians or similar patterns of CBF to Japanese people in Japan since they share similar race? Do Dutch people in the Netherlands share CBF characteristics to their descendants in the US or in South Africa? Would the geographical, cultural, and other physical characteristics of one's ethnicity or lineage impact CBF? Race is often used as a poor substitute for the complex interactions of physical, socioeconomic, and geopolitical factors that produce disparities that may have measurable biological effects including CBF. But it is not clear why being one race vs the other will impact CBF, without carefully parcelling out the many factors beyond biology, if any. Is any of the participants in the study mixed race? How about recently settled individuals who may identify for example as Black but have spent all their life up to adult years outside of the US and marked here in the study as simply African American? Not that I am saying this is the case. However this simplification may require more careful analysis. 

      In our study, no participant identified as mixed-race, and unfortunately we do not have additional information about participants' specific ancestry or their geographical, cultural, and other physical characteristics. We acknowledge that race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes, including perfusion and BBB function. The use of race as a covariate in our study is intended to account for potential variations observed in previous research, rather than to imply a direct causal relationship.

      Research has shown differences in blood flow among racial groups (18, 19). However, these differences are not solely attributable to race, and they are also shaped by environmental exposures, lifestyle factors, healthcare access, and other social determinants of health (20). We have added the following discussion in the updated manuscript (L436): “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”

      R2.2 Figure 3: Could the standard deviation of the reported values be also stated so the variance can be appreciated? 

      Thank you for the suggestion. We have added the standard deviations of the kw, CBF, and ATT values to the updated Figure 3.

      R2.3 Discussions: Line 280: "...observed distinct trajectory of kw changes with aging as compared with CBF and ATT." I presume this is as compared to the earlier statements (line 268) of pervasive increase in ATT and decrease in CBF across the brain. Were there any brain regions that showed increased ATT, decreased CBF and kw as a function of age or even sex? Was there any association between CBF and kw in any brain regions, across the participants, after controlling for sex differences? If there is a suspicion of early BBB dysfunction (line 286) preceding cognitive decline, which has also been suspected with CBF, is this concomitant with CBF in most people? This could maybe make CBF an easier and more straightforward biomarker, since its effects mirror those of BBB. I suspect it generally does not, even in healthy aging. It would have been great to shed more light on this with your results and in your discussion.

      Thank you for your comments. By 'distinct trajectory of kw changes with aging,' we refer to the ‘turning point’ in age at which kw starts declining. BBB kw remained relatively stable and began to decline in the early 60s, while CBF consistently decreased and ATT consistently increased with age, although the rates of change differed at 22 years and 36 years, respectively. Using linear regressions for voxel analysis, Figure 4 shows that age-dependent decreases in CBF and increases in ATT were observed in most of the brain. However, significant age-related decreases in kw were more localized to specific brain regions and were mostly accompanied by simultaneous decreases in CBF and increases in ATT. We highlighted this finding in the updated manuscript (L250): “In the brain regions showing significant age-related kw decreases (Fig. 4A), these decreases are mostly accompanied by CBF decreases (Fig. 4B) and ATT increases (Fig. 4C).”

      Thank you for your suggestion regarding the relationship between kw and CBF. We further conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6.

      This new supplemental table shows several notable results. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus (PHG), and medial temporal lobe (MTL) in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years.

      We have added the following discussion to the updated manuscript (L307): “We observed a distinct trajectory of kw changes with aging compared to CBF and ATT. To study the potential regional associations between kw and CBF and ATT, we conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining), respectively. The results are shown in Supplemental Table S6. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, PHG, and MTL in participants aged 8-61 years (when kw was relatively consistent across ages), but no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional brain regions, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be affected by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.”
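
      For readers who want a concrete picture of the regression described in this response (regional kw regressed on regional CBF with sex as a covariate), the following is a minimal sketch. It is not the authors' analysis code: the function names, the single hypothetical ROI, and all numbers are illustrative assumptions, and the synthetic kw values are generated from a known linear rule so that the fit can be checked. It estimates coefficients only and does not compute p-values.

```python
# Illustrative sketch only: synthetic numbers, not study data.
# Fits kw ~ intercept + CBF + sex by ordinary least squares (OLS).

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return OLS coefficients via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical values for one ROI: CBF in ml/100 g/min, sex coded 0/1.
cbf = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
sex = [0, 1, 0, 1, 0, 1]
# Synthetic kw (min^-1), generated exactly as 130 - 0.5 * CBF + 4 * sex.
kw = [130.0 - 0.5 * c + 4.0 * s for c, s in zip(cbf, sex)]

# Design matrix: intercept column, CBF, and the sex covariate.
design = [[1.0, c, float(s)] for c, s in zip(cbf, sex)]
intercept, beta_cbf, beta_sex = fit_ols(design, kw)
print(f"beta_CBF = {beta_cbf:.3f}, beta_sex = {beta_sex:.3f}")
```

      In this sketch, a negative CBF coefficient corresponds to the kind of negative kw-CBF association reported above; in practice the fit would be repeated per ROI and per age group, with significance testing done in a statistics package.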

      Other notes: 

      R2.4 While reading the results section, two things jumped out at me when I saw the sex differences: 1) hematocrit and 2) menopausal status. I saw in the discussion that these were touched on. I may have missed this in the methods, but was hematocrit collected and included in the parameter estimates? Was the menopausal status, including ERT (estrogen replacement therapies), recorded and factored in? If not, these could be included as limitations that may confound the results, especially when the age groups were split to include a group potentially comprising both pre- and post-menopausal females (36-61).

      We do not have information on hematocrit or menopausal status, and they were not included in the data analysis. We agree this is a limitation of the current study, and we discuss it in the updated manuscript (L442): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      R2.5 The general vascular health of the cohort is not well described, especially if some of the participants were from a sickle cell study. While they are cognitively normal and free from major medical illnesses or neurological disorders, did the sample also include individuals with considerable vascular risk factors and metabolic syndrome (known to affect CBF), especially in the older cohort?

      We agree with the reviewer that vascular health can significantly impact perfusion and BBB function. Since the data presented in this study were collected from multiple cohorts, vascular risk factors were not available in all cohorts and thus were not included as covariates in the data analysis. To account for potential vascular variations across participants, we included CBF and ATT as covariates in our analysis of age-related BBB kw changes. We have added a discussion in the updated manuscript (L442, same as our response to the previous comment): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      References:

      (1) K. S. St Lawrence, D. Owen, D. J. Wang, A two-stage approach for measuring vascular water exchange and arterial transit time by diffusion-weighted perfusion MRI. Magn Reson Med 67, 1275-1284 (2012).

      (2) X. Shao, C. Zhao, Q. Shou, K. S. St Lawrence, D. J. Wang, Quantification of blood–brain barrier water exchange and permeability with multidelay diffusion‐weighted pseudo‐continuous arterial spin labeling. Magnetic Resonance in Medicine (2023).

      (3) P. Giannakopoulos, E. Kövari, F. R. Herrmann, P. R. Hof, C. Bouras, Interhemispheric distribution of Alzheimer disease and vascular pathology in brain aging. Stroke (2009).

      (4) A. Mahroo, S. Konstandin, M. Günther, Blood–Brain Barrier Permeability to Water Measured Using Multiple Echo Time Arterial Spin Labeling MRI in the Aging Human Brain. Journal of Magnetic Resonance Imaging 59, 1269-1282 (2024).

      (5) Y. Ohene et al., Increased blood–brain barrier permeability to water in the aging brain detected using noninvasive multi‐TE ASL MRI. Magnetic resonance in medicine 85, 326-333 (2021).

      (6) B. R. Dickie, H. Boutin, G. J. Parker, L. M. Parkes, Alzheimer's disease pathology is associated with earlier alterations to blood–brain barrier water permeability compared with healthy ageing in TgF344‐AD rats. NMR in Biomedicine 34, e4510 (2021).

      (7) Y. Ying et al., Heterogeneous blood‐brain barrier dysfunction in cerebral small vessel diseases. Alzheimer's & Dementia (2024).

      (8) V. Zachariou et al., Regional differences in the link between water exchange rate across the blood–brain barrier and cognitive performance in normal aging. GeroScience, 1-18 (2023).

      (9) Y. Zhang et al., Increased cerebral vascularization and decreased water exchange across the blood-brain barrier in aquaporin-4 knockout mice. PLoS One 14, e0218415 (2019).

      (10) Y. Ohene et al., Non-invasive MRI of brain clearance pathways using multiple echo time arterial spin labelling: an aquaporin-4 study. NeuroImage 188, 515-523 (2019).

      (11) Y. V. Tiwari, J. Lu, Q. Shen, B. Cerqueira, T. Q. Duong, Magnetic resonance imaging of blood–brain barrier permeability in ischemic stroke using diffusion-weighted arterial spin labeling in rats. Journal of Cerebral Blood Flow & Metabolism 37, 2706-2715 (2017).

      (12) Z. Wei et al., Non-contrast assessment of blood-brain barrier permeability to water in mice: an arterial spin labeling study at cerebral veins. NeuroImage, 119870 (2023).

      (13) Y. Jia et al., Transmembrane water-efflux rate measured by magnetic resonance imaging as a biomarker of the expression of aquaporin-4 in gliomas. Nature Biomedical Engineering 7, 236-252 (2023).

      (14) L. Nobis et al., Hippocampal volume across age: Nomograms derived from over 19,700 people in UK Biobank. NeuroImage: Clinical 23, 101904 (2019).

      (15) S. Rane et al., Inverse correspondence between hippocampal perfusion and verbal memory performance in older adults. Hippocampus 23, 213-220 (2013).

      (16) S. Heo et al., Resting hippocampal blood flow, spatial memory and aging. Brain research 1315, 119-127 (2010).

      (17) O. Gannon, L. Robison, A. Custozzo, K. Zuloaga, Sex differences in risk factors for vascular contributions to cognitive impairment & dementia. Neurochemistry international 127, 38-55 (2019).

      (18) A. E. Leeuwis et al., Cerebral blood flow and cognitive functioning in a community-based, multi-ethnic cohort: the SABRE study. Frontiers in aging neuroscience 10, 279 (2018).

      (19) L. R. Clark et al., Association of cardiovascular and Alzheimer’s disease risk factors with intracranial arterial blood flow in Whites and African Americans. Journal of Alzheimer's Disease 72, 919-929 (2019).

      (20) D. R. Williams, S. A. Mohammed, Discrimination and racial disparities in health: evidence and needed research. Journal of behavioral medicine 32, 20-47 (2009).

      (21) N. Scarmeas et al., Association of life activities with cerebral blood flow in Alzheimer disease: implications for the cognitive reserve hypothesis. Archives of neurology 60, 359-365 (2003).

      (22) N.-T. Chiu, B.-F. Lee, S. Hsiao, M.-C. Pai, Educational level influences regional cerebral blood flow in patients with Alzheimer’s disease. Journal of Nuclear Medicine 45, 1860-1863 (2004).

      (23) R. C. Gur et al., Gender differences in age effect on brain atrophy measured by magnetic resonance imaging. Proceedings of the National Academy of Sciences 88, 2845-2849 (1991).

      (24) M. J. Cipolla, J. A. Godfrey, M. J. Wiegman, The effect of ovariectomy and estrogen on penetrating brain arterioles and blood-brain barrier permeability. Microcirculation 16, 685-693 (2009).

      (25) A. C. Wilson et al., Reproductive hormones regulate the selective permeability of the blood-brain barrier. Biochim Biophys Acta 1782, 401-407 (2008).

      (26) M. S. Stringer et al., Tracer kinetic assessment of blood–brain barrier leakage and blood volume in cerebral small vessel disease: Associations with disease burden and vascular risk factors. NeuroImage: Clinical 32, 102883 (2021).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study elucidates the molecular divergence of caspase 3 and 7 in the vertebrate lineage. Convincing biochemical and mutational data provide evidence that in humans, caspase 7 has lost the ability to cleave gasdermin E due to changes in a key residue, S234. However, the physiological relevance of the findings is incomplete and requires further experimental work.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      In this study, Xu et al. provide insights into the substrate divergence of CASP3 and CASP7 for GSDME cleavage and activation during vertebrate evolution. Using biochemical assays, domain swapping, site-directed mutagenesis, and bioinformatics tools, the authors demonstrate that the human GSDME C-terminal region and the S234 residue of human CASP7 are the key determinants that impede the cleavage of human GSDME by human CASP7.

      Strengths

      The authors made an important contribution to the field by demonstrating how human CASP7 has functionally diverged to lose the ability to cleave GSDME and showing that reverse-mutations in CASP7 can restore GSDME cleavage. The use of multiple methods to support their conclusions strengthens the authors' findings. The unbiased mutagenesis screen performed to identify S234 in huCASP7 as the determinant of its GSDME cleavability is also a strength.

      Weaknesses

      While the authors utilized an in-depth experimental setup to understand CASP7-mediated GSDME cleavage across evolution, the physiological relevance of their findings is not assessed in detail. Additional methodology information should also be provided.

      Specific recommendations for the authors

      (1) The authors should expand their evaluation of the physiological relevance by assessing GSDME cleavage by the human CASP7 S234N mutant in response to triggers such as etoposide or VSV, which are known to induce CASP3 to cleave GSDME (PMID: 28045099). The authors could also test whether the human CASP7 S234N mutation affects substrate preference beyond human GSDME by testing cleavage of mouse GSDME and other CASP3 and CASP7 substrates in this mutant.

      (1) The physiological relevance is discussed in the revised manuscript (lines 328-340). Our study revealed the molecular mechanism underlying the divergence of CASP3- and CASP7-mediated GSDME activation in vertebrates. One physiological consequence is that in humans, CASP7 no longer directly participates in GSDME-mediated cell death, which allows CASP7 to engage in other cellular processes. Another is that GSDME activation is limited to CASP3 cleavage, which restricts GSDME activity to more specific situations, such as those that induce CASP3 activation. The divergence and specialization of the physiological functions of different CASPs are consistent with, and possibly conducive to, the development of refined regulation of the sophisticated human GSDM pathways, which are executed by multiple GSDM members (A, B, C, D, and E), rather than solely by GSDME as in teleosts such as Takifugu. Further physiological consequences of CASP3/7 divergence in GSDME activation need to be explored in future studies.

      With respect to the reviewer’s suggestion of assessing GSDME cleavage by the human CASP7 S234N mutant in response to triggers such as etoposide or VSV: (i) CASP7 S234N was created in our study and is not a natural human product; hence its response to CASP7 triggers cannot occur under normal physiological conditions, only in an applied setting such as a medical application, which is not the aim of our study. (ii) CASP3/7 activators (such as raptinal) induced robust activation of the endogenous CASP3 (Heimer et al., Cell Death Dis. 2019;10:556) and CASP7 (Author response image 1, below) in human cells. Since CASP3 is the natural activator of GSDME, the presence of the triggers inevitably activates GSDME via CASP3. Hence, under this condition, it would be difficult to examine the effect of CASP7 S234N.

      Author response image 1.

      HsCASP7 activation by raptinal. HEK293T cells were transfected with the empty vector (-), or the vector expressing HsCASP7 or HsCASP7-S234N for 24 h. The cells were then treated with or without (control) 5 μM raptinal for 4 h. The cells were lysed, and the lysates were blotted with anti-CASP7 antibody.

      (2) As suggested by the reviewer, the cleavage of other CASP7 substrates, i.e., poly (ADP-ribose) polymerase 1 (PARP1) and gelsolin, by HsCASP7 and the S234N mutant was determined. The results showed that HsCASP7 and HsCASP7-S234N exhibited similar cleavage capacities (Figure 5-figure supplement 1 and lines 212-214).

      (2) It would also be interesting to examine the GSDME structure in different species to gain insight into the nature of mouse GSDME, which cannot be cleaved by either mouse or human CASP7.

      Because the three-dimensional structure of GSDME has not been solved, we are unable to explore the structural mechanism underlying GSDME cleavage by caspases. Since our results showed that the C-terminal domain was essential for caspase-mediated cleavage of GSDME, it is likely that the C-terminal domain of mouse GSDME possesses specific features that render it resistant to cleavage by mouse and human CASP7.

      (3) The evolutionary analysis does not explain why mammalian CASP7 evolved independently to acquire an amino acid change (N234 to S234) in the substrate-binding motif. Since it is difficult to experimentally identify why a functional divergence occurs, it would be beneficial for the authors to speculate on how CASP7 may have acquired functional divergence in mammals; potentially this occurred because of functional redundancies in cell death pathways, for example.

      According to the reviewer’s suggestion, a speculation was added. Lines 328-340.

      (4) For the recombinant proteins produced for these analyses, it would be helpful to know whether size-exclusion chromatography was used to purify these proteins and whether these purified proteins are soluble. Additionally, the SDS-PAGE in Figure S1B and C show multiple bands for recombinant mutants of TrCASP7 and HsCASP7. Performing protein ID to confirm that the detected bands belong to the respective proteins would be beneficial.

      The recombinant proteins in this study are soluble and were purified by Ni-NTA affinity chromatography. Size-exclusion chromatography was not used in protein purification.

      For the SDS-PAGE in Figure 4-figure supplement 1B and C (Figure S1B and C in the previous submission), the multiple bands are most likely due to the activation cleavage of the TrCASP7 and HsCASP7 variants, which can result in multiple bands, including p10 and p20. According to the reviewer’s suggestion, the cleaved p10 was verified by immunoblotting. Figure 4-figure supplement 1B and C.

      (5) For Figures 3C and 4A, it would be helpful to mention what parameters or PDB files were used to attribute these secondary structural features to the proteins. In particular, in Figure 3C, residues 261-266 are displayed as a β-strand; however, the well-known α-model represents this region as a loop. Providing the parameters used for these callouts could explain this difference.

      For Figure 3C, in the revised manuscript, we used the structure of mouse GSDMA3 (PDB: 5b5r) for the structural analysis of HsGSDME. As indicated by the reviewer, the region of 261-266 is a loop. The description was revised in lines 172 and 174, Figure 3C and Figure 3C legend.

      For Figure 4A, the alignment of CASP7 was constructed by using ESPript (https://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) with human CASP7 (PDB: 1k86) as the template. The description was revised in the figure legend.

      (6) Were divergent sequences selected for the sequence alignment analyses (particularly in Figure 6A)? The selection of sequences can directly influence the outcome of the amino acid residues in each position, and using diverse sequences can reduce the impact of the number of sequences on the LOGO in each phylogenetic group.

      In Figure 6A, the sequences were selected without bias. For Mammalia, 45 CASP3 and 43 CASP7 were selected; for Aves, 41 CASP3 and 52 CASP7 were selected; for Reptilia, 31 CASP3 and 39 CASP7 were selected; for Amphibia, 11 CASP3 and 12 CASP7 were selected; for Osteichthyes, 40 CASP3 and 43 CASP7 were selected. The sequence information is shown in Table 1 and Table 2.

      (7) For clarity, it would help if the authors provided additional rationale for the selection of residues for mutagenesis, such as selecting Q276, D278, and H283 as exosite residues, when the CASP7 PDB structures (4jr2, 3ibf, and 1k86) suggest that these residues are enriched with loop elements rather than the β sheets expected to facilitate substrate recognition in exosites for caspases (PMID: 32109412). It is possible that the inability to form β-sheets around these positions might indicate the absence of an exosite in CASP7, which further supports the functional effect of the exosite mutations performed.

      According to the suggestion, the rationale for the selection of residues for mutagenesis was added (lines 216-222). Unlike the exosite in HsCASP1/4, which is located in a β sheet, the Q276, D278, and H283 of HsCASP7 are located in a loop region (Figure 5-figure supplement 2), which may explain the mutation results and the absence of an exosite in HsCASP7 as suggested by the reviewer.

      Reviewer #2 (Public Review):

      The authors wanted to address the differential processing of GSDME by caspase 3 and 7, finding that while in humans GSDME is only processed by CASP3, Takifugu GSDME, and other mammalian can be processed by CASP3 and 7. This is due to a change in a residue in the human CASP7 active site that abrogates GSDME cleavage. This phenomenon is present in humans and other primates, but not in other mammals such as cats or rodents. This study sheds light on the evolutionary changes inside CASP7, using sequences from different species. Although the study is somewhat interesting and elegantly provides strong evidence of this observation, it lacks the physiological relevance of this finding, i.e., on the human side, mouse side, and fish side, what are the consequences of CASP3/7 vs CASP3 cleavage of GSDME.

      Our study revealed the molecular mechanism underlying the divergence of CASP3- and CASP7-mediated GSDME activation in vertebrates. One physiological consequence is that in humans, CASP7 no longer directly participates in GSDME-mediated cell death, which allows CASP7 to engage in other cellular processes. Another is that GSDME activation is limited to CASP3 cleavage, which restricts GSDME activity to more specific situations, such as those that induce CASP3 activation. The divergence and specialization of the physiological functions of different CASPs are consistent with, and possibly conducive to, the development of refined regulation of the sophisticated human GSDM pathways, which are executed by multiple GSDM members (A, B, C, D, and E), rather than solely by GSDME as in teleosts such as Takifugu. Further physiological consequences of CASP3/7 divergence in GSDME activation need to be explored in future studies (lines 328-340).

      Fish also present a duplication of GSDME gene and Takifugu present GSDMEa and GSDMEb. It is not clear in the whole study if when referring to TrGSDME is the a or b. This should be stated in the text and discussed in the differential function of both GSDME in fish physiology (i.e. PMIDs: 34252476, 32111733 or 36685536).

      The TrGSDME used in this study belongs to the GSDMEa lineage of teleost GSDME. The relevant information was added. Figure 1-figure supplement 1 and lines 119, 271, 274-276, 287 and 288.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) For the chimeric and truncated constructs, such as HsNT-TrCT, TrNT-HsCT, Hsp20-Trp10, Trp20-Hsp10, etc., the authors should provide a table denoting which amino acids were taken from each protein to create the fusion or truncation.

      According to the reviewer’s suggestion, information on the truncated/chimeric proteins is provided in Table 4.

      (2) Both reviewers agree that functional physiological experiments are needed to increase the significance of the work. Specifically, the physiological relevance of these findings can be assessed by using western blotting to monitor GSDME cleavage by the human CASP7 S234N mutant compared with wild type CASP7 in response to triggers such as etoposide or VSV, which are known to induce CASP3 to cleave GSDME (PMID: 28045099).

      Additionally, the authors can assess cell death in HEK293 cells, HEK293 cells transfected with TrGSDME, HEK293 cells expressing TrCASP3/7 plus TrGSDME, and TrCASP3/7 plus the D255R/D258A mutant. These cells can be stimulated, and pyroptosis can be assessed by using ELISA to measure the release of the cytoplasmic enzyme LDH as well as IL-1β and IL-18, and the percentage of cell death (PI+ positive cells) may also be assessed.

      (1) With respect to the physiological relevance, please see the above reply to Reviewer 1’s comment of “Specific recommendations for the authors, 1”.

      (2) As shown in our results (Fig. 2), co-expression of TrCASP3/7 and TrGSDME in HEK293T cells induced robust cell death without the need for any stimulation, as evidenced by LDH release and TrGSDME cleavage. In the revised manuscript, similar experiments were performed as suggested, and cell death was assessed by Sytox Green staining (Figure 2-figure supplement 3A and B) and by immunoblot to detect the cleavage of both wild type and mutant TrGSDME (Figure 2-figure supplement 3C). These results confirmed those of Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      Although the authors try to summarize the principal results of this study, please rewrite the abstract section to make it easier to follow and to emphasise the implications of their results.

      We have modified the Abstract as suggested by the reviewer.

      Introduction:

      The authors do not mention anything about the implication of inflammasome activation in pyroptosis through GSDM cleavage by inflammatory caspases. Please consider including this in the introduction section as they do in the discussion section.

      The introduction was modified according to the reviewer’s suggestion. Lines 58-61.

      From the results section the authors name the human GSDM as HsGSDM and the human CASP as HsCASP, maybe the author could use the same nomenclature in the introduction section. The same for the fish GSDM (Tr) and CASP.

      According to the reviewer’s suggestion, the same nomenclature was used in the introduction.

      Line 39. Remove the word necrotic.

      “necrotic” was removed.

      Line 42. Change channels by pores. In the manuscript, change channels by pores overall.

      “channels” was replaced by “pores”.

      Line 42: Include that: by these pores can be released the proinflammatory cytokines and if these pores are not solved then pyroptosis occurs. Please rephrase this statement.

      According to the reviewer's suggestion, the sentence was rephrased. Lines 46-48.

      Line 45. GSDMF is not an approved gene name, its official nomenclature is PJVK (Uniprot Q0ZLH3). Please use PJVK instead GSDMF.

      GSDMF was changed to PJVK.

      Line 103: Can the authors explain better the molecular determinant?

      The sentence was revised, line 109.

      Results:

      Line 110: Reference for this statement.

      The reference for this statement was added in line 116.

      Figure 1A, B: Concentration or units used of HsCASP?

      The unit (1 U) of HsCASPs was added to the figure legend (line 661).

      Line 113: Add Hs or Tr after CASP would be helpful to follow the story.

      “CASP” was changed to “HsCASP”.

      Fig 1D: Why do the authors not use the DMPD tetrapeptide (HsGSDME CASP3 cut site) in this assay? Comparing with the data obtained in Fig 3B, the TrCASP3 activity is going to be very close to that obtained for VEID or VDQQD in the CASP3 panel.

      The purpose of Figure 1D was to determine the cleavage preference of TrCASPs. For this purpose, a series of commercially available CASP substrates were used, including DEVD, which is commonly used as a testing substrate for CASP3. Figure 3B was to compare the cleavage of HsCASP3/7 and TrCASP3/7 specifically against the motifs from TrGSDME (DAVD) and HsGSDME (DMPD).

      Figure 1D and Figure 3B are different experiments and were performed under different conditions. In Figure 1D, CASP3 was incubated with the commercial substrates at 37 ℃ for 2 h, while in Figure 3B, CASP3/7 were incubated with non-commercial DAVD (motif from TrGSDME) and DMPD (motif from HsGSDME) at 37 ℃ for 30 min. More experimental details were added to Materials and Methods, lines 443 and 447.

      Fig 1H: What is the concentration used of the inhibitors?

      The concentration (20 μM) was added to the figure legend (line 669).

      Does HsCASP3/7 fail to cleave the TrGSDME mutants (D255R and D258A)? The authors do not show this result, so they cannot assume that HsCASP3/7 cleaves that sequence (although this is to be expected).

      The result of HsCASP3/7 cleavage of the TrGSDME mutants was added as Figure 1-figure supplement 2 and described in Results, line 133.

      Line 132-133: Can the author specify where is placed the mCherry tag? In the N terminal or C terminal portion of the different engineered proteins?

      The mCherry tag is attached to the C-terminus. Figure 2 legend (line 676).

      Fig 2A: Although is quite clear, a column histogram showing the quantification is going to be helpful.

      The expression of TrGSDME-FL, -NT and -CT was determined by Western blot, and the result was added as Figure 2-figure supplement 1.

      Fig 2A, B, C: After how many hours of expression are the pictures taken? Can the authors show a Western blot showing that the expression of the different constructions is similar?

      The time was added to Figure 2 legend and Materials and Methods (line 466). The expression of TrGSDME-FL, -NT and -CT was determined by Western blot, and the result was added as Figure 2-figure supplement 1.

      Fig 2C: Another helpful assay can be to measure the YO-PRO or another small dye internalization, to complete the LDH data.

      According to the reviewer’s suggestion, in addition to LDH release, Sytox Green was also used to detect cell death. The result was added as Figure 2-figure supplement 2 and described in Results, line 146.

      Fig 2C: In the figure, on the y axis, change LHD to LDH.

      The word was corrected.

      Fig 2D: Change HKE293T by HEK293T in the caption.

      The word was corrected.

      Fig 2G: Please add the concentration used with the two plasmids co-transfection. A Western blot showing CASP3/7 expression vs TrGSDME is missing. Is that assay after 24h? please specify better the methodology.

      The concentration of plasmid used in co-transfection and the time post transfection were added to the Materials and Methods (lines 422 and 424). In addition, the expression of CASP3/7 was added to Figure 2I.

      Fig 2 J, K: Change HKE293T by HEK293T in the figure caption. The concentration of the caspase inhibitors is missing. Depending on the concentration used, these inhibitors used could provoke toxicity on the cells by themselves.

      The word was corrected in the figure caption. The inhibitor concentration (10 μM) was added to the figure legend (line 690).

      Line 151: TrCASP3/7 instead of CASP3/7

      CASP3/7 was changed to TrCASP3/7.

      Fig 3A, 3B: Please add the units used of the HsCASP

      The unit was added to the figure legends (line 697).

      Fig 3A: Can the authors add the SDS-PAGE to show the N-terminal portion, as was done in Fig 1A? Maybe in a supplementary figure.

      The SDS-PAGE was added as Figure 3-figure supplement 1.

      Fig 3B: If the authors could add some data about the caspase activity using any other CASP such as CASP2, CASP1 to compare the activity data with CASP3 and CASP7 would be helpful.

      The proteolytic activity of TrCASP1 was provided as Figure 3-figure supplement 2.

      Fig 3C: To state this (Line 160), the authors should use another prediction software to reach a consensus with the sequences of the first analysis. In fact, what happens when GSDME is modelled three-dimensionally by comparing it to crystallized structures such as mouse GSDMA? Adding an arrow indicating where the N-terminal portion ends and where the C-terminal portion begins would make the figure clearer.

      According to the suggestions of both reviewers, in the revised manuscript, we used mouse GSDMA3 (PDB: 5b5r) for the structural analysis of HsGSDME, which showed that the 261-266 region of HsGSDME was a loop. As a result, Figure 3C was revised. Relevant change in Results: lines 172 and 174.

      As suggested by the reviewer, we modelled the three-dimensional structure of HsGSDME by using SWISS-MODEL with mouse GSDMA3 as the template (Author response image 2, below).

      Author response image 2.

      The three-dimensional structure model of HsGSDME. (A) The structure of HsGSDME was modeled by using mouse GSDMA3 (MmGSDMA3) as the template. The N-terminal domain (1-246 aa) and the C-terminal domain (279-468 aa) of HsGSDME are shown in red and blue, respectively. (B) The superposed structure of HsGSDME (cyan) and MmGSDMA3 (purple).

      Fig 3F: If this is an immunoblot, why can the NT be seen? In other Western blots only the CT is detected. Why? The use of the TrGSDME mouse polyclonal antibody needs more details (is it a purified Ab, was it produced for this study, what dilutions were used...)

      Since the anti-TrGSDME antibody was generated using the full-length TrGSDME, it reacted with both the N-terminal and the C-terminal fragments of TrGSDME in Figure 3F. In Figure 3G, the GSDME chimera contained only TrGSDME-CT, so only the CT fragment was detected by anti-TrGSDME antibody. More information on antibody preparation and immunoblot was added to “Materials and Methods” (lines 390 and 391).

      Fig 4B: Can the authors show at which amino acid the p20 ends for each CASP? (As they have done in panel 3E.)

      Fig 4B was revised as suggested.

      Fig 5F: With 4 units of WT CASP7 the authors show HsGSDME Ct in the same proportion as when the S234N mutant is used (at lower concentrations). How do the authors explain this?

      The result showed that the cleavage by 4U of HsCASP7 was comparable to the cleavage by 0.25U of HsCASP7-S234N, indicating that the S234N mutation increased the cleavage ability of HsCASP7 16-fold.
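      The 16-fold figure follows directly from the ratio of enzyme units that give comparable cleavage. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the calculation assumes that cleavage scales linearly with the units of enzyme added.

```python
def fold_increase(units_wild_type: float, units_mutant: float) -> float:
    """Ratio of wild-type to mutant enzyme units giving comparable cleavage.

    Assumes cleavage scales linearly with enzyme units (an illustration only).
    """
    if units_wild_type <= 0 or units_mutant <= 0:
        raise ValueError("enzyme units must be positive")
    return units_wild_type / units_mutant


# 4U of HsCASP7 vs 0.25U of HsCASP7-S234N, as stated in the response above.
print(fold_increase(4.0, 0.25))
```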

      Line 203: Can the authors show an alignment between this region of casp1/4 and 7? Maybe in supplementary figures.

      As reported by Wang et al. (PMID: 32109412), the βIII/βIII’ sheet of CASP1/4 forms the exosite critical for GSDMD recognition. The structural comparison among HsCASP1/4/7 and the sequence alignment of HsCASP1/4 βIII/βIII’ region with its corresponding region in HsCASP7 were added as Figure 5-figure supplement 2.

      Line 205: A mutation including S234N with the exosite mutations (S234+Q276W+D278E+H283S) is required to support this statement.

      The sentence of “suggesting that, unlike human GSDMD, HsGSDME cleavage by CASPs probably did not involve exosite interaction” was deleted in the revised manuscript.

      Fig 5I, 5J: which is the amount of HsGSDME and TrGSDME? I would place these figures in supplementary material.

      The protein expression of TrGSDME/HsGSDME was shown in the figure. Fig 5I and 5J were moved to Figure 5-figure supplement 3.

      Line 218: I would specify that this importance is in HUMAN CASP7 to cleave human GSDME.

      “CASP7” and “GSDME” were changed to “HsCASP7” and “HsGSDME”, respectively.

      Fig 6C: 4 units is the amount of S234N mutant needed to see an optimal HsGSDME cleavage in Fig 5F.

      In Figure 6C, the cleavage efficacy of HsCASP3-N208S was apparently decreased compared to that of HsCASP3, and 4U of HsCASP3-N208S was roughly equivalent to 1U of HsCASP3 in cleavage efficacy. In Figure 5F, cleavage by 4U of HsCASP7 was comparable to the cleavage by 0.25U of HsCASP7-S234N. Together, these results confirmed the critical role of S234/N208 in HsCASP3/7 cleavage of HsGSDME.

      Fig 6I: Could the fact that mouse GSDME has a longer Ct than human GSDME affect the interaction with CASP7? Is the cut site less accessible? This needs a positive control of mouse GSDME with mouse caspase-3.

      Although mouse GSDME (MmGSDME) (512 aa) is larger than HsGSDME (496 aa), the length of the C-terminal domain of MmGSDME (186 aa) is comparable to that of HsGSDME (190 aa).

      Author response image 3.

      Conserved domain analysis of mouse (upper) and human (lower) GSDME.

      As suggested by the reviewer, the cleavage of MmGSDME by mouse caspase-3 (MmCASP3) was added as Figure 6-figure supplement 2 and described in Results, line 258.

      Material and Methods:

      -Overall, concentrations or amounts used in this study regarding the active enzyme or plasmids used are missing and need to be added.

      The missing concentrations of the enzymes and plasmids were added in Material and Methods (lines 421, 453, 457, and 470) or figure legends (Figure 1 and 3).

      -It would be helpful if the authors label in the immunoblotting panels what is the GSDME that they are using. (Hs GSDME FL...).

      As suggested, the labels were added to Figures 1A, 1B, and 3.

      -Add the units of enzyme used.

      The units of enzyme were added to figure legends (Figure 1A, 3A, 3D, and 3F) or Material and Methods (lines 453 and 457).

      The GSDME sequence obtained for Takifugu after amplification of the RNA extracted should be shown and specified (GSDMEa or GSDMEb). From which tissue was the RNA extracted?

      The details were added to Materials and Methods (lines 398 and 402).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines psychophysics, fMRI, and TMS to reveal a causal role of FEF in generating an attention-induced ocular dominance shift, with potential relevance for clinical applications. The evidence supporting the claims of the authors is solid, but the theoretical and mechanistic interpretation of results and experimental approaches need to be strengthened. The work will be of broad interest to perceptual and cognitive neuroscience.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Based on a "dichoptic-backward-movie" paradigm that modulates ocular dominance, the present study combines fMRI and TMS to examine the role of the frontoparietal attentional network in ocular dominance shifts. The authors claimed a causal role of FEF in generating the attention-induced ocular dominance shift.

      Strengths:

      A combination of fMRI, TMS, and the "dichoptic-backward-movie" paradigm is used to reveal the causal role of the frontoparietal attentional network in ocular dominance shifts. The conclusions of this paper are mostly well supported by data.

      Weaknesses:

      (1) The relationship between eye dominance, eye-based attention shift, and cortical functions remains unclear and merits further delineation. The rationale of the experimental design related to the hemispheric asymmetry in the FEF and other regions should be clarified.

      Thanks for the reviewer’s comments! We have further clarified the relationship between eye dominance shift, eye-based attention, and cortical functions in the Introduction and Discussion. In the Introduction, we introduce the modulating effects of eye-based attention on eye dominance. On one hand, eye-based attention can enhance eye dominance of the attended eye in real time (see page 3 first paragraph or below):

      “For instance, presenting top-down attentional cues to one eye can intensify the competition strength of input signals in the attended eye during binocular rivalry (Choe & Kim, 2022; Zhang et al., 2012) and shift the eye balance towards the attended eye (Wong et al., 2021).”

      On the other hand, prolonged eye-based attention can induce a shift of eye dominance to the unattended eye (see page 3 second paragraph or below):

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, the goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).”

      Moreover, we discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or below, which also respond to this reviewer’s comment of Weakness #2):

      “Then how does FEF regulate the attention-induced ocular dominance shift? Our previous work has found that the aftereffect (for simplicity, hereafter we use aftereffect to denote the attention-induced ocular dominance shift) can be produced only when the adapting stimuli involve adequate interocular competition, and is measurable only when the testing stimuli are not binocularly fused (Song et al., 2023). Given the indispensability of interocular competition, we explained those findings in the framework of the ocular-opponency-neuron model of binocular rivalry (Said & Heeger, 2013). The model suggests that there are some opponency neurons which receive excitatory inputs from monocular neurons for one eye and inhibitory inputs from monocular neurons for the other eye (e.g. AE-UAE opponency neurons receive excitatory inputs from the attended eye (AE) and inhibitory inputs from the unattended eye (UAE)). Then a difference signal is computed so that the opponency neurons fire if the excitatory inputs surpass the inhibitory inputs. Upon activation, the opponency neurons will in turn suppress the monocular neurons which send inhibitory signals to them.

      Based on this model, we proposed an ocular-opponency-neuron adaptation account to explain the aftereffect, and pointed out that the attentional system likely modulated the AE-UAE ocular opponency neurons (Song et al., 2023). So why would FEF modulate the AE-UAE opponency neurons? The reason may be two fold. Firstly, understanding the logic during the dichoptic-backward-movie viewing may require filtering out the distracting information (from the unattended eye) and sustaining attention (to the attended eye), which is exactly the role of FEF (Esterman et al., 2015; Lega et al., 2019).

      Secondly, due to the special characteristics of binocular vision system, filtering the distracting input from the unattended eye may have to rely on the interocular suppression mechanism. According to the ocular-opponency-neuron model, this is achieved by the firing of the AE-UAE opponency neurons that send inhibitory signals to the UAE monocular neurons.

      As mentioned previously, the firing of the AE-UAE opponency neurons requires stronger activity for the AE monocular neurons than for the UAE monocular neurons. This is confirmed by the results shown in Figure 8 of Song et al. (2023) that monocular response for the attended eye during the entire adaptation phase was slightly stronger than that for the unattended eye. Accordingly, during adaptation the AE-UAE opponency neurons were able to activate for a longer period thus adapted to a larger extent than the UAE-AE opponency neurons. This would cause the monocular neurons for the unattended eye to receive less inhibition from the AE-UAE opponency neurons in the post-test as compared with the pre-test, leading to a shift of ocular dominance towards the unattended eye. In this vein, the magnitude of this aftereffect should be proportional to the extent of adaptation of the AE-UAE relative to UAE-AE opponency neurons. Attentional enhancement on the AE-UAE opponency neurons is believed to strengthen this aftereffect, as it has been found that attention can enhance adaptation (Dong et al., 2016; Rezec et al., 2004). Inhibition of FEF likely led such attentional modulation to be much less effective. Consequently, the AE-UAE opponency neurons might not have the chance to adapt to a sufficiently larger extent than the UAE-AE opponency neurons, leading to a statistically non-detectable aftereffect in Experiment 2. Therefore, the results of Experiments 2-4 in the present study suggest that within the context of the ocular-opponency-neuron adaptation account, FEF might be the core area to fulfill the attentional modulations on the AE-UAE opponency neurons.”
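      As an illustration only, the opponency computation described in the quoted passage can be sketched as a rectified difference between excitatory and inhibitory inputs. The function names, the numeric responses, and the suppression gain below are hypothetical; they are not taken from Said & Heeger (2013) or Song et al. (2023), and the sketch ignores adaptation and time dynamics.

```python
def opponency_response(excitatory: float, inhibitory: float) -> float:
    """Fire only when the excitatory input exceeds the inhibitory input."""
    return max(0.0, excitatory - inhibitory)


def suppressed_monocular(response: float, opponency: float, gain: float = 0.5) -> float:
    """Reduce a monocular response by feedback from an active opponency neuron.

    `gain` is a hypothetical illustration parameter, not a fitted value.
    """
    return max(0.0, response - gain * opponency)


# Hypothetical monocular responses: attended eye (AE) slightly stronger
# than unattended eye (UAE).
ae, uae = 1.0, 0.8

ae_uae = opponency_response(ae, uae)  # excited by AE, inhibited by UAE
uae_ae = opponency_response(uae, ae)  # excited by UAE, inhibited by AE

# The active AE-UAE neuron suppresses the UAE monocular neurons that inhibit it.
uae_after = suppressed_monocular(uae, ae_uae)
print(ae_uae, uae_ae, uae_after)
```

      In this toy setting only the AE-UAE neuron is active, which matches the qualitative claim that it would fire for longer, and therefore adapt more, than the UAE-AE neuron.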

      We used the experimental design with hemispheric asymmetry in the FEF and other regions for two reasons. First, many studies have shown that the dorsal attentional network has a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010). This was also indicated by the results of Experiment 1 (Figure 3). Second, a recent study applying TMS to the FEF and IPS stimulated only the right hemisphere (Gallotto et al., 2022). Therefore, we selected the right FEF and right IPS as the target regions for cTBS. In the Methods section of Experiment 2, we have elucidated the reasons for the selection of cTBS target regions (see page 35, first paragraph or below):

      “Given that the dorsal attentional network primarily consists of the FEF and the IPS (Corbetta & Shulman, 2002; Mayrhofer et al., 2019), with a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), we selected the right FEF and right IPS from the four clusters identified in Experiment 1 as the target regions for cTBS (Gallotto et al., 2022).”

      (2) Theoretically, how the eye-related functions in this area could be achieved, and how it interacts with the ocular representation in V1 warrant further clarification.

      Thanks for the reviewer’s comment! In the revised manuscript, we have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or the quoted paragraphs under this reviewer’s first Public comment).

      Reviewer #2 (Public Review):

      Summary

      Song et al investigate the role of the frontal eye field (FEF) and the intraparietal sulcus (IPS) in mediating the shift in ocular dominance (OD) observed after a period of dichoptic stimulation during which attention is selectively directed to one eye. This manipulation has been previously found to transiently shift OD in favor of the unattended eye, similar to the effect of short-term monocular deprivation. To this aim, the authors combine psychophysics, fMRI, and transcranial magnetic stimulation (TMS). In the first experiment, the authors determine the regions of interest (ROIs) based on the responses recorded by fMRI during either dichoptic or binocular stimulation, showing selective recruitment of the right FEF and IPS during the dichoptic condition, in line with the involvement of eye-based attention. In a second experiment, the authors investigate the causal role of these two ROIs in mediating the OD shift observed after a period of dichoptic stimulation by selectively inhibiting with TMS (using continuous theta burst stimulation, cTBS), before the adaptation period (50 min exposure to dichoptic stimulation). They show that, when cTBS is delivered on the FEF, but not the IPS or the vertex, the shift in OD induced by dichoptic stimulation is reduced, indicating a causal involvement of the FEF in mediating this form of short-term plasticity. A third control experiment rules out the possibility that TMS interferes with the OD task (binocular rivalry), rather than with the plasticity mechanisms. From this evidence, the authors conclude that the FEF is one of the areas mediating the OD shift induced by eye-selective attention.

      Strengths

      (1) The experimental paradigm is sound and the authors have thoroughly investigated the neural correlates of an interesting form of short-term visual plasticity combining different techniques in an intelligent way.

      (2) The results are solid and the appropriate controls have been performed to exclude potential confounds.

      (3) The results are very interesting, providing new evidence both about the neural correlates of eye-based attention and the involvement of extra-striate areas in mediating short-term OD plasticity in humans, with potential relevance for clinical applications (especially in the field of amblyopia).

      Weaknesses

      (1) Ethics: more details about the ethics need to be included in the manuscript. It is only mentioned for experiment 1 that participants "provided informed consent in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences". (Which version of the Declaration of Helsinki? The latest version requires the pre-registration of the study. The code of the approved protocol together with the code and date of the approval should be provided.) There is no mention of informed consent procedures or ethics approval for the TMS experiments. This is a huge concern, especially for brain stimulation experiments!

      Response: Thanks for the reviewer’s comment! In the revised manuscript, we have provided the code of the approved protocol and date of the approval (see page 25 second paragraph or below):

      “This study was approved (H21058, 11/01/2021) by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.”

      Indeed, ethics approval and informed consent were obtained for each experiment. To avoid duplication in the text, we only presented the ethics instructions in the Methods section of Experiment 1. We have now clarified in that section that all the experiments in this study were approved by the IRB in our Institute.

      (2) Statistics: the methods section should include a sub-section describing in detail all the statistical analyses performed for the study. Moreover, in the results section, statistical details should be added to support the fMRI results. In the current version of the manuscript, the claims are not supported by statistical evidence.

      Response: Thanks for the reviewer’s suggestion! In the Methods section of revised manuscript, we have added a section to describe the detailed statistical analyses for each experiment (see page 37 last paragraph for Experiment 2 and page 38 last paragraph for Experiment 3 or below):

      “Statistical analyses were performed using MATLAB. A 3 (stimulation site: Vertex, FEF, IPS) × 2 (test phase: pre-test and post-test) repeated measures ANOVA was used to investigate the effect of cTBS delivery on ocular dominance shift. Moreover, for the blob detection test, the target detection rate of each experimental condition was calculated by dividing the summed number of detected blob targets by the total number of blob targets. Then, a 2 (eye: attended eye, unattended eye) × 3 (stimulation site: Vertex, FEF, IPS) repeated measures ANOVA on the detection performance was performed. Post-hoc tests were conducted using paired t-tests (2-tailed significance level at α = 0.05), and the resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini & Hochberg, 1995).”
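      For readers unfamiliar with the two computations named in this paragraph, a minimal Python sketch follows. It is an illustration under stated assumptions, not the authors' MATLAB code: the function names and the example numbers are hypothetical, and the correction shown is the standard Benjamini-Hochberg step-up adjustment.

```python
def detection_rate(detected: int, total: int) -> float:
    """Summed number of detected blob targets divided by the total number of targets."""
    if total <= 0:
        raise ValueError("total number of targets must be positive")
    return detected / total


def fdr_bh(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers for illustration only.
print(detection_rate(45, 60))
print(fdr_bh([0.01, 0.04, 0.03]))
```

      An adjusted p-value below α = 0.05 would then be treated as significant after correction.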

      “In addition to the data analysis in Experiment 2, we complemented the standard inferential approach with the Bayes factor (van den Bergh et al., 2023; van Doorn et al., 2021; Wagenmakers et al., 2018), which allows quantifying the relative evidence that the data provide for the alternative (H1) or null hypothesis (H0). We conducted the Bayesian repeated measures ANOVA using JASP with default priors and computed inclusion Bayes factors (BFincl) which suggest the evidence for the inclusion of a particular effect calculated across matched models. A BF greater than 1 provides support for the alternative hypothesis. Specifically, a BF between 1 and 3 indicates weak evidence, a BF between 3 and 10 indicates moderate evidence, and a BF greater than 10 indicates strong evidence (van Doorn et al., 2021). In contrast, a BF below 1 provides evidence in favor of the null hypothesis.”
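      The evidence categories quoted above can be written as a small lookup, sketched here in Python. The function name is hypothetical, and the handling of the exact boundary values (1, 3, and 10) is an assumption, since the quoted text does not say which category a boundary value falls into.

```python
def interpret_bayes_factor(bf: float) -> str:
    """Map a Bayes factor (H1 over H0) to the verbal categories quoted above."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf > 10:
        return "strong evidence for H1"
    if bf >= 3:  # boundary assignment is an assumption
        return "moderate evidence for H1"
    if bf > 1:
        return "weak evidence for H1"
    if bf == 1:
        return "no evidence either way"
    return "evidence for H0"


for value in (0.2, 2.0, 5.0, 25.0):
    print(value, interpret_bayes_factor(value))
```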

      Furthermore, in the Results section of revised manuscript, we have added the statistical details to support the fMRI results (see page 9 last paragraph or below):

      “To seek these brain regions, we used the AFNI program “3dttest++” to assess the difference of ‘dichoptic-binocular’ contrast between the experimental and control runs. The AFNI program “ClustSim” was then applied for multiple comparison correction, yielding a minimum significant cluster size of 21 voxels (voxel wise p = .001; cluster threshold α = 0.05). We found 4 clusters showing stronger responses to the dichoptic movies than to the binocular movies especially in the experimental runs.”

      (3) Interpretation of the results: the TMS results are very interesting and convincing regarding the involvement of the FEF in the build-up of the OD shift induced by dichoptic stimulation, however, I am not sure that the authors can claim that this effect is related to eye-based attention, as cTBS has no effect on the blob detection task during dichoptic stimulation. If the FEF were causally involved in eye-based attention, one would expect a change in performance in this task during dichoptic stimulation, perhaps a similar performance for the unattended and attended eye. The authors speculate that the sound could have an additional role in driving eye-based attention, which might explain the lack of effect for the blob discrimination task, however, this hypothesis has not been tested.

      Response: Thanks for the reviewer’s comment! Following this reviewer’s insightful suggestion, we have conducted a new experiment to examine the effect of sound on the blob detection task (see Experiment 4 in the revised manuscript). The procedure was similar to that of Experiment 2 except that the sound was no longer presented during the dichoptic-backward-movie adaptation. The results showed that the interocular difference in blob detection rate after sound elimination remained unaffected by the cTBS, which was inconsistent with the explanation in the previous version of the manuscript. Based on the new data, we now question the validity of using the blob detection rate to precisely quantify eye-based attention, and have tried to explain in the Discussion of the revised manuscript why the blob detection results do not contradict our account of the functional role of FEF in modulating the aftereffect (see page 23 second paragraph to page 24 first paragraph or below):

      “An unresolved issue is why inhibiting the cortical function of FEF did not impair the performance of blob detection task. One potential explanation is that the synchronized audio in Experiment 2 might help increase the length of time that the regular movie dominated awareness. However, the results of Experiment 4 did not support this explanation, in which the performance of blob detection survived from the inhibition of FEF even when silent movies were presented. Although this issue remains to be explored in future work, it does not contradict with our notion of FEF modulating AE-UAE opponency neurons. It should be noted that our notion merely states that FEF is the core area for attentional modulations on activities of AE-UAE opponency neurons. No other role of FEF during the adaptation is assumed here (e.g. boosting monocular responses or increasing conscious level of stimuli in the attended eye). In contrast, according to the most original definition, the blob detection performance serves as an estimation of visibility (or consciousness level) of the stimuli input from each eye, despite the initial goal of adopting this task is to precisely quantify eye-based attention (which might be impractical). Thus, according to our notion, inhibition of FEF does not necessarily lead to deteriorate performance of blob detection. Furthermore, our findings consistently indicated that the visibility of stimuli in the attended eye was markedly superior to that of stimuli in the unattended eye, yet the discrepancy in the SSVEP monocular responses between the two eyes was minimal though it had reached statistical significance (Song et al., 2023). Therefore, blob detection performance in our work may only faithfully reflect the conscious level in each monocular pathway, but it is probably not an appropriate index tightly associated with the attentional modulations on monocular responses in early visual areas. 
      Indeed, previous work has argued that attention but not awareness modulates neural activities in V1 during interocular competition (Watanabe et al., 2011), but see (Yuval-Greenberg & Heeger, 2013). We have noticed and discussed the counterintuitive results of blob detection performance in our previous work (Song et al., 2023). Here, with the new counterintuitive finding that inhibition of FEF did not impair the performance of blob detection, we suspect that blob detection performance in the “dichoptic-backward-movie” adaptation paradigm may not be an ideal index that can be used to accurately quantify eye-based attention.”

      (4) Writing: in general, the manuscript is well written, but clarity should be improved in certain sections.

      (a) fMRI results: the first sentence is difficult to understand at first read, but it is crucial to understand the results, please reformulate and clarify.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have reformulated this sentence (see page 9 last paragraph or below):

      “It was only in the dichoptic condition of experimental runs that participants had to selectively pay more attention to one eye (i.e., eye-based attention). Therefore, we speculate that if certain brain regions exhibit greater activities in the dichoptic condition as compared to the binocular condition in the experimental runs but not in the control runs, the activation of these brain regions could be attributable to eye-based attention.”

      (b) Experiment 3: the rationale for experiment one should be straightforward, without a long premise explaining why it would not be necessary.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have streamlined the lengthy premise to make the rationale of Experiment 3 more straightforward (see page 15 last two paragraphs or below):

      “The results of Experiment 2 support the notion that eye-based attention was the cause for attention-induced ocular dominance plasticity. However, an alternative account is that the significant two-way interaction between test phase and stimulation site did not stem from any persistent malfunction of FEF in modulating ocular dominance, but rather it was due to some abnormality of binocular rivalry measures in the post-test that occurred after stimulation at the FEF only (and not at the other two brain sites). For instance, stimulation at the FEF might simply reduce the ODI measured in the binocular rivalry post-test.

      Therefore, we conducted Experiment 3 to examine how suppression of the three target sites would impact binocular rivalry performance, in case that any unknown confounding factors, which were unrelated to adaptation but related to binocular rivalry measures, contributed to the results.”

      (c) Discussion: the language is a bit familiar here and there, a more straightforward style should be preferred (one example: p.19 second paragraph).

      Response: Thanks for the reviewer’s suggestion! We have carefully revised the language in the discussion. The discussion following the example paragraph has been largely rewritten.

      (5) Minor: the authors might consider using the term "participant" or "observer" instead of "subject" when referring to the volunteers who participated in the study.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have replaced the term “subject” with “participant”.

      Reviewer #3 (Public Review):

      Summary:

      This study studied the neural mechanisms underlying the shift of ocular dominance induced by "dichoptic-backward-movie" adaptation. The study is self-consistent.

      Strengths:

      The experimental design is solid and progressive (relationship among three studies), and all of the raised research questions were well answered.

      The logic behind the neural mechanisms is solid.

      The findings regarding the cTBS (especially the position/site) can be useful for future medical implications.

      Weaknesses:

      Why does the "dichoptic-backward-movie" adaptation matter? This part is severely missing. This kind of adaptation is neither intuitive like the classical (Gibson) visual adaptation, nor practical as adaptation as a research paradigm as well as the fundamental neural mechanism. If this part is not clearly stated and discussed, this study is just self-consistent in terms of its own research question. There are tons of "cool" phenomena in which the neural mechanisms are apparent as "FEF controls vision-attention" but never tested using TMS & fMRI, but we all know that this kind of research is just of incremental implications.

      Response: Thanks for the reviewer’s comment! We designed the "dichoptic-backward-movie" adaptation to study the perceptual consequence and mechanisms of sustained attention to a monocular pathway. Since the overall visual input to both eyes during adaptation was identical, any effect (i.e. the change of ocular dominance in our study) after adaptation can be easily ascribed to unbalanced eye-based attention between the two eyes rather than unbalanced input energy across the eyes. In typical short-term monocular deprivation, input signal from one eye is blocked. Accordingly, attention is undoubtedly distributed to the non-deprived eye. The fact that in a short-term monocular deprivation paradigm the deprived eye is also the unattended eye prevents researchers from ascertaining whether unbalanced eye-based attentional allocation contributes to the shift of ocular dominance just like unbalanced visual input across the two eyes. That is why the “dichoptic-backward-movie” adaptation was adopted in the present study. This new paradigm balances the input energy across the eyes but leaves attention unbalanced across the eyes. In the revised manuscript, we have added the description of the “dichoptic-backward-movie” adaptation (see page 3 last paragraph and page 4 first paragraph or below). We hope this additional information improves the clarity.

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants are presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, the goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).” In short-term monocular deprivation, input signal from one eye is blocked. Accordingly, attention is biased towards the non-deprived eye. However, it is difficult to tease apart the potential contribution of unbalanced eye-based attention from the consequence of the unbalanced input energy, as the deprived eye is also the unattended eye. Therefore, the advantage of the “dichoptic-backward-movie” adaptation paradigm is to balance the input energy across the eyes but leave attention unbalanced across the eyes.

      Our previous work (Song et al., 2023) has shown that eye-based attention plays a role in the formation of ocular dominance shift following adaptation to dichoptic backward movie. However, because the “dichoptic-backward-movie” adaptation paradigm is new, to our knowledge no previous study has identified the brain areas that are responsible for eye-based attention. Our fMRI experiment addresses this issue for the first time, which, we believe, is one of the novelties of the present study. Attention is a rather general term for our ability to select limited information for preferential or privileged processing, yet it includes numerous aspects (e.g., spatial attention for spatial locations, feature-based attention for visual features, object-based attention for objects, social attention for social cues, and eye-based attention for monocular pathways, etc.). Are we 100% sure that the same brain network always underlies every aspect of attention, including eye-based attention? No test, no answer. Maybe the answer is yes, but we are not aware of any evidence for that in the literature. It is not unlikely that attention is like an elephant while researchers are like blind people touching the elephant from different angles. Even if all previous researchers have touched the side of the elephant and stated that an elephant is no different from a wall, as long as one researcher grabs the elephant’s tail, the “wall” knowledge will be falsified. From this perspective of the essence of science (falsifiability), we have the confidence to say that our fMRI experiment on eye-based attention is novel, because to our knowledge our experiment is the first one to explore the issue. On the basis of the fMRI experiment (otherwise we would have had no idea on which precise brain site to apply the cTBS), we could successfully complete the subsequent TMS experiments.

      Of course, if the reviewer can kindly point out any previous neuroimaging work we missed that has already disclosed the neural mechanisms underlying human eye-based attention, we would truly appreciate it. But even so, we would like to emphasize that the purpose of the current study was actually not to use TMS & fMRI to confirm that “FEF controls visual attention”. As we mentioned in the Abstract and expanded on in the last two paragraphs of the Introduction, the goal of the TMS experiments is to examine the causal role of eye-based attention in producing the aftereffect of “dichoptic-backward-movie” adaptation. This research question is also new, thus we do not think the TMS experiments are incremental, either. Our findings provided direct causal evidence for the effect of FEF on modulating ocular dominance through eye-based attention. Please see the last two sentences in the first paragraph on page 20 in the revised manuscript or below,

      “Interestingly, in our Experiment 2 this aftereffect was significantly attenuated after we temporarily inhibited the cortical function of FEF via cTBS. This finding indicates the crucial role of FEF in the formation of attention-induced ocular dominance shift.”

      as well as the last sentence of the Abstract,

      “…and in this network, FEF plays a crucial causal role in generating the attention-induced ocular dominance shift.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The hemispheric asymmetry in the eye-based attention-related cortex should be further examined and discussed. For example, IPS in both hemispheres was identified in the fMRI experiment. It is not clear why only the right IPS was stimulated in the TMS experiment.

      Response: Thanks for the comment. We have elucidated the reasons for the experimental design with hemispheric asymmetry in FEF and IPS. Please see our response to the Weakness #1 raised by Reviewer #1 in the Public Review section.

      (2) It is known that the frontoparietal cortex plays a role in the contralateral shift of attentional allocation. Meanwhile, the latest stage of ocular-specific representation is V1. The authors should discuss how the eye-related function can be achieved in FEF.

      Response: Thanks for the comment. We have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph in the revised manuscript, and our response to the Weakness #2 raised by Reviewer #1 in the Public Review section).

      (3) To further validate the role of FEF in eye-related attention shifts, the authors may consider using the traditional monocular deprivation paradigm with fMRI and TMS. It would be valuable to compare the neural mechanisms related to the classical monocular deprivation paradigm with the current findings.

      Response: Thanks for the reviewer’s suggestion! That is indeed an interesting research topic that we are currently exploring. The current study investigated the attention-induced ocular dominance shift with the “dichoptic-backward-movie-adaptation” paradigm. This paradigm is substantially different from traditional short-term monocular deprivation. In our Neuroscience Bulletin paper (Song et al. 2023), we discuss the reason as follows.

      “An alternative account of our results is the homeostatic plasticity mechanism. The function of this mechanism is to stabilize neuronal activity and prevent the neuronal system from becoming hyperactive or hypoactive. For this goal, the mechanism moves the neuronal system back toward its baseline after a perturbation [51, 52]. In our case, the aftereffect can be explained such that the visual system boosts the signals from the unattended eye to maintain the balance of the network’s excitability. However, this account cannot easily explain why the change of neural ocular dominance led by prolonged eye-based attention was observed here using the binocular rivalry testing stimuli, but absent in the previous research using the binocularly fused stimuli [11]. In contrast, a recent SSVEP study also using the binocularly fused stimuli has successfully revealed a shift of neural ocular dominance after two hours of monocular deprivation [31], which is in line with the homeostatic plasticity account. Therefore, the mechanisms underlying the “dichoptic-backward-movie” adaptation and monocular deprivation are probably not fully overlapped with each other; and the binocular rivalry mechanism described in the ocular-opponency-neuron model seems to be more preferable than the homeostatic plasticity mechanism in accounting for the present findings.”

      Therefore, before asking whether FEF plays a role in the attention-induced ocular dominance shift in a traditional monocular deprivation paradigm, one should probably first examine whether attention also plays a role in traditional monocular deprivation, and whether the ocular-opponency-neuron adaptation account can also be used to explain the traditional monocular deprivation effect. Our newly accepted paper “Negligible contribution of adaptation of ocular opponency neurons to the effect of short-term monocular deprivation” (https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1282113/full) gives a generally negative answer to the second question. And as to the first question, we have one manuscript under review and another ongoing study. In other words, to get a satisfactory answer to this particular comment of this reviewer, we need to first obtain clear answers to the two above questions. We think this is far beyond the scope of one single manuscript.

      (4) The authors only presented regular movies to the dominant eye to maximize the ocular dominance shift. This critical information of design should be clarified, not only in the method section.

      Response: Thanks for the reviewer’s suggestion! In the Results section of Experiment 2, we have added a description of this critical information of design (see page 11 last paragraph to page 12 first paragraph or below):

      “Then, participants adapted to the “dichoptic-backward-movie” in which regular movie images were presented to the dominant eye to maximize the effect of eye dominance shift (Song et al., 2023). Meanwhile they were asked to detect some infrequent blob targets presented on the movie images in one eye at the same time.”

      (5) The frame rate of the movie is 30 fps, which is much lower than a typical 60 fps visual presentation, does this have an effect on the adaptation outcome?

      Response: To the best of our knowledge, there is no evidence that the frame rate of the movie influences the aftereffect of attention-induced ocular dominance shift. In our previous research, the frame rate of the movie during adaptation was 25 fps, which still produced a stable adaptation aftereffect (Song et al., 2023). The frame rate of the movie was 30 fps in our monocular deprivation work (Lyu et al., 2020), which showed a monocular deprivation effect similar to the one we previously observed in an altered-reality study (Bai et al., 2017). The frame rate of the altered-reality video in Bai et al.’s (2017) work was 60 fps. Taken together, these observations suggest that the frame rate does not have an effect on the adaptation outcome.

      (6) Figure 5: The ODSE derived from ODI in Experiment 3 should also be illustrated, for a better comparison with results from Experiment 2.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have added the results of ODSE in Experiment 3 to Figure 5 (see page 15 or below):

      Author response image 1.

      Figure 5. The results of (A) the ocular dominance index (ODI), (B) the ocular dominance shift effects (ODSE) in Experiment 2, (C) the ODI and (D) the ODSE in Experiment 3. The bars show the grand average data for each condition. The individual data are plotted with gray lines or dots. The dashed gray line represents the absolute balance point for the two eyes (ODI = 0.5). Error bars indicate standard errors of means. * p < .05; ** p < .01; n.s. p > .05.

      (7) Spelling issues: "i.e." → "i.e.,"

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have changed “i.e.” to “i.e.,”.

      Reviewer #2 (Recommendations For The Authors):

      Linked to weakness 3: Ideally, a control experiment with cTBS and dichoptic stimulation without sound but with the blob discrimination task should be performed to be able to make important claims about the neural mechanisms involved in eye-based attention.

      Response: Thanks for the comment. We have performed a new experiment as the reviewer suggested. Please see our response to the Weakness #3 raised by Reviewer #2 in the Public Review section.

      Reviewer #3 (Recommendations For The Authors):

      (1) The neural mechanisms are so apparent. We all know the FEF\IPS\SC matter in vision and attention and gaze. This is not groundbreaking.

      Response: As we addressed in our response to Reviewer #3’s public comment, the current study aimed at investigating the causal mechanism for eye-based attentional modulation of ocular dominance plasticity rather than simply the role of FEF\IPS\SC in visual attention. Moreover, eye-based attention is a less investigated aspect of visual attention. The neural mechanism underlying eye-based attention is still largely unknown, and identifying the brain areas that control eye-based attention is the necessary preparation work for applying the cTBS. In our response to Reviewer #3’s public comment, we explained in detail why we think both the fMRI and TMS experiments are novel to the field; we will not reiterate those points here to avoid redundancy.

      (2) Why does the "dichoptic-backward-movie" adaptation matter? Is playing a backward movie to one eye realistic? Does that follow the efficient coding? Is that a mere consequence of information theory?

      Response: Thanks for the comments. We have added the description of the “dichoptic-backward-movie” adaptation paradigm in the revised manuscript (see page 3 last paragraph and page 4 first paragraph or our response to this reviewer’s Public comment).

      Is it realistic to play a backward movie to one eye? We find this question somewhat ambiguous. If the reviewer means the technical feasibility of such stimulus presentation, we can confirm it, since we have used this paradigm in both the current and previously published studies. To be more specific, we made the video stimuli in advance. The left half of the video was the regular movie and the right half was the backward version of the same movie (or vice versa). When viewing such video stimuli through stereoscopes, participants could only see the left half of the video with the left eye and the right half of the video with the right eye. In other words, the regular movie and backward movie were viewed dichoptically. Alternatively, if the reviewer means that such dichoptic presentation rarely happens in the real world and is thus not realistic, we agree with the reviewer on the one hand. On the other hand, we have explained on page 3 last paragraph and page 4 first paragraph why it is a particularly useful paradigm for the main purpose of the present study. Let us give a similar example. The phenomenon of binocular rivalry rarely happens in everyday life, so people may say binocular rivalry is not realistic. However, our visual system does have the ability to deal with such conflicting visual inputs across the eyes, even if binocular rivalry is unrealistic! Sometimes it is fun to investigate those seemingly unrealistic functions of our brains, since those may also reveal the mystery of our neural system. As we know, although binocular rivalry is uncommon in daily life, it is frequently used to investigate awareness. And in our work, we use binocular rivalry to measure perceptual ocular dominance.
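
      The side-by-side stimulus layout described above can be illustrated with a minimal sketch. This is not the authors' actual stimulus-generation code: the function name, the array layout (time, height, width, channel), and the assumption that the backward movie is the frame-by-frame time reversal of the whole clip are all hypothetical choices made for illustration.

```python
import numpy as np


def make_dichoptic_frames(frames: np.ndarray, regular_on_left: bool = True) -> np.ndarray:
    """Pair each regular frame with a time-reversed frame, side by side.

    frames: array of shape (T, H, W, C), the regular movie (hypothetical layout).
    Returns an array of shape (T, H, 2 * W, C). When viewed through a
    stereoscope, each eye would see only one half of every frame.
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    backward = frames[::-1]  # assumption: backward movie = reversed frame order
    halves = (frames, backward) if regular_on_left else (backward, frames)
    return np.concatenate(halves, axis=2)  # join along the width axis


# Tiny synthetic "movie": frame t is filled with the value t.
T, H, W = 4, 2, 3
movie = np.arange(T, dtype=np.uint8).reshape(T, 1, 1, 1) * np.ones((1, H, W, 1), dtype=np.uint8)
stimulus = make_dichoptic_frames(movie)
```

      In this sketch both halves contain exactly the same set of frames over the whole clip, which mirrors the point that the overall input energy to the two eyes is balanced while only the temporal order differs.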

      Finally, the reviewer asked whether the "dichoptic-backward-movie" adaptation paradigm follows efficient coding and information theory. Information theory and efficient coding assume that messages with low expectedness or of rare occurrence attract more attention and induce larger neural responses than those with high expectedness. In the "dichoptic-backward-movie" adaptation paradigm, the backward movie should be less expected, since the actions of the characters in the backward movie appeared illogical. Thus, according to information theory and efficient coding, more attention would be expected to be paid to the backward movie, and the backward movie might therefore dominate awareness for a longer period during adaptation (Zhang et al., 2012). However, we instructed participants to follow the regular movie during adaptation. The results of the blob detection task also showed better task performance when the targets appeared in the eye presented with the regular movie, which contradicted the prediction of information theory and efficient coding. Thus, it seems unlikely that the "dichoptic-backward-movie" adaptation followed efficient coding and information theory.

      References

      Bai, J., Dong, X., He, S., & Bao, M. (2017). Monocular deprivation of Fourier phase information boosts the deprived eye’s dominance during interocular competition but not interocular phase combination. Neuroscience, 352, 122-130. https://doi.org/10.1016/j.neuroscience.2017.03.053

      Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1), 289-300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

      Choe, E., & Kim, M.-S. (2022). Eye-specific attentional bias driven by selection history. Psychonomic Bulletin & Review, 29(6), 2155-2166. https://doi.org/10.3758/s13423-022-02121-0

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature reviews neuroscience, 3(3), 201-215. https://doi.org/10.1038/nrn755

      Dong, X., Gao, Y., Lv, L., & Bao, M. (2016). Habituation of visual adaptation. Sci Rep, 6, 19152. https://doi.org/10.1038/srep19152

      Duecker, F., Formisano, E., & Sack, A. T. (2013). Hemispheric differences in the voluntary control of spatial attention: direct evidence for a right-hemispheric dominance within frontal cortex. Journal of Cognitive Neuroscience, 25(8), 1332-1342. https://doi.org/10.1162/jocn_a_00402

      Esterman, M., Liu, G., Okabe, H., Reagan, A., Thai, M., & DeGutis, J. (2015). Frontal eye field involvement in sustaining visual attention: evidence from transcranial magnetic stimulation. Neuroimage, 111, 542-548. https://doi.org/10.1016/j.neuroimage.2015.01.044

      Gallotto, S., Schuhmann, T., Duecker, F., Middag-van Spanje, M., de Graaf, T. A., & Sack, A. T. (2022). Concurrent frontal and parietal network TMS for modulating attention. iScience, 25(3), 103962. https://doi.org/10.1016/j.isci.2022.103962

      Lega, C., Ferrante, O., Marini, F., Santandrea, E., Cattaneo, L., & Chelazzi, L. (2019). Probing the neural mechanisms for distractor filtering and their history-contingent modulation by means of TMS. Journal of Neuroscience, 39(38), 7591-7603. https://doi.org/10.1523/JNEUROSCI.2740-18.2019

      Lunghi, C., Burr, D. C., & Morrone, C. (2011). Brief periods of monocular deprivation disrupt ocular balance in human adult visual cortex. Curr Biol, 21(14), R538-539. https://doi.org/10.1016/j.cub.2011.06.004

      Lyu, L., He, S., Jiang, Y., Engel, S. A., & Bao, M. (2020). Natural-scene-based Steady-state Visual Evoked Potentials Reveal Effects of Short-term Monocular Deprivation. Neuroscience, 435, 10-21. https://doi.org/10.1016/j.neuroscience.2020.03.039

      Mayrhofer, H. C., Duecker, F., van de Ven, V., Jacobs, H. I., & Sack, A. T. (2019). Hemifield-specific correlations between cue-related blood oxygen level dependent activity in bilateral nodes of the dorsal attention network and attentional benefits in a spatial orienting paradigm. Journal of Cognitive Neuroscience, 31(5), 625-638. https://doi.org/10.1162/jocn_a_01338

      Rezec, A., Krekelberg, B., & Dobkins, K. R. (2004). Attention enhances adaptability: evidence from motion adaptation experiments. Vision Res, 44(26), 3035-3044. https://doi.org/10.1016/j.visres.2004.07.020

      Sack, A. T. (2010). Using non-invasive brain interference as a tool for mimicking spatial neglect in healthy volunteers. Restorative neurology and neuroscience, 28(4), 485-497. https://doi.org/10.3233/RNN-2010-0568

      Said, C. P., & Heeger, D. J. (2013). A model of binocular rivalry and cross-orientation suppression. PLoS computational biology, 9(3), e1002991. https://doi.org/10.1371/journal.pcbi.1002991

      Song, F., Lyu, L., Zhao, J., & Bao, M. (2023). The role of eye-specific attention in ocular dominance plasticity. Cerebral Cortex, 33(4), 983-996. https://doi.org/10.1093/cercor/bhac116

      van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2023). Bayesian Repeated-Measures Analysis of Variance: An Updated Methodology Implemented in JASP. Advances in Methods and Practices in Psychological Science, 6(2), 25152459231168024. https://doi.org/10.1177/25152459231168024

      van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Gupta, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E. J. (2021). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826. https://doi.org/10.3758/s13423-020-01798-5

      Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Selker, R., Gronau, Q. F., Dropmann, D., Boutin, B., Meerhoff, F., Knight, P., Raj, A., van Kesteren, E. J., van Doorn, J., Šmíra, M., Epskamp, S., Etz, A., Matzke, D., de Jong, T., van den Bergh, D., Sarafoglou, A., Steingroever, H., Derks, K., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76. https://doi.org/10.3758/s13423-017-1323-7

      Watanabe, M., Cheng, K., Murayama, Y., Ueno, K., Asamizuya, T., Tanaka, K., & Logothetis, N. (2011). Attention but not awareness modulates the BOLD signal in the human V1 during binocular suppression. Science, 334(6057), 829-831. https://doi.org/10.1126/science.1203161

      Wong, S. P., Baldwin, A. S., Hess, R. F., & Mullen, K. T. (2021). Shifting eye balance using monocularly directed attention in normal vision. J Vis, 21(5), 4. https://doi.org/10.1167/jov.21.5.4

      Yuval-Greenberg, S., & Heeger, D. J. (2013). Continuous flash suppression modulates cortical activity in early visual cortex. J Neurosci, 33(23), 9635-9643. https://doi.org/10.1523/jneurosci.4612-12.2013

      Zhang, P., Jiang, Y., & He, S. (2012). Voluntary attention modulates processing of eye-specific visual information. Psychol Sci, 23(3), 254-260. https://doi.org/10.1177/0956797611424289

      Zhou, J., Reynaud, A., & Hess, R. F. (2014). Real-time modulation of perceptual eye dominance in humans. Proc Biol Sci, 281(1795). https://doi.org/10.1098/rspb.2014.1717

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study has uncovered some important initial findings about how certain extracellular vesicles (EVs) from the mother might impact the energy usage of an embryo. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The study's title might be a bit too assertive as the evidence linking maternal mtDNA transmission to changes in embryo energy use is still correlative.

      We would like to express our sincere gratitude to the editors and reviewers for their invaluable comments on this work. Their feedback has been instrumental in enhancing the quality of our manuscript; we have incorporated their suggestions to the best of our abilities.

      Reviewer #1 (Public Review):

      Q1. Bolumar et al. isolated and characterized EV subpopulations, apoptotic bodies (AB), microvesicles (MV), and exosomes (EXO), from endometrial fluid through the female menstrual cycle. By performing DNA sequencing, they found the MVs contain more specific DNA sequences than other EVs, and specifically, more mtDNA was encapsulated in MVs. They also found a reduction of mtDNA content in the human endometrium at the receptive and post-receptive period that is associated with an increase in mitophagy activity in the cells, and a higher mtDNA content in the secreted MVs was found at the same time. Last, they demonstrated that the endometrial Ishikawa cell-derived EVs could be taken up by the mouse embryos and resulted in altered embryo metabolism.

      This is a very interesting study and is the first one demonstrating the direct transmission of maternal mtDNA to embryos through EVs.

      A1. Thank you for your kind comments.

      Reviewer #2 (Public Review):

      Q2. In Bolumar, Moncayo-Arlandi et al. the authors explore whether endometrium-derived extracellular vesicles contribute mtDNA to embryos and therefore influence embryo metabolism and respiration. The manuscript combines techniques for isolating different populations of extracellular vesicles, DNA sequencing, embryo culture, and respiration assays performed on human endometrial samples and mouse embryos.

      Vesicle isolation is technically difficult and therefore collection from human samples is commendable. Also, the influence of maternally derived mtDNA on the bioenergetics of embryos is unknown and therefore novel. However, several experiments presented in the manuscript fail to reach statistical significance, likely due to the small sample sizes. Additionally, the experiments do not demonstrate a direct effect of mtDNA transfer on embryo bioenergetics. This has the unfortunate consequence of making several of the authors' conclusions speculative.

      In my opinion the manuscript supports the following of the authors' claims:

      1) Different amounts of mtDNA are shed in human endometrial extracellular vesicles during different phases of the menstrual cycle

      2) Endometrial microvesicles are more enriched for mitochondrial DNA sequences compared to other types of microvesicles present in the human samples

      3) Fluorescently labelled DNA from extracellular vesicles derived from an endometrial adenocarcinoma cell line can be incorporated into hatched mouse embryos.

      4) Culture of mouse embryos with endometrial extracellular vesicles can influence embryo respiration and the effect is greater when cultured with isolated exosomes compared to other isolated microvesicles

      A2. Thank you for your detailed feedback. We have made every effort to enhance the manuscript in this revised version, ensuring that our conclusions are grounded in solid evidence and that they avoid any speculation.

      My main concerns with the manuscript:

      Q3. The authors demonstrate that microvesicles contain the most mtDNA, however, they also demonstrate that only isolated exosomes influence embryo respiration. These are two separate populations of extracellular vesicles.

      A3. This manuscript focuses on the DNA content secreted by the endometrium and captured by the embryo. We identified both mitochondrial DNA and genomic DNA. We have found that mitochondrial DNA is predominantly secreted and encapsulated within microvesicles, while all three types of vesicles encapsulate genomic DNA. Specifically, based on the results we presented in Response A8 to the reviewers and included in the latest version of the manuscript, we observed that exosomes contain the highest amount of genomic DNA. Furthermore, exosomes have the greatest impact on embryo bioenergetics, suggesting that this DNA content may primarily exert this effect. We have thoroughly revised the manuscript, focusing our message on DNA content.

      Q4. mtDNA is not specifically identified as being taken up by embryos only DNA.

      A4. We agree with the reviewer; as we mention in answer A9, EdU does not specifically label mitochondrial DNA. To solve this issue, we incubated a synthetic molecule of labeled mtDNA with embryos and analyzed mtDNA incorporation using confocal microscopy. We co-cultured hatched mouse embryos (3.5 days) with an ATP8 sequence conjugated with biotin overnight at 37 °C and 5% CO2. We then permeabilized embryos, incubated them with Streptavidin-Cy3 for 45 min, and visualized the results using an SP8 confocal microscope (Leica). We observed mtDNA internalization by cells of the hatched embryos; please see new supplementary Figure 7 and lines 234-237 on page 9 and lines 583-592 M&M on page 21.

      Q5. The authors do not rule out that other components packaged in extracellular vesicles could be the factors influencing embryo metabolism.

      A5. The vesicular subtypes contain molecules beyond DNA, such as microRNAs, proteins, or lipids. Our laboratory has studied the transmission of vesicles and their relationship with their contents (particularly microRNAs) and their connection to maternal-fetal communication. In this study, we focused on genomic/mitochondrial DNA. We cannot exclude the possibility that other molecules may influence metabolism; this statement is already noted in the discussion section on lines 328-331 on page 12.

      Q6. Taken together, these concerns seem to contradict the implication of the title of the manuscript – the authors do not demonstrate that inheritance of maternal mtDNA has a direct causative effect on embryo metabolism.

      A6. We have modified the title to better align with the manuscript’s results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Reviewer #1 (Recommendations for The Authors):

      Q7. Would it be possible to validate the mtDNA content and mitophagy activity in different periods using the Ishikawa cells?

      A7. Unfortunately, this validation cannot be achieved with in vitro cultures of cell lines, especially with a cell line such as the endometrial adenocarcinoma-derived Ishikawa cell line. Mimicking the menstrual cycle (as observed in Figure 3 of the manuscript) in culture would be entirely artificial, whereas we believe that the statistically significant results obtained in human samples faithfully represent the biological processes involved. Using a cell line, in our opinion, would not provide us with novel information.

      Q8. Characterization of the EVs subpopulations from Ishikawa cells and direct evidence to show the EdU labeled DNA is contained in the EVs are necessary.

      A8. To address this concern, we designed a novel experiment. We cultured Ishikawa cells in the presence of EdU, isolated the three types of vesicles, and evaluated labeled DNA content by flow cytometry (as illustrated in Supplementary Figure 5). All three types of vesicles exhibited positive EdU-DNA labeling; notably, the exosomal fraction demonstrated substantially higher DNA content than the other vesicle populations. Please see new supplementary Figure 5 and lines 217-218 on page 9, and lines 576-582 of the M&M on pages 20-21.

      Q9. Would EdU incorporate into the genomic DNA or mitochondrial DNA?

      A9. EdU (5-ethynyl-2′-deoxyuridine) is a nucleoside analog of thymidine and becomes incorporated into DNA during active DNA synthesis. EdU labels all newly synthesized DNA, both genomic and mitochondrial; however, we cannot differentiate between them with this technique.

      Q10. It is difficult to assess whether the EV-derived DNA was taken by the TE or ICM without immunostaining of cell lineage markers in mouse embryos.

      A10. We did not aim to label the inner cell mass, as the vesicles primarily enter through trophectodermal cells. The images presented in Figure 4 and Supplementary Figure 5 depict trophectoderm cells.

      Q11. It is also valuable to perform co-staining of Mitotracker to show the co-localization of EdU labelled DNA and the mitochondria.

      A11. Per the reviewer's suggestion, we conducted an experiment as described in the following text. We isolated MVs from the culture media of EdU-treated Ishikawa cells and co-incubated them with embryos overnight. The resulting images (See Author response image 1) show an embryo subjected to staining with EdU-tagged DNA labeled with Alexa Fluor 488 (green), Mitotracker Deep Red (red), and nuclei (blue). Detailed views of the embryo are presented in panels A and B. Notably, we observed co-localization of mitochondria and EdU-tagged DNA, as indicated by the white arrows. Despite this intriguing finding, we chose not to include these results in the initial version of the manuscript; however, if the editor deems it appropriate, we would be delighted to incorporate them into the final version. The experimental procedure for co-localization of EdU-tagged DNA with mitochondria involved the following steps: Mitotracker Deep Red FM (Thermo Fisher Scientific, M22426) was added to the embryo media at a final concentration of 200 nM, and the embryos were subsequently incubated for 45-60 minutes prior to fixation.

      Author response image 1.

      Co-localization of mitochondria and EdU-tagged DNA in mouse embryos. Representative micrograph of an embryo co-incubated with MVs isolated from the culture media of Ishikawa cells treated with EdU. EdU-tagged DNA was labeled with Alexa Fluor 488 (green). Mitotracker Deep Red (mitochondria; red) and nuclei (blue). A and B) Magnified images of the embryo show detailed co-localization of mitochondria and EdU-tagged DNA (white arrows). Negative control) Embryos incubated with MVs isolated from control Ishikawa cells (without EdU incubation) and stained with the click-it reaction cocktail. A and B show magnified images of the embryo. Note the absence of EdU-Alexa Fluor 488 signal (green).

      Reviewer #2 (Recommendations for The Authors):

      Q12. It would be helpful if the authors could provide citations and rationale for why they chose specific molecular markers to validate the different population of extracellular vesicles.

      A12. Different extracellular populations are defined by molecular marker signatures that reflect their origin. VDAC1 forms ionic channels in the mitochondrial membrane, has a role in triggering apoptosis, and has been described as characteristic of ABs.[1]

      The ER protein Calreticulin has also been used as an AB marker [2]; however, other studies have noted the presence of Calreticulin in MVs. [1] This apparent non-specificity may derive from apoptotic processes, during which the ER membrane fragments and forms vesicles smaller than ABs, which would contain Calreticulin and sediment at higher centrifugal forces.[3,4] In fact, proteomic studies have linked the presence of Calreticulin with vesicular fractions of a size range relevant for MVs [5] and ABs [6].

      ARF6, a GTP-binding protein implicated in cargo sorting and promoting MV formation, has been proposed as an MV marker. [7,8]

      Classic markers of EXOs include molecules involved in biogenesis, such as tetraspanins (CD63, CD9, CD81), Alix, TSG101, and flotillin-1.[9,10] Nonetheless, studies have recently reported the widespread nature of such markers among various EV populations, although with different relative abundances (such as is the case for CD9, CD63, HSC70, and flotillin-1[11]). Notably, certain molecular markers (such as TSG101[1,11]) have been ratified as specific to EXOs.

      References

      1. D. K. Jeppesen, M. L. Hvam, B. Primdahl-Bengtson, A. T. Boysen, B. Whitehead, L. Dyrskjøt, T. F. Orntoft, K. A. Howard, M. S. Ostenfeld, J. Extracell. Vesicles. 2014, 3, 25011, doi: 10.3402/jev.v3.25011.

      2. J. van Deun, P. Mestdagh, R. Sormunen, V. Cocquyt, K. Vermaelen, J. Vandesompele, M. Bracke, O. De Wever, A. Hendrix, J. Extracell. Vesicles. 2014, 3:24858, doi: 10.3402/jev.v3.24858.

      3. L. Abas, C. Luschnig, Anal. Biochem. 2010, 401, 217-227, doi: 10.1016/j.ab.2010.02.030.

      4. C. Lavoie, J. Lanoix, F. W. Kan, J. Paiement, J. Cell Sci. 1996, 109(6), 1415-1425.

      5. M. Tong, T. Kleffmann, S. Pradhan, C. L. Johansson, J. DeSousa, P. R. Stone, J. L. James, Q. Chen, L. W. Chamley, Hum. Reprod. 2016, 31(4), 687-699, doi: 10.1093/humrep/dew004.

      6. P. Pantham, C. A. Viall, Q. Chen, T. Kleffmann, C. G. Print, L. W. Chamley, Placenta. 2015, 36, 1463-1473, doi: 10.1016/j.placenta.2015.10.006.

      7. V. Muralidharan-Chari, J. Clancy, C. Plou, M. Romao, P. Chavrier, G. Raposo, C. D'Souza-Schorey, Curr. Biol. 2009, 19, 1875-1885.

      8. C. Tricarico, J. Clancy, C. D'Souza-Schorey, Small GTPases. 2016, 0(0), 1-13.

      9. M. Colombo, G. Raposo, C. Théry, Annu. Rev. Cell. Dev. Biol. 2014, 30, 255-289, doi: 10.1146/annurev-cellbio-101512-122326.

      10. S. Mathivanan, H. Ji, R. J. Simpson, J. Proteomics. 2010, 73(10), 1907-1920.

      11. J. Kowal, G. Arras, M. Colombo, M. Jouve, J. P. Morath, B. Primdal-Bengtson, F. Dingli, D. Loew, M. Tkach, C. Théry, Proc. Natl. Acad. Sci. U. S. A. 2016, 113(8), E968-77.

      Q13. The PCA analysis in supplementary figure 4 A&B needs more explanation for why they think separation of the two conditions based on principal component 1 is sufficient. The small number of replicates makes me concerned because principal component 2 does not show similarity of replicates for the DNase treated samples. Also, 4C has no description in the figure legend.

      A13. The PCA results show a clear separation between the two conditions; we believe this separation is primarily driven by the differences observed in principal component 1 (PC1). We would like to address the concerns raised by the reviewer with the following points:

      1. Interpretation of PCs: In PCA, the principal components represent orthogonal axes capturing the highest variance in the data. PC1 accounts for 56% and 57% of the variance in the two conditions, respectively. The significant variance explained by PC1 suggests that it effectively captures the major sources of variation between the samples.

      2. Sample Replicates and Variability: The concern regarding the small number of replicates is acknowledged, and we understand its impact on the analysis. Despite the limited number of replicates, the consistent pattern of separation in PC1 between the two conditions provides confidence in the observed separation. We also agree that PC2 does not show an apparent similarity among the DNase-treated samples; however, this does not diminish the significance of PC1, which robustly separates the two conditions.
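
      As a rough illustration of the point about PC1 made above, the share of variance carried by the first principal component equals the largest eigenvalue of the sample covariance matrix divided by its trace. The sketch below uses made-up values for four samples and three features (not the data behind Supplementary Figure 4) and a plain power iteration; the function name, the toy matrix, and the iteration count are all hypothetical choices made for illustration.

```python
import math

def pc1_variance_fraction(samples, iterations=500):
    """Return the fraction of total variance explained by PC1.

    samples: list of rows (one per sample), each a list of feature values.
    """
    n = len(samples)
    p = len(samples[0])
    # Centre each feature on its mean.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication by cov converges to the
    # leading eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v: variance along PC1.
    top = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total = sum(cov[i][i] for i in range(p))  # trace = total variance
    return top / total

# Hypothetical toy matrix: two untreated and two DNase-treated replicates.
toy = [
    [10.0, 2.0, 1.0],
    [11.0, 2.5, 0.0],
    [2.0, 8.0, 1.5],
    [1.0, 9.0, 0.5],
]
fraction = pc1_variance_fraction(toy)
print(f"PC1 explains {fraction:.1%} of the variance in the toy data")
```

      In this toy case the two conditions differ along a single direction, so PC1 carries nearly all of the variance; with real data the remaining components absorb replicate-to-replicate variability, which is why a scattered PC2 does not by itself contradict a separation along PC1.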

      We include the Figure legend for 4C: “C) Principal component analysis shows EV sample grouping due to specificity in coding-gene sequences.”

      Q14. I am confused by the phrasing in the last two sentences of the top paragraph on page 7. Why would apoptotic bodies all have similar content if they encapsulate a greater amount of material making their contents less specific? Please clarify.

      A14. This sentence was intended to convey that apoptotic bodies (ABs) are formed from apoptotic cells, are larger in size, and have less specific content. This non-specific nature arises because, unlike the other two types of vesicles, ABs do not encapsulate molecules selectively. For more detailed information on ABs in human reproduction, we published an extensive review in 2018 (see below).

      Simon C, Greening DW, Bolumar D, Balaguer N, Salamonsen LA, Vilella F. Extracellular Vesicles in Human Reproduction in Health and Disease. Endocr. Rev. 2018 Jun 1;39(3):292-332. doi: 10.1210/er.2017-00229. PMID: 29390102.

      Q15. The first and last sentences of the last paragraph of page 8 seem to contradict each other. Please clarify.

      A15. We observe an enrichment in the amount of mitochondrial DNA in samples during the receptive and post-receptive phases. While the data may not show statistical significance, we observed a trend towards greater enrichment in receptivity compared to pre-receptivity. The lack of significant differences could be attributed to inherent variability among patients. We have also altered the text on page 8 to avoid confusion.

      Q16. Quantification of the rates of DNA incorporation into embryos would strengthen Figure 4 and Supplementary Figure 5.

      A16. We acknowledge the reviewer's feedback, and in response, we conducted an assay to quantify the total DNA incorporated into the embryos. To this end, we isolated EVs from control Ishikawa cell culture media and from EdU-treated Ishikawa cell culture media. Subsequently, we co-incubated both types of EVs with ten embryos overnight in G2 plus media at 37ºC and 5% CO2.

      After co-incubation, we collected embryos and the culture media containing co-incubated EVs. We then isolated total DNA using the QIAamp® DNA Mini kit (Qiagen; 51304). To label the EdU-DNA particles, we performed a click-it reaction using the Click-iT™ EdU Alexa Fluor™ 488 flow cytometry assay Kit (Thermo Fisher Scientific, ref: C10420) per the manufacturer's instructions. Subsequently, we cleaned and purified DNA using AMPure beads XP (Beckman Coulter, A63882) and eluted DNA in 150 μL of 0.1 M Tris-EDTA. Finally, we measured the fluorescence of each sample using a Victor3 plate reader (PerkinElmer). To ensure accuracy, we subtracted the background signal from non-labeled DNA-derived EVs and embryos incubated without EVs for each sample. Despite conducting the experiment twice, we encountered challenges in obtaining clear results, possibly due to the limited resolution of the technique.

      Q17. If mtDNA is most enriched in MVs but only embryos cultured with Exos demonstrated differences in respiration the authors need to comment on this discrepancy.

      A17. We ask the reviewer to refer to Answer A3; we have thoroughly revised the manuscript, focusing our message on DNA content.

      Q18. The authors should change the definitive language in the title of the manuscript because all evidence presented is correlative.

      A18. We have modified the title to better align with the manuscript's results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Q19. I realize this is beyond what the authors intend for the scope of this paper, however, on page 6 the authors describe membranous structures within the ABs but say they couldn't study their presence with organelle-specific markers. Why? Presence of organelles in these vesicles is very interesting!

      A19. As the reviewer rightly points out, we did not study ABs in this manuscript. Analysis of the electron microscopy images suggests the presence of fragments of organelles, most likely originating from apoptotic processes; however, we did not use any specific markers to confirm our assertion. We have modified the text to avoid any confusion. Please see Page 6, Lines 120-121, for further details.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors have examined gene expression between life cycle stages in a range of brown macroalgae to examine whether there are conserved aspects of biological features. 

      Strengths: 

      The manuscript incorporates large gene expression datasets from 10 different species and therefore enables a comprehensive assessment of the degree of conservation of different aspects of gene expression and underlying biology. 

      The findings represent an important step forward in our understanding of the core aspects of cell biology that differ between life cycle phases and provide a substantial resource for further detailed studies in this area. Convincing evidence is provided for the conservation of lifecycle-specific gene expression between species, particularly in core housekeeping gene modules. 

      Weaknesses: 

      I found a few weaknesses in the methodology and experimental design. I think the manuscript could have been clearer when linking the findings to the biology of the brown algae. 

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Ratchinski et al presents a comprehensive analysis of developmental and life history gene expression patterns in brown algal species. The manuscript shows that the degree of generation bias or generation-specific gene expression correlates with the degree of dimorphism. It also reports conservation of life cycle features within generations and marked changes in gene expression patterns in Ectocarpus in the transition between gamete and early sporophyte. The manuscript also reports considerable conservation of gene expression modules between two representative species, particularly in genes associated with conserved functional characteristics. 

      Strengths: 

      The manuscript represents a considerable "tour de force" dataset and analytical effort. While the data presented is largely descriptive, it is likely to provide a very useful resource for studies of brown algal development and for comparative studies with other developmental and life cycle systems. 

      Weaknesses: 

      Notwithstanding the well-known issues associated with inferring function from transcriptomics-only studies, no major weaknesses were identified by this reviewer. 

      Reviewing Editor Comments:

      The overall assessment of the reviewers does not contain major aspects of concern. We nevertheless recommend that the authors carefully consider the constructive comments, as this will further improve their manuscript. 

      Reviewer #1 (Recommendations for the authors): 

      (1) Line 32: The abstract states 'considerable conservation of co-expressed gene modules', but the degree of conservation between Ectocarpus and D. dichotoma appeared limited to specific subsets of genes with highly conserved housekeeping functions, e.g., translation. I think the wording of the abstract should be rephrased to better reflect this. 

      We agree that genes with housekeeping functions figure strongly in the gene modules that showed strong conservation between Ectocarpus species 7 and D. dichotoma (and we actually highlight this point in the manuscript) but we do not believe that this invalidates the conservation. In the analysis shown in Figure 6A, for example, high scores were obtained for both connectivity and density for about a third of the gene modules and these modules cover a broad range of cellular functions. This is a significant result given the large phylogenetic distance and we feel that "considerable conservation" is appropriate as a description of the level of correlation.

      (2) Introduction - The Introduction needs a better explanation of the biology of the life cycle phases. Some of this information is present in the 1st paragraph of Materials and Methods, although it would be preferable to include this information within the main text, ideally within the Introduction before the Results are described. For example, when are flagella present? The presence of flagella could be indicated in Figure 3. The ecology of the life cycle is also not described. Are life cycles present in the same ecological niche? Do they co-exist or occupy distinct environments? It would be useful to understand how the observed genotypes could relate to this wider aspect of the brown algal biology. 

      We have added a sentence to explain that zoids (gametes and spores) are the only flagellated stages of the life cycle (line 678). In addition, in the legend for Figure 3, we have indicated which of the life cycle stages analysed in panel 3A consisted entirely or partially of flagellated cells. We have also added information about phenology to the Introduction. 

      (3) Line 127. 'The proportion of generation specific genes was positively correlated with the level of dimorphism'. The level of dimorphism between species was not clear to me. This needs to be clearly displayed in Figure 1B. 

      We had attempted to illustrate the level of dimorphism, using the size of each generation as a measurable proxy, in Figure S1 but we agree that the information was not very clearly presented. To improve clarity, we now provide independent size scales for each generation of the life cycle in this figure and state in the legend that "Size bars indicate the approximate sizes of each generation of each life cycle, providing an indication of the degree of dimorphism between the two generations.". In the text, Figure S1 is cited earlier in the paragraph but we now repeat the citation of the figure at the end of the sentence "The proportion of generation-specific genes (...) was positively correlated with the level of dimorphism" so that the reader can specifically consult the supplementary figure for this phenotypic parameter. 

      (4) Line 267. Are there known differences in cell wall composition between life cycle phases or within each generation as individual life cycle phases mature (e.g., differences between unicellular and multicellular stages)? 

      Detailed comparative analyses of cell wall composition at different stages of the life cycle have not been carried out for brown algae. However, Congo red stains Ectocarpus gametophytes but not sporophytes (Coelho et al., 2011), indicating a difference in cell wall composition between the two generations. Zoids (spores and gametes) do not have a cell wall and calcofluor white staining of meio-spores has indicated that a cell wall only starts to be deposited 24-48 hours post-release (Arun et al., 2013).

      (5) Line 388. The authors should comment on the accuracy of OrthoFinder for different gene types across this degree of divergence (250 MYA). The best conservation was found in genes with housekeeping characteristics (line 401). It may be that these gene modules show the highest degree of conservation in expression patterns, but I also wonder whether the pattern may also emerge because finding true orthologues is easier for highly conserved gene families.

      We do not believe that this is the case because, as mentioned above, the "housekeeping" modules cover quite a broad range of cellular functions. Note also that the modules were given functional labels based on their being clearly enriched in genes corresponding to a particular class of function but not all the genes in a module have a predicted function that corresponds to the functional classification. 

      However, we have carried out an analysis to look for evidence of the bias proposed by the reviewer. For this, we used BLASTp identity scores as an approximate proxy for pairwise identity between Ectocarpus species 7 and D. dichotoma one-to-one orthologues in each module and plotted the mean identity score for each module against the Fisher's exact test p-value of the contingency table in Figure 6C (Author response image 1).

      Author response image 1.

      Plot of estimations of the mean percent shared identity between the orthologues within each module (based on mean BLASTp identity scores) against log10(pvalue) values obtained with the Fisher's exact test applied in Figure 6C to determine whether pairs of modules shared a greater number of one-to-one orthologues than expected from a random distribution. Error bars indicate the standard deviation. 

      This analysis did not detect any correlation between the degree of sequence conservation of orthologues in a module and the degree of conservation of the module between Ectocarpus species 7 and D. dichotoma.
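
      For readers unfamiliar with the test behind the p-values plotted above, the comparison of a pair of modules can be framed as a 2×2 contingency table of one-to-one orthologues. The following is a minimal sketch, assuming a one-sided (over-representation) Fisher's exact test computed from the hypergeometric tail; the function name and all counts are hypothetical, and the settings actually used for Figure 6C may differ.

```python
from math import comb, log10

def fisher_enrichment_p(shared, in_a_only, in_b_only, in_neither):
    """One-sided Fisher's exact test p-value for over-representation.

    2x2 table of one-to-one orthologue pairs:
                         in module B    not in module B
        in module A        shared         in_a_only
        not in module A    in_b_only      in_neither
    """
    row_a = shared + in_a_only            # orthologues falling in module A
    col_b = shared + in_b_only            # orthologues falling in module B
    total = row_a + in_b_only + in_neither
    upper = min(row_a, col_b)             # largest possible overlap
    # Sum the hypergeometric probabilities of every overlap at least as
    # large as the one observed.
    tail = sum(comb(row_a, k) * comb(total - row_a, col_b - k)
               for k in range(shared, upper + 1))
    return tail / comb(total, col_b)

# Hypothetical counts: 30 shared orthologues between a module of 50 and a
# module of 55, out of 5000 one-to-one orthologues in total.
p_value = fisher_enrichment_p(30, 20, 25, 4925)
print(f"log10(p) = {log10(p_value):.1f}")
```

      With counts like these the expected overlap under a random distribution is well below one orthologue, so an observed overlap of 30 gives a very small p-value; it is the log10 of such values that is plotted against mean BLASTp identity in the image above.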

      Minor comments 

      (1) Line 650 loose should be lose.

      The error has been corrected.

      (2) Line 695 filtered through a 1 μm filter to remove multicellular gametophyte fractions. Is this correct? It seems too small to allow gametes to pass through. 

      Yes, the text is correct, a 1 μm filter was used. The gametes do pass through this filter, presumably because they do not have a rigid cell wall, allowing them to squeeze through the filter when a light pressure is applied. 

      (3) Line 709 - DDT should be DTT 

      The error has been corrected.

      Reviewer #2 (Recommendations for the authors): 

      (1) It is not clear why the chosen species for analysis do not include fucoid algae, which display a high degree of dimorphism between generations and which are relatively well studied with respect to gene expression patterns during early development. Indeed, it was recently shown that gene expression patterns in developing embryos of Fucus spp. obey the "hourglass" pattern whereby gene expression shows a minimum of transcriptome age index (i.e., higher expression of evolutionarily older genes) associated with differentiation at the phylotypic stage. I am somewhat surprised that the manuscript does not consider this feature in the analysis or discussion.

      Brown algae of the order Fucales have diploid life cycles and therefore do not alternate between a sporophyte and gametophyte generation. It is for this reason that we thought that it was more interesting to compare Ectocarpus species 7 with D. dichotoma, which has a haploid-diploid life cycle.

      (2) In Discussion, the comparison of maternal to zygote transition in animals and land plants, which show a high degree of dimorphism, with Ectocarpus would be strengthened by data/discussion from other brown algae that show a high degree of dimorphism. 

      Animals have diploid life cycles and dimorphism in that lineage generally refers to sexual rather than generational dimorphism. Land plants do have highly dimorphic haploid-diploid life cycles but it is unclear how this characteristic relates to events that occur during the maternal to zygote transition. In Ectocarpus, the transition from gamete to the first stages of sporophyte development involved more marked changes in gene expression than we observed when comparing the mature sporophyte and gametophyte generations (Figure 3C). At present, there is no evidence that events during these two transitions are correlated. The relationship between changes in gene expression during very early sporophyte development and during alternation of life cycle generations could be investigated further using a highly dimorphic kelp model system such as Saccharina latissima but we are not aware of any studies that have specifically addressed this point.

      (3) Since marked changes were observed during the transition from gamete to early sporophyte in Ectocarpus, it would be interesting to know how gene expression patterns change during the transition from gamete to partheno-sporophyte. Would the same patterns of downregulation and upregulation be expected? 

      The sporophyte individuals derived from gamete parthenogenesis (parthenosporophytes) are indistinguishable morphologically and functionally from diploid sporophytes derived from gamete fusions (see line 76). They also express generation marker genes in a comparable manner (Peters et al., 2008). Based on these observations, we have treated partheno-sporophytes and diploid sporophytes as equivalent in our experiments. For clarity, we have now distinguished partheno-sporophyte from diploid sporophyte samples in Table S1. 

      (4) The authors show a correlation between the degree of dimorphism and generation-biased or generation-specific expression. How was the degree of dimorphism quantified? 

      The degree of dimorphism is illustrated in Figure S1 using the relative size of the two generations as a proxy. Size estimations are approximate because the size of an individual of a particular species is quite variable but the ten species nonetheless represent a very clear gradient of dimorphism due to the extreme differences in size between generations of species at each end of the scale, with the sporophyte generation being several orders of magnitude larger than the gametophyte generation or vice versa.
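
      The response above describes the size proxy qualitatively rather than as a computed score. Purely as an illustration of how such a proxy could be expressed as a number, the sketch below takes the log10 ratio of approximate generation sizes, so that "several orders of magnitude" maps directly onto the index. The function name and the sizes are hypothetical and are not taken from Figure S1.

```python
import math

def dimorphism_index(sporophyte_size, gametophyte_size):
    """Signed log10 size ratio between the two generations.

    Positive: sporophyte-dominant; negative: gametophyte-dominant;
    zero: isomorphic. Both sizes must be positive and in the same unit.
    """
    if sporophyte_size <= 0 or gametophyte_size <= 0:
        raise ValueError("sizes must be positive")
    return math.log10(sporophyte_size / gametophyte_size)

# Hypothetical approximate sizes in millimetres (not measured values).
examples = {
    "kelp-like (large sporophyte)": (2000.0, 0.5),
    "near-isomorphic": (30.0, 30.0),
    "gametophyte-dominant": (0.5, 200.0),
}
for label, (sporophyte, gametophyte) in examples.items():
    index = dimorphism_index(sporophyte, gametophyte)
    # The absolute value gives the degree of dimorphism regardless of
    # which generation is the larger one.
    print(f"{label}: index = {index:+.2f}, degree = {abs(index):.2f}")
```

      Because the estimates are approximate, an index of this kind would only be meaningful for ranking species along the gradient, not for fine comparisons between species of similar dimorphism.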

      References

      Arun A, Peters NT, Scornet D, Peters AF, Cock JM, Coelho SM. 2013. Non-cell autonomous regulation of life cycle transitions in the model brown alga Ectocarpus. New Phytol 197:503–510. doi:10.1111/nph.12007

      Coelho SM, Godfroy O, Arun A, Le Corguillé G, Peters AF, Cock JM. 2011. OUROBOROS is a master regulator of the gametophyte to sporophyte life cycle transition in the brown alga Ectocarpus. Proc Natl Acad Sci USA 108:11518–11523. doi:10.1073/pnas.1102274108

      Peters AF, Scornet D, Ratin M, Charrier B, Monnier A, Merrien Y, Corre E, Coelho SM, Cock JM. 2008. Life-cycle-generation-specific developmental processes are modified in the immediate upright mutant of the brown alga Ectocarpus siliculosus. Development 135:1503–1512. doi:10.1242/dev.016303

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The authors assess the effectiveness of electroporating mRNA into male germ cells to rescue the expression of proteins required for spermatogenesis progression in individuals where these proteins are mutated or depleted. To set up the methodology, they first evaluated the expression of reporter proteins in wild-type mice, which showed expression in germ cells for over two weeks. Then, they attempted to recover fertility in a model of late spermatogenesis arrest that produces immotile sperm. By electroporating the mutated protein, the authors recovered the motility of ~5% of the sperm, although the sperm regenerated was not able to produce offspring using IVF.

      We did not actually write that “sperm regenerated was not able to produce offspring using IVF” but rather that IVF was not attempted because the number of rescued sperm was too low. To address this important point, we tested the ability of the rescued sperm to produce embryos using two different assisted reproduction technologies, IVF and ICSI. To increase the number of motile sperm for IVF experiments, we injected both testes from one male. We also conducted intracytoplasmic sperm injection (ICSI) experiments, using only rescued sperm, identified as motile sperm with a normal flagellum. These new experiments demonstrate that the rescued ARMC2 sperm successfully fertilized eggs and produced embryos at the two-cell stage by IVF and blastocysts by ICSI. These outcomes are presented in Figure 12.

      This is a comprehensive evaluation of the mRNA methodology with multiple strengths. First, the authors show that naked synthetic RNA, purchased from a commercial source or generated in the laboratory with simple methods, is enough to express exogenous proteins in testicular germ cells. The authors compared RNA to DNA electroporation and found that germ cells are efficiently electroporated with RNA, but not DNA. The differences between these constructs were evaluated using in vivo imaging to track the reporter signal in individual animals through time. To understand how the reporter proteins affect the results of the experiments, the authors used different reporters: two fluorescent (eGFP and mCherry) and one bioluminescent (Luciferase). Although they observed differences among reporters, in every case expression lasted for at least two weeks. 

      The authors used a relevant system to study the therapeutic potential of RNA electroporation. The ARMC2-deficient animals have impaired sperm motility phenotype that affects only the later stages of spermatogenesis. The authors showed that sperm motility was recovered to ~5%, which is remarkable due to the small fraction of germ cells electroporated with RNA with the current protocol. The 3D reconstruction of an electroporated testis using state-of-the-art methods to show the electroporated regions is compelling. 

      The main weakness of the manuscript is that although the authors manage to recover motility in a small fraction of the sperm population, it is unclear whether the increased sperm quality is substantial to improve assisted reproduction outcomes. The quality of the sperm was not systematically evaluated in the manuscript, with the endpoints being sperm morphology and sperm mobility. 

      We would like to thank the reviewers for their comments. As stated above, we performed additional rescue experiments and carried out CASA, morphology observation, IVF and ICSI with the rescued sperm. The rescued ARMC2 sperm exhibited normal morphology (new figure 11 and Supp Fig 8), motility (figure 11), and fecundity (figure 12). Whereas sperm from untreated KO males were unable to fertilize eggs by IVF, the rescued sperm fertilized eggs in vitro at a significant level (mean 62%, n=5), demonstrating that our strategy improves sperm quality and assisted reproduction outcomes (from 0 to 62%).

      Some key results, such as the 3D reconstruction of the testis and the recovery of sperm motility, are qualitative given the low replicate numbers or the small magnitude of the effects. The presentation of the sperm motility data could have been clearer as well. For example, on day 21 after Armc2-mRNA electroporation, only one animal out of the three tested showed increased sperm motility. However, it is unclear from Figure 11A what the percentage of sperm motility for this animal is since the graph shows a value of >5% and the reported aggregate motility is 4.5%. It would have been helpful to show all individual data points in Figure 11A. 

      We now provide in Figure 11A a scatter dot plot showing the percentage of rescued sperm for all animals. Moreover, we performed additional CASA experiments to analyze sperm motility in detail (Figure 11A2-A3). Individual CASA parameters for motile sperm cells were extracted as requested by reviewer 3 and represented in a new graph (Fig 11 A2).

      The expression of the reporter genes is unambiguous; however, better figures could have been presented to show cell type specificity. The DAPI staining is diffuse, and it is challenging to understand where the basement membranes of the tubules are. For example, in Figures 7B3 and 7E3, the spermatogonia seem to be in the middle of the seminiferous tubule. The imaging was better for Figure 8. Suboptimal staining appears to lead to mislabeling of some germ cell populations. For example, in Supplementary Figure 4A3, the round spermatid label appears to be labeling spermatocytes. Also, in some instances, the authors seem to be confusing elongating spermatids with spermatozoa, such as in the case of Supplementary Figures 4D3 and D4.

      Thanks for the comments; some spermatogenic cells were indeed mislabeled, as you mentioned. We have therefore readjusted the labeling accordingly. We also changed spermatozoa to mature spermatids. The new sentence is now: “At the cellular level, fluorescence was detectable in germ cells (B1-B3) including Spermatogonia (Sg), Spermatocytes (Scytes), round Spermatids (RStids), mature spermatids (m-Sptids) and Sertoli cells (SC)”. Moreover, to indicate the localization of the basal membrane, we have also labelled myoid cells.

      The characterization of Armc2 expression could have been improved as well. The authors show a convincing expression of ARMC2 in a few spermatids/sperm using a combination of an anti-ARMC2 antibody and tubules derived from ARMC2 KO animals. At the minimum, one would have liked to see at least one whole tubule of a relevant stage.  

      Thanks for the remark. 

      We now present new images showing transverse sections of seminiferous tubules as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text.

      Overall, the authors show that electroporating mRNA can improve spermatogenesis as demonstrated by the generation of motile sperm in the ARMC2 KO mouse model. 

      Thank you

      Reviewer #2 (Public Review): 

      Summary: 

      Here, the authors inject naked mRNAs and plasmids into the rete testes of mice to express exogenous proteins - GFP and later ARMC2. This approach has been taken before, as noted in the Discussion to rescue Dmc1 KO infertility. While the concept is exciting, multiple concerns reduce reviewer enthusiasm. 

      Strengths: 

      The approach, while not necessarily novel, is timely and interesting.

      Weaknesses:

      Overall, the writing and text can be improved and standardized - as an example, in some places in vivo is italicized, in others it's not; gene names are italicized in some places, others not; some places have spaces between a number and the units, others not. This lack of attention to detail in the preparation of the manuscript is a significant concern to this reviewer - the presentation of the experimental details does cast some reasonable concern with how the experiments might have been done. While this may be unfair, it is all the reviewers have to judge. Multiple typographical and grammatical errors are present, and vague or misleading statements. 

      Thanks for the comment, we have revised the whole manuscript to remove all the mistakes. We have also added new experiments/figures to strengthen the message. Finally, we have substantially modified the discussion.

      Reviewer #3 (Public Review):

      Summary: 

      The authors used a novel technique to treat male infertility. In a proof-of-concept study, the authors were able to rescue the phenotype of a knockout mouse model with immotile sperm using this technique. This could also be a promising treatment option for infertile men. 

      Strengths: 

      In their proof-of-concept study, the authors were able to show that the novel technique rescues the infertility phenotype in vivo. 

      Weaknesses: 

      Some minor weaknesses, especially in the discussion section, could be addressed to further improve the quality of the manuscript. 

      We have substantially modified the discussion, following the remarks of the reviewers.

      It is very convincing that the phenotype of Armc2 KO mice could (at least in part) be rescued by injection of Armc2 RNA. However, a central question remains about which testicular cell types have been targeted by the constructs. From the pictures presented in Figures 7 and 8, this issue is hard to assess. Given the more punctate staining of the DNA construct, a targeting of Sertoli cells is more likely, whereas the broader staining of seminiferous tubules using RNA constructs points toward germ cells. Further, the staining for up to 119 days (Figure 5) would point toward an integration of the DNA construct into the genome of early germ cells such as spermatogonia and/or possibly of Sertoli cells. 

      Thanks for the comment. We would like to recall the peculiar properties of the non-insertional Enhanced Episomal Vector (EEV) plasmid, a non-viral episome based on the Epstein-Barr virus (EBV). It allows the plasmid to persist for long periods without integration. Its maintenance within the cell is made possible by its ability to replicate synchronously with the host genome and to segregate into daughter cells. This is because EEV is composed of two distinct elements derived from EBV: an origin of replication (oriP) and an Epstein-Barr Nuclear Antigen 1 (EBNA1) expression cassette (Gil, Gallaher, and Berk, 2010). The oriP is a locus comprising two EBNA1-binding domains, designated as the Family of Repeats (FR) and Dyad Symmetry (DS). The FR is an array of approximately 20 EBNA1-binding sites (20 repeats of 30 bp) with high affinity, while the DS comprises four lower-affinity sites operating in tandem (Ehrhardt et al., 2008). 

      The 641-amino-acid EBNA1 protein contains numerous domains. The N-terminal domains are rich in glycines and alanines, which enable interaction with host chromosomes. The C-terminal region is responsible for binding to oriP (Hodin, Najrana, and Yates, 2013). The binding of EBNA1 to the DS element results in the recruitment of the origin of replication. This results in the synchronous initiation of extra-chromosomal EEV replication with host DNA at each S phase of the cell cycle (Düzgüneş, Cheung, and Konopka 2018). Furthermore, EBNA1 binding to the FR domain induces the formation of a bridge between metaphase chromosomes and the vector during mitosis. This binding is responsible for the segregation of the EEV episome in daughter cells (Düzgüneş, Cheung, and Konopka 2018). It is notable that EEV is maintained at a rate of 90-95% per cell division.

      Because of the intrinsic properties of EEV described above, the presence of the reporter protein at 119 days after injection was likely due to the maintenance of the plasmid, mostly in Sertoli cells, and not to integration of the plasmid DNA.

      Of note, the specificity of EEV was already indicated in the introduction (lines 124-128 clean copy). Nevertheless, we have added more information about EEV to help the readers.  
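
      The per-division maintenance rate quoted above can be turned into a rough retention estimate. The sketch below is illustrative only: it assumes that retention at each division is independent and compounds multiplicatively, which is a simplification and not a model taken from the manuscript. The function name and the chosen division counts are hypothetical.

```python
def retention_after_divisions(rate_per_division: float, divisions: int) -> float:
    """Fraction of cell lineages still carrying the episome after a number
    of divisions, ASSUMING each division retains it independently with
    probability `rate_per_division` (a simplifying assumption)."""
    if not 0.0 <= rate_per_division <= 1.0:
        raise ValueError("rate_per_division must lie in [0, 1]")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return rate_per_division ** divisions


# Bracket the estimate with the 90% and 95% per-division rates quoted above.
# The division counts are hypothetical and chosen only for illustration.
for n in (0, 5, 10, 20):
    low = retention_after_divisions(0.90, n)
    high = retention_after_divisions(0.95, n)
    print(f"{n:2d} divisions: {low:.2f} to {high:.2f} retained")
```

      Under this assumption, roughly 35% to 60% of lineages would still carry the episome after ten divisions, whereas a cell that does not divide (zero divisions) loses nothing through this mechanism.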

      Given the expression after RNA transfection for up to 21 days (Figure 4) and the detection of motile sperm after 21 days (Figure 11), this would point to either round spermatids or spermatocytes.  These aspects need to be discussed more carefully (discussion section: lines 549-574).

      We added a sentence to highlight that spermatids are transfected and that the protein is synthesized at this stage; this question is discussed in detail (see lines 677-684 clean copy).

      It would also be very interesting to know in which testicular cell type Armc2 is endogenously expressed (lines 575-591)

      Thanks for the remarks. We now present new images showing the full seminiferous tubules, as requested by reviewer 1 (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text (lines 570-579 clean copy).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The article is well-structured and easy to read. Nonetheless, there are typos and mistakes in some places that are distracting to the reader, such as the capitalization of the word "Oligo-" in the title of the manuscript, the use of the word "Materiel" in the title of the Materials and methods and the presence of space holders "Schorr staining was obtained from Merck (XXX)".

      Thank you, we corrected the misspelling of "Materials and Methods" and corrected our error: "obtained from Merck (Darmstadt, Germany)". We also carefully corrected the manuscript to remove typos and mistakes.

      The discussion is too lengthy, with much repetition regarding the methods used and the results obtained. For example, these are two sentences from the discussion. "The vector was injected via the rete testis into the adult Armc2 KO mice. The testes were then electroporated." I would recommend shortening these passages.

      Thanks for your comments, we removed the sentences and we have substantially modified the discussion, following the remarks of the reviewers.

      The work is extensive, and many experiments have been done to prove the points made. However, a more in-depth analysis of critical experiments would have benefited the manuscript significantly. A more thorough analysis of sperm mobility and morphology using the CASA system would have been an initial step.

      In response to these observations, additional CASA experiments and sperm motility analyses were conducted, as illustrated in Figure 11 (A2-A3). Individual CASA parameters for motile sperm cells were extracted as suggested and are represented in a new graph (Fig 11 A2). We observed significant differences between WT and rescued sperm. In particular, the VSL and LIN parameters were lower for rescued sperm. Nevertheless, these differences were not sufficient to prevent IVF, possibly because the curvilinear velocity (VCL) was not modified.
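
      The relationship between the three parameters mentioned above can be made explicit. In standard CASA terminology, VSL is the straight-line velocity and linearity (LIN) is the ratio of VSL to VCL, expressed as a percentage, so a lower VSL with an unchanged VCL necessarily gives a lower LIN. The sketch below only illustrates that arithmetic; the velocity values are hypothetical and are not data from the manuscript.

```python
def linearity(vsl: float, vcl: float) -> float:
    """Return LIN (%) using the conventional CASA definition
    LIN = VSL / VCL * 100 (velocities in the same unit, e.g. um/s)."""
    if vcl <= 0:
        raise ValueError("VCL must be positive")
    if vsl < 0:
        raise ValueError("VSL must be non-negative")
    return 100.0 * vsl / vcl


# Hypothetical velocities (um/s), chosen only to illustrate the relationship:
# same curvilinear velocity, lower straight-line velocity in rescued sperm.
wt_lin = linearity(vsl=60.0, vcl=150.0)
rescued_lin = linearity(vsl=40.0, vcl=150.0)
print(f"WT LIN: {wt_lin:.1f} %, rescued LIN: {rescued_lin:.1f} %")
```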

      In the case of ARMC2 localization, an analysis of the different stages of spermatogenesis to show when ARMC2 starts to be expressed. 

      Thanks for the remarks. This is an important point raised by all reviewers. As explained above, we have performed more experiments. We now present new images showing transverse sections of seminiferous tubules, as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatid layers. We have also added to this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that Armc2 is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text (lines 575-579 clean copy).

      Finally, exploring additional endpoints to understand the quality of the sperm generated, such as the efficiency of ICSI or sperm damage, could have helped understand the degree of the recovery.

      This point was underlined in public review. We paste here our answer: “To address this important point, the ability of sperm to produce embryos was therefore challenged by two different assisted reproduction technologies, that are IVF and ICSI. To increase the number of motile sperm for IVF experiments, we have injected both testes from one male. We also conducted intracytoplasmic sperm injection (ICSI) experiments, using only rescued sperm, identified as motile sperm with a normal flagellum. The results of these new experiments have demonstrated that the rescued ARMC2 sperm successfully fertilized eggs and produced embryos at the two-cell stage by IVF and blastocysts by ICSI. These outcomes are presented in Figure 12.”

      Reviewer #2 (Recommendations For The Authors):

      38,74 intracellular

      Thanks, we changed it accordingly: "Intracytoplasmic sperm injection (ICSI) is required to treat such a condition, but it has limited efficacy and has been associated with a small increase in birth defects" and "such as intracytoplasmic sperm injection (ICSI)".

      39 "limited efficacy" Versus what? And for what reason? "small increase in birth defects" - compared to what? 

      We changed to “… but it is associated with a small increase in birth defect with comparison to pregnancies not involving assisted conception.”

      40 Just thinking through the logic of the argument thus far - the authors lay out that there are people with OAT (true), ICSI must be used (true), ICSI is bad (not convincing), and therefore a new strategy is needed... so is this an alternative to ICSI? And this is to restore fertility, not "restore spermatogenesis" - because ICSI doesn't restore spermatogenesis. This logic flow needs to be cleaned up some

      Thanks we changed it accordingly: “restore fertility.”

      45 "mostly"?

      Thank you, we removed the word: “We show that mRNA-coded reporter proteins are detected for up to 3 weeks in germ cells, making the use of mRNA possible to treat infertility.”

      65 Reference missing. 

      We added the following reference Kumar, N. and A. K. Singh (2015). "Trends of male factor infertility, an important cause of infertility: A review of literature." J Hum Reprod Sci 8(4): 191-196.

      68 Would argue meiosis is not a reduction of the number of chromosomes - that happens at the ends of meiosis I and II - but the bulk of meiosis is doubling DNA and recombination; would re-word; replace "differentiation" with morphogenesis, which is much more commonly used:

      Thank you, we have changed the sentence accordingly: "proliferation (mitosis of spermatogonia), reduction of the number of chromosomes (meiosis of spermatocytes), and morphogenesis of sperm (spermiogenesis)".

      70 "almost exclusively" is an odd term, and a bit of an oxymoron - if not exclusively, then where else are they expressed? Can you provide some sense of scale rather than using vague words like "large", "almost", "several", "strongly" and "most...likely" - need some support for these claims by being more specific: 

      Thanks for the comment, we changed the sentence: "The whole process involves around two thousand genes, 60% of which are expressed exclusively in the testes."

      73 "severe infertility" is redundant - if they are infertile, is there really any more or less about it? I think what is meant is patients with immotile sperm can be helped by ICSI - so just be more specific... 

      We changed the transition: “Among infertility disorders, oligo-astheno-teratozoospermia (OAT) is the most frequent (50 %) (Thonneau, Marchand et al. 1991); it is likely to be of genetic origin. Spermatocytograms of OAT patients show a decrease in sperm concentration, multiple morphological defects and defective motility. Because of these combined defects, patients are infertile and can only conceive by IntraCytoplasmic Sperm Injection (ICSI). IntraCytoplasmic Sperm Injection (ICSI) can efficiently overcome the problems faced. However, there are …”

      75 "some" is vague - how many concerns, and who has them? Be specific!

      Thanks for the comment, we removed the word.

      76-7 Again, be specific - "real" has little meaning - what is the increased risk, in % or fold? This is likely a controversial point, so make sure you absolutely support your contention with data .

      77 "these"? There was only one concern listed - increased birth defects; and "a number" is vague - what number, 1 or 1,000,000? A few (2-3), dozens, hundreds? 

      Thanks for the comment, we have reworded the sentence: “Nevertheless, concerns persist regarding the potential risks associated with this technique, including blastogenesis defect, cardiovascular defect, gastrointestinal defect, musculoskeletal defect, orofacial defect, leukemia, central nervous system tumors, and solid tumors. Statistical analyses of birth records have demonstrated an elevated risk of birth defects, with a 30–40% increased likelihood in cases involving ICSI, and a prevalence of birth defects between 1% and 4%.” We have added a list of references to support these claims.

      79-81 So, basically transgenesis? Again, vague terms "widely" - I don't think it's all that widely used yet... and references are missing to support the statement that integration of DNA into patient genomes is widely used. Give specific numbers, and provide a reference to support the contention. 

      Thanks for the comment, we removed the word "widely" and added references.

      81-5 Just finished talking about humans, but now it appears the authors have switched to talking about mice - got to let the readers know that! Unless you're talking about the Chinese group that deleted CCR5 in making transgenic humans? 

      Your feedback is greatly appreciated. In response to your comments, the sentence in question has been amended to make this clearer. Indeed, the text refers to experiments carried out in mice. The revised wording is as follows: “Given the genetic basis of male infertility, the first strategy, tested in mice, was to overcome spermatogenic failure associated with monogenic diseases by delivery of an intact gene to deficient germ cells (Usmani, Ganguli et al. 2013).”

      84-5 "efficiently" and "high" - provide context so the reader can understand what is meant - do the authors mean the experiments work efficiently, or that a high percentage of cells are transfected? And give some numbers or range of numbers - you're asking the readers to take your word for things when you choose adjectives - instead, provide values and let the readers decide for themselves.

      Thanks for the comment, we have reworded the sentence: Gene therapy is effective in germ cells, as numerous publications have shown that conventional plasmids can be transferred into spermatogonia in several species with success, allowing their transcription in all cells of the germinal lineage (Usmani, Ganguli et al. 2013, Michaelis, Sobczak et al. 2014, Raina, Kumar et al. 2015, Wang, Liu et al. 2022).

      93 Reference at the end of the sentence "most countries"

      Thanks, we changed the sentence and added the reference. The new sentence is: “… to avoid any eugenic deviations, transmissible changes in humans are illegal in 39 countries (Liu 2020)” (Liu, S. (2020). "Legal reflections on the case of genome-edited babies." Glob Health Res Policy 5: 24).

      93-4 Odd to say "multiple" and then list only one. 

      Thanks for the comment, we have reworded the sentence: “Furthermore, the genetic modification of germ cell lines poses biological risks, including the induction of cancer, off-target effects, and cell mosaicism. Errors in editing may have adverse effects on future generations. It is exceedingly challenging to anticipate the consequences of genetic mosaicism, for instance, in a single individual. (Sadelain, Papapetrou et al. 2011, Ishii 2017).”

      97 Is this really a "small" change? Again, would use adjectives carefully - to this reviewer, this is not a small change, but a significant one! And "should be" is not altogether convincing

      Thanks for the comment, we have reworded the sentence: “Thanks to this change, the risk of genomic insertion is avoided, and thus there is no question of heritable alterations.”

      What chance is there of retrotransposition? Is there any data in the literature for that, after injecting millions of copies of RNA one or more might be reverse transcribed and inserted into the genome?

      This is certainly possible and is the putative origin for multiple intronless spermatid-expressed genes: 

      The reviewer poses an interesting question, but one that unfortunately remains unanswered at present. Most papers on mRNA therapy state that there is no risk of genomic integration, but give no reference (see, for instance, "mRNA-based therapeutics: looking beyond COVID-19 vaccines", Lancet 2024, doi: 10.1016/S0140-6736(23)02444-3). This is an important question that deserves to be evaluated, but it is beyond the scope of this manuscript. Nevertheless, it remains much debated (Igyarto and Qin 2024).

      98 Odd to say "should be no risk" and then conclude with "there is no question" - so start the sentence with 'hedging', and then end with certainty - got to pick one or the other.

      Thanks for the comment, we have reworded the sentence

      99 "Complete" - probably not, would delete:

      We removed the word: “The first part of this study presents a characterization of the protein expression patterns obtained following transfection of naked mRNA coding for reporter genes into the testes of mice”

      101-2 Reference missing, as are numbers - what % of cases? 

      Thank you, we changed the sentence and added the reference: “Among infertility disorders, oligo-astheno-teratozoospermia (OAT) is the most frequent (50 %) (Thonneau, Marchand et al. 1991)”. Thonneau, P., S. Marchand, A. Tallec, M. L. Ferial, B. Ducot, J. Lansac, P. Lopes, J. M. Tabaste and A. Spira (1991). "Incidence and main causes of infertility in a resident population (1,850,000) of three French regions (1988-1989)." Hum Reprod 6(6): 811-816.

      103 Once again, the reference is missing:

      We have added these references: (Colpi, Francavilla et al. 2018) (Cavallini 2006)

      104-5 Awkward transition.

      Thanks, we changed the transition: “The first part of this study presents a characterization of the protein expression patterns obtained following transfection of naked mRNA coding for reporter genes into the testes of mice. The second part is to apply the protocol to a preclinical mouse model of OAT.”

      105 Backslash is odd - never seen it used in that way before

      Removed

      108 "completely infertile" is redundant;

      Thank you, we changed it accordingly: “Patients and mice carrying mutations in the ARMC2 gene present a canonical OAT phenotype and are infertile”.

      and is a KO mouse really "preclinical"? 

      Preclinical research is defined as research involving the use of animals to ascertain the potential efficacy of a drug, procedure, or treatment. Preclinical studies are conducted prior to any testing in humans. Our KO mouse model has been shown to mimic human infertility. Indeed, Armc2-/- mice exhibit a phenotype that is identical to that observed in humans. Our study is in line with this definition. For this reason, we have decided to maintain our current position and to use the term "preclinical" in the article. 

      110  Delete "sperm".

      Thank you, we changed it accordingly: “The preclinical Armc2 deficient (Armc2 KO) mouse model is therefore a valuable model to assess whether in vivo injection of naked mRNA combined with electroporation can restore spermatogenesis”

      111  "Easy"? Really? 

      We changed it accordingly: “We chose this model for several reasons: first, Armc2 KO mice are sterile and all sperm exhibit short, thick or coiled flagella [13].”

      112-3 "completely immobile" is redundant - either they are immobile or not.

      Thank you, we changed it accordingly: “As a result, 100 % of sperm are immobile, thus it should be easy to determine the efficacy of the technique by measuring sperm motility with a CASA system.”

      108-33 Condense this lengthy text into a coherent few sentences to give readers a sense of what you sought to accomplish, broadly how it was done, and what you found. This reads more like a Results section

      Thanks for the comment, we shortened the text.

      Materials and Methods 

      The sections appear to have been written by different scientists - the authors should standardize so that similar detail and formatting are used - e.g., in some parts the source is in parentheses with catalog number, in others not, some have city, state, country, others do not... the authors should check eLife mandates for this type of information and provide. 

      We are grateful for your feedback. We standardized the text. If we have missed any instances, we can finish formatting the article, as outlined on the eLife website, once it has been accepted for publication in the journal and before sending the VOR.

      134 Misspelling

      We corrected the misspelling  

      142 Just reference, don't need to spell it out.

      Thanks, we changed it accordingly: “and the Armc2 KO mouse strain obtained by CRISPR-Cas9 (Coutton, Martinez et al. 2019). Experiments”

      150 What is XXX?

      We would like to express our gratitude for bringing this error to our attention. We have duly rectified the issue: “obtained from Merck (Darmstadt, Germany).”

      157-60 Are enough details provided for readers to repeat this if necessary? Doesn't seem so to this reviewer; if kits were followed, then can say "using manufacturer's protocol", or refer to another manuscript - but this is too vague. 

      Thanks, we changed it accordingly: “After expansion, plasmids were purified with a NucleoBond Xtra Midi kit (740410-50; Macherey-Nagel, Düren, Germany) using manufacturer's protocol.”

      165 Again, too few details - how was it purified? What liquid was it in?

      Thanks for the comment. The EEV plasmids were purified like all other plasmids. We changed the text: “All plasmids, EEV CAGs-GFP-T2A-Luciferase ((EEV604A-2), System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD at UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation” 

      170 Seems some words are missing - and will everyone know Dr. Conti by last name alone? Would spell out, and the details of the plasmid must either be provided or a reference given; how was amplification done? Purification? What was it resuspended in? 

      Thanks for the remark. The mCherry plasmid was purified like all other plasmids. We changed the text: “All plasmids, EEV CAGs-GFP-T2A-Luciferase ((EEV604A-2), System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD, UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation”

      175 Again, for this plasmid provide more information - catalog number, reference, etc; how amplified and purified, what resuspension buffer?

      Thank you for the remark. As mentioned above, we added this sentence on plasmid preparation: “All plasmids, EEV CAGs-GFP-T2A-Luciferase ((EEV604A-2), System Bioscience, Palo Alto, CA, USA), mCherry plasmid (given by Dr. Conti MD at UCSF, San Francisco, CA, USA) and EEV-Armc2-GFP plasmid (CUSTOM-S017188-R2-3, Trilink, San Diego, USA) were amplified by bacterial transformation” and we added this sentence: “The EEV-Armc2-GFP plasmid used for in vivo testes microinjection and electroporation was synthesized and customized by Trilink (CUSTOM-S017188-R2-3, San Diego, USA).”

      183 What sequence, or isoform was used? Mouse or human? 

      Thanks, we changed accordingly: “This non-integrative episome contains the mice cDNA sequences of Armc2 (ENSMUST00000095729.11)”

      186-7 Provide sequence or catalog number; what was it resolubilized in?

      Thanks, we changed it accordingly: “the final plasmid concentration was adjusted to 9 μg μL-1 in water.” We provided the sequence of EEV-Armc2-GFP in supp data 6.

      207-219 Much better, this is how the entire section needs to be written! 

      237-240 Font

      Thanks for the comment, we changed it accordingly

      246 Cauda, and sperm, not sperm cells

      Thanks for the comment, we changed it accordingly

      255-6 Which was done first? Would indicate clearly.

      Thanks for the comment, we changed the sentence: “Adult mice were euthanized by cervical dislocation and then transcardiac perfused  with 1X PBS”

      281-2 Provide source for software - company, location, etc: 

      We changed it accordingly: “FIJI software (open-source software) was used to process and analyze images, and Imaris software (Oxford Instruments, Tubney Woods, Abingdon, Oxon OX13 5QX, UK) for the 3D reconstructions.”

      323 um, not uM. 

      Thanks for the comment, we changed our mistake: “After filtration (100 µm filter)”

      Results 

      369 Weighed.  

      Thanks for the comment, we changed our mistake: “the testes were measured and weighed”

      371 No difference in what, specifically?

      Thanks for the comment, we changed the sentence to: “No statistical differences in length and weight were observed between control and treated testes”

      375 "was respected"? What does this mean?

      Thanks for the comment, we changed the sentence to “The layered structure of germ cells were identical in all conditions”

      378  This is highly unlikely to be true, as even epididymal sperm from WT animals are often defective - the authors are saying there were ZERO morphological defects? Or that there was no difference between control and treated? Only showing 2-3 sperm for control vs treatment is not sufficient.

      Your observation that epididymal spermatozoa from wild-type animals exhibit defective morphology is indeed correct. The prevalence of these defects varies by strain, with an average incidence of 20% to 40% (Kawai, Hata et al., 2006; Fan, Liu et al., 2015). To provide a more comprehensive representation, we conducted a Harris-Shorr staining procedure and included a histogram of the percentage of normal sperm in each condition (new figure 2F4). Harris-Shorr staining of the epididymal sperm cells revealed no discernible increase in morphological defects when mRNA and EEV were used, in comparison with the control. We added the sentence “At last, Harris-Shorr staining of the epididymal sperm cells demonstrated that there were no increases in morphological defects when mRNA and EEV were used in comparison with the control”.

      379  "safe" is not the right word - better to say "did not perturb spermatogenesis". 

      Thanks, we changed it accordingly: “these results suggest that in vivo microinjection and electroporation of EEV or mRNA did not perturb spermatogenesis”

      382-3 This sentence needs attention, doesn't make sense as written: 

      Thanks for the remark, we changed the sentence to: “No testicular lesions were observed on the testes at any post injection time”

      389  How long after injection? 

      Thanks for the comment, we changed the sentence to: “It is worth noting that both vectors induced GFP expression at one day post-injection”

      390  Given the duration of mouse spermatogenesis (~35 days), for GFP to persist past that time suggests that it was maintained in SSCs? How can the authors explain how such a strong signal was maintained after such a long period of time? How stable are the episomally-maintained plasmids, are they maintained 100% for months? And if they are inherited by progeny of SSCs, shouldn't they be successively diluted over time? And if they are inherited by daughter cells such that they would still be expressed 49 days after injection, shouldn't all the cells originating from that SSC also be positive, instead of what appear to be small subsets as shown in Fig. 3H2? Overall, this reviewer is struggling to understand how a plasmid would be inherited and passed through spermatogenesis in the manner seen in these results. 

      Thanks for the comment. 

      This point was already underlined in public review. We paste here our answer: “The non-insertional Enhanced Episomal Vector (EEV) plasmid is a non-viral episome based on the Epstein-Barr virus (EBV). Its maintenance within the cell is made possible by its ability to replicate synchronously with the host genome and to segregate into daughter cells. This is because EEV is composed of two distinct elements derived from EBV: an origin of replication (oriP) and an Epstein-Barr Nuclear Antigen 1 (EBNA1) expression cassette (Gil, Gallaher, and Berk, 2010). The oriP is a locus comprising two EBNA1-binding domains, designated as the Family of Repeats (FR) and Dyad Symmetry (DS). The FR is an array of approximately 20 EBNA1-binding sites (20 repeats of 30 bp) with high affinity, while the DS comprises four lower-affinity sites operating in tandem (Ehrhardt et al., 2008). 

      The 641-amino-acid EBNA1 protein contains numerous domains. The N-terminal domains are rich in glycines and alanines, which enable interaction with host chromosomes. The C-terminal region is responsible for binding to oriP (Hodin, Najrana, and Yates, 2013a). The binding of EBNA1 to the DS element results in the recruitment of the origin of replication. This results in the synchronous initiation of extra-chromosomal EEV replication with host DNA at each S phase of the cell cycle (Düzgüneş, Cheung, and Konopka 2018a). Furthermore, EBNA1 binding to the FR domain induces the formation of a bridge between metaphase chromosomes and the vector during mitosis. This binding is responsible for the segregation of the EEV episome in daughter cells (Düzgüneş, Cheung, and Konopka 2018b). It is notable that EEV is maintained at a rate of 90-95% per cell division.”

      Because of the intrinsic properties of EEV described above, the presence of the reporter protein at 119 days after injection was likely due to the maintenance of the plasmid, mostly in Sertoli cells, and not to integration of the plasmid DNA.

      Of note, the specificity of EEV was already indicated in the introduction. Nevertheless, we have added more information about it to help the readers (lines 124-128 clean copy)  

      398 Which "cell types"? 

      Your feedback is greatly appreciated, and the sentence in question has been amended to make this clearer. The revised wording is as follows: “These results suggest that GFP-mRNA and EEV-GFP targeted different seminiferous cell types, such as Sertoli cells and all germline cells, or that there were differences in terms of transfection efficiency.”

      409 Why is it important to inject similar copies of EEV and mRNA? Wouldn't the EEV be expected to generate many, many more copies of RNA per molecule than the mRNAs when injected directly?? 

      We removed the word importantly. 

      415 How is an injected naked mRNA stably maintained for 3 weeks? What is the stability of this mRNA?? Wouldn't its residence in germ cells for 21 days make it more stable than even the most stable endogenous mRNAs? Even mRNAs for housekeeping genes such as actin, which are incredibly stable, have half-lives of 9-10 hours.

      We appreciate your inquiry and agree that mRNA stability is limited. We believe the confusion arises because we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After three weeks, we did not observe the mRNA itself; we observed the GFP protein whose expression was induced by the mRNA. To draw the reader's attention to this point, we have added the following sentence to the text: “It is important to underline that the signal measured is the fluorescence emitted by the GFP. This signal is dependent of both the half-lives of the plasmid/mRNA and the GFP. Therefore, the kinetic of the signal persistence (which is called here expression) is a combination of the persistence of the vector and the synthetized protein.” See lines 469-472 clean copy. 

      That said, it is difficult to compare the lifespan of a cellular mRNA with that of an mRNA that has been modified at several levels, including the 5' cap, the mRNA body, and the poly(A) tail; these modifications increase both mRNA stability and translation (see "The Pivotal Role of Chemical Modifications in mRNA Therapeutics" (2022), https://doi.org/10.3389/fcell.2022.901510). This question is discussed in lines 687-698 (clean copy).

      467 "safely" should be deleted

      Thanks, we removed the word: “To validate and confirm the capacity of naked mRNA to express proteins in the testes after injection and electroporation”

      470  Except that apoptotic cells were clearly seen in Figure 2:

      We would like to thank the reviewer for their comment. We agree that the staining of the provided sections was of heterogeneous quality. To address the remark, we carried out additional HE staining for all conditions, and we now present correctly stained testis sections obtained in the different conditions in Fig. 2 and Supp. 7. Our observations revealed that the number of apoptotic cells remained consistent across all conditions.

      471  "remanence"?

      We appreciate your feedback and have amended the sentence to provide clear meaning. The revised wording is as follows: “The assessment of the temporal persistence of testicular mCherry fluorescent protein expression revealed a robust red fluorescence from day 1 post-injection, which remained detectable for at least 15 days (Fig. Supp. 3 B2, C2, and D2).”

      489 IF measures steady-state protein levels, not translation; should say you determined when ARMC2 was detectable. 

      Thanks for the remark, we changed the sentence to: “By IF, we determined when ARMC2 protein was detectable during spermatogenesis.”

      491 Flagella

      Thanks for the comment, we corrected our mistake: “in the flagella of the elongated spermatids (Fig 9A)”

      Discussion 

      The Discussion is largely a re-hashing of the Methods and Results, with additional background.

      Message stability must be addressed - how is a naked mRNA maintained for 21 days?

      As previously stated, it is our hypothesis that the source of the confusion lies in the fact that we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After a three-week observation period, we did not observe the mRNA, but we observed the synthesized GFP protein. This point and the stability of proteins in the testis are now discussed in lines 677-684 (clean copy).

      556 How do the authors define "safe"?

      Thanks for the comment, we changed the sentence to be clearer: “Our results also showed that the combination of injection and electroporation did not perturb spermatogenesis when electric pulses are carefully controlled”

      563 Synthesized

      Thanks, we changed it accordingly.

      602 Again, this was not apparent, as there were more apoptotic cells in Fig. 2 - data must be provided to show "no effect".

      As previously stated, we carried out additional HE staining for all conditions, as can be observed in Fig. 2. Our observations revealed that the number of apoptotic cells remained consistent across all conditions.

      629-30 This directly contradicts the authors' contention in the Introduction that ICSI was unsafe - how is this procedure going to be an advancement over ICSI as proposed, if ICSI needs to be used?? Why not just skip all this and do ICSI then?? Perhaps if this technique was used to 'repair' defects in spermatogonia or spermatocytes, then that makes more sense. But if ICSI is required, then this is not an advancement when trying to rescue a sperm morphology/motility defect.

      In light of the latest findings (Fig 12), we have revised this part of the discussion and this paragraph no longer exists.

      Nevertheless, to address the reviewer’s remark specifically, we would like to underline that ICSI with sperm from a fertile donor is always more efficient than ICSI with sperm from a patient suffering from OAT. Our strategy, by improving sperm quality, will improve the efficiency of ICSI and ultimately increase the live birth rate resulting from the first fresh IVF cycle.

      640-2 What is meant by "sperm organelles" And what examples are provided for sperm proteins being required at or after fertilization? 

      This paragraph was also strongly modified and the notion of protein persistence during spermatogenesis was discussed in the paragraph on fluorescent signal duration. See lines 698-705.

      651 "Dong team"??

      Thanks for the comment, we added the references. 

      Figure 2D2 - tubule treated with EEV-GFP appears to have considerably more apoptotic cells - this reviewer counted ~10 vs 0 in control; also, many of the spermatocytes appear abnormal in terms of their chromatin morphology - the authors must address this by staining for markers of apoptosis - not fair to conclude there was no difference when there's a very obvious difference! 

      We would like to thank the reviewer for their comment. This point was already addressed. As previously stated, we now provide new testis sections for all conditions (see Fig. 2). Our observations revealed that the number of apoptotic cells remained consistent across all conditions.

      Figure 2D3 staining is quite different than D1-2, likely a technical issue - looks like no hematoxylin was added? Need to re-stain so results can be compared to the other 2 figures 

      As previously stated, we carried out additional HE staining for all conditions, and new images are provided, with similar staining. 

      Figure 3 - the fluorescent images lack any context of tubule structure so it is nearly impossible to get a sense of what cells express GFP, or whether they're in the basal vs adluminal compartment - can the authors outline them? Indicate where the BM and lumen are. 

      We would like to thank the reviewer for their comment. This figure actually provides a global view of green fluorescent protein (GFP) expression at the surface of the testis. The entire testis was placed under an inverted epifluorescence microscope, and a picture of the GFP signal was recorded. For this reason, it is impossible to delineate the BM and the lumen. It should be noted that the fluorescence likely originates from different seminiferous tubules.

      Author response image 1.

      So, for Figure 3 if the plasmid is being uptaken by cells and maintained as an episome, is it able to replicate? Likely not. 

      Yes! It is an intrinsic property of the episome; see the detailed explanation of the EEV plasmid provided above.

      So, initially, it could be in spermatogonia, spermatocytes, and spermatids. As time progressed those initially positive spermatids and then spermatocytes would be lost - and finally, the only cells that should be positive would be the progeny of spermatogonia that were positive - but, as they proliferate shouldn't the GFP signal decline? 

      Because EEV is able to replicate synchronously with the host genome and to segregate into daughter cells at a level of 90% of the mother cell, the expected decline is very slow.

      And, since clones of germ cells are connected throughout their development, shouldn't the GFP diffuse through the intercellular bridges so entire clones are positive? Was this observed? 

      We did not perform IF experiments beyond 7 days after injection, a time too short to observe what the reviewer suggested. Moreover, although GFP synthesized from injected EEV was found in both germ cells and Sertoli cells 1 day after injection (Fig 7), the reporter protein was observable only in Sertoli cells after one week. This result suggests that EEV is maintained only in Sertoli cells, thus preventing the observation of stained clones.

      Can these sections be stained for the ICB TEX14 so that clonality can be distinguished? Based on the apparent distance between cells, it appears some are clones, but many are not... 

      We thank the reviewer for this suggestion, but we are not able to perform testis sectioning and co-staining experiments because the PFA treatment bleaches the GFP signal. We also tested several GFP antibodies, but all failed.

      Nevertheless, we were able to localize and identify transfected cells thanks to whole-testis optical clearing, combined with measurement of GFP fluorescence and three-dimensional image reconstruction.

      For Figure 4, with the mRNA-GFP, why does the 1-day image (which looks similar to the plasmid-transfected) look so different from days 7-21? 

      And why do days 7-21 look so different from those days in Fig 3? 

      Thank you for your feedback. It is an excellent question. Because of the low resolution of whole-testis epifluorescence imaging and light-penetration issues, we decided to carry out whole-testis optical clearing and three-dimensional image reconstruction experiments in order to gain insight into the transfection process. At day 1, GFP synthesized from injected EEV was found in spermatogonia, spermatocytes and Sertoli cells (Fig 7). After one week, the reporter protein synthesized from injected EEV was observable only in Sertoli cells.

      In contrast, for mRNA, on day 1 and day 7 post-injection, the GFP fluorescent signal was associated with both Sertoli cells and germ cells. This explains why the mRNA-GFP and EEV-GFP patterns are similar at day 1 and different at day 7.

      Why do the authors think the signal went from so strong at 21 to undetectable at 28? What changed so drastically over those 7 days?

      What is the half-life of this mRNA supposed to be? It seems that 21 days is an unreasonably long time, but then to go to zero at 28 seems also odd... Please provide some explanation, and context for whether the residence of an exogenous mRNA for 21 days is expected. 

      As previously stated, it is our hypothesis that the source of the confusion lies in the fact that we injected mRNA coding for the GFP protein, rather than mRNA tagged with GFP. After a three-week observation period, we did not observe the mRNA, but we observed the GFP protein produced by the mRNA. The time of observation of the reporter proteins expressed by the respective mRNA molecules (mCherry, luciferase, or GFP) ranged from 15 to 21 days. Proteins have very different turnover rates, with half-lives ranging from minutes to days. Half-lives depend on proteins but also on tissues. As explained in the discussion, it has been demonstrated that proteins involved in spermatogenesis exhibit a markedly low turnover rate and this explains the duration of the fluorescent signal. 

      The authors should immunostain testis sections from controls and those with mRNA and plasmid and immunostain with established germ cell protein fate markers to show what specific germ cell types are GFP+

      Thank you for your feedback. As previously mentioned, we were unable to perform testis sectioning and co-staining because the PFA treatment bleaches the GFP signal and because we were unable to reveal GFP with a GFP antibody, for unknown reasons.

      For the GFP signal to be maintained past 35 days, the plasmid must have integrated into SSCs - and for that to happen, the plasmid would have to cross the blood-testis-barrier... is this expected? 

      We are grateful for your observation. 

      First, as explained above, we do not think that the plasmid has been integrated. 

      Concerning the blood-testis barrier: electroporation is a technique that is widely used in biotechnology and medicine for the delivery of drugs and the transfer of genes into living cells (Boussetta, Lebovka et al. 2009). This process entails the application of an electric current, which induces the formation of hydrophilic pores in the lipid bilayer of the plasma membrane (Kanduser, Miklavcic et al. 2009). The pores remain stable throughout the electroporation process and then close again once it is complete. Consequently, as electroporation destabilizes the cell membrane, it can also destabilize the gap junctions responsible for the blood-testis barrier. This was confirmed by several studies, which have observed plasmid transfection beyond the blood-testis barrier with injection into the rete testis following electroporation (Muramatsu, Shibata et al. 1997, Kubota, Hayashi et al. 2005, Danner, Kirchhoff et al. 2009, Kanduser, Miklavcic et al. 2009, Michaelis, Sobczak et al. 2014).

      Figure 9 - authors should show >1 cell - this is insufficient; also, it's stated it's only in the flagella, but it also appears to be in the head as well. And is this just the principal piece?? And are the authors sure those are elongating vs condensing spermatids? Need to show multiple tubules, at different stages, to make these claims

      We have partly answered this question in the public review; we paste our answer here:

      “We present now new images showing the full seminiferous tubules as requested (see supp fig 6). In this new figure, it is clear that Armc2 is only expressed in spermatids. We have also added in this figure an analysis of the RNA-seq database produced by Gan's team (Gan, Wen et al. 2013), confirming that ArmC2 expression is predominantly expressed at the elongated spermatid stage. This point is now clearly indicated in the text.”

      Concerning the localization of the protein in the head, we confirm that the base of the manchette is stained but we have no explanation so far. This point is now indicated in the manuscript.

      Figure 10B2 image - a better resolution is necessary

      We are grateful for your feedback. We concede that the quality of the image was not optimal. Consequently, we have replaced it with an alternative.

      Figure 11 - in control, need to show >1 sperm; and lower-mag images should be provided for all samples to show population-wide effects; showing 1 "normal" sperm per group (white arrows) is insufficient: 

      We are grateful for your feedback. We conducted further experiments and now provide additional images in Supp. figure 8.

      Reviewer #3 (Recommendations For The Authors)

      In this study, Vilpreux et al. developed a microinjection/electroporation method in order to transfect RNA into testicular cells. The authors studied several parameters of treated testis and compared the injection of DNA versus RNA. Using the injection of Armc2 RNA into mice with an Armc2 knockout the authors were able to (partly) rescue the fertility phenotype. 

      Minor points. 

      Figure 6 + lines 553+554: might it be that the staining pattern primarily on one side of the testis is due to the orientation of the scissor electrode during the electroporation procedure and the migration direction of negatively charged RNA molecules (Figure 6)? 

      Your input is greatly appreciated. We concur that the observed peripheral expression is due to both the electroporation and injection. Accordingly, we have amended the sentence as follows: "The peripheral expression observed was due to the close vicinity of cells to the electrodes, and to a peripheral dispersal of the injected solution, as shown by the distribution of the fluorescent i-particles NIRFiP-180."

      Discussion of the safety aspect (lines 601-608): The authors state several times that there are no visible tissue changes after the electroporation procedure. However, in order to claim that this procedure is "safe", it is necessary to examine the offspring born after microinjection/electroporation. 

      Your input is greatly appreciated. Consequently, the term "safe" has been replaced with "did not perturb spermatogenesis" in accordance with the provided feedback. Your assertion is correct; an examination of the offspring born would be necessary to ascertain the safety of the procedure. Due to the quantity of motile sperm obtained, it was not possible to produce offspring through natural mating. However, novel Armc2-/--rescued sperm samples have been produced and in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) experiments have been conducted. The results demonstrate that the Armc2-/--rescued sperm can successfully fertilize eggs and produce two-cell embryos by IVF and blastocysts by ICSI. These outcomes are visually represented in Figure 12. The development of embryos up to the blastocyst stage is a step in the right direction.

      The discussion section could be shortened. Lines 632-646 are largely a repetition of the introductory section. In addition, the Dong paper (ref. 25) may be interesting; however, this part could also be shortened (lines 647-676). This reviewer would prefer the authors to focus on the technique (different application sites and applied nucleotides) and proof of concept for (partial) phenotype rescue in the knockout mice. 

      Your contribution is highly valued. In light of your observations and the latest findings, we have substantially revised the discussion accordingly.

      Line 63: oocytes rather than eggs.

      We are grateful for your input, but we have decided to retain our current position and to use the term "eggs" rather than "oocytes" in our writing because the definition of an oocyte is a female gametocyte or germ cell involved in reproduction. In other words, an oocyte is a germ cell inside the ovary, which becomes an egg after ovulation.

      Boussetta, N., N. Lebovka, E. Vorobiev, H. Adenier, C. Bedel-Cloutour and J. L. Lanoiselle (2009). "Electrically assisted extraction of soluble matter from chardonnay grape skins for polyphenol recovery." J Agric Food Chem 57(4): 1491-1497.

      Cavallini, G. (2006). "Male idiopathic oligoasthenoteratozoospermia." Asian J Androl 8(2): 143-157.

      Colpi, G. M., S. Francavilla, G. Haidl, K. Link, H. M. Behre, D. G. Goulis, C. Krausz and A. Giwercman (2018). "European Academy of Andrology guideline Management of oligo-asthenoteratozoospermia." Andrology 6(4): 513-524.

      Coutton, C., G. Martinez, Z. E. Kherraf, A. Amiri-Yekta, M. Boguenet, A. Saut, X. He, F. Zhang, M. Cristou-Kent, J. Escoffier, M. Bidart, V. Satre, B. Conne, S. Fourati Ben Mustapha, L. Halouani, O. Marrakchi, M. Makni, H. Latrous, M. Kharouf, K. Pernet-Gallay, M. Bonhivers, S. Hennebicq, N. Rives, E. Dulioust, A. Toure, H. Gourabi, Y. Cao, R. Zouari, S. H. Hosseini, S. Nef, N. Thierry-Mieg, C. Arnoult and P. F. Ray (2019). "Bi-allelic Mutations in ARMC2 Lead to Severe Astheno-Teratozoospermia Due to Sperm Flagellum Malformations in Humans and Mice." Am J Hum Genet 104(2): 331-340.

      Danner, S., C. Kirchhoff and R. Ivell (2009). "Seminiferous tubule transfection in vitro to define postmeiotic gene regulation." Reprod Biol Endocrinol 7: 67.

      Gan, H., L. Wen, S. Liao, X. Lin, T. Ma, J. Liu, C. X. Song, M. Wang, C. He, C. Han and F. Tang (2013). "Dynamics of 5-hydroxymethylcytosine during mouse spermatogenesis." Nat Commun 4: 1995.

      Igyarto, B. Z. and Z. Qin (2024). "The mRNA-LNP vaccines - the good, the bad and the ugly?" Front Immunol 15: 1336906.

      Ishii, T. (2017). "Germ line genome editing in clinics: the approaches, objectives and global society." Brief Funct Genomics 16(1): 46-56.

      Kanduser, M., D. Miklavcic and M. Pavlin (2009). "Mechanisms involved in gene electrotransfer using high- and low-voltage pulses--an in vitro study." Bioelectrochemistry 74(2): 265-271.

      Kubota, H., Y. Hayashi, Y. Kubota, K. Coward and J. Parrington (2005). "Comparison of two methods of in vivo gene transfer by electroporation." Fertil Steril 83 Suppl 1: 1310-1318.

      Michaelis, M., A. Sobczak and J. M. Weitzel (2014). "In vivo microinjection and electroporation of mouse testis." J Vis Exp(90).

      Muramatsu, T., O. Shibata, S. Ryoki, Y. Ohmori and J. Okumura (1997). "Foreign gene expression in the mouse testis by localized in vivo gene transfer." Biochem Biophys Res Commun 233(1): 45-49.

      Raina, A., S. Kumar, R. Shrivastava and A. Mitra (2015). "Testis mediated gene transfer: in vitro transfection in goat testis by electroporation." Gene 554(1): 96-100.

      Sadelain, M., E. P. Papapetrou and F. D. Bushman (2011). "Safe harbours for the integration of new DNA in the human genome." Nat Rev Cancer 12(1): 51-58.

      Thonneau, P., S. Marchand, A. Tallec, M. L. Ferial, B. Ducot, J. Lansac, P. Lopes, J. M. Tabaste and A. Spira (1991). "Incidence and main causes of infertility in a resident population (1,850,000) of three French regions (1988-1989)." Hum Reprod 6(6): 811-816.

      Usmani, A., N. Ganguli, H. Sarkar, S. Dhup, S. R. Batta, M. Vimal, N. Ganguli, S. Basu, P. Nagarajan and S. S. Majumdar (2013). "A non-surgical approach for male germ cell mediated gene transmission through transgenesis." Sci Rep 3: 3430.

      Wang, L., C. Liu, H. Wei, Y. Ouyang, M. Dong, R. Zhang, L. Wang, Y. Chen, Y. Ma, M. Guo, Y. Yu, Q. Y. Sun and W. Li (2022). "Testis electroporation coupled with autophagy inhibitor to treat nonobstructive azoospermia." Mol Ther Nucleic Acids 30: 451-464.

    1. But Spaniards who live abroad and know day-to-day life in other countries say that the catastrophist, defeatist image held within Spain is exaggerated, that many of the stigmas we attribute to ourselves are not true, and that the attitude of generalized complaint and a certain feeling of inferiority prevent progress in solving the concrete problems of Spanish society, because other countries do more with less.

      In these lines, pay special attention to the use of sophisticated language such as "imagen catastrofista y derrotista" (catastrophist, defeatist image) and "los sambenitos que nos atribuimos no son ciertos" (the stigmas we attribute to ourselves are not true), or phrases such as "la actitud de queja generalizada" (the attitude of generalized complaint) and "un sentimiento de inferioridad" (a feeling of inferiority).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Tiedje et al. investigated the transient impact of indoor residual spraying (IRS) followed by seasonal malaria chemoprevention (SMC) on the Plasmodium falciparum parasite population in a high transmission setting. The parasite population was characterized by sequencing the highly variable DBL$\alpha$ tag as a proxy for var genes, a method known as varcoding. Varcoding presents a unique opportunity due to the extraordinary diversity observed as well as the extremely low overlap of repertoires between parasite strains. The authors also present a new Bayesian approach to estimating individual multiplicity of infection (MOI) from the measured DBL$\alpha$ repertoire, addressing some of the potential shortcomings of the approach that have been previously discussed. The authors also present a new epidemiological endpoint, the so-called "census population size", to evaluate the impact of interventions. This study provides a nice example of how varcoding technology can be leveraged, as well as the importance of using diverse genetic markers for characterizing populations, especially in the context of high transmission. The data are robust and clearly show the transient impact of IRS in a high transmission setting; however, some aspects of the analysis are confusing.

      (1) Approaching MOI estimation with a Bayesian framework is a well-received addition to the varcoding methodology that helps to address the uncertainty associated with not knowing the true repertoire size. It's unfortunate that while the authors clearly explored the ability to estimate the population MOI distribution, they opted to use only MAP estimates. Embracing the Bayesian methodology fully would have been interesting, as the posterior distribution of population MOI could have been better explored. 

      We thank the reviewer for appreciating the extension of varcoding we present here. We believe the comment on maximum a posteriori (MAP) refers to the way we obtained population-level MOI from the individual MOI estimates. We would like to note that reliance on MAP was only one of two approaches we described, although we then presented only MAP. Having calculated both, we did not observe major differences between the two for this data set. Nonetheless, we revised the manuscript to include the result based on the mixture distribution, which considers all the individual MOI distributions, in Figure supplement 6.
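
      The two ways of obtaining a population-level MOI distribution mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each individual is assumed to have a posterior over MOI = 1, 2, ..., stored as a list of probabilities, and the function names and the toy posteriors are hypothetical.

```python
def population_moi_from_map(posteriors):
    """Population MOI distribution built from each individual's MAP estimate.

    posteriors[i][j] is the (assumed) posterior probability that
    individual i has MOI = j + 1.
    """
    counts = [0] * len(posteriors[0])
    for post in posteriors:
        counts[post.index(max(post))] += 1  # keep only the most probable MOI
    return [c / len(posteriors) for c in counts]


def population_moi_from_mixture(posteriors):
    """Population MOI distribution as the equal-weight mixture of all posteriors."""
    n = len(posteriors)
    return [sum(post[j] for post in posteriors) / n
            for j in range(len(posteriors[0]))]


# Toy posteriors for three individuals (made-up numbers).
toy = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(population_moi_from_map(toy))
print(population_moi_from_mixture(toy))
```

      The MAP version discards each individual's uncertainty, whereas the mixture carries it into the population-level distribution; the two agree closely when the individual posteriors are sharply peaked.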

      (2) The "census population size" endpoint has unclear utility. It is defined as the sum of MOI across measured samples, making it sensitive to the total number of samples collected and genotyped. This means that the values are not comparable outside of this study, and are only roughly comparable between strata in the context of prevalence where we understand that approximately the same number of samples were collected. In contrast, mean MOI would be insensitive to differences in sample size, why was this not explored? It's also unclear in what way this is a "census". While the sample size is certainly large, it is nowhere near a complete enumeration of the parasite population in question, as evidenced by the extremely low level of pairwise type sharing in the observed data. 

      We consider the quantity a census in that it is a total enumeration or count of infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires. But our focus here is on a measure of population size itself. The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see, for example, Bedford et al. for the seasonal influenza virus and for the measles virus (Bedford et al., 2011)), and it is also clear in the ecological literature for non-pathogen populations (Palstra and Fraser, 2012).

      We completely agree that our quantity depends on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission, which persists across the IRS intervention. Of course, one would like to be able to use this quantity across studies that differ in sampling depth, and the reviewer makes an insightful and useful suggestion. It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (we just need to divide or multiply by sample size, respectively) (Table supplement 7). We can go further: with mean MOI we can presumably extrapolate to the full size of the host population, or to the population size of another sample in another location. What is needed for this purpose is a mean MOI that is stable with respect to sample size. We show that mean MOI is indeed stable in this way in our study by subsampling our original sample to different depths (Figure supplement 8 in the revised manuscript). The revision now includes a discussion of this point, which allows an extrapolation of the census population size to the whole population of hosts in the local area.
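
      The map between census population size and mean MOI, and the subsampling check of stability, can be sketched as below. This is an illustrative sketch only: the function names, the toy MOI values, and the `n_infected_hosts` parameter (the number of infected hosts to which the mean MOI is assumed to apply) are hypothetical and not taken from the manuscript.

```python
import random
import statistics


def census_population_size(moi_values):
    """Census population size: the sum of MOI over all sampled infections."""
    return sum(moi_values)


def mean_moi(moi_values):
    """Mean MOI: census population size divided by sample size."""
    return sum(moi_values) / len(moi_values)


def subsampled_mean_moi(moi_values, depth, n_replicates=200, seed=0):
    """Mean and standard deviation of mean MOI over random subsamples of size `depth`."""
    rng = random.Random(seed)
    means = [mean_moi(rng.sample(moi_values, depth)) for _ in range(n_replicates)]
    return statistics.mean(means), statistics.stdev(means)


def extrapolated_census(moi_values, n_infected_hosts):
    """Extrapolate the census size, assuming mean MOI is stable with sample size."""
    return mean_moi(moi_values) * n_infected_hosts


# Toy individual MOI estimates (made-up numbers).
moi = [1, 2, 3, 1, 4, 2, 1, 5]
print(census_population_size(moi), mean_moi(moi))
for depth in (2, 4, 8):
    print(depth, subsampled_mean_moi(moi, depth))
```

      If the subsampled means stay close to the full-sample mean across depths, multiplying mean MOI by a different number of infected hosts gives a census size that is comparable across samples of different size.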

      We have also clarified the time denominator: Given the typical duration of infection, we expect our population size to be representative of a per-generation measure.

      (3) The extraordinary diversity of DBL$\alpha$ presents challenges to analyzing the data. The authors explore the variability in repertoire richness and frequency over the course of the study, noting that richness rapidly declined following IRS and later rebounded, while the frequency of rare types increased, and then later declined back to baseline levels. The authors attribute this to fundamental changes in population structure. While there may have been some changes to the population, the observed differences in richness as well as frequency before and after IRS may also be compatible with simply sampling fewer cases, and thus fewer DBL$\alpha$ sequences. The shift back to frequency and richness that is similar to pre-IRS also coincides with a similar total number of samples collected. The authors explore this to some degree with their survival analysis, demonstrating that a substantial number of rare sequences did not persist between timepoints and that rarer sequences had a higher probability of dropping out. This might also be explained by the extreme stochasticity of the highly diverse DBL$\alpha$, especially for rare sequences that are observed only once, rather than any fundamental shifts in the population structure.

      We thank the reviewer for raising this question, which led us to consider whether the change in the number of DBLα types over the course of the study (and intervention) follows simply from sampling fewer P. falciparum cases. We interpreted this question as meaning that one can predict the former from the latter in a simple way and that, therefore, tracking the changes in DBLα type diversity would be unnecessary. A simple map would be, for example, a linear relationship (a given proportion of DBLα types lost for a given proportion of genomes lost) and, even more trivially, a linear loss with a slope of one (the same proportion). Note, however, that such expectations rely on some knowledge of strain structure and gene composition. In particular, we would need to assume a complete lack of overlap and no gene repeats in a given genome. We have previously shown that immune selection leads to selection for minimum overlap and distinct genes in repertoires at high transmission (see, for example, He et al., 2018, for theoretical and empirical evidence of both patterns). Also, since the gene pool is very large, even random repertoires would lead to limited overlap (although the empirical overlap is even smaller than that expected at random (Day et al., 2017)). Despite these considerations, we cannot a priori assume a pattern of complete non-overlap and distinct genes and ignore plausible complexities introduced by the gene frequency distribution.

      To examine this insightful question, we simulated the loss of a given proportion of genomes from baseline in 2012 and examined the resulting loss of DBLα types. Specifically, we accumulated the loss of infections in individuals until it reached a given proportion (we can do this on the basis of the estimated individual MOI values). We repeated this procedure 500 times for each proportion, as the random selection of the individual infections to be removed introduces some variation. Figure 2 below shows that the relationship is nonlinear and that one quantity is not a simple proportion of the other. For example, the loss of half the genomes does not result in the loss of half the DBLα types.

      Author response image 1.

      Non-linear relationship between the loss of DBLα types and the loss of a given proportion of genomes. The graph shows that the removal of parasite genomes from the population through intervention does not lead to the loss of the same proportion of DBLα types, as the initial removal of genomes mostly involves the loss of rare DBLα types, whereas common DBLα types persist until a high proportion of genomes is lost. The survey data (pink dots) used for this subsampling analysis were sampled at the end of the wet/high transmission season in October 2012 in Bongo District, northern Ghana. We used the Bayesian formulation of the varcoding method proposed in this work to calculate the multiplicity of infection of each isolate and thereby obtain the total number of genomes. The randomized surveys (black dots) were obtained with the “curveball algorithm” (Strona et al., 2014), which preserves isolate lengths and the type frequency distribution.
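
      The subsampling experiment described above can be sketched on toy data as follows. This is a simplified illustration, not the authors' procedure: it assumes each genome is available as its own set of DBLα type identifiers (the response removes infections within isolates using estimated MOI, a step not reproduced here), and the simulated repertoires, their Zipf-like type frequencies, and all function names are hypothetical.

```python
import random


def simulate_genomes(n_genomes, n_types, repertoire_size, rng):
    """Toy data: each genome is a set of type ids drawn with skewed weights."""
    weights = [1.0 / (rank + 1) for rank in range(n_types)]
    genomes = []
    for _ in range(n_genomes):
        repertoire = set()
        while len(repertoire) < repertoire_size:
            repertoire.add(rng.choices(range(n_types), weights=weights)[0])
        genomes.append(repertoire)
    return genomes


def type_loss_curve(genomes, proportions, rng):
    """Fraction of types lost after removing each proportion of genomes.

    One random removal order is drawn per call, so the removals are nested
    and the returned curve is non-decreasing in the proportion removed.
    """
    order = list(range(len(genomes)))
    rng.shuffle(order)
    all_types = set().union(*genomes)
    curve = []
    for p in proportions:
        n_removed = round(p * len(genomes))
        remaining = set()
        for idx in order[n_removed:]:
            remaining |= genomes[idx]
        curve.append(1 - len(remaining) / len(all_types))
    return curve


rng = random.Random(1)
genomes = simulate_genomes(n_genomes=100, n_types=500, repertoire_size=20, rng=rng)
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
# Repeating the call gives replicate curves, analogous to the 500 repeats above.
print(type_loss_curve(genomes, proportions, rng))
```

      With skewed type frequencies, rare types disappear first and common types persist, so the curve lies below the one-to-one line until most genomes have been removed.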

      We also investigated whether the resulting pattern changed significantly if we randomized the composition of the isolates.  We performed such randomization with the “curveball algorithm” (Strona et al., 2014). This algorithm randomizes the presence-absence matrix with rows corresponding to the isolates and columns, to the different DBLα types; importantly, it preserves the DBLα type frequency and the length of isolates. We generated 500 randomizations and repeated the simulated loss of genomes as above. The data presented in Figure 2 above show that the pattern is similar to that obtained for the empirical data presented in this study in Ghana. We interpret this to mean that the number of genes is so large, that the reduced overlap relative to random due to immune selection (see (Day et al., 2017)) does not play a key role in this specific pattern. 
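
      For readers unfamiliar with the curveball randomization, the following minimal Python sketch illustrates the pair-trading step as described by Strona et al. (2014). It is not the implementation used for the analysis; the representation (one set of DBLα type labels per isolate), the function names, and the number of trades are assumptions made for illustration.

```python
import random
from collections import Counter


def curveball(rows, n_trades, seed=0):
    """Randomize a presence-absence matrix stored as a list of sets.

    Each set holds the DBLα types present in one isolate. Every trade
    keeps both isolate lengths and every type frequency unchanged.
    """
    rng = random.Random(seed)
    rows = [set(r) for r in rows]  # work on a copy of the input
    for _ in range(n_trades):
        i, j = rng.sample(range(len(rows)), 2)
        shared = rows[i] & rows[j]
        only_i = rows[i] - shared
        only_j = rows[j] - shared
        # Pool the types unique to either isolate and redistribute them,
        # giving each isolate back as many types as it contributed.
        pool = sorted(only_i | only_j)
        rng.shuffle(pool)
        k = len(only_i)
        rows[i] = shared | set(pool[:k])
        rows[j] = shared | set(pool[k:])
    return rows


def type_frequencies(rows):
    """Count in how many isolates each type occurs."""
    return Counter(t for r in rows for t in r)


# Toy presence-absence data (hypothetical type labels).
observed = [{"a", "b", "c"}, {"b", "d"}, {"a", "e", "f"}, {"c", "d", "e"}]
randomized = curveball(observed, n_trades=200, seed=1)
```

      Because types shared by the two isolates are never traded, no isolate can receive a duplicate of a type it already carries, which is what keeps the row and column totals fixed.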

      Reviewer #2 (Public Review):  

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. Overall, I found these results clear, convincing, and well-presented. They add to a growing literature that demonstrates the relevance of asymptomatic reservoirs.  There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric. However, I am not fully convinced the current implementation will be applied meaningfully across additional studies. 

      (1) I find the term "census population size" problematic as the groups being analyzed (hosts grouped by age at a single time point) do not delineate distinct parasite populations. Separate parasite lineages are not moving through time within these host bins. Rather, there is a single parasite population that is stochastically divided across hosts at each time point. I find this distinction important for interpreting the results and remaining mindful that the 2,000 samples at each time point comprise a subsample of the true population. Instead of "census population size", I suggest simplifying it to "census count" or "parasite lineage count".  It would be fascinating to use the obtained results to model absolute parasite numbers at the whole population level (taking into account, for instance, the age structure of the population), and I do hope this group takes that on at some point even if it remains outside the scope of this paper. Such work could enable calculations of absolute---rather than relative---fitness and help us further understand parasite distributions across hosts.

      Lineages moving exclusively through a given type of host or “patch” are not a necessary requirement for enumerating the total number of infections in such a subset. It is true that what we have is a single parasite population, but we are enumerating, for the season, its respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings where one has multiple habitat patches, with individuals able to move across patches.

      Remaining mindful that the count is relative to sample size is an important point. Please see our response to comment (2) of reviewer 1, also for the choice of terminology. We prefer not to adopt “census count” as a census in our mind is a count, and we are not clear on the concept of lineage for these highly recombinant parasites.  Also, census population size has been adopted already in the literature for both pathogens and non-pathogens, to make a distinction with the notion of effective population size in population genetics (see our response to reviewer 1) and is consistent with our usage as outlined in the introduction. 

      Thank you for the comment on an absolute number which would extrapolate to the whole host population.  Please see again our response to comment (2) of reviewer 1, on how we can use mean MOI for this purpose once the sampling is sufficient for this quantity to become constant/stable with sampling effort.

      (2) I'm uncertain how to contextualize the diversity results without taking into account the total number of samples analyzed in each group. Because of this, I would like a further explanation as to why the authors consider absolute parasite count more relevant than the combined MOI distribution itself (which would have sample count as a denominator). It seems to me that the "per host" component is needed to compare across age groups and time points---let alone different studies.

      Again, thank you for the insightful comment. We provide this number as a separate quantity and not a distribution, although it is clearly related to the mean MOI of such distribution. It gives a tangible sense for the actual infection count (different from prevalence) from the perspective of the parasite population in the ecological sense. The “per host” notion which enables an extrapolation to any host population size for the purpose of a complete count, or for comparison with another study site, has been discussed in the above responses for reviewer 1 and now in the revision of the discussion.

      (3) Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLα repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator? 

      This is a very good point and one we now discuss further in our revision. There is no predefined upper bound one can present a priori. Intuitively, the approach to estimate MOI would appear to break down as overlap moves away from extremely low values, and therefore for locations with low transmission intensity. Interestingly, we have observed that this is not the case in our paper by Labbé et al. (2023), where we used model simulations in a gradient of three transmission intensities, from high to low values. The original varcoding method performed well across the gradient. This robustness may arise from a nonlinear and fast transition from low to high overlap that is accompanied by MOI changing rapidly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This matter clearly needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of overlap.

      Smaller comments:

      - Figure 1 provides confidence intervals for the prevalence estimates, but these aren't carried through on the other plots (and Figure 5 has lost CIs for both metrics). The relationship between prevalence and diversity is one of the interesting points in this paper, and it would be helpful to have CIs for both metrics when they are directly compared. 

      Based on the reviewer’s advice we have revised both Figure 4 and Figure 5, to include the missing uncertainty intervals. The specific approach for each quantity is described in the corresponding caption.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions. 

      Strengths: 

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age-stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population. 

      Census population size is complementary to parasite prevalence, where the former gives a measure of the “parasite population size”, and the latter describes the “proportion of infected hosts”. The reason we see similar trends for the “genetic information” (i.e., census population size) and “age-specific parasite prevalence” is that we identify all samples for varcoding based on microscopy (i.e., all microscopy-positive P. falciparum isolates). But what is more relevant here is the relative percentage change in parasite prevalence and census population size following the IRS intervention. To make this point clearer in the revised manuscript we have updated Figure 4 and included additional panels plotting this percentage change from the 2012 baseline, for both census population size and prevalence (Figure 4EF). Overall, we see a greater percentage change in 2014 (and 2015), relative to the 2012 baseline, for census parasite population size vs. parasite prevalence (Figure 4EF) as a consequence of the significant changes in distributions of MOI following the IRS intervention (Figure 3). As discussed in the Results, following the deployment of IRS in 2014, census population size decreased by 72.5% relative to the 2012 baseline survey (pre-IRS), whereas parasite prevalence only decreased by 54.5%.

      With respect to the reviewer’s comment on “practicalities and cost”, varcoding has been used to successfully amplify P. falciparum DNA collected as DBS that have been stored for more than 5 years from both clinical and lower-density asymptomatic infections, without the additional step and added cost of sWGA ($8 to $32 USD per isolate; for costing estimates see (LaVerriere et al., 2022; Tessema et al., 2020)), which is currently required by other molecular surveillance methods (Jacob et al., 2021; LaVerriere et al., 2022; Oyola et al., 2016). Varcoding involves a single PCR per isolate using degenerate primers, where a large number of isolates can be multiplexed into a single pool for amplicon sequencing. Thus, the overall costs for incorporating molecular surveillance with varcoding are mainly driven by the number of PCRs/clean-ups, the number of samples indexed per sequencing run, and the NGS technology used (discussed in more detail in our publication Ghansah et al. (Ghansah et al., 2023)). Previous work has shown that varcoding can be used both locally and globally for molecular surveillance, without the need to be customized or updated; thus it can be fairly easily deployed in malaria-endemic regions (Chen et al., 2011; Day et al., 2017; Rougeron et al., 2017; Ruybal-Pesántez et al., 2022, 2021; Tonkin-Hill et al., 2021).

      Weaknesses: 

      Overall the manuscript is well-written and generally comprehensively explained. Some terms could be clarified to help the reader and I had some issues with a section of the methods and some of the more definitive statements given the evidence supporting them. 

      Thank you for the overall positive assessment. It is impossible, however, to address the “issues with a section of the methods” and “some of the more definitive statements given the evidence supporting them” without an explicit indication of which methods and statements the reviewer is referring to. Hopefully, the answers to the detailed comments and questions of reviewers 1 and 2 address any methodological concerns (i.e., in the Materials and Methods and Results). To the issue of “definitive statements”, etc. we are unable to respond without further information.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 273: there is a reference to a figure which supports the empirical distribution of repertoire given MOI = 1, but the figure does not appear to exist.

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al. (2023)). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this to our attention.

      Line 299: while this likely makes little difference, an insignificant result from a Kolmogorov-Smirnov test doesn't tell you if the distributions are the same, it only means there is not enough evidence to determine they are different (i.e. fail to reject the null). Also, what does the "mean MOI difference" column in supplementary table 3 mean? 

      The mean MOI difference is the difference in the mean value between the pairwise comparison of the true population-level MOI distribution, that of the population-level MOI estimates from either pooling the maximum a posteriori (MAP) estimates per individual host or the mixture distribution, or that of the population-level MOI estimates from different prior choices. This is now clarified as requested in the Table supplements 3 - 6. 

      Figure 4: how are the confidence intervals for the estimated number of var repertoires calculated? Also should include horizontal error bars for prevalence measures.

      The confidence intervals were calculated based on a bootstrap approach. We re-sampled 10,000 replicates from the original population-level MOI distribution with replacement. Each resampled replicate is the same size as the original sample. We then derive the 95% CI based on the distribution of the mean MOI of those resampled replicates. This is now clarified as requested in the Figure 4 caption (as well as Table supplement 7 footnotes). In addition, we have also updated Figure 4AB and have included the 95% CI for all measures for clarity. 
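
      The bootstrap described above can be illustrated with a short Python sketch. It assumes a percentile interval, which the response does not state explicitly, and the function name and the toy MOI values are hypothetical rather than taken from the study.

```python
import random


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each replicate resamples `values` with replacement and has the same
    size as the original sample.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy per-isolate MOI estimates (hypothetical values).
toy_moi = [1, 1, 2, 2, 3, 3, 4, 5, 2, 1]
ci_low, ci_high = bootstrap_mean_ci(toy_moi, n_boot=2000, seed=42)
```

      Multiplying the bounds of the mean-MOI interval by the number of infected hosts sampled would give a corresponding interval for the estimated number of var repertoires, under the assumption that the number of hosts is treated as fixed.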

      Reviewer #2 (Recommendations For The Authors): 

      -  I would like to see a plot like Supplemental Figure 8 for the upsA DBLα repertoire size. 

      The upsA repertoire size for each survey and by age group has now been provided as requested in Figure supplement 5AB. 

      -  Supplemental Table 2 is cut off in the pdf. 

      We have now resolved this issue so that the Table supplement 2 is no longer cut off.  

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript terms the phrase "census population size". To me, the census is all about the number of individuals, not necessarily their diversity. I appreciate that there is no simple term for this, and I imagine the authors have considered many alternatives, but could it be clearer to say the "genetic census population size"? For example, I found the short title not particularly descriptive "Impact of IRS and SMC on census population size", which certainly didn't make me think of parasite diversity.

      Please see our response to comment (2) of reviewer 1. We prefer not to add “genetic” to the phrase as the distinction from effective population size from population genetics is important, and the quantity we are after is an ecological one. 

      The authors do not currently say much about the potential biases in the genetic data and how this might influence results. It seems likely that because (i) patients with sub-microscopic parasitaemia were not sampled and (ii) because a moderate number of (likely low density) samples failed to generate genetic data, that the observed MOI is an overestimate. I'd be interested to hear the authors' thoughts about how this could be overcome or taken into account in the future. 

      We thank the reviewer for this comment and agree that this is an interesting area for further consideration. However, based on research from the Day Lab that is currently under review (Tan et al. 2024, under review), the estimated MOI using the Bayesian approach is likely not an “overestimate” but rather an “underestimate”. In this research by Tan et al. (2024), isolate MOI was estimated and compared using different initial whole blood volumes (e.g., 1, 10, 50, 100 μL) for the gDNA extraction. Using varcoding and comparing these different volumes, it was found that MOI was significantly “underestimated” when small blood volumes were used for the gDNA extraction, i.e., there was a ~3-fold increase in median MOI between 1 μL and 100 μL of blood. Ultimately these findings will allow us to make computational corrections so that more accurate estimates of MOI can be obtained from the DBS in the future.

      The authors do not make much of LLIN use and for me, this can explain some of the trends. The first survey was conducted soon after a mass distribution whereas the last was done at least a year after (when fewer people would have been using the nets which are older and less effective). We have also seen a rise in pyrethroid resistance in the mosquito populations of the area which could further diminish the LLIN activity. This difference in LLIN efficacy between the first and last survey could explain similar prevalence, yet lower diversity (in Figures 4B/5). However, it also might mean that statements such as Line 478 "This is indicative of a loss of immunity during IRS which may relate to the observed loss of var richness, especially the many rare types" need to be tapered as the higher prevalence observed in this age group could be caused by lower LLIN efficacy at the time of the last survey, not loss of immunity (though both could be true).  

      We thank the reviewer for this question and agree that (i) LLIN usage and (ii) pyrethroid resistance are important factors to consider. 

      (i) Over the course of this study self-reported LLIN usage the previous night remained high across all age groups in each of the surveys (≥ 83.5%), in fact more participants reported sleeping under an LLIN in 2017 (96.8%) following the discontinuation of IRS compared to the 2012 baseline survey (89.1%). This increase in LLIN usage in 2017 is likely a result of several factors including a rebound in the local vector population making LLINs necessary again, increased community education and/or awareness on the importance of using LLINs, among others. Information on the LLINs (i.e., PermaNet 2.0, Olyset, or DawaPlus 2.0) distributed and participant reported usage the previous night has now been included in the Materials and Methods as requested by the reviewer.

      (ii) As to the reviewer’s question on increased pyrethroid resistance in Ghana over the study period, research undertaken by our entomology collaborators (Noguchi Memorial Institute for Medical Research: Profs. S. Dadzie and M. Appawu; and Navrongo Health Research Centre: Dr. V. Asoala) has shown that pyrethroid resistance is a major problem across the country, including the Upper East Region. Preliminary studies from Bongo District (2013 - 2015) were undertaken to monitor for mutations in the voltage-gated sodium channel gene that have been associated with knockdown resistance to pyrethroids and DDT in West Africa (kdr-w). Through this analysis the homozygote resistance kdr-w allele (RR) was found in 90% of An. gambiae s.s. samples tested from Bongo, providing evidence of high pyrethroid resistance in Bongo District dating back to 2013, i.e., prior to the IRS intervention (S. Dadzie, M. Appawu, personal communication). Although we do not have data in Bongo District on kdr-w from 2017 (i.e., post-IRS), we can hypothesize that pyrethroid resistance likely did not decline in the area, given the widespread deployment and use of LLINs.

      Thus, given this information that (i) self-reported LLIN usage remained high in all surveys (≥ 83.5%), and that (ii) there was evidence of high pyrethroid resistance in 2013 (i.e., kdr-w (RR) ~90%), the rebound in prevalence observed for the older age groups (i.e., adolescents and adults) in 2017 is therefore best explained by a loss of immunity.

      I must confess I got a little lost with some of the Bayesian model section methods and the figure supplements. Line 272 reads "The measurement error is simply the repertoire size distribution, that is, the distribution of the number of non-upsA DBLα types sequenced given MOI = 1, which is empirically available (Figure supplement 3)." This does not appear correct as this figure is measuring kl divergence. If this is not a mistake in graph ordering please consider explaining the rationale for why this graph is being used to justify your point. 

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al. (2023)). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this matter to our attention. We hope that the inclusion of this figure, as well as a more detailed description of the Bayesian approach, helps to make this section in the Materials and Methods clearer for the reader.

      I was somewhat surprised that the choice of prior for estimating the MOI distribution at the population level did not make much difference. To me, the negative binomial distribution makes much more sense. I was left wondering, as you are only measuring MOI in positive individuals, whether you used zero truncated Poisson and zero truncated negative binomial distributions, and if not, whether this was a cause of a lack of difference between uniform and other priors. 

      Thank you for the relevant question. We have indeed considered different priors and the robustness of our estimates to this choice and have now better described this in the text. We focused on individuals who had a confirmed microscopic asymptomatic P. falciparum infection for our MOI estimation, as median P. falciparum densities were overall low in this population during each survey (i.e., median ≤ 520 parasites/µL, see Table supplement 1). Thus, we used either a uniform prior excluding zero or a zero-truncated negative binomial distribution when exploring the impact of priors on the final population-level MOI distribution. A uniform prior and a zero-truncated negative binomial distribution with parameters within the range typical of high-transmission endemic regions (higher mean MOI with tails around higher MOI values) produce similar MOI estimates at both the individual and population level. However, when setting the parameter range of the zero-truncated negative binomial to that of low-transmission endemic regions, where the empirical MOI distribution centers around monoclonal infections with the majority of MOI = 1 or 2 (mean MOI ≈ 1.5, no tail around higher MOI values), the final population-level MOI distribution does deviate more from that assuming the aforementioned prior and parameter choices. The final individual- and population-level MOI estimates are not sensitive to the specifics of the prior MOI distribution as long as this distribution captures the tail around higher MOI values with above-zero probability.
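
      To make the comparison of priors concrete, the sketch below builds a uniform prior excluding zero and a zero-truncated negative binomial prior over MOI = 1, ..., max_moi, and combines each with a likelihood by Bayes' rule. It is a hedged illustration only: the parameter values, the upper bound max_moi, and the toy likelihood vector are assumptions, whereas the actual likelihood in the manuscript is derived from the empirical repertoire size distribution.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf with size `r` and success probability `p`."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zero_truncated_nb_prior(max_moi, r, p):
    """Prior over MOI = 1..max_moi, renormalized after dropping MOI = 0.

    Assumption: the prior is also cut off at `max_moi` for convenience.
    """
    weights = [nb_pmf(k, r, p) for k in range(1, max_moi + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_prior(max_moi):
    """Uniform prior over MOI = 1..max_moi (zero excluded)."""
    return [1.0 / max_moi] * max_moi


def posterior(prior, likelihood):
    """Bayes' rule on a discrete grid: normalize prior times likelihood."""
    joint = [pr * li for pr, li in zip(prior, likelihood)]
    total = sum(joint)
    return [x / total for x in joint]


# Toy likelihood of the observed number of types given MOI = 1..5
# (hypothetical numbers, sharply peaked at MOI = 3).
toy_likelihood = [0.01, 0.05, 0.90, 0.03, 0.01]
post_uniform = posterior(uniform_prior(5), toy_likelihood)
post_ztnb = posterior(zero_truncated_nb_prior(5, r=2.0, p=0.4), toy_likelihood)
map_uniform = post_uniform.index(max(post_uniform)) + 1
map_ztnb = post_ztnb.index(max(post_ztnb)) + 1
```

      When the likelihood is sharply peaked and the prior gives non-negligible mass to that MOI value, both priors return the same maximum a posteriori estimate, which is consistent with the robustness described in the response.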

      The high MOI in children <5yrs in 2017 (immediately after SMC) is very interesting. Any thoughts on how/why? 

      This result indicates that although the prevalence of asymptomatic P. falciparum infections remained significantly lower for the younger children targeted by SMC in 2017 compared to 2012, they still carried multiclonal infections, as the reviewer has pointed out (Figure 3B). Importantly, this upward shift in the MOI distributions (and median MOI) was observed in all age groups in 2017, not just the younger children, and provides evidence that transmission intensity in Bongo had rebounded in 2017, 32 months after the discontinuation of IRS. This increase in MOI for younger children may at first glance seem surprising, but instead likely shows the limitations of SMC to clear and/or suppress the establishment of newly acquired infections, particularly at the end of the transmission season following the final cycle of SMC (i.e., end of September 2017 in Bongo District; NMEP/GHS, personal communication) when the post-treatment prophylactic effects of SMC would have waned (Chotsiri et al., 2022).

      Line 521 in the penultimate paragraph says "we have analysed only low density...." should this not be "moderate" density, as low density infections might not be detected? The density range itself is not reported in the manuscript so could be added. 

      In Table supplement 1 we have provided the median, including the inter-quartile range, across each survey by age group. For the revision we have now provided the density min-max range, as requested by the reviewer. Finally, we have revised the statement in the discussion so that it now reads “….we have analysed low- to moderate-density, chronic asymptomatic infections (see Table supplement 1)……”.   

      Data availability - From the text the full breakdown of the epidemiological survey does not appear to be available, just a summary of defined age bounds in the SI. Provision of these data (with associated covariates such as parasite density and host characteristics linked to genetic samples) would facilitate more in-depth secondary analyses. 

      To address this question, we have updated the “Data availability statement” section with the following statement: “All data associated with this study are available in the main text, the Supporting Information, or upon reasonable request for research purposes to the corresponding author, Prof. Karen Day (karen.day@unimelb.edu.au).”  

      REFERENCES

      Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11. doi:10.1186/1471-2148-11-220

      Chen DS, Barry AE, Leliwa-Sytek A, Smith T-AA, Peterson I, Brown SM, Migot-Nabias F, Deloron P, Kortok MM, Marsh K, Daily JP, Ndiaye D, Sarr O, Mboup S, Day KP. 2011. A molecular epidemiological study of var gene diversity to characterize the reservoir of Plasmodium falciparum in humans in Africa. PLoS One 6:e16629. doi:10.1371/journal.pone.0016629

      Chotsiri P, White NJ, Tarning J. 2022. Pharmacokinetic considerations in seasonal malaria chemoprevention. Trends Parasitol. doi:10.1016/j.pt.2022.05.003

      Day KP, Artzy-Randrup Y, Tiedje KE, Rougeron V, Chen DS, Rask TS, Rorick MM, Migot-Nabias F, Deloron P, Luty AJF, Pascual M. 2017. Evidence of Strain Structure in Plasmodium falciparum Var Gene Repertoires in Children from Gabon, West Africa. PNAS 114:E4103–E4111. doi:10.1073/pnas.1613018114

      Ghansah A, Tiedje KE, Argyropoulos DC, Onwona CO, Deed SL, Labbé F, Oduro AR, Koram KA, Pascual M, Day KP. 2023. Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Frontiers in Parasitology 2:1067966. doi:10.3389/fpara.2023.1067966

      He Q, Pilosof S, Tiedje KE, Ruybal-Pesántez S, Artzy-Randrup Y, Baskerville EB, Day KP, Pascual M. 2018. Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum. Nat Commun 9:1817. doi:10.1038/s41467-018-04219-3

      Jacob CG, Thuy-nhien N, Mayxay M, Maude RJ, Quang HH, Hongvanthong B, Park N, Goodwin S, Ringwald P, Chindavongsa K, Newton P, Ashley E. 2021. Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. Elife 10:1–22.

      Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:10.1101/2022.06.27.497801

      LaVerriere E, Schwabl P, Carrasquilla M, Taylor AR, Johnson ZM, Shieh M, Panchal R, Straub TJ, Kuzma R, Watson S, Buckee CO, Andrade CM, Portugal S, Crompton PD, Traore B, Rayner JC, Corredor V, James K, Cox H, Early AM, MacInnis BL, Neafsey DE. 2022. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study. Mol Ecol Resour 2285–2303. doi:10.1111/1755-0998.13622

      Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, Jacob CG, Ogo TD, Rockett K, Newbold CI, Berriman M, Kwiatkowski DP. 2016. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J 15:1–12. doi:10.1186/s12936-016-1641-7

      Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: A compendium and appraisal. Ecol Evol 2:2357–2365. doi:10.1002/ece3.329

      Rougeron V, Tiedje KE, Chen DS, Rask TS, Gamboa D, Maestre A, Musset L, Legrand E, Noya O, Yalcindag E, Renaud F, Prugnolle F, Day KP. 2017. Evolutionary structure of Plasmodium falciparum major variant surface antigen genes in South America: Implications for epidemic transmission and surveillance. Ecol Evol 7:9376–9390. doi:10.1002/ece3.3425

      Ruybal-Pesántez S, Sáenz FE, Deed S, Johnson EK, Larremore DB, Vera-Arias CA, Tiedje KE, Day KP. 2021. Clinical malaria incidence following an outbreak in Ecuador was predominantly associated with Plasmodium falciparum with recombinant variant antigen gene repertoires. medRxiv.

      Ruybal-Pesántez S, Tiedje KE, Pilosof S, Tonkin-Hill G, He Q, Rask TS, Amenga-Etego L, Oduro AR, Koram KA, Pascual M, Day KP. 2022. Age-specific patterns of DBLa var diversity can explain why residents of high malaria transmission areas remain susceptible to Plasmodium falciparum blood stage infection throughout life. Int J Parasitol 20:721–731.

      Strona G, Nappo D, Boccacci F, Fagorini S, San-Miguel-Ayanz J. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat Commun 5. doi:10.1038/ncomms5114

      Tessema SK, Hathaway NJ, Teyssier NB, Murphy M, Chen A, Aydemir O, Duarte EM, Simone W, Colborn J, Saute F, Crawford E, Aide P, Bailey JA, Greenhouse B. 2020. Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. Journal of Infectious Diseases 225:1227–1237.

      Tonkin-Hill G, Ruybal-Pesántez S, Tiedje KE, Rougeron V, Duffy MF, Zakeri S, Pumpaibool T, Harnyuganakorn P, Branch OH, Ruiz-Mesía L, Rask TS, Prugnolle F, Papenfuss AT, Chan Y, Day KP. 2021. Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents. PLoS Genet 7:e1009269. doi:10.1371/journal.pgen.1009269

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor. 

      Major Strengths: 

      The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights. 

      Major Weaknesses: 

      While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.  

      Reviewer #2 (Public Review): 

      Summary: This paper aims to achieve a better understanding of how the antigenic or genetic compositions of the dominant influenza A viruses in circulation at a given time are related to key features of seasonal influenza epidemics in the US. To this end, the authors analyze an extensive dataset with a range of statistical, data science and machine learning methods. They find that the key drivers of influenza A epidemiological dynamics are interference between influenza A subtypes and genetic divergence, relative to the previous one or two seasons, in a broader range of antigenically related sites than previously thought. 

      Strengths: A thorough investigation of a large and complex dataset. 

      Weaknesses: The dataset covers a 21-year period, which is substantial by epidemiological standards but quite small from a statistical or machine learning perspective. In particular, it was not possible to follow the usual process and test predictive performance of the random forest model with an independent dataset. 

      Reviewer #3 (Public Review): 

      Summary: 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. It's a strong paper representing a thorough and fascinating exploration of potential drivers, and it makes a trove of relevant data readily available to the community. 

      Strengths: 

      This paper makes links between epidemiological and evolutionary data for influenza. Placing each in the context of the other is crucial for understanding influenza dynamics and evolution and this paper does a thorough job of this, with many analyses and nuances. The results on the extent to which evolutionary factors relate to epidemic burden, and on interference among influenza types, are particularly interesting. The github repository associated with the paper is clear, comprehensive, and well-documented. 

      Weaknesses: 

      The format of the results section can be hard to follow, and we suggest improving readability by restructuring and simplifying in some areas. There are a range of choices made about data preparation and scaling; the authors could explore sensitivity of the results to some of these. 

      Response to public reviews

      We appreciate the positive comments from the reviewers and have implemented or responded to all of the reviewers’ recommendations.

      In response to Reviewer 1, we expand on the potential drivers and biological implications of the findings pointed out in their specific recommendations. For example, we now explicitly mention that antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study. We note that, after the 2009 A(H1N1) pandemic, the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons is lower compared to A(H3N2) dominant seasons prior to 2009. We propose that the weakening of A(H3N2) predominance may be linked to the diversification of A(H3N2) viruses during the 2010s, wherein multiple antigenically distinct clades with similar fitness circulated in each season, as opposed to a single variant with high fitness.

      In response to Reviewer 2, we agree that it would be ideal and best practice to measure model performance with an independent test set, but our dataset includes only ~20 seasons. Predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. In the revised manuscript, we provide more justification and clarification of our methodology. Instead of testing model performance on an independent test set, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (Kuhn & Johnson, 2019).

      In response to Reviewer 3, we follow the reviewer’s advice to put the Methods section before the Results section. Concerning Reviewer 3’s question about the sensitivity of our results to data preparation and rescaling, we provide more justification and clarification of our methodology in the revised manuscript. In our study, we adjust influenza type/subtype incidences for differences in reporting between the pre- and post-2009 pandemic periods and across HHS regions. We adjust for differences in reporting between the pre- and post-2009 periods because the US CDC and WHO increased laboratory testing capacity in response to the 2009 A(H1N1) pandemic, which led to substantial, long-lasting improvements to influenza surveillance that are still in place today. Figure 1 - figure supplement 2 shows systematic increases in influenza test volume in all HHS regions after the 2009 pandemic. Given the substantial increase in test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results when adjusting for both pre- and post-2009 pandemic reporting and regional reporting versus only adjusting for the pre- and post-2009 pandemic reporting.

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      (1) Line 155-156. Request for a reference for: "Given that protective immunity wanes after 1-4 years" 

      We now include two references (He et al. 2015 and Wraith et al. 2022), which were cited at the beginning of the introduction when referring to the duration of protective immunity for antigenically homologous viruses. (Lines 640-642 in revised manuscript)

      (2) Line 162-163: Request a further explanation of the negative correlation between seasonal diversity of HA and NA LBI values and NA epitope distance. Clarify biological implications to aid reader understanding. 

      In the revised manuscript we expand on the biological implications of A(H3N2) virus populations characterized by high antigenic novelty and low LBI diversity.

      Lines 649-653:

      “The seasonal diversity of HA and NA LBI values was negatively correlated with NA epitope distance (Figure 2 – figure supplements 5 – 6), with high antigenic novelty coinciding with low genealogical diversity. This association suggests that selective sweeps tend to follow the emergence of drifted variants with high fitness, resulting in seasons dominated by a single A(H3N2) variant rather than multiple cocirculating clades.”

      (3) Figure S3 legend t-2 may be marked as t-1. 

      Thank you for catching this. We have fixed this typo. Note: Figure S3 is now Figure 2 – figure supplement 5.

      (4) Lines 201-214. The key takeaways from the analysis of subtype dominance are ultimately not clear. It also misses the underlying dynamics that H3N2 predominance following an evolutionary change has waned since 2009.

      In the revised manuscript we elaborate on key takeaways concerning the relationship between antigenic drift and A(H3N2) dominance. We also add a caveat noting that A(H3N2) predominance is weaker during the post-2009 period, which may be linked to the diversification of A(H3N2) lineages after 2012. We do not know of a reference that links the diversification of A(H3N2) viruses in the 2010s to a particular evolutionary change. Therefore, we do not attribute the diversification of A(H3N2) viruses to a specific evolutionary change in A(H3N2) variants circulating at the time (A/Perth/16/2009-like strains (PE09)). Instead, we allude to the potential role of A(H3N2) diversification in creating multiple co-circulating lineages that may have less of a fitness advantage.

      Lines 681-703:

      “We explored whether evolutionary changes in A(H3N2) may predispose this subtype to dominate influenza virus circulation in a given season. A(H3N2) subtype dominance – the proportion of influenza positive samples typed as A(H3N2) – increased with H3 epitope distance (t – 2) (R2 = 0.32, P = 0.05) and N2 epitope distance (t – 1) (R2 = 0.34, P = 0.03) (regression results: Figure 4; Spearman correlations: Figure 3 – figure supplement 1). Figure 4 illustrates this relationship at the regional level across two seasons in which A(H3N2) was nationally dominant, but where antigenic change differed. In 2003-2004, we observed widespread dominance of A(H3N2) viruses after the emergence of the novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). In contrast, there was substantial regional heterogeneity in subtype circulation during 2007-2008, a season in which A(H3N2) viruses were antigenically similar to those circulating in the previous season. Patterns in type/subtype circulation across all influenza seasons in our study period are shown in Figure 4 – figure supplement 1. As observed for the 2003-2004 season, widespread A(H3N2) dominance tended to coincide with major antigenic transitions (e.g., A/Sydney/5/1997 (SY97) seasons, 1997-1998 to 1999-2000; A/California/7/2004 (CA04) season, 2004-2005), though this was not universally the case (e.g., A/Perth/16/2009 (PE09) season, 2010-2011). 

      After the 2009 A(H1N1) pandemic, A(H3N2) dominant seasons still occurred more frequently than A(H1N1) dominant seasons, but the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons was lower compared to A(H3N2) dominant seasons prior to 2009. Antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study (https://nextstrain.org/seasonal-flu/h3n2/ha/12y@2024-05-13) (Dhanasekaran et al., 2022; Huddleston et al., 2020; Yan et al., 2019). The decline in A(H3N2) predominance during the post-2009 period may be linked to the genetic and antigenic diversification of A(H3N2) viruses, wherein multiple lineages with similar fitness co-circulated in each season.”

      (5) Line 253-255: It would be beneficial to provide a more detailed interpretation of the statement that "pre-2009 seasonal A(H1N1) viruses may limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses." Elaborate on the cause-and-effect relationship within this statement.

      In the revised manuscript we suggest that seasonal A(H1N1) viruses may interfere with the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, because seasonal A(H1N1) viruses and A(H3N2) are more closely related, and thus may elicit stronger cross-reactive T cell responses.

      Lines 738-745:

      “The internal gene segments NS, M, NP, PA, and PB2 of A(H3N2) viruses and pre-2009 seasonal A(H1N1) viruses share a common ancestor (Webster et al., 1992) whereas A(H1N1)pdm09 viruses have a combination of gene segments derived from swine and avian reservoirs that were not reported prior to the 2009 pandemic (Garten et al., 2009; Smith et al., 2009). Non-glycoprotein genes are highly conserved between influenza A viruses and elicit cross-reactive antibody and T cell responses (Grebe et al., 2008; Sridhar, 2016). Because pre-2009 seasonal A(H1N1) viruses and A(H3N2) are more closely related, we hypothesized that seasonal A(H1N1) viruses could potentially limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, due to greater T cell-mediated cross-protective immunity.”

      (6) In the results section, many statements report statistical results of correlation analyses. Consider providing further interpretations of these results, such as the implications of nonsignificant correlations and how they support or contradict the hypothesis or previous studies. For example, the statement on line 248 regarding the lack of significant correlation between influenza B epidemic size and A(H3N2) epidemic metrics would benefit from additional discussion on what this non-significant correlation signifies and how it relates to the hypothesis or previous research. 

      In the Discussion section, we suggest that the lack of an association between influenza B circulation and A(H3N2) epidemic metrics is due to few T and B cell epitopes shared between influenza A and B viruses (Terajima et al., 2013).

      Lines 1005-1007 in revised manuscript (Lines 513-515 in original manuscript): 

      “Overall, we did not find any indication that influenza B incidence affects A(H3N2) epidemic burden or timing, which is not unexpected, given that few T and B cell epitopes are shared between the two virus types (Terajima et al., 2013).”

      Minor comments: 

      (1) Line 116-122: Include a summary statistical description of all collected data sets, detailing the number of HA and NA sequence data and their sources. Briefly describe subsampled data sets, specifying preferences (e.g., the number of HA or NA sequence data collected from each region). 

      In our revised manuscript we now include supplementary tables that summarize the number of A/H3 and A/N2 sequences in each subsampled dataset, aggregated by world region, for all seasons combined (Figure 2 - table supplements 1 - 2). We also include supplementary figures showing the number of sequences collected in each month and each season in North America versus the other nine world regions combined (Figure 2 - figure supplements 1 - 2). Subsampled datasets are plotted individually in the figures below but individual time series are difficult to discern due to minor differences in sequence counts across the datasets.

      (2) Figure 7A: Due to space limitations, consider rounding numbers on the x-axis to whole numbers for clarity. 

      Thank you for this suggestion. In the revised manuscript we round numbers in the axes of Figure 7A (Figure 9A in the revised manuscript) so that the axes are less crowded.

      (3) Figure 4C & Figure 4D: Note that Region 10 (purple) data were unavailable for seasons before 2009 (lines 1483-1484). Label each region on the map with its respective region number (1 to 10) and indicate this in the legend for easy identification. 

      In our original submission, the legend for Figure 4 included “Data for Region 10 (purple) were not available for seasons prior to 2009” at the end of the caption. We have moved this sentence, as well as other descriptions that apply to both C and D, so that they follow the sentence “C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant.”

      In our revised manuscript, Figure 4, and Figure 4 - figure supplement 1 (Figure S10 in original submission) include labels for each HHS region.

      We did not receive specific recommendations from Reviewer #2. However, our responses to Reviewer #3 address the study’s weaknesses mentioned by Reviewer #2.

      Reviewer #3 (Recommendations For The Authors): 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. 

      This is a workhorse of a paper, in the volumes of data that are analyzed and the extensive analysis that is done. The data that are provided are a treasure trove resource for influenza modelers and for anyone interested in seeing influenza surveillance data in the context of evolution, and evolutionary information in the context of epidemiology. 

      L53 - end of sentence "and antigenic drift": not sure this fits, explain? I thought this sentence was in contrast to antigenic drift.

      Thank you for catching this. We did not intend to include “and antigenic drift” at the end of this sentence and have removed it (Line 59).

      Para around L115: would using primarily US data be a limitation, because it's global immunity that shapes success of strains? Or, how much does each country's immunity and vaccination and so on actually shape what strains succeed there, compared to global/international factors? 

      The HA and NA phylogenetic trees in our study are enriched with US sequences because our study focuses on epidemiological dynamics in the US, and we wanted to prioritize A(H3N2) viruses that the US human population encountered in each season. We agree with the reviewer that the world population may be the right scale to understand how immunity, acquired by vaccination or natural infection, may shape the emergence and success of new lineages that will go on to circulate globally. However, our study assesses the overall impact of antigenic drift on regional A(H3N2) epidemic dynamics in the US. In other words, our driving question is whether we can predict the population-level impact of an A(H3N2) variant in the US, conditional on this particular lineage having established in the US and circulating at relatively high levels. We do not assess the global or population-level factors that may influence which A(H3N2) virus lineages are successful in a given location or season.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader. 

      Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      In the Results section, I found the format hard to follow, because of the extensive methodological details, numbers with CIs and long sentences. Sentences sometimes included the question, definitions of variables, and lists. For example at line 215 we have: "Next, we tested for associations between A(H3N2) evolution and epidemic timing, including onset week, defined as the winter changepoint in incidence [16], and peak week, defined as the first week of maximum incidence; spatiotemporal synchrony, measured as the variation (standard deviation, s.d.) in regional onset and peak timing; and epidemic speed, including seasonal duration and the number of weeks from onset to peak (Table 2, Figure S11)". I would suggest putting the methods section first, using shorter sentences, separating lists from the question being asked, and stating what was found without also putting in all the extra detail. Putting the methods section before the results might reduce the sense that you have to explain what you did and how in the results section too.

      Thank you for suggesting how to improve the readability of the Results section. In the revised manuscript, we follow the reviewer’s advice to put the Methods section before the Results section. Although eLife formatting requirements specify the order: Introduction, Results, Discussion, and Methods, the journal allows for the Methods section to follow the Introduction when it makes sense to do so. We agree with the reviewer that putting the Methods section before the Results section makes our results easier to follow because we no longer need to introduce methodological details at the beginning of each set of results.

      L285 in the RF you remove variables without significant correlations with the target variables, but isn't one of the aims of RF to uncover relationships where a correlation might not be evident, and in part to reveal combinations of features that give the targeted outcome? Also with the RF, I am a bit concerned that you could not use the leave-one-out approach because it was "unstable" - presumably that means that you obtain quite different results if you leave out a season. How robust are these results, and what are the most sensitive aspects? Are the same variables typically high in importance if you leave out a season, for example? What does the scatterplot of observed vs predicted epidemic size (as in Fig 7) look like if each prediction is for the one that was left out (i.e. from a model trained on all the rest)? In my experience, where the RF is "unstable", that can look pretty terrible even if the model trained on all the data looks great (as does Figure 7). In any case I think it's worth discussing sensitivity.

      (1) In response to the reviewer’s first question, we explain our rationale for not including all candidate predictors in random forest and penalized regression models. 

      Models trained with different combinations of predictors can have similar performance, and these combinations of predictors can include variables that do not necessarily have strong univariate associations with the target variable. The performance of random forest and LASSO regression models is not sensitive to redundant or irrelevant predictors (see Figure 10.2 in Kuhn & Johnson, 2019). However, if our goal is variable selection rather than strictly model performance, it is considered best practice to remove collinear, redundant, and/or irrelevant variables prior to training models (see section 11.3 in Kuhn & Johnson, 2019). In both random forest and LASSO regression models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection. In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores. Thus, failing to minimize multicollinearity prior to model training could result in some variables having low rankings and the appearance of being unimportant, because their importance scores are overshadowed by those of the highly correlated variables. Our rationale for preprocessing predictor data follows the philosophy of Kuhn & Johnson, 2019, who recommend including the minimum possible set of variables that does not compromise model performance. Even if a particular model is insensitive to extra predictors, Kuhn and Johnson explain that “removing predictors can reduce the cost of acquiring data or improve the throughput of the software used to make predictions.”

      In the revised manuscript, we include more details about our steps for preprocessing predictor data. We also follow the reviewer’s suggestion to include all evolutionary predictors in variable selection analyses, regardless of whether they have strong univariate correlations with target outcomes, because the performance of random forest and LASSO regression models is not affected by redundant predictors. 

      Including additional predictors in our variable selection analyses does not change our conclusions. As reported in our original manuscript, predictors with strong univariate correlations with various epidemic metrics were the highest ranked features in both random forest and LASSO regression models.

      Lines 523-563:

      “Preprocessing of predictor data: The starting set of candidate predictors included all viral fitness metrics: genetic and antigenic distances between current and previously circulating strains and the standard deviation and Shannon diversity of H3 and N2 LBI values in the current season. To account for potential type or subtype interference, we included A(H1N1) or A(H1N1)pdm09 epidemic size and B epidemic size in the current and prior season and the dominant IAV subtype in the prior season (Lee et al., 2018). We included A(H3N2) epidemic size in the prior season as a proxy for prior natural immunity to A(H3N2). To account for vaccine-induced immunity, we considered four categories of predictors and included estimates for the current and prior seasons: national vaccination coverage among adults (18-49 years coverage × ≥ 65 years coverage), adjusted A(H3N2) vaccine effectiveness (VE), a combined metric of vaccination coverage and A(H3N2) VE (18-49 years coverage × ≥ 65 years coverage × VE), and H3 and N2 epitope distances between naturally circulating A(H3N2) viruses and the U.S. A(H3N2) vaccine strain in each season. We could not include a predictor for vaccination coverage in children or consider clade-specific VE estimates, because these data were not available for most seasons in our study.

      Random forest and LASSO regression models are not sensitive to redundant (highly collinear) features (Kuhn & Johnson, 2019), but we chose to downsize the original set of candidate predictors to minimize the impact of multicollinearity on variable importance scores. For both types of models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection (Kuhn & Johnson, 2019). In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores (Kuhn & Johnson, 2019). We first confirmed that none of the candidate predictors had zero variance or near-zero variance. Because seasonal lags of each viral fitness metric are highly collinear, we included only one lag of each evolutionary predictor, with a preference for the lag that had the strongest univariate correlations with various epidemic metrics. We checked for multicollinearity among the remaining predictors by examining Spearman’s rank correlation coefficients between all pairs of predictors. If a particular pair of predictors was highly correlated (Spearman’s 𝜌 > 0.8), we retained only one predictor from that pair, with a preference for the predictor that had the strongest univariate correlations with various epidemic metrics. Lastly, we performed QR decomposition of the matrix of remaining predictors to determine if the matrix is full rank and identify sets of columns involved in linear dependencies. This step did not eliminate any additional predictors, given that we had already removed pairs of highly collinear variables based on Spearman correlation coefficients. 

      After these preprocessing steps, our final set of model predictors included 21 variables, including 8 viral evolutionary indicators: H3 epitope distance (t – 2), HI log2 titer distance (t – 2), H3 RBS distance (t – 2), H3 non-epitope distance (t – 2), N2 epitope distance (t – 1), N2 non-epitope distance (t – 1), and H3 and N2 LBI diversity (s.d.) in the current season; 6 proxies for type/subtype interference and prior immunity: A(H1N1) and B epidemic sizes in the current and prior season, A(H3N2) epidemic size in the prior season, and the dominant IAV subtype in the prior season; and 7 proxies for vaccine-induced immunity: A(H3N2) VE in the current and prior season, H3 and N2 epitope distances between circulating strains and the vaccine strain in each season, the combined metric of adult vaccination coverage × VE in the current and prior season, and adult vaccination coverage in the prior season.”
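
      The pairwise collinearity screen described in the quoted Methods text can be sketched as follows. This is an illustration only, not the authors' code: the function names, the toy predictor names, and the made-up values are assumptions, and the sketch applies the 0.8 threshold to the absolute value of Spearman's 𝜌, which the text does not state explicitly. It also assumes that no predictor has zero variance, which the text says was checked beforehand.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # assumes neither column is constant


def drop_collinear(predictors, target, threshold=0.8):
    """From each highly correlated pair, keep the predictor more correlated with the target."""
    strength = {name: abs(spearman(col, target)) for name, col in predictors.items()}
    kept = set(predictors)
    for a, b in combinations(predictors, 2):
        if a in kept and b in kept and abs(spearman(predictors[a], predictors[b])) > threshold:
            kept.discard(a if strength[a] < strength[b] else b)
    return [name for name in predictors if name in kept]


# Hypothetical toy data: one value per season for each candidate predictor.
predictors = {
    "h3_epitope": [1, 2, 3, 4, 5, 6],
    "h3_rbs": [1, 2, 3, 4, 6, 5],  # nearly the same ordering as h3_epitope
    "b_size": [3, 1, 6, 2, 5, 4],
}
target = [1, 2, 3, 4, 5, 6]  # hypothetical epidemic metric
selected = drop_collinear(predictors, target)
print(selected)
```

      The sketch uses a single target for the tie-break, whereas the text expresses a preference based on correlations with several epidemic metrics.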

      (2) Next, we clarify our model training methodology to address the reviewer’s second point about using a leave-one-out cross-validation approach.

      We believe the reviewer is mistaken; we use a leave-one-season-out validation approach which lends some robustness to the predictions. In our original submission, we stated “We created each forest by generating 3,000 regression trees from 10 repeats of a leave-one-season-out (jackknife) cross-validated sample of the data. Due to the small size of our dataset, evaluating the predictive accuracy of random forest models on a quasi-independent test set produced unstable estimates.” (Lines 813-816 in the original manuscript)

      To clarify, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (see Section 3.4 in Kuhn & Johnson, 2019). To reduce noise, we generated 10 bootstrap resamples of each fold and averaged the RMSE and R2 values of model predictions from resamples. 
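
      A minimal sketch of this resampling scheme, for illustration only: the helper names, the toy seasons and values, and the placeholder "model" (a simple mean) are assumptions rather than the code used in the study, and the way bootstrap resamples are combined here is one plausible reading of the description.

```python
import random


def leave_one_season_out(seasons):
    """Yield (held_out_season, analysis_rows, assessment_rows) for each distinct season."""
    for held_out in sorted(set(seasons)):
        analysis = [i for i, s in enumerate(seasons) if s != held_out]
        assessment = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, analysis, assessment


def bootstrap_resamples(indices, n_resamples=10, seed=0):
    """Draw n_resamples samples, with replacement, from the analysis rows."""
    rng = random.Random(seed)
    return [[rng.choice(indices) for _ in indices] for _ in range(n_resamples)]


# Hypothetical region-season rows: two regions observed in each of three seasons.
seasons = ["1997-1998", "1997-1998", "1998-1999", "1998-1999", "1999-2000", "1999-2000"]
epidemic_size = [10.0, 12.0, 30.0, 28.0, 20.0, 22.0]  # made-up values

predictions = {}
for held_out, analysis, assessment in leave_one_season_out(seasons):
    resample_predictions = []
    for resample in bootstrap_resamples(analysis):
        # Placeholder model: predict the mean of the resampled analysis set.
        resample_predictions.append(sum(epidemic_size[i] for i in resample) / len(resample))
    # Average over the 10 resamples to reduce noise in the held-out prediction.
    predictions[held_out] = sum(resample_predictions) / len(resample_predictions)
```

      Each season appears in exactly one assessment set, so every prediction comes from a model that never saw that season.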

      Although it would be ideal and best practice to measure model performance with an independent test set, our dataset includes only ~20 seasons. We found that predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. Further, we suspect that large antigenic jumps in a small subset of seasons further contribute to variation in prediction accuracy across randomly selected test sets. Our rationale for using cross-validation instead of an independent test set is best described in Section 4.3 of Kuhn and Johnson’s book “Applied Predictive Modeling” (Kuhn & Johnson, 2013):

      “When the number of samples is not large, a strong case can be made that a test set should be avoided because every sample may be needed for model building. Additionally, the size of the test set may not have sufficient power or precision to make reasonable judgements. Several researchers (Molinaro 2005; Martin and Hirschberg 1996; Hawkins et al. 2003) show that validation using a single test set can be a poor choice. Hawkins et al. (2003) concisely summarize this point: “holdout samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate.” Resampling methods, such as cross-validation, can be used to produce appropriate estimates of model performance using the training set. These are discussed in length in Sect. 4.4. Although resampling techniques can be misapplied, such as the example shown in Ambroise and McLachlan (2002), they often produce performance estimates superior to a single test set because they evaluate many alternate versions of the data.”

      In our revised manuscript, we provide additional clarification of our methods (Lines 574-590):

      “We created each forest by generating 3,000 regression trees. To determine the best performing model for each epidemic metric, we used leave-one-season-out (jackknife) cross-validation to train models and measure model performance, wherein each “assessment” set is one season of data predicted by the model, and the corresponding “analysis” set contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of each model (Kuhn & Johnson, 2019). Due to the small size of our dataset (~20 seasons), evaluating the predictive accuracy of random forest models on a quasi-independent test set of 2-3 seasons produced unstable estimates. Instead of testing model performance on an independent test set, we generated 10 bootstrap resamples (“repeats”) of each analysis set (“fold”) and averaged the predictions of models trained on resamples (Kuhn & Johnson, 2013, 2019). For each epidemic metric, we report the mean root mean squared error (RMSE) and R2 of predictions from the best tuned model. We used permutation importance (N = 50 permutations) to estimate the relative importance of each predictor in determining target outcomes. Permutation importance is the decrease in prediction accuracy when a single feature (predictor) is randomly permuted, with larger values indicating more important variables. Because many features were collinear, we used conditional permutation importance to compute feature importance scores, rather than the standard marginal procedure (Altmann et al., 2010; Debeer & Strobl, 2020; Strobl et al., 2008; Strobl et al., 2007).”
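
      To make the definition of permutation importance concrete, here is a small sketch of the standard marginal procedure. It is deliberately simpler than the conditional permutation importance used in the manuscript, and the function names, the toy feature names, and the stand-in prediction rule are all assumptions for illustration.

```python
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5


def permutation_importance(predict, rows, target, n_permutations=50, seed=0):
    """Mean increase in RMSE when one feature column is shuffled (marginal version)."""
    rng = random.Random(seed)
    baseline = rmse(target, [predict(r) for r in rows])
    importance = {}
    for feature in rows[0]:
        increases = []
        for _ in range(n_permutations):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Replace only this feature's values; all other features stay in place.
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            increases.append(rmse(target, [predict(r) for r in permuted]) - baseline)
        importance[feature] = sum(increases) / n_permutations
    return importance


# Hypothetical data: the target depends on epitope_distance only.
rows = [{"epitope_distance": float(i), "noise": float((7 * i) % 5)} for i in range(1, 7)]
target = [2.0 * r["epitope_distance"] for r in rows]


def predict(row):
    """Stand-in for a fitted model; it ignores the noise feature entirely."""
    return 2.0 * row["epitope_distance"]


importance = permutation_importance(predict, rows, target)
```

      In this toy case, shuffling the unused feature leaves the predictions unchanged, so its importance is zero, while shuffling the informative feature increases the error.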

      (3) In response to the reviewer’s question about the sensitivity of results when one season is left out, we clarify that the variable importance scores in Figure 8 and model predictions in Figure 9 were generated by models tuned using leave-one-season-out cross-validation. 

      As explained above, in our leave-one-season-out cross-validation approach, each “assessment” set contains one season of data predicted by the model, and the corresponding “analysis” set (“fold”) contains the remaining seasons. We generated predictions of epidemic metrics and variable importance rankings by averaging the model output of 10 bootstrap resamples of each cross-validation fold. 

      In Lines 791-806, we describe which epidemic metrics have the highest prediction accuracy and report that random forest models tend to underpredict most epidemic metrics in seasons with high antigenic novelty:

      “We measured correlations between observed values and model-predicted values at the HHS region level. Among the various epidemic metrics, random forest models produced the most accurate predictions of A(H3N2) subtype dominance (Spearman’s 𝜌 = 0.95, regional range = 0.85 – 0.97), peak incidence (𝜌 = 0.91, regional range = 0.72 – 0.95), and epidemic size (𝜌 = 0.9, regional range = 0.74 – 0.95), while predictions of effective Rt and epidemic intensity were less accurate (𝜌 = 0.81, regional range = 0.65 – 0.91; 𝜌 = 0.78, regional range = 0.63 – 0.92, respectively) (Figure 9). Random forest models tended to underpredict most epidemic targets in seasons with substantial H3 antigenic transitions, in particular the SY97 cluster seasons (1998-1999, 1999-2000) and the FU02 cluster season (2003-2004) (Figure 9).

      For epidemic size and peak incidence, seasonal predictive error – the root-mean-square error (RMSE) across all regional predictions in a season – increased with H3 epitope distance (epidemic size, Spearman’s 𝜌 = 0.51, P = 0.02; peak incidence, 𝜌 = 0.63, P = 0.004) and N2 epitope distance (epidemic size, 𝜌 = 0.48, P = 0.04; peak incidence, 𝜌 = 0.48, P = 0.03) (Figure 9 – figure supplements 1 – 2). For models of epidemic intensity, seasonal RMSE increased with N2 epitope distance (𝜌 = 0.64, P = 0.004) but not H3 epitope distance (𝜌 = 0.06, P = 0.8) (Figure 9 – figure supplements 1 – 2). Seasonal RMSE of effective Rt and subtype dominance predictions did not correlate with H3 or N2 epitope distance (Figure 9 – figure supplements 1 – 2).”
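      As a rough illustration of the comparison described above, the sketch below computes a seasonal RMSE across regional predictions and then a Spearman rank correlation between seasonal RMSE and an evolutionary distance. The season labels, values, and the `h3_distance` field are invented for the example; the rank correlation is the textbook definition (Pearson correlation of average ranks) and does not handle constant inputs.

```python
import math


def seasonal_rmse(observed, predicted):
    """RMSE across all regional predictions in one season."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: regional observations and predictions for three seasons.
seasons = {
    "A": {"observed": [1.0, 2.0], "predicted": [1.0, 2.0], "h3_distance": 0.1},
    "B": {"observed": [3.0, 4.0], "predicted": [2.0, 3.0], "h3_distance": 0.5},
    "C": {"observed": [5.0, 7.0], "predicted": [3.0, 5.0], "h3_distance": 0.9},
}

errors = [seasonal_rmse(s["observed"], s["predicted"]) for s in seasons.values()]
distances = [s["h3_distance"] for s in seasons.values()]
rho = spearman(distances, errors)
```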

      I think the competition (interference) results are really interesting, perhaps among the most interesting aspects of this work. 

      Thank you! We agree that our finding that subtype interference has a greater impact than viral evolution on A(H3N2) epidemics is one of the more interesting results in the study.

      Have you seen the paper by Barrat-Charlaix et al? They found that LBI was not good at predicting frequency dynamics (see https://pubmed.ncbi.nlm.nih.gov/33749787/); instead, LBI was high for sequences like the consensus sequence, which was close to future strains. LBI also was not positively correlated with epidemic impact in Figure S7.

      The local branching index (LBI) measures the rate of recent phylogenetic branching and approximates relative fitness among viral clades, with high LBI values representing greater fitness (Neher et al. 2014).

      Two of this study’s co-authors (John Huddleston and Trevor Bedford) are also co-authors of Barrat-Charlaix et al. 2021. Barrat-Charlaix et al. 2021 assessed the performance of LBI in predicting the frequency dynamics and fixation of individual amino acid substitutions in A(H3N2) viruses. Our study is not focused on predicting the future success of A(H3N2) clades or the frequency dynamics or probability of fixation of individual substitutions. Instead, we use the standard deviation and Shannon diversity of LBI values in each season as a proxy for genealogical (clade-level) diversity. We find that, at a seasonal level, low diversity of H3 or N2 LBI values in the current season correlates with greater epidemic intensity, higher transmission rates, and shorter seasonal duration.

      In the Discussion we provide an explanation for these correlation results (Lines 848-857): 

      “The local branching index (LBI) is traditionally used to predict the success of individual clades, with high LBI values indicating high viral fitness (Huddleston et al., 2020; Neher et al., 2014). In our epidemiological analysis, low diversity of H3 or N2 LBI in the current season correlated with greater epidemic intensity, higher transmission rates, and shorter seasonal duration. These associations suggest that low LBI diversity is indicative of a rapid selective sweep by one successful clade, while high LBI diversity is indicative of multiple co-circulating clades with variable seeding and establishment times over the course of an epidemic. A caveat is that LBI estimation is more sensitive to sequence sub-sampling schemes than strain-level measures. If an epidemic is short and intense (e.g., 1-2 months), a phylogenetic tree with our sub-sampling scheme (50 sequences per month) may not incorporate enough sequences to capture the true diversity of LBI values in that season.”

      Figure 1 - LBI goes up over time. Is that partly to do with sampling? Overall how do higher sampling volumes in later years impact this analysis? (though you choose a fixed number of sequences so I guess you downsample to cope with that). I note that LBI is likely to be sensitive to sequencing density. 

      Thank you for pointing this out. We realized that increasing LBI Shannon diversity over the course of the study period was indeed an artefact of increasing sequence volume over time. Our sequence subsampling scheme involves selecting a random sample of up to 50 viruses per month, with up to 25 viruses selected from North America (if available) and the remaining sequences evenly divided across nine other global regions. In early seasons of the study (late 1990s/early 2000s), sampling was often too sparse to meet the 25 viruses/month threshold for North America or for the other global regions combined (H3: Figure 2 - figure supplement 1; N2: Figure 2 - figure supplement 2). Ecological diversity metrics are sensitive to sample size, which explains why LBI Shannon diversity appeared to steadily increase over time in our original submission. In our revised manuscript, we correct for uneven sample sizes across seasons before estimating Shannon diversity and clarify our methodology. 

      Lines 443-482: 

      “Clade growth: The local branching index (LBI) measures the relative fitness of co-circulating clades, with high LBI values indicating recent rapid phylogenetic branching (Huddleston et al., 2020; Neher et al., 2014). To calculate LBI for each H3 and N2 sequence, we applied the LBI heuristic algorithm as originally described by Neher et al. (2014) to H3 and N2 phylogenetic trees, respectively. We set the neighborhood parameter 𝜏 to 0.4 and only considered viruses sampled between the current season 𝑡 and the previous season 𝑡 – 1 as contributing to recent clade growth in the current season 𝑡.

      Variation in the phylogenetic branching rates of co-circulating A(H3N2) clades may affect the magnitude, intensity, onset, or duration of seasonal epidemics. For example, we expected that seasons dominated by a single variant with high fitness might have different epidemiological dynamics than seasons with multiple co-circulating clades with varying seeding and establishment times. We measured the diversity of clade growth rates of viruses circulating in each season by measuring the standard deviation (s.d.) and Shannon diversity of LBI values in each season. Given that LBI measures relative fitness among co-circulating clades, we did not compare overall clade growth rates (e.g., mean LBI) across seasons.

      Each season’s distribution of LBI values is right-skewed and does not follow a normal distribution. We therefore bootstrapped the LBI values of each season in each replicate dataset 1000 times (1000 samples with replacement) and estimated the seasonal standard deviation of LBI from resamples, rather than directly from observed LBI values. We also tested the seasonal standard deviation of LBI from log transformed LBI values, which produced qualitatively equivalent results to bootstrapped LBI values in downstream analyses.

      As an alternative measure of seasonal LBI diversity, we binned raw H3 and N2 LBI values into categories based on their integer values (e.g., an LBI value of 0.5 is assigned to the (0,1] bin) and estimated the exponential of the Shannon entropy (Shannon diversity) of LBI categories (Hill, 1973; Shannon, 1948). The Shannon diversity of LBI considers both the richness and relative abundance of viral clades with different growth rates in each season and is calculated as follows:

      $$ {}^{1}D = \exp\left( -\sum_{i=1}^{R} p_i \ln p_i \right) $$

      where ${}^{q}D$ is the effective number of categories or Hill numbers of order 𝑞 (here, clades with different growth rates), with 𝑞 defining the sensitivity of the true diversity to rare versus abundant categories (Hill, 1973). exp is the exponential function, $p_i$ is the proportion of LBI values belonging to the 𝑖th category, and 𝑅 is richness (the total number of categories). Shannon diversity ${}^{1}D$ (𝑞 = 1) estimates the effective number of categories in an assemblage using the geometric mean of their proportional abundances $p_i$ (Hill, 1973).

      Because ecological diversity metrics are sensitive to sampling effort, we rarefied H3 and N2 sequence datasets prior to estimating Shannon diversity so that seasons had the same sample size. For each season in each replicate dataset, we constructed rarefaction and extrapolation curves of LBI Shannon diversity and extracted the Shannon diversity estimate of the sample size that was twice the size of the reference sample size (the smallest number of sequences obtained in any season during the study) (iNEXT R package) (Chao et al., 2014). Chao et al. found that their diversity estimators work well for rarefaction and short-range extrapolation when the extrapolated sample size is up to twice the reference sample size. For H3, we estimated seasonal diversity using replicate datasets subsampled to 360 sequences/season; for N2, datasets were subsampled to 230 sequences/season.”

      Estimating the Shannon diversity of LBI from datasets with even sampling across seasons removes the previous secular trend of increasing LBI diversity over time (Figure 2 in revised manuscript).
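      To make the diversity calculation concrete, the sketch below computes the exponential of the Shannon entropy of integer-binned LBI values and shows one simple way to equalize sample sizes. The random subsampling here is a simplified stand-in chosen for illustration; it is not the rarefaction and extrapolation estimator of the iNEXT R package cited in the quoted methods, and the LBI values are invented.

```python
import math
import random
from collections import Counter


def shannon_diversity(lbi_values):
    """Hill number of order 1: exp of the Shannon entropy of binned LBI values."""
    # ceil() maps a value in (k-1, k] to bin k, e.g. 0.5 -> the (0, 1] bin.
    bins = Counter(math.ceil(v) for v in lbi_values)
    n = sum(bins.values())
    entropy = -sum((c / n) * math.log(c / n) for c in bins.values())
    return math.exp(entropy)


def subsampled_diversity(lbi_values, size, n_draws=200, seed=0):
    """Mean Shannon diversity over random subsamples (without replacement)."""
    rng = random.Random(seed)
    draws = [shannon_diversity(rng.sample(lbi_values, size)) for _ in range(n_draws)]
    return sum(draws) / len(draws)


# Invented example: a sparsely and a densely sampled season.
sparse_season = [0.4, 0.6, 1.2, 1.8, 2.5, 0.3, 0.9, 1.1]
dense_season = [0.5] * 50 + [1.5] * 30 + [2.5] * 15 + [3.5] * 5

raw_dense = shannon_diversity(dense_season)
# Compare seasons at a common sample size (here, the smaller season's size).
common_size = len(sparse_season)
even_sparse = subsampled_diversity(sparse_season, common_size)
even_dense = subsampled_diversity(dense_season, common_size)
```

      Four equally abundant bins give a diversity of 4 effective categories, and a single occupied bin gives 1, which is a quick sanity check on the implementation.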

      Figure 3 - I wondered what about the co-dominant times? 

      In Figure 3, orange points correspond to seasons in which A(H3N2) and A(H1N1) were codominant. We are not sure of the reviewer’s specific question concerning codominant seasons, but if it concerns whether antigenic drift is linked to epidemic magnitude among codominant seasons alone, we cannot perform separate regression analyses for these seasons because there are only two codominant seasons during the 22-season study period.

      Figure 4 - Related to drift and epidemic size, dominance, etc. -- when is drift measured, and (if it's measured in season t), would larger populations create more drift, simply by having access to more opportunity (via a larger viral population size)? This is a bit 'devil's advocate' but what if some epidemiological/behavioural process causes a larger and/or later peak, and those gave rise to higher drift?

      Seasonal drift is measured as the genetic or antigenic distance between viruses circulating during season t and viruses circulating in the prior season (𝑡 – 1) or two seasons ago (𝑡 – 2).

      Concerning the question about whether larger human populations lead to greater rates of antigenic drift, phylogeographic studies have repeatedly found that East-South-Southeast Asia are the source populations for A(H3N2) viruses (Bedford et al., 2015; Lemey et al., 2014), in part because these regions have tropical or subtropical climates and larger human populations, which enable year-round circulation and higher background infection rates. Larger viral populations (via larger host population sizes) and uninterrupted transmission may increase the efficiency of selection and the probability of strain survival and global spread (Wen et al., 2016). After A(H3N2) variants emerge in East-South-Southeast Asia and spread to other parts of the world, A(H3N2) viruses circulate via overlapping epidemics rather than local persistence (Bedford et al., 2015; Rambaut et al., 2008). Each season, A(H3N2) outbreaks in the US (and other temperate regions) are seeded by case importations from outside the US, genetic diversity peaks during the winter, and a strong genetic bottleneck typically occurs at the end of the season (Rambaut et al., 2008).

      Due to their faster rates of antigenic evolution, A(H3N2) viruses undergo more rapid clade turnover and dissemination than A(H1N1) and B viruses, despite similar global migration networks across A(H3N2), A(H1N1), and B viruses (Bedford et al., 2015). Bedford et al. speculate that there is typically little geographic differentiation in A(H3N2) viruses circulating in each season because A(H3N2) viruses tend to infect adults, and adults are more mobile than children. Compared to A(H3N2) viruses, A(H1N1) and B viruses tend to have greater genealogical diversity, geographic differentiation, and longer local persistence times (Bedford et al., 2015; Rambaut et al., 2008). Thus, some A(H1N1) and B epidemics are reseeded by viruses that have persisted locally since prior epidemics (Bedford et al., 2015).

      Theoretical models have shown that epidemiological processes can influence rates of antigenic evolution (Recker et al., 2007; Wen et al., 2016; Zinder et al., 2013), though the impact of flu epidemiology on viral evolution is likely constrained by the virus’s intrinsic mutation rate. 

      In conclusion, larger host population sizes and flu epidemiology can indeed influence rates of antigenic evolution. However, given that our study is US-centric and focuses on A(H3N2) viruses, these factors are likely not at play in our study, due to intrinsic biological characteristics of A(H3N2) viruses and the geographic location of our study.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.

      Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      Methods -- 

      L 620 about rescaling and pre- vs post-pandemic times: tell us more - how has reporting changed? Could any of this not be because of reporting but because of NPIs or otherwise? Overall there is a lot of rescaling going on. How sensitive are the results to it?

      It would be unreasonable to ask for a sensitivity analysis for all the results for all the choices around data preparation, but some idea where there is a reason to think there might be a dependence on one of these choices would be great.

      In response to the 2009 A(H1N1) pandemic, the US CDC and WHO increased laboratory testing capacity and strengthened epidemiological networks, leading to substantial, long-lasting improvements to influenza surveillance that are still in place today (https://www.cdc.gov/flu/weekly/overview.htm). At the beginning of the COVID-19 pandemic, influenza surveillance networks were quickly adapted to detect and understand the spread of SARS-CoV-2. The 2009 pandemic occurred over a time span of less than one year, and strict non-pharmaceutical interventions (NPIs), such as lockdowns and mask mandates, were not implemented. Thus, we attribute increases in test volume during the post-2009 period to improved virologic surveillance and laboratory testing capacity rather than changes in care-seeking behavior. In the revised manuscript, we include a figure (Figure 1 - figure supplement 2) that shows systematic increases in test volume in all HHS regions after the 2009 pandemic.

      Given the substantial increase in influenza test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results for Spearman correlations and regression models whether we adjusted for both the pre-/post-2009 pandemic time periods and regional reporting or only for the pre-/post-2009 pandemic time periods. Below, we share adjusted versions of Figure 3 (regression results) and Figure 3 - figure supplement 1 (Spearman correlations). Each figure only adjusts for differences in pre- and post-2009 pandemic reporting.

      Author response image 1.

      Adjustment for pre- and post-2009 pandemic only

      Author response image 2.

      Adjustment for pre- and post-2009 pandemic only

      L635 - Why discretize the continuous LBI distribution and then use Shannon entropy when you could just use the variance and/or higher moments? (or quantiles)? Similarly, why not use the duration of the peak, rather than Shannon entropy? (though there, because presumably data are already binned weekly, and using duration would involve defining start and stop times, it's more natural than with LBI)

      We realize that we failed to mention in the methods that we calculated the standard deviation of LBI in each season, in addition to the exponential of the Shannon entropy (Shannon diversity) of LBI. Both the Shannon diversity of LBI values and the standard deviation of LBI values were negatively correlated with effective Rt and epidemic intensity and positively correlated with seasonal duration. The two measures were similarly correlated with effective Rt and epidemic intensity (Figure 3 - figure supplements 2 - 3), while the Shannon diversity of LBI had slightly stronger correlations with seasonal duration than s.d. LBI (Figure 5). Thus, both measures of LBI diversity appear to capture potentially biologically important heterogeneities in clade growth rates.
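      For readers who want a concrete picture of the bootstrapped standard deviation of LBI mentioned here, a minimal sketch follows. It resamples one season's LBI values with replacement and summarizes the resampled standard deviations by their mean; that summary choice and the example values are assumptions made for illustration, since the response does not state how the resamples were aggregated.

```python
import random
import statistics


def bootstrap_sd(values, n_boot=1000, seed=0):
    """Mean of the sample s.d. across bootstrap resamples of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Each resample draws n values with replacement from the observed values.
    sds = [statistics.stdev(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.mean(sds)


# Invented right-skewed LBI values for one season.
season_lbi = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.9, 2.5, 6.0]
sd_estimate = bootstrap_sd(season_lbi)
```

      Fixing the seed makes the estimate reproducible across runs, which helps when the same procedure is repeated over many replicate datasets.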

      Separately, we use the inverse Shannon entropy of the incidence distribution to measure the spread of an A(H3N2) epidemic during the season, following the methods of Dalziel et al. 2018. The peak of an epidemic is a single time point at which the maximum incidence occurs. We have not encountered “the duration of the peak” before in epidemiology terminology, and, to our knowledge, there is not a robust way to measure the “duration of a peak,” unless one were to measure the time span between multiple points of maximum incidence or designate an arbitrary threshold for peak incidence that is not strictly the maximum incidence. Given that Shannon entropy is based on the normalized incidence distribution over the course of the entire influenza season (week 40 to week 20), it does not require designating an arbitrary threshold to describe epidemic intensity.

      L642 - again why normalize epidemic intensities, and how sensitive are the results to this? I would imagine given that the RF results were unstable under leave-one-out analysis that some of those results could be quite sensitive to choices of normalization and scaling.

      Epidemic intensity, defined as the inverse Shannon entropy of the incidence distribution, measures the spread of influenza cases across the weeks in a season. Following Dalziel et al. 2018, we estimated epidemic intensity from normalized incidence distributions rather than raw incidences so that epidemic intensity is invariant under differences in reporting rates and/or attack rates across regions and seasons. If we were to use raw incidences instead, HHS regions or seasons could have the appearance of greater or lower epidemic intensity (i.e., incidence concentrated within a few weeks or spread out over several weeks), due to differences in attack rates or test volume, rather than fundamental differences in the shapes of their epidemic curves. In other words, epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season.

      In the methods section, we provide further clarification for why epidemic intensities are based on normalized incidence distributions rather than raw incidences.

      Lines 206-209: “Epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season. Following the methodology of Dalziel et al. 2018, epidemic intensity values were normalized to fall between 0 and 1 so that epidemic intensity is invariant to differences in reporting rates and/or attack rates across regions and seasons.”  
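      The scale invariance argued for above can be checked with a small sketch. It computes the inverse Shannon entropy of the normalized weekly incidence distribution and then rescales a set of values to the unit interval. The min-max rescaling and the example incidence curves are assumptions for illustration; the response only states that values were normalized to fall between 0 and 1.

```python
import math


def inverse_shannon_entropy(weekly_incidence):
    """Inverse Shannon entropy of the normalized weekly incidence distribution."""
    total = sum(weekly_incidence)
    # Weeks with zero incidence contribute nothing to the entropy.
    props = [x / total for x in weekly_incidence if x > 0]
    entropy = -sum(p * math.log(p) for p in props)
    return math.inf if entropy == 0 else 1.0 / entropy


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Invented weekly incidence curves for three region-seasons.
sharp = [0, 1, 10, 1, 0, 0]      # cases concentrated in a few weeks
moderate = [1, 2, 4, 3, 1, 1]
flat = [2, 2, 2, 2, 2, 2]        # cases spread evenly over the season

raw = [inverse_shannon_entropy(c) for c in (sharp, moderate, flat)]
intensity = min_max_scale(raw)

# Multiplying incidence by a constant (e.g. higher test volume) leaves the
# normalized distribution, and therefore the intensity, unchanged.
same_shape = inverse_shannon_entropy([10 * x for x in sharp])
```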

      L643 - more information about what goes into Epidemia (variables, priors) such that it's replicable/understandable without the code would be good. 

      We now include additional information concerning the epidemic models used to estimate Rt, including all model equations, variables, and priors (Lines 210-276 in Methods).

      L667 did you do breakpoint detection? Why linear models? Was log(incidence) used? 

      In our original submission, we estimated epidemic onsets using piecewise regression models (Lines 666-674 in original manuscript), which model non-linear relationships with breakpoints by iteratively fitting linear models (Muggeo, 2003). Piecewise regression falls under the umbrella of parametric methods for breakpoint detection.

      We did not include results from linear models fit to log(incidence) or GLMs with Gaussian error distributions and log links, for two reasons. First, models fit to log-transformed data require non-zero values as inputs. Although breakpoint detection does not necessarily require weeks of zero incidence leading up to the start of an outbreak, limiting the time period for breakpoint detection to weeks with non-zero incidence (so that we could use log-transformed incidence) substantially pushed back our previous, more biologically plausible estimates of epidemic onset weeks. Second, as an alternative to limiting the dataset to weeks with non-zero incidence, we tried adding a small positive number to weekly incidences so that we could fit models to log-transformed incidence for the whole time period spanning epidemic week 40 (the start of the influenza season) to the first week of maximum incidence. Fitting models to log-transformed incidences produced unrealistic breakpoint locations, potentially because log transformations 1) linearize data, and 2) stabilize variance by reducing the impact of extreme values. Due to the short time span used for breakpoint detection, log transforming incidence diminishes abrupt changes in incidence at the beginning of outbreaks, making it difficult for models to estimate biologically plausible breakpoint locations. Log transformations of incidence may be more useful when analyzing time series spanning multiple seasons, rather than short time spans with sharp changes in incidence (i.e., the exponential growth phase of a single flu outbreak).

      As an alternative to piecewise regression, our revised manuscript also estimates epidemic onsets using a Bayesian ensemble algorithm that accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (BEAST - a Bayesian estimator of Abrupt change, Seasonal change, and Trend; Zhao et al., 2019). Although a few regional onset times differed across the two methods, our conclusions did not change concerning correlations between viral fitness and epidemic onset timing.

      We have rewritten the methods section for estimating epidemic onsets to clarify our methodology and to include the BEAST method (Lines 292-308):

      “We estimated the regional onsets of A(H3N2) virus epidemics by detecting breakpoints in A(H3N2) incidence curves at the beginning of each season. The timing of the breakpoint in incidence represents epidemic establishment (i.e., sustained transmission) rather than the timing of influenza introduction or arrival (Charu et al., 2017). We used two methods to estimate epidemic onsets: 1) piecewise regression, which models non-linear relationships with break points by iteratively fitting linear models to each segment (segmented R package) (Muggeo, 2008; Muggeo, 2003), and 2) a Bayesian ensemble algorithm (BEAST – a Bayesian estimator of Abrupt change, Seasonal change, and Trend) that explicitly accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (Rbeast R package) (Zhao et al., 2019). For each region in each season, we limited the time period of breakpoint detection to epidemic week 40 to the first week of maximum incidence and did not estimate epidemic onsets for regions with insufficient signal, which we defined as fewer than three weeks of consecutive incidence and/or greater than 30% of weeks with missing data. We successfully estimated A(H3N2) onset timing for most seasons, except for three A(H1N1) dominant seasons: 2000-2001 (0 regions), 2002-2003 (3 regions), and 2009-2010 (0 regions). Estimates of epidemic onset weeks were similar when using piecewise regression versus the BEAST method, and downstream analyses of correlations between viral fitness indicators and onset timing produced equivalent results. We therefore report results from onsets estimated via piecewise regression.”
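      The idea behind breakpoint-based onset detection can be sketched with a brute-force two-segment fit. This is a deliberately simplified stand-in: it tries every admissible split, fits an independent least-squares line to each side, and keeps the split with the smallest total squared error. It is neither the iterative algorithm of Muggeo (2003) nor the interface of the segmented or Rbeast packages, and the incidence values and the `min_points` parameter are invented for the example.

```python
def line_sse(x, y):
    """Sum of squared residuals of an ordinary least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def find_breakpoint(weeks, incidence, min_points=3):
    """Return the first week of the second segment for the best two-line fit."""
    best_sse, best_week = None, None
    # Each segment must keep at least `min_points` observations.
    for k in range(min_points, len(weeks) - min_points + 1):
        sse = line_sse(weeks[:k], incidence[:k]) + line_sse(weeks[k:], incidence[k:])
        if best_sse is None or sse < best_sse:
            best_sse, best_week = sse, weeks[k]
    return best_week


# Invented incidence from epidemic week 40 up to the week of maximum incidence:
# a flat baseline followed by sustained growth.
weeks = list(range(40, 50))
incidence = [1, 1, 1, 1, 1, 3, 6, 9, 12, 15]
onset_week = find_breakpoint(weeks, incidence)
```

      Restricting the search window to the weeks before peak incidence, as the quoted methods describe, keeps the post-peak decline from being mistaken for a second regime.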

      L773 national indicators -- presumably this is because you don't have regional-level information, but it might be worth saying that earlier so it doesn't read like there are other indicators now, called national indicators, that we should have heard of 

      In the revised manuscript, we move a paragraph that was at the beginning of the Results to the beginning of the Methods.

      Lines 123-132: 

      “Our study focuses on the impact of A(H3N2) virus evolution on seasonal epidemics from seasons 1997-1998 to 2018-2019 in the U.S.; whenever possible, we make use of regionally disaggregated indicators and analyses. We start by identifying multiple indicators of influenza evolution each season based on changes in HA and NA. Next, we compile influenza virus subtype-specific incidence time series for U.S. Department of Health and Human Services (HHS) regions and estimate multiple indicators characterizing influenza A(H3N2) epidemic dynamics each season, including epidemic burden, severity, type/subtype dominance, timing, and the age distribution of cases. We then assess univariate relationships between national indicators of evolution and regional epidemic characteristics. Lastly, we use multivariable regression models and random forest models to measure the relative importance of viral evolution, heterosubtypic interference, and prior immunity in predicting regional A(H3N2) epidemic dynamics.”

      In Lines 484-487 in the Methods, we now mention that measures of seasonal antigenic and genetic distance are at the national level. 

      “For each replicate dataset, we estimated national-level genetic and antigenic distances between influenza viruses circulating in consecutive seasons by calculating the mean distance between viruses circulating in the current season 𝑡 and viruses circulating during the prior season (𝑡 – 1 year; one season lag) or two seasons ago (𝑡 – 2 years; two season lag).”
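      The seasonal distance described above is a mean over all cross-season pairs of viruses. The sketch below illustrates that averaging with a plain Hamming distance between short, made-up amino acid strings; the Hamming distance is only a placeholder for whichever genetic or antigenic distance is being lagged, and the sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_between_season_distance(current, previous, distance=hamming):
    """Mean distance over every (current-season, earlier-season) virus pair."""
    pairs = list(product(current, previous))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments for seasons t, t - 1, and t - 2.
season_t = ["KTNG", "KTSG"]
season_t_minus_1 = ["KANG", "KTNG"]
season_t_minus_2 = ["RANG", "KANG"]

one_season_lag = mean_between_season_distance(season_t, season_t_minus_1)
two_season_lag = mean_between_season_distance(season_t, season_t_minus_2)
```

      Passing a different `distance` function (for example, one restricted to epitope sites) changes the metric without changing the averaging step.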

      L782 Why Beta regression and what is "the resampled dataset" ? 

      Beta regression is appropriate for models of subtype dominance, epidemic intensity, and age-specific proportions of ILI cases because these data are continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). “The resampled dataset” refers to the “1000 bootstrap replicates of the original dataset (1000 samples with replacement)” mentioned in Lines 777-778 of the original manuscript. 

      In the revised manuscript, we include more background information about Beta regression models, and explicitly mention that regression models were fit to 1000 bootstrap replicates of the original dataset.

      Lines 503-507: 

      “For subtype dominance, epidemic intensity, and age-specific proportions of ILI cases, we fit Beta regression models with logit links. Beta regression models are appropriate when the variable of interest is continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). For each epidemic metric, we fit the best-performing regression model to 1000 bootstrap replicates of the original dataset.”
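      To show what a Beta regression with a logit link evaluates, here is a minimal sketch of its log-likelihood under the mean-precision parameterization of Ferrari & Cribari-Neto (2004), where the response follows a Beta distribution with shape parameters μφ and (1 − μ)φ. The data, coefficient values, and the precision `phi` are invented; a real analysis would estimate them by maximizing this function with a fitting routine.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_log_density(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) evaluated at y in (0, 1)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (
        math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )


def log_likelihood(ys, xs, intercept, slope, phi):
    """Beta regression log-likelihood for one predictor with a logit link."""
    return sum(
        beta_log_density(y, inv_logit(intercept + slope * x), phi)
        for y, x in zip(ys, xs)
    )


# Invented example: a proportion (e.g. subtype dominance) rising with a predictor.
xs = [-1.0, 0.0, 1.0]
ys = [0.2, 0.5, 0.8]

ll_increasing = log_likelihood(ys, xs, intercept=0.0, slope=1.4, phi=20.0)
ll_decreasing = log_likelihood(ys, xs, intercept=0.0, slope=-1.4, phi=20.0)
```

      The likelihood is only defined for responses strictly between 0 and 1, which matches the interval restriction stated in the quoted methods.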

      The github is clear, comprehensive and well-documented, at least at a brief glance. 

      Thank you! At the time of resubmission, our GitHub repository is updated to incorporate feedback from the reviewers.

      References

      Altmann, A., Tolosi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347. https://doi.org/10.1093/bioinformatics/btq134

      Barrat-Charlaix, P., Huddleston, J., Bedford, T., & Neher, R. A. (2021). Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses. Mol Biol Evol, 38(7), 2767-2777. https://doi.org/10.1093/molbev/msab065

      Bedford, T., Riley, S., Barr, I. G., Broor, S., Chadha, M., Cox, N. J., Daniels, R. S., Gunasekaran, C. P., Hurt, A. C., Kelso, A., Klimov, A., Lewis, N. S., Li, X., McCauley, J. W., Odagiri, T., Potdar, V., Rambaut, A., Shu, Y., Skepner, E., . . . Russell, C. A. (2015). Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559), 217-220. https://doi.org/10.1038/nature14460

      Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K., & Ellison, A. M. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67. https://doi.org/10.1890/13-0133.1

      Charu, V., Zeger, S., Gog, J., Bjornstad, O. N., Kissler, S., Simonsen, L., Grenfell, B. T., & Viboud, C. (2017). Human mobility and the spatial transmission of influenza in the United States. PLoS Comput Biol, 13(2), e1005382. https://doi.org/10.1371/journal.pcbi.1005382

      Dalziel, B. D., Kissler, S., Gog, J. R., Viboud, C., Bjornstad, O. N., Metcalf, C. J. E., & Grenfell, B. T. (2018). Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science, 362(6410), 75-79. https://doi.org/10.1126/science.aat6030

      Debeer, D., & Strobl, C. (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1), 307. https://doi.org/10.1186/s12859-020-03622-2  

      Dhanasekaran, V., Sullivan, S., Edwards, K. M., Xie, R., Khvorov, A., Valkenburg, S. A., Cowling, B. J., & Barr, I. G. (2022). Human seasonal influenza under COVID-19 and the potential consequences of influenza lineage elimination. Nat Commun, 13(1), 1721. https://doi.org/10.1038/s41467-022-29402-5

      Ferrari, S., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501  

      Garten, R. J., Davis, C. T., Russell, C. A., Shu, B., Lindstrom, S., Balish, A., Sessions, W. M., Xu, X., Skepner, E., Deyde, V., Okomo-Adhiambo, M., Gubareva, L., Barnes, J., Smith, C. B., Emery, S. L., Hillman, M. J., Rivailler, P., Smagala, J., de Graaf, M., . . . Cox, N. J. (2009). Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. https://doi.org/10.1126/science.1176225

      Grebe, K. M., Yewdell, J. W., & Bennink, J. R. (2008). Heterosubtypic immunity to influenza A virus: where do we stand? Microbes Infect, 10(9), 1024-1029. https://doi.org/10.1016/j.micinf.2008.07.002

      Hill, M. O. (1973). Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology, 54(2), 427-432. https://doi.org/10.2307/1934352

      Huddleston, J., Barnes, J. R., Rowe, T., Xu, X., Kondor, R., Wentworth, D. E., Whittaker, L., Ermetal, B., Daniels, R. S., McCauley, J. W., Fujisaki, S., Nakamura, K., Kishida, N., Watanabe, S., Hasegawa, H., Barr, I., Subbarao, K., Barrat-Charlaix, P., Neher, R. A., & Bedford, T. (2020). Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife, 9, e60067. https://doi.org/10.7554/eLife.60067

      Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.

      Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Chapman and Hall/CRC. 

      Lee, E. C., Arab, A., Goldlust, S. M., Viboud, C., Grenfell, B. T., & Bansal, S. (2018). Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol, 14(3), e1006020. https://doi.org/10.1371/journal.pcbi.1006020

      Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., & Suchard, M. A. (2014). Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog, 10(2), e1003932. https://doi.org/10.1371/journal.ppat.1003932

      Muggeo, V. (2008). Segmented: An R Package to Fit Regression Models With Broken-Line Relationships. R News, 8, 20-25. 

      Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Stat Med, 22(19), 3055-3071. https://doi.org/10.1002/sim.1545

      Neher, R. A., Russell, C. A., & Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife, 3, e03568. https://doi.org/10.7554/eLife.03568  

      Rambaut, A., Pybus, O. G., Nelson, M. I., Viboud, C., Taubenberger, J. K., & Holmes, E. C. (2008). The genomic and epidemiological dynamics of human influenza A virus. Nature, 453(7195), 615-619. https://doi.org/10.1038/nature06945

      Recker, M., Pybus, O. G., Nee, S., & Gupta, S. (2007). The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proceedings of the National Academy of Sciences, 104(18), 7711-7716. https://doi.org/10.1073/pnas.0702154104

      Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423. 

      Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., Ma, S. K., Cheung, C. L., Raghwani, J., Bhatt, S., Peiris, J. S., Guan, Y., & Rambaut, A. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459(7250), 1122-1125. https://doi.org/10.1038/nature08182  

      Sridhar, S. (2016). Heterosubtypic T-Cell Immunity to Influenza in Humans: Challenges for Universal T-Cell Influenza Vaccines. Front Immunol, 7, 195. https://doi.org/10.3389/fimmu.2016.00195

      Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307  

      Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25

      Terajima, M., Babon, J. A., Co, M. D., & Ennis, F. A. (2013). Cross-reactive human B cell and T cell epitopes between influenza A and B viruses. Virol J, 10, 244. https://doi.org/10.1186/1743-422X-10-244

      Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M., & Kawaoka, Y. (1992). Evolution and ecology of influenza A viruses. Microbiological Reviews, 56(1), 152-179. https://doi.org/10.1128/mr.56.1.152-179.1992

      Wen, F., Bedford, T., & Cobey, S. (2016). Explaining the geographical origins of seasonal influenza A (H3N2). Proc Biol Sci, 283(1838). https://doi.org/10.1098/rspb.2016.1312

      Yan, L., Neher, R. A., & Shraiman, B. I. (2019). Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. Elife, 8. https://doi.org/10.7554/eLife.44205  

      Zhao, K., Wulder, M. A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X., & Brown, M. (2019). Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, 111181. https://doi.org/10.1016/j.rse.2019.04.034

      Zinder, D., Bedford, T., Gupta, S., & Pascual, M. (2013). The Roles of Competition and Mutation in Shaping Antigenic and Genetic Diversity in Influenza. PLOS Pathogens, 9(1). https://doi.org/10.1371/journal.ppat.1003104

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents an important finding on the implicit and automatic emotion perception from biological motion (BM). The evidence supporting the claims of the authors is solid, although inclusion of a larger number of samples and more evidence for the discrepancy between Intact and local emotional BMs would have strengthened the study. The work will be of broad interest to perceptual and cognitive neuroscience.

      We express our sincere gratitude for the positive and constructive evaluation of our manuscript. We have now included more participants and conducted a replication experiment to strengthen our results.

      Reviewer #1 (Public Review):

      Summary:

      Tian et al. investigated the effects of emotional signals in biological motion on pupil responses. In this study, subjects were presented with point-light biological motion stimuli with happy, neutral, and sad emotions. Their pupil responses were recorded with an eye tracker. Throughout the study, emotion type (i.e., happy/sad/neutral) and BM stimulus type (intact/inverted/non-BM/local) were systematically manipulated. For intact BM stimuli, happy BM induced a larger pupil diameter than neutral BM, and neutral BM also induced a larger pupil diameter than sad BM. Importantly, the diameter difference between happy and sad BM correlated with the autistic trait of individuals. These effects disappeared for the inverted BM and non-BM stimuli. Interestingly, both happy and sad emotions show superiority in pupil diameter.

      Strengths:

      (1) The experimental conditions and results are very easy to understand.

      (2) The writing and data presentation are clear.

      (3) The methods are sound. I have no problems with the experimental design and results.

      Weaknesses:

      (1) My main concern is the interpretation of the intact and local condition results. The processing advantage of happy emotion is not surprising given a number of existing studies. However, the only difference here seems to be the smaller (or larger) pupil diameter for sad compared to neutral in the intact (or local, respectively) condition. The current form only reports this effect but lacks in-depth discussions and explanations as to why this is the case.

      Thank you for pointing this out; we apologize for not making this point clear. It has long been documented that pupil size reflects the degree of cognitive effort and attentional input (Joshi & Gold, 2019; van der Wel & van Steenbergen, 2018), and indexes noradrenaline activity in emotion-processing structures such as the amygdala (Dal Monte et al., 2015; Harrison et al., 2006; Liddell et al., 2005). Accordingly, we propose that the smaller pupil response observed under the sad condition as compared to the neutral condition arises because sad biological motion (BM) could be less efficient in attracting visual attention and evoking emotional arousal. In line with this, it has been found that infants looked more at the neutral point-light walker when it was displayed in a pair with the sad walker (Ogren et al., 2019), suggesting that sad BM is less effective in capturing visual attention than neutral BM. Besides, neural studies have revealed that, compared with other emotions (anger, happiness, disgust, and fear), the processing of sad emotion failed to evoke heightened activity in any emotionally relevant brain region, including the amygdala, the extrastriate body area (EBA), and the fusiform body area (FBA) (Peelen et al., 2007). The current study echoes these previous findings by demonstrating a disadvantage for intact sad BM in evoking pupil responses. Notably, unlike the intact sad BM, the local sad BM instead induced stronger pupil responses than the neutral local BM. This distinctive pupil modulation effect observed in intact and local sad BM could be explained by a multi-level emotion processing model of BM. Specifically, even though both the intact and local BM convey important life information (Chang & Troje, 2008, 2009; Simion et al., 2008), the latter is deprived of the global form feature. Hence, the processing of emotions in local BM may occur at a more basic and preliminary level, responding to generally salient affective information (happy and sad) without detailed analysis. In fact, a similar dissociation in emotion processing has been observed for another important type of emotional signal with an analogous function (i.e., facial expression). For example, happy and fearful faces elicited differential amygdala activations when perceived consciously, but comparable amygdala activations when suppressed (Williams et al., 2004). Moreover, it has been proposed that there exist two parallel routes for facial expression processing: a quick but coarse subcortical route that detects affectively salient information without detailed analysis, and a fine-grained but slow cortical route that discriminates the exact emotion type. Similarly, the dissociated emotion processing in local and intact BM may function in the same manner, with the former serving as a primary emotion detection mechanism and the latter serving as a detailed emotion discrimination mechanism. Still, future studies adopting more diverse experimental paradigms and neuroimaging techniques are needed to further investigate this issue. We have added these points and more thoroughly discussed the potential mechanism in the revised text (see lines 329-339, 405-415, 418-420).

      References:

      Chang, D. H. F., & Troje, N. F. (2008). Perception of animacy and direction from local biological motion signals. Journal of Vision, 8(5), 3. https://doi.org/10.1167/8.5.3

      Chang, D. H. F., & Troje, N. F. (2009). Characterizing global and local mechanisms in biological motion perception. Journal of Vision, 9(5), 8–8. https://doi.org/10.1167/9.5.8

      Dal Monte, O., Costa, V. D., Noble, P. L., Murray, E. A., & Averbeck, B. B. (2015). Amygdala lesions in rhesus macaques decrease attention to threat. Nature Communications, 6(1). https://doi.org/10.1038/ncomms10161

      Harrison, N. A., Singer, T., Rotshtein, P., Dolan, R. J., & Critchley, H. D. (2006). Pupillary contagion: central mechanisms engaged in sadness processing. Social Cognitive and Affective Neuroscience, 1(1), 5–17. https://doi.org/10.1093/scan/nsl006

      Joshi, S., & Gold, J. I. (2019). Pupil size as a window on neural substrates of cognition. Trends in Cognitive Sciences, 24(6), 466–480. https://doi.org/10.31234/osf.io/dvsme

      Liddell, B. J., Brown, K. J., Kemp, A. H., Barton, M. J., Das, P., Peduto, A., Gordon, E., & Williams, L. M. (2005). A direct brainstem–amygdala–cortical ‘alarm’ system for subliminal signals of fear. NeuroImage, 24(1), 235–243.

      Ogren, M., Kaplan, B., Peng, Y., Johnson, K. L., & Johnson, S. P. (2019). Motion or emotion: infants discriminate emotional biological motion based on low-level visual information. Infant Behavior and Development, 57, 101324. https://doi.org/10.1016/j.infbeh.2019.04.006

      Peelen, M. V., Atkinson, A. P., Andersson, F., & Vuilleumier, P. (2007). Emotional modulation of body-selective visual areas. Social Cognitive and Affective Neuroscience, 2(4), 274–283. https://doi.org/10.1093/scan/nsm023

      Simion, F., Regolin, L., & Bulf, H. (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813. https://doi.org/10.1073/pnas.0707021105

      van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: a review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y

      Williams, M. A., Morris, A. P., McGlone, F., Abbott, D. F., & Mattingley, J. B. (2004). Amygdala responses to fearful and happy facial expressions under conditions of binocular suppression. Journal of Neuroscience, 24(12), 2898-2904.

      (2) I also found no systematic discussion and theoretical contributions regarding the correlation with the autistic traits. If the main point of this paper is to highlight an implicit and objective behavioral marker of the autistic trait, more interpretation and discussion of the links between the results and existing findings in ASD are needed.

      We thank the reviewer for this insightful suggestion. The perception of biological motion (BM) has long been considered an important hallmark of social cognition. Numerous studies have reported that individuals with social cognitive deficits (e.g., ASD) are impaired in BM perception (Blake et al., 2003; Freitag et al., 2008; Klin et al., 2009; Nackaerts et al., 2012). More recently, it has been pointed out that the extraction of more complex social information (e.g., emotions, intentions) from BM, as compared to basic BM recognition, could be more effective in detecting ASD (Federici et al., 2020; Koldewyn et al., 2009; Parron et al., 2008; Todorova et al., 2019). Specifically, a meta-analysis found that the effect size nearly doubled when the task required emotion recognition as compared to simple perception/detection (Todorova et al., 2019). However, high-functioning individuals with ASD have been reported to perform comparably to the control group in explicitly labelling BM emotions, although their responses were rather delayed (Mazzoni et al., 2021). This suggests that individuals with ASD could adopt compensatory strategies to complete the explicit BM labelling task, while their automatic behavioural responses remain impaired. This highlights the importance of using more objective measures that do not rely on active reports to investigate the intrinsic perception of emotions from BM and its relationship with ASD-related social deficits. The current study thus introduced pupil size measurement to this field, and we combined it with a passive viewing task to investigate the more automatic aspect of BM emotion processing. More importantly, in addition to individuals diagnosed with ASD, the non-clinical general population also manifests autistic tendencies that follow a normal distribution and demonstrate substantial heritability (Hoekstra et al., 2007). Here, we focused on the autistic tendencies in the general population, and our results showed that pupil modulations by BM emotions were indicative of individual autistic traits. Specifically, passively viewing the happy BMs evoked larger pupil responses than the sad BMs, while such emotional modulation diminished with the increase of autistic tendencies. A more detailed test-retest examination further showed that this correlation was driven by a general diminishment in pupil modulation effects by emotional BM (happy or sad) for individuals with high autistic tendencies. This finding demonstrates that the automatic emotion processing of BM stimuli is impaired in individuals with high autistic tendencies, lending support to previous studies (Hubert et al., 2006; Nackaerts et al., 2012; Parron et al., 2008). It also indicates the utility of emotional BM stimuli and pupil measurement in identifying ASD-related tendencies in both clinical and non-clinical populations. We have added these points to the revised text (see lines 347-375).

      References:

      Blake, R., Turner, L. M., Smoski, M. J., Pozdol, S. L., & Stone, W. L. (2003). Visual recognition of biological motion is impaired in children with autism. Psychological Science, 14(2), 151–157. https://doi.org/10.1111/1467-9280.01434

      Federici, A., Parma, V., Vicovaro, M., Radassao, L., Casartelli, L., & Ronconi, L. (2020). Anomalous perception of biological motion in autism: a conceptual review and meta-analysis. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-61252-3

      Freitag, C. M., Konrad, C., Häberlen, M., Kleser, C., von Gontard, A., Reith, W., Troje, N. F., & Krick, C. (2008). Perception of biological motion in autism spectrum disorders. Neuropsychologia, 46(5), 1480–1494. https://doi.org/10.1016/j.neuropsychologia.2007.12.025

      Hoekstra, R. A., Bartels, M., Verweij, C. J. H., & Boomsma, D. I. (2007). Heritability of autistic traits in the general population. Archives of Pediatrics & Adolescent Medicine, 161(4), 372. https://doi.org/10.1001/archpedi.161.4.372

      Hubert, B., Wicker, B., Moore, D. G., Monfardini, E., Duverger, H., Fonséca, D. D., & Deruelle, C. (2006). Brief report: recognition of emotional and non-emotional biological motion in individuals with autistic spectrum disorders. Journal of Autism and Developmental Disorders, 37(7), 1386–1392. https://doi.org/10.1007/s10803-006-0275-y

      Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. https://doi.org/10.1038/nature07868

      Koldewyn, K., Whitney, D., & Rivera, S. M. (2009). The psychophysics of visual motion and global form processing in autism. Brain, 133(2), 599–610. https://doi.org/10.1093/brain/awp272

      Mazzoni, N., Ricciardelli, P., Actis-Grosso, R., & Venuti, P. (2021). Difficulties in recognising dynamic but not static emotional body movements in autism spectrum disorder. Journal of Autism and Developmental Disorders, 52(3), 1092–1105. https://doi.org/10.1007/s10803-021-05015-7

      Nackaerts, E., Wagemans, J., Helsen, W., Swinnen, S. P., Wenderoth, N., & Alaerts, K. (2012). Recognizing biological motion and emotions from point-light displays in autism spectrum disorders. PLoS ONE, 7(9), e44473. https://doi.org/10.1371/journal.pone.0044473

      Parron, C., Da Fonseca, D., Santos, A., Moore, D. G., Monfardini, E., & Deruelle, C. (2008). Recognition of biological motion in children with autistic spectrum disorders. Autism, 12(3), 261–274. https://doi.org/10.1177/1362361307089520

      Todorova, G. K., Hatton, R. E. M., & Pollick, F. E. (2019). Biological motion perception in autism spectrum disorder: a meta-analysis. Molecular Autism, 10(1). https://doi.org/10.1186/s13229-019-0299-8

      Reviewer #2 (Public Review):

      Summary:

      Through a series of four experiments, Yuan, Wang and Jiang examined pupil size responses to emotion signals in point-light motion stimuli. Experiment 1 examined upright happy, sad and neutral point-light biological motion (BM) walkers. The happy BM induced a significantly larger pupil response than the neutral, whereas the sad BM evoked a significantly smaller pupil size than the neutral BM. Experiment 2 examined inverted BM walkers. Experiment 3 examined BM stimuli with acceleration removed. No significant effects of emotion were found in either Experiment 2 or Experiment 3. Experiment 4 examined scrambled BM stimuli, in which local motion features were preserved while the global configuration was disrupted. Interestingly, the scrambled happy and sad BM led to significantly greater pupil size than the scrambled neutral BM at a relatively early time, while no significant difference between the scrambled happy and sad BM was found. Thus, the authors argue that these results suggest multi-level processing of emotions in life motion signals.

      Strengths:

      The experiments were carefully designed and well-executed, with point-light stimuli that eliminate many potential confounding effects of low-level visual features such as luminance, contrast, and spatial frequency.

      Weaknesses:

      Correlation results with limited sample size should be interpreted with extra caution.

      Thanks for pointing this out. To strengthen the correlation results, we have conducted a replication experiment (Exp.1b) and added a test-retest examination to further assess the reliability of our measurements. Specifically, a new group of 24 participants (16 females, 8 males) were recruited to perform the identical experiment procedure as in Experiment 1. Then, after at least seven days, they were asked to return to the lab for a retest. The results successfully replicated the previously reported main effect of emotional condition in both the first test (F(2, 46) = 12.0, p < .001, ηp2 = 0.34, Author response image 1A) and the second test (F(2, 46) = 14.8, p < .001, ηp2 = 0.39, Author response image 1B). The happy BM induced a significantly larger pupil response than the neutral BM (First Test: t(23) = 2.60, p = .022, Cohen’s d = 0.53, 95% CI for the mean difference = [0.02, 0.14], Holm-corrected, p = .048 after Bonferroni correction, Author response image 1A; Second Test: t(23) = 3.36, p = .005, Cohen’s d = 0.68, 95% CI for the mean difference = [0.06, 0.24], Holm-corrected, p = .008 after Bonferroni correction, Author response image 1B). On the contrary, the sad BM induced a significantly smaller pupil response than the neutral BM (First Test: t(23) = -2.77, p = .022, Cohen’s d = 0.57, 95% CI for the mean difference = [-0.19, -0.03], Holm-corrected, p = .033 after Bonferroni correction; Second Test: t(23) = -3.19, p = .005, Cohen’s d = 0.65, 95% CI for the mean difference = [-0.24, -0.05], Holm-corrected, p = .012 after Bonferroni correction, Author response image 1B). 
Besides, the happy BM induced significantly larger pupil response than the sad BM (first test: t(23) = 4.23, p < .001, Cohen’s d = 0.86, 95% CI for the mean difference = [0.10, 0.28], Holm-corrected, p < .001 after Bonferroni correction, Author response image 1A; second test: t(23) = 4.26, p < .001, Cohen’s d = 0.87, 95% CI for the mean difference = [0.15, 0.44], Holm-corrected, p < .001 after Bonferroni correction, Author response image 1B). The results of the cluster-based permutation analysis were also similar (see Supplementary Material for more details).
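
      The paragraph above reports each pairwise comparison with both a Holm-corrected and a Bonferroni-corrected p-value. As a rough illustration of how the two corrections differ, the following sketch applies both to three raw p-values. The raw values are assumptions made for this example: 0.016 and 0.011 are back-calculated by dividing the reported first-test Bonferroni values (.048 and .033) by three, and 0.0003 is a placeholder for the happy versus sad comparison, which is reported only as p < .001. The function names are illustrative and do not come from the authors' analysis code.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the sorted order.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Assumed raw p-values (see the note above): happy vs. neutral,
# sad vs. neutral, happy vs. sad.
raw = [0.016, 0.011, 0.0003]
labels = ["happy vs. neutral", "sad vs. neutral", "happy vs. sad"]
for label, p_holm, p_bonf in zip(labels, holm(raw), bonferroni(raw)):
    print(f"{label}: Holm = {p_holm:.3f}, Bonferroni = {p_bonf:.3f}")
```

      Under these assumed inputs, both happy versus neutral and sad versus neutral receive the same Holm-adjusted value, because the running maximum carries the larger adjusted value forward. This would be consistent with the identical Holm-corrected p = .022 reported for both comparisons in the first test.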

      Author response image 1.

      Normalized mean pupil responses in the replication experiment (Experiment 1b) of Experiment 1a and its retest, using the neutral condition as baseline, plotted against happy and sad conditions. (A) In the first test, the group average pupil response to happy intact BM is significantly larger than that to sad and neutral BM, while the pupil response induced by sad BM is significantly smaller than that evoked by neutral BM, replicating the results of Experiment 1a. (B) Moreover, such results were similarly found in the second test.

      Notably, we successfully replicated the negative correlation between the happy over sad dilation effect and individual autistic traits in the first test (r(23) = -0.46, p = .023, 95% CI = [-0.73, -0.07], Author response image 2A). No other significant correlations were found (see Author response image 2B-C). Moreover, in the second test, such a correlation was similarly found and was even stronger (r(23) = -0.61, p = .002, 95% CI = [-0.81, -0.27], Author response image 2D). We also performed a test-retest reliability analysis on the happy over sad pupil dilation effect and the AQ score. The results showed robust correlations. See Author response table 1 for more details.

      Author response table 1.

      Reliability of pupil size and AQ indices.

      Importantly, in the second test, we also observed a significant negative correlation between AQ and the happy minus neutral pupil dilation effect (r(23) = -0.44, p = .032, 95% CI = [-0.72, -0.04], Author response image 2E), and a significant positive correlation between the sad minus neutral pupil size and AQ (r(23) = 0.50, p = .014, 95% CI = [0.12, 0.75], Author response image 2F). This indicated that the overall correlation between the happy over sad dilation effect and AQ was driven both by the diminished happy dilation effect and by the diminished sad constriction effect. Overall, our replication experiment consistently found a significant negative correlation between AQ and the happy over sad dilation effect in both the test and the retest. Moreover, it revealed that this effect reflected both a negative correlation between AQ and the happy-neutral pupil response and a positive correlation between AQ and the sad-neutral pupil response, demonstrating a general impairment in BM emotion perception (happy or sad) for individuals with high autistic tendencies. This also indicated the utility of adopting a test-retest pupil examination to more precisely detect individual autistic tendencies. We have added these points in the revised text (see lines 135-173, lines 178-180).
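
      The intervals reported alongside the correlation coefficients can be approximated with the Fisher z-transformation. The response does not state how the intervals were computed, so the method below is an assumption, as is the sample size n = 24 (the number of participants recruited for the replication experiment). The function name `pearson_ci` is illustrative.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is treated as approximately
    normal with standard error 1 / sqrt(n - 3), and the interval bounds are
    mapped back to the correlation scale with tanh.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Correlations reported in the response, with the assumed n = 24.
for r in (-0.46, -0.61, -0.44, 0.50):
    low, high = pearson_ci(r, 24)
    print(f"r = {r:+.2f}: 95% CI = [{low:.2f}, {high:.2f}]")
```

      Under these assumptions the sketch returns intervals that round to the values reported for the four correlations, which suggests the bracketed ranges are intervals for r itself. The wide ranges also illustrate why the reviewer advised caution when interpreting correlations from a sample of this size.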

      Author response image 2.

      Correlation results for pupil modulation effects and AQ scores in the replication experiment (Experiment 1b) of Experiment 1a and its retest. (A) We replicated the negative correlation between the happy over sad pupil dilation effect and AQ in the first test. (B-C) No other significant correlations were found. (D) In the second test, the negative correlation between the happy over sad pupil dilation effect and AQ was similarly observed and even stronger. (E-F) Moreover, the happy vs. neutral pupil dilation effect and the sad vs. neutral pupil constriction effect respectively correlate with AQ in the second test.

      It would be helpful to add discussions as a context to compare the current results with pupil size reactions to emotion signals in picture stimuli.

      Thanks for this thoughtful comment. The modulation of pupil responses by emotional information has mostly been investigated using picture stimuli. Bradley et al. (2008) first demonstrated that humans showed larger pupil responses towards emotional images as compared to neutral images, while no difference was observed between positive and negative images. This was regarded as the result of increased sympathetic activity induced by emotional arousal that is independent of emotional valence. Similar results have been replicated with different presentation durations, repetition settings, and tasks (Bradley & Lang, 2015; Snowden et al., 2016). However, the emotional stimuli adopted in these studies were mostly complicated scene images that conveyed rather general emotional information. When it comes to the specific emotion cues (e.g., fear, anger, happy, sad) delivered by our conspecifics through biologically salient signals (e.g., faces, gestures, voices), the results are mixed. Some studies demonstrated that fearful, disgusted, and angry static faces induced larger pupil sizes than neutral faces, while sad and happy faces failed to induce such pupil dilatory effects (Burley et al., 2017). In contrast, other studies observed larger pupil responses for happy faces as compared to sad and fearful faces (Aktar et al., 2018; Burley & Daughters, 2020; Jessen et al., 2016). These conflicting results could be due to the low-level confounds of emotional faces (e.g., eye size) (Carsten et al., 2019; Harrison et al., 2006). Similar to faces, BM also conveys salient cues concerning the emotional states of our interactive partners. However, BM stimuli are highly simplified and deprived of various irrelevant visual confounders (e.g., body shape). Here, we reported that the happy BM induced a stronger pupil response than the neutral and sad BM, lending support to the happy dilation effect observed with faces (Burley & Daughters, 2020; Prunty et al., 2021). 
Moreover, it helps ameliorate the concern regarding the low-level confounding factors by identifying similar pupil modulations in another type of social signal with distinctive perceptual features. We have added these points to the revised text (see lines 301-321).

      References:

      Aktar, E., Mandell, D. J., de Vente, W., Majdandžić, M., Oort, F. J., van Renswoude, D. R., Raijmakers, M. E. J., & Bögels, S. M. (2018). Parental negative emotions are related to behavioral and pupillary correlates of infants’ attention to facial expressions of emotion. Infant Behavior and Development, 53, 101–111. https://doi.org/10.1016/j.infbeh.2018.07.004

      Bradley, M. M., & Lang, P. J. (2015). Memory, emotion, and pupil diameter: repetition of natural scenes. Psychophysiology, 52(9), 1186–1193. https://doi.org/10.1111/psyp.12442

      Bradley, M. M., Miccoli, L., Escrig, M. A., & Lang, P. J. (2008). The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology, 45(4), 602–607. https://doi.org/10.1111/j.1469-8986.2008.00654.x

      Burley, D. T., & Daughters, K. (2020). The effect of oxytocin on pupil response to naturalistic dynamic facial expressions. Hormones and Behavior, 125, 104837. https://doi.org/10.1016/j.yhbeh.2020.104837

      Burley, D. T., Gray, N. S., & Snowden, R. J. (2017). As far as the eye can see: relationship between psychopathic traits and pupil response to affective stimuli. PLOS ONE, 12(1), e0167436. https://doi.org/10.1371/journal.pone.0167436

      Carsten, T., Desmet, C., Krebs, R. M., & Brass, M. (2019). Pupillary contagion is independent of the emotional expression of the face. Emotion, 19(8), 1343–1352. https://doi.org/10.1037/emo0000503

      Harrison, N. A., Singer, T., Rotshtein, P., Dolan, R. J., & Critchley, H. D. (2006). Pupillary contagion: central mechanisms engaged in sadness processing. Social Cognitive and Affective Neuroscience, 1(1), 5–17. https://doi.org/10.1093/scan/nsl006

      Jessen, S., Altvater-Mackensen, N., & Grossmann, T. (2016). Pupillary responses reveal infants’ discrimination of facial emotions independent of conscious perception. Cognition, 150, 163–169. https://doi.org/10.1016/j.cognition.2016.02.010

      Prunty, J. E., Keemink, J. R., & Kelly, D. J. (2021). Infants show pupil dilatory responses to happy and angry facial expressions. Developmental Science, 25(2). https://doi.org/10.1111/desc.13182

      Snowden, R. J., O’Farrell, K. R., Burley, D., Erichsen, J. T., Newton, N. V., & Gray, N. S. (2016). The pupil’s response to affective pictures: role of image duration, habituation, and viewing mode. Psychophysiology, 53(8), 1217–1223. https://doi.org/10.1111/psyp.12668

      Overall, I think this is a well-written paper with solid experimental results that support the claim of the authors, i.e., the human visual system may process emotional information in biological motion at multiple levels. Given the key role of emotion processing in normal social cognition, the results will be of interest not only to basic scientists who study visual perception, but also to clinical researchers who work with patients of social cognitive disorders. In addition, this paper suggests that examining pupil size responses could be a very useful methodological tool to study brain mechanisms underlying emotion processing.

      Reviewer #3 (Public Review):

      Summary:

      The overarching goal of the authors was to understand whether emotional information conveyed through point-light biological motion can trigger automatic physiological responses, as reflected in pupil size.

      Strengths:

      This manuscript has several noticeable strengths: it addresses an intriguing research question that fills a gap in the existing literature, gives a clear and accurate account of the current literature, and reports a series of experiments and control experiments with adequate sample sizes. Yet it also has several noticeable limitations, especially in the study design and statistical analyses.

      Weaknesses:

      (1) Study design:

      (1.1) Dependent variable:

      Emotional attention is known to modulate both microsaccades and pupil size. Given the existing pupillometry data that the authors have collected, it would be both possible and valuable to determine whether the rate of microsaccades is also influenced by emotional biological motion.

      We thank the reviewer for this advice. Microsaccades help maintain visibility by continuously shifting the retinal image to overcome visual adaptation (Martinez-Conde et al., 2006). They are also sensitive to attentional processes (Baumeler et al., 2020; Engbert & Kliegl, 2003b; Meyberg et al., 2017) and can reflect the activity of the superior colliculus (SC) and other related brain areas (Martinez-Conde et al., 2009, 2013). Previous studies have found that, compared with neutral and pleasant images, unpleasant images significantly inhibit early microsaccade rates (Kashihara, 2020; Kashihara et al., 2013). This is regarded as the result of retaining previous crucial information at the expense of updating new visual input. We agree with the reviewer that it would be valuable to investigate whether emotional information conveyed by BM could modulate microsaccades. However, it should be noted that our data collection and experimental design are not optimized for this purpose. We recorded only the left eye’s data, whereas many methodological studies have questioned the reliability of using one eye’s data to analyze microsaccades (Fang et al., 2018; Hauperich et al., 2020; Nyström et al., 2017) and suggested that microsaccades should be defined by spontaneous binocular eye movements (Engbert & Kliegl, 2003a, 2003b). Besides, according to Kashihara et al. (2013), participants showed differential microsaccade rates after the stimuli disappeared so as to maintain the previously observed emotional information. In the current study, however, we discarded the data recorded after stimulus offset, making it impossible to analyze microsaccades in that period. Despite these disadvantages, we have attempted to analyze the microsaccade rate during stimulus presentation using only the left eye’s data. Specifically, we applied the algorithm developed by Otero-Millan et al. (2014) (minimum duration = 6 ms, maximum amplitude = 1.5 degrees, maximum velocity = 150 degrees/sec) to the left eye’s data from 100 ms before to 4000 ms after stimulus onset. Subsequently, we calculated the microsaccade rates using a moving window of 100 ms (stepped in 1 ms) (Engbert & Kliegl, 2003b; Kashihara et al., 2013). The microsaccade rate displayed a typical curve, with suppression shortly after stimulus appearance (inhibition phase), followed by an increased rate of microsaccade occurrence (rebound phase). The cluster-based permutation analysis was then applied to explore the modulation of BM emotions on microsaccade rates. However, no significant differences among the emotional conditions (happy, sad, neutral) were found in any of the four experiments.
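      The moving-window rate computation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the pooling of onsets across trials, and the centered half-open window are all assumptions made for the example.

```python
def microsaccade_rate(onsets_ms, n_trials, t_start=-100, t_end=4000,
                      window_ms=100, step_ms=1):
    """Moving-window microsaccade rate (hypothetical helper).

    onsets_ms: microsaccade onset times (ms, relative to stimulus onset),
               pooled over all trials of one condition.
    Returns (centers, rates), where rates are in microsaccades per second
    per trial.
    """
    half = window_ms / 2.0
    centers, rates = [], []
    t = t_start + half  # first window lies fully inside the analysis epoch
    while t + half <= t_end:
        # Count onsets in the half-open window [t - half, t + half).
        count = sum(1 for onset in onsets_ms if t - half <= onset < t + half)
        centers.append(t)
        # Convert the count to events per second per trial.
        rates.append(count * 1000.0 / (n_trials * window_ms))
        t += step_ms
    return centers, rates
```

      Each rate curve produced this way (one per participant and condition) would then be the input to a cluster-based permutation test across conditions.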

      Author response image 3.

      Time-series change in the microsaccade rates to happy, sad, and neutral BM in Experiments 1-4. Solid lines represent microsaccade rates under each emotional condition as a function of time (happy: red; sad: blue; neutral: gray); shaded areas represent the SEM between participants. No significant differences were found after cluster-based permutation correction for the four experiments.

      It is important to note that the microsaccade rate analysis was conducted on only the left eye’s data and that the experimental design is not optimized for this analysis; extra caution should therefore be exercised in interpreting the results. Still, we consider it innovative and important to combine the microsaccade index with pupil size to holistically investigate the processing of emotional information in BM, and future studies should adopt more suitable recording techniques and experimental designs to further probe this issue. We have discussed this issue in the revised text (see lines 339-344).

      References:

      Baumeler, D., Schönhammer, J. G., & Born, S. (2020). Microsaccade dynamics in the attentional repulsion effect. Vision Research, 170, 46–52. https://doi.org/10.1016/j.visres.2020.03.009

      Engbert, R., & Kliegl, R. (2003a). Binocular coordination in microsaccades. In The Mind’s Eye (pp. 103–117). Elsevier. https://doi.org/10.1016/b978-044451020-4/50007-4

      Engbert, R., & Kliegl, R. (2003b). Microsaccades uncover the orientation of covert attention. Vision Research, 43(9), 1035–1045. https://doi.org/10.1016/s0042-6989(03)00084-1

      Fang, Y., Gill, C., Poletti, M., & Rucci, M. (2018). Monocular microsaccades: do they really occur? Journal of Vision, 18(3), 18. https://doi.org/10.1167/18.3.18

      Hauperich, A.-K., Young, L. K., & Smithson, H. E. (2020). What makes a microsaccade? a review of 70 years research prompts a new detection method. Journal of Eye Movement Research, 12(6). https://doi.org/10.16910/jemr.12.6.13

      Kashihara, K. (2020). Microsaccadic modulation evoked by emotional events. Journal of Physiological Anthropology, 39(1). https://doi.org/10.1186/s40101-020-00238-6

      Kashihara, K., Okanoya, K., & Kawai, N. (2013). Emotional attention modulates microsaccadic rate and direction. Psychological Research, 78(2), 166–179. https://doi.org/10.1007/s00426-013-0490-z

      Martinez-Conde, S., Macknik, S. L., Troncoso, X. G., & Dyar, T. A. (2006). Microsaccades counteract visual fading during fixation. Neuron, 49(2), 297–305. https://doi.org/10.1016/j.neuron.2005.11.033

      Martinez-Conde, S., Macknik, S. L., Troncoso, X. G., & Hubel, D. H. (2009). Microsaccades: a neurophysiological analysis. Trends in Neurosciences, 32(9), 463–475. https://doi.org/10.1016/j.tins.2009.05.006

      Martinez-Conde, S., Otero-Millan, J., & Macknik, S. L. (2013). The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience, 14(2), 83–96. https://doi.org/10.1038/nrn3405

      Meyberg, S., Sinn, P., Engbert, R., & Sommer, W. (2017). Revising the link between microsaccades and the spatial cueing of voluntary attention. Vision Research, 133, 47–60. https://doi.org/10.1016/j.visres.2017.01.001

      Nyström, M., Andersson, R., Niehorster, D. C., & Hooge, I. (2017). Searching for monocular microsaccades – a red herring of modern eye trackers? Vision Research, 140, 44–54. https://doi.org/10.1016/j.visres.2017.07.012

      Otero-Millan, J., Castro, J. L. A., Macknik, S. L., & Martinez-Conde, S. (2014). Unsupervised clustering method to detect microsaccades. Journal of Vision, 14(2), 18–18. https://doi.org/10.1167/14.2.18

      (1.2) Stimuli:

      It appears that the speed of the emotional biological motion stimuli mimics the natural pace of the emotional walker. What is the average velocity of the biological motion stimuli for each condition?

      Thanks for pointing out this issue. The neutral and emotional (sad or happy) BM stimuli are equal in walking speed (one step for one second, 1Hz). We have also computed their physical velocity by calculating the Euclidean distance in pixel space of each key point between adjacent frames (Poyo Solanas et al., 2020). The velocity was 5.76 pixels/frame for the happy BM, 4.14 pixels/frame for the neutral BM, and 3.21 pixels/frame for the sad BM. This difference in velocity profile was considered an important signature for conveying emotional information, as the happy walker was characterized by a larger step pace and longer arm swing and the sad walker would instead exhibit a slouching gait with short slow strides and smaller arm movement (Barliya et al., 2012; Chouchourelou et al., 2006; Halovic & Kroos, 2018; Roether et al., 2009). More importantly, our current results could not be explained by the differences in velocities. This is because the inverted emotional BM with identical velocity characteristics failed to induce any modulations on pupil responses. Furthermore, the local sad and happy BM differed the most in velocity feature, while they induced similar modulations on pupil sizes. We have added these points in the revised text (see lines 254-257, 484-491).
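      The velocity measure described above (Euclidean displacement of each key point between adjacent frames) could be computed along the following lines. This is a sketch under assumptions: the function name and data layout are hypothetical, and averaging over all key points and frame pairs is one plausible reading of how a single pixels/frame value was obtained.

```python
import math

def mean_velocity(frames):
    """Mean key-point displacement in pixels per frame (hypothetical helper).

    frames: list of frames; each frame is a list of (x, y) key-point
            positions in pixels, in the same key-point order on every frame.
    """
    total, n = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            # Euclidean distance travelled by this key point in one frame.
            total += math.hypot(x1 - x0, y1 - y0)
            n += 1
    if n == 0:
        raise ValueError("need at least two frames with one key point each")
    return total / n
```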

      References:

      Barliya, A., Omlor, L., Giese, M. A., Berthoz, A., & Flash, T. (2012). Expression of emotion in the kinematics of locomotion. Experimental Brain Research, 225(2), 159–176. https://doi.org/10.1007/s00221-012-3357-4

      Chouchourelou, A., Matsuka, T., Harber, K., & Shiffrar, M. (2006). The visual analysis of emotional actions. Social Neuroscience, 1(1), 63–74. https://doi.org/10.1080/17470910600630599

      Halovic, S., & Kroos, C. (2018). Not all is noticed: kinematic cues of emotion-specific gait. Human Movement Science, 57, 478–488. https://doi.org/10.1016/j.humov.2017.11.008

      Poyo Solanas, M., Vaessen, M. J., & de Gelder, B. (2020). The role of computational and subjective features in emotional body expressions. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-63125-1

      Roether, C. L., Omlor, L., Christensen, A., & Giese, M. A. (2009). Critical features for the perception of emotion from gait. Journal of Vision, 9(6), 15–15. https://doi.org/10.1167/9.6.15

      When the authors used inverted biological motion stimuli, they didn't observe any modulation in pupil size. Could there be a difference in microsaccades when comparing inverted emotional biological motion stimuli?

      Thanks for this consideration. Both microsaccades and pupil size can provide valuable insights into the underlying neural dynamics of attention and cognitive control (Baumeler et al., 2020; Engbert & Kliegl, 2003; Meyberg et al., 2017). Notably, previous studies have shown that microsaccades and pupil size can be similar and highly correlated in reflecting various cognitive processes, such as multisensory integration, inhibitory control, and cognitive load (Krejtz et al., 2018; Wang et al., 2017; Wang & Munoz, 2021). Moreover, the generation of both microsaccades and pupil responses involves shared neural circuits, including the midbrain structure superior colliculus (SC) and the noradrenergic system (Hafed et al., 2009; Hafed & Krauzlis, 2012; Wang et al., 2012). However, pupil size could be more sensitive than microsaccade rates in contexts such as affective priming (Krejtz et al., 2020) and decision formation (Strauch et al., 2018). In addition, many previous studies have shown that inversion significantly disrupts the perception of emotions from BM (Atkinson et al., 2007; Dittrich et al., 1996; Spencer et al., 2016; Yuan et al., 2022, 2023). Overall, microsaccade rates are unlikely to show significant differences when comparing inverted emotional biological motion stimuli. We have nevertheless analyzed the microsaccade rate in the inverted BM condition, and the results showed no significant differences (see also Point 1.1, Author response image 3). Still, future studies should combine the microsaccade index and pupil size to provide a thorough understanding of BM emotion processing. We have discussed this issue in the revised text (see lines 339-344).

      References:

      Atkinson, A. P., Tunstall, M. L., & Dittrich, W. H. (2007). Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1), 59–72. https://doi.org/10.1016/j.cognition.2006.05.005

      Baumeler, D., Schönhammer, J. G., & Born, S. (2020). Microsaccade dynamics in the attentional repulsion effect. Vision Research, 170, 46–52. https://doi.org/10.1016/j.visres.2020.03.009

      Dittrich, W., Troscianko, T., Lea, S., & Morgan, D. (1996). Perception of emotion from dynamic point-light displays represented in dance. Perception, 25(6), 727–738. https://doi.org/10.1068/p250727

      Engbert, R., & Kliegl, R. (2003). Microsaccades uncover the orientation of covert attention. Vision Research, 43(9), 1035–1045. https://doi.org/10.1016/s0042-6989(03)00084-1

      Hafed, Z. M., Goffart, L., & Krauzlis, R. J. (2009). A neural mechanism for microsaccade generation in the primate superior colliculus. Science, 323(5916), 940–943. https://doi.org/10.1126/science.1166112

      Hafed, Z. M., & Krauzlis, R. J. (2012). Similarity of superior colliculus involvement in microsaccade and saccade generation. Journal of neurophysiology, 107(7), 1904-1916.

      Krejtz, K., Duchowski, A. T., Niedzielska, A., Biele, C., & Krejtz, I. (2018). Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. Plos One, 13(9), e0203629. https://doi.org/10.1371/journal.pone.0203629

      Krejtz, K., Żurawska, J., Duchowski, A., & Wichary, S. (2020). Pupillary and microsaccadic responses to cognitive effort and emotional arousal during complex decision making. Journal of Eye Movement Research, 13(5). https://doi.org/10.16910/jemr.13.5.2

      Meyberg, S., Sinn, P., Engbert, R., & Sommer, W. (2017). Revising the link between microsaccades and the spatial cueing of voluntary attention. Vision Research, 133, 47–60. https://doi.org/10.1016/j.visres.2017.01.001

      Spencer, J. M. Y., Sekuler, A. B., Bennett, P. J., Giese, M. A., & Pilz, K. S. (2016). Effects of aging on identifying emotions conveyed by point-light walkers. Psychology and Aging, 31(1), 126–138. https://doi.org/10.1037/a0040009

      Strauch, C., Greiter, L., & Huckauf, A. (2018). Pupil dilation but not microsaccade rate robustly reveals decision formation. Scientific Reports, 8(1). https://doi.org/10.1038/s41598-018-31551-x

      Wang, C.-A., Blohm, G., Huang, J., Boehnke, S. E., & Munoz, D. P. (2017). Multisensory integration in orienting behavior: pupil size, microsaccades, and saccades. Biological Psychology, 129, 36–44. https://doi.org/10.1016/j.biopsycho.2017.07.024

      Wang, C.-A., Boehnke, S. E., White, B. J., & Munoz, D. P. (2012). Microstimulation of the monkey superior colliculus induces pupil dilation without evoking saccades. Journal of Neuroscience, 32(11), 3629–3636. https://doi.org/10.1523/jneurosci.5512-11.2012

      Wang, C.-A., & Munoz, D. P. (2021). Differentiating global luminance, arousal and cognitive signals on pupil size and microsaccades. European Journal of Neuroscience, 54(10), 7560–7574. https://doi.org/10.1111/ejn.15508

      Yuan, T., Ji, H., Wang, L., & Jiang, Y. (2022). Happy is stronger than sad: emotional information modulates social attention. Emotion. https://doi.org/10.1037/emo0001145

      Yuan, T., Wang, L., & Jiang, Y. (2023). Cross-channel adaptation reveals shared emotion representation from face and biological motion. Emotion. In press.

      (2) Statistical analyses

      (2.1) Multiple comparisons:

      There are many posthoc comparisons throughout the manuscript. The authors should consider correction for multiple comparisons. Take Experiment 1 for example, it is important to note that the happy over neutral BM effect and the sad over neutral BM effect are no longer significant after Bonferroni correction, which is worth noting.

      Thanks for this suggestion. In our original analysis, we applied the Holm post-hoc correction for multiple comparisons. The Holm correction is a step-down method that is more powerful, and therefore less conservative, than the Bonferroni correction. We have now conducted the stricter Bonferroni post-hoc correction. In Experiment 1, the happy over neutral and happy over sad BM effects are still significant after the Bonferroni post-hoc correction (happy vs. neutral: p = .036; happy vs. sad: p = .009), and the sad over neutral comparison remains marginally significant (p = .071). Importantly, the test-retest replication experiment also yielded significant results for the comparisons between happy and neutral (First Test: p = .022, Holm-corrected, p = .048, Bonferroni-corrected; Second Test: p = .005, Holm-corrected, p = .008, Bonferroni-corrected), sad and neutral (First Test: p = .022, Holm-corrected, p = .033, Bonferroni-corrected; Second Test: p = .005, Holm-corrected, p = .012, Bonferroni-corrected, Author response image 1B), and happy and sad BM (First Test: p < .001, Holm-corrected, p < .001, Bonferroni-corrected; Second Test: p < .001, Holm-corrected, p < .001, Bonferroni-corrected). These results support the replicability and consistency of the reported significant contrasts. See also Point 2.3.

      In Experiment 4, the significance levels of all comparisons remained the same after Bonferroni post-hoc correction (happy vs. neutral: p = .011; sad vs. neutral: p = .007; happy vs. sad: p = 1.000). We have now added these results in the main text (See lines 119, 122, 124, 143, 145, 148, 150, 153, 155, 248, 251, 254).
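      To make the difference between the two corrections concrete, the sketch below implements both adjustments for a family of raw p-values. It is a generic illustration and not the statistics package the authors used; the function names are hypothetical, and the raw p-values in the usage note are invented for the example.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the family size."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending raw p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        adj = min(1.0, (m - rank) * pvals[i])
        # Enforce monotonicity so adjusted p-values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted
```

      For three hypothetical raw p-values of 0.01, 0.02, and 0.03, Bonferroni multiplies each by 3, whereas Holm multiplies them by 3, 2, and 1 and then enforces monotonicity, so every Holm-adjusted value is less than or equal to its Bonferroni counterpart.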

      (2.2) The authors present the correlation between happy over sad dilation effect and the autistic traits in Experiment 1, but do not report such correlations in Experiments 2-4. Did the authors collect the Autistic Quotient measure in Experiments 2-4? It would be informative if the authors could demonstrate the reproducibility (or lack thereof) of this happy-sad index in Experiments 2-4.

      We apologize for not making it clear. We have collected the AQ scores in Experiments 2-4. However, it should be pointed out that the happy over sad pupil dilation effect was only observed in Experiment 1. Moreover, we’ve again identified such happy over sad pupil dilation effect in the replication experiment (Experiment 1b) as well as its correlation with AQ. Instead, no significant correlations between AQ and the happy-sad pupil index were found in Experiments 2-4, see Author response image 4 for more details. We have reported these correlations in the main text (see lines 157-173, 190-194, 212-216, 257-262).

      Author response image 4.

      Correlations between the happy over sad pupil dilation effect and AQ scores. (A)  The happy over sad pupil dilation effect correlated negatively with individual autistic scores. (B-C) Such correlation was similarly observed in the test and retest of the replication experiment. (D-F) No such correlations were found for the inverted, nonbiological, and local BM stimuli.

      (2.3) The observed correlation between happy over sad dilation effect and the autistic traits in Experiment 1 seems rather weak. It could be attributed to the poor reliability of the Autistic Quotient measure or the author-constructed happy-sad index. Did the authors examine the test-retest reliability of their tasks or the Autistic Quotient measure?

      Thanks for this suggestion. We have now conducted a test-retest replication study to further confirm the observed significant correlations. Specifically, we recruited a new group of 24 participants (16 females, 8 males) to perform the identical procedure as in Experiment 1, and they were asked to return to the lab for a retest after at least seven days. We’ve replicated the significant main effect of emotional conditions in both the first test (F(2, 46) = 12.0, p < .001, ηp2 = 0.34) and the second test (F(2, 46) = 14.8, p < .001, ηp2 = 0.39). Besides, we also replicated the happy minus neutral pupil dilation effect (First Test: t(23) = 2.60, p = .022, Cohen’s d = 0.53, 95% CI for the mean difference = [0.02, 0.14], Holm-corrected, p = .048 after Bonferroni correction; Second Test: t(23) = 3.36, p = .005, Cohen’s d = 0.68, 95% CI for the mean difference = [0.06, 0.24], Holm-corrected, p = .008 after Bonferroni correction), and the sad minus neutral pupil constriction effect (First Test: t(23) = -2.77, p = .022, Cohen’s d = 0.57, 95% CI for the mean difference = [-0.19, -0.03], Holm-corrected, p = .033 after Bonferroni correction; Second Test: t(23) = -3.19, p = .005, Cohen’s d = 0.65, 95% CI for the mean difference = [-0.24, -0.05], Holm-corrected, p = .012 after Bonferroni correction). Additionally, the happy BM still induced a significantly larger pupil response than the sad BM (first test: t(23) = 4.23, p < .001, Cohen’s d = 0.86, 95% CI for the mean difference = [0.10, 0.28], Holm-corrected, p < .001 after Bonferroni correction; second test: t(23) = 4.26, p < .001, Cohen’s d = 0.87, 95% CI for the mean difference = [0.15, 0.44], Holm-corrected, p < .001 after Bonferroni correction).

      Notably, we’ve successfully replicated the negative correlation between the happy over sad dilation effect and individual autistic traits (r(23) = -0.46, p = .023, 95% CI for the mean difference = [-0.73, -0.07]). Such a correlation was similarly found and was even stronger in the retest (r(23) = -0.61, p = .002, 95% CI for the mean difference = [-0.81, -0.27]). A test-retest reliability analysis was conducted on the happy over sad pupil dilation effect and the AQ score. The results showed robust correlations (r(happy-sad pupil size)= 0.56; r(AQ)= 0.90) and strong test-retest reliabilities (α(happy-sad pupil size)= 0.60; α(AQ)= 0.82). We have added these results to the main text (see lines 135-173). See also Response to Reviewer #2 Response 1 for more details.
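      A test-retest analysis of this kind can be sketched as follows. The code assumes that r denotes the Pearson correlation between sessions and that α denotes Cronbach's alpha computed with the two sessions treated as items; the response does not state how α was computed, so this is an assumption, and the function names are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(sessions):
    """Cronbach's alpha with each session treated as one item.

    sessions: list of k score lists, one per session, with participants
              in the same order in every list.
    """
    k = len(sessions)
    item_var = sum(statistics.variance(s) for s in sessions)
    totals = [sum(scores) for scores in zip(*sessions)]  # per participant
    return k / (k - 1) * (1.0 - item_var / statistics.variance(totals))
```

      Calling `pearson_r(test, retest)` and `cronbach_alpha([test, retest])` on the per-participant happy-sad pupil index, and again on the AQ scores, would give the two kinds of reliability estimate.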

      (2.4) Relatedly, the happy over sad dilation effect is essentially a subtraction index. Without separately presenting the pupil size correlation with happy and sad BM in supplemental figures, it becomes challenging to understand what's primarily driving the observed correlation.

      Thanks for pointing this out. We have now presented the separate correlations between AQ and the pupil response towards happy and sad BM in Experiment 1 (see Author response image 5A), and the test-retest replication experiment of Experiment 1 (see Author response image 5B-C). No significant correlations were found. This is potentially because the raw pupil response is a mixed result of BM perception and emotion perception, while the variations in pupil sizes across emotional conditions could more faithfully reflect individual sensitivities to emotions in BM (Burley et al., 2017; Pomè et al., 2020; Turi et al., 2018).  

      Author response image 5.

      No significant correlations between AQ and pupil response towards happy and sad intact BM were found in Experiment 1a and the test-retest replication experiment (Experiment 1b).

      To probe what's primarily driving the observed correlation between happy-sad pupil size and AQ, we instead used the neutral as the baseline and separately correlated AQ with the happy-neutral and the sad-neutral pupil modulation effects. No significant correlation was found in Experiment 1a (Author response image 6A-B) and the first test of the replication experiment (Experiment 1b) (Author response image 6C-D). Importantly, in the second test of the replication experiment, we found a significant negative correlation between AQ and the happy-neutral pupil size (r(23) = -0.44, p = .032, 95% CI for the mean difference = [-0.72, -0.04], Author response image 6E), and a significant positive correlation between AQ and the sad-neutral pupil size (r(23) = 0.50, p = .014, 95% CI for the mean difference = [0.12, 0.75], Author response image 6F). This suggested that the overall correlation between AQ and the happy over sad dilation effect was driven by diminished pupil modulations towards both the happy and sad BM for high AQ individuals, demonstrating a general deficiency in BM emotion perception (happy or sad) among individuals with high autistic tendencies. It further revealed the potential of adopting a test-retest pupil examination to more precisely detect individual autistic tendencies. We have reported these results in the main text (see lines 166-173).
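      The confidence intervals attached to these correlations can be understood through the standard Fisher z-transformation. The sketch below is a generic illustration, not the authors' analysis code; the function name is hypothetical, and the sample size of 24 is taken from the replication experiment described above.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

      With r = -0.44 and n = 24, `pearson_ci(-0.44, 24)` returns roughly (-0.72, -0.04), which is in line with the interval reported for the happy-neutral correlation.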

      Author response image 6.

      Correlation results for pupil modulations and AQ scores. (A-B) In Experiment 1a, no significant correlation was observed between AQ and the happy pupil modulation effect, or between AQ and the sad pupil modulation effect. (C-D) Similarly, no significant correlations were found in the first test of the replication experiment (Experiment 1b). (E-F) Importantly, in the second test of Experiment 1b, the happy vs. neutral pupil dilation effect was negatively correlated with AQ, and the sad vs. neutral pupil constriction effect was positively correlated with AQ.

      References:

      Burley, D. T., Gray, N. S., & Snowden, R. J. (2017). As Far as the Eye Can See: Relationship between Psychopathic Traits and Pupil Response to Affective Stimuli. PLOS ONE, 12(1), e0167436. https://doi.org/10.1371/journal.pone.0167436

      Pomè, A., Binda, P., Cicchini, G. M., & Burr, D. C. (2020). Pupillometry correlates of visual priming, and their dependency on autistic traits. Journal of vision, 20(3), 3-3.

      Turi, M., Burr, D. C., & Binda, P. (2018). Pupillometry reveals perceptual differences that are tightly linked to autistic traits in typical adults. eLife, 7. https://doi.org/10.7554/elife.32399

      (2.5) For the sake of transparency, it is important to report all findings, not just the positive results, throughout the paper.

      Thanks for this suggestion. We have now reported all the correlation results between AQ and pupil modulation effects (happy-sad, happy-neutral, sad-neutral) in the main text (see lines 130-131, 157-162, 166-170, 190-194, 212-216, 257-262). Given that no significant correlations were observed between AQ and the raw pupil responses across the four experiments, we reported their correlations with AQ in the supplementary material. We have stated this point in the main text (see lines 132-134).

      (3) Structure

      (3.1) The Results section immediately proceeds to the one-way repeated measures ANOVA. This section could be more reader-friendly by including a brief overview of the task procedures and variables, e.g., shifting Fig. 3 to this section.

      Thanks for this advice. We have now added a brief overview of the task procedures and variables and we have also shifted the figure position (see lines 101-103).

      Reviewer #1 (Recommendations For The Authors):

      (1) I suggest that the authors first explain the task (i.e., Fig. 3) at the beginning of the results. It also seems more appropriate to show the time course figures (Fig. 2) before the bar plots (Fig. 1). If I understand correctly, the bar plots reflect the averaged data from the time course plots. Also, please clearly state the time window used to average the data. The results of the correlation analysis can be displayed in the last step.

      Thanks for this suggestion. We have now added a concise explanation of the task at the beginning of the results (see lines 101-103). We have also adjusted the figure positions and adjusted the order of our results according to the reviewer’s suggestion. The time window we used to average the data was from the onset of the stimuli until the end of the stimuli presentation. We have now clearly stated these issues in the revised text (see lines 111-112).

      (2) According to the above, I think a more reasonable arrangement should be Fig. 3, 2, and 1.

      Thanks for this suggestion. We have adjusted the figure positions accordingly.

      (3) Please include each subject's data points in the bar plots in Fig. 1.

      We have now presented each subject’s individual data point in the bar plot.

      (4) Lines 158-160 and 199-202 report interaction effects of the two-way ANOVA. This is good, but the direction of interaction effect should also be reported.

      We thank the reviewer for this suggestion. We have now reported the direction of the interaction effect. The significant interaction observed across Experiment 1 and Experiment 2 was mainly due to the diminished emotional modulation in inverted BM. The significant interaction across Experiment 1 and Experiment 3 was similarly caused by the lack of emotional modulation in nonbiological stimuli. The significant interaction across Experiment 1 and Experiment 4 could be primarily attributed to the absence of a pupil modulation effect between happy and sad local BM. We have specified these points in the revised text, see lines 198-199, 219-220, 267-269.

      Reviewer #3 (Recommendations For The Authors):

      (1) Number of experiments:

      As stated in the Methods section, this study seems to consist of five experiments (120/24=5) according to the description below. However, the current manuscript only reports findings from four of these experiments. Can the authors clarify on this matter?

      "A total of 120 participants (44 males, 76 females) ranging from 18 to 29 years old (M ± SD = 23.1 ± 2.5) were recruited, with 24 in each experiment."

      We apologize for not making it clear. The fifth group took part in a purely behavioral explicit emotion classification experiment (N=24) that served as a prior test to confirm that the local BM stimuli conveyed recognizable emotional information. We have now stated this more carefully in the revised text, see lines 456-458.

      (2) Emotion processing mechanism of BM

      "Mechanism" is a very strong word, suggesting a causal relationship. In the setting of a passive viewing task that lacks any behavioral report, it is possible that the observed changes in pupil size could be epiphenomenal, rather than serving as the underlying mechanism.

      Thanks for this suggestion. We have now either changed “mechanism” into “phenomenon” or deleted it. We have also carefully discussed how future studies could incorporate various behavioral, physiological, and neural indexes to yield more robust causal evidence on the potential mechanism underlying the observed multi-level BM emotion processing phenomenon.

      (3) Data sharing

      The authors could improve their efforts in promoting data transparency to ensure a comprehensive view of the results. This implies sharing deidentified raw data instead of summary data in an Excel spreadsheet.

      Thanks for this suggestion. We have now uploaded the deidentified raw data. (https://doi.org/10.57760/sciencedb.psych.00125).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have specifically addressed the points of uncertainty highlighted in eLife's editorial assessment, which concerned the lack of low-level acoustics control, limitations of experimental design, and in-depth analysis. Regarding “the lack of low-level acoustics control, limitations of experimental design”, in response to Reviewer #1, we clarify that our study aimed to provide a broad perspective (one that includes both auditory and higher-level processes) on the similarities and distinctions in processing natural speech and music within an ecological context. Regarding “the lack of in-depth analysis”, in response to Reviewers #1 and #2, we have clarified that while model-based analyses are valuable, they pose fundamental challenges when comparing speech and music. Non-acoustic features inherently differ between speech and music (such as phonemes and pitch), making direct comparisons reliant on somewhat arbitrary choices. Our approach mitigates this challenge by analyzing the entire neural signal, thereby avoiding potential pitfalls associated with encoding models of non-comparable features. Finally, we provide some additional analyses suggested by the Reviewers.

      We sincerely appreciate your thoughtful and thorough consideration throughout the review process.

      eLife assessment

      This study presents valuable intracranial findings on how two important types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid but somewhat incomplete since although the data analysis is thorough, the results are robust and the stimuli have ecological validity, important considerations such as low-level acoustics control, limitations of experimental design, and in-depth analysis, are lacking. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which the processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      The weakness of this study, in my view, lies in its experimental design and reasoning:

      (1) Despite using longer stimuli, the study does not significantly enhance ecological validity compared to previous research. The analyses treat these long speech and music stimuli as stationary signals, overlooking their intricate musical or linguistic structural details and temporal variation across local structures like sentences and phrases. In previous studies, short, less ecological segments of music were used, maintaining consistency in content and structure. However, this study, despite employing longer stimuli, does not distinguish between neural responses to the varied contents or structures within speech and music. Understanding the implications of long-term analyses, such as spectral and connectivity analyses over extended periods of around 10 minutes, becomes challenging when they do not account for the variable, sometimes quasi-periodical or even non-periodical, elements present in natural speech and music. When contrasting this study with prior research and highlighting its advantages, a more balanced perspective would have been beneficial in the manuscript.

      Regarding ecological validity, we respectfully hold a differing perspective from the reviewer. In our view, a one-second music stimulus lacks ecological validity, as real-world music always extends well beyond such a brief duration. While we acknowledge the trade-off in selecting longer stimuli, namely that it limits the diversity of musical styles, we maintain that only long stimuli afford participants an authentic musical listening experience. Conversely, shorter stimuli may lead participants to merely "skip through" musical excerpts rather than engage in genuine listening.

      Regarding the critique that we "did not distinguish between neural responses to the varied contents or structures within speech and music," we partly concur. Our TRF (temporal response function) analyses incorporate acoustic content, particularly the acoustic envelope, thereby addressing this concern to some extent. However, it is accurate to note that we did not model non-acoustic features. In acknowledging this limitation, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while treating phonemes and pitches as equivalent may be reasonable, it ultimately relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      Finally, we did take into account the reviewer’s remark and did our best to give a more balanced perspective of our approach and previous studies in the discussion.

      “While listening to natural speech and music rests on cognitively relevant neural processes, our analytical approach, extending over a rather long period of time, does not allow to directly isolate specific brain operations. Computational models -which can be as diverse as acoustic (Chi et al., 2005), cognitive (Giordano et al., 2021), information-theoretic (Di Liberto et al., 2020), or self-supervised neural network (Donhauser & Baillet, 2019 ; Millet et al., 2022) models- are hence necessary to further our understanding of the type of computations performed by our reported frequency-specific distributed networks. Moreover, incorporating models accounting for musical and linguistic structure can help us avoid misattributing differences between speech and music driven by unmatched sensitivity factors (e.g., arousal, emotion, or attention) as inherent speech or music selectivity (Mas-Herrero et al., 2013; Nantais & Schellenberg, 1999).”

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music.

      We acknowledge the reviewer's concern. Indeed, speech and music differ on various levels, including acoustic and cognitive aspects, and our analyses do not explicitly distinguish them. The aim of this study was to provide an overview of the similarities and differences between natural speech and music processing in an ecological context. Future work is needed to explore further the different hierarchical levels or networks composing such listening experiences. Of note, however, we report whole-brain results with high spatial resolution (thanks to iEEG recordings), enabling the distinction between auditory, superior temporal gyrus (STG), and higher-level responses. Our findings clearly highlight that both auditory and higher-level regions predominantly exhibit shared responses, challenging the interpretation that our results can be attributed solely to differences in 'basic acoustic characteristics'.

      We have now more clearly pointed out this reasoning in the results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The concept of selectivity - shared, preferred, and domain-selective - increases the risks of potentially overgeneralized interpretations and theoretical inaccuracies. The authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with post hoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not necessarily imply that a region is specifically selective to a type of stimulus like speech. The manuscript's narrative might lead to an overgeneralized interpretation that their findings apply broadly to speech or music. However, identifying differences in neural responses to a few sets of specific stimuli in one brain region does not robustly support such a generalization. This is because speech and music are inherently diverse, and specificity often relates more to the underlying functions than to observed neural responses to a limited number of examples of a stimulus type. See the next point.

      Exactly! Here, we present a precise operational definition of these terms, implemented with clear and rigorous statistical methods. It is important to note that in many cognitive neuroscience studies, the term "selective" is often used without a clear definition. By establishing operational definitions, we identified three distinct categories based on statistical testing of differences from baseline and between conditions. This approach provides a framework for more accurate interpretation of experimental findings, as now better outlined in the introduction:

      “Finally, we suggest that terms should be operationally defined based on statistical tests, which results in a clear distinction between shared, selective, and preferred activity. That is, be A and B two investigated cognitive functions, “shared” would be a neural population that (compared to a baseline) significantly and equally contributes to the processing of both A and B; “selective” would be a neural population that exclusively contributes to the processing of A or B (e.g. significant for A but not B); and “preferred” would be a neural population that significantly contributes to the processing of both A and B, but more prominently for A or B (Figure 1A).”
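      The operational definition quoted above can be read as a small decision rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the use of p-values, the `alpha` threshold, and the effect-size arguments are all hypothetical, and any multiple-comparison correction is left out.

```python
def classify_response(p_a, p_b, p_diff, effect_a, effect_b, alpha=0.05):
    """Label one neural population (e.g. one channel in one frequency band).

    p_a, p_b   : p-values for A vs. baseline and B vs. baseline (assumed inputs)
    p_diff     : p-value for the direct A vs. B contrast (assumed input)
    effect_a/b : baseline-relative effect sizes, used only to name the
                 favoured domain in the "preferred" case
    Returns (category, favoured_domain).
    """
    sig_a = p_a < alpha
    sig_b = p_b < alpha
    if sig_a and sig_b:
        if p_diff < alpha:
            # Both differ from baseline, and they differ from each other.
            favoured = "A" if abs(effect_a) > abs(effect_b) else "B"
            return ("preferred", favoured)
        # Both differ from baseline, no detectable difference between them.
        return ("shared", None)
    if sig_a:
        return ("selective", "A")  # significant for A but not B
    if sig_b:
        return ("selective", "B")  # significant for B but not A
    return ("none", None)          # neither condition differs from baseline


print(classify_response(0.01, 0.02, 0.60, 1.0, 1.1))
print(classify_response(0.01, 0.02, 0.01, 2.0, 1.0))
print(classify_response(0.01, 0.40, 0.01, 2.0, 0.1))
```

      Under this reading, a heightened response to one stimulus type alone never yields "selective"; the label requires the other condition to be indistinguishable from baseline.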

      Regarding the risk of over-generalization, we want to clarify that our manuscript does not claim that a specific region or frequency band is selective to speech or music. Because we test only excerpts of speech and music, we employ the reverse logical reasoning: "if 10 minutes of instrumental music activates a region traditionally associated with speech selectivity, we can conclude that this region is NOT speech-selective." Our conclusions revolve around the absence of selectivity rather than the presence of selective areas or frequency bands. In essence, "one counterexample is enough to disprove a theory." We have now further elaborated on this point in the discussion section:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents issues. For instance, in the cochlea, different stimuli activate different parts of the basilar membrane due to the distinct spectral contents of speech and music, with each part being selective to certain frequencies. However, this phenomenon reflects the frequency selectivity of the basilar membrane - an important function, not an inherent selectivity for speech or music. Similarly, if cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

      We completely agree with the last statement, as our primary goal was not to investigate the functional mechanisms underlying speech and music processing. However, the finding of a substantial portion of the cortical network as being shared between the two domains constrains our understanding of the underlying common operations. Regarding the initial part of the comment, we would like to clarify that in the framework we propose, if cortical regions show heightened responses to one type of stimulus over another, this falls into the ‘preferred’ category. The ‘selective’ (exclusive) category, on the other hand, would require that the region be unresponsive to one of the two stimuli.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. The selected musical stimuli, incorporating both vocals and multiple instrumental sounds, raise questions about the specificity of neural activation. For instance, it's unclear if the vocal elements in music and speech engage identical neural circuits. Additionally, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at a neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.

      We appreciate the reviewer's acknowledgment that delving into the intricate details of neural coding of speech and music was beyond the scope of this work. To address some of the more precise issues raised, we have clarified in the manuscript that our musical stimuli do not contain vocals and are purely instrumental. We apologize if this was not clear initially.

      “In the main experimental session, patients passively listened to ~10 minutes of storytelling (577 secs, La sorcière de la rue Mouffetard; Gripari, 2004) and ~10 minutes of instrumental music (580 secs, Reflejos del Sur; Oneness, 2006), separated by 3 minutes of rest.”

      Furthermore, we now acknowledge the importance of modeling melodic, phonetic, or linguistic features in the discussion, and we have referenced the work of Sankaran et al. (2023) and McCarty et al. (2023) in this regard. However, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while treating phonemes and pitches as equivalent may be reasonable, it ultimately relies on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      “These selective responses, not visible in primary cortical regions, seem independent of both low-level acoustic features and higher-order linguistic meaning (Norman-Haignere et al., 2015), and could subtend intermediate representations (Giordano et al., 2023) such as domain-dependent predictions (McCarty et al., 2023; Sankaran et al., 2023).”

      References:

      McCarty, M. J., Murphy, E., Scherschligt, X., Woolnough, O., Morse, C. W., Snyder, K., Mahon, B. Z., & Tandon, N. (2023). Intraoperative cortical localization of music and language reveals signatures of structural complexity in posterior temporal cortex. iScience, 26(7), 107223.

      Sankaran, N., Leonard, M. K., Theunissen, F., & Chang, E. F. (2023). Encoding of melody in the human auditory cortex. bioRxiv. https://doi.org/10.1101/2023.10.17.562771

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

      We appreciate the reviewer's concern, but we do not view this as a weakness for our study's purpose. Every method inherently has limitations, and intracranial recordings currently offer the best possible spatial specificity and temporal resolution for studying the human brain. Studying cell assemblies thoroughly in humans is ethically challenging, and examining speech and music in non-human primates or rats raises questions about cross-species analogy. Therefore, despite its limitations, we believe intracranial recording remains the best option for addressing these questions in humans.

      Regarding the granularity of neural representation, while understanding how computations occur in the central nervous system is crucial, we question whether the single neuron scale provides the most informative insights. Single neurons seem more versatile (e.g., in terms of cell type or layer affiliation) than the local circuitry they contribute to, which appears to be the brain's building block (e.g., the laminar organization; see Mendoza-Halliday et al., 2024). Additionally, the population dynamics of these functional modules appear crucial for cognition and behavior (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023). Therefore, we emphasize the need for multi-scale research, as we believe that a variety of approaches will compensate for each other's individual weaknesses. We clarified this in the introduction:

      “This approach rests on the idea that the canonical computations that underlie cognition and behavior are anchored in population dynamics of interacting functional modules (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023) and bound to spectral fingerprints consisting of network- and frequency-specific coherent oscillations (Siegel et al., 2012).”

      Importantly, we focus on the macro-scale and conclude that, at the anatomical region level, no speech or music selectivity can be observed during natural stimulation. This is stated in the discussion, as follows:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      References:

      Mendoza-Halliday, D., Major, A.J., Lee, N. et al. A ubiquitous spectrolaminar motif of local field potential power across the primate cortex. Nat Neurosci (2024).

      Safaie, M., Chang, J.C., Park, J. et al. Preserved neural dynamics across animals performing similar behaviour. Nature 623, 765–771 (2023).

      Buzsáki, G., & Vöröslakos, M. (2023). Brain rhythms have come of age. Neuron, 111(7), 922-926.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category, or ratio-based statistics, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing.

      To clarify, the metrics we are investigating (coherence, power, linear correlations) are continuous. Additionally, we conduct a comprehensive statistical analysis of these results. The statistical testing, which includes assessing differences from baseline and between the speech and music conditions using a statistical threshold, yields three categories. Of note, ratio-based statistics (a continuous metric) are provided in Figures S9 and S10 (Figures S8 and S9 in the original version of the manuscript).
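      As an illustration of what a continuous, ratio-based measure could look like, here is a minimal sketch. It is an assumption for illustration only: the source does not specify how the statistics in Figures S9 and S10 were computed, and the normalized-difference formula, the function name, and the `eps` guard are all hypothetical.

```python
import numpy as np

def selectivity_index(speech, music, eps=1e-12):
    """Normalized difference between two baseline-relative response strengths.

    speech, music : array-like of response magnitudes per channel
                    (sign is discarded; only strength is compared)
    Returns values in [-1, 1]: +1 means speech only, -1 means music only,
    0 means equal strength. This formula is a hypothetical choice.
    """
    s = np.abs(np.asarray(speech, dtype=float))
    m = np.abs(np.asarray(music, dtype=float))
    # eps avoids division by zero when both responses are exactly 0.
    return (s - m) / (s + m + eps)


index = selectivity_index([2.0, 1.0, 0.0], [2.0, 3.0, 1.0])
print(index)
```

      A thresholded version of such an index would recover categories, whereas the raw values keep the gradations that the reviewer asks about.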

      Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al. investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed; 2. Are these responses selective and domain-specific or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain selective (one domain is significantly different from baseline and the other is not), domain preferred (both are significantly different from baseline, but one is larger than the other and they differ significantly from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) analysis in order to repeat the shared/selective/preferred analysis across the spectra, suggesting low selectivity. Spectral power and connectivity are also analyzed on the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses the authors consistently show a paucity of domain selective responses and when evident these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.
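      The encoding-model step mentioned in this summary (predicting a neural response from a time-delayed stimulus feature) can be sketched as ridge regression on a lagged design matrix. This is a generic illustration under stated assumptions, not the authors' pipeline: the function names, the number of lags, and the fixed `ridge` value are hypothetical, and cross-validation of the regularization is omitted.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Build a design matrix whose column k holds the stimulus delayed by k samples."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = stimulus.shape[0]
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        # Samples before the stimulus onset are treated as zero.
        X[lag:, lag] = stimulus[: n - lag]
    return X

def fit_trf(stimulus, response, n_lags, ridge=1.0):
    """Estimate one weight per lag with ridge-regularized least squares."""
    X = lagged_design(stimulus, n_lags)
    y = np.asarray(response, dtype=float)
    XtX = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ y)


# Synthetic check: a response generated by a known kernel should be recovered.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(2000)           # stand-in for an acoustic envelope
true_kernel = np.array([0.0, 1.0, 0.5, 0.0])   # hypothetical response function
response = np.convolve(envelope, true_kernel)[:2000]
weights = fit_trf(envelope, response, n_lags=4, ridge=1e-6)
print(np.round(weights, 3))
```

      In practice the regularization strength would be chosen by cross-validation, and encoding strength would be scored on held-out data rather than by inspecting the weights.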

      Strengths:

      I found this manuscript to be rigorous providing compelling and clear evidence of shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity, and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated based on permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectra in both power, coherence, and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis of the level of patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

      Thank you for this positive review of our work. We added these points as limitations and future directions in the discussion section:

      “Finally, in adopting here a comparative approach of speech and music – the two main auditory domains of human cognition – we only investigated one type of speech and of music also using a passive listening task. Future work is needed to investigate for instance whether different sentences or melodies activate the same selective frequency-specific distributed networks and to what extent these results are related to the passive listening context compared to a more active and natural context (e.g. conversation).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The concepts of activation and deactivation within the study's context of selectivity are not straightforward to comprehend. It would be beneficial for the authors to provide more detailed explanations of how these phenomena relate to the selectivity of neural responses to speech and music. Such elaboration would aid readers in better understanding the nuances of how certain brain regions are selectively activated or deactivated in response to different auditory stimuli.

      The reviewer is right that the reported results are quite complex to interpret. The concepts of activation and deactivation are generally difficult to comprehend because they are defined in part by the approach (e.g., method and/or metric) and by the scale of observation (Pfurtscheller et al., 1999). The power (or magnitude) of a time-frequency estimate is by definition a positive value. Deactivation (or desynchronization) is therefore defined relative to the comparison used (e.g., baseline, control, condition). This is further complicated by the scale of the measurement. For instance, during a simple limb movement, some areas of sensorimotor cortex are activated, yet at a finer scale this is accompanied by a desynchronization of mu activity, and such desynchronization is a relative measure (e.g., before/after the movement). At a broader scale, it is not rare to see some form of balance between brain networks, with some being ‘inhibited’ so that others can be activated, as with the default mode network versus sensory-motor networks. In our case, when estimating selective responses, it is the strength of the signal that matters. The type of selectivity is then defined by the sign/direction of the comparison/subtraction. We now provide additional details about the sign of selectivity between domains and frequencies in the Methods and Results sections:

      Methods:

      “In order to explore the full range of possible selective, preferred, or shared responses, we considered both responses greater and smaller than the baseline. Indeed, as neural populations can synchronize or desynchronize in response to sensory stimulation, we estimated these categories separately for significant activations and significant deactivations compared to baseline.”

      Results:

      “We classified, for each canonical frequency band, each channel into one of the categories mentioned above, i.e. shared, selective, or preferred (Figure 1A), by examining whether speech and/or music differ from baseline and whether they differ from each other. We also considered both activations and deactivations, compared to baseline, as both index a modulation of neural population activity, and have been linked with cognitive processes (Pfurtscheller & Lopes da Silva, 1999; Proix et al., 2022). However, because our aim was not to interpret specific increase or decrease with respect to the baseline, we here simply consider significant deviations from the baseline. In other words, when estimating selectivity, it is the strength of the response that matters, not its direction (activation, deactivation).”

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”
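
      The categorization described in the Results excerpts above can be illustrated with a minimal sketch. The function name and the boolean inputs are hypothetical. Only the rule for "selective" is stated explicitly in this response (a significant difference between domains, with only one domain differing from baseline); the rules used here for "shared" and "preferred" are assumptions made for illustration and may not match the definitions in Figure 1A.

```python
def categorize_channel(speech_vs_base, music_vs_base, speech_vs_music):
    """Label one channel in one frequency band.

    Each argument is a bool that is True when the corresponding
    (corrected) permutation test was significant. Direction is ignored:
    activations and deactivations both count as deviations from baseline.
    """
    if not (speech_vs_base or music_vs_base):
        # Neither domain departs from baseline.
        return "unresponsive"
    if not speech_vs_music:
        # Assumption: a response without a domain difference is "shared".
        return "shared"
    if speech_vs_base and music_vs_base:
        # Assumption: both domains respond, but one more strongly.
        return "preferred"
    # Stated rule: the domains differ and only one differs from baseline.
    return "selective"
```

      Under these assumptions, a channel that responds to speech only and differs between domains is labeled "selective", whereas one that responds to both domains equally is labeled "shared".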

      References :

      Lachaux, J. P., Jung, J., Mainy, N., Dreher, J. C., Bertrand, O., Baciu, M., Minotti, L., Hoffmann, D., & Kahane, P. (2008). Silence is golden: Transient neural deactivation in the prefrontal cortex during attentive reading. Cerebral Cortex, 18(2), 443–450.

      Pfurtscheller, G., & Da Silva, F. L. (1999). Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110(11), 1842–1857.

      (2) The manuscript doesn't easily provide information about the control conditions, yet the conclusion significantly depends on these conditions as a baseline. It would be beneficial if the authors could clarify this information for readers earlier and discuss how their choice of control stimuli influences their conclusions.

      We added information in the Results section about the baseline conditions:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Of note, while the choice of different ‘basic auditory stimuli’ as baseline can change the reported results in regions involved in low-level acoustical analyses (auditory cortex), it will have no impact on the results observed in higher-level regions, which predominantly also exhibit shared responses. We have now more clearly pointed out this reasoning in the Results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The spectral analyses section doesn't clearly explain how the authors performed multiwise correction. The authors' selectivity categorization appears similar to ANOVAs with posthoc tests, implying the need for certain corrections in the p values or categorization. Could the authors clarify this aspect?

      We apologize that this was not in the original version of the manuscript. In the spectral analyses, the selectivity categorization depended on both (1) the difference effects between the domains and the baseline, and (2) the difference effect between domains. Channels were marked as selective when there was (1) a significant difference between domains and (2) only one domain significantly differed from the baseline. All difference effects were estimated using paired-sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014), with 1000 permutations and the built-in tmax method to correct for multiple comparisons over channels (Nichols & Holmes, 2002; Groppe et al. 2011). We have now more clearly explained how we controlled family-wise error in the Methods section:

      “For each frequency band and channel, the statistical difference between conditions was estimated with paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the tmax method to control the family-wise error rate (Nichols and Holmes 2002; Groppe et al. 2011). In tmax permutation testing, the null distribution is estimated by, for each channel (i.e. each comparison), swapping the condition labels (speech vs music or speech/music vs baseline) between epochs. After each permutation, the most extreme t-scores over channels (tmax) are selected for the null distribution. Finally, the t-scores of the observed data are computed and compared to the simulated tmax distribution, similar as in parametric hypothesis testing. Because with an increased number of comparisons, the chance of obtaining a large tmax (i.e. false discovery) also increases, the test automatically becomes more conservative when making more comparisons, as such correcting for the multiple comparison between channels.”
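
      The tmax procedure described above can be sketched as follows. This is a minimal standard-library illustration, not the mne-python implementation the authors used. The function names are hypothetical, and the sketch assumes a two-sided test in which swapping the two condition labels of a paired epoch is implemented as a sign flip of that epoch's difference, applied identically to all channels.

```python
import math
import random


def t_stat(diffs):
    """One-sample t-statistic of paired differences against zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def tmax_permutation_test(diffs_by_channel, n_permutations=1000, seed=0):
    """Two-sided tmax test over channels.

    diffs_by_channel[c][e] is the paired difference (condition A minus
    condition B) for channel c and epoch e.
    """
    rng = random.Random(seed)
    n_epochs = len(diffs_by_channel[0])
    observed = [t_stat(ch) for ch in diffs_by_channel]

    null_tmax = []
    for _ in range(n_permutations):
        # Swapping the labels of an epoch flips the sign of its difference
        # (assumption: the same flip is applied to every channel).
        signs = [rng.choice((-1, 1)) for _ in range(n_epochs)]
        perm_t = [
            t_stat([s * d for s, d in zip(signs, ch)])
            for ch in diffs_by_channel
        ]
        # Keep only the most extreme statistic over channels.
        null_tmax.append(max(abs(t) for t in perm_t))

    # Corrected p-value: share of null tmax values at least as extreme
    # as the observed statistic of each channel.
    p_values = [
        (1 + sum(m >= abs(t) for m in null_tmax)) / (1 + n_permutations)
        for t in observed
    ]
    return observed, p_values
```

      Because every channel is compared against the same distribution of maxima, adding channels can only raise the threshold an individual channel has to exceed, which is the sense in which the test becomes more conservative with more comparisons.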

      References :

      Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L., & Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. NeuroImage, 86, 446–460.

      Groppe, D. M., Bickel, S., Dykstra, A. R., Wang, X., Mégevand, P., Mercier, M. R., Lado, F. A., Mehta, A. D., & Honey, C. J. (2017). iELVis: An open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. Journal of Neuroscience Methods, 281, 40–48.

      Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.

      Reviewer #2 (Recommendations For The Authors):

      Other suggestions:

      (1) The authors need to provide more details on how the sEEG electrodes were localized and selected. Are all electrodes included or only the ones located in the gray matter? If all electrodes were used, how to localize and label the ones that are outside of gray matter? In Figures 1C & 1D it seems that a lot of the electrodes were located in depth locations, how were the anatomical labels assigned for these electrodes?

      We apologize that this was not clear in the original version of the manuscript. Our electrode localization procedure was based on several steps described in detail in Mercier et al., 2022. Once electrodes were localized in a post-implant CT scan and the coordinates projected onto the pre-implant MRI, we were able to obtain the necessary information regarding brain tissues and anatomical region. That is, first, the segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues) that allowed us to dismiss electrodes outside of the brain and select those in the gray matter. Second, the individual’s brain was co-registered to a template brain, which allowed us to back-project atlas parcels onto the individual’s brain and assign anatomical labels to each electrode. The result of this procedure allowed us to group channels by anatomical parcels as defined by the Brainnetome atlas (Figure 1D), which informed the analyses presented in section Population Prevalence (Methods, Figures 4, 9-10, S4-5). Because this study relies on stereotactic EEG, and not electrocorticography, recording sites include both gyri and sulci, while depth structures were not retained.

      We have now updated the “General preprocessing related to electrodes localisation” section in the Methods. The relevant part now states:

      “To precisely localize the channels, a procedure similar to the one used in the iELVis toolbox and in the fieldtrip toolbox was applied (Groppe et al., 2017; Stolk et al., 2018). First, we manually identified the location of each channel centroid on the post-implant CT scan using the Gardel software (Medina Villalon et al., 2018). Second, we performed volumetric segmentation and cortical reconstruction on the pre-implant MRI with the Freesurfer image analysis suite (documented and freely available for download online http://surfer.nmr.mgh.harvard.edu/). This segmentation of the pre-implant MRI with SPM12 provides us with both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues). This information allowed us to reject electrodes not located in the brain. Third, the post-implant CT scan was coregistered to the pre-implant MRI via a rigid affine transformation and the pre-implant MRI was registered to MNI152 space, via a linear and a non-linear transformation from SPM12 methods (Penny et al., 2011), through the FieldTrip toolbox (Oostenveld et al., 2011). Fourth, applying the corresponding transformations, we mapped channel locations to the pre-implant MRI brain that was labeled using the volume-based Human Brainnetome Atlas (Fan et al., 2016).”
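
      As a rough illustration of two steps in this procedure, the sketch below maps a channel coordinate through a 4x4 affine transformation and assigns a tissue label from tissue probabilities. All names and numerical values are hypothetical, and choosing the tissue with the highest probability is an assumption. The authors' pipeline relied on Gardel, SPM12, and FieldTrip rather than on code of this kind.

```python
def apply_affine(affine, xyz):
    """Map a 3D coordinate through a 4x4 affine given as nested lists."""
    x, y, z = xyz
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in affine[:3]
    )


def tissue_label(probabilities):
    """Return the tissue with the highest probability at a channel's voxel.

    probabilities maps a tissue name (e.g. "gray", "white", "csf") to the
    probability read from the segmentation at that location.
    """
    return max(probabilities, key=probabilities.get)


def keep_channel(probabilities, keep=("gray",)):
    """Assumption: a channel is retained when its most likely tissue is kept."""
    return tissue_label(probabilities) in keep
```

      For example, with a pure translation as the CT-to-MRI transform, `apply_affine` shifts the channel centroid, and `keep_channel({"gray": 0.7, "white": 0.2, "csf": 0.1})` retains the channel.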

      Reference:

      Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438.

      (2) From Figures 5 and 6 (and also S4, S5), is it true that aside from the shared response, lower frequency bands show more music selectivity (blue dots), while higher frequency bands show more speech selectivity (red dots)? I am curious how the authors interpret this.

      The reviewer is right in noticing the asymmetric selective response to music and speech in lower and higher frequency bands. However, while this effect is apparent in the analyses in which we inspected stronger synchronization (activation) compared to baseline (Figures 2 and S1), the pattern appears to reverse when examining deactivation compared to baseline (Figures 3 and S2). In other words, there seems to be an overall stronger deactivation for speech in the lower frequency bands and a relatively stronger deactivation for music in the higher frequency bands.

      We now provide additional details about the sign of selectivity between domains and frequencies in the Results section:

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      Note, however, that this pattern of results depends on only a select number of patients, i.e. when ignoring regional selective responses that are driven by as few as 2 to 4 patients, the pattern disappears (Figures 5-6). More precisely, ignoring regions explored by a small number of patients almost completely clears the selective responses for both speech and music. For this reason, we do not feel confident interpreting the possible asymmetry in low vs high frequency bands differently encoding (activation or deactivation) speech and music.

      Minor:

      (1) P9 L234: Why only consider whether these channels were unresponsive to the other domain in the other frequency bands? What about the responsiveness to the target domain?

      We thank the reviewer for their interesting suggestion. The primary objective of the cross-frequency analyses was to determine whether domain-selective channels for a given frequency band remain unresponsive (i.e. exclusive) to the other domain across frequency bands, or whether the observed selectivity is confined to specific frequency ranges (i.e. frequency-specific). In other words, does a given channel exclusively respond to one domain and never, in whichever frequency band, to the other domain? The idea behind this question is that, for a channel to be selectively involved in the encoding of one domain, it does not necessarily need to be sensitive to all timescales underlying that domain as long as it remains unresponsive to any timescale in the other domain. However, if the channel is sensitive to information that unfolds slowly in one domain and faster in the other domain, then the channel is no longer globally domain selective, but the selectivity is frequency-specific to each domain.

      The proposed analyses answer a slightly different, albeit also meaningful, question: how many frequencies (or frequency bands) do selective responses span? From the results presented below, the reviewer can appreciate the overall steep decline in selective responses beyond a single frequency band, with only a few channels remaining selectively responsive across at most four frequency bands. That is, selective responses globally span one frequency band.

      Author response image 1.

      Cross-frequency channel selective responses. The top figure shows the results for the spectral analyses (baselined against the tones condition, including both activation and deactivation). The bottom figure shows the results for the connectivity analyses. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band. In the next value, we remove the channels that no longer respond selectively to the target domain for the following frequency band. The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis.
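
      The successive exclusion described in this caption can be sketched as below. The function name, the input layout, and the band names in the usage example are hypothetical; the sketch only assumes that, for each channel, the set of frequency bands with a selective response to the target domain is already known.

```python
def successive_selectivity(selective_bands, band_order, n_channels):
    """Percentage of channels still selective as bands are added one by one.

    selective_bands maps a channel name to the set of bands in which that
    channel responds selectively to the target domain. band_order lists
    the bands in the order they are included. n_channels is the total
    number of recorded channels used as the denominator.
    """
    remaining = set(selective_bands)
    percentages = []
    for band in band_order:
        # Drop channels that are not selective in the newly added band.
        remaining = {ch for ch in remaining if band in selective_bands[ch]}
        percentages.append(100.0 * len(remaining) / n_channels)
    return percentages
```

      With four channels, of which two are selective in a first band and only one of those is also selective in a second band, the function returns 50.0 and then 25.0, reproducing the kind of decline plotted in the figure.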

      (2) P21 L623: "Population prevalence." The subsection title should be in bold.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      The authors chose to use pure tone and syllables as baseline, I wonder if they also tried the rest period between tasks and if they could comment on how it differed and why they chose pure tones, (above and beyond a more active auditory baseline).

      This is an interesting suggestion. The reason for not using the baseline between speech and music listening (or right after) is that it will be strongly influenced by the previous stimulus. Indeed, after listening to the story it is likely that patients keep thinking about the story for a while. Similarly after listening to some music, the music remains in “our head” for some time.

      This is why we did not use rest but other auditory stimulation paradigms. Concerning the choice of pure tones and syllables, these happen to be used for clinical purposes to assess functioning of auditory regions. They also corresponded to a passive listening paradigm, simply with more basic auditory stimuli. We clarified this in the Results section:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Discussion - you might want to address phase information in contrast to power. Your encoding models map onto low-frequency (bandpassed) activity which includes power and phase. However, the high-frequency model includes only power. The model comparison is not completely fair and may drive part of the effects in Figure 7a. I would recommend discussing this, or alternatively ruling out the effect with modeling power separately for the low frequency.

      We thank the reviewer for their recommendation. First, we would like to emphasize that the chosen signal extraction techniques that we used are those most frequently reported in previous papers (e.g. Ding et al., 2012; Di Liberto et al., 2015; Mesgarani and Chang, 2012).

      Low-frequency (LF) phase and high-frequency (HFa) amplitude are also known to track acoustic rhythms in the speech signal in a joint manner (Zion-Golumbic et al., 2013; Ding et al., 2016). This is possibly due to the fact that HFa amplitude and LF phase dynamics have a somewhat similar temporal structure (see Lakatos et al., 2005 ; Canolty and Knight, 2010).

      Still, the reviewer is correct in pointing out the somewhat unfair model comparison and we appreciate the suggestion to rule out a potential confound. We now report in Supplementary Figure S8, a model comparison for LF amplitude vs. HFa amplitude to complement the findings displayed in Figure 7A. Overall, the reviewer can appreciate that using LF amplitude or phase does not change the results: LF (amplitude or phase) always better captures acoustic features than HFa amplitude.

      Author response image 2.

      TRF model comparison of low-frequency (LF) amplitude and high-frequency (HFa) amplitude. Models were investigated to quantify the encoding of the instantaneous envelope and the discrete acoustic onset edges (peakRate) by either the low frequency (LF) amplitude or the high frequency (HFa) amplitude. The ‘peakRate & LF amplitude’ model significantly captures the largest proportion of channels, and is, therefore, considered the winning model. Same conventions as in Figure 7A.

      References:

      Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends in Cognitive Sciences, 14(11), 506–515.

      Di Liberto, G. M., O’sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19), 2457-2465.

      Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164.

      Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., ... & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77(5), 980-991.

      Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.

      Mesgarani, N., & Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485(7397), 233-236.

      Similarly, the coherence analysis is affected by both power and phase, which are not dissociated; i.e. if the authors wished, they could repeat the coherence analysis with phase coherence (normalizing by the amplitude). Alternatively, this issue could be addressed in the discussion above.

      We agree with the Reviewer. We have now better clarified our choice in the Methods section:

      “Our rationale to use coherence as functional connectivity metric was threefold. First, coherence analysis considers both magnitude and phase information. While the absence of dissociation can be criticized, signals with higher amplitude and/or SNR lead to better time-frequency estimates (which is not the case with a metric that would focus on phase only and therefore would be more likely to include estimates of various SNR). Second, we chose a metric that allows direct comparison between frequencies. As the phase angle changes more quickly at high frequencies, phase alignment/synchronization is less likely in comparison with lower frequencies. Third, we intended to align with previous work which, for the most part, used the measure of coherence, most likely for the reasons explained above.”
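
      The distinction raised by the reviewer can be made concrete with a small sketch. It assumes that, for one frequency, a complex spectral estimate is available per epoch for each of two channels. The function names are hypothetical, and the phase-only measure shown is the standard phase-locking value, used here as one possible reading of "phase coherence (normalizing by the amplitude)".

```python
import cmath
import math


def coherence(x, y):
    """Magnitude coherence from per-epoch complex spectral estimates.

    Epochs with larger amplitude contribute more to the cross-spectrum.
    """
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    power_x = sum(abs(a) ** 2 for a in x)
    power_y = sum(abs(b) ** 2 for b in y)
    return abs(cross) / math.sqrt(power_x * power_y)


def phase_locking_value(x, y):
    """Phase-only consistency: every epoch is reduced to a unit vector."""
    unit = [
        cmath.exp(1j * (cmath.phase(a) - cmath.phase(b)))
        for a, b in zip(x, y)
    ]
    return abs(sum(unit) / len(unit))
```

      With a constant phase lag but amplitudes that vary independently across epochs, the phase-locking value equals 1 while coherence falls below 1, which illustrates that coherence mixes amplitude and phase information.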

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript entitled "Hexokinase regulates Mondo-mediated longevity via the PPP and organellar dynamics", Laboy and colleagues investigated upstream regulators of MML-1/Mondo, a key transcription factor that regulates aging and metabolism, using the nematode C. elegans and cultured mammalian cells. By performing a targeted RNAi screen for genes encoding enzymes in glucose metabolism, the authors found that two hexokinases, HXK-1 and HXK-2, regulate nuclear localization of MML-1 in C. elegans. The authors showed that knockdown of hxk-1 and hxk-2 suppressed longevity caused by germline-deficient glp-1 mutations. The authors demonstrated that genetic or pharmacological inhibition of hexokinases decreased nuclear localization of MML-1, via promoting mitochondrial β-oxidation of fatty acids. They found that genetic inhibition of hxk-2 changed the localization of MML-1 from the nucleus to mitochondria and lipid droplets by activating pentose phosphate pathway (PPP). The authors further showed that the inhibition of PPP increased the nuclear localization of mammalian MondoA in cultured human cells under starvation conditions, suggesting the underlying mechanism is evolutionarily conserved. This paper provides compelling evidence for the mechanisms by which novel upstream metabolic pathways regulate MML-1/Mondo, a key transcription factor for longevity and glucose homeostasis, through altering organelle communications, using two different experimental systems, C. elegans and mammalian cells. This paper will be of interest to a broad range of biologists who work on aging, metabolism, and transcriptional regulation. 

      Reviewer #2 (Public Review):

      Raymond Laboy et al. explored how the transcriptional Mondo/Max-like complex (MML-1/MXL-2) is regulated by glucose metabolic signals using a germline-removal longevity model. They proposed that MML-1/MXL-2 integrates multiple longevity pathways through nutrient sensing and therefore screened for glucose metabolic enzymes that regulate MML-1 nuclear localization. Hexokinase 1 and 2 were identified as the most vigorous regulators, which function through mitochondrial beta-oxidation and the pentose phosphate pathway (PPP), respectively. MML-1 localized to mitochondria associated with lipid droplets (LD), and MML-1 nuclear localization was correlated with LD size and metabolism. Their findings are interesting and may help us to further explore the mechanisms in multiple longevity models; however, the study is not complete and the working model remains obscure. For example, the exact metabolites that account for the direct regulation of MML-1 were not identified, and more detailed studies of the related cellular processes are needed. 

      The identification of responsible metabolites is necessary since multiple pieces of evidence from the study suggest that lipids rather than glucose metabolites may be more likely to be the direct regulators of MML-1, and that HXK regulates MML-1 indirectly by affecting lipid metabolism: 1) inhibiting the PPP is sufficient to rescue MML-1 function independent of G6P levels; 2) HXK-1 regulates MML-1 by increasing fatty acid beta-oxidation; 3) LD size correlates with MML-1 nuclear localization and LD metabolism can directly regulate MML-1. The identification of metabolites will be helpful for understanding the mechanism. 

      Beta-oxidation and the PPP are involved in the regulation of MML-1 by HXK-1 and HXK-2, respectively. But how these two pathways participate in the regulation is not clear. Is it the beta-oxidation rate or the intermediate metabolites that matters? As for the PPP, it provides substrates for nucleotide synthesis and also its product NADPH is essential for redox balance. Is one of the metabolites or the NADPH levels involved in MML-1 regulation? More studies are needed to provide answers to these concerns. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Following are my comments that the authors may want to address to further improve this excellent paper.

      Major comments 

      (1) Although the authors provided evidence that hexokinases in glucose metabolism are associated with germline-deficient glp-1(-) mutants, they did not mention why they focused on glp-1(-) mutants rather than other longevity mutants. In their previous study (Nakamura et al., 2016), they showed that MML-1 is required for multiple longevity pathways in C. elegans, including reduced mitochondrial respiration and insulin/IGF-1 signaling. Please discuss why the authors focused on glp-1(-) mutants in this paper. It will be even better if the authors test the roles of hexokinases in some other longevity regimens. 

      Many thanks for this astute comment. Previously we had shown that mml-1 is required for glp-1, daf-2, and isp-1 longevity, and Johnson et al. had shown a requirement for eat-2, hence the idea that MML-1 is a convergent transcription factor. We first focused on glp-1 because that was the starting point of our screen, and the result was clear and simple: hexokinases regulate MML‑1 nuclear localization and activity in glp-1 and are required for longevity. Naturally, the question arises: do hexokinases behave like MML-1 as convergent longevity regulators across pathways? To address this, we examined the interaction of hxk-1 and hxk-2 with isp-1, daf-2, and raga-1.  Specifically, we now show that:

      A. Like glp-1(e2141) mutants, isp-1(qm150) mutants stimulate MML-1 nuclear localization, and the hexokinases are required for isp-1 longevity (Figure 1G-H).

      B. daf-2(e1370) mutants do not further stimulate MML-1 nuclear localization beyond basal levels, yet MML-1 is strongly required for daf-2 longevity (Nakamura et al., 2016, Supplementary Figure 1L-M). However, the hexokinases are not required for daf-2 longevity (Supplementary Figure 1M), suggesting that the signaling pathway is wired differently in daf-2, and that other pathways regulate MML-1 activity.

      C. raga-1(ok701) mutants stimulate MML-1 nuclear localization and mml-1 is required for raga-1 longevity, suggesting that MML-1 acts downstream of TORC1 signaling (Supplementary Figure 1N-O). However, hexokinases are not required for raga-1 longevity, suggesting that raga-1 acts downstream or parallel to hexokinase signaling (Supplementary Figure 1P).

      D. We performed untargeted metabolomics in glp-1, daf-2, and mml-1 single and double mutants and observed that hexose phosphates, which have been shown to regulate MML-1 human homologs MondoA/ChREBP, were differentially regulated between mutants.

      Author response image 1.

      E. Altogether these experiments reveal that though MML-1 promotes longevity in most pathways, the hexokinases are only required in some (glp-1, isp-1), but not others (raga-1, daf-2). Furthermore, strong MML-1 nuclear localization is often but not always associated with longevity (e.g. daf-2), and the wiring of the signaling pathway is different for various longevity regimens. Consistently, mTOR and Insulin signaling are more functionally linked and therefore may show a more similar genetic profile. Differences in hexose phosphate between glp-1 and daf-2 could explain why MML-1 requires hexokinase function in glp-1 to promote longevity but not in daf-2. However, considerably more work is required to rigorously validate this hypothesis.

      (2) In figure 5, the authors investigated whether the association between PPP and MML‑1/MondoA, tested in C. elegans, is conserved in mammals under starvation conditions. The authors should clarify why they tested the MondoA localization upon starvation in cultured human cells. This comment is related to my comment #1 as the authors could determine the roles of hexokinases under dietary restriction (DR)-conditions or in DR-mimetic in eat-2(-) mutants. 

      In this case, the actual translatability to a worm longevity pathway was not our goal. Rather, we examined MondoA in cell culture under conditions that produce contrasting MondoA subcellular localization: cytosolic/nuclear localization in high-glucose media and cytosolic localization under starvation. We then showed that, similar to our data in worms, PPP inhibition with 6-AN induced MondoA nuclear localization and activity. We now mention this rationale in the results section, lines 352-356.

      (3) In figure 2, the authors showed that HXK-2 regulates mitochondrial localization of MML-1, and HXK-1 regulates nuclear localization of MML-1 through mitochondrial β-oxidation in glp‑1(-) mutants. Can the authors test whether mitochondrial β-oxidation affects the effects of hxk RNAi on longevity of glp-1(-) mutants? 

      Excellent suggestion. We tried to test this idea and found that acs-2 RNAi alone abolished glp-1 longevity, making epistasis experiments difficult to interpret. This is consistent with published data showing that glp-1 longevity requires NHR-49, a transcription factor that regulates mitochondrial β-oxidation and drives acs-2 expression (Ratnappan et al., 2014). It could well be that β-oxidation inhibition promotes MML-1 nuclear localization but abolishes lifespan extension because of epistatic effects on other transcription factors or processes. Elucidating the exact mechanism would require further investigation, which goes beyond the scope of the paper.

      (4) The authors showed that 2-deoxy-glucose, which decreases the activity of HXK, decreased the nuclear localization of MML-1, and this is consistent with their genetic data. Based on these data, 2-deoxy-glucose is expected to decrease longevity. Interestingly, however, 2-deoxy-glucose has been reported to increase lifespan by restricting glucose, whereas extra glucose intake decreases lifespan in C. elegans, shown by multiple research groups, including M. Ristow, C. Kenyon, and S.J.V. Lee labs. This is seemingly paradoxical and worth discussing with key references, especially because MondoA and Chrebp are known as glucose-responsive transcription factors. 

      Thank you for this important comment. 2-DG has been shown to extend lifespan by suppressing glucose metabolism at concentrations ranging from 0.1 to 5 mM, whereas higher concentrations ranging from 20 to 50 mM had the opposite effect and decreased lifespan (Schulz et al., 2007). We tested 50 mM 2-DG and observed decreased MML-1 nuclear localization, which is consistent with the previous data showing decreased longevity. We now raise this point in the discussion, suggesting that mild inhibition of glucose metabolism has beneficial effects on longevity, while strong suppression shortens lifespan (lines 411-414).

      Minor comments 

      (1) The current Introduction does not include an explicit statement that MML-1 and MondoA are homologs. Please clarify this, as naive readers may be confused.

      Thank you for pointing this out. We now say in the intro that MondoA and MML-1 are homologs (lines 59-60).

      (2) In Figure 1, the effect of hxk-3 on nuclear localization of MML-1 is small compared to those of hxk-1 and hxk-2. Please add speculation about why HXK-3 has a different role in nuclear localization of MML-1 compared to HXK-1 and HXK-2.

      According to GExplore 1.4 (Hutter & Suh, 2016), hxk-3 expression declines during larval development and is low in the adult. Perhaps it has little effect in the young adult, and the other hexokinases suffice to support MML-1 nuclear localization. It also remains possible that hxk-3 is not required in glp-1 but is required in other longevity pathways.

      (3) The authors tested the effects of genetic inhibition of hxk-1 and hxk-2 on the regulation of MML-1 localization and lifespan of glp-1(-) mutants by using RNAi. I wonder whether the authors can perform the experiments with hxk-1 or hxk-2 loss (or reduction) of function mutants. If they cannot, please discuss the reason and the limitations of RNAi. 

      This is an important point raised by the reviewer. We found that RNAi was most effective for phenotypes related to MML-1 nuclear localization and longevity, likely because it results in acute knockdown. We also showed that pharmacological inhibition of hexokinase function with 3BrP and 2-DG (Supplementary Figure 1B and 1C) and of the PPP with 6-AN (Figure 3B) gave results consistent with our RNAi observations.

      We generated hexokinase KO mutants by deleting the coding sequence of each hexokinase by CRISPR/Cas9. First, we measured the expression of each hexokinase isozyme in each mutant. Notably, the hxk-1(syb1271) null mutant had higher expression of hxk-2 and hxk-3, hxk-2(syb1261) did not significantly affect the expression of hxk-1 and hxk-3, and hxk-3(syb1267) had a mild increase in hxk-2 expression. We followed up on hxk-1(syb1271) and hxk-2(syb1261) and crossed these mutants with our MML-1::GFP reporter. We observed a modest but significant reduction in MML-1 nuclear localization in both strains. The effect of RNAi is much stronger than that of the null mutations, potentially due to a compensatory upregulation of the other hexokinases in the mutants that we do not observe with RNAi (Supplementary Figure 1D-E). Alternatively, there may be a threshold in the effects of hexokinase function on MML-1 nuclear localization. We tried to generate an hxk-1; hxk-2 double mutant, but it was lethal, and we therefore did not pursue this further.

      Author response image 2.

      (4) Please correct minor typos throughout the manuscript. Following are some examples.

      - On page 4, line 111, please correct "Supplementary Figure D-E" to "Supplementary Figure 1D-E".

      - On page 9, line 272, please correct "3A-B" to "4A-B". 

      - On page 9, line 275, please correct "S4" to "4". 

      - On page 10, line 309, please correct "4A" to "4B" 

      Corrected.

      (5) In Fig. 3E, please add the information about the scale bars in figure legends.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      Here are some detailed suggestions for the authors:

      (1) Since MML-1/MXL-2 complex functions in multiple longevity models, e.g. DR, ILS, what are the roles of HXK-1 and HXK-2 in these models? 

      We now show that although mml-1 is required in most longevity pathways, hxk-1 and hxk-2 are required in some pathways (glp-1, isp-1) but not others (daf-2, raga-1). See above for more details.

      (2) As for the metabolite screening, lipid metabolic genes could be included. In addition to the above reasons, a previous study found that mml-1 mRNA levels and MML-1 GFP nuclear localization were both increased in the glp-1 model, while mml-1 mRNA levels were unaffected by hxk knockdown, suggesting that more pathways are involved.

      We agree with the reviewer that understanding which metabolites regulate MML-1 nuclear localization and activity is an important, yet challenging question. Our studies demonstrate a role of glucose metabolism, in particular hexokinase, in this process, consistent with hexose-p being activators of MondoA. Our data also suggest that mechanisms beyond hexose-p regulate MML-1, since knockdown of the PPP components stimulates MML-1 even when hxk-2 is depleted and G6P is low, and inhibition of the PPP with 6-AN stimulates MondoA nuclear localization under starvation conditions in mammalian cell culture. We tested redox regulation, nucleoside, and lipid metabolism as candidate processes (see below). Notably, our data suggest this other mechanism is tied to lipid metabolism through lipid droplet (LD) size, since various perturbations that impact LD size and number (atgl-1, dgat-2, tkt-1, Figure 4) affected MML-1 nuclear localization. It remains an open question whether MML-1 is regulated by other metabolites through a ligand-protein interaction. We cannot exclude that beyond lipid droplet regulation, specific lipids, other metabolites, or metabolic modules linked to the PPP might regulate MML-1 nuclear localization and activity.

      We employed genetic manipulation and pharmacological inhibition to understand the upstream signals that regulate MML-1. These approaches will not be sufficient to determine whether other metabolite(s) are involved in MML-1/MondoA translocation to the nucleus through a direct interaction. Novel technologies that determine protein-metabolite interactions (e.g. MIDAS) will help us answer this question in future work, which goes beyond the scope of this paper. As a compromise, we discuss possible metabolites that may orchestrate this (including PPP and TCA cycle intermediates), based on our observations of MML-1 subcellular localization at LD/mitochondria.

      (3) Line 238, it should be "NADPH". 

      Corrected.

      (4) RNAi targeting enzymes of different branches of PPP can be performed

      In our initial screen, we examined the effect of various enzymes of the PPP on MML-1 nuclear localization (Figure 1A, Supplementary Table S1) and found that knockdown of enzymes in both the oxidative phase (PGDH/T25B9.9) and the non-oxidative phase (transketolase/TKT-1) affects MML-1 nuclear localization. In line with this, 6-AN treatment, which affects the oxidative phase, also stimulated MML-1 nuclear localization (Figure 3B). We also observed that knockdown of enzymes involved in the conversion of ribose 5P to ribose, ribose 1P, and phosphoribosyl pyrophosphate, an intermediate in nucleotide biosynthesis, decreased MML-1 nuclear localization (rpia-1, F07A11.5, Y43F4B.5, R151.2; Supplementary Table S1). Whether MML-1/MondoA responds to the nucleotide pool remains unclear.

      (5) As for PPP, these are many possibilities that can be tested. For example, as PPP supplies NADPH for oxidative balance, does MML-1 respond to ROS? Also, it appears the genes in the non-oxidative arm of PPP regulate MML-1, so is nucleotide synthesis involved? 

      Thank you for the suggestion. We tested other enzymes involved in NADPH production from the folate cycle and observed a mild but significant reduction of MML-1 nuclear localization upon dao-3i (Supplementary Table S1). Moreover, we tested whether MML-1 nuclear localization is responsive to ROS. While paraquat exposure induced oxidative stress, as measured by the transcriptional reporter gst-4p::GFP (Supplementary Figure 3A), it did not significantly affect MML-1 nuclear localization (Supplementary Figure 3B). Therefore, we think it less likely that NADPH production acting through redox regulation is the main effect.

      We also tried supplementation with some of the metabolite outputs of PPP including ribose, ribulose, and xylulose, as well as nucleosides (see below), but saw no effect on MML-1 nuclear localization. We agree that further studies are required to pinpoint whether there is another metabolic moiety regulating MML-1 at the protein-ligand level, but this goes beyond the scope of the current investigation.

      Author response image 2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study could potentially represent a step forward towards personalized medicine by combining cell-based data and a prior-knowledge network to derive Boolean-based predictive logic models to uncover altered protein/signaling networks within cancer cells. However, the level of evidence supporting the conclusions is inadequate, and further validation of the reported approach is required. If properly validated, these findings could be of interest to medical biologists working in the field of cancer and would inform drug development and treatment choices in the field of oncology.

      We thank the editor and the reviewer for their constructive comments, which helped us to improve our story. We have now performed new analyses and experiments to further support our proposed approach.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors deploy a combination of their own previously developed computational methods and databases (SIGNOR and CellNOptR) to model the FLT3 signaling landscape in AML and identify synergistic drug combinations that may overcome the resistance of AML cells harboring ITD mutations in the TKI domain of FLT3 to FLT3 inhibitors. I did not closely evaluate the details of these computational models since they are outside of my area of expertise and have been previously published. The manuscript has significant issues with data interpretation and clarity, as detailed below, which, in my view, call into question the main conclusions of the paper.

      The authors train the model by including perturbation data where TKI-resistant and TKI-sensitive cells are treated with various inhibitors and the activity (i.e. phosphorylation levels) of the key downstream nodes are evaluated. Specifically, in the Results section (p. 6) they state "TKIs sensitive and resistant cells were subjected to 16 experimental conditions, including TNFa and IGF1 stimulation, the presence or absence of the FLT3 inhibitor, midostaurin, and in combination with six small-molecule inhibitors targeting crucial kinases in our PKN (p38, JNK, PI3K, mTOR, MEK1/2 and GSK3)". I would appreciate more details on which specific inhibitors and concentrations were used for this experiment. More importantly, I was very puzzled by the fact that this training dataset appears to contain, among other conditions, the combination of midostaurin with JNK inhibition, i.e. the very combination of drugs that the authors later present as being predicted by their model to have a synergistic effect. Unless my interpretation of this is incorrect, it appears to be a "self-fulfilling prophecy", i.e. an inappropriate use of the same data in training and verification/test datasets.

      We thank the reviewer for this comment. We have now extensively revised Figure 2B and edited the text to clarify and better describe the experimental conditions of our multiparametric analysis. As the reviewer stated, we used different combinations of drugs, including midostaurin and a JNK inhibitor, to generate two cell-specific predictive models recapitulating the main signal transduction events downstream of FLT3 in resistant (FLT3ITD-TKD) and sensitive (FLT3ITD-JMD) cells. These experiments were performed by treating cells at very early time points to obtain a picture of the signaling response of FLT3-ITD-positive cells. Indeed, we measured the phosphorylation level of signaling proteins because at these early time points (90 minutes) we do not expect a modulation of crucial downstream phenotypes, including apoptosis or proliferation. To infer perturbations impacting the apoptosis or proliferation phenotypes, we applied a two-step computational strategy:

      (1) We extracted key regulators of ‘apoptosis’ and ‘proliferation’ hallmarks from SIGNOR database.

      (2) We applied our recently developed ProxPath algorithm to retrieve significant paths linking nodes of our two optimized models to ‘proliferation’ and ‘apoptosis’ phenotypes.

      This allowed us to evaluate in silico the “proliferation” and “apoptosis” rates upon inactivation of each node of the network. With the proposed approach, we identified JNK as a potential drug target to use in combination with FLT3 inhibition to restore sensitivity (i.e. in silico inducing apoptosis and reducing proliferation) of FLT3 ITD-TKD cells. We want to stress once more that although the first piece of information (the effect of JNK and FLT3 inhibition on sentinel readouts) was provided in the training dataset, the second piece of information (the effect of this treatment on the entire model and, as a consequence, on the cellular phenotype) was purely the result of our computational models. As such, we hope that the reviewer will agree that this could not represent a “self-fulfilling prophecy”.
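
      The in silico inactivation step described above can be illustrated with a toy synchronous Boolean network. This is a minimal sketch under stated assumptions, not the authors' model: the three rules, the wiring between the nodes, and the `simulate` helper are hypothetical and serve only to show how clamping one node to "off" can change the steady state read out by a phenotype node.

```python
# Toy Boolean network. The rules below are illustrative assumptions,
# not the optimized models described in the manuscript.
RULES = {
    "FLT3": lambda s: s["FLT3"],          # input node, keeps its value
    "JNK": lambda s: s["FLT3"],           # assumed: FLT3 activates JNK
    "APOPTOSIS": lambda s: not s["JNK"],  # assumed: JNK activity blocks apoptosis
}


def simulate(initial, knockouts=(), max_steps=50):
    """Update all nodes synchronously until a fixed point is reached.

    Nodes listed in `knockouts` are held at False at every step.
    """
    state = {name: bool(value) for name, value in initial.items()}
    for node in knockouts:
        state[node] = False
    for _ in range(max_steps):
        nxt = {name: bool(rule(state)) for name, rule in RULES.items()}
        for node in knockouts:
            nxt[node] = False
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (possible cyclic attractor)")


start = {"FLT3": True, "JNK": False, "APOPTOSIS": False}
untreated = simulate(start)                   # FLT3 active, no inhibitor
jnk_off = simulate(start, knockouts=["JNK"])  # JNK inactivated in silico
```

      In this toy wiring, the untreated steady state has the apoptosis node off, and clamping JNK switches it on; a real analysis would repeat the comparison for every node of the optimized network.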

      That said, we understand that this aspect was not clearly defined in the manuscript. For this reason, we have now 1) extensively revised Figure 2B; 2) edited the text (pg. 6) to clarify the purpose and the results of our approach; and 3) described in further detail (pg. 16-18) the experimental conditions of our multiparametric analysis.

      (2) My most significant criticism is that the proof-of-principle experiment evaluating the combination effects of midostaurin and SP600125 in FLT3-ITD-TKD cell line model does not appear to show any synergism, in my view. The authors' interpretation of the data is that the addition of SP600125 to midostaurin rescues midostaurin resistance and results in increased apoptosis and decreased viability of the midostaurin-resistant cells. Indeed, they write on p.9: "Strikingly, the combined treatment of JNK inhibitor (SP600125) and midostaurin (PKC412) significantly increased the percentage of FLT3ITD-TKD cells in apoptosis (Fig. 4D). Consistently, in these experimental conditions, we observed a significant reduction of proliferating FLT3ITD-TKD cells versus cells treated with midostaurin alone (Fig. 4E)." However, looking at Figs 4D and 4E, it appears that the effects of the midostaurin/SP600125 combination are virtually identical to SP600125 alone, and midostaurin provides no additional benefit. No p-values are provided to compare midostaurin+SP600125 to SP600125 alone but there seems to be no appreciable difference between the two by eye. In addition, the evaluation of synergism (versus additive effects) requires the use of specialized mathematical models (see for example Duarte and Vale, 2022). That said, I do not appreciate even an additive effect of midostaurin combined with SP600125 in the data presented.
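
      As an illustration of what such a reference model looks like, the sketch below uses Bliss independence, one of several commonly used null models for drug combinations. It is offered only as an example: nothing in the text indicates which model the reviewer or the authors had in mind, the function names are hypothetical, and the numbers are made up rather than read from Fig. 4.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional effect if the two drugs act independently."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected(effect_a, effect_b)


# Made-up fractions of apoptotic cells, for illustration only.
expected = bliss_expected(0.10, 0.50)
excess = bliss_excess(0.50, 0.10, 0.50)
```

      With these illustrative inputs the combination does no better than the stronger single agent, so the excess is negative, which is the pattern the reviewer describes by eye.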

      We agree with the reviewer that the JNK inhibitor and midostaurin have neither a synergistic nor an additive effect, and we have now revised the text accordingly. Whether FLT3ITD-TKD AML cells benefit from midostaurin treatment is actively debated in the scientific community. In a recently published retrospective study by K. Dohner et al. (Rücker et al., 2022), the authors investigated the prognostic and predictive impact of FLT3-ITD insertion site (IS) in 452 patients randomized within the RATIFY trial, which evaluated midostaurin in addition to intensive chemotherapy. Their study clearly showed that “Midostaurin exerted a significant benefit only for JMDsole” patients. In agreement with this result, we have demonstrated that midostaurin treatment had no effect on apoptosis of blasts derived from FLT3ITD-TKD patients (Massacci et al., 2023). On the other hand, we and others observed that midostaurin triggers apoptosis in FLT3ITD-TKD cells to a lesser extent than in FLT3ITD-JMD cells (Arreba-Tutusaus et al., 2016). The data presented here (Fig. 4) and our previously published papers (Massacci et al., 2023; Pugliese et al., 2023) indicate that targeting cell cycle regulators (WEE1, CDK7, JNK) induces a significant apoptotic response in TKI-resistant FLT3ITD-TKD cells. Prompted by the reviewer's comment, we have now revised the text and discussion (pg.9; 14), highlighting the crucial role of JNK in apoptosis induction.

      (3) In my view, there are significant issues with clarity and detail throughout the manuscript. For example, additional details and improved clarity are needed, in my view, with respect to the design and readouts of the signaling perturbation experiments (Methods, p. 15 and Fig 2B legend). For example, the Fig 2B legend states: "Schematic representation of the experimental design: FLT3 ITD-JMD and FLT3 ITD-JMD cells were cultured in starvation medium (w/o FBS) overnight and treated with selected kinase inhibitors for 90 minutes and IGF1 and TNFa for 10 minutes. Control cells are starved and treated with PKC412 for 90 minutes, while "untreated" cells are treated with IGF1 100ng/ml and TNFa 10ng/ml with PKC412 for 90 minutes.", which does not make sense to me. The "untreated" cells appear to be treated with more agents than the control cells. The logic behind cytokine stimulation is not adequately explained and it is not entirely clear to me whether the cytokines were used alone or in combination. Fig 2B is quite confusing overall, and it is not clear to me what the horizontal axis (i.e. columns of "experimental conditions", as opposed to "treatments") represents. The Method section states "Key cell signaling players were analyzed through the X-Map Luminex technology: we measured the analytes included in the MILLIPLEX assays" but the identities of the evaluated proteins are not given in the Methods. At the same time, the Results section states "TKIs sensitive and resistant cells were subjected to 16 experimental conditions" but these conditions do not appear to be listed (except in Supplementary data; and Fig 2B lists 9 conditions, not 16). In my subjective view, the manuscript would benefit from a clearer explanation and depiction of the experimental details and inhibitors used in the main text of the paper, as opposed to various Supplemental files/Figures. 
      The lack of clarity on what exactly were the experimental conditions makes the interpretation of Fig 2 very challenging. In the same vein, in the PCA analysis (Fig 2C) there seems to be no reference to the cytokine stimulation status while the authors claim that PC2 stratifies cells according to IGF1 vs TNFalpha. There are numerous other examples of incomplete or confusing legends and descriptions which, in my view, need to be addressed to make the paper more accessible.

      We thank the reviewer for his/her comment. We have now extensively revised the text of the manuscript (pg. 6), Fig. 2B (now Fig 2C) and the methods (pg. 16-18) to improve the clarity of our manuscript and make the take-home messages more accessible. We believe that the revised text and Figure 2 better explain our strategy and clarify the experimental setup. We also added details on the choice of experimental conditions and propose a better graphic representation of the analysis.

      (4) I am not sure that I see significant value in the patient-specific logic models because they are not supported by empirical evidence. Treating primary cells from AML patients with relevant drug combinations would be a feasible and convincing way to validate the computational models and evaluate their potential benefit in the clinical setting.

      We thank the reviewer for this comment. We have now performed additional experiments in a small cohort of FLT3-ITD-positive patient-derived primary blasts. Specifically, we treated blasts from 2 FLT3ITD-TKD patients and 3 FLT3ITD-JMD+TKD patients with PKC412 (100 nM) and/or SP600125 (JNK inhibitor, 10 μM). After 24h of treatment, we measured the apoptotic rate. As shown below and in the new Fig. 4F (see pg.10, main text), midostaurin triggers higher levels of apoptosis in FLT3ITD-JMD+TKD blasts than in FLT3ITD-TKD blasts. Importantly, treatment with the JNK inhibitor SP600125 alone triggers apoptosis in FLT3ITD-TKD blasts, validating the crucial role of JNK in FLT3ITD-TKD cell survival and TKI resistance. The combined treatment of midostaurin and SP600125 increases the percentage of apoptotic cells compared to midostaurin treatment alone, but to a lesser extent than single-agent SP600125 treatment. This result is in agreement with the current debate in the scientific community on the actual beneficial effect of midostaurin treatment in FLT3ITD-TKD AML patients.

      Author response image 1.

      Primary samples from AML patients with the FLT3ITD-TKD mutation (n=2, yellow bars) or the FLT3ITD-JMD/TKD mutation (n=3, blue bars) were exposed to Midostaurin (100nM, PKC412), and JNK inhibitor (10µM, SP600125) for 48 hours, or combinations thereof. The specific cell death of gated AML blasts was calculated to account for treatment-unrelated spontaneous cell death. The bars on the graph represent the mean values with standard errors.
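
      The legend does not state how specific cell death was calculated. A commonly used correction, shown here only as an assumption about what such a calculation typically looks like, subtracts spontaneous death in the untreated control and rescales by the fraction of cells that were still viable in that control. The function name and the example values are hypothetical.

```python
def specific_cell_death(treated_pct, spontaneous_pct):
    """Percent treatment-specific death, corrected for spontaneous death.

    treated_pct: percent dead cells in the treated sample.
    spontaneous_pct: percent dead cells in the untreated control.
    """
    if not 0 <= spontaneous_pct < 100:
        raise ValueError("spontaneous death must be in [0, 100)")
    return 100.0 * (treated_pct - spontaneous_pct) / (100.0 - spontaneous_pct)


# Illustrative numbers only, not data from the study.
example = specific_cell_death(treated_pct=60.0, spontaneous_pct=20.0)
```

      With 60% dead cells after treatment and 20% dead cells in the control, half of the cells that would otherwise have survived were killed by the treatment.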

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Latini et al describes a methodology to develop Boolean-based predictive logic models that can be applied to uncover altered protein/signalling networks in cancer cells and discover potential new therapeutic targets. As a proof-of-concept, they have implemented their strategy on a hematopoietic cell line engineered to express one of two types of FLT3 internal tandem mutations (FLT3-ITD) found in patients, FLT3-ITD-TKD (which are less sensitive to tyrosine kinase inhibitors/TKIs) and FLT3-ITD-JMD (which are more sensitive to TKIs).

      Strengths:

      This useful work could potentially represent a step forward towards personalised targeted therapy, by describing a methodology using Boolean-based predictive logic models to uncover altered protein/signalling networks within cancer cells. However, the weaknesses highlighted below severely limit the extent of any conclusions that can be drawn from the results.

      Weaknesses:

      While the highly theoretical approach proposed by the authors is interesting, the potential relevance of their overall conclusions is severely undermined by a lack of validation of their predicted results in real-world data. Their predictive logic models are built upon a set of poorly explained initial conditions, drawn from data generated in vitro from an engineered cell line, and no attempt was made to validate the predictions in independent settings. This is compounded by a lack of sufficient experimental detail or clear explanations at different steps. These concerns considerably temper one's enthusiasm about the conclusions that could be drawn from the manuscript.

      We thank the reviewer for the thorough review and kind comments about our manuscript. We hope the changes and new data we provide further strengthen it in his or her eyes.

      Some specific concerns include:

      (1) It remains unclear how robust the logic models are, or conversely, how affected they might be by specific initial conditions or priors that are chosen. The authors fail to explain the rationale underlying their input conditions at various points. For example:

      • At the start of the manuscript, they assert that they begin with a pre-PKN that contains "76 nodes and 193 edges", though this is then ostensibly refined with additional new edges (as outlined in Fig 2A). However, neither the reasons these edges were added nor model performance comparisons against the basal model are presented, precluding an evaluation of whether this model is better.

      We understand the reviewer’s concern. We have now complemented the manuscript with an extended description of the proposed modelling strategy, detailing the pipeline and the rationale behind each choice (Supplementary material, pg.14-19). Furthermore, the manuscript now points to a GitHub repository where users can follow and reproduce each step of the pipeline (https://github.com/SaccoPerfettoLab/FLT3ITD_driven_AML_Boolean_models).

      • At a later step (relevant to Fig S4 and Fig 3), they develop separate PKNs, for each of the mutation models, that contain "206 [or] 208 nodes" and "756 [or] 782 edges", without explaining how these seemingly arbitrary initial conditions were arrived at. Their relation to the original parameters in the previous model is also not investigated, raising concerns about model over-fitting and calling into question the general applicability of their proposed approach. The authors need to provide a clearer explanation of the logic underlying some of these initial parameter selections, and also investigate the biological/functional overlap between these sets of genes (nodes).

      We thank the reviewer for raising this question. Very briefly, the proposed optimization strategy belongs to a branch of modelling in which the predictive model is, indeed, driven by the data (Blinov and Moraru, 2012). In this sense, the aim of optimization is to fit the experimental data as well as possible. To achieve this, we followed standard practices (Dorier et al., 2016; Traynard et al., 2017). To address the issue of “calling into question the general applicability of their proposed approach”, we compared the activity status of nodes in the models with ‘real data’ extracted from cell lines and patients’ samples to confirm the robustness and scalability of the strategy (please see below, response to point 3 pg. 9).

      Finally, as mentioned in the previous point, we have now provided detailed supplementary material describing all the aspects mentioned by the reviewer: the step-by-step changes in the PKN, the choice of the parameters and other details can be found in the new text and are also available in the GitHub repository (https://github.com/SaccoPerfettoLab/FLT3-ITD_driven_AML_Boolean_models).

      (2) There is concern about the underlying experimental data underpinning the models that were generated, further compounded by the lack of a clear explanation of the logic. For example, data concerning the status of signalling changes as a result of perturbation appears to be generated from multiplex LUMINEX assays using phosphorylation-specific antibodies against just 14 "sentinel" proteins. However, very little detail is provided about the rationale underlying how these 14 were chosen to be "sentinels" (and why not just 13, or 15, or any other number, for that matter?). How reliable are the antibodies used to query the phosphorylation status? What are the signal thresholds and linear ranges for these assays, and how would these impact the performance/reliability of the logic models that are generated from them?

      We thank the reviewer for this comment as it gives us the opportunity to clarify and better explain the criteria behind the experimental data generation.

      Overall, we revised the main text on page 6 and Figure 2B to improve the clarity of our experimental design. Specifically, the sentinels were chosen because they are direct or indirect downstream effectors of the perturbations and were intended to serve both as a benchmarking system for the study and as a readout of the global perturbation of the system. To clarify this aspect, we have added a small network (compressed PKN) to Figure 2B to show that the proteins (green nodes) we chose to measure in the LUMINEX multiplex assay are “sentinels” of the activity of almost all the pathways included in the prior knowledge network. Moreover, we expanded the methods section “Multiparametric experiment of signaling perturbation” (pg. 16-18), where we added details about the antibodies used in the assay, paired with the target phosphosites and their functional role (Table 3). We also better specified the filtering process based on the number of beads detected for each antibody used (pg. 18). Regarding the reliability of the measurements, the quality of the perturbation data greatly affects the performance of the logic models. xMAP technology has already been used by the scientific community to generate highly reproducible and reliable multiparametric datasets for model training (Terfve et al., 2012). Additionally, we checked that for each sentinel we could measure a fully active state, a fully inactive state and intermediate states. Modulation of individual analytes is displayed in Figure S3.

      Author response image 2.

      Partial figure showing normalization of analyte activity through Hill curves. Experimental data were normalized and scaled from 0 to 1 using analyte-specific Hill functions. Raw data are reported as triangles, normalized data as squares. The partial figure shows three plots of the FLT3 ITD-JMD data (complete figure in Supplementary material, Fig S3).
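
      For readers unfamiliar with this kind of scaling, the sketch below shows the general form of a Hill function mapping a raw signal to a value between 0 and 1. It is a minimal illustration, not the authors' procedure: the half-maximal constant `k`, the Hill coefficient `n`, and the raw values are hypothetical, and the legend does not say how the analyte-specific parameters were chosen.

```python
def hill(x, k, n):
    """Hill function: maps a non-negative raw signal x into [0, 1).

    k is the signal giving a half-maximal value (0.5); n sets the steepness.
    """
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("x must be >= 0 and k, n must be > 0")
    xn = x ** n
    return xn / (k ** n + xn)


# Hypothetical raw intensities for one analyte, with assumed k and n.
raw = [0.0, 250.0, 500.0, 2000.0]
normalized = [hill(x, k=500.0, n=2.0) for x in raw]
```

      A raw value equal to `k` maps to 0.5, values far below `k` map close to 0 (inactive), and values far above `k` approach 1 (fully active), which is what makes the scaled data usable as input for logic-model training.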

      (3) In addition, there are publicly available quantitative proteomics datasets from FLT3-mutant cell lines and primary samples treated with TKIs. At the very least, these should have been used by the authors to independently validate their models, selection of initial parameters, and signal performance of their antibody-based assays, to name a few unvalidated, yet critical, parameters. There is an overwhelming reliance on theoretical predictions without taking advantage of real-world validation of their findings. For example, the authors identified a set of primary AML samples with relevant mutations (Fig 5) that could potentially have provided a valuable experimental validation platform for their predictions of effective drug combination. Yet, they have performed Boolean simulations of the predicted effects, a perplexing instance of adding theoretical predictions on top of a theoretical prediction!

      Additionally, there are datasets of drug sensitivity on primary AML samples where mutational data is also known (for example, from the BEAT-AML consortia), that could be queried for independent validation of the authors' models.

      We thank the reviewer for this comment, which helped us to significantly strengthen our story. Prompted by his/her comment, we have now queried three different datasets for independent validation of our logic models. Specifically, we have taken advantage of quantitative phosphoproteomics datasets of FLT3-ITD cell lines treated with TKIs (Massacci et al., 2023), phosphoproteomic data of FLT3-ITD-positive patient-derived primary blasts (Kramer et al., 2022) and drug sensitivity data on primary FLT3-ITD-positive AML samples (BEAT-AML consortia).

      • Comparison with phosphoproteomic data of FLT3-ITD cell lines treated with TKIs (Massacci et al., 2023)

      Here, we compared the steady state of our model upon FLT3 inhibition with the phosphoproteomic data describing the modulation of 16,319 phosphosites in FLT3-ITD BaF3 cells (FLT3ITD-TKD and FLT3ITD-JMD) upon TKI treatment (i.e. quizartinib, a highly selective FLT3 inhibitor). As shown in the table below and new Figure S5A, the activation status of the nodes in the two generated models is highly comparable with the level of regulatory phosphorylations reported in the reference dataset. Briefly, to determine the agreement between each model and the independent dataset, we focused on the phosphorylation level of specific residues that (i) regulate the functional activity of sentinel proteins (denoted in the ‘Mode of regulation’ column) and (ii) were measured in this work to train the model. We then cross-referenced the sentinel protein status in the FLT3 inhibition simulation (as denoted in the 'Model simulation of FLT3 inhibition' column) with the functional impact of phosphorylation measured in the Massacci et al. dataset (as denoted in the 'Functional impact in quizartinib dataset' column). Points of congruence were summarized in the 'Consensus' column. As an example, if the phosphorylation level of an activating residue decreases (e.g., Y185 of Mapk1), we can conclude that the protein is inhibited (‘Down-reg’), and this is coherent with the model simulation, in which Mapk1 is ‘Inactive’.
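
      The cross-referencing logic described above can be sketched as a small lookup. This is an illustrative reading of the description, not the authors' code: the function names and the string labels for residue roles and phosphorylation changes are hypothetical, and only the 'Down-reg', 'Inactive' and Mapk1 Y185 example come from the text.

```python
def protein_effect(residue_role, phospho_change):
    """Infer protein regulation from a residue's role and its phosphorylation change.

    residue_role: "activating" or "inhibitory" (the mode of regulation).
    phospho_change: "up" or "down" in the treated versus control sample.
    """
    sign = {"activating": 1, "inhibitory": -1}[residue_role]
    direction = {"up": 1, "down": -1}[phospho_change]
    return "Up-reg" if sign * direction > 0 else "Down-reg"


def consensus(model_state, residue_role, phospho_change):
    """True when the simulated node state agrees with the data-derived effect."""
    expected = {"Active": "Up-reg", "Inactive": "Down-reg"}[model_state]
    return protein_effect(residue_role, phospho_change) == expected


# Example from the text: phosphorylation of the activating residue Y185 of
# Mapk1 decreases, and the model simulates Mapk1 as 'Inactive'.
mapk1_effect = protein_effect("activating", "down")
mapk1_agrees = consensus("Inactive", "activating", "down")
```

      The same rule covers the mirror case: reduced phosphorylation of an inhibitory residue implies an up-regulated protein, which agrees with a node simulated as 'Active'.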

      Author response image 3.

      • Comparison with phosphoproteomic data of FLT3-ITD patient-derived primary blasts (Kramer et al., 2022)

      Using the same criteria, we extended our validation efforts by comparing the activity status of the proteins in the “untreated” simulation (i.e. reproducing the tumorigenic state where FLT3, IGF1R and TNFR are set to be active) with their phosphorylation levels in the dataset by Kramer et al. (Kramer et al., 2022). Briefly, this dataset gathers phosphoproteomic data from a cohort of 44 AML patients, and we restricted the analysis to the 11 FLT3-ITD-positive patients. Importantly, all of these patients carry the ITD mutation in the juxtamembrane domain (JMD), thus allowing comparison exclusively with the FLT3 ITD-JMD-specific Boolean model.

      The results are shown in the heatmap below. Each cell in the heatmap reports the phosphorylation level of sentinel proteins’ residues in the indicated patient (red and blue indicate up- or down-regulated phosphoresidues, respectively). Patients were clustered according to Pearson correlation. We observed a good level of agreement between the patients’ phosphoproteomics data and our model (reported in the column “Tumor simulation steady state”) for a subset of patients highlighted within the black rectangle. However, for the remaining patients, the level of agreement is poor. The main reason is that our work focuses on FLT3-ITD signaling, and a systematic translation of the Boolean modeling approach to the entire cohort of AML patients would require the inclusion of the impact of other driver mutations in the network. This is a current and future line of investigation of our group. We have revised the discussion, taking this result into consideration.
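
      For readers unfamiliar with correlation-based clustering, the similarity measure mentioned above can be written out explicitly. The sketch below computes the Pearson correlation between two phosphorylation profiles and the derived distance (1 minus the correlation) that a hierarchical clustering routine could use. The function names and the example vectors are hypothetical, and the clustering settings actually used for the heatmap are not stated here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Distance in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(x, y)


# Hypothetical phosphosite profiles for three patients (arbitrary units).
patient_a = [1.0, 2.0, 3.0]
patient_b = [2.0, 4.0, 6.0]   # same pattern as patient_a, different scale
patient_c = [3.0, 2.0, 1.0]   # opposite pattern
```

      Under this distance, `patient_a` and `patient_b` would be grouped together while `patient_c` would sit far from both.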

      Author response image 4.

      • Comparison with drug sensitivity data on primary FLT3-ITD positive AML samples (BEAT-AML consortia)

      Here we took advantage of the Beat AML programme on a cohort of 672 tumour specimens collected from 562 patients. The BEAT AML consortium provides whole-exome sequencing, RNA sequencing and analyses of ex vivo drug sensitivity of this large cohort of patient-derived primary blasts. We focused on drug sensitivity screening on 134 patients carrying the typical FLT3-ITD mutation in the JMD region. Unfortunately, the ITD insertion in the TKD region is less characterized, and additional in-depth sequencing studies are required to identify FLT3ITD-TKD-positive blasts in this cohort. Next, we focused on those compounds hitting nodes present in the FLT3ITD-JMD Boolean model. Specifically, we selected drugs inhibiting FLT3, PI3K, mTOR, JNK and p38 and we calculated the average IC50 of FLT3ITD-JMD patient-derived primary blasts for each drug. These results are reported as a bar graph in the new Fig. S5B and below (upper panel) and were compared with the apoptotic and proliferation rates measured by in silico simulation of the FLT3ITD-JMD Boolean model. Drug sensitivity screening on primary FLT3ITD-JMD blasts revealed that inhibition of FLT3, PI3K and mTOR induces cell death at low drug concentrations, in contrast with JNK and p38 inhibitors, which show higher IC50 values. These observations are consistent with our simulation results of the FLT3ITD-JMD model. As expected, in silico inhibition of FLT3 greatly impacts apoptosis and proliferation. Additionally, in silico suppression of mTOR and, to a lesser extent, PI3K and p38 affects apoptosis and proliferation. Of note, JNK inhibition seems to affect the viability of FLT3ITD-JMD cells neither in silico nor in vitro.
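
      The per-drug averaging step described above amounts to a simple group-and-mean operation. The sketch below assumes the screening results are available as (drug, IC50) pairs. The function name, the drug labels and the numeric values are placeholders invented for illustration and are not taken from the Beat AML data.

```python
from collections import defaultdict
from statistics import mean


def mean_ic50_by_drug(records):
    """Average IC50 per drug from an iterable of (drug, ic50) pairs."""
    grouped = defaultdict(list)
    for drug, ic50 in records:
        grouped[drug].append(ic50)
    return {drug: mean(values) for drug, values in grouped.items()}


# Placeholder measurements: one entry per patient sample and drug.
example_records = [
    ("FLT3 inhibitor", 0.1),
    ("FLT3 inhibitor", 0.3),
    ("JNK inhibitor", 8.0),
]
example_means = mean_ic50_by_drug(example_records)
```

      A lower mean IC50 indicates that less drug is needed to reduce viability, which is the sense in which the text contrasts FLT3, PI3K and mTOR inhibitors with JNK and p38 inhibitors.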

      Author response image 5.

      Altogether these publicly available datasets independently validate our models, strengthening the reliability and robustness of our approach.

      We have now revised the main text (pg. 8; 9) and added a new Figure (Fig. S5) in the supplementary material; we collected the results of the analysis in TableS6.

      (4) There are additional examples of insufficient experimental detail that preclude a fuller appreciation of the relevance of the work. For example, it is alluded that RNA-sequencing was performed on a subset of patients, but the entire methodological section detailing the RNA-seq amounts to just 3 lines! It is unclear which samples were selected for sequencing nor where the data has been deposited (or might be available for the community - there are resources for restricted/controlled access to deidentified genomics/transcriptomics data).

      We apologize for the lack of description regarding the RNA sequencing of patient samples. We have now added details of this approach in the Methods section (pg. 24) and clearly explained in the text how we selected the patients for the analysis. Additionally, the data have now been deposited in the GEO database (accession number: GSE247483).

      The sentences we have rephrased are below:

      “We analyzed the mutational and expression profiles of 262 genes (Table S7), relevant to hematological malignancies in a cohort of 14 FLT3-ITD positive de novo AML patients (Fig. 5A, panel a). Since, follow-up clinical data were available for 10 out of 14 patients (Fig. 5B, Table S9), we focused on this subset of patients. Briefly, the classification of these 10 patients according to their ITD localization (see Methods) was as follows: 8 patients with FLT3ITD-JMD, 4 with FLT3ITD-JMD+TKD, and 2 with FLT3ITD-TKD (Fig. 5A, panel b). The specific insertion sites of the ITD in the patient cohort are shown in Table S8.”

      Similarly, in the "combinatory treatment inference" methods, it states "...we computed the steady state of each cell line best model....." and "Then we inferred the activity of "apoptosis" and "proliferation" phenotypes", without explaining the details of how these were done. The outcomes of these methods are directly relevant to Fig 4, but with such sparse methodological detail, it is difficult to independently assess the validity of the presented data.

      Overall, the theoretical nature of the work is hampered by real-world validation, and insufficient methodological details limit a fuller appreciation of the overall relevance of this work.

      We thank the reviewer for the insightful feedback regarding the methodology in our paper. Regarding ‘real-world validation’, we have replied extensively to this issue in point 3 (pg. 9-14 of this document). As for the ‘insufficient methodological details’, we have made substantial improvements to enhance clarity and reproducibility, which encompass: (i) revisions in the main text and in the Materials and Methods section; (ii) a detailed explanation of each step and of the decisions taken, which can be accessed both as an extended Materials and Methods section (Supplementary material, pg. 14-19) and through our GitHub repository (https://github.com/SaccoPerfettoLab/FLT3-ITD_driven_AML_Boolean_models). We sincerely hope this addition addresses concerns and facilitates a more thorough and independent assessment of our work.
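
      As general background to the reviewer's question about how a steady state is computed, the sketch below shows one common way to find a fixed point of a Boolean network: apply all update rules synchronously until the state stops changing, with inhibited nodes clamped to False. It is a generic illustration under stated assumptions, not the authors' pipeline. The three rules, the node names in `TOY_RULES` and the function `steady_state` are invented for the example, and the real models and procedure are those documented in the authors' repository.

```python
def steady_state(rules, initial, clamped=None, max_steps=100):
    """Iterate synchronous Boolean updates until a fixed point is reached.

    rules:   dict mapping a node name to a function of the current state.
    initial: dict with a starting True/False value for every node.
    clamped: dict of nodes forced to a fixed value (e.g. an inhibited target).
    """
    clamped = dict(clamped or {})
    state = {**initial, **clamped}
    for _ in range(max_steps):
        updated = {node: bool(rule(state)) for node, rule in rules.items()}
        # Nodes without a rule keep their value; clamped nodes always win.
        new_state = {**state, **updated, **clamped}
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fixed point reached (the network may oscillate)")


# Toy rules, purely illustrative and far simpler than any published model.
TOY_RULES = {
    "ERK": lambda s: s["FLT3"],
    "proliferation": lambda s: s["ERK"],
    "apoptosis": lambda s: not s["ERK"],
}
start = {"FLT3": True, "ERK": False, "proliferation": False, "apoptosis": False}

untreated = steady_state(TOY_RULES, start)
flt3_inhibited = steady_state(TOY_RULES, start, clamped={"FLT3": False})
```

      In this toy network the untreated fixed point has proliferation on and apoptosis off, and clamping FLT3 to False reverses both, which is the kind of phenotype read-out the "apoptosis" and "proliferation" nodes provide.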

      Reviewer #3 (Public Review):

      Summary:

      The paper "Unveiling the signaling network of FLT3-ITD AML improves drug sensitivity prediction" reports the combination of prior knowledge signaling networks, multiparametric cell-based data on the activation status of 14 crucial proteins emblematic of the cell state downstream of FLT3 obtained under a variety of perturbation conditions and Boolean logic modeling, to gain mechanistic insight into drug resistance in acute myeloid leukemia patients carrying the internal tandem duplication in the FLT3 receptor tyrosine kinase and predict drug combinations that may reverse pharmacoresistant phenotypes. Interestingly, the utility of the approach was validated in vitro, and also using mutational and expression data from 14 patients with FLT3-ITD positive acute myeloid leukemia to generate patient-specific Boolean models.

      Strengths:

      The model predictions were positively validated in vitro: it was predicted that the combined inhibition of JNK and FLT3, may reverse resistance to tyrosine kinase inhibitors, which was confirmed in an appropriate FLT3 cell model by comparing the effects on apoptosis and proliferation of a JNK inhibitor and midostaurin vs. midostaurin alone.

      Whereas the study does have some complexity, readability is enhanced by the inclusion of a section that summarizes the study design, plus a summary Figure. Availability of data as supplementary material is also a high point.

      We thank the reviewer for his/her constructive comments about our manuscript. We believe that our story has been significantly strengthened by the changes and new data we provided.

      Weaknesses:

      (1) Some aspects of the methodology are not properly described (for instance, no methodological description has been provided regarding the clustering procedure that led to Figs. 2C and 2D).

      We apologize for the lack of proper description of the methodology. We have extensively revised the methods section and worked to improve its clarity. We have now added a description of the clustering procedures used for the new Fig. S2D and Fig. S2E in the methods section (pg. 19).

      It is not clear in the manuscript whether the patients gave their consent to the use of their data in this study, or the approval from an ethical committee. These are very important points that should be made explicit in the main text of the paper.

      We thank the reviewer for this comment. We have now added the following sentence (pg. 24): “Peripheral blood (PB) samples from 14 AML patients were obtained upon patient’s informed consent.”

      The authors claim that some of the predictions of their models were later confirmed in the follow-up of some of the 14 patients, but it is not crystal clear whether the models helped the physicians to make any decisions on tailored therapeutic interventions, or if this has been just a retrospective exercise and the predictions of the models coincide with (some of) the clinical observations in a rather limited group of patients. Since the paper presents this as additional validation of the models' ability to guide personalized treatment decisions, it would be very important to clarify this point and expand the presentation of the results (comparison of observations vs. model predictions).

      As described in the introduction section, this study was inspired by an urgent clinical problem in AML research: patients carrying the ITD in the TKD domain of the FLT3 receptor display poor prognosis and do not respond to current therapy: Midostaurin (which on the other hand is effective in patients with the ITD in the JMD domain).

      To fill this gap, we gathered a team of 18 participants, of which 7 have a clinical background and have expertise in the diagnosis, treatment and management of AML patients and 5 are experts in Boolean modeling. The scope of the project is the development of a computational approach to identify possible alternative solutions for FLT3ITD-TKD AML patients, generating future lines of investigations. Drug combinations are currently under investigation as a potential means of avoiding drug resistance and achieving more effective and durable treatment responses. However, it is impractical to test for potential synergistic properties among all available drugs using empirical experiments alone. With our approach, we developed models that recreated in silico the main differences in the signaling of sensitive and resistant cells to support the prioritization of novel therapies. Prompted by the reviewer suggestions, we have now extended the validation of our models, through the comparison with publicly available cell lines and patient-derived dataset. We have also confirmed our results by performing in vitro experiments in patient-derived primary blasts treated with midostaurin and/or JNK inhibitor. Importantly, we have already demonstrated that hitting cell cycle regulators in FLT3ITD-TKD cells can be an effective approach to kill resistant leukemia cells (Massacci et al., 2023; Pugliese et al., 2023). We are aware that changing the clinical practice and the therapies for patients require a proper clinical study which goes far beyond the scope of this manuscript.

      However, we hope that our results can be translated soon from “bench-to-bed”. Importantly, we believe that our study can open lines of investigations aimed at the application of our approach to identify promising therapeutic strategies in other clinical settings.

      Recommendations for the authors

      The reviewers have highlighted significant issues regarding the inadequate level of evidence to support some of the conclusions, plus lack of an exhaustive methodological description that may jeopardize reproducibility.

      We hope that the editor and the reviewers will appreciate the extensive revision we made and new data and analysis we provided to strengthen our story.

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig 2D the hierarchical tree is off-set in relation to the treatment symbols and names in the middle of the Figure. In addition, I do not see FLT3i combination with JNKi in the JMD cells (perhaps, a coloring error?).

      We thank the reviewer for this observation. We have now revised the hierarchical tree, which is now in Figure S2D, we have aligned the tree with the symbols and names and corrected the colouring error for the sample FLT3i+JNKi in JMD cells.

      (2) Midostaurin and PKC412 refer to the same drug and are used interchangeably in the manuscript. Using one name consistently would improve readability.

      We have now improved the readability of the text and the Figures by choosing “Midostaurin” when we refer to the FLT3 inhibitor.

      (3) It is not clear to me why the FLT3-ITD-JMD cells are not presented in Fig. 4B. Perhaps their values are 0? In that case, the readability would be improved by including a thin blue line representing zero values. Additionally, on p.8 the authors state "Interestingly, in the FLT3ITDTKD model, the combined inhibition of JNK and FLT3, exclusively, in silico restores the TKI sensitivity, as revealed by the evaluation of the apoptosis and proliferation levels (Fig. 4B-C)." but Fig. 4C shows no differential effects of JNK inhibition in sensitive versus resistant cells.

      To address the reviewer's point, we’ve added a thin blue line representing the zero values of the FLT3ITD-JMD in the results of the simulations in Figure 4B. Regarding Figure 4C, the reviewer is right in saying that there is no difference in terms of proliferation between sensitive and resistant cells upon JNKi and FLT3i co-inhibition. However, we can see lower proliferation levels in both cell lines as compared to the “untreated” condition. Indeed, the simulation suggests that by combining JNK and FLT3 inhibition we revert the resistant phenotype, lowering the proliferation rate of the resistant cells to the TKI-sensitive levels.

      Reviewer #2 (Recommendations For The Authors):

      I have addressed a number of concerns in the public review. Much better effort needs to be made to provide sufficient methodological detail (to permit independent validation by a sufficiently capable and motivated party) and explain the rationale of important parameter selections. Furthermore, I urge the authors to take advantage of the plethora of publicly available real-world data to validate their predicted outcomes.

      We are grateful to the reviewer for the careful revisions. All the aspects raised have been discussed in the specific sections of the public review. In summary, we have provided more methodological details by revising the text and the methods section, by adding a new step-by-step description of the modelling strategy, the parameters and the criteria adopted in each phase (supplementary methods), and by referring to the entire code developed. Prompted by the reviewer's suggestions, we have performed a novel and extensive comparison of our model with three different publicly available datasets. This analysis significantly strengthens our story, and a new supplementary Figure (Fig. S5) summarizes our findings (pg. 9-14 of this document).

      Reviewer #3 (Recommendations For The Authors):

      (1) At first sight, the distribution of the data points in the PCA space does not really seem to speak of nice clustering. Have the authors computed any clustering validation metric to assess if their clustering strategy is adequate and how informative the results are? Further analysis of this point of the article is precluded by the absence of a clear methodological description.

      Here we have used the PCA analysis to obtain a global view of our complex multiparametric data. We have now worked on the PCA to improve its readability. As shown in the new Figure 2D, PCA analysis showed that the activity level of sentinel proteins stratifies cells according to FLT3 activation status (component 1: presence vs absence of FLT3i) and cytokine stimulation (component 2: IGF1 vs TNF⍺). We have now added new experimental details on this part in the methods section (pg. 19) and we deposited the code used for the clustering strategy on the GitHub repository (https://github.com/SaccoPerfettoLab/FLT3ITD_driven_AML_Boolean_models).

      (2) Whereas scientists and medical professionals who work in the field of oncology may be familiar with some of the abbreviations used here, it would be good for improved readability by a more general audience to make sure that all the abbreviations (e.g., TKI) are properly defined the first time that they appear in the text.

      We thank the reviewer for this observation. To improve the readability of the text, we properly defined all the abbreviations in their first appearance, and we added the “Abbreviation” paragraph at page 15 of the manuscript to summarize them all.

      (3) How were the concentrations of the combined treatments chosen in the cell assays used as validation?

      We thank the reviewer for giving us the chance to clarify this point. We expanded the Methods with additional information about the treatments used in the validations. We detailed the SP600125 IC50 evaluation and usage in our cell lines (pg. 22): IC50 values are approximately 1.5 µM in FLT3-ITD mutant cell lines; the SP600125 treatment affects cell viability, reaching a plateau phase of cell death at about 2 µM. We used the minimal dose of SP600125 (10 µM) to properly inhibit JNK (Kim et al., 2010; Moon et al., 2009).

      We also specified (pg.22) that the concentration of Midostaurin was chosen based on the previously published work (Massacci et al., 2022): FLT3 ITD-TKD cells treated with Midostaurin 100nM show lower apoptotic rate and higher cell viability compared to FLT3 ITD-JMD cells.

      The concentration of SB203580 and UO126 was chosen based on previous data available in the lab and set up experiments (pg.22).

      (4) The authors say that "we were able to derive patient-specific signaling features and enable the identification of potential tailored treatments restoring TKI resistance" and that "our predictions were confirmed by follow-up clinical data for some patients". However, the results section on this part of the manuscript is rather scarce (the main text should be much more descriptive about the results summarized in Fig. 5, which are not self-explanatory).

      We thank the reviewer for this observation. We have now expanded the text to provide a more comprehensive description of the results about personalized Boolean model generation and usage and the content presented in Fig. 5 (pg.10-12).

      (5) I do not really agree with the final conclusion about this paper being "the proof of concept that our personalized informatics approach described here is clinically valid and will enable us to propose novel patient-centered targeted drug solutions". First, the clinical data used here belongs to a rather low number of patients. Second, as mentioned before, it is not clear if the models have been used to make any prospective decision or if this conclusion is drawn from an in vitro assay plus a retrospective analysis on a limited number of patients. Moreover, a description of the results and the discussion of the part of the manuscript dealing with patientspecific models is rather scarce, and it is difficult to see how the authors support their conclusions. Also, the statement " In principle, the generalization of our strategy will enable to obtain a systemic perspective of signaling rewiring in different cancer types, driving novel personalized approaches" may be a bit overoptimistic if one considers that so far, the approach has only been applied to a single type of drug-resistant cancer.

      We thank the reviewer for this comment. We agree with the referees that the clinical data we used belongs to a rather low number of patients. However, during the revision we have extensively worked to support the clinical relevance of our models and our discoveries. Specifically, we have compared our Boolean logic models with two different publicly available datasets on phosphoproteomics and drug sensitivity of FLT3ITD-JMD and FLT3ITD-TKD cell lines and blasts (FigS5 and answer to reviewer 2, point 3). Importantly, these datasets independently validated our models, highlighting that our approach has a translational value. Additionally, we have performed novel experiments by measuring the apoptotic rate of patient-derived primary blasts upon pharmacological suppression of JNK (Fig. 4H, pg. 10 of main text). Our data highlights that our approach has the potential to suggest novel effective treatments.

      That said, we have now revised the discussion to avoid overstatements.

      References

      Arreba-Tutusaus, P., Mack, T.S., Bullinger, L., Schnöder, T.M., Polanetzki, A., Weinert, S., Ballaschk, A., Wang, Z., Deshpande, A.J., Armstrong, S.A., Döhner, K., Fischer, T., Heidel, F.H., 2016. Impact of FLT3-ITD location on sensitivity to TKI-therapy in vitro and in vivo. Leukemia 30, 1220–1225. https://doi.org/10.1038/leu.2015.292

      Blinov, M.L., Moraru, I.I., 2012. Logic modeling and the ridiculome under the rug. BMC Biol 10, 92. https://doi.org/10.1186/1741-7007-10-92

      Dorier, J., Crespo, I., Niknejad, A., Liechti, R., Ebeling, M., Xenarios, I., 2016. Boolean regulatory network reconstruction using literature based knowledge with a genetic algorithm optimization method. BMC Bioinformatics 17, 410. https://doi.org/10.1186/s12859-016-1287-z

      Kramer, M.H., Zhang, Q., Sprung, R., Day, R.B., Erdmann-Gilmore, P., Li, Y., Xu, Z., Helton, N.M., George, D.R., Mi, Y., Westervelt, P., Payton, J.E., Ramakrishnan, S.M., Miller, C.A., Link, D.C., DiPersio, J.F., Walter, M.J., Townsend, R.R., Ley, T.J., 2022. Proteomic and phosphoproteomic landscapes of acute myeloid leukemia. Blood 140, 1533–1548. https://doi.org/10.1182/blood.2022016033

      Massacci, G., Venafra, V., Latini, S., Bica, V., Pugliese, G.M., Graziosi, S., Klingelhuber, F., Krahmer, N., Fischer, T., Mougiakakos, D., Boettcher, M., Perfetto, L., Sacco, F., 2023. A key role of the WEE1-CDK1 axis in mediating TKI-therapy resistance in FLT3-ITD positive acute myeloid leukemia patients. Leukemia 37, 288–297. https://doi.org/10.1038/s41375-022-01785-w

      Pugliese, G.M., Venafra, V., Bica, V., Massacci, G., Latini, S., Graziosi, S., Fischer, T., Mougiakakos, D., Boettcher, M., Perfetto, L., Sacco, F., 2023. Impact of FLT3-ITD location on cytarabine sensitivity in AML: a network-based approach. Leukemia 37, 1151–1155. https://doi.org/10.1038/s41375-023-01881-5

      Rücker, F.G., Du, L., Luck, T.J., Benner, A., Krzykalla, J., Gathmann, I., Voso, M.T., Amadori, S., Prior, T.W., Brandwein, J.M., Appelbaum, F.R., Medeiros, B.C., Tallman, M.S., Savoie, L., Sierra, J., Pallaud, C., Sanz, M.A., Jansen, J.H., Niederwieser, D., Fischer, T., Ehninger, G., Heuser, M., Ganser, A., Bullinger, L., Larson, R.A., Bloomfield, C.D., Stone, R.M., Döhner, H., Thiede, C., Döhner, K., 2022. Molecular landscape and prognostic impact of FLT3-ITD insertion site in acute myeloid leukemia: RATIFY study results. Leukemia 36, 90–99. https://doi.org/10.1038/s41375-021-01323-0

      Terfve, C., Cokelaer, T., Henriques, D., MacNamara, A., Goncalves, E., Morris, M.K., van Iersel, M., Lauffenburger, D.A., Saez-Rodriguez, J., 2012. CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Syst Biol 6, 133. https://doi.org/10.1186/1752-0509-6-133

      Traynard, P., Tobalina, L., Eduati, F., Calzone, L., Saez-Rodriguez, J., 2017. Logic Modeling in Quantitative Systems Pharmacology. CPT Pharmacometrics Syst. Pharmacol. 6, 499–511. https://doi.org/10.1002/psp4.12225

    1. Author response:

      The following is the authors’ response to the original reviews.

      Major changes in the revised manuscript include:

      (1) The distinction between condition-dependent versus condition-independent variation in neural activity has been clarified. 

      (2) Principal angle calculations have been added. 

      (3) Neurons modulated during action execution but not during action observation have been analyzed to compare and contrast with mirror neurons. 

      (4) Canonical correlation analysis has been extended to three dimensions. 

      (5) Speculations have been moved to and modified in the Discussion. 

      (6) Computational details have been expanded in the Methods.
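
      As background to item (2), principal angles quantify how far apart two subspaces are: the cosines of the angles are the singular values of the product of orthonormal bases of the two subspaces. The sketch below works this out for the special case of two 2-dimensional planes, where the singular values of the resulting 2×2 matrix have a closed form. It is a standard-library illustration only. The function names are hypothetical, the restriction to planes is an assumption made to keep the example short, and the computation actually used in the manuscript is described in its Methods.

```python
from math import acos, sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def principal_angles_2d(span_a, span_b):
    """Principal angles (radians, ascending) between two 2-D subspaces."""
    qa, qb = orthonormalize(span_a), orthonormalize(span_b)
    if len(qa) != 2 or len(qb) != 2:
        raise ValueError("this sketch only handles 2-dimensional subspaces")
    m = [[dot(a, b) for b in qb] for a in qa]          # 2x2 matrix Qa^T Qb
    frob = sum(x * x for row in m for x in row)        # s1^2 + s2^2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]        # +/- s1 * s2
    disc = sqrt(max(frob * frob - 4.0 * det * det, 0.0))
    s1 = sqrt(max((frob + disc) / 2.0, 0.0))
    s2 = sqrt(max((frob - disc) / 2.0, 0.0))
    # Clamp to 1.0 to guard acos against rounding slightly above 1.
    return [acos(min(s, 1.0)) for s in (s1, s2)]


xy_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
xz_plane = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
same_plane_angles = principal_angles_2d(xy_plane, xy_plane)
crossed_plane_angles = principal_angles_2d(xy_plane, xz_plane)
```

      Identical planes give two zero angles, while the xy- and xz-planes share one direction (angle 0) and are orthogonal in the other (angle π/2). Angles near π/2 across all dimensions would indicate largely distinct subspaces.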

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary and strengths. This paper starts with an exceptionally fair and balanced introduction to a topic, the mirror neuron literature, which is often debated and prone to controversies even in the choice of terminology. In my opinion, the authors did an excellent job in this regard, and I really appreciated it. Then, they propose a novel method to look at population dynamics to compare neural selectivity and alignment between execution and observation of actions performed with different types of grip.

      Thank you.

      Weakness.

      Unfortunately, the goal and findings within this well-described framework are less clear to me. The authors aimed to investigate, using a novel analytic approach, whether and to what extent a match exists between population codes and neural dynamics when a monkey performs an action or observes it performed by an experimenter. This motivation stems from the fact that the general evidence in the literature is that the match between visual and motor selectivity of mirror neuron responses is essentially at a chance level. While the approach devised by the author is generally well-described and understandable, the main result obtained confirms this general finding of a lack of matching between the two contexts in 2 out of the three monkeys. Nevertheless, the authors claim that the patterns associated with execution and observation can be re-aligned with canonical correlation, indicating that these distinct neural representations show dynamical similarity that may enable the nervous system to recognize particular actions. This final conclusion is hardly acceptable to me, and constitutes my major concern, at least without a more explicit explanation: how do we know that this additional operation can be performed by the brain? 

      Point taken.  In the Discussion, we now have clarified that this is our speculation rather than a conclusion and we also offer an alternative interpretation (lines 724 to 744):

      “One classic interpretation of similar latent dynamics in the PM MN population during execution and observation would be that this similarity provides a means for the brain to recognize similar movements performed by the monkey during execution and by the experimenter during observation. Through some process akin to a communication subspace (Semedo et al., 2019), brain regions beyond PM might recognize the correspondence between the latent dynamics of the executed and observed actions.

      Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021). Though neurons active only during observation of others (AO units) have been hypothesized to drive observation activity in MNs, the present AO populations were too small to analyze with the approaches we applied here.  Nevertheless, the similar relative organization of the execution and observation population activity in PM MNs revealed here by alignment of their latent dynamics through CCA could constitute a correspondence between particular movements that might be made by the subject in response to particular movements made by the other individual, i.e. responsive movements which would not necessarily be motorically similar to the observed movements.”

      Is this a computational trick to artificially align something that is naturally non-aligned, or can it capture something real and useful? 

      We feel this is more than a trick.  In the Introduction, we now have clarified (lines 166 to 170):

      “Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population.”

      In the Results we give the following example (lines 446 to 455):

      “Such alignment would indicate that neural representations of trials involving the four objects bore a similar relationship to one another in neural space during execution and observation, even though they occurred in different subspaces.  For example, the trajectories of PMd+M1 neuron populations recorded from two different monkeys during center-out reaching movements could be aligned well (Safaie et al., 2023).  CCA showed, for example, that in both brains the neural trajectory for the movement to the target at 0° was closer to the trajectory for movement to the target at 45° than to the trajectory for the movement to the target at 180°. Relationships among these latent dynamic representations of the eight movements thus were similar even though the neural populations were recorded from two different monkeys.”

      And in the Discussion we now compare (lines 677 to 686):

      “Corresponding neural representations of action execution and observation during task epochs with higher neural firing rates have been described previously in PMd MNs and in PMv MNs using representational similarity analysis (RSA) (Papadourakis and Raos, 2019). And during force production in eight different directions, neural trajectories of PMd neurons draw similar “clocks” during execution, cooperative execution, and passive observation (Pezzulo et al., 2022). Likewise in the present study, despite execution and observation trajectories progressing through largely distinct subspaces, in all three monkeys execution and observation trajectory segments showed some degree of alignment, particularly the Movement and Hold segments (Figure 8C), indicating similar relationships among the latent dynamic representations of the four RGM movements during execution and observation.”

      Based on the accumulated evidence on space-constrained coding of others' actions by mirror neurons (e.g., Caggiano et al. 2009; Maranesi et al. 2017), recent evidence also cited by the authors (Pomper et al. 2023), and the most recent views supported even by the first author of the original discovery (i.e., Vittorio Gallese, see Bonini et al. 2022 on TICS), it seems that one of the main functions of these cells, especially in monkeys, might be to prepare actions and motor responses during social interaction rather than recognizing the actions of others - something that visual brain areas could easily do better than motor ones in most situations. In this perspective, and given the absence of causal evidence so far, the lack of visuo-motor congruence is a potentially relevant feature of the mechanism rather than something to be computationally cracked at all costs. 

      We agree that this perspective provides a valuable interpretation of our findings.  In the Discussion, we have added the following paragraph (lines 730 to 744):

      “Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021). Though neurons active only during observation of others (AO units) have been hypothesized to drive observation activity in MNs, the present AO populations were too small to analyze with the approaches we applied here.  Nevertheless, the similar relative organization of the execution and observation population activity in PM MNs revealed here by alignment of their latent dynamics through CCA could constitute a correspondence between particular movements that might be made by the subject in response to particular movements made by the other individual, i.e. responsive movements which would not necessarily be motorically similar to the observed movements.”

      Specific comments on Results/Methods: 

      I can understand, based on the authors' hypothesis, that they employed an ANOVA to preliminarily test whether and which of the recorded neurons fit their definition of "mirror neurons". However, given the emphasis on the population level, and the consolidated finding of highly different execution and observation responses, I think it could be interesting to apply the same analysis to (at least also) the whole recorded neuronal population, without any preselection based on a single-neuron statistic. Such preselection of mirror neurons could influence the results of EXE-OBS comparisons since all the neurons activated only during EXE or OBS are excluded. Related to this point, the authors could report the total number of recorded neurons per monkey/session, so that the fraction of neurons fitting their definition of mirror neuron is also explicit. 

      We are aware that a number of recent studies from other laboratories already have analyzed the entire population of neurons during execution versus observation, without selectively analyzing neurons active during both execution and observation (Jiang et al., 2020; Albertini et al., 2021). However, our focus lies not in how the entire PM neural population encodes execution versus observation, but in the differential activity of the mirror neuron subpopulation in these two contexts.  Our new Table 2 presents the numbers of mirror neurons (MN), action execution only neurons (AE), action observation only neurons (AO), and neurons not significantly task-related during either execution or observation (NS).  Although we often recorded substantial numbers of AE neurons, very few AO neurons were found in our recordings.  In analyzing the AE subpopulation, we found unexpected differences in canonical correlation alignment between and within the MN and AE neuron populations. In view of the editors’ comments that “…the reviewers provided several specific recommendations of new analyses to include. However, now the paper feels extremely long…”, we have chosen to focus on comparing AE neurons with MNs.  

      Furthermore, the comparison of the dynamics of the classification accuracy in figures 4 and 5, and therefore the underlying assumption of subspace shifts in execution and observation, respectively, reveals substantial similarities between monkeys despite the different contexts, which are clearly greater than the similarities among neural subspace shifts across task epochs: to me, this suggests that the main result is driven by the selected neural populations in different monkeys/implants rather than by an essential property of the neuronal dynamics valid across animals. Could the authors comment on this issue? This could easily explain the "strange" result reported in figure 6 for monkey T. 

      We have taken the general approach of emphasizing findings common across individual animals, but also reporting individual differences.  We have added the following in the Discussion (lines 645 to 654):

      “We did not attempt to classify neurons in our PM MN populations as strictly congruent, broadly congruent, or non-congruent.  Nevertheless, the minimal overlap we found in instantaneous execution and observation subspaces would be consistent with a low degree of congruence in our PM MN populations.  Particularly during one session monkey T was an exception in this regard, showing a considerable degree of overlap between execution and observation subspaces, not unlike the shared subspace found in other studies that identified orthogonal execution and observation subspaces as well (Jiang et al., 2020).  Although our microelectrode arrays were placed in similar cortical locations in the three monkeys, by chance monkey T’s PM MN population may have included a substantial proportion of congruent neurons.”

      Reviewer #2 (Public Review): 

      In this work, the authors set out to identify time-varying subspaces in the premotor cortical activity of monkeys as they executed/observed a reach-grasp-hold movement of 4 different objects. Then, they projected the neural activity to these subspaces and found evidence of shifting subspaces in the time course of a trial in both conditions, executing and observing. These shifting subspaces appear to be distinct in execution and observation trials. However, correlation analysis of neural dynamics reveals the similarity of dynamics in these distinct subspaces. Taken together, Zhao and Schieber speculate that the condition-dependent activity studied here provides a representation of movement that relies on the actor. 

      This work addresses an interesting question. The authors developed a novel approach to identify instantaneous subspaces and decoded the object type from the projected neural dynamics within these subspaces. As interesting as these results might be, I have a few suggestions and questions to improve the manuscript: 

      (1) Repeating the analyses in the paper, e.g., in Fig. 5, using non-MN units only or the entire population, and demonstrating that the results are specific to MNs would make the whole study much more compelling. 

      We have added analyses of those non-MNs modulated significantly during action execution but not during observation, which we refer to as AE neurons.  The additional findings from these analyses are spread throughout the manuscript:

      Lines 284-293:

      “We also examined the temporal progression of the instantaneous subspace of AE neurons.  As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3).  During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D).  After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset.  As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.” 

      Lines 411-419:

      “During execution trials, classification accuracy for AE populations (Figure 6I-L) showed a time course quite similar to that for MN populations, though amplitudes were lower overall, most likely because of the smaller population sizes. During observation, AE populations showed only low-amplitude, short-lived peaks of classification accuracy around times I, G, M, and H (Figure 6 – figure supplement 1).  Given that individual AE neurons showed no statistically significant modulation during observation trials, even these small peaks might not have been expected.  Previous studies have indicated, however, that neurons not individually related to task events nevertheless may contribute to a population response (Shenoy et al., 2013; Cunningham and Yu, 2014; Gallego et al., 2017; Jiang et al., 2020).”

      Lines 495-508:

      “Although MNs are known to be present in considerable numbers in both the primary motor cortex and premotor cortex (see Introduction), most studies of movement-related cortical activity in these areas make no distinction between neurons with activity only during action execution (AE neurons) and those with activity during both execution and observation (MNs).  This reflects an underlying assumption that during action execution, mirror neurons function in parallel with AE neurons, differing only during observation.  We therefore tested the hypothesis that MN and AE neuron execution trajectory segments from the same session would align well.  Figure 8C (blue) shows the mean CCs between MN and AE execution trajectory segments across 8 alignments (MN/AE; 2 R, 3 T, 3 F), which reached the highest values for the Hold segments .  All three of these coefficients were substantially lower than those for the MN execution vs. observation alignments given above.  Surprisingly, the alignment of AE neuron execution trajectory segments with those of the simultaneously recorded MN population was weaker than the alignment of MN trajectories during execution vs. observation.

      Did these differences in MN:1/2, MN:E/O, and MN/AE alignment result from consistent differences in their respective patterns of co-modulation, or from greater trial-by-trial variability in the patterns of co-modulation among MNs during observation than during execution, and still greater variability among AE neurons during execution?  The bootstrapping approach we used for CCA (see Methods) enabled us to evaluate the consistency of relationships among trajectory segments across repeated samplings of trials recorded from the same neuron population in the same session and in the same context (execution or observation).  We therefore performed 500 iterations of CCA between two different random samples of MN execution (MN:E/E), MN observation (MN:O/O), or AE execution (AE:E/E) trajectory segments from a given session (2 R, 3 T, 3 F). This within-group alignment of MN execution trajectory segments from the same session (Figure 8D, MN:E/E, gray, Hold: () was as strong as between-session alignment (Figure 8C, MN:1/2, black).  But within-group alignment of MN observation trajectory segments (Figure 8D, MN:O/O, orange, Hold: () was lower than that found with MN execution segments (Figure 8C, MN:E/O, red, .  Likewise, within-group alignment of AE neuron trajectory segments (Figure 8D, AE:E/E, light blue, Hold: () was lower than their alignment with MN execution segments (Figure 8C, MN/AE, blue, Hold: ().  Whereas MN execution trajectories were relatively consistent within sessions, MN observation trajectories and AE execution trajectories were less so.”

      And in the Discussion we now suggest (lines 682 to 698):

      “Based on the assumption that AE neurons and MNs function as a homogenous neuron population during action execution, we had expected AE and MN execution trajectory segments to align closely.  During execution trials, the progression of instantaneous condition-dependent subspaces and of classification accuracy in AE populations was quite similar to that in MN populations.  We were surprised to find, therefore, that alignment between execution trajectory segments from AE populations and from the simultaneously recorded MN populations was even lower than alignment between MN execution and observation segments (Figure 8C, blue versus red).  Moreover, whereas within-group alignment of MN execution trajectory segments was high, within-group alignment of AE neuron execution trajectory segments was low (Figure 8D, gray versus light blue).  These findings indicate that the predominant patterns of co-modulation among MNs during execution are quite consistent within sessions, but the patterns of comodulation among AE neurons are considerably more variable.  Together with our previous finding that modulation of MNs leads that of non-mirror neurons in time, both at the single neuron level and at the population level (Mazurek and Schieber, 2019), this difference in consistency versus variability leads us to speculate that during action execution, while MNs carry a consistent forward model of the intended movement, AE neurons carry more variable feedback information.”

      (2) The method presented here is similar and perhaps related to principal angles (https://doi.org/10.2307/2005662). It would be interesting to confirm these results with principal angles. For instance, instead of using the decoding performance as a proxy for shifting subspaces, principal angles could directly quantify the 'shift' (similar to Gallego et al, Nat Comm, 2018). 

      Point taken.  We now have calculated the principal angles as a function of time and present them as a new section of the Results including new Figure 4 (lines 237 to 293). 

      “Instantaneous subspaces shift progressively during both execution and observation 

      We identified an instantaneous subspace at each one millisecond time step of RGM trials.  At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods).  Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation.  To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H).  This process was repeated 10 times with replacement to assess the variability of the principal angles.  The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.  

      Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course.  In the Results we therefore illustrate only the first (i.e. smallest) principal angle.  Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation.  As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°.  Of greater interest are the slower changes in the first principal angle in between these four time points.  Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
      To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).

      Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials.  Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H.  The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution.  Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.

      We also examined the temporal progression of the instantaneous subspace of AE neurons.  As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3).  During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D).  After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset.  As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”

      The related Methods are now described in subsection “Subspace Comparisons—Principal Angles”
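
      As an illustration only, the following minimal sketch shows one standard way to compute principal angles between two subspaces: orthonormalize each basis and take the arccosine of the singular values of the product of the two bases. The function name `principal_angles_deg`, the population size, and the random bases are hypothetical stand-ins, not the manuscript's code or data.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees, ascending) between the column spaces of A and B."""
    # Orthonormalize each basis; QR also handles bases that are not yet orthonormal.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)  # guard against round-off outside [-1, 1]
    return np.degrees(np.arccos(cosines))

rng = np.random.default_rng(0)
n_neurons = 50                                  # hypothetical population size
basis_a = rng.standard_normal((n_neurons, 3))   # stand-in for one 3D instantaneous subspace
basis_b = rng.standard_normal((n_neurons, 3))   # stand-in for a reference subspace

angles_same = principal_angles_deg(basis_a, basis_a)  # identical subspaces: angles near 0 degrees
angles_diff = principal_angles_deg(basis_a, basis_b)  # unrelated subspaces: larger angles
```

      Because the singular values are returned in descending order, the first element of the result is the smallest principal angle, which corresponds to the "first principal angle" plotted over time.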

      Relatedly, why is the decoding of the 'object type' used to establish the progressive shifting of the subspaces? I would be interested to see the authors' argument. 

      We have clarified the reason for our decoding analysis as follows (lines 295 to 297):

      “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity.  The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.”

      And… (lines 332 to 348):

      “Decodable information changes progressively during both execution and observation 

      As RGM trials proceeded in time, the condition-dependent neural activity of the PM MN population thus changed in two ways.  First, the instantaneous condition-dependent subspace shifted, indicating that the patterns of firing-rate co-modulation among neurons representing the four different RGM movements changed progressively, both during execution and during observation.  Second, as firing rates generally increased, the neural trajectories representing the four RGM movements became progressively more separated, more so during execution than during observation. 

      To evaluate the combined effects of these two progressive changes, we clipped 100 ms single-trial trajectory segments beginning at times I, G, M, or H, and projected these trajectory segments from individual trials into the instantaneous 3D subspaces at 50 ms time steps.  At each of these time steps, we trained a separate LSTM decoder to classify individual trials according to which of the four objects was involved in that trial.  We expected that the trajectory segments would be classified most accurately when projected into instantaneous subspaces near the time at which the trajectory segments were clipped.  At other times we reasoned that classification accuracy would depend both on the similarity of the current instantaneous subspace to that found at the clip time as evaluated by the principal angle (Figure 4), and on the separation of the four trajectories at the clip time (Figure 5).”

      The object type should be much more decodable during movement or hold than during instruction, which is probably why the chance-level decoding performance (horizontal lines) for the movement segment is twice that for the instruction segment. 

      Indeed, the object type is more decodable during the movement and hold than during instruction or delay epochs.

      (3) Why aren't execution and observation subspaces compared together directly? Especially given that there are both types of trials in the same session with the same recorded population of neurons. This could be done using instantaneous subspaces, or the principal angles between manifolds during exec trials vs obs trials.

      Point taken.  We now have added a comparison of the execution and observation subspaces using the principal angles between instantaneous subspaces (lines 421 to 436):

      “Do PM mirror neurons progress through the same subspaces during execution and observation?

      Having found that PM mirror neuron populations show similar progressive shifts in their instantaneous neural subspace during execution and observation of RGM trials, as well as similar changes in decodable information, we then asked whether this progression passes through similar subspaces during execution and observation.  To address this question, we first calculated the principal angles between the instantaneous mirror-neuron execution subspace at selected times I, G, M, or H and the entire time series of instantaneous mirror-neuron observation subspaces (Figure 7A-D).  Conversely, we calculated the principal angles between the instantaneous observation subspaces at selected times I, G, M, or H and the entire time series of instantaneous execution subspaces (Figure 7E-H).  Although the principal angles were slightly smaller than might be expected from chance alone, indicating some minimal overlap of execution and observation instantaneous subspaces, the instantaneous observation subspaces did not show any progressive shift toward the I, G, M, or H execution subspace (Figure 7A-D), nor did the instantaneous execution subspaces shift toward the I, G, M, or H observation subspace (Figure 7E-H).”

      (4) The definition of the instantaneous subspaces is a critical point in the manuscript. I think it is slightly unclear: based on the Methods section #715-722 and the main text #173-#181, I gather that the subspaces are based on trial averaged neural activity for each of the 4 objects, separately. So for each object and per timepoint, a vector of size (1, n) -n neurons- is reduced to a vector of (1, 2 or 3 -the main text says 2, methods say 3-) which would be a single point in the low-d space. Is this description accurate? This should be clarified in the manuscript.  

      In the Methods, we now have clarified (lines 849 to 859):

      “Instantaneous subspace identification 

      Instantaneous neural subspaces were identified at 1 ms intervals.  At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects (sphere, button, coaxial cylinder, and perpendicular cylinder) were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step.  PCA then was performed on these four points.  Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace.  Each instantaneous 3D subspace can be considered a filter described by a matrix, W, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, W_i, forming a time series of filters (Figure 1B).”
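
      The identification step can be sketched as follows, under stated assumptions: the firing rates are synthetic Poisson counts, the array sizes are arbitrary, PCA is carried out through an SVD of the mean-centered condition averages, and the function name `instantaneous_subspace` is hypothetical. The sketch illustrates why four points yield a 3D subspace; it is not the manuscript's implementation.

```python
import numpy as np

def instantaneous_subspace(mean_rates):
    """PCA on the four condition-averaged neural states at one time step.

    mean_rates: array of shape (4, N), one row per object, N neurons.
    Returns W of shape (N, 3) with orthonormal columns, plus the singular values.
    """
    centered = mean_rates - mean_rates.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal axes; four centered points have rank <= 3,
    # so the fourth singular value is zero up to round-off.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:3].T, singular_values

rng = np.random.default_rng(1)
n_trials, n_neurons, n_times = 20, 40, 5   # hypothetical sizes
# rates[object, trial, time, neuron]: synthetic firing rates, not real data
rates = rng.poisson(10.0, size=(4, n_trials, n_times, n_neurons)).astype(float)

filters = []
for t in range(n_times):
    means_t = rates[:, :, t, :].mean(axis=1)   # (4, N) condition averages at time t
    W_t, s_t = instantaneous_subspace(means_t)
    filters.append(W_t)
W = np.stack(filters)                           # (n_times, N, 3): time series of filters

# Project one single-trial neural state into the subspace identified at time step 0.
latent = rates[0, 0, 0, :] @ W[0]               # shape (3,)
```

      Stacking the per-time-step matrices gives one filter per time step, so any neural state or trajectory segment can be projected into the subspace identified at any other time.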

      (5) Isn't the process of projecting segments of neural dynamics and comparing the results equivalent to comparing the projection matrices in the first place? If so, that might have been a more intuitive avenue to follow. 

      As described in more detail in our responses to item 2, above, we have added analyses of principal angles to compare the projection matrices directly.  However, “the process of projecting segments of neural dynamics and comparing the results” incorporates the progressively increasing separation of the trajectory segments and hence is not simply equivalent to comparing the subspaces with principal angles.
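
      A toy example may make this distinction concrete. In the sketch below, two hypothetical sets of condition means span exactly the same subspace but differ threefold in scale, so the principal angles between their subspaces are zero while the separation of the projected points differs. All names, sizes, and data are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def subspace_basis(points):
    """Orthonormal 3D basis (N, 3) from PCA of four condition-mean points (4, N)."""
    centered = points - points.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:3].T

def principal_angles_deg(Wa, Wb):
    """Principal angles (degrees, ascending) between two orthonormal bases."""
    cosines = np.linalg.svd(Wa.T @ Wb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

def mean_pairwise_distance(points, W):
    """Mean Euclidean distance between the condition means after projection by W."""
    proj = (points - points.mean(axis=0)) @ W
    pairs = combinations(range(len(proj)), 2)
    return np.mean([np.linalg.norm(proj[i] - proj[j]) for i, j in pairs])

rng = np.random.default_rng(3)
early = rng.standard_normal((4, 30))   # hypothetical condition means early in a trial
late = 3.0 * early                     # same geometry, three times the separation

W_early, W_late = subspace_basis(early), subspace_basis(late)
angles = principal_angles_deg(W_early, W_late)   # near 0: the subspaces coincide
ratio = (mean_pairwise_distance(late, W_late)
         / mean_pairwise_distance(early, W_early))   # near 3: the separation differs
```

      A decoder operating on the projected points would have more to work with in the second case, even though a principal-angle comparison treats the two subspaces as identical.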

      (6) Lines #385-#389: This process seems unnecessarily complicated. Also, given the number of trials available, this sometimes doesn't make sense. E.g. Monkey R exec has only 8 trials of one of the objects, so bootstrapping 20 trials 500 times would be spurious. Why not, as per Gallego et al, Nat Neurosci 2020 and Safaie et al, Nat 2023 which are cited, concatenate the trials? 

      In the Methods we now clarify that (lines 953 to 969):

      “To provide an estimate of variability, we used a bootstrapping approach to CCA.  From each of two data sets we randomly selected 20 trials involving each target object (totaling 80 trials) with replacement, clipped trajectory segments from each of those trials for 100 ms (100 points at 1 ms intervals) after the instruction onset, go cue, movement onset, or beginning of the final hold, and performed CCA as described above. (Note that because session 1 from monkey R included only 8 button trials (Table 1), we excluded this session from CCA analyses.)  With 500 iterations, we obtained a distribution of the correlation coefficients (CCs) between the two data sets in each of the three dimensions of the aligned subspace, which permitted statistical comparisons. We then used this approach to evaluate alignment of latent dynamics between different sessions (e.g. execution trials on two different days), between different contexts (e.g. execution and observation), and between different neural populations (e.g. MNs and AE neurons). This bootstrapping approach further enabled us to assess the consistency of relationships among neural trajectories within a given group, i.e. the same neural population during the same context (execution or observation) in the same session, by drawing two separate random samples of 80 trials from the same population, context, and session (Figure 8D), which would not have been possible had we concatenated trajectory segments from all trials in the session (Gallego et al., 2020; Safaie et al., 2023).”
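
      The resampling logic can be sketched as follows. This is a minimal illustration on synthetic latent trajectories, not the manuscript's code: CCA is computed here by the QR-plus-SVD route, and the sketch assumes that the resampled segments are averaged per object and the four 100-point averages concatenated into a 400 x 3 matrix before CCA, a detail the quoted Methods text does not specify. The function names and all data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations (descending) between two (samples, dims) matrices."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    ccs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(ccs, 0.0, 1.0)

def bootstrap_segments(latents, rng, n_per_object=20):
    """Resample trials with replacement, then concatenate per-object mean segments.

    latents: (4 objects, n_trials, 100 time points, 3 dims). Returns (400, 3).
    Averaging before concatenation is an assumption of this sketch.
    """
    means = []
    for obj in range(latents.shape[0]):
        idx = rng.integers(0, latents.shape[1], size=n_per_object)
        means.append(latents[obj, idx].mean(axis=0))   # (100, 3)
    return np.concatenate(means, axis=0)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 100)
shared = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)  # (100, 3)
offsets = rng.standard_normal((4, 1, 1, 3))     # object-specific offsets
mix = rng.standard_normal((3, 3))               # second data set lives in other coordinates
# Synthetic latent segments, shape (4 objects, 30 trials, 100 points, 3 dims)
set_a = shared + offsets + 0.5 * rng.standard_normal((4, 30, 100, 3))
set_b = (shared + offsets) @ mix + 0.5 * rng.standard_normal((4, 30, 100, 3))

n_iter = 500
ccs = np.empty((n_iter, 3))
for i in range(n_iter):
    ccs[i] = canonical_correlations(bootstrap_segments(set_a, rng),
                                    bootstrap_segments(set_b, rng))
mean_ccs = ccs.mean(axis=0)   # bootstrapped average of CC1, CC2, CC3
```

      Drawing both samples from the same array, for example `bootstrap_segments(set_a, rng)` twice, gives the within-group variant, which requires resampling and would not be available from a single concatenation of all trials.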

      And we report results that could not have been obtained by concatenating all the trials (lines 522 to 541):

      “Did these differences in MN:1/2, MN:E/O, and MN/AE alignment result from consistent differences in their respective patterns of co-modulation, or from greater trial-by-trial variability in the patterns of co-modulation among MNs during observation than during execution, and still greater variability among AE neurons during execution?  The bootstrapping approach we used for CCA (see Methods) enabled us to evaluate the consistency of relationships among trajectory segments across repeated samplings of trials recorded from the same neuron population in the same session and in the same context (execution or observation).  We therefore performed 500 iterations of CCA between two different random samples of MN execution (MN:E/E), MN observation (MN:O/O), or AE execution (AE:E/E) trajectory segments from a given session (2 R, 3 T, 3 F). This within-group alignment of MN execution trajectory segments from the same session (Figure 8D, MN:E/E, gray, Hold: () was as strong as between-session alignment (Figure 8C, MN:1/2, black).  But within-group alignment of MN observation trajectory segments (Figure 8D, MN:O/O, orange, Hold: () was lower than that found with MN execution segments (Figure 8C, MN:E/O, red, .  Likewise, within-group alignment of AE neuron trajectory segments (Figure 8D, AE:E/E, light blue, Hold: () was lower than their alignment with MN execution segments (Figure 8C, MN/AE, blue, Hold: ().  Whereas MN execution trajectories were relatively consistent within sessions, MN observation trajectories and AE execution trajectories were less so.”

      Because only 8 button trials were available in Session 1 from Monkey R, we excluded this session from the CCA analyses.  Sessions 2 and 3 from monkey R provide valid results, however.  For example, we now state explicitly (lines 468 to 472):

      “As a positive control, we first aligned MN execution trajectory segments from two different sessions in the same monkey (which we abbreviate as MN:1/2).  The 2 sessions in monkey R provided only 1 possible comparison, but the 3 sessions in monkeys T and F each provided 3 comparisons.  For each of these 7 comparisons, we found the bootstrapped average of CC1, of CC2, and of CC3.”

      (7) Related to the CCA analysis, what behavioural epoch has been used here, the same as the previous analyses, i.e. 100 ms? How many data points is that in time? Given that CCA is essentially a correlation value, too few data points make it rather meaningless. If that's the case, I encourage using, let's say, one window combining I and G until movement, and one window of movement and hold, such that they are both easier to interpret. Indeed low values of exec-exec in CC2 compared to Gallego et al, Nat Neurosci, 2020 might be a sign of a methodological error. 

      In the Methods described for CCA, we now have clarified that (lines 953 to 961):

      “To provide an estimate of variability, we used a bootstrapping approach to CCA.  From each of two data sets we randomly selected 20 trials involving each target object (totaling 80 trials) with replacement, clipped trajectory segments from each of those trials for 100 ms (100 points at 1 ms intervals) after the instruction onset, go cue, movement onset, or beginning of the final hold, and performed CCA as described above. (Note that because session 1 from monkey R included only 8 button trials (Table 1), we excluded this session from CCA analyses.)  With 500 iterations, we obtained a distribution of the correlation coefficients (CCs) between the two data sets in each of the three dimensions of the aligned subspace, which permitted statistical comparisons.”

      And in the Results we report that (lines 475 to 480):

      “The highest values for MN:1/2 correlations were obtained for the Movement trajectory segments.  These values indicate consistent relationships among the Movement neural trajectory segments representing the four different RGM movements from session to session, as would have been expected from previous studies (Gallego et al., 2018; Gallego et al., 2020; Safaie et al., 2023).”

      Reviewer #3 (Public Review): 

      Summary: 

      In their study, Zhao et al. investigated the population activity of mirror neurons (MNs) in the premotor cortex of monkeys either executing or observing a task consisting of reaching to, grasping, and manipulating various objects. The authors proposed an innovative method for analyzing the population activity of MNs during both execution and observation trials. This method made it possible to isolate the condition-dependent variance in neural data and to study its temporal evolution over the course of single trials. The method proposed by the authors consists of building a time series of "instantaneous" subspaces with single time step resolution, rather than a single subspace spanning the entire task duration. As these subspaces are computed on an instant time basis, projecting neural activity from a given task time into them results in latent trajectories that capture condition-dependent variance while minimizing the condition-independent one. The authors then analyzed the time evolution of these instantaneous subspaces and revealed that a progressive shift is present in subspaces of both execution and observation trials, with slower shifts during the grasping and manipulating phases compared to the initial preparation phase. Finally, they compared the instantaneous subspaces between execution and observation trials and observed that neural population activity did not traverse the same subspaces in these two conditions. However, they showed that these distinct neural representations can be aligned with Canonical Correlation Analysis, indicating dynamic similarities of neural data when executing and observing the task. The authors speculated that such similarities might facilitate the nervous system's ability to recognize actions performed by oneself or another individual. 

      Strengths: 

      Unlike other areas of the brain, the analysis of neural population dynamics of premotor cortex MNs is not well established. Furthermore, analyzing population activity recorded during non-trivial motor actions, distinct from the commonly used reaching tasks, serves as a valuable contribution to computational neuroscience. This study holds particular significance as it bridges both domains, shedding light on the temporal evolution of the shift in neural states when executing and observing actions. The results are moderately robust, and the proposed analytical method could potentially be used in other neuroscience contexts. 

      Weaknesses: 

      While the overall clarity is satisfactory, the paper falls short in providing a clear description of the mathematical formulas for the different methods used in the study. 

      We have added the various mathematical formulas in the Methods.

      For Cumulative Separation (lines 864 to 871): 

      “To quantify the separation between the four trial-averaged trajectory segments involving the different objects in a given instantaneous subspace, we then calculated their cumulative separation (𝐶𝑆) as: 

      $$CS = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{3} \sum_{j=i+1}^{4} d_{ij}(t)$$

      where d<sub>ij</sub>(t) is the 3-dimensional Euclidean distance between the i<sup>th</sup> and j<sup>th</sup> trajectories at time point 𝑡. We summed the 6 pairwise distances between the 4 trajectory segments across time points and normalized by the number of time points, 𝑇 = 100.  The larger the 𝐶𝑆, the greater the separation of the trajectory segments.”
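
The cumulative separation defined in this passage (the 6 pairwise Euclidean distances among 4 trajectory segments, summed over time points and divided by the number of time points) can be computed as in the sketch below. The function name and the array layout (conditions × time points × dimensions) are assumptions made for illustration, not taken from the paper.

```python
import itertools

import numpy as np


def cumulative_separation(segments):
    """Cumulative separation of trajectory segments.

    segments: array of shape (n_conditions, T, n_dims), e.g. (4, 100, 3)
    for four object-specific segments of 100 points in a 3D subspace.
    """
    segments = np.asarray(segments, dtype=float)
    n_conditions, n_time, _ = segments.shape
    total = 0.0
    for i, j in itertools.combinations(range(n_conditions), 2):
        # d_ij(t): Euclidean distance between segments i and j at each t
        d_ij = np.linalg.norm(segments[i] - segments[j], axis=1)
        total += d_ij.sum()
    return total / n_time  # normalize by the number of time points


# Toy check: four stationary "trajectories" sitting at fixed 3D points.
corners = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
static_segments = np.repeat(corners[:, None, :], 100, axis=1)  # (4, 100, 3)
cs_value = cumulative_separation(static_segments)  # equals 3 + 3*sqrt(2)
```

In the toy check the pairwise distances are constant in time (three of length 1 and three of length √2), so the normalized sum is simply their total.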

      For principal angles (lines 877 to 884): 

      For example, given the 3-dimensional instantaneous subspace at the time of movement onset, W<sub>M</sub>, and at any other time, W<sub>i</sub>, we calculated their 3x3 inner product matrix and performed singular value decomposition to obtain:

      $$W_M^{T} W_i = P_M C P_i^{T}$$

      where 3x3 matrices P<sub>M</sub> and P<sub>i</sub> define new manifold directions which successively minimize the 3 principal angles specific to the two subspaces being compared. The elements of diagonal matrix 𝐶 then are the ranked cosines of the principal angles, 𝜃<sub>i</sub>, ordered from smallest to largest: 

      $$C = \mathrm{diag}(\cos\theta_1, \cos\theta_2, \cos\theta_3)$$
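
The principal-angle computation described here (singular value decomposition of the inner product matrix of two subspace bases, with the singular values giving the cosines of the angles) can be sketched as below. The function name is hypothetical, and the QR step that re-orthonormalizes the inputs is a defensive addition of this sketch, not something stated in the Methods.

```python
import numpy as np


def principal_angles(W_a, W_b):
    """Principal angles (degrees, smallest first) between two subspaces.

    W_a, W_b: arrays of shape (N, k) whose columns span each subspace,
    e.g. N neurons by k = 3 principal component dimensions.
    """
    # Orthonormalize the bases (a no-op up to sign for PCA loadings).
    Q_a, _ = np.linalg.qr(W_a)
    Q_b, _ = np.linalg.qr(W_b)
    # Singular values of the inner product matrix are cos(theta_i),
    # returned in descending order, so the angles come out ascending.
    cosines = np.linalg.svd(Q_a.T @ Q_b, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)  # guard against rounding error
    return np.degrees(np.arccos(cosines))


# Toy usage in a 6-dimensional space.
axes = np.eye(6)
same = principal_angles(axes[:, :3], axes[:, :3])           # all near 0
orthogonal = principal_angles(axes[:, :3], axes[:, 3:])     # all 90
partial = principal_angles(axes[:, :3], axes[:, [0, 1, 3]])  # two shared axes
```

Angles near 0° indicate nearly identical subspaces and angles near 90° nearly orthogonal ones, matching the interpretation given in the Results.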

      For CCA (lines 945 to 952): 

      “CCA was performed as follows: The original latent dynamics, $L_A$ and $L_B$, first were transformed and decomposed as $L_A^{T} = Q_A R_A$ and $L_B^{T} = Q_B R_B$.  The first m = 3 column vectors of each $Q_i$ provide an orthonormal basis for the column vectors of $L_i^{T}$ (where 𝑖 = 𝐴, 𝐵).  Singular value decomposition on the inner product matrix of $Q_A$ and $Q_B$ then gives $Q_A^{T} Q_B = U S V^{T}$, and new manifold directions that maximize pairwise correlations are provided by $M_A = R_A^{-1} U$ and $M_B = R_B^{-1} V$.  We then projected the original latent dynamics into the new, common subspace: $\tilde{L}_A^{T} = L_A^{T} M_A$ and $\tilde{L}_B^{T} = L_B^{T} M_B$.  Pairwise correlation coefficients between the aligned latent dynamics sorted from largest to smallest then are given by the elements of the diagonal matrix $S$.”
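
The QR-and-SVD procedure described in this passage can be sketched as follows. This is a hedged illustration of the standard formulation (as used by Gallego et al.), not the authors' implementation: the function name `cca_align` is hypothetical, the arrays are assumed to be stored as time points × dimensions, and the mean-centering step is an assumption added so that the singular values can be read as correlation coefficients.

```python
import numpy as np


def cca_align(L_a, L_b):
    """Align two sets of latent dynamics with CCA via QR and SVD.

    L_a, L_b: arrays of shape (T, m), time points by latent dimensions.
    Returns the canonical correlations (largest first) and the two sets
    of latent dynamics projected into the common, aligned subspace.
    """
    L_a = L_a - L_a.mean(axis=0)  # assumption: center each dimension
    L_b = L_b - L_b.mean(axis=0)
    Q_a, R_a = np.linalg.qr(L_a)  # orthonormal basis for columns of L_a
    Q_b, R_b = np.linalg.qr(L_b)
    U, S, Vt = np.linalg.svd(Q_a.T @ Q_b)
    M_a = np.linalg.solve(R_a, U)     # R_a^{-1} U
    M_b = np.linalg.solve(R_b, Vt.T)  # R_b^{-1} V
    return S, L_a @ M_a, L_b @ M_b


# Toy usage: the second data set is a linear mixture of the first plus noise.
rng = np.random.default_rng(0)
latent_a = rng.standard_normal((400, 3))
latent_b = latent_a @ rng.standard_normal((3, 3)) \
    + 0.5 * rng.standard_normal((400, 3))
ccs, aligned_a, aligned_b = cca_align(latent_a, latent_b)
```

After alignment, the correlation between the k-th columns of `aligned_a` and `aligned_b` equals the k-th canonical correlation in `ccs`.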

      Moreover, it was not immediately clear why the authors did not consider a (relatively) straightforward metric to quantify the progressive shift of the instantaneous subspaces, such as computing the angle between consecutive subspaces, rather than choosing a (in my opinion) more cumbersome metric based on classification of trajectory segments representing different movements. 

      Point taken.  We now have calculated the principal angles as a function of time and present them as a new section of the Results including new figure 4 (lines 237 to 293). 

      “Instantaneous subspaces shift progressively during both execution and observation 

      We identified an instantaneous subspace at each one millisecond time step of RGM trials.  At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods).  Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation.  To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H).  This process was repeated 10 times with replacement to assess the variability of the principal angles.  The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.  

      Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course.  In the Results we therefore illustrate only the first (i.e. smallest) principal angle.  Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation.  As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°.  Of greater interest are the slower changes in the first principal angle in between these four time points.  Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
      To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).

      Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials.  Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H.  The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution.  Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.

      We also examined the temporal progression of the instantaneous subspace of AE neurons.  As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3).  During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D).  After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset.  As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”

      The related Methods are now described in subsection “Subspace Comparisons—Principal Angles”

      Specific comments: 

      In the methods, it is stated that instantaneous subspaces are found with 3 PCs. Why does it say 2 here?  

      We now have clarified. (lines 295 to 310):

      “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity.  The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.  To illustrate this increasing separation, we clipped 100 ms segments of high-dimensional MN population trial-averaged trajectories beginning at times I, G, M, and H, for trials involving each of the four objects.  We then projected the set of four object-specific trajectory segments clipped at each time into each of the four instantaneous 3D subspaces at times I, G, M, and H.  This process was repeated separately for execution trials and for observation trials.  

      For visualization, we projected these trial-averaged trajectory segments from an example session into the PC1 vs PC2 planes (which consistently captured > 70% of the variance) of the I, G, M, or H instantaneous 3D subspaces.  In Figure 5, the trajectory segments for each of the four objects (sphere – purple, button – cyan, coaxial cylinder – magenta, perpendicular cylinder – yellow) sampled at different times (rows) have been projected into each of the four instantaneous subspaces defined at different times (columns).  Rather than appearing knotted as in Figure 3, these short trajectory segments are distinct when projected into each instantaneous subspace.”

      And in the legend for Figure 5 we now clarify that:

      “Each set of these four segments then was projected into the PC1 vs PC2 plane of the instantaneous 3D subspace present at four different times (columns: I, G, M, H).”

      Another doubt on how instantaneous subspaces are computed: in the methods you state that you apply PCA on trial-averaged activity at each 50ms time step. From the next sentence, I gather that you apply PCA on an Nx4 data matrix (N being the number of neurons, and 4 being the trial-averaged activity of the four objects) every 50 ms. Is this right? It would help to explicitly specify the dimensions of the data matrix that goes into PCA computation. 

      We apologize for this confusion.  Although the LSTM decoding was performed in 50 ms time steps, the instantaneous subspaces were calculated at 1 ms intervals. In the Methods we now have clarified (lines 849 to 859):

      “Instantaneous subspace identification 

      Instantaneous neural subspaces were identified at 1 ms intervals.  At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step.  PCA then was performed on these four points.  Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace.  Each instantaneous 3D subspace can be considered a filter described by a matrix, W, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, W_i, forming a time series of filters (Figure 1B).”
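
The identification of one instantaneous subspace, as described in this passage, amounts to PCA on four points in N-dimensional space. The sketch below assumes a hypothetical function name and array layout; it uses an SVD of the mean-centered points, which is one standard way of computing PCA and not necessarily the routine the authors used.

```python
import numpy as np


def instantaneous_subspace(condition_means):
    """PCA on the trial-averaged neural states at a single time step.

    condition_means: array of shape (4, N), one trial-averaged
    N-dimensional firing-rate vector per object.
    Returns W of shape (N, 3), the projection matrix ("filter"), and the
    sum of squares captured along each of the 3 dimensions.
    """
    X = np.asarray(condition_means, dtype=float)
    # Centering across the four points removes what they share at this
    # instant, leaving only the differences among conditions.
    centered = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    n_components = X.shape[0] - 1  # 4 points span at most 3 dimensions
    W = Vt[:n_components].T
    return W, s[:n_components] ** 2


# Toy usage: a time series of subspaces from simulated trial averages.
rng = np.random.default_rng(1)
n_time, n_neurons = 5, 50
rates = rng.standard_normal((n_time, 4, n_neurons))  # (time, object, neuron)
subspaces = [instantaneous_subspace(rates[t])[0] for t in range(n_time)]
low_dim = rates[0] @ subspaces[0]  # project the four states into 3D
```

Because four points span at most three dimensions once their mean is removed, the three components reproduce the centered data exactly, which is the sense in which "three dimensions capture all the variance of four points."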

      It would help to include some equations in the methods section related to the LSTM decoding. Just to make sure I understood correctly: after having identified the instantaneous subspaces (every 50 ms), you projected the Instruction, Go, Movement, and Holding segments from individual trials (each containing 100 samples, since they are sampled from a 100ms window) onto each instantaneous subspace. So you have four trajectories for each subspace. In the methods, it is stated that a single LSTM classifier is trained for each subspace. Do you also have a separate classifier for each trajectory segment? What is used as input to the classifier? Each trajectory segment should be a 100x3 matrix once projected in an instantaneous subspace. Is that what (each of) the LSTMs take as input? And lastly, what is the LSTM trained to predict exactly? Just a label indicating the type of object that was manipulated in that trial? I apologize if I overlooked any detail, but I believe a clearer explanation of the LSTM, preferably with mathematical formulas, would greatly help readers understand this section. 

      LSTM decoding is not readily described with a set of equations.  However, we have expanded our description to provide the information requested (lines 910 to 937):

      “Decodable information—LSTM

      As illustrated schematically in Figure 1B, the same segment of high-dimensional neural activity projected into different instantaneous subspaces can generate low-dimensional trajectories of varying separation.  The degree of separation among the projected trajectory segments will depend, not only on their separation at the time when the segments were clipped, but also on the similarity of the subspaces into which the trajectory segments are projected.  To quantify the combined effects of trajectory separation and projection into different subspaces, we projected high-dimensional neural trajectory segments (each including 100 points at 1 ms intervals) from successful trials involving each of the four different target objects into time series of 3-dimensional instantaneous subspaces at 50 ms intervals. In each of these instantaneous subspaces, the neural trajectory segment from each trial thus became a 100-point x 3-dimensional matrix.  For each instantaneous subspace in the time series, we then trained a separate long short-term memory (LSTM; Hochreiter and Schmidhuber, 1997) classifier to attribute each of the neural trajectories from individual trials to one of the four target object labels: sphere, button, coaxial cylinder, or perpendicular cylinder. Using MATLAB’s Deep Learning Toolbox, each LSTM classifier had 3 inputs (instantaneous subspace dimensions), 20 hidden units in the bidirectional LSTM layer, and a softmax layer preceding the classification layer which had 4 output classes (target objects). The total number of successful trials available in each session for each object is given in Table 1.  To avoid bias based on the total number of successful trials, we used the minimum number of successful trials across the four objects in each session, selecting that number from the total available randomly with replacement. 
      Each LSTM classifier was trained with MATLAB’s adaptive moment estimation (Adam) optimizer on 40% of the selected trials, and the remaining 60% were decoded by the trained classifier.  The success of this decoding was used as an estimate of classification accuracy from 0 (no correct classifications) to 1 (100% correct classifications). This process was repeated 10 times and the mean ± standard deviation across the 10 folds was reported as the classification accuracy at that time.  Classification accuracy of trials projected into each instantaneous subspace at 50 ms intervals was plotted as a function of trial time.”
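
The trial selection and the 40%/60% split described in this passage can be sketched as below; the classifier itself is omitted. The function name `balanced_split` is hypothetical, and splitting separately within each object (a stratified split) is an assumption of this sketch, since the text does not say how trials were allocated between training and decoding.

```python
import random


def balanced_split(trials_by_object, train_fraction=0.4, rng=None):
    """Select equal numbers of trials per object and split train/test.

    trials_by_object: dict mapping object label -> list of trials, where
    each trial would be a 100 x 3 projected trajectory segment.
    Returns two lists of (trial, label) pairs.
    """
    rng = rng or random.Random()
    # Use the minimum number of successful trials across the objects.
    n_min = min(len(trials) for trials in trials_by_object.values())
    n_train = round(train_fraction * n_min)
    train, test = [], []
    for label, trials in trials_by_object.items():
        chosen = rng.choices(trials, k=n_min)  # drawn with replacement
        train += [(trial, label) for trial in chosen[:n_train]]
        test += [(trial, label) for trial in chosen[n_train:]]
    return train, test


# Toy usage: trial identifiers stand in for the 100 x 3 matrices.
toy_counts = {"sphere": 30, "button": 8,
              "coaxial cylinder": 25, "perpendicular cylinder": 40}
toy_trials = {label: list(range(n)) for label, n in toy_counts.items()}
train_set, test_set = balanced_split(toy_trials, rng=random.Random(0))
```

Repeating this selection 10 times, training a classifier on `train_set` and decoding `test_set` each time, would give the mean ± standard deviation of classification accuracy reported in the text.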

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Here are some more specific comments. 

      Abstract. Line 41. "same action" is not justified, there is plenty of evidence showing that the action does not need to be the same (or it has not even to be an action), rephrasing or substituting with "similar" is necessary, especially in the light of the subsequent sentence (which is totally correct). 

      Thank you for pointing this out.  As recommended, we have changed “same” to “similar” (lines 40 to 41):  

      “Many neurons in the premotor cortex show firing rate modulation whether the subject performs an action or observes another individual performing a similar action.”

      Introduction. A relevant, missing reference in the otherwise exhaustive introduction is Albertini et al. 2021 J Neurophysiol, showing that neural dynamics and similarities between biological and nonbiological movements in premotor areas are greater than those between the same executed and observed movements. 

      Thank you for pointing out this important finding.  After revision, we felt it was now cited most appropriately in the revised Discussion as follows (lines 730 to 736):

      “Alternatively, given that observation of another individual can be considered a form of social interaction, PM MN population activity during action observation, rather than representing movements made by another individual similar to one’s own movements, instead may represent different movements one might execute oneself in response to those made by another individual (Ninomiya et al., 2020; Bonini et al., 2022; Ferrucci et al., 2022; Pomper et al., 2023). This possibility is consistent with the finding that the neural dynamics of PM MN populations are more similar during observation of biological versus non-biological movements than during execution versus observation (Albertini et al., 2021).”

      In Line 85, the sentence about Papadourakis and Raos 2019 has to be generalized to PMv, as they show that the proportion of congruent MNs is at chance in both PMd and PMv. 

      Point taken.  We have rephrased this sentence as follows (lines 88 to 89): 

      “And in both PMv and PMd, the proportion of congruent neurons may not be different from that expected by chance alone (Papadourakis and Raos, 2019).”

      Lines 122-132. The initial sentence was unclear to me at first glance. I was wondering how subspaces could be "at other times over the course of the trial" if they are instantaneous. I could imagine that the subspaces referred to corresponding behavioral intervals of execution and observation conditions (and this may be what they will later call "condition dependent" activity), but nevertheless, they could hardly be understood as "instantaneous". I grasped the author's idea only when reading the results, with the statement "no-time dependent variance is captured". The idea is to take a static snapshot of the evolution of population activity at each checkpoint (i.e. I, G, M, and H): I suggest clarifying this point immediately in the introduction to improve readability. 

      We have clarified this point by adding two paragraphs to the Introduction first defining condition-independent versus condition-dependent variance and then explaining the use of instantaneous subspaces (lines 125 to 153):

      “A relevant but often overlooked aspect of such dynamics in neuron populations active during both execution and observation has to do with the distinction between condition-independent and condition-dependent variation in neuronal activity (Kaufman et al., 2016; Rouse and Schieber, 2018).  The variance in neural activity averaged across all the conditions in a given task context is condition-independent.  For example, in an 8-direction center-out reaching task, averaging a unit’s firing rate as a function of time across all 8 directions may show an initially low firing rate that increases prior to movement onset, peaks during the movement, and then declines during the final hold, irrespective of the movement direction.  Subtracting this condition-independent activity from the unit’s firing rate during each trial gives the remaining variance, and averaging separately across trials in each of the 8 directions then averages out noise variance, leaving the condition-dependent variance that represents the unit’s modulation among the 8 directions (conditions). Alternatively, condition-independent, condition-dependent, and noise variance can be partitioned through demixed principal component analysis (Kobak et al., 2016; Gallego et al., 2018).  The extent to which neural dynamics occur in a subspace shared by execution and observation versus subspaces unique to execution or observation may differ for the condition-independent versus condition-dependent partitions of neural activity.  Here, we tested the hypothesis that the condition-dependent activity of PM mirror neuron populations progresses through distinct subspaces during execution versus observation, which would indicate distinct patterns of co-modulation amongst mirror neurons during execution versus observation.

      Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach.  Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.  Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time-dependent) variance.”

      Results. 

      Regarding the execution-observation alignment, as explained in my initial comment, it does not sound convincing. Applying a CCA to align EXE and OBS activities (which the authors had just shown being essentially not aligned), even separately for each epoch segment (line 396), seems to be a trick to show that they nonetheless share some similarities. Couldn't this be applied to any pairs of differently encoded conditions to create some sort of artificial link between them? Is the similarity in the neural data or rather in the method used to realign them? 

      CCA would not align arbitrary sets of neural data.  The similarity is in the data, not in the method.  For example, in an 8-direction center-out task, the neural representation of movement to the 45° target is between the neural representations of the 0° and the 90° targets.  If the same is true in a second data set, then CCA will give high correlation coefficients.  But if in the second data set the neural representation of the 45° target is between the 135° and 180° targets, CCA will give low correlation coefficients. 
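
The distinction drawn here can be illustrated with a toy version of the 8-direction example. This is a sketch under simplifying assumptions, not an analysis from the paper: each target is represented by a single 2D point on a ring rather than by a trajectory, and the names `canonical_correlations` and `ring`, as well as the particular scrambling (target k moved to position 3k), are hypothetical choices made for illustration.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two data sets (rows = samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Q_x, _ = np.linalg.qr(X)
    Q_y, _ = np.linalg.qr(Y)
    return np.linalg.svd(Q_x.T @ Q_y, compute_uv=False)


def ring(theta):
    """One 2D 'neural representation' per target angle."""
    return np.column_stack([np.cos(theta), np.sin(theta)])


targets = np.deg2rad(np.arange(0, 360, 45))  # 0, 45, ..., 315 degrees
data_a = ring(targets)

# Same relationships among targets, different coordinates: a rotated and
# scaled copy, so the 45-degree target still lies between 0 and 90.
phi = np.deg2rad(30)
rotation = np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi), np.cos(phi)]])
data_same = 2.5 * data_a @ rotation

# Different relationships: target k is placed at position 3k on the ring,
# so targets that were neighbors are no longer neighbors.
data_scrambled = ring(3 * targets)

cc_same = canonical_correlations(data_a, data_same)
cc_scrambled = canonical_correlations(data_a, data_scrambled)
```

In the first comparison the neighbor relations among targets are preserved and both canonical correlations are 1. In the second, no linear map relates the two layouts and both canonical correlations are 0 up to rounding error, so CCA cannot manufacture an alignment that the data do not contain.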

      In the end, what does this tell us about the brain? 

      In the Introduction we now clarify that (lines 166 to 170):

      “Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population.”

      And in the Results (lines 449 to 455):

      “For example, the trajectories of PMd+M1 neuron populations recorded from two different monkeys during center-out reaching movements could be aligned well (Safaie et al., 2023).  CCA showed, for example, that in both brains the neural trajectory for the movement to the target at 0° was closer to the trajectory for movement to the target at 45° than to the trajectory for the movement to the target at 180°. Relationships among these latent dynamic representations of the eight movements thus were similar even though the neural populations were recorded from two different monkeys.”

      In relation to Figure 8 (lines 461 to 467)

      “But when both sets of trajectory segments are projected into another common subspace identified with CCA, as shown in Figure 8B, a similar relationship among the neural representations of the four movements during execution and observation is revealed.  In both behavioral contexts the neural representation of movements involving the sphere (purple) is now closest to the representation of movements involving the coaxial cylinder (magenta) and farthest from that of movements involving the button (cyan). The two sets of trajectory segments are more or less “aligned.””

      And in the Discussion (lines 665 to 674):

      “Corresponding neural representations of action execution and observation during task epochs with higher neural firing rates have been described previously in PMd MNs and in PMv MNs using representational similarity analysis (RSA) (Papadourakis and Raos, 2019).  And during force production in eight different directions, neural trajectories of PMd neurons draw similar “clocks” during execution, cooperative execution, and passive observation (Pezzulo et al., 2022).  Likewise in the present study, despite execution and observation trajectories progressing through largely distinct subspaces, in all three monkeys execution and observation trajectory segments showed some degree of alignment, particularly the Movement and Hold segments (Figure 12A), indicating similar relationships among the latent dynamic representations of the four RGM movements during execution and observation.”

      Concerning the discussion, I would like to reconsider it after having seen the authors' response to the comments above and to my general concern about the relevance of the findings from the neurophysiological point of view. 

      Certainly, please do.

      Reviewer #2 (Recommendations For The Authors): 

      Here are a few issues that I want to bring to the authors' attention (in no particular order): 

      • I am not clear on what is meant by "condition-dependent". Is the condition exec vs obs, or the object types? 

      In the Introduction, we now clarify (lines 125 to 144): 

      “A relevant but often overlooked aspect of such dynamics in neuron populations active during both execution and observation has to do with the distinction between condition-independent and condition-dependent variation in neuronal activity (Kaufman et al., 2016; Rouse and Schieber, 2018).  The variance in neural activity averaged across all the conditions in a given task context is condition-independent.  For example, in an 8-direction center-out reaching task, averaging a unit’s firing rate as a function of time across all 8 directions may show an initially low firing rate that increases prior to movement onset, peaks during the movement, and then declines during the final hold, irrespective of the movement direction.  Subtracting this condition-independent activity from the unit’s firing rate during each trial gives the remaining variance, and averaging separately across trials in each of the 8 directions then averages out noise variance, leaving the condition-dependent variance that represents the unit’s modulation among the 8 directions (conditions). Alternatively, condition-independent, condition-dependent, and noise variance can be partitioned through demixed principal component analysis (Kobak et al., 2016; Gallego et al., 2018).  The extent to which neural dynamics occur in a subspace shared by execution and observation versus subspaces unique to execution or observation may differ for the condition-independent versus condition-dependent partitions of neural activity.  Here, we tested the hypothesis that the condition-dependent activity of PM mirror neuron populations progresses through distinct subspaces during execution versus observation, which would indicate distinct patterns of co-modulation amongst mirror neurons during execution versus observation.”

      And in the Results, we have added a new Figure 3 to illustrate condition-independent versus condition-dependent activity using an example from the present data sets (lines 208 to 236): 

      “Condition-dependent versus condition-independent neural activity in PM MNs

      Whereas a large fraction of condition-dependent neural variance during reaching movements without grasping can be captured in a two-dimensional subspace (Churchland et al., 2012; Ames et al., 2014), condition-dependent activity in movements that involve grasping is more complex (Suresh et al., 2020). In part, this may reflect the greater complexity of controlling the 24 degrees of freedom in the hand and wrist as compared to the 4 degrees of freedom in the elbow and shoulder (Sobinov and Bensmaia, 2021).  Figure 3 illustrates this complexity in a PM MN population during the present RGM movements.  Here, PCA was performed on the activity of a PM MN population across the entire time course of execution trials involving all four objects.  The colored traces in Figure 3A show neural trajectories averaged separately across trials involving each of the four objects and then projected into the PC1 vs PC2 plane of the total neural space.  Most of the variance in these four trajectories is comprised of a shared rotational component.  The black trajectory, obtained by averaging trajectories from trials involving all four objects together, represents this condition-independent (i.e. independent of the object involved) activity.  The condition-dependent (i.e. dependent on which object was involved) variation in activity is reflected by the variation in the colored trajectories around the black trajectory.  The condition-dependent portions can be isolated by subtracting the black trajectory from each of the colored trajectories. The resulting four condition-dependent trajectories have been projected into the PC1 vs PC2 plane of their own common subspace in Figure 3B.  Rather than exhibiting a simple rotational motif, these trajectories appear knotted. To better understand how these complex, condition-dependent trajectories progress over the time course of RGM trials, we chose to examine time series of instantaneous subspaces.”
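
The subtraction described in this passage (removing the across-object average, the "black trajectory", from each object-specific trajectory) can be sketched as below. The function name, the array layout, and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def split_condition_variance(trial_averages):
    """Separate condition-independent from condition-dependent activity.

    trial_averages: array of shape (n_conditions, T, N), trial-averaged
    firing rates for each object over T time points and N neurons.
    """
    X = np.asarray(trial_averages, dtype=float)
    # Average across objects: the shared time course.
    condition_independent = X.mean(axis=0)           # shape (T, N)
    # What remains is each object's deviation from that shared course.
    condition_dependent = X - condition_independent  # shape (n_cond, T, N)
    return condition_independent, condition_dependent


# Toy usage: a shared time course plus small object-specific offsets.
rng = np.random.default_rng(2)
n_objects, n_time, n_neurons = 4, 100, 10
shared = np.sin(np.linspace(0, np.pi, n_time))[:, None] * np.ones(n_neurons)
offsets = 0.2 * rng.standard_normal((n_objects, 1, n_neurons))
averages = shared + offsets                          # (4, 100, 10)
shared_part, object_part = split_condition_variance(averages)
```

By construction the condition-dependent parts sum to zero across objects at every time point, which is why a subspace fitted to them does not capture the shared component.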

      While there is an emphasis on the higher complexity of manipulating objects compared to just reaching movements in the Abstract, the majority of the analysis relates to the instruction, movement initiation, and grasp, and there are no specific analyses looking at manipulation and how those presumably more complex dynamics compare to the reaching dynamics, and how they differ from reaching in the mirror neurons.

      We have clarified that (lines 178 to 187):

      “Because we chose to study relatively naturalistic movements, the reach, grasp, and manipulation components were not performed separately, but rather in a continuous fluid motion during the movement epoch of the task sequence (Figure 2B).  In previous studies involving a version of this task without separate instruction and delay epochs, we have shown that joint kinematics, EMG activity, and neuron activity in the primary motor cortex, all vary throughout the movement epoch in relation to both reach location and object grasped, with location predominating early in the movement epoch and object predominating later (Rouse and Schieber, 2015, 2016a, b).  The present task, however, did not dissociate the reach, the hand shape used to grasp the object, and the manipulation performed on the object.”

      • The analysis in Fig. 3C,D is interesting; however, in my opinion, it requires a control. For instance, what would these values look like if you projected the segments to a subspace defined by the activity during the entire length of the trial, or if you projected the activity during intertrials, just to get a sense of how meaningful these values are?

      This material is now presented in Figure 5 – figure supplement 1.  In the legend to this figure supplement, we have clarified that (lines 327 to 328):

      “CS values, which we use only to characterize the phenomenon of trajectory separation,….”

      • MN is used (#85) before definition (#91). Similar for RGM, I believe. 

      Thanks for catching this problem.  We have now defined these abbreviations at first use as follows:

      In lines 89 to 92:

      “Though many authors apply the term mirror neurons strictly to highly congruent neurons, here we will refer to all neurons modulated during both contexts—execution and observation—as mirror neurons (MNs).”

      And in lines 148 to 150:

      “We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.”

      • I believe in the Intro when presenting the three hypotheses, there is a First, and a Third, but no Second. 

      We have revised this part of the Introduction without numbering our hypotheses as follows (lines 145 to 173):

      “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach.  Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.  Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time dependent) variance.

      We then tested the hypothesis that the condition-dependent subspace shifts progressively over the time course of behavioral trials (Figure 1A) by calculating the principal angles between four selected instantaneous subspaces that occurred at times easily defined in each behavioral trial—instruction onset (I), go cue (G), movement onset (M), and the beginning of the final hold (H)—and every other instantaneous subspace in the time series.  Initial analyses showed that condition-dependent neural trajectories for the four RGM movements tended to separate increasingly over the course of behavioral trials.  We therefore additionally examined the combined effects of i) the progressively shifting subspaces and ii) the increasing trajectory separation, by decoding neural trajectory segments sampled for 100 msec after times I, G, M, and H and projected into the time series of instantaneous subspaces (Figure 1B).

      Finally, we used canonical correlation to ask whether the prevalent patterns of mirror neuron co-modulation showed similar relationships among the four RGM movements during execution and observation (Figure 1C).  Such alignment would indicate that the relationships among the trajectory segments in the execution subspace are similar to the relationships among the trajectory segments in the observation subspace, indicating a corresponding structure in the latent dynamic representations of execution and observation movements by the same PM MN population.  And finally, because we previously have found that during action execution the activity of PM mirror neurons tends to lead that of non-mirror neurons which are active only during action execution (AE neurons) (Mazurek and Schieber, 2019), we performed parallel analyses of the instantaneous state space of PM AE neurons.”

      • The use of the term 'instantaneous subspaces' in the abstract confused me initially, as I wasn't sure what it meant. It might be a good idea to define or rephrase it. 

      In the Abstract we now state (lines 51 to 52):

      “Rather than following neural trajectories in subspaces that contain their entire time course, we identified time series of instantaneous subspaces …”

      And in the Introduction, we have clarified (lines 145 to 153):

      “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach.  Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.  Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time dependent) variance.”

      And in the Methods (lines 849 to 859):

      “Instantaneous subspace identification 

      Instantaneous neural subspaces were identified at 1 ms intervals.  At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step.  PCA then was performed on these four points.  Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace.  Each instantaneous 3D subspace can be considered a filter described by a matrix, 𝑊, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, 𝑊𝑖, forming a time series of filters (Figure 1B).”

      Reviewer #3 (Recommendations For The Authors): 

      (1) Page 4, lines 127-131. In the introduction, it was not immediately clear to me what you meant by 'separation' and 'decoding' of the projected neural activity. You do mention that you are separating/decoding trajectory segments representing different movements at the end of this paragraph, but at this point of the paper it was not very clear to me what those different movements were (I only understood that after reading the results section). I suggest briefly expanding on these concepts here. 

      To clarify these points in the Introduction, we have expanded exposition of these concepts (lines 145 to 163):

      “Because of the complexity of condition-dependent neural trajectories for movements involving the hand, we developed a novel approach.  Rather than examining trajectories over the entire time course of behavioral trials, we identified time series of instantaneous PM mirror neuron subspaces covering the time course of behavioral trials. We identified separate time series for execution trials and for observation trials, both involving four different reach-grasp-manipulation (RGM) movements.  Given that each subspace in these time series is instantaneous (a snapshot in time), it captures condition-dependent variance in the neural activity among the four RGM movements while minimizing condition-independent (time dependent) variance.

      We then tested the hypothesis that the condition-dependent subspace shifts progressively over the time course of behavioral trials (Figure 1A) by calculating the principal angles between four selected instantaneous subspaces that occurred at times easily defined in each behavioral trial—instruction onset (I), go cue (G), movement onset (M), and the beginning of the final hold (H)—and every other instantaneous subspace in the time series.  Initial analyses showed that condition-dependent neural trajectories for the four RGM movements tended to separate increasingly over the course of behavioral trials.  We therefore additionally examined the combined effects of i) the progressively shifting subspaces and ii) the increasing trajectory separation, by decoding neural trajectory segments sampled for 100 msec after times I, G, M, and H and projected into the time series of instantaneous subspaces (Figure 1B).”

      (2) Page 6, line 175. In the methods, it is stated that instantaneous subspaces are found with 3 PCs. Why does it say 2 here? 

      Thank you for noticing this discrepancy.  In the Methods, we have clarified that the instantaneous subspaces are 3-dimensional (see our reply to the next comment), but in Figure 5 (previously Figure 3), for purposes of visualization, we are projecting trajectory segments into the PC1-PC2 plane (lines 295 to 308):

      “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity.  The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.  To illustrate this increasing separation, we clipped 100 ms segments of high-dimensional MN population trial-averaged trajectories beginning at times I, G, M, and H, for trials involving each of the four objects.  We then projected the set of four object-specific trajectory segments clipped at each time into each of the four instantaneous 3D subspaces at times I, G, M, and H.  This process was repeated separately for execution trials and for observation trials.  

      For visualization, we projected these trial-averaged trajectory segments from an example session into the PC1 vs PC2 planes (which consistently captured > 70% of the variance) of the I, G, M, or H instantaneous 3D subspaces.  In Figure 5, the trajectory segments for each of the four objects (sphere – purple, button – cyan, coaxial cylinder – magenta, perpendicular cylinder – yellow) sampled at different times (rows) have been projected into each of the four instantaneous subspaces defined at different times (columns).”

      And in the legend for Figure 5 we now clarify that:

      “Each set of these four segments then was projected into the PC1 vs PC2 plane of the instantaneous 3D subspace present at four different times (columns: I, G, M, H).”

      Another doubt on how instantaneous subspaces are computed: in the methods you state that you apply PCA on trial-averaged activity at each 50ms time step. From the next sentence, I gather that you apply PCA on an Nx4 data matrix (N being the number of neurons, and 4 being the trial-averaged activity of the four objects) every 50 ms. Is this right? It would help to explicitly specify the dimensions of the data matrix that goes into PCA computation. 

      Thank you for catching an error: The instantaneous subspaces were computed at 1 ms intervals. (It is the LSTM decoding that was done in 50 ms time steps).  We have clarified how the instantaneous subspaces were computed in the Methods (lines 849 to 859):

      “Instantaneous subspace identification 

      Instantaneous neural subspaces were identified at 1 ms intervals.  At each 1 ms time step, the N-dimensional neural firing rates from trials involving the four different objects— sphere, button, coaxial cylinder, and perpendicular cylinder—were averaged separately, providing four points in the N-dimensional space representing the average neural activity for trials involving the different objects at that time step.  PCA then was performed on these four points.  Because three dimensions capture all the variance of four points, three principal component dimensions fully defined each instantaneous subspace.  Each instantaneous 3D subspace can be considered a filter described by a matrix, 𝑊, that can project high-dimensional neural activity into a low-dimensional subspace, with the time series of instantaneous subspaces, 𝑊𝑖, forming a time series of filters (Figure 1B).”
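
      To make the matrix dimensions in this reply concrete, the following is a minimal NumPy sketch of PCA on four points at a single time step. It is an illustration under stated assumptions rather than the authors' implementation: the function name `instantaneous_subspace`, the neuron count of 50, and the random data are hypothetical, and the data matrix is arranged as 4 x N (objects by neurons), the transpose of the N x 4 layout the reviewer mentions.

```python
import numpy as np

def instantaneous_subspace(mean_rates):
    """PCA on the object-averaged population states at one time step.

    mean_rates: (4, N) array, one trial-averaged firing-rate vector per object.
    Returns W with shape (N, 3) (orthonormal columns) and the singular values.
    """
    mean_rates = np.asarray(mean_rates, dtype=float)
    centered = mean_rates - mean_rates.mean(axis=0)
    # Rows of vt are the principal axes of the centered points.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Four centered points span at most three dimensions.
    n_components = mean_rates.shape[0] - 1
    return vt[:n_components].T, singular_values

# Synthetic stand-in for one time step (assumed N = 50 neurons).
rng = np.random.default_rng(1)
points = rng.normal(size=(4, 50))
W, s = instantaneous_subspace(points)
centered_points = points - points.mean(axis=0)
projected = centered_points @ W  # shape (4, 3)
```

      The fourth singular value is numerically zero, which is the sense in which three principal components capture all the variance of four points. Repeating the call at every 1 ms step would give the time series of filters 𝑊𝑖.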

      (3) Page 7, line 210-212. I am not sure if I missed it in the discussion, but have you speculated on why the greatest separation in observation trials was observed during the holding phase while in execution trials during the movement phase? 

      This was a consistent finding, and we therefore point it out as a difference between execution and observation.  Of course, this reflects greater condition-dependent variance in the PM MN population in the movement epoch than in the hold epoch during execution, whereas the reverse is true during observation.  We have no clear speculation as to why this occurs, however.

      (4) Figure 3. Add a legend with color scheme for each object in panels A and B. Also, please specify what metric is represented by the colorbar of panels C, D, E, F (write it down next to the colorbar itself and not just in the caption). 

      This is now Figure 5.  We have added a color legend for A and B.  Panels C, D, E, and F, now have been moved to Figure 5 – figure supplement 1, where we have indicated that the colorbar represents cumulative separation.

      (5) Page 9, line 228. I found the description of this decoding analysis a bit confusing initially (and perhaps still do), this should be clarified. 

      We have clarified our decoding analysis in the Methods (lines 910 to 937):

      “Decodable information—LSTM

      As illustrated schematically in Figure 1B, the same segment of high-dimensional neural activity projected into different instantaneous subspaces can generate low-dimensional trajectories of varying separation.  The degree of separation among the projected trajectory segments will depend, not only on their separation at the time when the segments were clipped, but also on the similarity of the subspaces into which the trajectory segments are projected.  To quantify the combined effects of trajectory separation and projection into different subspaces, we projected high-dimensional neural trajectory segments (each including 100 points at 1 ms intervals) from successful trials involving each of the four different target objects into time series of 3-dimensional instantaneous subspaces at 50 ms intervals. In each of these instantaneous subspaces, the neural trajectory segment from each trial thus became a 100 point x 3 dimensional matrix.  For each instantaneous subspace in the time series, we then trained a separate long short-term memory (LSTM, (Hochreiter and Schmidhuber, 1997)) classifier to attribute each of the neural trajectories from individual trials to one of the four target object labels: sphere, button, coaxial cylinder, or perpendicular cylinder. Using MATLAB’s Deep Learning Toolbox, each LSTM classifier had 3 inputs (instantaneous subspace dimensions), 20 hidden units in the bidirectional LSTM layer, and a softmax layer preceding the classification layer which had 4 output classes (target objects). The total number of successful trials available in each session for each object is given in Table 1.  To avoid bias based on the total number of successful trials, we used the minimum number of successful trials across the four objects in each session, selecting that number from the total available randomly with replacement. 
Each LSTM classifier was trained with MATLAB’s adaptive moment estimation (Adam) optimizer on 40% of the selected trials, and the remaining 60% were decoded by the trained classifier.  The success of this decoding was used as an estimate of classification accuracy from 0 (no correct classifications) to 1 (100% correct classifications). This process was repeated 10 times and the mean ± standard deviation across the 10 folds was reported as the classification accuracy at that time.  Classification accuracy of trials projected into each instantaneous subspace at 50 ms intervals was plotted as a function of trial time.”
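
      The data preparation around the classifier can be sketched as follows. This is a hedged illustration only: the LSTM itself (built by the authors with MATLAB's Deep Learning Toolbox) is not reproduced, the helper names `project_segment` and `balanced_split` are hypothetical, the trial counts are made up, and the sketch does not mean-center the segment before projection because the quoted Methods do not say whether that was done. Note that drawing trials with replacement, as written here, allows the same trial to land in both the training and the test set.

```python
import numpy as np

def project_segment(segment, W):
    """Project a (100, N) single-trial segment into a 3D subspace: (100, N) @ (N, 3)."""
    return np.asarray(segment, dtype=float) @ W

def balanced_split(labels, train_fraction=0.4, rng=None):
    """Draw equal trial counts per object (with replacement), then split the indices."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Use the smallest per-object trial count so no object dominates.
    n_per_class = min(int(np.sum(labels == c)) for c in classes)
    picked = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=True)
        for c in classes
    ])
    picked = rng.permutation(picked)
    n_train = int(round(train_fraction * picked.size))
    return picked[:n_train], picked[n_train:]

rng = np.random.default_rng(2)
# Assumed trial counts for the four objects in one session.
labels = np.repeat([0, 1, 2, 3], [25, 22, 30, 20])
train_idx, test_idx = balanced_split(labels, train_fraction=0.4, rng=rng)

# One synthetic segment (100 samples at 1 ms, assumed N = 50) and a random orthonormal filter.
W_demo, _ = np.linalg.qr(rng.normal(size=(50, 3)))
low_d = project_segment(rng.normal(size=(100, 50)), W_demo)
```

      Each projected trial is then a 100 x 3 matrix, matching the classifier input described in the quoted Methods.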

      (6) Page 9, line 268. This might be trivial, but can you speculate on why the accuracy for Instruction segments had a lower peak compared to the rest of the segments? Is it because there is less 'distinct' information embedded in neural data about the type of object manipulated until you are actually reaching toward it or holding it? The latter seems straightforward, but the former not so much. 

      Thank you for asking this question.  We have added the following speculations (lines 592 to 604): 

      “Short bursts of “signal” related discharge are known to occur in a substantial fraction of PMd neurons beginning at latencies of ~60 ms following an instructional stimulus (Weinrich et al., 1984; Cisek and Kalaska, 2004).  Here we found that the instantaneous subspace shifted briefly toward the subspace present at the time of instruction onset (I), similarly during execution and observation.  This brief trough in principal angle (Figure 4A) and the corresponding peak in classification accuracy (Figure 7A) in part may reflect smoothing of firing rates with a 50 ms Gaussian kernel.  We speculate, however, that the early rise of this peak at the time of instruction onset also reflects the anticipatory activity often seen in PMd neurons in expectation of an instruction, which may not be entirely non-specific, but rather may position the neural population to receive one of a limited set of potential instructions (Mauritz and Wise, 1986). We attribute the relatively low amplitude of peak classification accuracy for Instruction trajectory segments to the likely possibility that only the last 40 ms of our 100 ms Instruction segments captured signal related discharge.”

      (7) Figure 8. Shouldn't the plots in panel A resemble those in Figure 3? Here you are projecting the hold trajectory segments into the subspace at time H, which should be the same as in Fig. 3A/B bottom right panel. 

      The previous Figure 8 is now Figure 8 panels A and B, and the previous Figure 3 is now Figure 5.  The data used in these two figures come from two different recording sessions in two different monkeys. The current Figure 8A,B uses data from monkey F, session 2; whereas Figure 5 uses data from monkey T, session 3, which we now state in the legend to each figure, respectively.  Consequently, the relative arrangement of the trajectory segments in the instantaneous subspace at time H differs.  The session used in Figure 8A,B, which we now show in three dimensions, better illustrates how CCA identifies a common subspace in which execution versus observations segments show alignment (Figure 8B) that was not evident in their original subspaces (Figure 8A).

      (8) Page 14, line 369. Are you computing CCA using only 2 components? I thought the subspaces were 3 dimensional. Why not align all three dimensions? 

      We have expanded this analysis to use all three dimensions, as illustrated in Figure 8 above.

      (9) Page 14, line 407. Does this mean that instantaneous subspaces between execution and observation trials are more similar to each other during the Movement and Holding phase? Is this related to the fact that in those moments there is a smaller progressive shift of the subspaces within execution and observation trials? 

      Our new analyses of principal angles (see our reply to your comment 11, below) show that the progressive shifting of the instantaneous subspace continues through the movement and hold epochs.  We now discuss this better alignment of the Movement and Hold trajectory segments as follows (lines 656 to 664):

      “Given the complexity of condition-dependent neural trajectories across the entire time course of RGM trials (Figure 3B), rather than attempting to align entire neural trajectories, we applied canonical correlation to trajectory segments clipped for 100 ms following four well defined behavioral events: Instruction onset, Go cue, Movement onset, and the beginning of the final Hold.  In all cases, alignment was poorest for Instruction segments, somewhat higher for Go segments, and strongest for Movement and Hold segments.  This progressive increase in alignment likely reflects a progressive increase in the difference between average neuron firing rates for trials involving different objects (Figure 6) relative to the trial-by-trial variance in firing rate for a given object.”
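
      For readers unfamiliar with canonical correlation, a minimal NumPy sketch of the computation follows. It uses the standard QR-plus-SVD formulation and is not taken from the authors' code; the function name `canonical_correlations`, the 400 x 3 layout (four 100-sample segments stacked), and the synthetic data are assumptions.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of low-dimensional trajectories.

    X, Y: arrays of shape (n_samples, n_dims) with full column rank.
    Returns the correlations sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic check: "observation" is a rotated, slightly noisy copy of "execution".
rng = np.random.default_rng(3)
execution = rng.normal(size=(400, 3))
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
observation = execution @ rotation + 0.01 * rng.normal(size=(400, 3))
cc = canonical_correlations(execution, observation)
```

      In this synthetic case all three correlations are close to 1 even though the two data sets occupy differently oriented axes, which is the kind of alignment that is not evident in the original subspaces.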

      (10) page 15, line 431. Typo, it should be Table 3. 

      We have removed Table 3 which no longer applies.

      (11) A more general observation: did you try to compute another metric to assess the progressive shift of subspaces over time? I am thinking of something like computing the principal angles between consecutive subspaces. If it is true that the shifts happen over time, but it slows down during movement and hold, you should be able to conclude it from principal angles as well. Am I missing something? Is there any reason you went with classification accuracy instead of a metric like this?  

      Point taken.  We now have calculated the principal angles as a function of time and have presented them as a new section of the Results including new Figure 4 and Figure 4 – figure supplement 3 (lines 237 to 293). 

      “Instantaneous subspaces shift progressively during both execution and observation 

      We identified an instantaneous subspace at each one millisecond time step of RGM trials.  At each time step, we applied PCA to the 4 instantaneous neural states (i.e. the 4 points on the neural trajectories representing trials involving the 4 different objects each averaged across 20 trials per object, totaling 80 trials), yielding a 3-dimensional subspace at that time (see Methods).  Note that because these 3-dimensional subspaces are essentially instantaneous, they capture the condition-dependent variation in neural states, but not the common, condition-independent variation.  To examine the temporal progression of these instantaneous subspaces, we then calculated the principal angles between each 80-trial instantaneous subspace and the instantaneous subspaces averaged across all trials at four behavioral time points that could be readily defined across trials, sessions, and monkeys: the onset of the instruction (I), the go cue (G), the movement onset (M), and the beginning of the final hold (H).  This process was repeated 10 times with replacement to assess the variability of the principal angles.  The closer the principal angles are to 0°, the closer the two subspaces are to being identical; the closer to 90°, the closer the two subspaces are to being orthogonal.  

      Figure 4A-D illustrate the temporal progression of the first principal angle of the mirror neuron population in the three sessions (red, green, and blue) from monkey R during execution trials. As illustrated in Figure 4 – figure supplement 1 (see also the related Methods), in each session all three principal angles, each of which could range from 0° to 90°, tended to follow a similar time course.  In the Results we therefore illustrate only the first (i.e. smallest) principal angle.  Solid traces represent the mean across 10-fold cross validation using the 80-trial subsets of all the available trials; shading indicates ±1 standard deviation.  As would be expected, the instantaneous subspace using 80 trials approaches the subspace using all trials at each of the four selected times—I, G, M, and H—indicated by the relatively narrow trough dipping toward 0°.  Of greater interest are the slower changes in the first principal angle in between these four time points.  Figure 4A shows that after instruction onset (I) the instantaneous subspace shifted quickly away from the subspace at time I, indicated by a rapid increase in principal angle to levels not much lower than what might be expected by chance alone (horizontal dashed line). In contrast, throughout the remainder of the instruction and delay epochs (from I to G), Figure 4B and C show that the 80-trial instantaneous subspace shifted gradually and concurrently, not sequentially, toward the all-trial subspaces that would be reached at the end of the delay period (G) and then at the onset of movement (M), indicated by the progressive decreases in principal angle. As shown by Figure 4D, shifting toward the H subspace did not begin until the movement onset (M). 
To summarize, these changes in principal angles indicate that after shifting briefly toward the subspace present at the time the instruction appeared (I), the instantaneous subspace shifted progressively throughout the instruction and delay epochs toward the subspace that would be reached at the time of the go cue (G), then further toward that at the time of movement onset (M), and only thereafter shifted toward the instantaneous subspace that would be present at the time of the hold (H).

      Figure 4E-H show the progression of the first principal angle of the mirror neuron population during observation trials.  Overall, the temporal progression of the MN instantaneous subspace during observation was similar to that found during execution, particularly around times I and H.  The decrease in principal angle relative to the G and M instantaneous subspaces during the delay epoch was less pronounced during observation than during execution.  Nevertheless, these findings support the hypothesis that the condition-dependent subspace of PM MNs shifts progressively over the time course of RGM trials during both execution and observation, as illustrated schematically in Figure 1A.

      We also examined the temporal progression of the instantaneous subspace of AE neurons.  As would be expected given that AE neurons were not modulated significantly during observation trials, in the observation context AE populations had no gradual changes in principal angle (Figure 4 – figure supplement 3).  During execution, however, Figure 4I-L show that the AE populations had a pattern of gradual decrease in principal angle similar to that found in the MN population (Figure 4A-D).  After the instruction onset, the instantaneous subspace shifted quickly away from that present at time I and progressed gradually toward that present at times G and M, only shifting toward that present at time H after movement onset.  As for the PM MN populations, the condition-dependent subspace of the PM AE populations shifted progressively over the time course of execution RGM trials.”
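
      The principal angles referred to in the quoted Results can be computed as in the sketch below. This is a generic NumPy illustration of the standard definition, not the authors' code; the function name `principal_angles_deg` and the six-dimensional toy bases are assumptions.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees, smallest first) between the column spaces of A and B.

    A, B: arrays of shape (N, k) whose columns span each subspace.
    """
    # Orthonormalize each basis so the singular values are cosines of the angles.
    Qa, _ = np.linalg.qr(np.asarray(A, dtype=float))
    Qb, _ = np.linalg.qr(np.asarray(B, dtype=float))
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

# Toy example in a 6-dimensional space with 3D subspaces.
basis = np.eye(6)
same = principal_angles_deg(basis[:, :3], basis[:, :3])        # identical subspaces
orthogonal = principal_angles_deg(basis[:, :3], basis[:, 3:])  # orthogonal subspaces
```

      Identical subspaces give angles of 0° and orthogonal subspaces give 90°, matching the interpretation given in the quoted text; the first element of the result is the first (smallest) principal angle.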

      The related Methods are now described in the subsection “Subspace Comparisons—Principal Angles”.

      Is there any reason you went with classification accuracy instead of a metric like this? 

      We now point out that (lines 295 to 297):

      “The progressive changes in principal angles do not capture another important aspect of condition-dependent neural activity.  The neural trajectories during trials involving different objects separated increasingly as trials progressed in time.”

      And we further clarify this as follows (lines 331 to 348):

      “Decodable information changes progressively during both execution and observation 

      As RGM trials proceeded in time, the condition-dependent neural activity of the PM MN population thus changed in two ways.  First, the instantaneous condition-dependent subspace shifted, indicating that the patterns of firing-rate co-modulation among neurons representing the four different RGM movements changed progressively, both during execution and during observation.  Second, as firing rates generally increased, the neural trajectories representing the four RGM movements became progressively more separated, more so during execution than during observation. 

      To evaluate the combined effects of these two progressive changes, we clipped 100 ms single-trial trajectory segments beginning at times I, G, M, or H, and projected these trajectory segments from individual trials into the instantaneous 3D subspaces at 50 ms time steps.  At each of these time steps, we trained a separate LSTM decoder to classify individual trials according to which of the four objects was involved in that trial.  We expected that the trajectory segments would be classified most accurately when projected into instantaneous subspaces near the time at which the trajectory segments were clipped.  At other times we reasoned that classification accuracy would depend both on the similarity of the current instantaneous subspace to that found at the clip time as evaluated by the principal angle (Figure 4), and on the separation of the four trajectories at the clip time (Figure 5).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance for entering REMs. The time course for how long it takes till the POA excitability resettles to its baseline consequently sets a permissive time window for increased amounts of REMs to recover its lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2→TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or other brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build-up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep-inhibiting histamine neurons (Chung et al). And some of these POA cells will also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019). Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2→TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022), indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering that fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single-cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs→REMs transitions, indicating these neurons are also active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs, leading to the subsequent rebound of REMs without affecting NREMs. Consequently, we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999). However, during total sleep deprivation, when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression, of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2→TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin neurons, neurons expressing these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2→TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point by point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation holds at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-CRE line is described in Zecharia AY et al J Neurosci et al 2012 (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors rely heavily on their conclusions by using an optogenetic tool that inhibits the activity of GAD2+ neurons, however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a likewise construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and our prior results that recapitulate similar findings as inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A.

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice.

      In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
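      As an illustration of the per-mouse analysis mentioned above, the following minimal sketch averages recordings within each mouse and applies a Holm-Bonferroni step-down correction to a list of p-values. The function names (`mean_per_mouse`, `holm_bonferroni`) and all numbers are hypothetical and are not taken from the manuscript; the p-values are assumed to come from pairwise t-tests computed elsewhere.

```python
from collections import defaultdict


def mean_per_mouse(records):
    """Collapse (mouse_id, value) recordings to one mean value per mouse.

    Running statistics on these per-mouse means makes the mouse, not the
    recording, the unit of analysis.
    """
    grouped = defaultdict(list)
    for mouse_id, value in records:
        grouped[mouse_id].append(value)
    return {mouse: sum(vals) / len(vals) for mouse, vals in grouped.items()}


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns (adjusted_p, reject), both in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical example: two recordings for mouse "m1", one for mouse "m2".
per_mouse = mean_per_mouse([("m1", 1.0), ("m1", 3.0), ("m2", 4.0)])

# Hypothetical p-values for four time bins compared against the baseline bin.
adjusted_p, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

      A mixed-effects model with mouse as a random factor, as the reviewer suggests, would be an alternative that keeps every recording while still accounting for repeated measurements within a mouse.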

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control for laser and virus effects, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without the optogenetic construct, SwiChR++) and the data is provided in Fig. 2C.

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment on the text about what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that there is no significant difference in wakefulness between REM restriction and baseline control conditions (Fig. S4H). Therefore, while the representative image in Fig. 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB, S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors addressed how long-range interactions between boundary elements are established and influence their function in enhancer specificity. Briefly, the authors placed two different reporters separated by a boundary element. They inserted this construct ectopically ~140 kb away from an endogenous locus that contains the same boundary element. The authors used expression patterns driven by nearby enhancers as an output to determine which enhancers the reporters interact with. They complemented this analysis with 3D DNA contact mapping. The authors found that the orientation of the boundary element determined which enhancers each reporter interacted with. They proposed that the 3D interaction topology, whether being circular or stem configuration, distinguished whether the interaction was cohesin mediated or through an independent mechanism termed pairing.

      Strengths:

      The transgene expression assays are built upon prior knowledge of the enhancer activities. The 3D DNA contacts confirm that transgene expression correlates with the contacts. Using 4 different orientations covers all combinations of the reporter genes and the boundary placement.

      Weaknesses:

      The interpretation of the data as a refutation of loop extrusion playing a role in TAD formation is not warranted, as the authors did not deplete the loop extruders to show that what they measure is independent.

      (1.1) To begin with, our findings do not exclude the possibility that cohesin loop extrusion has some sort of role in the formation or maintenance of TADs in flies or other aspects of chromosome structure.  On the other hand, it clearly is not determinative in defining the end-points of TADs or in generating the resulting topology (stem-loop or circle-loop).  Our main point, which we feel we have established unequivocally, is that it can’t explain many essential features of TADs or chromosome loops (see below) in Drosophila.  This reviewer agrees with this point in their next paragraph (below).  We also think that the loop extrusion model’s general acceptance as THE driving force behind TAD formation in mammals is unwarranted and not fully consistent with the available data, as explained below.

      As to the reviewer’s specific point regarding depletion of loop extruders, we first note that completely eliminating factors encoding cohesin subunits in fly embryos isn’t readily feasible.  As cohesin is essential starting at the beginning of embryonic development, and is maternally deposited, knockdowns/depletions would likely be incomplete and there would always be some remaining activity.  As long as there is some residual activity—and no disruption in TAD formation is observed—this experimental test would be a failure.  In addition, any defects that are observed might arise not from a failure in TAD formation via loop extrusion but rather because the rapid mitotic cycles would be disrupted.  A far better approach would be to deplete/knockdown cohesin subunits in tissue culture cells, as there is no requirement for the cells to undergo embryonic development.  Moreover, since cell division is relatively slow, the depletion would likely eliminate much if not all of the activity before a checkpoint is reached.

      While a drastic depletion of cohesin is not feasible in our model organism, we would draw the reviewer’s attention to an experiment of this type which has already been done in mammalian tissue culture cells by Goel et al. (2023). Unlike most Hi-C studies in mammals, the authors used region capture MicroC (RCMC). In contrast to published genome-wide mammalian MicroC experiments (c.f., (Hsieh et al. 2020; Krietenstein et al. 2020)) which require large bin sizes to visualize mammalian “TADs,” the resolution of the experiments in Goel et al. (2023) is similar to the resolution in our MicroC experiments (200-400 bp). A MicroC contact map from Goel et al. shows the Ppm1g locus on chromosome 5 before and after Rad21 depletion. The contact map visualizes a 250 kb DNA segment, which is only slightly larger than the ~230 kb DNA segment in Fig. 2C in our paper.

      In this experiment, there was a 97% reduction in the amount of Rad21.  However, as can be seen by comparing the contact profiles above and below the diagonal, there is little or no difference in TAD organization after cohesin depletion when individual TADs are visualized with a bin size of 250 bp.  These results would indicate that mammalian TADs do not require cohesin.

      Note also that the weak 45° stripes connecting different TADs (c.f. blue/green arrowheads) are still present after Rad21 depletion. In the most popular version of the loop extrusion model, cohesin loads at a site(s) somewhere in the TAD-to-be, and then extrudes both strands until it bumps into CTCF roadblocks. As illustrated in Figure Sup 2, this mechanism generates a vertical stripe originating at the cohesin loading site and extending until cohesin bumps into the left or right roadblock, at which point the stripe transitions into a 45° stripe that ends when cohesin bumps into the other roadblock. While 45° stripes are visible, there is no hint of a vertical stripe. This suggests that the mechanism for generating stripes, if it is an active mechanism (rather than passive diffusion), may be quite different. The 45° stripes must be generated by a factor(s) that is anchored to one (blue arrowhead) or both (green arrowhead) boundaries. In addition, this factor, whatever it is, is not cohesin. The reason for this is that the 45° stripes are present both before and after Rad21 depletion. Moreover, if one were to imagine that the stripes represent a process involved in TAD formation, this process does not require cohesin (see Goel et al 2023).
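      The stripe geometry predicted by this version of the loop extrusion model can be made concrete with a toy simulation. The sketch below is only an illustration under simplifying assumptions that are ours and not part of the cited work: positions are integer bins, a single extruder moves one bin per step on each side at equal speed, it never unloads, and the roadblocks are absolute. The function names `extrude` and `split_phases` are hypothetical.

```python
def extrude(load_site, left_block, right_block):
    """Return the (left, right) anchor positions visited by a two-sided extruder.

    The extruder loads at `load_site` and reels in DNA on both sides, one bin
    per step, until each side reaches its roadblock.
    """
    left, right = load_site, load_site
    contacts = []
    while left > left_block or right < right_block:
        if left > left_block:
            left -= 1
        if right < right_block:
            right += 1
        contacts.append((left, right))
    return contacts


def split_phases(contacts, load_site):
    """Separate contacts into the two phases described in the text.

    While both sides move, left + right stays equal to 2 * load_site: in a
    contact map rotated so the diagonal is horizontal, these points form a
    vertical stripe above the loading site. Once one side has stalled, one
    coordinate is fixed at a roadblock and the points form a 45-degree stripe.
    """
    symmetric = [c for c in contacts if c[0] + c[1] == 2 * load_site]
    anchored = [c for c in contacts if c[0] + c[1] != 2 * load_site]
    return symmetric, anchored


# Hypothetical TAD spanning bins 0-10 with a loading site at bin 4.
contacts = extrude(load_site=4, left_block=0, right_block=10)
symmetric, anchored = split_phases(contacts, load_site=4)
```

      In this toy case the vertical stripe disappears only when the loading site coincides with one of the roadblocks, which is consistent with the argument that stripes anchored at boundaries point to a boundary-associated factor.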

      It is worth noting another observation that is inconsistent with the cohesin loop extrusion/CTCF roadblock model for TAD formation/maintenance.  CTCF is not found at all of the TAD boundaries in this 250 kb DNA region.  This would suggest that there are other DNA binding proteins that have chromosomal architectural functions besides CTCF.  In flies, many of the chromosomal architectural proteins are, like CTCF, polydactyl zinc finger (PZF) proteins (Bonchuk et al. 2021; Bonchuk et al. 2022; Fedotova et al. 2017).  These include Su(Hw), CTCF, Pita, Zipic and CLAMP.  The PZF family in flies is quite large.  There are ~250 different PZF genes, and since only a handful of these have been characterized, it seems likely that additional members of this family will have architectural functions.  Thus far, only one boundary protein, CTCF, has received attention in studies on mammalian chromosome architecture.  As the mammalian genome is much larger and more complicated than the fly genome, it is difficult to believe that CTCF is the sole chromosomal architectural protein in mammals.  In this respect, it is worth noting that there are ~800 members of the PZF family in mammalian genomes (Fedotova et al. 2017).

      Goel et al. (Goel et al. 2023) did observe alterations in the contact profiles after Rad21 depletion when they visualized the Ppm1g region at much lower resolution (bin sizes of 5 kb and 1 kb). The 5 kb bin size visualizes a region of ~1.2 Mb, while the 1 kb bin size visualizes a region that spans ~800 kb.  These large triangular units do not correspond to the individual TADs seen when Goel et al. visualized the Ppm1g locus at 250 bp resolution. 

      Nor do they correspond to TADs in Fig. 2 of our paper.  Instead they represent TAD neighborhoods which likely consist of 20-30 or more individual TADs.  Consequently, the alterations in contact patterns seen after Rad21 depletion are occurring at the level of TAD neighborhoods.  This can be seen by comparing pixel density inside the blue lines before (above the diagonal) and after Rad21 depletion (below the diagonal) (Goel et al. 2023).  The more distant contacts between individual TADs within this neighborhood are preferentially reduced by Rad21 depletion (the region below and to the left of the double arrowhead).  By contrast, the TADs themselves are unaffected, as are contacts between individual TADs and their immediate neighbors (see purple and light green asterisk).  The other interesting feature is the loss of contacts between what appear to be partially overlapping neighborhoods.  This loss of neighborhood-to-neighborhood contacts can be seen in the region located between the green and blue lines.  The neighborhood that appears to partially overlap the Ppm1g neighborhood is outlined in purple.

      It is worth noting that, with the exception of the high resolution experiments in Goel et al., all of the other studies on cohesin (and CTCF) have examined the effects on contact maps within (and between) large neighborhoods (bin sizes >1 kb).  In most cases, these large neighborhoods are likely to be composed of many individual TADs like those seen in Goel et al. and in Fig. 2 of our paper.  We also observe larger neighborhoods in the fly genome, though they do not appear to be as large as those in mammals.  Our experiments do not address what role cohesin might have in facilitating contacts between more distant TADs located within the same neighborhoods, or between TADs in different neighborhoods, or whether loop extrusion is involved.

      We would also note that the Drosophila DNA segment in Fig. 2C contains 35 different genes, while the mammalian DNA segment shown in Fig. 1 has only 9.  Thus, in this part of the fly genome, Pol II genes are more densely packed than in the mammalian DNA segment.  Much of the fly genome is also densely packed, and the size of individual TADs will likely be smaller, on average, than in mammals.  Nevertheless, the MicroC profiles are not all that different.  As is also common in flies, each TAD in the Ppm1g region only encompasses one or two genes.  Note also that there are no volcano triangles with plumes as would be predicted for TADs that have a stem-loop topology.

      In fact, as shown in Author response image 1, the high-resolution contact profile for the Ppm1g region shows a strong resemblance to that observed for the fly Abd-B regulatory domains.  These regulatory domains are part of a larger neighborhood that encompasses the abd-A and Abd-B genes and their regulatory domains.

      Author response image 1.

      Abd-B regulatory domains

      As the authors show, the single long DNA loop mediated by cohesin loop extrusion connecting the ectopic and endogenous boundary is clearly inconsistent with the results; therefore, the main conclusion of the paper, that the 3D topology of the boundary elements is a consequence of pairing, is strong. However, loop extrusion and pairing are not mutually exclusive models for the formation of TADs. Loop-extruding cohesin complexes need not make a 140 kb loop; multiple smaller loops could bring together the two boundary elements, which are then held together by pairing proteins that can make circular topologies.

      (1.2) In the pairing model, distant boundaries bump into each other (by random walks or partially constrained walks), and if they are “compatible” they pair with each other, typically in an orientation-dependent manner.  As an alternative, the reviewer argues that cohesin need not make one large 140 kb loop.  Instead it could generate a series of smaller loops (presumably corresponding to the intervening TADs).  These smaller loops would bring homie in the transgene in close proximity to the eve locus so that it could interact with the endogenous homie and nhomie elements in the appropriate orientation, and in this way only one of the reporters would be ultimately activated.

      There are two problems with the idea that cohesin-dependent loop extrusion brings transgene homie into contact with homie/nhomie in the eve locus by generating a series of small loops (TADs).  The first is the very large distances over which specific boundary:boundary pairing interactions can occur.  The second is that boundary:boundary pairing interactions can take place not only in cis, but also in trans.

      We illustrate these points with several examples. 

      Fujioka et al. 2016, Fig 7 shows an experiment in which attP sites located ~2 Mb apart were used to insert two different transgenes, one containing a lacZ reporter and the other containing the eve anal plate enhancer (AP) (Fujioka et al. 2016).  If the lacZ reporter and the AP transgenes also contain homie, the AP enhancer can activate lacZ expression (panel A,).  On the other hand, if one of the transgenes has lambda DNA instead of homie, no regulatory interactions are observed (panel A,).  In addition, as is the case in our experiments using the -142 kb platform, orientation matters.  In the combination on the top left, the homie boundary is pointing away from both the lacZ reporter and the AP enhancer.  Since homie pairs with itself head-to-head, pairing brings the AP enhancer into contact with the lacZ reporter.  A different result is obtained for the transgene pair in panel A on the top right.  In this combination, homie is pointing away from the lacZ reporter, while it is pointing towards the AP enhancer.  As a consequence, the reporter and enhancer are located on opposite sides of the paired homie boundaries, and in this configuration they are unable to interact with each other.

      On the top left of panel B, the homie element in the AP enhancer transgene was replaced by a nhomie boundary oriented so that it is pointing towards the enhancer.  Pairing of homie and nhomie head-to-tail brings the AP enhancer in the nhomie transgene into contact with the lacZ reporter in the homie transgene, and it activates reporter expression.  Finally, like homie, nhomie pairs with itself head-to-head, and when the nhomie boundaries are pointing towards both the AP enhancer and the lacZ reporter, reporter expression is turned on.

      Long distance boundary-dependent pairing interactions by the bithorax complex Mcp boundary have also been reported in several papers.  Fig. 6 from Muller et al. (Muller et al. 1999) shows the pattern of regulatory interactions (in this case PRE-dependent “pairing-sensitive silencing”) between transgenes that have a mini-white reporter, the Mcp and scs’ boundaries and a PRE that is located close to Mcp.  In this experiment flies carrying transgenes inserted at the indicated sites on the left and right arms of the 3rd chromosome were mated in pairwise combinations, and their trans-heterozygous progeny examined for pairing-sensitive silencing of the mini-white reporter.

      Two examples of long-distance pairing-sensitive silencing mediated by Mcp/scs’ are shown in Fig. 5b from Muller et al. 1999.  The transgene inserts in panel A are w#12.43 and ff#10.5.  w#12.43 is inserted close to the telomere of 3R at 99B.  ff#10.5 is inserted closer to the middle of 3R at 91A.  The estimated distance between them is 11.3 Mb.  The transgene inserts in panel B are ff#10.5 and ff#11.102.  ff#11.102 is inserted at 84D, and the distance between them is 11 Mb.  Normally, the eye color phenotype of the mini-white reporter is additive: homozygous inserts have twice as dark an eye color as hemizygous inserts, while in trans-heterozygous flies the eye color would be the sum of the two different transgenes.  However, when a PRE is present and the transgenes can pair, silencing is observed.  In panel A, the trans-heterozygous combination has a lighter eye color than either of the parents.  In panel B, the trans-heterozygous combination is darker than one of the parents (ff#10.5) but much lighter than the other (ff#11.102).

      All ten of the transgenes tested were able to engage in long distance (>Mbs) trans regulatory interactions; however, likely because of how the chromosome folds on the Mb scale (e.g., the location of meta-loops: see #2.1 and Author response image 3), not all of the possible pairwise silencing interactions are observed.  The silencing interactions shown in Muller et al. are between transgenes inserted on different homologs.  Mcp/scs’-dependent silencing interactions can also occur in cis. Moreover, just like the homie and nhomie experiments described above, Muller et al. (Muller et al. 1999) found that Mcp could mediate long-distance activation of mini-white and yellow by their respective enhancers.

      The pairing-sensitive activity of the PRE associated with the Mcp boundary is further enhanced when the mini-white transgene has the scs boundary in addition to Mcp and scs’.  In the experiment shown in Fig. 8 from Muller et al. 1999, the pairing-sensitive silencing interactions of the Mcp/scs’/scs transgene are between transgenes inserted on different chromosomes.  Panel A shows pairing-sensitive silencing between w#15.60, which is on the X chromosome, and w#15.102, which is on the 2nd chromosome.  Panel B shows pairing-sensitive silencing between the 2nd chromosome insert w#15.60 and a transgene, w#15.48, which is inserted on the 3rd chromosome.

      The long-distance trans and cis interactions described here are not unique to homie, nhomie, Mcp, scs’, or scs.  Precisely analogous results have been reported by Sigrist and Pirrotta (Sigrist and Pirrotta 1997) for the gypsy boundary when the bxd PRE was included in the mini-white transgene.  Also like the Mcp-containing transgenes in Muller et al. (Muller et al. 1999), Sigrist and Pirrotta observed pairing-sensitive silencing between gypsy bxd PRE mini-white transgenes inserted on different chromosomes.  Similar long-distance (Mb) interactions have been reported for Fab-7 (Bantignies et al. 2003; Li et al. 2011).  In addition, there are examples of “naturally occurring” long-distance regulatory and/or physical interactions.  One would be the regulatory/physical interactions between the p53 enhancer upstream of reaper and Xrp1 which was described by Link et al. (Link et al. 2013).  Another would be the nearly 60 meta-loops identified by Mohana et al. (Mohana et al. 2023).

      Like homie at -142 kb, the regulatory interactions (pairing-sensitive silencing and enhancer activation of reporters) reported in Muller et al. (Muller et al. 1999) involve direct physical interactions between the transgenes.  Vazquez et al. (Vazquez et al. 2006) used the lacI/lacO system to visualize contacts between distant scs/Mcp/scs’-containing transgenes in imaginal discs.  As indicated in Vazquez et al. 2006, Table 3 lines #4-7, when both transgenes have Mcp and were inserted on the same chromosome, they colocalized in trans-heterozygotes (single dot) in 94% to 97% of the disc nuclei in the four pairwise combinations they tested.  When the transgenes both lacked Mcp (Vazquez et al. 2006, Table 3 #1), co-localization was observed in 4% of the nuclei.  When scs/Mcp/scs’-containing transgenes on the 2nd and 3rd chromosome were combined (Vazquez et al. 2006, Table 3 #8), colocalization was observed in 96% of the nuclei.  They also showed that four different scs/Mcp/scs’ transgenes (two at the same insertion site but on different homologs, and two at different sites on different homologs) co-localized in 94% of the eye imaginal disc nuclei (Vazquez et al. 2006, Table 3 #9).  These pairing interactions were also found to be stable over several hours.  Similar co-localization experiments together with 3C were reported by Li et al. (Li et al. 2011).

      The de novo establishment of trans interactions between compatible boundary elements has been studied by Lim et al. (Lim et al. 2018).  These authors visualized transvection (enhancer activation of a MS2 loop reporter in trans) mediated by the gypsy insulator, homie and Fab-8  in NC14 embryos.  When both transgenes shared the same boundary element, transvection/physical pairing was observed in a small subset of embryos.  The interactions took place after a delay and increased in frequency as the embryo progressed into NC14.  As expected, transvection was specific: it was not observed when the transgenes had different boundaries.  For homie it was also orientation-dependent.  It was observed when homie was orientated in the same direction in both transgenes, but not when homie was orientated in opposite directions in the two transgenes.

      While one could imagine that loop extrusion-dependent compaction of the chromatin located between eve and the transgene at -142 kb into a series of small loops (the intervening TADs) might be able to bring homie in the transgene close to homie/nhomie in the eve locus, there is no cohesin-based loop extrusion scenario that would bring transgenes inserted at sites 6 Mb, 11 Mb, on different sides of the centromere, or at opposite ends of the 3rd chromosome together so that the distant boundaries recognize their partners and physically pair with each other.  Nor is there a plausible cohesin-based loop extrusion mechanism that could account for the fact that most of the documented long-distance interactions involve transgenes inserted on different homologs.  This is not to mention the fact that long-distance interactions are also observed between boundary-containing transgenes inserted on different chromosomes.

      In fact, given these results, one would logically come to precisely the opposite conclusion.  If boundary elements inserted Mbs apart, on different homologs and on different chromosomes can find each other and physically pair, it would be reasonable to think that the same mechanism (likely random collisions) is entirely sufficient when they are only 142 kb apart.

      Yet another reason to doubt the involvement or need for cohesin-dependent loop extrusion in bringing the transgene homie in contact with the eve locus comes from the studies of Goel et al. (Goel et al. 2023).  They show that cohesin has no role in the formation of TADs in mammalian tissue culture cells.  So if TADs in mammals aren’t dependent on cohesin, there would not be a good reason to think at this point that the loops (TADs) that are located between eve and the transgene are generated by, or even strongly dependent on, cohesin-dependent loop extrusion.

      It is also important to note that even if loop extrusion were to contribute to chromatin compaction in this context and make the looping interactions that lead to orientation-specific pairing more efficient, the role of loop extrusion in this model is not determinative of the outcome; it is merely a general compaction mechanism.  This is a far cry from the popular concept of loop extrusion as being THE driving force determining chromosome topology at the TAD level.

      Reviewer #2 (Public Review):

      In Bing et al, the authors analyze micro-C data from NC14 fly embryos, focusing on the eve locus, to assess different models of chromatin looping. They conclude that fly TADs are less consistent with conventional cohesin-based loop extrusion models and instead rely more heavily on boundary-boundary pairings in an orientation-dependent manner.

      Overall, I found the manuscript to be interesting and thought-provoking. However, this paper reads much more like a perspective than a research article. Considering eLIFE is aimed at the general audience, I strongly suggest the authors spend some time editing their introduction to the most salient points as well as organizing their results section in a more conventional way with conclusion-based titles. It was very difficult to follow the authors' logic throughout the manuscript as written. It was also not clear as written which experiments were performed as part of this study and which were reanalyzed but published elsewhere. This should be made clearer throughout.

      It has been shown several times that Drosophila Hi-C maps do not contain all of the features (frequent corner peaks, stripes, etc.) observed when compared to mammalian cells. Considering these features are thought to be products of extrusion events, it is not an entirely new concept that Drosophila domains form via mechanisms other than extrusion.

      (2.1) While there are differences between the Hi-C contact profiles in flies and mammals, these differences likely reflect in large part the bin sizes used to visualize contact profiles.  With the exception of Goel et al. (Goel et al. 2023), most of the mammalian Hi-C studies have been low resolution restriction enzyme-based experiments, and required bin sizes of >1 kb to visualize what are labeled as “TADs.”  In fact, as shown by experiments in Goel et al., these are not actually TADs, but rather a conglomeration of multiple TADs into a series of TAD neighborhoods.  The same is true for the MicroC experiments of Krietenstein et al. and Hsieh et al. on human and mouse tissue culture cells (Hsieh et al. 2020; Krietenstein et al. 2020).  This is shown in Author response image 2.  In this image, we have compared the MicroC profiles generated from human and mouse tissue culture cells with fly MicroC profiles at different levels of resolution.

      For panels A-D, the genomic DNA segments shown are approximately 2.8 Mb, 760 kb, 340 kb, and 190 kb.  For panels E-H, the genomic DNA segments shown are approximately 4.7 Mb, 870 kb, 340 kb and 225 kb.  For panels I-L, the genomic DNA segments shown are approximately 3 Mb, 550 kb, 290 kb and 175 kb.

      As reported for restriction enzyme-based Hi-C experiments, a series of stripes and dots are evident in mammalian MicroC profiles.  In the data from Krietenstein et al., two large TAD “neighborhoods” are evident with a bin size of 5 kb, and these are bracketed by 45° stripes (A: black arrows).  At 1 kb (panel B), the 45° stripe bordering the neighborhood on the left no longer defines the edge of the neighborhood (blue arrow: panel B), and both stripes become discontinuous (fuzzy dots).  At 500 bp (panel C) and 200 bp (panel D) bin sizes, the stripes largely disappear (black arrows) even though they were the most prominent feature in the TAD landscape with large bin sizes.  At 200 bp, the actual TADs (as opposed to the forest) are visible, but weakly populated.  There are no stripes, and only one of the TADs has an obvious “dot” (green asterisk: panel C).

      Author response image 2.

      Mammalian MicroC profiles at different bin sizes.

      Large TAD neighborhoods bordered by stripes are also evident in the Hsieh et al. data set in Author response image 2 panels E and F (black arrows in E and F and green arrow in F).  At 400 bp resolution (panel G), the narrow stripe in panel F (black arrows) becomes much broader, indicating that it is likely generated by interactions across one or two small TADs that can be discerned at 200 bp resolution.  The same is true for the broad stripe indicated by the green arrows in panels F, G and H.  This stripe arises from contacts between the TADs indicated by the red bar in panels G and H and the TADs to the other side of the volcano triangle with a plume (blue arrow in panel H).  As in flies, we would expect that this volcano triangle topped by a plume corresponds to a stem-loop.  However, the resolution is poor at 200 bp, and the profiles of the neighboring TADs are not very distinct.

      For the fly data set, stripes can be discerned when analyzed at 800 bp resolution (see arrows in Author response image 3);  however, these stripes are flanked by regions of lower contact, and represent TAD-TAD interactions.  At 400 bp, smaller neighborhoods can be discerned, and these neighborhoods exhibit a complex pattern of interaction with adjacent neighborhoods.  With bin sizes of 200 bp, individual TADs are observed, as are TAD-TAD interactions like those seen near eve.  Some of the TADs have dots at their apex, while others do not—much like what is seen in the mammalian MicroC studies.

      Author response image 3.

      Mammalian MicroC profiles at different bin sizes.

      Stripes: As illustrated in Author response image 2 A-D and E-H, the continuous stripes seen in low resolution mammalian studies (>1 kb bins) would appear to arise from binning artefacts.  At high resolution where single TADs are visible, the stripes seem to be generated by TAD-TAD interactions, and not by some type of “extrusion” mechanism.  This is most clearly seen for the volcano with plume TAD in Author response image 2 G and H.  While stripes in Author response image 2 disappear at high resolution, this is not always true.  There are stripes that appear to be “real” in Goel et al. 2023 for the TADs in the Ppm1g region, and in Author response image 1 for the Abd-B regulatory domain TADs.  Since the stripes in the Ppm1g region are unaffected by Rad21 depletion, some other mechanism must be involved (c.f. (Shidlovskii et al. 2021)).
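
      How a large bin size can turn discrete TAD-TAD contacts into an apparently continuous stripe can be illustrated with a toy example.  The sketch below is only an illustration under stated assumptions: the `rebin` helper, the 100-bin map and the positions of the contact patches are all invented, and none of the numbers come from the data sets discussed here.  It builds a contact map at 200 bp resolution in which one anchor region contacts four separate, distant patches, and then sums the same counts into 5 kb bins.

```python
def rebin(contact_map, factor):
    """Sum a square contact matrix into bins `factor` times larger."""
    n = len(contact_map)
    if any(len(row) != n for row in contact_map):
        raise ValueError("contact map must be square")
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    coarse = [[0] * m for _ in range(m)]
    for i, row in enumerate(contact_map):
        for j, count in enumerate(row):
            coarse[i // factor][j // factor] += count
    return coarse


# Toy map: 100 bins of 200 bp (20 kb in total).  All values are invented.
N_BINS, FACTOR = 100, 25            # 25 x 200 bp = 5 kb coarse bins
fine = [[0] * N_BINS for _ in range(N_BINS)]

# One anchor region (bins 0-4) contacts four separate, distant patches.
for start in (20, 45, 70, 95):
    for i in range(5):
        for j in range(start, start + 5):
            fine[i][j] = 1
            fine[j][i] = 1          # contact maps are symmetric

coarse = rebin(fine, FACTOR)

# Count how much of the anchor row carries signal at each resolution.
filled_fine = sum(1 for c in fine[0] if c > 0)
filled_coarse = sum(1 for c in coarse[0] if c > 0)
print(f"200 bp bins: {filled_fine} of {N_BINS} bins in the anchor row have contacts")
print(f"5 kb bins:   {filled_coarse} of {len(coarse)} bins in the anchor row have contacts")
```

      In this toy map the anchor row is mostly empty at 200 bp, with signal confined to the four patches, whereas every 5 kb bin in the same row contains signal, so the row reads as an unbroken stripe even though the total number of contacts is unchanged.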

      Dots: The high resolution images of mammalian MicroC experiments in Author response image 2D and H show that, like Drosophila (Author response image 3L), mammalian TADs don’t always have a “dot” at the apex of the triangle.  This is not surprising.  In the MicroC procedure, fixed chromatin is digested to mononucleosomes with MNase.  Since most TAD boundaries in flies, and presumably also in mammals, are relatively large (150-400 bp) nuclease hypersensitive regions, extensive MNase digestion will typically reduce the boundary element sequences to oligonucleotides.

      In flies, the only known sequences (at least to date) that end up giving dots (like those seen in Author response image 1) are bound by a large (>1,000 kd) GAF-containing multiprotein complex called LBC.  In the Abd-B region of BX-C, LBC binds to two ~180 bp sequences in Fab-7 (dHS1 and HS3: (Kyrchanova et al. 2018; Wolle et al. 2015)), and to the centromere proximal (CP) side of Fab-8.  The LBC elements in Fab-7 (dHS1) and Fab-8 (CP) have both blocking and boundary bypass activity (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018).  Elsewhere, LBC binds to the bx and bxd PREs in the Ubx regulatory domains, to two PREs upstream of engrailed, to the hsp70 promoter, the histone H3-H4 promoters, and the eve promoter (unpublished data).  Based on ChIP signatures, it likely binds to most PREs/tethering elements in the fly genome (Batut et al. 2022; Li et al. 2023).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that LBC protects an ~150-180 bp DNA segment from MNase digestion, which would explain why LBC-bound sequences are able to generate dots in MicroC experiments.  Also unlike typical boundary elements, the pairing interactions of the LBC elements we’ve tested appear to be orientation-independent (unpublished data).

      The difference in MNase sensitivity between typical TAD boundaries and LBC-bound elements is illustrated in the MicroC of the Leukocyte-antigen-related-like (Lar) meta-loop in Author response image 4 panels A and B.  Direct physical pairing of two TAD boundaries (blue and purple) brings two TADs encompassing the 125 kb Lar gene into contact with two TADs in a gene poor region 620 kb away.  This interaction generates two regions of greatly enhanced contact: the two boxes on either side of the paired boundaries (panel A).  Note that like transgene homie pairing with the eve boundaries, the boundary pairing interaction that forms the Lar meta-loop is orientation-dependent.  In this case the TAD boundary in the Lar locus pairs with the TAD boundary in the gene poor region head-to-head (arrow tip to arrow tip), generating a circle-loop.  This circle-loop configuration brings the TAD upstream of the blue boundary into contact with the TAD upstream of the purple boundary.  Likewise, the TAD downstream of the blue boundary is brought into contact with the TAD downstream of the purple boundary.

      In the MicroC procedure, the sequences that correspond to the paired boundaries are not recovered (red arrow in Author response image 4 panel B).  This is why there are vertical and horizontal blank stripes (red arrowheads) emanating from the missing point of contact.  Using a different Hi-C procedure (dHS-C) that allows us to recover sequences from typical boundary elements (Author response image 4 panels C and D), there is a strong “dot” at the point of contact which corresponds to the pairing of the blue and purple boundaries.

      There is a second dot (green arrow) within the box that represents physical contacts between sequences in the TADs downstream of the blue and purple boundaries.  This dot is resistant to MNase digestion and is visible both in the MicroC and dHS-C profiles.  Based on the ChIP signature of the corresponding elements in the two TADs downstream of the blue and purple boundaries, this dot represents paired LBC elements.

      Author response image 4.

      Lar meta-loop. Panels A & B: MicroC. Panels C & D: dHS-C

      That being said, the authors' analyses do not distinguish between the formation and the maintenance of domains. It is not clear to this reviewer why a single mechanism should explain the formation of the complex structures observed in static Hi-C heatmaps from a population of cells at a single developmental time point. For example, how can the authors rule out that extrusion initially provides the necessary proximity and possibly the cis preference of contacts required for boundary-boundary pairing whereas the latter may more reflect the structures observed at maintenance?

      (2.2) The MicroC profiles shown in Fig. 2 of our paper were generated from nuclear cycle (NC) 14 embryos.  NC14 is the last nuclear cycle before cellularization (Foe 1989).  After the nuclei exit mitosis, S-phase begins, and because satellite sequences are late replicating in this nuclear cycle, S-phase lasts 50 min instead of only 4-6 min during earlier cycles (Shermoen et al. 2010).  So unlike MicroC studies in mammals, our analysis of chromatin architecture in NC14 embryos likely offers the best opportunity to detect any intermediates that are generated during TAD formation.  In particular, we should be able to observe evidence of cohesin linking the sequences from the two extruding strands together (the stripes) as it generates TADs de novo.  However, there are no vertical stripes in the eve TAD as would be expected if cohesin entered at a few specific sites somewhere within the TAD and extruded loops in opposite directions synchronously, nor are there stripes at 45° as would be expected if it started at nhomie or homie (see Figure Supplemental 1).  We also do not detect cohesin-generated stripes in any of the TADs in between eve and the attP site at -142 kb. Note that in some models, cohesin is thought to be continuously extruding loops. After hitting the CTCF roadblocks, cohesin either falls off after a short period and starts again or it breaks through one or more TAD boundaries generating the LDC domains. In this dynamic model, stripes of crosslinked DNA generated by the passing cohesin complex should be observed throughout the cell cycle.  They are not.

      As for formation versus maintenance, and the possible involvement of cohesin loop extrusion in the former, but not the latter:  This question was indirectly addressed in point #1.2 above.  In this point we described multiple examples of specific boundary:boundary pairing interactions that take place over Mbs, in cis and in trans and even between different chromosomes.  These long-distance interactions don’t preexist;  instead they must be established de novo and then maintained.  This process was actually visualized in the studies of Lim et al. (Lim et al. 2018) on the establishment of trans boundary pairing interactions in NC14 embryos.  There is no conceivable mechanism by which cohesin-based loop extrusion could establish the long or short distance trans interactions that have been documented in many studies on fly boundary elements.  Also as noted above, it seems unlikely that it is necessary for long-range interactions in cis.

      A more plausible scenario is that cohesin entrapment helps to stabilize these long-distance interactions after they are formed.  If this were true, then one could argue that cohesin might also function to maintain TADs after boundaries have physically paired with their neighbors in cis.  However, the Rad21 depletion experiments of Goel et al. (Goel et al. 2023) would rule out an essential role for cohesin in maintaining TADs after boundary:boundary pairing.  In short, while we cannot formally rule out that loop extrusion might help bring sequences closer together to increase their chance of pairing, neither the specificity of that pairing, nor its orientation can be explained by loop extrusion.  Furthermore, since pairing in trans cannot be facilitated by loop extrusion, invoking it as potentially important for boundary-boundary pairing in cis can only be described as a potential mechanism in search of a function, without clear evidence in its favor.

      On the other hand, the apparent loss of contacts between TADs within large multi-TAD neighborhoods (Goel et al. 2023) would suggest that there is some sort of decompaction of neighborhoods after Rad21 depletion.  It is possible that this might stress interactions that span multiple TADs as is the case for homie at -142 kb, or for the other examples described in #1.2 above.  This kind of involvement of cohesin might or might not be associated with a loop extrusion mechanism.

      Future work aimed at analyzing micro-C data in cohesin-depleted cells might shed additional light on this.

      (2.3) This experiment has been done by Goel et al. (Goel et al. 2023) in mammalian tissue culture cells.  They found that TADs, as well as local TAD neighborhoods, are not disrupted/altered by Rad21 depletion (see Goel et al. 2023 and our response to point #1.1 of reviewer #1).

      Additional mechanisms at play include compartment-level interactions driven by chromatin states. Indeed, in mammalian cells, these interactions often manifest as a "plume" on Hi-C maps similar to what the authors attribute to boundary interactions in this manuscript. How do the chromatin states in the neighboring domains of the eve locus impact the model if at all?

      (2.4) Chromatin states have been implicated in driving compartment level interactions. 

      Compartments as initially described were large, often Mb sized, chromosomal segments that “share” similar chromatin marks/states, and are thought to merge via co-polymer segregation.  They were visualized using large multi-kb bin sizes.  In the studies reported here, we use bin sizes of 200 bp to examine a DNA segment of less than 200 kb which is subdivided into a dozen or so small TADs.  Several of the TADs contain more than one transcription unit, and they are expressed in quite different patterns, and thus might be expected to have different “chromatin states” at different points in development and in different cells in the organism. However, as can be seen by comparing the MicroC patterns in our paper that are shown in Fig. 2 with Fig. 7, Figure Supplemental 5 and Figure Supplemental 6, the TAD organization in NC14 and 12-16 hr embryos is for the most part quite similar.  There is no indication that these small TADs are participating in liquid phase compartmentalization that depends upon shared chromatin/transcriptional states in NC14 and then again in 12-16 hr embryos. 

      In NC14 embryos, eve is expressed in 7 stripes, while it is potentially active throughout much of the embryo.  In fact, the initial pattern in early cycles is quite broad and is then refined during NC14.  In 12-16 hr embryos, the eve gene is silenced by the PcG system in all but a few cells in the embryo.  However, here again the basic structure of the TAD, including the volcano plume, looks quite similar at these different developmental stages.  

      As for the suggestion that the plume topping the eve volcano triangle is generated because the TADs flanking the eve TAD share chromatin states and coalesce via some sort of phase separation:

      This model has been tested directly in Ke et al. (Ke et al. 2024).  In Ke et al., we deleted the nhomie boundary and replaced it with either nhomie in the reverse orientation or homie in the forward orientation.  According to the compartment model, changing the orientation of the boundaries so that the topology of the eve TAD changes from a stem-loop to a circle-loop should have absolutely no effect on the plume topping the eve volcano triangle.  The TADs flanking the eve TAD would still be expected to share the same chromatin states and would still be able to coalesce via phase transition.  However, this is not what is observed.  The plume disappears and is replaced by “clouds” on both sides of the eve TAD. The clouds arise because the eve TAD bumps into the neighboring TADs when the topology is a circle-loop.  

      We would also note that “compartment-level” interactions would not explain the findings presented in Muller et al. 1999, in Table 1 or in Author response image 4.  It is clear that the long-distance (Mb) interactions observed for Mcp, gypsy, Fab-7, homie, nhomie and the blue and purple boundaries in Author response image 4 arise by the physical pairing of TAD boundary elements.  This fact is demonstrated directly by the MicroC experiments in Fig. 7 and Fig Supplemental 4 and 5, and by the MicroC and dHS-C experiments in Author response image 4.  There is no evidence for any type of “compartment/phase separation” driving these specific boundary pairing interactions.

      In fact, given the involvement of TAD boundaries in meta-loop formation, one might begin to wonder whether some of the “compartment level interactions” are generated by the specific pairing of TAD boundary elements rather than by “shared chromatin” states.  For example, the head-to-head pairing of the blue and purple boundaries generates a Lar meta-loop that has a circle-loop topology.  As a consequence, sequences upstream of the blue and purple boundary come into contact, generating the small dark rectangular box on the upper left side of the contact map.  Sequences downstream of the blue and purple boundary also come into contact, and this generates the larger rectangular box in the lower right side of the contact map.  A new figure, Fig. 9, shows that the interaction pattern flips (lower left and top right) when the meta-loop has a stem-loop topology.  If these meta-loops are visualized using larger bin sizes, the classic “compartment” patchwork pattern of interactions emerges.  Would the precise patchwork pattern of “compartmental” interactions involving the four distant TADs that are linked in the two meta-loops shown in Fig. 9 persist as is if we deleted one of the TAD boundaries that forms the meta-loop?  Would the precise patchwork pattern persist if we inverted one of the meta-loop boundaries so that we converted the topology of the loop from a circle-loop to a stem-loop or vice versa?  We haven’t used MicroC to compare the compartment organization after deleting or inverting a meta-loop TAD boundary; however, a comparison of the MicroC pattern in WT in Fig. 1C with that for the homie transgenes in Fig. 7 and Figs. Supplemental 5, 6 and 7 indicates a) that novel patterns of TAD:TAD interactions are generated by this homie dependent mini-meta-loop and b) that the patterns of TAD:TAD interactions depend upon loop topology. 
      Were these novel TAD:TAD interactions generated instead by compartment level interactions/shared chromatin states, they should be evident in WT as well (Fig. 1).  They are not.

      How does intrachromosomal homolog pairing impact the models proposed in this manuscript (Abed et al. 2019; Erceg et al., 2019)? Several papers recently have shown that somatic homolog pairing is not uniform and shows significant variation across the genome with evidence for both tight pairing regions and loose pairing regions. Might loose pairing interactions have the capacity to alter the cis configuration of the eve locus?

      (2.5) At this point it is not entirely clear how homolog pairing impacts the cis configuration/MicroC contact maps.  We expect that homolog pairing is incomplete in the NC14 embryos we analyzed;  however, since replication of eve and the local neighborhood is likely complete, sister chromosomes should be paired.  So we are likely visualizing the 3D organization of paired TADs.

      In summary, the transgenic experiments are extensive and elegant and fully support the authors' models. However, in my opinion, they do not completely rule out additional models at play, including extrusion-based mechanisms. Indeed, my major issue is the limited conceptual advance in this manuscript. The authors essentially repeat many of their previous work and analyses.

      (2.6) In our view, the current paper makes a number of significant contributions that go well beyond those described in our 2016 publication.  These are summarized below.

      A) While our 2016 paper used transgenes inserted in the -142 kb attP site to study pairing interactions of homie and nhomie, we neither considered nor discussed how our findings might bear on the loop extrusion model.  However, since the loop extrusion model is currently accepted as established fact by many labs working on chromosome structure, it is critically important to devise experimental approaches which test the predictions of this particular model.  One approach would be to deplete cohesin components; however, as discussed in #1.1, our experimental system is not ideal for this type of approach.  On the other hand, there are other ways to test the extrusion model.  Given the mechanism proposed for TAD formation (extruding a loop until cohesin bumps into CTCF/boundary road blocks), it follows that only two types of loop topologies are possible: stem-loop and unanchored loop.  The loop extrusion model, as currently conceived, can’t account for the two cases in this study in which the reporter on the wrong side of the homie boundary from the eve locus is activated by the eve enhancers.  In contrast, our findings are completely consistent with orientation-specific boundary:boundary pairing.

      B) In the loop extrusion model, cohesin embraces both of the extruded chromatin fibers, transiently bringing them into close proximity.  As far as we know, there have been no (high resolution) experiments that have actually detected these extruding cohesin complexes during TAD formation.  In order to have a chance of observing the expected signatures of extruding cohesin complexes, one would need a system in which TADs are being formed.  As described in the text, this is why we used MicroC to analyze TADs in NC14 embryos.  We do not detect the signature stripes that would be predicted (see Figure Supp 2) by the current version of the loop extrusion model.

      C) Reporter expression in the different -142 kb transgenes provides only an indirect test of the loop extrusion and boundary:boundary pairing models for TAD formation.  The reporter expression results need to be confirmed by directly analyzing the pattern of physical interactions in each instance.  While we were able to detect contacts between the transgenes and eve in our 2016 paper, the 3C experiments provided no information beyond that.  By contrast, the MicroC experiments in the current paper give high resolution maps of the physical contacts between the transgene and the eve TAD.  The physical contacts track completely with reporter activity.  Moreover, just as is the case for reporter activity, the observed physical interactions are inconsistent with the loop extrusion model.

      D) Genetic studies in Muller et al. (Muller et al. 1999) and imaging in Vazquez et al. (Vazquez et al. 2006) suggested that more than two boundaries can participate in pairing interactions.  Consistent with these earlier observations, viewpoint analysis indicates the transgene homie interacts with both eve boundaries.  While this could be explained by transgene homie alternating between nhomie and homie in the eve locus, this would require the remodeling of the eve TAD each time the pairing interaction switched between the three boundary elements.  Moreover, two out of the three possible pairing combinations would disrupt the eve TAD, generating an unanchored loop (c.f., the lambda DNA TAD in Ke et al., (Ke et al. 2024)).  However, the MicroC profile of the eve TAD is unaffected by transgenes carrying the homie boundary.  This would suggest that like Mcp, the pairing interactions of homie and nhomie might not be exclusively pairwise.  In this context it is interesting to compare the contact profiles of the Lar meta-loop shown in Author response image 4 with the different -142 kb homie inserts.  Unlike the homie element at -142 kb, there is clearly only a single point of contact between the blue and purple boundaries.

      E) Chen et al. (Chen et al. 2018) used live imaging to link physical interactions between a homie-containing transgene inserted at -142 kb and the eve locus to reporter activation by the eve enhancers.  They found that the reporter was activated by the eve enhancers only when it was in “close proximity” to the eve gene.  “Close proximity” in this case was 331 nm.  This distance is equivalent to ~1.1 kb of linear duplex B form DNA, or ~30 nucleosome core particles lined up in a row.  It would not be possible to ligate two DNAs wrapped around nucleosome core particles that are located 330 nm apart in a fixed matrix.  Since our MicroC experiments were done on embryos in which the gene is silent in the vast majority of cells, it is possible that the homie transgene only comes into close enough proximity for transgene nucleosome: eve nucleosome ligation events when the eve gene is off.  Alternatively, and clearly more likely, distance measurements using imaging procedures that require dozens of fluorescent probes may artificially inflate the distance between sequences that are actually close enough for enzymatic ligation.
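
      The length comparison in point E can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the rise of 0.34 nm per base pair for B-form DNA and the ~11 nm diameter of a nucleosome core particle are textbook values assumed here, not figures taken from the response, and the function names are hypothetical. With these assumptions, 331 nm corresponds to roughly 1 kb of linear DNA and about 30 nucleosome core particles, the same order of magnitude as the figures quoted above.

```python
# Back-of-the-envelope check of the distance comparison in point E.
# Both constants are assumed textbook values, not taken from the response.
NM_PER_BP_B_DNA = 0.34         # axial rise per base pair of B-form DNA (nm)
NUCLEOSOME_DIAMETER_NM = 11.0  # approximate diameter of a nucleosome core particle (nm)


def linear_dna_bp(distance_nm: float) -> float:
    """Base pairs of straight B-form DNA needed to span distance_nm."""
    return distance_nm / NM_PER_BP_B_DNA


def nucleosomes_in_a_row(distance_nm: float) -> float:
    """Nucleosome core particles laid side by side needed to span distance_nm."""
    return distance_nm / NUCLEOSOME_DIAMETER_NM


if __name__ == "__main__":
    distance_nm = 331.0  # "close proximity" distance cited from Chen et al. 2018
    print(f"{linear_dna_bp(distance_nm):.0f} bp of linear B-form DNA")
    print(f"{nucleosomes_in_a_row(distance_nm):.0f} nucleosome core particles")
```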

      F) The findings reported in Goel et al. (Goel et al. 2023) indicate that mammalian TADs don’t require cohesin activity; however, the authors do not provide an alternative mechanism for TAD formation/stability.  Here we have suggested a plausible mechanism.

      The authors make no attempt to dissect the mechanism of this process by modifying extrusion components directly.

      (2.7) See point #1.1

      Some discussion of Rollins et al. on the discovery of Nipped-B and its role in enhancer-promoter communication should also be made to reconcile their conclusions in the proposed absence of extrusion events.

      (2.8) The reason why reducing Nipped-B activity enhances the phenotypic effects of gypsy-induced mutations is not known at this point; however, the findings reported in Rollins et al. (Rollins et al. 1999) would appear to argue against an extrusion mechanism for TAD formation.

      Given what we know about enhancer blocking and TADs, there are two plausible mechanisms for how the Su(Hw) element in the gypsy transposon blocks enhancer-promoter interactions in the gypsy-induced mutants studied by Rollins et al.  First, the Su(Hw) element could generate two new TADs through pairing interactions with boundaries in the immediate neighborhood.  This would place the enhancers in one TAD and the target gene in another TAD.  Alternatively, the studies of Sigrist and Pirrotta (Sigrist and Pirrotta 1997) as well as several publications from Victor Corces’ lab raise the possibility that the Su(Hw) element in gypsy-induced mutations is pairing with gypsy transposons inserted elsewhere in the genome.  This would also isolate enhancers from their target genes.  In either case, the loss of Nipped-B activity increases the mutagenic effects of the Su(Hw) element, presumably by strengthening its boundary function.  If this is due to a failure to load cohesin on to chromatin, this would suggest that cohesin normally functions to weaken the boundary activity of the Su(Hw) element, i.e., disrupting the ability of Su(Hw) elements to interact with either other boundaries in the neighborhood or with themselves.  Were this a general activity of cohesin (to weaken boundary activity), one would imagine that cohesin normally functions to disrupt TADs rather than generate/stabilize TADs.

      An alternative model is that Nipped-B (and thus cohesin) functions to stabilize enhancer-promoter interactions within TADs.  In this case, loss of Nipped-B would result in a destabilization of the weak enhancer:promoter interactions that can still be formed when gypsy is located between the enhancer and promoter.  In this model the loss of these weak interactions in Nipped-B mutants would appear to increase the “blocking” activity of the gypsy element.  However, this alternative model would also provide no support for the notion that Nipped-B and cohesin function to promote TAD formation.

      Reviewer #3 (Public Review):

      Bing et al. attempt to address fundamental mechanisms of TAD formation in Drosophila by analyzing gene expression and 3D conformation within the vicinity of the eve TAD after insertion of a transgene harboring a Homie insulator sequence 142 kb away in different orientations. These transgenes along with spatial gene expression analysis were previously published in Fujioka et al. 2016, and the underlying interpretations regarding resulting DNA configuration in this genomic region were also previously published. This manuscript repeats the expression analysis using smFISH probes in order to achieve more quantitative analysis, but the main results are the same as previously published. The only new data are the Micro-C and an additional modeling/analysis of what they refer to as the 'Z3' orientation of the transgenes. The rest of the manuscript merely synthesizes further interpretation with the goal of addressing whether loop extrusion may be occurring or if boundary:boundary pairing without loop extrusion is responsible for TAD formation. The authors conclude that their results are more consistent with boundary:boundary pairing and not loop extrusion; however, most of this imaging data seems to support both loop extrusion and the boundary:boundary models. This manuscript lacks support, especially new data, for its conclusions.

      (3.1) The new results/contributions of our paper are described in #2.6 above. 

      Although there are (two) homie transgene configurations that give expression patterns that would be consistent with the loop extrusion model, that is not quite the same as strong evidence supporting loop extrusion.  On the contrary, key aspects of the expression data are entirely inconsistent with loop extrusion, and they thus rule out the possibility that loop extrusion is sufficient to explain the results.  Moreover, the conclusions drawn from the expression patterns of the four transgenes are backed up by the MicroC contact profiles, which are also not consistent with the loop extrusion model.  Further, as documented above, loop extrusion is not only unable to explain the findings reported in this manuscript, but also the results from a large collection of published studies on fly boundaries.  Since all of these boundaries function in TAD formation, there is little reason to think that loop extrusion makes a significant contribution at the TAD level in flies.   Given the results reported by Goel et al. (Goel et al. 2023), one might also have doubts about the role of loop extrusion in the formation/maintenance of mammalian TADs. 

      To further document these points, we’ve included a new figure (Fig. 9) that shows two meta-loops.  Like the loops seen for homie-containing transgenes inserted at -142 kb, meta-loops are formed by the pairing of distant fly boundaries.  As only two boundaries are involved, the resulting loop topologies are simpler than those generated when transgene homie pairs with nhomie and homie in the eve locus.  The meta-loop in panel B is a stem-loop.  While a loop with this topology could be formed by loop extrusion, cohesin would have to break through dozens of intervening TAD boundaries and then somehow know to come to a halt at the blue boundary on the left and the purple boundary on the right.  However, none of the mechanistic studies on either cohesin or the mammalian CTCF roadblocks have uncovered activities of either the cohesin complex or the CTCF roadblocks that could explain how cohesin would be able to extrude hundreds of kb and ignore dozens of intervening roadblocks, and then stop only when it encounters the two boundaries that form the beat-IV meta-loop.  The meta-loop in panel A is even more problematic in that it is a circle-loop, a topology that can’t be generated by cohesin extruding a loop until it comes into contact with CTCF roadblocks on the extruded strands.

      Furthermore, there are many parts of the manuscript that are difficult to follow. There are some minor errors in the labelling of the figures that if fixed would help elevate understanding. Lastly, there are several major points that if elaborated on, would potentially be helpful for the clarity of the manuscript.

      Major Points:

      (1) The authors suggest and attempt to visualize in the supplemental figures, that loop extrusion mechanisms would appear during crosslinking and show as vertical stripes in the micro-C data. In order to see stripes, a majority of the nuclei would need to undergo loop extrusion at the same rate, starting from exactly the same spots, and the loops would also have to be released and restarted at the same rate. If these patterns truly result from loop extrusion, the authors should provide experimental evidence from another organism undergoing loop extrusion.

      (3.2) We don’t know of any reports that actually document cohesin extrusion events that are forming TADs (TADs as defined in our paper, in the RCMC experiments of Goel et al. (Goel et al. 2023), in response #1.1, or in the high-resolution images from the MicroC data of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020)). However, an extruding cohesin complex would be expected to generate stripes because it transiently brings together the two chromatin strands as illustrated by the broken zipper in Figure Supplemental 2 of our paper.  While stripes generated by cohesin forming a TAD have not to our knowledge ever been observed, Fig. 4 in Goel et al. (Goel et al. 2023) shows 45° stripes outlining TADs and connecting neighboring TADs.  These stripes are visible with or without Rad21.

      In some versions of the loop extrusion model, cohesin extrudes a loop until it comes to a halt at both boundaries, where it then remains holding the loop together.  In this model, the extrusion event would occur only once per cell cycle.  This is the reason we selected NC14 embryos, as this point in development should provide by far the best opportunity to visualize cohesin-dependent TAD formation.  However, the expected stripes generated by cohesin embrace of both strands of the extruding loop were not evident.  Other newer versions of the loop extrusion model are much more dynamic: cohesin extrudes the loop, coming to a halt at the two boundaries, but either doesn’t remain stably bound or breaks through one or both boundaries. In the former case, the TAD needs to be reestablished by another extrusion event, while in the latter case LDC domains are generated.  In this dynamic model, we should also be able to observe vertical and 45° stripes (or stripes leaning to one side or another of the loading site if the extrusion rates aren’t equal on both fibers) in NC14 embryos corresponding to the formation of TADs and LDC domains.  However, we don’t.

      (2) On lines 311-314, the authors discuss that stem-loops generated by cohesin extrusion would possibly be expected to have more next-next-door neighbor contacts than next-door neighbor contacts and cite their models in Figure 1. Based on the boundary:boundary pairing models in the same figure would the stem-loops created by head-to-tail pairing also have the same phenotype? Making possible enrichment of next-next-door neighbor contacts possible in both situations? The concepts in the text are not clear, and the diagrams are not well-labeled relative to the two models.

      (3.3) Yes, we expect that stem-loops formed by cohesin extrusion or head-to-tail pairing would behave in a similar manner.  They could be stem-loops separated by unanchored loops as shown in Fig. 1B and E.  Alternatively, adjacent loops could be anchored to each other (by cohesin/CTCF road blocks or by pairing interactions) as indicated in Fig. 1C and F.  In stem-loops generated either by cohesin extrusion or by head-to-tail pairing, next-next door neighbors should interact with each other, generating a plume above the volcano triangle.  In the case of circle-loops, the volcano triangle should be flanked by clouds that are generated when the TAD bumps into both next-door neighbors.  In the accompanying paper, we test this idea by deleting the nhomie boundary and then a) inserting nhomie back in the reverse orientation, or b) by inserting homie in the forward orientation.  The MicroC patterns fit with the predictions that were made in this paper.

      (3) The authors appear to cite Chen et al., 2018 as a reference for the location of these transgenes being 700 nm away in a majority of the nuclei. However, the exact transgenes in this manuscript do not appear to have been measured for distance. The authors could do this experiment and include expression measurements.

      (3.4) The transgenes used in Chen et al. are modified versions of a transgene used in Fujioka et al. (2016) inserted into the same attP site.  When we visualize reporter transcription in NC14 embryos driven by the eve enhancers using smFISH, HCR-FISH or DIG, only a subset of the nuclei at this stage are active.  The number of active nuclei we detect is similar to that observed in the live imaging experiments of Chen et al.  The reason we cited Chen et al. (Chen et al. 2018) was that they found that proximity was a critical factor in determining whether the reporter was activated or not in a given nucleus.  The actual distance they measured wasn’t important.  Moreover, as we discussed in response #2.6 above, there are good reasons to think that the “precise” distances measured in live imaging experiments like those used in Chen et al. are incorrect.  However, their statements are certainly correct if one considers that a distance of ~700 nm or so is “more distant” relative to a distance of ~300 nm or so, which is “closer.”

      (4) The authors discuss the possible importance of CTCF orientation in forming the roadblock to cohesin extrusion and discuss that Homie orientation in the transgene may impact Homie function as an effective roadblock. However, the Homie region inserted in the transgene does not contain the CTCF motif. Can the authors elaborate on why they feel the orientation of Homie is important in its ability to function as a roadblock if the CTCF motif is not present? Trans-acting factors responsible for Homie function have not been identified and this point is not discussed in the manuscript.

      We discussed the “importance” of CTCF orientation in forming roadblocks because one popular version of the cohesin loop extrusion/CTCF roadblock model postulates that CTCF must be oriented so that the N-terminus of the protein is facing towards the oncoming cohesin complex, otherwise it won’t be able to halt extrusion on that strand.  When homie in the transgene is pointing towards the eve locus, the reporter on the other side (farther from eve) is activated by the eve enhancers.  One possible way to explain this finding (if one believes the loop extrusion model) is that when homie is inverted, it can’t stop the oncoming cohesin complex, and it runs past the homie boundary until it comes to a stop at a properly oriented boundary farther away.  In this case, the newly formed loop would extend from the boundary that stopped cohesin to the homie boundary in the eve locus, and would include not only the distal reporter, but also the proximal reporter.  If both reporters are in the same loop with the eve enhancers (which they would have to be given the mechanism of TAD formation by loop extrusion), both reporters should be activated.  They are not.

      For the boundary pairing model, the reporter that will be activated will depend upon the orientation of the pairing interaction, which can be either head-to-head or head-to-tail (or both: see discussion of LBC elements in #2.1).  For an easy visualization of how the orientation of pairing interactions is connected to the patterns of interactions between sequences neighboring the boundary, please look at Fig. 9.  This figure shows two different meta-loops.  In panel A, head-to-head pairing of the blue and purple boundaries brings together, on the one hand, sequences upstream of the blue and purple boundary, and on the other hand, sequences downstream of the blue and purple boundaries.  In the circle-loop configuration, the resulting rectangular boxes of enhanced contact are located in the upper left and lower right of the contact map.  In panel B, the head-to-tail pairing of the blue and purple boundary changes how sequences upstream and downstream of the blue and purple boundaries interact with each other.  Sequences upstream of the blue boundary interact with sequences downstream of the purple boundary, and this gives the rectangular box of enhanced interactions on the top right.  Sequences downstream of the blue boundary interact with sequences upstream of the purple boundary, and this gives the rectangular box of enhanced contact on the lower left.
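
      The relationship between pairing orientation and the position of the boxes of enhanced contact can be summarized in a toy 2x2 map. This is only a schematic sketch of the description above, not an analysis of the MicroC data: the function name and the layout are assumptions made for illustration, with rows (top to bottom) standing for sequences upstream and downstream of the blue boundary and columns (left to right) for sequences upstream and downstream of the purple boundary.

```python
# Toy 2x2 "contact map" for a meta-loop formed by two paired boundaries.
# Assumed layout (illustrative only):
#   rows    (top -> bottom): upstream, downstream of the blue boundary
#   columns (left -> right): upstream, downstream of the purple boundary
# A 1 marks a box of enhanced contact, a 0 marks no enrichment.
FLANKS = ("upstream", "downstream")


def toy_contact_map(pairing: str) -> list[list[int]]:
    """Return the schematic enrichment pattern for a pairing orientation.

    head-to-head (circle-loop): like flanks meet (upstream with upstream,
    downstream with downstream).
    head-to-tail (stem-loop): opposite flanks meet.
    """
    like_flanks_meet = {"head-to-head": True, "head-to-tail": False}[pairing]
    return [
        [int((blue == purple) == like_flanks_meet) for purple in FLANKS]
        for blue in FLANKS
    ]


if __name__ == "__main__":
    for orientation in ("head-to-head", "head-to-tail"):
        print(orientation)
        for row in toy_contact_map(orientation):
            print("  ", row)
```

      In this sketch, head-to-head pairing marks the upper-left and lower-right cells, and head-to-tail pairing marks the upper-right and lower-left cells, matching the box positions described for panels A and B.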

      CTCF: Our analysis of the homie boundary suggests that CTCF contributes little to its activity.  It has an Su(Hw) recognition sequence and a CP190 “associated” sequence.  Mutations in both compromise boundary activity (blocking and -142 kb pairing).  Gel shift experiments and ChIP data indicate there are half a dozen or more additional proteins that associate with the 300 bp homie fragment used in our experiments.

      Orientation of CTCF or other protein binding sites:  The available evidence suggests that orientation of the individual binding sites is not important (Kyrchanova et al. 2016; Lim et al. 2018).  Instead, it is likely that the order of binding sites affects function.

      (5) The imaging results seem to be consistent with both boundary:boundary interaction and loop extrusion stem looping.

      It is not clear whether the reviewer is referring to the different patterns of reporter expression— which clearly don’t fit with the loop extrusion model in the key cases that distinguish the two models—or the live imaging experiments in Chen et al. (Chen et al. 2018).

      (6) The authors suggest that the eveMa TAD could only be formed by extrusion after the breakthrough of Nhomie and several other roadblocks. Additionally, the overall long-range interactions with Nhomie appear to be less than the interactions with endogenous Homie (Figures 7, 8, and supplemental 5). Is it possible that in some cases boundary:boundary pairing is occurring between only the transgenic Homie and endogenous Homie and not including Nhomie?

      Yes, it is possible.  On the other hand, the data that are currently available support the idea that transgene homie usually interacts with endogenous homie and nhomie at the same time.  This is discussed in #2.6D above.  The viewpoints indicate that crosslinking occurs more frequently to homie than to nhomie.  This could indicate that when there are only pairwise interactions, these tend to be between homie and homie.  Alternatively, this could also be explained by a difference in relative crosslinking efficiency.

      (7) In Figure 4E, the GFP hebe expression shown in the LhomieG Z5 transgenic embryo does not appear in the same locations as the LlambdaG Z5 control. Is this actually hebe expression or just a background signal?

      The late-stage embryos shown in E are oriented differently.  For GlambdaL, the embryo is oriented so that hebe-like reporter expression on the ventral midline is readily evident.  However, this orientation is not suitable for visualizing eve enhancer-dependent expression of the reporters in muscle progenitor cells.  For this reason, the 12-16 hr GeimohL embryo in E is turned so that the ventral midline isn’t readily visible in most of the embryo.  As is the case in NC14 embryos, the eve enhancers drive lacZ but not gfp expression in the muscle progenitor cells.

      (8) Figure 6- The LhomieG Z3 (LeimohG) late-stage embryo appears to be showing the ventral orientation of the embryo rather than the lateral side of the embryo as was shown in the previous figure. Is this for a reason? Additionally, there are no statistics shown for the Z3 transgenic images.

      Were these images analyzed in the same way as the Z5 line images?

      The LeimohG embryo was turned so that the hebe enhancer-dependent expression of lacZ is visible.  While the eve enhancer-dependent expression of lacZ in the muscle progenitor cells isn’t visible with this orientation, eve enhancer-dependent expression in the anal plate is.

      (9) Do the Micro-C data align with the developmental time points used in the smFISH probe assays?

      The MicroC data align with the smFISH images of older embryos: 12-14 hour embryos or stages 14-16.  

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      This was a difficult paper to review. It took me several hours to understand the terminology and back and forth between different figures to put it together. It might be useful to put the loop models next to the MicroC results and have a cartoon way of incorporating which enhancers are turning on which reporters.

      I also found the supercoiled TAD models in Figure 1 not useful. These plectoneme-type of structures likely do not exist, based on the single-cell chromosome tracing studies, and the HiC structures not showing perpendicular to diagonal interactions between the arms of the plectonemes.

      We wanted to represent the TAD as a coiled 30 nm fiber, as TADs are not likely to resemble the large loops shown in Fig. 1 A, D, and G.

      There are no stripes emerging from homies, which is consistent with the pairing model, but there seem to be stripes from the eve promoter. I think these structures may be a result of both the underlying loop extruders + pairing elements.

      There are internal structures in the eve TAD that link the upstream region of the eve promoter to the eve PRE and sequences in nhomie.  All three of these sequences are bound by LBC.  Each of the regulatory domains in BX-C also has LBC elements and, as shown in Author response image 1, stripes connect some of these LBC elements to each other.  Since the stripes that Goel et al. (Goel et al. 2023) observed in their RCMC analysis of Ppm1g didn’t require cohesin, it isn’t clear how these stripes are generated (actively, e.g., by a chromatin remodeler, or passively, e.g., because the LBC complex has non-specific DNA binding activity that can be readily crosslinked as the chromatin fiber slides past).

      The authors say there are no TADs that have "volcano plumes" but the leftmost TAD TA appears to have one. What are the criteria for calling the plumes? I am also not clear why there is a stripe off the eve volcano. It looks like homie is making a "stripe" loop extrusion type of interaction with the next TAD up. Is this maybe cohesin sliding off the left boundary?

      The reviewer is correct, the left-most TAD TA appears to have a plume.  We mentioned TA seems to have a plume in the original text, but it was inadvertently edited out.

      Two different types of TAD-to-TAD interactions are observed.  In the case of eve, the TADs to either side of eve interact more frequently with each other than they do with eve.  This generates a “plume” above the eve volcano triangle.  The TADs that comprise the Abd-B regulatory domains (see Author response image 1) are surrounded by clouds of diminishing intensity.  Clouds at the first level represent interactions with both next-door neighbors; clouds at the second level represent interactions with both next-next-door neighbors; clouds at the third level represent interactions with next-next-next-door neighbors.  The Abd-B TADs are close to the same size, so that interactions with neighbors are relatively simple.  However, this is not always the case.  When there are smaller TADs near larger TADs, the pattern of interaction can be quite complicated.  An example is indicated by the red bar in Author response image 2.

      The authors state "In the loop-extrusion model, a cohesin complex initiating loop extrusion in the eve TAD must break through the nhomie roadblock at the upstream end of the eve TAD. It must then make its way past the boundaries that separate eve from the attP site in the hebe gene, and come to a halt at the homie boundary associated with the lacZ reporter." Having multiple loops formed by cohesin would also bring in the 142kb apart reporter and homie. Does cohesin make 140 kb long loops in flies?

      A mechanism in which cohesin brings the reporter close to the eve TAD by generating many smaller loops (which would be the intervening TADs) was discussed in #1.2.

      Figure 5 title mistakes the transgene used?

      Fixed.

      In figure 6, the orientation of the embryos does not look the same for the late-stage panels. So it was difficult to tell if the eve enhancer was turning the reporter on.

      Here we were focusing mainly on the AP enhancer activation of the reporter, as this is most easily visualized.  It should be clear from the images that the appropriate reporter is activated by the AP enhancer for each of the transgene inserts.

      It is not clear to me why the GFP makes upstream interactions (from the 4C viewpoint) in GhomileLZ5 but not in LhomieGZ5? Corresponding interactions for Fig Supp 5 & 6 are not the same. That is, LacZ in the same place and with the same homie orientation does not show a similar upstream enrichment as the GFP reporter does.

      We are uncertain as to whether we understand this question/comment.  In GhomieLZ5 (now GhomieL), the lacZ reporter is on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary.  Since homie is pointing away from gfp, pairing interactions with homie and nhomie in the eve locus bring the eve enhancers in close proximity with the gfp reporter.  This is what is seen in the lower trace in Fig. 7 panel D.  In LhomieGZ5 (now GeimohL) the lacZ reporter is again on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary.  However, in this case homie is inverted so that it points away from lacZ (towards gfp).  In this orientation, pairing brings the lacZ reporter into contact with the eve enhancers.  This is what is seen in the upper trace in Fig. 7 panel D.

      The orientation of the transgene is switched in Fig. Supp 5 and 6.  For these “Z3” transgenes (now called LeimohG and LhomieG), the gfp reporter is on the eve side of homie while the lacZ reporter is on the hebe enhancer side of homie.  The interactions between the reporters and eve are determined by the orientation of homie in the transgene.  When homie is pointing away from gfp (as in LeimohG), gfp is activated and that is reflected in the trace in Supp Fig. 5.  When homie is pointing away from lacZ, lacZ is activated and this is reflected (though not as cleanly as in other cases) in the trace in Supp Fig. 6.

      I did not see a data availability statement. Is the data publicly available? The authors also should consider providing the sequences of the insertions, or provide the edited genomes, in case other researchers would like to analyze the data.

      Data have been deposited.

      Reviewer #3 (Recommendations For The Authors):

      Minor Points:

      (1) There is an inconsistency in the way that some of the citations are formatted. Some citations have 'et al' italicized while others do not. It seems to be the same ones throughout the manuscript. Some examples: Chetverina et al 2017, Chetverina et al 2014, Cavalheiro et al 2021, Kyrchanova et al 2008a, Muravyova et al 2001.

      Fixed

      (2) Pita is listed twice in line 48.

      Fixed

      (3) Line 49, mod(mdg4)67.2 is written just as mod(mdg4). The isoform should be indicated.

      This refers to all Mod isoforms.

      (4) Homie and Nhomie are italicized throughout the manuscript and do not need to be.

      This is the convention used previously.  

      (5) The supplemental figure captions 1 and 2 in the main document are ordered differently than in the supplemental figures file. This caused it to look like the figures are being incorrectly cited in lines 212-214 and 231-232.

      Fixed

      (6) Is the correct figure being cited in line 388-389? The line cites Figure 6E when mentioning LlambdaG Z5; however, LlambdaG Z5 is not shown in Figure 6.

      Fixed

      (7) Section heading 'LhomieG Z5 and GhomieL Z5' could be renamed for clarity. GhomieL Z5 results are not mentioned until the next section, named 'GhomieL Z5'.

      Fixed

      (8) Can the authors provide better labeling for control hebe expression? This would help to determine what is hebe expression and what is background noise in some of the embryos in Figures 4-6.

      Author response image 5 shows expression of the lacZ reporter in GeimohL and GlambdaL.  For the GlambdaL transgene, the hebe enhancers drive lacZ expression in 12-16 hr embryos.  Note that lacZ expression is restricted to a small set of quite distinctive cells along the ventral midline.  lacZ is also expressed on the ventral side of the GeimohL embryo (top panel).  However, the locations of the lacZ-positive cells are quite different from those in the GlambdaL transgene embryo.  These cells are displaced from the midline, and are arranged as pairs of cells in each hemisegment, locations that correspond to eve-expressing cells in the ventral nerve cord.  The eve enhancers also drive lacZ expression elsewhere in the GeimohL embryo, including the anal plate and dorsal muscle progenitor cells (seen most clearly in the lower left panel).

      Author response image 5.

      lacZ expression in GeimohL and GlambdaL embryos

      (9) The Figure 5 title is labeled with the wrong transgene.

      Fixed

      (10) Heat map scales are missing for Figures 7, supplemental 5, and supplemental 6.

      Fixed

      (11) Did the authors check if there was a significant difference in the expression of GFP and lacZ from lambda control lines to the Homie transgenic lines?

      Yes.  Statistical analysis added in Table Supplemental #1

      (12) The Figure 7 title references that these are Z3 orientations, however, it is Z5 orientations being shown.

      Fixed

      (13) The virtual 4C data should include an axis along the bottom of the graphs for better clarity. An axis is missing in all 4C figures.

      References:

      Bantignies F, Grimaud C, Lavrov S, Gabut M, Cavalli G. 2003. Inheritance of polycomb-dependent chromosomal interactions in drosophila. Genes Dev. 17(19):2406-2420.

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc-finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk AN, Boyko KM, Nikolaeva AY, Burtseva AD, Popov VO, Georgiev PG. 2022. Structural insights into highly similar spatial organization of zinc-finger associated domains with a very low sequence similarity. Structure. 30(7):1004-1015.e1004.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Foe VE. 1989. Mitotic domains reveal early commitment of cells in drosophila embryos. Development. 107(1):1-22.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis-regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Ke W, Fujioka M, Schedl P, Jaynes JB. 2024. Chromosome structure ii: Stem-loops and circle-loops. eLife.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Mogila V, Wolle D, Deshpande G, Parshikov A, Cleard F, Karch F, Schedl P, Georgiev P. 2016. Functional dissection of the blocking and bypass activities of the fab-8 boundary in the drosophila bithorax complex. PLoS Genet. 12(7):e1006188.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Lim B, Heist T, Levine M, Fukaya T. 2018. Visualization of transvection in living drosophila embryos. Mol Cell. 70(2):287-296.e286.

      Link N, Kurtz P, O'Neal M, Garcia-Hughes G, Abrams JM. 2013. A p53 enhancer region regulates target genes through chromatin conformations in cis and in trans. Genes Dev. 27(22):2433-2438.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Rollins RA, Morcillo P, Dorsett D. 1999. Nipped-b, a drosophila homologue of chromosomal adherins, participates in activation by remote enhancers in the cut and ultrabithorax genes. Genetics. 152(2):577-593.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.

      Shermoen AW, McCleland ML, O'Farrell PH. 2010. Developmental control of late replication and s phase length. Curr Biol. 20(23):2067-2077.

      Shidlovskii YV, Bylino OV, Shaposhnikov AV, Kachaev ZM, Lebedeva LA, Kolesnik VV, Amendola D, De Simone G, Formicola N, Schedl P et al. 2021. Subunits of the pbap chromatin remodeler are capable of mediating enhancer-driven transcription in drosophila. Int J Mol Sci. 22(6).

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Wolle D, Cleard F, Aoki T, Deshpande G, Schedl P, Karch F. 2015. Functional requirements for fab-7 boundary activity in the bithorax complex. Mol Cell Biol. 35(21):3739-3752.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      fMRI was used to address an important aspect of human cognition - the capacity for structured representations and symbolic processing - in a cross-species comparison with non-human primates (macaques); the experimental design probed implicit symbolic processing through reversal of learned stimulus pairs. The authors present solid evidence in humans that helps elucidate the role of brain networks in symbolic processing, however the evidence from macaques was incomplete (e.g., sample size constraints, potential and hard-to-quantify differences in attention allocation, motivation, and lived experience between species).

      Thank you very much for your assessment. We would like to address the potential issues that you raise point-by-point below.

      We agree that for macaque monkey physiology, sample size is always a constraint, for both financial and ethical reasons. We addressed this concern by combining the results from two different labs, which allowed us to test 4 animals in total, twice as many as is common practice in the field of primate physiology. (We discuss this now on lines 473-478.)

      Interspecies differences in motivation, attention allocation, task strategies etc. could also be limiting factors. Note that we did address the potential lack of attention allocation directly in Experiment 2 using implicit reward association, which was successful as evidenced by the activation of attentional control areas in the prefrontal cortex. We cannot guarantee that the strategies that the two species deploy are identical, but we tentatively suggest that this might be a less important factor in the present study than in other interspecies comparisons that use explicit behavioral reports. In the current study, we directly measured surprise responses in the brain in the absence of any explicit instructions in either species, which allowed us to  measure the spontaneous reversal of learned associations, which is a very basic element of symbolic representation. Our reasoning is that such spontaneous responses should be less dependent on attention allocation and task strategies. (We discuss this now in more detail on lines 478-485.)

      Finally, lived experience could be a major factor. Indeed, obvious differences include a lifetime of open-field experiences and education in our human adult subjects, which was not available to the monkey subjects, and includes a strong bias towards explicit learning of symbolic systems (e.g. words, letters, digits, etc). However, we have previously shown that 5-month-old human infants spontaneously generalize learning to the reversed pairs after a short learning in the lab using EEG (Kabdebon et al, PNAS, 2019). This indicates that also with very limited experience, humans spontaneously reverse learned associations. (We discuss this now in more detail on lines 478-485.) It could be very interesting to investigate whether spontaneous reversal could be present in infant macaque monkeys, as there might be a critical period for this effect. Although neurophysiology in awake infant monkeys is highly challenging, it would be very relevant for future work. (We discuss this in more detail on lines 493-498.)

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Kerkoerle and colleagues present a very interesting comparative fMRI study in humans and monkeys, assessing neural responses to surprise reactions at the reversal of a previously learned association. The implicit nature of this task, assessing how this information is represented without requiring explicit decision-making, is an elegant design. The paper reports that both humans and monkeys show neural responses across a range of areas when presented with incongruous stimulus pairs. Monkeys also show a surprise response when the stimuli are presented in a reversed direction. However, humans show no such surprise response based on this reversal, suggesting that they encode the relationship reversibly and bidirectionally, unlike the monkeys. This has been suggested as a hallmark of symbolic representation, that might be absent in nonhuman animals. 

      I find this experiment and the results quite compelling, and the data do support the hypothesis that humans are somewhat unique in their tendency to form reversible, symbolic associations. I think that an important strength of the results is that the critical finding is the presence of an interaction between congruity and canonicity in macaques, which does not appear in humans. These results go a long way to allay concerns I have about the comparison of many human participants to a very small number of macaques. 

      We thank the reviewer for the positive assessment. We also very much appreciate the point about the interaction effect in macaque monkeys – indeed, we do not report just a negative finding. 

      I understand the impossibility of testing 30+ macaques in an fMRI experiment. However, I think it is important to note that differences necessarily arise in the analysis of such datasets. The authors report that they use '...identical training, stimuli, and whole-brain fMRI measures'. However, the monkeys (in experiment 1) actually required 10 times more training. 

      We agree that this description was imprecise. We have changed it to “identical training stimuli” (line 151); indeed, the movies used for training were strictly identical. Furthermore, please note that we do report the fMRI results after the same training duration. In experiment 1, after 3 days of training, the monkeys did not show any significant results, even in the canonical direction. However, in experiment 2, with increased attention and motivation, a significant effect was observed on the first day of scanning after training, as was found in human subjects (see Figure 4 and Table 3).

      More importantly, while the fMRI measures are the same, group analysis over 30+ individuals is inherently different from comparing only 2 macaques (including smoothing and averaging away individual differences that might be more present in the monkeys, due to the much smaller sample size). 

      Thank you for understanding that a limited sampling size is intrinsic to macaque monkey physiology. We also agree that data analysis in humans and monkeys is necessarily different. As suggested by the reviewer, we added an analysis to address this, see the corresponding reply to the ‘Recommendations for the authors’ section below.

      Despite this, the results do appear to show that macaques show the predicted interaction effect (even despite the sample size), while humans do not. I think this is quite convincing, although had the results turned out differently (for example an effect in humans that was absent in macaques), I think this difference in sample size would be considerably more concerning. 

      Thank you for noting this. Indeed, the interaction effect is crucial, and the task design was explicitly made to test this precise prediction, described in our manuscript as the “reversibility hypothesis”. The congruity effect in the learned direction served as a control for learning, while the corresponding congruity effect in the reversed direction tested for spontaneous reversal. The reversibility hypothesis stipulates that in humans there should not be a difference between the learned and the reversed direction, while there should be for monkeys. We already wrote about that in the result section of the original manuscript and now also describe this more explicitly in the introduction and beginning of the result section.

      I would also note that while I agree with the authors' conclusions, it is notable to me that the congruity effect observed in humans (red vs blue lines in Fig. 2B) appears to be far more pronounced than any effect observed in the macaques (Fig. 3C-3). Again, this does not challenge the core finding of this paper but does suggest methodological or possibly motivational/attentional differences between the humans and the monkeys (or, for example, that the monkeys had learned the associations less strongly and clearly than the humans). 

      As also explained in response to the eLife assessment above, we expanded the “limitations” section of the discussion, with a deeper description of the possible methodological differences between the two species (see lines 478-485).

      With the same worry in mind, we did increase the attention and motivation of monkeys in experiment 2, and indeed obtained a greater activation to the canonical pairs and their violation, notably in the prefrontal cortex, but crucially still without reversibility.

      In the end, we believe that the striking interspecies difference in size and extent of the violation effect, even for purely canonical stimuli, is an important part of our findings and points to a more efficient species-specific learning system, that our experiment tentatively relates to a symbolic competence.

      This is a strong paper with elegant methods and makes a worthwhile contribution to our understanding of the neural systems supporting symbolic representations in humans, as opposed to other animals. 

      We again thank the reviewer for the positive review.

      Reviewer #2 (Public Review): 

      In their article titled "Brain mechanisms of reversible symbolic reference: a potential singularity of the human brain", van Kerkoerle et al address the timely question of whether non-human primates (rhesus macaques) possess the ability for reverse symbolic inference as observed in humans. Through an fMRI experiment in both humans and monkeys, they analyzed the BOLD signal in both species while observing audio-visual and visual-visual stimuli pairs that had been previously learned in a particular direction. Remarkably, the findings pertaining to humans revealed that a broad brain network exhibited increased activity in response to surprises occurring in both the learned and reverse directions. Conversely, in monkeys, the study uncovered that the brain activity within sensory areas only responded to the learned direction but failed to exhibit any discernible response to the reverse direction. These compelling results indicate that the capacity for reversible symbolic inference may be unique to humans. 

      In general, the manuscript is skillfully crafted and highly accessible to readers. The experimental design exhibits originality, and the analyses are tailored to effectively address the central question at hand.

      Although the first experiment raised a number of methodological inquiries, the subsequent second experiment thoroughly addresses these concerns and effectively replicates the initial findings, thereby significantly strengthening the overall study. Overall, this article is already of high quality and brings new insight into human cognition. 

      We sincerely thank the reviewer for the positive comments. 

      I identified three weaknesses in the manuscript: 

      - One major issue in the study is the absence of significant results in monkeys. Indeed, authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). 

      First, we disagree with the statement about “absence of significant results in monkeys”. We do report a significant interaction which, as noted by the referee, is a crucial positive finding.

      Second, we performed the suggested analysis for experiment 2, using the bilateral ROIs of the putative monkey MDN from previous literature (Mitchell, et al. 2016), which are based on the human study by Fedorenko et al. (PNAS, 2013). 

      Author response table 1.

      Congruity effect for monkeys in Experiment 2 within the ROIs of the MDN (n=3). Significance was assessed with one-sided one-sample t-tests.

      As can be seen, none of the regions within the monkey MDN showed an FDR-corrected significant difference or interaction. Although the absence of a canonical congruity effect makes it difficult to draw strong conclusions, it did approach significance at an uncorrected level in the lateral frontal posterior region, similar to  the large prefrontal effect we report in Figures 4 and 5. Furthermore, for the reversed congruity effect there was never even a trend at the uncorrected level, and the crucial interaction of canonicity and congruity again approached significance in the lateral prefrontal cortex.  
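      The distinction drawn above between "uncorrected" and "FDR-corrected" significance can be illustrated with a minimal sketch. The response does not state which FDR procedure was used, so the Benjamini-Hochberg procedure here is an assumption, and the ROI names and p-values are hypothetical placeholders rather than values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values, one per ROI (not data from the study).
roi_p = {"roi_a": 0.04, "roi_b": 0.30, "roi_c": 0.01, "roi_d": 0.60}
q = dict(zip(roi_p, benjamini_hochberg(list(roi_p.values()))))
significant = [roi for roi, q_value in q.items() if q_value < 0.05]
```

      In this toy example, roi_a falls below 0.05 before correction but not after, which is the sense in which an effect can approach significance at an uncorrected level without surviving FDR correction.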

      We also performed an ANOVA in the human participants of the VV experiment on the average betas across the 7 different fronto-parietal ROIs as used by Mitchell et al to define their equivalent to the monkey brain (Fig 1a, right in Mitchell et al. 2016) with congruity, canonicity and hemisphere (except for the anterior cingulate which is a bilateral ROI) as within-subject factors. We confirmed the results presented in the manuscript (Figure 4C) with notably no significant interaction between congruity and canonicity in any of these ROIs (all F-values (except insula) <1). A significant main effect of congruity was observed in the posterior middle frontal gyrus (MFG) and inferior precentral sulcus at the FDR corrected level. Analyses restricted to the canonical trials found a congruity effect in these two regions plus the anterior insula and anterior cingulate/presupplementary motor area, whereas no ROIs were significant at an FDR corrected level for reverse trials. There was a trend in the middle MFG and inferior precentral region for reversed trials. Crucially, there was not even a trend for the interaction between congruity and canonicity at the uncorrected level. The difference in the effect size between the canonical and reversed direction can therefore be explained by the larger statistical power due to the larger number of congruent trials (70%, versus 10% for the other trial conditions), not by a significant difference between the canonical and the reversed direction. 

      Author response table 2.

      Congruity effect for humans in Experiment 2 within the ROIs of the MDN (n=23).

      These results support our contention that the type of learning of the stimulus pairs was very different in the two species. We thank the reviewer for suggesting these relevant additional analyses.

      - While the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. 

      We agree that this is an interesting question, although it is also very open-ended. For instance, we could report each subject’s individual whole-brain results, but this would take too much space (and the interested reader will be able to examine them in the data that we make available as part of this publication). As a step in this direction, we provide below a figure showing the individual congruity effects, separately for each experiment and for each ROI of table 5, and for each of the 52 participants for whom an fMRI localizer was available:

      Author response image 1.

      Difference in mean betas between congruent and incongruent conditions in a-priori linguistic and mathematical ROIs (see definition and analyses in Table 5) in both experiments (experiment 1 = AV, left panel; experiment 2 = VV, right panel). Dots correspond to participants (red: canonical trials, green: reversed trials). The boxplot notch is located at the median and the lower and upper box hinges at the 25th and 75th centiles. Whiskers extend to 1.5 inter-quartile ranges on either side of the hinges. ROIs are ranked by the median of the Incongruent-Congruent difference across canonical and reversed order, within a given experiment. For purposes of comparison between the two experiments, we have underlined with colors the top-five common ROIs between the two experiments. N.s.: non-significant congruity effect (p>0.05).
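      The boxplot elements described in this caption (median, hinges at the 25th and 75th centiles, whiskers within 1.5 inter-quartile ranges of the hinges) can be computed as in the following minimal sketch. It assumes the common convention that each whisker ends at the most extreme data point inside the 1.5 IQR fence, and it uses the "inclusive" quantile method from the Python standard library; the plotting software used for the figure may compute quartiles slightly differently. The input values are hypothetical beta differences, not data from the study.

```python
import statistics


def boxplot_stats(values):
    """Summarise values with the median, hinges, whisker ends and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "lower_hinge": q1,
        "upper_hinge": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical per-participant beta differences (placeholder values).
example = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

      With these placeholder values, the single extreme observation lies outside the upper fence and would be drawn as an individual point rather than being covered by the whisker.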

      Several regions show a rather consistent difference across subjects (see, for instance, the posterior STS in experiment 1, left panel). Overall, only 3 of the 52 participants did not show any beta greater than 2 in canonical or reversed trials in any ROI. The consistency is quite striking, given the limited number of test trials (in total only 16 incongruent trials per direction per participant), and the fact that these ROIs were selected for their responses to spoken or written sentences, as part of a subsidiary task quite different from the main task.

      - Some details are missing in the methods.  

      Thank you for these comments, we reply to them point-by-point below.

      Reviewer #3 (Public Review): 

      This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their finding to support the hypothesis that reversible associations, and by extension symbolic cognition, is uniquely human. 

      This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing. 

      We thank the reviewer for the careful summary of the manuscript, and the positive comments.

      Methods - Design issues: 

      The authors originally planned to use the same training/testing protocol for both species but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless). 

      Thank you for this comment. 

      Indeed, for experiment 1, the amount of training and testing was not equal for the humans and monkeys, as also mentioned by reviewer 2. We now describe in more detail how many training and imaging days we used for each experiment and each species, as well as the number of blocks per day and the number of trials per block (see lines 572-577). We also added the information on the amount of training received to all of the legends of the Tables.

      We are sorry for giving the impression that we trained until the monkeys learned this. This was not the case. Based on previous literature, we actually anticipated that the short training would not be sufficient, and therefore planned additional training in advance. Specifically, Meyer & Olson (2011) had observed pair learning in the inferior temporal cortex of macaque monkeys after 816 exposures per pair. This is similar to the additional training we gave, about 80 blocks with 12 trials per pair per block. This is now explained in more detail (lines 577-580).
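      The comparison with Meyer & Olson (2011) can be checked with a short calculation. This is only a sketch of the arithmetic; the figure of 4 pairs per block is an assumption taken from the "5 sets of 4 audio-visual pairs" mentioned elsewhere in this response, and "about 80 blocks" is treated as exactly 80.

```python
# Numbers quoted in the response above
additional_blocks = 80            # "about 80 blocks" (treated as exact here)
trials_per_pair_per_block = 12
meyer_olson_exposures = 816       # exposures per pair in Meyer & Olson (2011)

# Assumption: 4 stimulus pairs are trained within a block
pairs_per_block = 4

exposures_per_pair = additional_blocks * trials_per_pair_per_block
total_additional_trials = exposures_per_pair * pairs_per_block
ratio = exposures_per_pair / meyer_olson_exposures

print(exposures_per_pair)          # exposures per pair in the additional training
print(total_additional_trials)     # total additional training trials
print(round(ratio, 2))             # relative to Meyer & Olson (2011)
```

      Under these assumptions the additional training amounts to 960 exposures per pair, and the total of 3,840 trials matches the figure computed by the reviewer.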

      Furthermore, we strongly disagree with the pejorative term p-hacking. The aim of the experiment was not to show a congruency effect in the canonical direction in monkeys, but to track and compare their behavior in the same paradigm as that of humans for the reverse direction. It would have been unwise to stop after human-identical training and only show that humans learn better, which is a given. Instead, we looked at brain activations at both times, at the end of human-identical training and when the monkeys had learned the pairs in the canonical direction. 

      Finally, in experiment 2, monkeys were tested after the same 3 days of training as humans. We wrote: “Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations” (lines 252-253).

      (2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture than that of NHPs, including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. This major confound needs to minimally be explicitly acknowledged up front. 

      Thank you for raising this important point. We already had a section on “limitations” in the manuscript, which we have now extended (lines 478-485). Indeed, this study follows a previous EEG study in 5-month-old infants, in which we showed that, after learning associations between labels and categories, infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al, PNAS, 2019). We also cited preliminary results of the same paradigm as used in the current study but using EEG in 4-month-old infants (Ekramnia and Dehaene-Lambertz, 2019), where we replicated the results obtained by Kabdebon et al. 2019 showing that preverbal infants spontaneously generalize learning to the reversed pairs.

      Functional MRI in awake infants remains a challenge at this age (but see our own work, Dehaene-Lambertz et al, Science, 2002), especially because the experimental design yields only a few trials in the conditions of interest (10%) and thus requires a long experimental duration that exceeds infants’ capacity to remain quiet and attentive in the noisy MRI environment. (We discuss this on lines 493-496.)

      (3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but still, it is difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than the relative importance of sound string to meaning associations in the human vs. NHP experiences. 

      As the reviewer wrote, we deliberately performed Experiment 2 with visual shapes to control for various factors that might have explained the monkeys' failure in Experiment 1. 

      (4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do? 

      The referee is correct: our use of the word “reciprocally” was improper (although see Amalric and Dehaene, 2016 for significant differences in both directions when non-mathematical sentences concern specific knowledge). We changed the formulation to clarify this as follows: “In these ROIs, we recovered the subject-specific coordinates of each participant’s 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs; reading vs listening for the VWFA; and numerical vs non-numerical sentences for the 8 mathematical ROIs.” (lines 678-680).

      Methods - Analysis issues: 

      (5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative. 

      It is not clear to us which result the reviewer is referring to. In Tables 1-4, we report the values that we found significant in the whole brain analysis, we do not report additional statistical tests for this data. For Table 5, the subject-specific voxels were identified through a separate localizer experiment, which was designed to pinpoint the precise activation areas for each subject in the domains of oral and written language-processing and math. Subsequently, we compared the activation at these voxel locations across different conditions of the main experiment. Thus, the two datasets were distinct, and there was no double dipping. In both interpretations of the comment, we therefore disagree with the reviewer.
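      The logic of an independent localizer can be sketched in a few lines: voxels are selected using only the localizer contrast, and the congruity effect is then measured at those voxels in the separate main-experiment data. This is a toy illustration with hypothetical numbers and hypothetical function names, not the authors' analysis code.

```python
def top_fraction_indices(localizer_values, fraction=0.10):
    """Indices of the best-responding voxels in the localizer contrast."""
    n_keep = max(1, round(len(localizer_values) * fraction))
    ranked = sorted(range(len(localizer_values)),
                    key=lambda i: localizer_values[i], reverse=True)
    return sorted(ranked[:n_keep])


def mean_at(indices, betas):
    """Average beta over the selected voxels."""
    return sum(betas[i] for i in indices) / len(indices)


# Hypothetical localizer contrast for a 20-voxel search region (one subject)
localizer_t = [0.2, 1.1, 3.5, 0.4, 2.8, 0.9, 4.1, 1.7, 0.3, 2.2,
               0.6, 1.3, 0.8, 3.9, 0.5, 1.9, 2.4, 0.7, 1.0, 0.1]

# Hypothetical betas from the main experiment (a separate dataset)
congruent_betas = [i * 0.25 for i in range(20)]
incongruent_betas = [i * 0.5 for i in range(20)]

# Step 1: define the ROI from the localizer only
roi = top_fraction_indices(localizer_t, fraction=0.10)

# Step 2: measure the congruity effect at those voxels in the main experiment
congruity_effect = mean_at(roi, incongruent_betas) - mean_at(roi, congruent_betas)
print(roi, congruity_effect)
```

      Because the selection step never looks at the main-experiment betas, the test in step 2 is not biased by the selection, which is the point made in the response above.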

      Framing: 

      (6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions. 

      First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights into how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy. 

      We agree with the referee that the term “mechanism” is ambiguous and, for systems neuroscientists, may suggest more than we are able to do here with functional MRI. We changed the title to “Brain areas for reversible symbolic reference, a potential singularity of the human brain”. This title better describes our specific contribution: mapping out the areas involved in reversibility in humans, and showing that they do not seem to respond similarly in macaque monkeys.

      Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly, if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task. 

      As explained in the introduction, the reversibility test addressed a very minimal core property of symbolic reference. There cannot be a symbol if its attachment doesn’t operate in both directions. Thus, this property is necessary – but we agree that it is not sufficient. Indeed, more tests are needed to establish whether and how the learned symbols are used in further downstream compositional tasks (as discussed in our recent TICS papers, Dehaene et al. 2022). We added a sentence in the introduction to acknowledge this fact:

      “Such reversibility is a core and necessary property of symbols, although we readily acknowledge that it is not sufficient, since genuine symbols present additional referential and compositional properties that will not be tested in the present work.” (lines 89-92).

      Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above). 

      We have published an extensive review including a description of our use of the term “singularity” (Dehaene et al., TICS 2022). Here is a short excerpt: “Humans are different even in domains such as drawing and geometry that do not involve communicative language. We refer to this observation using the term “human cognitive singularity”, the word singularity being used here in its standard meaning (the condition of being singular) as well as its mathematical sense (a point of sudden change). Hominization was certainly a singularity in biological evolution, so much so that it opened up a new geological age (the Anthropocene). Even if evolution works by small continuous change (and sometimes it doesn’t [4]), it led to a drastic cognitive change in humans.”

      We find the referee’s use of the pejorative term “insinuate” quite inappropriate. From the title on, we are quite nuanced and refer only to a “potential singularity”. Furthermore, as noted above, we explicitly mention in the discussion the limitations of our study, and in particular the fact that only a single non-human species was tested (see lines 486-493). We are working hard to get chimpanzee data, but this is remarkably difficult for us, and we hope that our paper will incite other groups to collect more evidence on this point.

      (7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").

      We fail to understand the putative circularity that the referee sees in our introduction. We urge him/her to re-read it, and hope that, with the changes that we introduced, it does boil down to his/her summary, i.e. “What is uniquely human? One possibility is spontaneous reversal of temporal associations."

      Reviewer #1 (Recommendations For The Authors): 

      In general, the manuscript was very clear, easy to read, and compelling. I would recommend the authors carefully check the text for consistency and minor typos. For example: 

      The sample size for the monkeys kept changing throughout the paper. E.g., Experiment 1: n = 2 (line 149); n = 3 (line 205).  

      Thank you for catching this error, we corrected it. The number of animals was indeed 2 for experiment 1, and 3 for experiment 2. (Animals JD and YS participated in experiment 1 and JD, JC and DN in experiment 2. So only JD participated in both experiments.)

      Similarly, the number of stimulus pairs is reported inconsistently (4 on line 149, 5 pairs later in the paper). 

      We’re sorry that this was unclear. We used 5 sets of 4 audio-visual pairs each. We now clarify this, on line 157 and on lines 514-516.

      At least one case of p>0.0001, rather than p < 0.0001 (I assume). 

      Thank you once again, we now corrected this.

      Reviewer #2 (Recommendations For The Authors): 

      One major issue in the study is the absence of significant results in monkeys. Indeed, the authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). In other words: what are the statistics for the MDN regarding congruity, canonicity, and interaction in both species? Since the authors have already performed this type of analysis for language and Math ROIs (table 5), it should be relatively easy for them to extend it to the MDN. Demonstrating that results in monkeys are far from significant could further convince the reader. 

      Furthermore, while the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. Specifically, it would be valuable to describe the proportion of human participants in which the effects of congruency, canonicity, and their interaction are significant. Additionally, stating the variability of the F-values for each effect would provide reassurance to the reader regarding the distinctiveness of humans in comparison to monkeys. Low variability in the results would serve to mitigate concerns that the observed disparity is merely a consequence of testing a unique subset of monkeys, which may differ from the general population. Indeed, this would be a greater support to the notion that the dissimilarity stems from a genuine distinction between the two species. 

      We responded to both of these points above.

      In terms of methods, details are missing: 

      - How many trials of each condition are there exactly? (10% of 44 trials is 4.4) : 

      We wrote: “In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one trial for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials in which 70% of trials were identical to the training; 10% were incongruent pairs but the direction (O-L or L-O) was correct (incongruent canonical trials), thus testing whether the association was learned; 10% were congruent pairs but the direction within the pairs was reversed relative to the learned pairs (congruent reversed trials) and 10% were incongruent pairs in reverse (incongruent reversed trials).”(See lines 596-600.)

      Thus, each block comprised 4 initial trials, 28 canonical congruent trials, 4 canonical incongruent, 4 reverse congruent and 4 reverse incongruent trials, i.e. 4+28+3x4=44 trials.
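      The block composition quoted from the Methods can be verified with a short calculation. The sketch below only restates the numbers given in the quotation (4 initial trials, then 40 trials split 70/10/10/10); the variable names are ours.

```python
INITIAL_TRIALS = 4   # congruent canonical trials that open each block
MAIN_TRIALS = 40     # remaining trials in the block

# Percentages of the 40 main trials, as quoted from the Methods
percentages = {
    "canonical congruent": 70,
    "canonical incongruent": 10,
    "reversed congruent": 10,
    "reversed incongruent": 10,
}

# Integer arithmetic avoids floating-point rounding issues
counts = {name: MAIN_TRIALS * pct // 100 for name, pct in percentages.items()}
block_total = INITIAL_TRIALS + sum(counts.values())

print(counts)
print(block_total)
```

      The percentages apply to the 40 main trials rather than to the whole block, which gives whole numbers of trials (28, 4, 4, 4) and a block length of 44 trials, the figure used in the reviewer's question.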

      - How long is one trial? 

      As written in the method section: “In each trial, the first stimulus (label or object) was presented during 700ms, followed by an inter-stimulus-interval of 100ms then the second stimulus during 700ms. The pairs were separated by a variable inter-trial-interval of 3-5 seconds” i.e. 700+100+700=1500 ms, plus 3 to 4.75 seconds of blank between the trials (see lines 531-533).

      - How are the stimulus presentations jittered? 

      See: “The pairs were separated by a variable inter-trial-interval randomly chosen among eight different durations between 3 and 4.75 seconds (step=250 ms). The series of 8 intervals was randomized again each time it was completed.” (lines 533-535).
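      The trial timing and jitter scheme described in these two answers can be sketched as follows. This is a minimal illustration under the stated parameters (700 ms stimuli, 100 ms inter-stimulus interval, eight inter-trial intervals from 3 to 4.75 s in 250 ms steps, reshuffled after each complete series); the generator name and the use of Python's `random` module are our assumptions, not the authors' presentation code.

```python
import random
from itertools import islice

STIMULUS_MS = 700    # duration of each stimulus (from the Methods quotation)
ISI_MS = 100         # inter-stimulus interval (from the Methods quotation)
PAIR_MS = STIMULUS_MS + ISI_MS + STIMULUS_MS   # duration of one pair

# Eight inter-trial intervals: 3000, 3250, ..., 4750 ms
ITI_VALUES_MS = list(range(3000, 4751, 250))


def jittered_itis(seed=None):
    """Yield inter-trial intervals in ms (hypothetical helper).

    Each run of 8 intervals is a fresh random permutation of ITI_VALUES_MS,
    so every duration occurs exactly once before any duration repeats.
    """
    rng = random.Random(seed)
    while True:
        series = ITI_VALUES_MS[:]
        rng.shuffle(series)
        yield from series


# Two consecutive series of 8 intervals
first_two_series = list(islice(jittered_itis(seed=0), 16))
mean_iti_ms = sum(ITI_VALUES_MS) / len(ITI_VALUES_MS)

print(PAIR_MS, mean_iti_ms, PAIR_MS + mean_iti_ms)
print(first_two_series)
```

      Sampling without replacement within each series keeps the mean interval fixed at 3875 ms over every 8 trials, so under these assumptions the average onset-to-onset time is 5375 ms.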

      - What is the statistical power achieved for humans? And for monkeys? 

      We know of no standard way to define power for fMRI experiments. Power will depend on so many parameters, including the fMRI signal-to-noise ratio, the attention of the subject, the areas being considered, the type of analysis (whole-brain versus ROIs), etc.

      - Videos are mentioned in the methods, is it the image and sound? It is not clear. 

      We’re sorry that it was unclear. Videos were only used for the training of the human subjects. We now corrected this in the method section (lines 552-554).

      Reviewer #3 (Recommendations For The Authors): 

      The main recommendations are to adjust the framing (making it less bold and more connected to the empirical evidence) and to ensure independence in the statistical analyses of the fMRI data. 

      See our replies to the reviewer’s comments on “Framing” above. In particular, we changed the title of the paper from “Brain mechanisms of reversible symbolic reference” to “Brain areas for reversible symbolic reference”.

      References cited in this response

      Dehaene, S., Al Roumi, F., Lakretz, Y., Planton, S., & Sablé-Meyer, M. (2022). Symbols and mental programs : A hypothesis about human singularity. Trends in Cognitive Sciences, 26(9), 751‑766. https://doi.org/10.1016/j.tics.2022.06.010.

      Dehaene-Lambertz, Ghislaine, Stanislas Dehaene, and Lucie Hertz-Pannier. Functional Neuroimaging of Speech Perception in Infants. Science 298, no. 5600 (2002): 2013-15. https://doi.org/10.1126/science.1077066.

      Ekramnia M, Dehaene-Lambertz G. 2019. Investigating bidirectionality of associations in young infants as an approach to the symbolic system. Presented at the CogSci. p. 3449.

      Fedorenko E, Duncan J, Kanwisher N (2013) Broad domain generality in focal regions of frontal and parietal cortex. Proc Natl Acad Sci U S A 110:16616-16621.

      Kabdebon, Claire, and Ghislaine Dehaene-Lambertz. Symbolic Labeling in 5-Month-Old Human Infants. Proceedings of the National Academy of Sciences 116, no. 12 (2019): 5805-10. https://doi.org/10.1073/pnas.1809144116.

      Mitchell, D. J., Bell, A. H., Buckley, M. J., Mitchell, A. S., Sallet, J., & Duncan, J. (2016). A Putative Multiple-Demand System in the Macaque Brain. Journal of Neuroscience, 36(33), 8574‑8585. https://doi.org/10.1523/JNEUROSCI.0810-16.2016

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the editors for overseeing an efficient review process and for upholding the high standards of the journal. We have made extensive revisions to the manuscript after carefully reviewing the reviewers’ comments. We have addressed all the comments in our response and have incorporated the changes suggested by the reviewers to the best of our abilities. Notably, we have made the following major changes to the manuscript:

      (1) We have increased the patient cohort size from 10 to 23 for evaluating the levels of YEATS2 and H3K27cr.

      (2) To further strengthen the clinical relevance of our study, we have checked the expression of major genes involved in the YEATS2-mediated histone crotonylation axis (YEATS2, GCDH, ECHS1, Twist1 along with H3K27cr levels) in head and neck cancer tissues using immunohistochemistry.

      (3) We have performed extensive experiments to look into the role of p300 in assisting YEATS2 in regulating promoter histone crotonylation.

      The changes made to the manuscript figures have been highlighted in our response. We have also updated the Results section in accordance with the updated figures. Tables 1-4 and Supplementary files 1-3 have been moved to one single Excel workbook named ‘Supplementary Tables 1-8’. Additional revisions have been made to improve the overall quality of the manuscript and enhance data visualization. These additional changes are highlighted in the tracked changes version of the manuscript.

      Our response to the Public Reviews and ‘Recommendations to the Authors’ can be found below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates a mechanism between the histone reader protein YEATS2 and the metabolic enzyme GCDH, particularly in regulating epithelial-to-mesenchymal transition (EMT) in head and neck cancer (HNC).

      Strengths:

      Great detailing of the mechanistic aspect of the above axis is the primary strength of the manuscript.

      Weaknesses:

      Several critical points require clarification, including the rationale behind EMT marker selection, the inclusion of metastasis data, the role of key metabolic enzymes like ECHS1, and the molecular mechanisms governing p300 and YEATS2 interactions.

      We would like to sincerely thank the reviewer for the detailed, in-depth, and positive response. We have implemented constructive revisions to the manuscript to address the reviewer’s concerns effectively.

      Major Comments:

      (1) The title, "Interplay of YEATS2 and GCDH mediates histone crotonylation and drives EMT in head and neck cancer," appears somewhat misleading, as it implies that YEATS2 directly drives histone crotonylation. However, YEATS2 functions as a reader of histone crotonylation rather than a writer or mediator of this modification. It cannot itself mediate the addition of crotonyl groups onto histones. Instead, the enzyme GCDH is the one responsible for generating crotonyl-CoA, which enables histone crotonylation. Therefore, while YEATS2 plays a role in recognizing crotonylation marks and may regulate gene expression through this mechanism, it does not directly catalyse or promote the crotonylation process.

      We thank the reviewer for their insightful comment regarding the precision of our title. We agree that the initial wording 'mediates' could imply a direct enzymatic role for YEATS2 in histone crotonylation, which is indeed not the case. As the reviewer correctly points out, YEATS2 functions as a 'reader' of histone crotonylation marks.

      However, our research demonstrates that YEATS2 plays a crucial indirect regulatory role in the establishment of these crotonylation marks. Specifically, our data indicate that YEATS2 facilitates the recruitment of the histone crotonyltransferase p300 to specific gene promoters, such as that of SPARC. This recruitment mechanism directly impacts the localized deposition of crotonyl marks on nearby histone residues. Therefore, while YEATS2 does not directly catalyze the addition of crotonyl groups, its presence and interaction with p300 are essential for the regulation and establishment of histone crotonylation at these critical sites.

      To accurately reflect this nuanced, yet significant, regulatory mechanism, we have revised the title. We are replacing 'mediates' with 'regulates' to precisely convey that YEATS2 influences the histone crotonylation process, albeit indirectly, through its role in recruiting the enzymatic machinery. The updated title will now read: 'Interplay of YEATS2 and GCDH regulates histone crotonylation and drives EMT in head and neck cancer.' We believe this change maintains the core message of our findings while enhancing the scientific accuracy of the title.

      (2) The study suggests a link between YEATS2 and metastasis due to its role in EMT, but the lack of clinical or pre-clinical evidence of metastasis is concerning. Only primary tumor (PT) data is shown, but if the hypothesis is that YEATS2 promotes metastasis via EMT, then evidence from metastatic samples or in vivo models should be included to solidify this claim.

      We thank the reviewer for their valuable suggestion regarding the need for clinical or pre-clinical evidence of metastasis. We fully agree that direct evidence linking YEATS2 to metastasis would significantly strengthen our claims, especially given its demonstrated role in EMT.

      Our primary objective in this study was to meticulously dissect the molecular mechanisms by which YEATS2 regulates histone crotonylation and drives EMT in head and neck cancer. We have provided comprehensive upstream and downstream molecular insights into this process, culminating in a clear demonstration of YEATS2's functional importance in promoting EMT through multiple in vitro phenotypic assays (e.g., Matrigel invasion, wound healing, 3D invasion assays). As the reviewer notes, EMT is a widely recognized prerequisite for cancer metastasis[1]. Therefore, establishing YEATS2 as a driver of EMT directly implicates its potential role in metastatic progression.

      To further address the reviewer's concern and bridge the gap between EMT and metastasis, we have performed additional analyses that will be incorporated into the revised manuscript:

      Clinical Correlation with Tumor Grade: We analyzed publicly available head and neck cancer patient datasets. Our analysis revealed a significant positive correlation between YEATS2 expression and increasing tumor grade. Specifically, we observed significantly higher YEATS2 expression in Grade 2-4 tumors compared to Grade 1 tumors. Given that higher tumor grades are frequently associated with increased metastatic potential and poorer prognosis in HNC[2], this finding provides compelling clinical correlative evidence linking elevated YEATS2 expression to more aggressive disease.

      Gene Set Enrichment Analysis (GSEA) for Metastasis Pathways: To further explore the biological processes associated with YEATS2 in a clinical context, we performed GSEA on TCGA HNC patient samples stratified by high versus low YEATS2 expression. This analysis robustly demonstrated a positive enrichment of metastasis-related gene sets in the high YEATS2 expression group, compared to the low YEATS2 group. This strengthens the mechanistic link by showing that pathways associated with metastasis are co-ordinately upregulated when YEATS2 is highly expressed.

      These new clinical data provide strong correlative evidence supporting a direct association of YEATS2 with metastasis, building upon our detailed mechanistic dissection of its role in EMT.

      (3) There seems to be some discrepancy in the invasion data with BICR10 control cells (Figure 2C). BICR10 control cells with mock plasmids, specifically shControl and pEGFP-C3 show an unclear distinction between invasion capacities. Normally, we would expect the control cells to invade somewhat similarly, in terms of area covered, within the same time interval (24 hours here). But we clearly see more control cells invading when the invasion is done with KD and fewer control cells invading when the invasion is done with OE. Are these just plasmid-specific significant effects on normal cell invasion? This needs to be addressed.

      We thank the reviewer for their careful examination of Figure 2C and their insightful observation regarding the appearance of the control cells in relation to the knockdown (Figure 2B) and overexpression (Figure 2C) experiments. We understand how, at first glance, the control invasion levels across these panels might seem disparate.

      We wish to clarify that Figure 2B (YEATS2 knockdown) and Figure 2C (YEATS2 overexpression) represent two entirely independent experiments, conducted with distinct experimental conditions and methodologies, as detailed in our Methods section.

      Specifically:

      Figure 2B (Knockdown): Utilizes lentivirus-mediated transduction for stable shRNA delivery (shControl as control).

      Figure 2C (Overexpression): Utilizes transfection with plasmid DNA (pEGFP-C3 as control) via a standard transfection reagent.

      These fundamental differences in genetic manipulation methods (transduction vs. transfection), along with potential batch-to-batch variations in reagents or cell passage number at the time of each independent experiment, can indeed lead to variations in absolute basal invasion rates of control cells[3].

      Therefore, the invasion capacity of BICR10 control cells in Figure 2B (shControl) should only be compared to the YEATS2 knockdown conditions within that same panel. Similarly, the invasion capacity of control cells in Figure 2C (pEGFP-C3) should only be compared to the YEATS2 overexpression conditions within that specific panel. The crucial finding in each panel lies in the relative change in invasion caused by YEATS2 manipulation (knockdown or overexpression) compared to its respective, concurrently run control.

      We have ensured that all statistical analyses (as indicated in the figure legends and methods) were performed by comparing the experimental groups directly to their matched internal controls within each independent experiment. The significant increase in invasion upon YEATS2 overexpression and the significant decrease upon YEATS2 knockdown, relative to their respective controls, are robust and reproducible findings.
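
The within-experiment comparison described above can be illustrated with a short sketch. The numbers are hypothetical and the function name is ours, not part of the manuscript. The point is that each condition is expressed relative to the control run in the same experiment, so differences in absolute control invasion between the knockdown and overexpression panels do not enter the comparison.

```python
def relative_invasion(measurements, control):
    """Express each condition as fold change over the control from the same experiment."""
    baseline = measurements[control]
    if baseline <= 0:
        raise ValueError("control measurement must be positive")
    return {name: value / baseline for name, value in measurements.items()}


# Hypothetical invaded-area values (arbitrary units); the two controls differ
# in absolute terms because they come from independent experiments.
knockdown_exp = {"shControl": 200.0, "shYEATS2": 80.0}
overexpression_exp = {"pEGFP-C3": 100.0, "pEGFP-C3-YEATS2": 180.0}

kd = relative_invasion(knockdown_exp, "shControl")
oe = relative_invasion(overexpression_exp, "pEGFP-C3")
```

In this toy example the two controls differ twofold in absolute invaded area, yet each normalizes to 1.0 within its own experiment, which is the only comparison the panels are meant to support.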

      (4) In Figure 3G, the Western blot shows an unclear band for YEATS2 in shSP1 cells with YEATS2 overexpression condition. The authors need to clearly identify which band corresponds to YEATS2 in this case.

      We thank the reviewer for pointing out the ambiguity in the YEATS2 Western blot for the shSP1 + pEGFP-C3-YEATS2 condition in Figure 3G. We apologize for this lack of clarity. The two bands seen in the shSP1+pEGFP-C3-YEATS2 condition correspond to the endogenous YEATS2 band (lower band) and YEATS2-GFP band (upper band, corresponding to overexpressed YEATS2-GFP fusion protein, which has a higher molecular weight). To avoid confusion, the endogenous band is now highlighted (marked by *) in the lane representing the shSP1+pEGFP-C3-YEATS2 condition. We have also updated the figure legend accordingly.

      (5) In ChIP assays with SP1, YEATS2 and p300 which promoter regions were selected for the respective genes? Please provide data for all the different promoter regions that must have been analysed, highlighting the region where enrichment/depletion was observed. Including data from negative control regions would improve the validity of the results.

      Throughout our study, we performed ChIP-qPCR assays to check SP1 binding on the YEATS2 and GCDH promoters, and YEATS2 and p300 binding on the SPARC promoter. Using transcription factor binding prediction tools and luciferase assays, we selected multiple sites on the YEATS2 and GCDH promoters to test for SP1 binding. The results for the site that showed significant enrichment were provided in the manuscript. The SPARC promoter region used in the YEATS2 and p300 ChIP assays was selected on the basis of the YEATS2 enrichment found in the YEATS2 ChIP-seq data. The ChIP-qPCR data for all the promoter regions investigated (including negative controls) can be found below (Authors’ response image 1).

      Authors’ response image 1.

      (A) SP1 ChIP-qPCR results indicating SP1 occupancy on different regions of YEATS2 promoter. YEATS2 promoter region showing SP1 binding sites (indicated by red boxes) is shown above. SP1 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 3D. (B) SP1 ChIP-qPCR results indicating SP1 occupancy on different regions of GCDH promoter. GCDH promoter region showing SP1 binding sites (indicated by red boxes) is shown above. SP1 showed significant enrichment at F2R2 region. The results corresponding to F2R2 region were included in Figure 7E. (C) YEATS2 ChIP-qPCR results in shControl vs. shYEATS2 BICR10 cells indicating YEATS2 occupancy on different regions of SPARC promoter. SPARC promoter region showing YEATS2 ChIP-seq and H3K27cr ChIP-seq signals is shown above. YEATS2 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 5C. (D) p300 ChIP-qPCR results in shControl vs. shYEATS2 BICR10 cells indicating p300 occupancy on different regions of SPARC promoter. p300 showed significant enrichment at F1R1 region. The results corresponding to F1R1 region were included in Figure 5F.
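
For readers unfamiliar with how ChIP-qPCR occupancy at a promoter region is commonly quantified, the sketch below shows the widely used percent-input calculation. It is an assumption that a calculation of this kind applies here: the response does not state which quantification method was used, and the Ct values and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input for one primer pair.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample as measured
    input_fraction -- fraction of chromatin saved as input (0.01 for 1%)
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Adjust the input Ct to what 100% of the chromatin would have given.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(pct_antibody, pct_igg):
    """Enrichment of the specific antibody relative to an IgG control."""
    return pct_antibody / pct_igg


# Hypothetical Ct values for one primer pair with a 1% input.
specific = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
background = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
enrichment = fold_over_igg(specific, background)
```

A negative control region would be expected to give a fold-over-IgG value close to 1 under this scheme, which is why such regions help validate enrichment at the reported site.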

      (6) The authors establish a link between H3K27Cr marks and GCDH expression, and this is an already well-known pathway. A critical missing piece is the level of ECHS1 in patient samples. This will clearly delineate if the balance shifted towards crotonylation.

      We greatly appreciate the reviewer's insightful comment regarding the importance of assessing ECHS1 levels in patient samples to clearly delineate the metabolic balance shifting towards crotonylation. We fully agree that this is a critical piece of evidence.

      To directly address this point and substantiate our claim regarding the altered metabolic balance in HNC, we had previously analyzed the expression of both GCDH and ECHS1 in TCGA HNC RNA-seq data (as presented in Figure 4—figure supplement 1A and B). This analysis revealed a consistent increase in GCDH expression and a concomitant decrease in ECHS1 expression in tumor samples compared to normal tissues. Based on these findings, we hypothesized that this altered expression profile would indeed lead to an accumulation of crotonyl-CoA and, consequently, an overall increase in histone crotonylation in HNC.

      To further validate and extend these findings at the protein level, we have now performed immunohistochemistry (IHC) analysis for both ECHS1 and GCDH in a cohort of HNC normal vs. tumor tissues. Our IHC results strikingly corroborate the RNA-seq data: GCDH consistently showed increased protein expression in tumor samples, whereas ECHS1 exhibited significantly reduced protein expression in tumors compared to their adjacent normal counterpart tissues (Figure 4E and Authors’ response figure 5).

      These new data, combined with the existing TCGA HNC RNA-seq analysis, strongly support our proposed mechanism in which altered GCDH and ECHS1 expression contributes to increased histone crotonylation in head and neck cancer.

      (7) The p300 ChIP data on the SPARC promoter is confusing. The authors report reduced p300 occupancy in YEATS2-silenced cells, on SPARC promoter. However, this is paradoxical, as p300 is a writer, a histone acetyltransferase (HAT). The absence of a reader (YEATS2) shouldn't affect the writer (p300) unless a complex relationship between p300 and YEATS2 is present. The role of p300 should be further clarified in this case. Additionally, transcriptional regulation of SPARC expression in YEATS2 silenced cells could be analysed via downstream events, like Pol-II recruitment. Assays such as Pol-II ChIP-qPCR could help explain this.

      We greatly appreciate the reviewer's insightful observation regarding the apparently paradoxical reduction of p300 occupancy on the SPARC promoter upon YEATS2 silencing (Figure 5F), and their call for further clarification of p300's role and the potential complex relationship with YEATS2. We agree that this point required further mechanistic investigation.

      As we have shown through RNA-seq and ChIP-seq analyses, YEATS2 broadly influences histone crotonylation levels at gene promoters, thereby impacting gene expression. While p300 is indeed a known histone acetyltransferase (HAT) with promiscuous acyltransferase activity, including crotonyltransferase activity[4], the precise mechanism by which its occupancy is affected by a 'reader' protein like YEATS2 was unclear. Our initial data suggested a dependency of p300 recruitment on YEATS2.

      To directly address the reviewer's concern and thoroughly delineate the molecular mechanism of cooperativity between YEATS2 and p300 in regulating histone crotonylation, we have now performed a series of targeted experiments, which have been incorporated into the revised manuscript:

      (a) Validation of p300's role in SPARC expression: We performed p300 knockdown in BICR10 cells, followed by immunoblotting to assess SPARC protein levels. As expected, a significant decrease in SPARC protein levels was observed upon p300 knockdown (Figure 5G). This confirms p300's direct involvement in SPARC gene expression.

      (b) Direct interaction between YEATS2 and p300: To investigate a potential physical association, we performed co-immunoprecipitation assays to check for an interaction between endogenous YEATS2 and p300. Our results clearly demonstrate the presence of YEATS2 in the p300-immunoprecipitate sample, indicating that YEATS2 and p300 physically interact and likely function together as a complex to drive the expression of target genes like SPARC (Figure 5H). This direct interaction provides the mechanistic basis for how YEATS2 influences p300 occupancy.

      (c) Impact on transcriptional activity (Pol II recruitment): As suggested, we performed RNA Polymerase II (Pol II) ChIP-qPCR on the SPARC promoter in YEATS2 knockdown cells. We observed a significant decrease in Pol II occupancy on the SPARC promoter after YEATS2 knockdown in BICR10 cells (Figure 6C). This confirms that YEATS2 silencing leads to reduced transcriptional initiation/elongation at this promoter.

      (d) p300's direct role in H3K27cr on SPARC promoter: To confirm p300's specific role in crotonylation at this locus, we performed H3K27cr ChIP-qPCR after p300 knockdown. As anticipated, a significant decrease in H3K27cr enrichment was observed on the SPARC promoter upon p300 knockdown (Figure 6J), directly demonstrating p300's crotonyltransferase activity at this site.

      (e) Rescue of p300 occupancy and H3K27cr by YEATS2 overexpression in SP1-deficient cells: To further establish the YEATS2-p300 axis, we performed SP1 knockdown (which reduces YEATS2 expression) followed by ectopic YEATS2 overexpression, and then assessed p300 occupancy and H3K27cr levels on the SPARC promoter. While SP1 knockdown led to a decrease in both p300 and H3K27cr enrichment, we observed a significant rescue of both p300 occupancy and H3K27cr enrichment upon YEATS2 overexpression in the shSP1 cells (Figure 6E and F). This provides strong evidence that YEATS2 acts downstream of SP1 to regulate p300 recruitment and H3K27cr levels.

      Collectively, these comprehensive new results clearly establish that YEATS2 directly interacts with and assists in the recruitment of p300 to the SPARC promoter. This recruitment is crucial for p300's localized crotonyltransferase activity, leading to increased H3K27cr marks and subsequent activation of SPARC transcription. This clarifies the previously observed 'paradox' and defines a novel cooperative mechanism between a histone reader (YEATS2) and a writer (p300) in regulating histone crotonylation and gene expression.

      (8) The role of GCDH in producing crotonyl-CoA is already well-established in the literature. The authors' hypothesis that GCDH is essential for crotonyl-CoA production has been proven, and it's unclear why this is presented as a novel finding. It has been shown that YEATS2 KD leads to reduced H3K27cr, however, it remains unclear how the reader is affecting crotonylation levels. Are GCDH levels also reduced in the YEATS2 KD condition? Are YEATS2 levels regulating GCDH expression? One possible mechanism is YEATS2 occupancy on GCDH promoter and therefore reduced GCDH levels upon YEATS2 KD. This aspect is crucial to the study's proposed mechanism but is not addressed thoroughly.

      We appreciate the reviewer's valuable comment questioning the novelty of GCDH's role in crotonyl-CoA production and seeking further clarification on how YEATS2 influences crotonylation levels beyond its reader function.

      We agree that GCDH's general role in producing crotonyl-CoA is well-established[5,6]. Our study, however, aims to delineate a novel epigenetic-metabolic crosstalk in head and neck cancer, specifically investigating how the interplay between the histone crotonylation reader YEATS2 and the metabolic enzyme GCDH contributes to increased histone crotonylation and drives EMT in this context.

      Our initial investigations using GSEA on publicly available TCGA RNA-seq data revealed that HNC patients with high YEATS2 expression also exhibit elevated expression of genes involved in the lysine degradation pathway, prominently including GCDH. Recognizing the known roles of YEATS2 in preferentially binding H3K27cr[7] and GCDH in producing crotonyl-CoA, we hypothesized that the elevated H3K27cr levels observed in HNC are a consequence of the combined action of both YEATS2 and GCDH. We have provided evidence that increased nuclear GCDH correlates with higher H3K27cr abundance, likely due to an increased nuclear pool of crotonyl-CoA, and that YEATS2 contributes through its preferential maintenance of crotonylation marks by recruiting p300 (as detailed in Figure 5F-H and Figure 6J-L of the manuscript and elaborated in our response to point 7). Thus, our work highlights that both YEATS2 and GCDH are crucial for the regulation of histone crotonylation-mediated gene expression in HNC.

      To directly address the reviewer's query regarding YEATS2's influence on GCDH levels and nuclear histone crotonylation:

      • YEATS2 does not transcriptionally regulate GCDH: We did not find any evidence of YEATS2 directly regulating the expression levels of GCDH at the transcriptional level in HNC cells.

      • Novel finding: YEATS2 regulates GCDH nuclear localization: Crucially, we discovered that YEATS2 downregulation significantly reduces the nuclear pool of GCDH in head and neck cancer cells (Figure 7G). This is a novel mechanism suggesting that YEATS2 influences histone crotonylation not only by affecting promoter H3K27cr levels via p300 recruitment, but also by regulating the availability of the crotonyl-CoA producing enzyme, GCDH, within the nucleus.

      • Common upstream regulation by SP1: Interestingly, we found that both YEATS2 and GCDH expression are commonly regulated by the transcription factor SP1 in HNC. Our data demonstrate that SP1 binds to the promoters of both genes, and its downregulation leads to a decrease in their respective expressions (Figure 3 and Figure 7). This provides an important upstream regulatory link between these two key players.

      • Functional validation of GCDH in EMT: We further assessed the functional importance of GCDH in maintaining the EMT phenotype in HNC cells. Matrigel invasion assays after GCDH knockdown and overexpression in BICR10 cells revealed that the invasiveness of HNC cells was significantly reduced upon GCDH knockdown and significantly increased upon GCDH overexpression (results provided in revised manuscript Figure 7F and Figure 7—figure supplement 1F).

      These findings collectively demonstrate a multifaceted role for YEATS2 in regulating histone crotonylation by both direct recruitment of the writer p300 and by influencing the nuclear availability of the crotonyl-CoA producing enzyme GCDH. We acknowledge that the precise molecular mechanism governing YEATS2's effect on GCDH nuclear localization remains an exciting open question for future investigation, but our current data establishes a novel regulatory axis.

      (9) The authors should provide IHC analysis of YEATS2, SPARC alongside H3K27cr and GCDH staining in normal vs. tumor tissues from HNC patients.

      We thank the reviewer for their suggestion. We have performed IHC analysis for YEATS2, H3K27cr and GCDH in normal and tumor samples obtained from HNC patients.

      Reviewer #2 (Public review):

      Summary:

      The manuscript emphasises the increased invasive potential of histone reader YEATS2 in an SP1-dependent manner. They report that YEATS2 maintains high H3K27cr levels at the promoter of EMT-promoting gene SPARC. These findings assigned a novel functional implication of histone acylation, crotonylation.

      We thank the reviewer for the constructive comments. We are committed to making beneficial changes to the manuscript in order to alleviate the reviewer’s concerns.

      Concerns:

      (1) The patient cohort is very small with just 10 patients. To establish a significant result the cohort size should be increased.

      We thank the reviewer for this suggestion. We have increased the number of patient samples to assess the levels of YEATS2 (n=23 samples) and the results have been included in Figure 1G and Figure 1—figure supplement 1F.

      (2) Figure 4D compares H3K27Cr levels in tumor and normal tissue samples. Figure 1G shows overexpression of YEATS2 in a tumor as compared to normal samples. The loading control is missing in both. Loading control is essential to eliminate any disparity in protein concentration that is loaded.

      To address the reviewer’s concern, we have repeated the experiment and used H3 as a loading control as nuclear protein lysates from patient samples were used to check YEATS2 and H3K27cr levels.

      (3) Figure 4D only mentions 5 patient samples checked for the increased levels of crotonylation and hence forms the basis of their hypothesis (increased crotonylation in a tumor as compared to normal). The sample size should be more and patient details should be mentioned.

      As part of the revision, we have now checked the H3K27cr levels in a total of 23 patient samples and the results have been included in Figure 4D and Figure 4—figure supplement 1D. Patient details are provided in Supplementary Table 6.

      (4) YEATS2 maintains H3K27Cr levels at the SPARC promoter. The p300 is reported to be hyper-activated (hyperautoacetylated) in oral cancer. Probably, the activated p300 causes hyper-crotonylation, and other protein factors cause the functional translation of this modification. The authors need to clarify this with a suitable experiment.

      We thank the reviewer for this insightful comment regarding the functional relationship between YEATS2 and p300 in the context of H3K27cr, especially considering reports of p300 hyper-activation in oral cancer. We agree that a precise clarification of p300's role and its cooperativity with YEATS2 is crucial to fully understand the functional translation of this modification.

      As we have shown through global RNA-seq and ChIP-seq analyses, YEATS2 broadly affects gene expression by regulating histone crotonylation levels at gene promoters. We also recognize that the histone writer p300 is a promiscuous acyltransferase, known to add various non-acetyl marks, including crotonylation[4]. Our initial data, showing decreased p300 occupancy on the SPARC promoter upon YEATS2 downregulation (Figure 5F), suggested a strong dependency of p300 on YEATS2 for its recruitment. To fully delineate the molecular mechanism of this cooperativity and clarify how YEATS2 influences p300-mediated histone crotonylation and its functional outcomes, we have performed the following series of experiments, which have been integrated into the revised manuscript:

      (a) Validation of p300's role in SPARC expression: We performed p300 knockdown in BICR10 cells, followed by immunoblotting to assess SPARC protein levels. As expected, a significant decrease in SPARC protein levels was observed upon p300 knockdown (Figure 5G). This confirms p300's direct involvement in SPARC gene expression.

      (b) Direct interaction between YEATS2 and p300: To investigate a potential physical association, we performed co-immunoprecipitation assays to check for an interaction between endogenous YEATS2 and p300. Our results clearly demonstrate the presence of YEATS2 in the p300-immunoprecipitate sample, indicating that YEATS2 and p300 physically interact and likely function together as a complex to drive the expression of target genes like SPARC (Figure 5H). This direct interaction provides the mechanistic basis for how YEATS2 influences p300 occupancy.

      (c) Impact on transcriptional activity (Pol II recruitment): As suggested, we performed RNA Polymerase II (Pol II) ChIP-qPCR on the SPARC promoter in YEATS2 knockdown cells. We observed a significant decrease in Pol II occupancy on the SPARC promoter after YEATS2 knockdown in BICR10 cells (Figure 6C). This confirms that YEATS2 silencing leads to reduced transcriptional initiation/elongation at this promoter.

      (d) p300's direct role in H3K27cr on SPARC promoter: To confirm p300's specific role in crotonylation at this locus, we performed H3K27cr ChIP-qPCR after p300 knockdown. As anticipated, a significant decrease in H3K27cr enrichment was observed on the SPARC promoter upon p300 knockdown (Figure 6J), directly demonstrating p300's crotonyltransferase activity at this site.

      (e) Rescue of p300 occupancy and H3K27cr by YEATS2 overexpression in SP1-deficient cells: To further establish the YEATS2-p300 axis, we performed SP1 knockdown (which reduces YEATS2 expression) followed by ectopic YEATS2 overexpression, and then assessed p300 occupancy and H3K27cr levels on the SPARC promoter. While SP1 knockdown led to a decrease in both p300 and H3K27cr enrichment, we observed a significant rescue of both p300 occupancy and H3K27cr enrichment upon YEATS2 overexpression in the shSP1 cells (Figure 6K and L). This provides strong evidence that YEATS2 acts downstream of SP1 to regulate p300 recruitment and H3K27cr levels.

      Collectively, these comprehensive new results clearly establish that YEATS2 directly interacts with and assists in the recruitment of p300 to the SPARC promoter. This recruitment is crucial for p300's localized crotonyltransferase activity, leading to increased H3K27cr marks and subsequent activation of SPARC transcription. This clarifies the previously observed 'paradox' and defines a novel cooperative mechanism between a histone reader (YEATS2) and a writer (p300) in regulating histone crotonylation and gene expression.

      (5) I do not entirely agree with using GAPDH as a control in the western blot experiment since GAPDH has been reported to be overexpressed in oral cancer.

      We would like to clarify that GAPDH was not used as a loading control for protein expression comparisons between normal and tumor samples. GAPDH was used as a loading control only in experiments using head and neck cancer cell lines where shRNA-mediated knockdown or overexpression was employed. These manipulations specifically target the genes of interest and are not expected to alter GAPDH expression, making it a suitable loading control in these instances.

      (6) The expression of EMT markers has been checked in shControl and shYEATS2 transfected cell lines (Figure 2A). However, their expression should first be checked directly in the patients' normal vs. tumor samples.

      We thank the reviewer for the suggestion. We have now checked the expression of EMT marker Twist1 alongside YEATS2 expression in normal vs. tumor tissue samples using IHC (Figure 4E).

      (7) In Figure 3G, knockdown of SP1 led to the reduced expression of YEATS2 controlled gene Twist1. Ectopic expression of YEATS2 was able to rescue Twist1 partially. In order to establish that SP1 directly regulates YEATS2, SP1 should also be re-introduced upon the knockdown background along with YEATS2 for complete rescue of Twist1 expression.

      To address the reviewer’s concern regarding the partial rescue of Twist1 in SP1-depleted, YEATS2-overexpressing cells, we performed the experiment as suggested by the reviewer. We overexpressed both SP1 and YEATS2 in SP1-depleted cells and found that Twist1 expression was almost completely rescued.

      Authors’ response image 2.

      Immunoblot depicting the decreased Twist1 levels on SP1 knockdown and its subsequent rescue of expression upon YEATS2 and SP1 overexpression in BICR10 (endogenous YEATS2 band indicated by *).

      (8) In Figure 7G, the expression of EMT genes should also be checked upon rescue of SPARC expression.

      We thank the reviewer for the suggestion. We have examined the expression of the EMT marker Twist1 upon YEATS2/GCDH rescue. On overexpressing both YEATS2 and GCDH in shSP1 cells, we found that the depleted expression of Twist1 was rescued.

      Authors’ response image 3.

      Immunoblot depicting the decreased Twist1 levels on SP1 knockdown and its subsequent rescue of expression upon dual overexpression of YEATS2 and GCDH in BICR10 (* indicates GFP-tagged YEATS2 probed using GFP antibody).

      Reviewer #1 (Recommendations for the authors):

      While the study offers insights into the specific role of this axis in regulating epithelial-to-mesenchymal transition (EMT) in HNC, its broader mechanistic novelty is limited by prior discoveries in other cancer types (https://doi.org/10.1038/s41586-023-06061-0). The manuscript would benefit from the inclusion of metastasis data, the role of key metabolic enzymes like ECHS1, the molecular mechanisms governing p300 and YEATS2 interactions, additional IHC data, negative control data in ChIP, and an explanation of discrepancies in certain figures.

      We thank the reviewer for their constructive suggestions. We have made extensive revisions to our manuscript to substantiate our findings. We have looked into the expression of ECHS1/GCDH in HNC tumor tissues using IHC, performed extensive experiments to validate the role of p300 in YEATS2-mediated histone crotonylation, and provided additional data supporting our findings wherever required. The revised figures have been provided in the updated version of the manuscript and also in the Authors’ response.

      Minor Comments:

      (1) The study begins with a few EMT markers, such as Vimentin, Twist, and N-Cadherin to validate the role of YEATS2 in promoting EMT. Including a broader panel of EMT markers would strengthen the conclusions about the effects of YEATS2 on EMT and invasion. Additionally, the rationale for selecting these EMT markers is not fully elaborated. Why were other well-known EMT players not included in the analysis?

      On performing RNA-seq with shControl and shYEATS2 samples, we found that TWIST1 expression decreased on YEATS2 downregulation. Twist1 was therefore investigated as a potential target of YEATS2 in HNC cells. N-Cadherin was chosen because it is known to be upregulated directly by Twist1[8]. Further, Vimentin was chosen as it is a well-known marker of the mesenchymal phenotype and is frequently used to indicate EMT in cancer cells[9].

      Authors’ response image 4.

      IGV plot showing the decrease in Twist1 expression in shControl vs. shYEATS2 RNA-seq data.

      Other than the EMT-markers used in our study, the following markers were amongst those that showed significant change in gene expression on YEATS2 downregulation.

      Authors’ response table 1.

      List of EMT-related genes that showed significant change in expression on YEATS2 knockdown in RNA-seq analysis.

      As depicted in the table above, the majority of the genes that showed downregulation on YEATS2 knockdown were mesenchymal markers, while epithelial-specific genes such as E-cadherin and Claudin-1 showed upregulation. These data indicate the essential role of YEATS2 in driving EMT in head and neck cancer.

      (2) The authors use Ponceau staining, but the rationale behind this choice is unclear. Ponceau is typically used for transfer validation. For the same patient, western blot loading controls like Actin/GAPDH should be shown. Also, at various places throughout the manuscript, Ponceau staining has been used. These should also be replaced with Actin/GAPDH blots.

      Ponceau S staining is frequently used as an alternative to housekeeping proteins like GAPDH as a control for protein loading[10]. However, to address this issue, we have repeated the western blot and used H3 as a loading control, as nuclear protein lysates from patient samples were used to check YEATS2 and H3K27cr levels.

      For experiments (in Figures 5E, 6F, 6I, and 7H) where we assessed SPARC levels in conditioned media obtained from BICR10 cells (secretory fraction), Ponceau S staining was deliberately used as the loading control. In such extracellular protein analyses, traditional intracellular housekeeping genes (like Actin or GAPDH) are not applicable. Ponceau S has been used as a control for showing SPARC expression in secretory fraction of mammalian cell lines in previous studies as well[11].

      (3) The manuscript briefly mentions that p300 was identified as the only protein with increased expression in tumours compared to normal tissue in the TCGA dataset. What other writers were checked for? Did the authors check for their levels in HNC patients?

      We thank the reviewer for this observation. As stated by previous studies[12,13], p300 and GCN5 are the histone writers that can act as crotonyltransferases at the H3K27 position. Although the crotonyltransferase activity of GCN5 has been demonstrated in yeast, it has not been confirmed in humans, whereas the histone crotonyltransferase activity of p300 has been validated in human cells using in vitro HCT assays[4,14]. Therefore, we chose to focus on p300 for further validation of its role in YEATS2-mediated regulation of histone crotonylation. We did not check the levels of p300 in HNC patient tissues. However, p300 showed higher expression in tumor as compared to normal in publicly available HNC TCGA RNA-seq data (Figure 5—figure supplement 1G).

      We acknowledge that the original statement in the manuscript, 'For this we looked at expression of the known writers of H3K27Cr mark in TCGA dataset, and discovered that p300 was the only protein that had increased expression in tumor vs. normal HNC dataset…', was indeed slightly misleading. Our intention was to convey that p300 is considered the major and most validated histone crotonyltransferase capable of influencing crotonylation at the H3K27 position in humans, and that its expression was notably increased in the HNC TCGA tumor dataset. We have now reframed this sentence in the revised manuscript to accurately reflect our findings and focus, as follows:

      'For this, we checked the expression of p300, a known writer of H3K27cr mark in humans, in the TCGA dataset. We found that p300 had increased expression in tumor vs. normal HNC dataset…'

      This revised wording more accurately reflects our specific focus on p300's established role and its observed upregulation in HNC.

      (4) Figure 6E, blot should be replaced. The results aren't clearly visible.

      We thank the reviewer for this observation. We have repeated the western blot, and Figure 6E (Figure 6F in the revised version of the manuscript) has now been replaced with a cleaner blot.

      (5) Reference 9 and 19 are the same. Please rectify.

      We apologize for this inadvertent error. We have rectified this error in the updated version of the manuscript.

      References

      (1) Brabletz, T.; Kalluri, R.; Nieto, M. A.; Weinberg, R. A. EMT in Cancer. Nat Rev Cancer 2018, 18(2), 128–134. https://doi.org/10.1038/nrc.2017.118.

      (2) Pisani, P.; Airoldi, M.; Allais, A.; Aluffi Valletti, P.; Battista, M.; Benazzo, M.; Briatore, R.; Cacciola, S.; Cocuzza, S.; Colombo, A.; Conti, B.; Costanzo, A.; Della Vecchia, L.; Denaro, N.; Fantozzi, C.; Galizia, D.; Garzaro, M.; Genta, I.; Iasi, G. A.; Krengli, M.; Landolfo, V.; Lanza, G. V.; Magnano, M.; Mancuso, M.; Maroldi, R.; Masini, L.; Merlano, M. C.; Piemonte, M.; Pisani, S.; Prina-Mello, A.; Prioglio, L.; Rugiu, M. G.; Scasso, F.; Serra, A.; Valente, G.; Zannetti, M.; Zigliani, A. Metastatic Disease in Head & Neck Oncology. Acta Otorhinolaryngol Ital 2020, 40 (SUPPL. 1), S1–S86. https://doi.org/10.14639/0392-100X-suppl.1-40-2020.

      (3) Lin, J.; Zhang, P.; Liu, W.; Liu, G.; Zhang, J.; Yan, M.; Duan, Y.; Yang, N. A Positive Feedback Loop between ZEB2 and ACSL4 Regulates Lipid Metabolism to Promote Breast Cancer Metastasis. Elife 2023, 12, RP87510. https://doi.org/10.7554/eLife.87510.

      (4) Liu, X.; Wei, W.; Liu, Y.; Yang, X.; Wu, J.; Zhang, Y.; Zhang, Q.; Shi, T.; Du, J. X.; Zhao, Y.; Lei, M.; Zhou, J.-Q.; Li, J.; Wong, J. MOF as an Evolutionarily Conserved Histone Crotonyltransferase and Transcriptional Activation by Histone Acetyltransferase-Deficient and Crotonyltransferase-Competent CBP/P300. Cell Discov 2017, 3 (1), 17016. https://doi.org/10.1038/celldisc.2017.16.

      (5) Jiang, G.; Li, C.; Lu, M.; Lu, K.; Li, H. Protein Lysine Crotonylation: Past, Present, Perspective. Cell Death Dis 2021, 12 (7), 703. https://doi.org/10.1038/s41419-021-03987-z.

      (6) Yuan, H.; Wu, X.; Wu, Q.; Chatoff, A.; Megill, E.; Gao, J.; Huang, T.; Duan, T.; Yang, K.; Jin, C.; Yuan, F.; Wang, S.; Zhao, L.; Zinn, P. O.; Abdullah, K. G.; Zhao, Y.; Snyder, N. W.; Rich, J. N. Lysine Catabolism Reprograms Tumour Immunity through Histone Crotonylation. Nature 2023, 617 (7962), 818–826. https://doi.org/10.1038/s41586-023-06061-0.

      (7) Zhao, D.; Guan, H.; Zhao, S.; Mi, W.; Wen, H.; Li, Y.; Zhao, Y.; Allis, C. D.; Shi, X.; Li, H. YEATS2 Is a Selective Histone Crotonylation Reader. Cell Res 2016, 26 (5), 629–632. https://doi.org/10.1038/cr.2016.49.

      (8) Alexander, N. R.; Tran, N. L.; Rekapally, H.; Summers, C. E.; Glackin, C.; Heimark, R. L. N-Cadherin Gene Expression in Prostate Carcinoma Is Modulated by Integrin-Dependent Nuclear Translocation of Twist1. Cancer Res 2006, 66 (7), 3365–3369. https://doi.org/10.1158/0008-5472.CAN-05-3401.

      (9) Satelli, A.; Li, S. Vimentin in Cancer and Its Potential as a Molecular Target for Cancer Therapy. Cellular and Molecular Life Sciences 2011, 68 (18), 3033–3046. https://doi.org/10.1007/s00018-011-0735-1.

      (10) Romero-Calvo, I.; Ocón, B.; Martínez-Moya, P.; Suárez, M. D.; Zarzuelo, A.; Martínez-Augustin, O.; de Medina, F. S. Reversible Ponceau Staining as a Loading Control Alternative to Actin in Western Blots. Anal Biochem 2010, 401 (2), 318–320. https://doi.org/10.1016/j.ab.2010.02.036.

      (11) Ling, H.; Li, Y.; Peng, C.; Yang, S.; Seto, E. HDAC10 Inhibition Represses Melanoma Cell Growth and BRAF Inhibitor Resistance via Upregulating SPARC Expression. NAR Cancer 2024, 6 (2), zcae018. https://doi.org/10.1093/narcan/zcae018.

      (12) Gao, D.; Li, C.; Liu, S.-Y.; Xu, T.-T.; Lin, X.-T.; Tan, Y.-P.; Gao, F.-M.; Yi, L.-T.; Zhang, J. V; Ma, J.Y.; Meng, T.-G.; Yeung, W. S. B.; Liu, K.; Ou, X.-H.; Su, R.-B.; Sun, Q.-Y. P300 Regulates Histone Crotonylation and Preimplantation Embryo Development. Nat Commun 2024, 15 (1), 6418. https://doi.org/10.1038/s41467-024-50731-0.

      (13) Li, K.; Wang, Z. Histone Crotonylation-Centric Gene Regulation. Epigenetics Chromatin 2021, 14 (1), 10. https://doi.org/10.1186/s13072-021-00385-9.

      (14) Sabari, B. R.; Tang, Z.; Huang, H.; Yong-Gonzalez, V.; Molina, H.; Kong, H. E.; Dai, L.; Shimada, M.; Cross, J. R.; Zhao, Y.; Roeder, R. G.; Allis, C. D. Intracellular Crotonyl-CoA Stimulates Transcription through P300-Catalyzed Histone Crotonylation. Mol Cell 2015, 58 (2), 203–215. https://doi.org/10.1016/j.molcel.2015.02.029.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Response to the reviewers

      Reviewer 1:

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Initial reply: Thank you for this point. Our models were selected a priori, following the modelling strategy of Jepma et al. (2018); we therefore considered the same set of core models, so that the analysis extends clearly to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted vs non-expectation-weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.

      Revised reply: We clarified our modelling choices in the ”Modelling strategy” subsection of the results section.

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      Initial reply: We are sorry that the reviewer finds “the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a “more granular analysis of volatility and stochasticity effects” to “reveal fine-grained model fit patterns”. Therefore, we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      •    Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      •    Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      •    Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      •    Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, ”How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their *subjective* feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, ”the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Initial reply: Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query ”participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence refers to the level of confidence that participants have regarding their rating - how sure they are. This is done in addition to their perceived stimulus intensity and it has been used in a large body of previous studies in any sensory modality.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      Initial reply: The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Initial reply: Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Revised reply: We restructured the introduction, results and parts of the methods. We followed the reviewer’s suggestion regarding enhancing clarity through graphical diagrams. We have visualised the experimental design in Figure 1D. Furthermore, we have visualised the two main computational models (eRL and eKF) in Figure 2, following Jepma et al. (2018). As a result, we have updated the notation in Section 4.4 to be clearer and consistent with the graphical representation (renaming the variable referring to the observed thermal input from Ot to Nt).

      Reviewer Comment 1.6 — In lines 99-100, the statement ”following the work by [23]” would be more helpful if it included a concise summary of the main concepts from the referenced work.

      - It would be helpful to have descriptions of the conditions that Figure 1C is elaborating on.

      - In line 364, the “N_{t}” in the sentence “The observation on trial t, N_{t}”, should be O_{t}.

      Initial reply: Thank you for spotting these and for providing the suggestions. We will include the correction in the revised version.

      Revised reply: We have added the following regarding the lines 99-100:

      ”We build on the work by [23], who show that pain perception is strongly influenced by expectations as defined by a cue that predicts high or low pain. In contrast to the cue-paradigm from [23], the primary aim of our experiment was to determine whether the expectations participants hold about the sequence itself inform their perceptual beliefs about the intensity of the stimuli.”

      See comment in the previous reply, regarding the notation change from Ot to Nt.

      Reviewer 2:

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Initial reply: Thank you very much for these positive comments.

      Reviewer 3:

      Summary:

      I am pleased to have had the opportunity to review this manuscript, which investigated the role of statistical learning in the modulation of pain perception. In short, the study showed that statistical aspects of temperature sequences, with respect to specific manipulations of stochasticity (i.e., randomness of a sequence) and volatility (i.e., speed at which a sequence unfolded) influenced pain perception. Computational modelling of perceptual variables (i.e., multi-dimensional ratings of perceived or predicted stimuli) indicated that models of perception weighted by expectations were the best explanation for the data. My comments below are not intended to undermine or question the quality of this research. Rather, they are offered with the intention of enhancing what is already a significant contribution to the pain neuroscience field. Below, I highlight the strengths and weaknesses of the manuscript and offer suggestions for incorporating additional methodological details.

      Strengths:

      - The manuscript is articulate, coherent, and skilfully written, making it accessible and engaging.

      - The innovative stimulation paradigm enables the exploration of expectancy effects on perception without depending on external cues, lending a unique angle to the research.

      - By including participants’ ratings of both perceptual aspects and their confidence in what they perceived or predicted, the study provides an additional layer of information to the understanding of perceptual decision-making. This information was thoughtfully incorporated into the modelling, enabling the investigation of how confidence influences learning.

      - The computational modelling techniques utilised here are methodologically robust. I commend the authors for their attention to model and parameter recovery, a facet often neglected in previous computational neuroscience studies.

      - The well-chosen citations not only reflect a clear grasp of the current research landscape but also contribute thoughtfully to ongoing discussions within the field of pain neuroscience.

      Initial reply: We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to the individual weaknesses mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term ”arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Initial reply: Indeed, as detailed in the Methods section 4.3, the stimulation intensity was originally transformed from the 1-13 scale to 0-100 scale to match the scales in the participant response screens.

      Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having indicated, during the pain threshold procedure, the point at which a stimulus becomes painful. As a result, the re-scaled intensity values as well as the perception ratings, both on the same 0-100 scale of arbitrary units, never go above the pain threshold for these participants. Please see all participant ratings and inputs in the Figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale shown on an additional right-hand-side y-axis. We will add this in the revised version and highlight the fact above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
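The rescaling described in this reply can be illustrated with a minimal sketch. It assumes a plain linear map from the 1-13 stimulus scale to the 0-100 rating scale, which is consistent with the threshold intensity of 7 landing at 50. The function name `rescale_intensity` is hypothetical; the transformation actually used is the one given in Methods section 4.3.

```python
def rescale_intensity(level, lo=1.0, hi=13.0, out_max=100.0):
    """Linearly map a stimulus level on [lo, hi] to [0, out_max].

    Assumption: the mapping is linear. The source only states that the
    1-13 scale was transformed to 0-100 and that level 7 is the threshold.
    """
    if not lo <= level <= hi:
        raise ValueError(f"level {level} is outside [{lo}, {hi}]")
    return (level - lo) / (hi - lo) * out_max


PAIN_THRESHOLD_LEVEL = 7  # threshold on the original 1-13 scale (from the reply)

# Under the linear assumption the threshold sits exactly mid-scale.
for level in (1, PAIN_THRESHOLD_LEVEL, 13):
    print(level, rescale_intensity(level))
```

Under this assumption, any rescaled value below 50 corresponds to a stimulus level below the individually calibrated threshold of 7.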

      Revised reply: We re-plotted Figure 1E-F with a different exemplary participant, whose ratings go above the pain threshold. We also included all participant pain perception and prediction ratings, noxious input sequences and confidence ratings in the supplement in Figures S1-S3.

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      Initial reply: We agree with the reviewer that there are multiple sources of uncertainty involved in the process of rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman Filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is corrupted as a result of the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.

      Author response image 1.

      Stimulus intensity transformation

      Revised reply: We clarified our modelling choices in the ”2.2 Modelling strategy” subsection.
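To make the contrast between the two model families concrete, here is a minimal sketch of a fixed-rate delta rule next to a scalar Kalman filter update. These are generic textbook forms, not the authors' equations (those are in Section 4.4 of the manuscript). The function names, the parameter names `obs_noise` and `drift`, and the input values are all hypothetical.

```python
def rl_update(expectation, observation, alpha):
    """Delta rule with a fixed learning rate alpha in [0, 1]."""
    return expectation + alpha * (observation - expectation)


def kalman_update(mean, variance, observation, obs_noise, drift):
    """One step of a scalar Kalman filter tracking a drifting mean.

    obs_noise: variance of the observation noise (how noisy each input is)
    drift:     variance added to the belief each trial (how fast the mean moves)
    """
    prior_variance = variance + drift
    gain = prior_variance / (prior_variance + obs_noise)
    new_mean = mean + gain * (observation - mean)
    new_variance = (1.0 - gain) * prior_variance
    return new_mean, new_variance, gain


inputs = [50, 55, 60, 58, 62]  # made-up intensities on a 0-100 scale
rl_expectation = 50.0
kf_mean, kf_variance = 50.0, 4.0
for x in inputs:
    rl_expectation = rl_update(rl_expectation, x, alpha=0.3)
    kf_mean, kf_variance, kf_gain = kalman_update(
        kf_mean, kf_variance, x, obs_noise=5.0, drift=1.0
    )
    print(f"input={x} RL={rl_expectation:.2f} KF={kf_mean:.2f} gain={kf_gain:.2f}")
```

The only structural difference in this sketch is that the Kalman gain is recomputed on every trial from the current uncertainty, whereas the delta rule keeps `alpha` fixed. This may be one reason the two models are hard to tell apart when the gain settles near a constant value.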

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      Initial reply: While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation-weighting arguably capture the data better.

      Revised reply: We elaborated on the significance statements in the ”Modelling Results” subsection:

      • We considered at least a 2 sigma effect as indication of a significant difference. In each condition, the expectation weighted models (eKF and eRL) provided better fit than models without this element (KF and RL; approx. 2-4 sigma difference, as reported in Figure 5A-D). This suggests that regardless of the levels of volatility and stochasticity, participants still weigh perception of the stimuli with their expectation.

      and in the first paragraph of the Discussion:

      • When varying different levels of inherent uncertainty in the sequences of stimuli (stochasticity and volatility), the expectation and confidence weighted models fitted the data better than models weighted for confidence but not for expectations (Figure 5A-D). The expectation-weighted bayesian (KF) model offered a better fit than the expectation-weighted, model-free RL model, although in conditions of high stochasticity this difference was short of significance. Overall, this suggests that participants’ expectations play a significant role in the perception of sequences of noxious stimuli.

      We are aware of the limitations and lack of clear guidance regarding using sigma effects to establish significance (as per reviewer’s suggestion: https://discourse.mc-stan.org/t/loo-comparison-in-referenceto-standard-error/4009). Here we decided to use the above-mentioned threshold of 2-sigma as an indication of significance, but note the potential limitations of the inferences - especially when distinguishing between eRL/eKF models.
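The "sigma effect" discussed here is the difference in expected log pointwise predictive density (ELPD) between two models divided by the standard error of that difference. A minimal sketch follows, assuming pointwise ELPD values are already available for both models on the same observations. The function names and the numbers are hypothetical, and the 2-sigma cut-off is the authors' stated convention rather than an agreed standard.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B.

    The standard error is sqrt(n) times the sample standard deviation of
    the pointwise differences, the usual paired estimate.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be scored on the same observations")
    if len(pointwise_a) < 2:
        raise ValueError("need at least two observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    elpd_diff = sum(diffs)
    se_diff = math.sqrt(len(diffs)) * statistics.stdev(diffs)
    return elpd_diff, se_diff


def sigma_effect(pointwise_a, pointwise_b):
    """ELPD difference expressed in units of its standard error."""
    elpd_diff, se_diff = elpd_difference(pointwise_a, pointwise_b)
    return elpd_diff / se_diff


# Made-up pointwise ELPD values, for illustration only.
model_a = [-1.0, -1.2, -0.8, -1.1]
model_b = [-1.5, -1.4, -1.6, -1.3]
ratio = sigma_effect(model_a, model_b)
print(f"sigma effect = {ratio:.2f}; exceeds 2-sigma convention: {abs(ratio) >= 2}")
```

Because the pointwise values for the two models are paired, the standard error is computed on the differences and not on each model's ELPD separately.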

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      Initial reply: We agree that the distinction between eKF and eRL in the model recovery is not that clear-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      Initial reply: It could be a possibility, but it would require further investigation and a comparison of fits for different bins/ranges of parameters to see if there is any consistent advantage of one model over another in each bin. We will consider adding this analysis, and provide an additional 50 simulations to paint a more convincing picture.

      Revised reply: We increased the number of simulations per model pair to ≈ 100 (after rejecting fits based on diagnostics criteria - E-BFMI and divergent transitions) and updated the confusion matrix (Table S4). Although the confusion between eRL and eKF remains, the model recovery shows good distinction between expectation weighted vs non-expectation weighted (and Random) models, which supports our main conclusion in the paper.
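A model-recovery confusion matrix of the kind referred to here can be tallied as follows. This is only a sketch: it assumes each simulation has already been reduced to a pair (generating model, best-fitting model), and the pairs below are invented for illustration, not taken from Table S4.

```python
from collections import Counter


def confusion_matrix(pairs, models):
    """Row-normalised confusion matrix from (generating, best_fit) pairs.

    Entry [g][f] is the fraction of datasets simulated from model g for
    which model f was selected as the best fit.
    """
    counts = Counter(pairs)
    matrix = {}
    for generating in models:
        row_total = sum(counts[(generating, fitted)] for fitted in models)
        matrix[generating] = {
            fitted: (counts[(generating, fitted)] / row_total if row_total else 0.0)
            for fitted in models
        }
    return matrix


models = ["eKF", "eRL", "KF", "RL"]
# Hypothetical outcomes: eKF and eRL get confused with each other,
# but neither is confused with the non-expectation-weighted models.
pairs = [
    ("eKF", "eKF"), ("eKF", "eRL"), ("eRL", "eRL"), ("eRL", "eKF"),
    ("KF", "KF"), ("KF", "KF"), ("RL", "RL"), ("RL", "RL"),
]
for generating, row in confusion_matrix(pairs, models).items():
    print(generating, row)
```

Reading the rows rather than the diagonal alone separates the two questions raised in the review: whether eKF and eRL can be told apart, and whether expectation-weighted models can be told apart from the rest.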

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that ”the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider ”several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Initial reply: Indeed, we considered a 2 sigma effect as a threshold; however, we recognise that there is no agreed-upon threshold, and in the revision we shall make this, and our interpretation of the trend in the data, clearer.

      Revised reply: We clarify this further, as per our revised response to Comment 3.3 above. We have also added the following statement in section 4.5.1 (Methods, Model comparison): ”There’s no agreed-upon threshold of SEs that determines significance, but the higher the sigma difference, the more robust is the effect.”

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information.

      Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Initial reply: Thanks for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.

      Revised reply: We included parameter recovery scatter plots for each model and parameter in the Supplement Figures S7-S11.
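The parameter-recovery procedure described in this reply (sample individual parameters from group-level estimates, simulate, refit, compare) can be sketched as below for a learning rate bounded in (0, 1). Two assumptions are made that the source does not state: the group distribution is normal on an unconstrained scale, and an inverse-logit maps it into (0, 1). All names are hypothetical, and the refitting step itself is omitted.

```python
import math
import random


def inv_logit(x):
    """Map an unconstrained value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_learning_rates(group_mean, group_sd, n_subjects, seed=0):
    """Draw per-subject learning rates from a group-level distribution.

    Assumption: normal on the logit scale, then squashed into (0, 1).
    """
    rng = random.Random(seed)
    return [inv_logit(rng.gauss(group_mean, group_sd)) for _ in range(n_subjects)]


def pearson_r(xs, ys):
    """Pearson correlation between simulated and recovered values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


simulated = sample_learning_rates(group_mean=-0.5, group_sd=1.0, n_subjects=30)
# In a real analysis `recovered` would come from refitting the simulated data;
# here a noisy copy stands in for it, purely for illustration.
noise = random.Random(1)
recovered = [min(max(a + noise.gauss(0, 0.05), 0.0), 1.0) for a in simulated]
print(f"recovery correlation: {pearson_r(simulated, recovered):.2f}")
```

Plotting `simulated` against `recovered` gives the scatter plots requested by the reviewer, and the range of `simulated` makes explicit which part of parameter space the recovery result covers.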

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Initial reply: Thank you very much for this suggestion, we will aim to include these measures in the revised version.

      Revised reply: We have considered the suggested diagnostics and include bulk and tail ESS values for each condition, model and parameter in Supplementary Tables S6-S9. We also report the number of chains with low E-BFMI (0), the number of divergent transitions (0) and the E-BFMI values per chain in Table S10.
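For readers unfamiliar with the E-BFMI diagnostic mentioned above, a per-chain estimate can be computed from the sampler's energy trace as the mean squared change in energy between successive draws divided by the variance of the energy. The sketch below assumes the energy values have already been extracted per chain. The 0.3 cut-off is a commonly quoted rule of thumb and an assumption here, since the source does not state which threshold was applied.

```python
import statistics


def e_bfmi(energy):
    """Estimate the E-BFMI for one chain from its energy trace."""
    if len(energy) < 2:
        raise ValueError("need at least two draws")
    squared_steps = [(b - a) ** 2 for a, b in zip(energy, energy[1:])]
    variance = statistics.pvariance(energy)
    if variance == 0:
        raise ValueError("energy trace is constant")
    return statistics.fmean(squared_steps) / variance


def low_e_bfmi_chains(energy_by_chain, threshold=0.3):
    """Indices of chains whose E-BFMI falls below the threshold."""
    return [i for i, e in enumerate(energy_by_chain) if e_bfmi(e) < threshold]


# Made-up energy traces: the first mixes quickly, the second drifts slowly.
chains = [
    [1.0, 2.0, 1.0, 2.0, 1.0],
    [float(i) for i in range(10)],
]
print([round(e_bfmi(c), 3) for c in chains])
print("chains flagged:", low_e_bfmi_chains(chains))
```

A low value indicates that the energy changes little from draw to draw relative to its overall spread, which is why the slowly drifting trace is the one flagged.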

      Reviewer Comment 3.9 — The authors write: ”Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying ”endogenous pain regulation.”

      Initial reply: This is a valid point; indeed, we do not compare paradigms in our study, and we will remove this statement in the future version.

      Revised reply: We have removed this statement from the revised version.

      Reviewer Comment 3.10 — In relation to the comment on model comparison in my public review, I believe the following link may provide further insight and clarify the basis for my observation. It discusses the use of standard error in model comparison and may be useful for the authors in addressing this particular point: https://discourse.mc-stan.org/t/loo-comparison-in-referenceto-standard-error/4009

      Initial reply: Thank you for this suggestion, we will consider the forum discussion in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).

      In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.

      The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the supralinear behavior associated calcium transients but had no or only weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.

      While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, results and conclusions are of interest to the broader neuroscience community.

      The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.

      I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).

      We thank the reviewer for the evaluation and address the detailed points below.

      Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Strengths:

      (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.

      (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.

      Weaknesses:

      It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.

      We are grateful to the reviewer for the critique. Indeed, the degree to which clustered inputs belonging to a functional neuronal assembly occur on interneuron dendrites is an open question. However, Chen et al (2013, Nature 499:295-300) reported that dendritic domains of PV-positive interneurons in visual cortex, unlike their somata, exhibit calcium transients in vivo which are highly tuned to stimulus orientation. This suggests that clustered inputs to dendritic segments may well belong to functional assemblies, much as in principal cells (e.g. Wilson et al, 2016, Nature Neuroscience 19:1003–1009; Iacaruso et al, 2017, Nature 547:449–452). In our earlier work reporting NMDAR-dependent supralinear summation of glutamate uncaging-evoked responses at a subset of dendrites on PV-positive interneurons, we demonstrated how this arrangement in an oscillating feedback circuit could be exploited to stabilise neuronal assemblies.

      Reviewer #3 (Public review):

      Summary:

      The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.

      Weaknesses:

      (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. This would help to see whether NMDA-receptor contribution varies from synapse to synapse to generate the observed variability? Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would help to contribute to our understanding of interneuron function.

      We thank the reviewer for the helpful comments.

      We did not actually claim that EPSP amplitude has no role in determining the magnitude of non-linearity: “Among possible sources of variability for voltage supralinearity, we did not observe a systematic dependence on the average amplitude of individual uEPSPs […] (Fig. S2)”. Whilst we fully agree that, at first sight, a positive dependence of supralinearity on uEPSP amplitude might be expected simply from the voltage-dependent kinetics of NMDARs, there are two main reasons why this could have been obscured. First, the expected relationship is non-monotonic, because with large local depolarizations the driving force collapses, as seen in the overall sigmoid shape of the average relationship between the scaled observed response and arithmetic sum (e.g. Figs 2a & c; 4c & e). Therefore, we would arguably expect a parabolic relationship rather than a simple positive slope relating the degree of supralinearity to the average amplitude of individual uEPSPs. Second, given that the uncaging distance varied substantially, the average amplitudes of the individual uEPSPs recorded at the soma would have undergone different degrees of electrotonic attenuation and further distortion by active conductances before they were measured. Ultimately, the plots in Fig. S2 show too much scatter to be able to exclude a positive or parabolic relationship of nonlinearity to uEPSP amplitude. To avoid misunderstanding, we have changed the sentence in the Results that refers to Fig. S2 to: “Among possible sources of variability for voltage supralinearity, we did not observe a significant monotonic dependence on the average amplitude of individual uEPSPs, distance from the uncaging location along the dendrite to the soma, [or] the dendrite order (Fig. S2)”.

      As for the relative contributions of NMDARs and AMPARs, voltage clamp recordings from both neurogliaform and OLM interneurons have already been reported, with the conclusion that neurogliaform cells exhibit relatively larger NMDAR-mediated currents (e.g. Chittajallu et al. 2017; Booker et al. 2021; Mercier et al. 2022), entirely in keeping with the conclusions of our study. Repeating these measurements would add little to the study. Furthermore, because the mean baseline uEPSP amplitude was <0.5 mV (Fig S2), it would be difficult to obtain reliable measurements of isolated NMDAR-mediated uEPSCs.

      Turning to the high variability of supralinearity, indeed, the 95% confidence interval for the data in Fig. 2d is 73%, 213%. This degree of variability is consistent with the wide range of NMDAR/AMPAR ratios reported by Chittajallu et al. 2017 (their Fig. 1g), compounded by the expected non-monotonic relationship alluded to above.

      (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.

      We have plotted the degree of nonlinearity against dendritic diameter for neurogliaform cells (under baseline conditions and in D-AP5) in Fig S2h-k. We did not observe any significant linear correlations, other than between amplitude nonlinearity and dendrite diameter post D-AP5. This does not negate the possibility that the significant difference in average dendritic diameters between neurogliaform and OLM cells contributes to differences in impedance (which we have rephrased as “Among possible explanations is that the local dendritic impedance is greater in neurogliaform cells, lowering the threshold for recruitment of regenerative currents”).

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      The difference in average dendritic diameters between OLM and neurogliaform cells is highly significant (Fig. 8q, P<0.001). We do not claim that dendritic diameter (and by implication local impedance) is the only determinant of the degree of non-linearity. The suggestion that a gradient of glutamate receptor density contributes is interesting. However, the results of uncaging experiments targeting more distal OLM dendrites of similar diameter as neurogliaform dendrites would be subject to numerous confounds, not least the very different electrotonic attenuation, likely differences in various active conductances, and the presence of spines in OLM dendrites (which are generally sparse and were not reliably imaged in our experiments). Moreover, the cell would have to remain patched for longer in order for the fluorescent dyes to invade the distal dendrites. This alone could potentially result in systematic biases among groups. We now cite Guirado et al (2014) and Gramuntell et al (2021) to highlight that factors other than dendritic diameter per se, such as inhomogeneity in spine and NMDA receptor density, may also contribute to the heterogeneity of nonlinear summation in OLM cells.

      (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, even no calcium imaging experiments were shown. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.

      The same response to point 1) above applies to OLM cells. As with neurogliaform cells, mean OLM baseline asynchronous (separate response) amplitudes were <0.5 mV, making it very difficult to record an isolated NMDAR-mediated uEPSC. Having said that, NMDARs do contribute to EPSCs elicited by stimulation of multiple afferents (e.g. Booker et al, 2021). We do not claim that dendritic calcium transients cannot be elicited following activation of NMDARs in OLM cells. We simply reported that the evoked uEPSPs, designed to approximate individual synaptic signals, were sub-threshold for detectable dendritic calcium signals under conditions that were suprathreshold in neurogliaform cells. The statement has been amended to specify that there were no detectable signals under our recording conditions. There is no evidence presented in the manuscript to suggest that OLM NMDARs are calcium impermeable and indeed no such claim was made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There is a large variability in the observed dendritic nonlinearity, in NDNF IN dendrites e.g. the uEPSP amplitude nonlinearity measure varies from as low as 10-20% to over 200%. As only single dendrites were recorded from each IN, it is unclear if this variability is among the cells or between individual dendrites. While the authors analyzed some potential factors, such as distance along the dendrites, branch order, or response magnitude (amplitude and integral), they did not find any substantial correlation. It remains open if different dendrites of NDNF INs, located in the str. moleculare vs. those in or projecting towards str. radiatum, have divergent properties. Similarly, for SOM INs an important question is if axon-carrying dendrites show distinct properties.

      In this context, it would be interesting to see not only values for the mean nonlinearity but also the maximal nonlinearity and its distribution.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in Fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify the analysis method. We did not address specifically whether dendrites projecting in different directions behaved differently. This is an interesting question beyond the scope of this study. Nor did we compare axon-carrying OLM dendrites to other dendrites.

      Figures:

      Figure 1: The gray line in plots g and h is not explained. While it looks like an identity line, the legend in plot i ("asynchronous") interferes.

      In plots g and h the gray line is the line of identity. In plot i it is an estimate of the linear summation. In plot i it is not the line of identity as it does not start at the origin with a slope of 1. The figure legend has been amended to clarify.

      In the same panels (Figure 1g,h, and subsequent figures) consider changing the title from "soma (voltage)" to uEPSP.

      The titles have been amended.

      In panel Figure 1i note the missing "(" in the title.

      Title amended.

      In panel Figure 1h: Shouldn't the X-axis label and legend text read "Arithmetic sum of (EPSP) integrals" instead of "Integral of arithmetic sum").

      The wording more accurately reflects the analytical operations. The asynchronous (separate) responses were summed arithmetically first, and then the integral was taken of each cumulative sum. We have therefore left the axis title and legend unchanged.
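      The order of operations described in this response (sum the separate responses first, then integrate each cumulative sum) can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function name, the array layout, the sampling interval and the toy uEPSP waveform are all assumptions made for the example.

```python
import numpy as np

def integral_of_arithmetic_sums(separate_traces, dt_ms):
    """Integrate each cumulative arithmetic sum of the separate responses.

    separate_traces: array of shape (n_locations, n_samples) holding
        baseline-subtracted somatic voltage traces in mV (hypothetical layout).
    dt_ms: sampling interval in ms (assumed constant).
    Returns an array of shape (n_locations,) in mV*ms.
    """
    traces = np.asarray(separate_traces, dtype=float)
    # Step 1: arithmetic sum of the first 1, 2, ..., n separate responses.
    cumulative = np.cumsum(traces, axis=0)
    # Step 2: trapezoidal integral over time of each cumulative sum.
    return 0.5 * dt_ms * (cumulative[:, 1:] + cumulative[:, :-1]).sum(axis=1)

# Toy data: three identical double-exponential "uEPSPs" (made-up waveform).
t = np.arange(0.0, 100.0, 0.1)
single = 0.4 * (np.exp(-t / 20.0) - np.exp(-t / 2.0))
traces = np.tile(single, (3, 1))
integrals = integral_of_arithmetic_sums(traces, dt_ms=0.1)
```

      Each entry of `integrals` corresponds to one point on the X-axis of a plot such as Figure 1h, against which the integral of the near-synchronous response would be compared.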

      Figure 2a,c: Could you please describe how the scaling was performed for the two axes?

      Method section amended.

      In the same panels (Figure 2a,c, and subsequent figures), the legend seems to be misleading: the plot is NS Amplitude/Integral vs Arithmetic sum, and the black line is the identity line (or scaled interpolation of the arithmetic sum, which is essentially the same).

      The scaled arithmetic sums (uEPSP amplitude, integral) represent linear summation and so overlap with the line of identity. The interpolation estimate of the asynchronous (separate) calcium transient response does not overlap with the line of identity as this estimate does not start at the origin with a slope of 1. The legends throughout the manuscript have been amended to clarify this.

      Figure 2b,d,f (and subsequent figures) slope plots: Please indicate that this is the average amplitude supralinearity for the individual recorded dendrites. Note here that the Results text mentions only the average amplitude supralinearity, but not the slope plots, paired mean difference, or Gardner-Altman estimation, illustrated in the figures.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify.
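      A cumulative measure of this kind can be sketched as follows. The response states only that the per-dendrite value is the sum of the nonlinearities at 1 to 12 uncaging locations; the per-location definition used here (percentage deviation of the observed near-synchronous response from the arithmetic sum) and the absence of any further normalisation are assumptions for illustration, and the exact formula in the amended Methods may differ. The function name and the toy values are hypothetical.

```python
import numpy as np

def cumulative_nonlinearity(observed, arithmetic_sum):
    """Sum per-location nonlinearities (in %) for one dendritic segment.

    observed: near-synchronous response at 1, 2, ..., n uncaging locations.
    arithmetic_sum: arithmetic sum of the separate responses at the same counts.
    ASSUMPTION: per-location nonlinearity = (observed - expected) / expected * 100.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(arithmetic_sum, dtype=float)
    if observed.shape != expected.shape:
        raise ValueError("observed and arithmetic_sum must have the same shape")
    per_location = (observed - expected) / expected * 100.0
    return per_location.sum()

# Toy example with four uncaging locations (made-up amplitudes in mV).
expected = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 2.5, 4.5, 5.0]
total = cumulative_nonlinearity(observed, expected)
```

      Under this assumed definition, a positive total indicates net supralinear summation for the segment and a negative total indicates net sublinear summation; one such value per dendrite would correspond to one line in a slopegraph.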

      Fig 2e: The legend (both text and figure, also in the following figures) is confusing, as the gray line and diamonds are defined as separate 12(?) responses, but it seems to represent a linear interpolation of the scaled arithmetic sums (ultimately nothing else but an identity line).

      The grey line shows the linear interpolation output between the calcium transient measurements at 1 uncaging location and at 12 uncaging locations. The 12th uncaging location is indicated in the key as “separate 12”. The linear interpolation in these plots does represent linear summation but is not the line of identity as it does not begin at the origin and does not have a slope of 1.
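      The linear estimate described here can be sketched as a straight line drawn between the separate calcium measurements at 1 and at 12 uncaging locations. The function name and the example values are hypothetical and chosen only to show why such a line need not coincide with the line of identity.

```python
import numpy as np

def linear_calcium_estimate(signal_at_1, signal_at_12, n_locations=12):
    """Linearly interpolate between the calcium measurement at 1 uncaging
    location and the measurement at n_locations uncaging locations."""
    counts = np.arange(1, n_locations + 1)
    estimate = np.interp(counts, [1, n_locations], [signal_at_1, signal_at_12])
    return counts, estimate

# Made-up dF/F values for illustration only.
counts, estimate = linear_calcium_estimate(0.10, 0.43)
slope = (estimate[-1] - estimate[0]) / (counts[-1] - counts[0])
intercept = estimate[0] - slope * counts[0]
```

      With these example values the fitted line has a slope well below 1 and a non-zero intercept, so it represents linear summation without being the line of identity.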

      Reviewer #2 (Recommendations for the authors):

      This study is well-developed and technically executed. I only have minor comments for the authors:

      (1) To target NDNF+ neurons, the authors use the NDNF-Cre mouse line and a Cre-dependent AAV using the mDLX promoter. Why the mDLX promoter? Would it have been sufficient to use any Cre-dependent fluorophore?

      Pilot experiments revealed leaky expression when a virus driving flexed ChR2 under a non-specific promoter (EF1a) was injected in the neocortex of Ndnf-Cre mice (Author response image 1). In our hands, and in line with Dimidschstein et al (2016), the use of the mDLX enhancer reduced off-target expression.

      Author response image 1.

      A. AAV2/5-EF1a-DIO-hChR2(H134R)-mCherry injected into superficial neocortex of Ndnf-Cre mice led to expression in a few pyramidal neurons in addition to layer 1 neurogliaform cells. B. Patch-clamp recording from a non-labelled pyramidal cell showed that an optogenetically evoked glutamatergic current remained after blockade of GABAA and GABAB receptors, further confirming limited specificity of expression of ChR2. (Data from M Muller, M Mercier and V Magloire, Kullmann lab.)

      (2) The distance of the uncaging sites from the soma plays a key role. The authors should indicate the mean distance of the cluster and its variance.

      Uncaging distance from soma is indicated for both NGF and OLM interneurons in the supplementary figures S2 and S3 respectively.

      (3) Martina et al., in Science 2000, showed high levels of Na+ channels in the dendrites of OLM cells and hinted that spikes could occur in them. The authors should discuss this possible discrepancy.

      Discussion amended.

      (4) Looking at Figure 1d, the EPSPs look exceptionally long-lasting, longer than those observed by stimulating axonal inputs. Could this indicate spill-over excitation? If so, how could this affect the outcome of this study?

      The asynchronous (separate responses) decay to baseline within 100 ms, similar to the neurogliaform EPSPs evoked by electrical stimulation of axons in the SLM in Mercier et al. 2022. We observed clear plateau potentials in a minority of cells (e.g. Fig. S1b). Such plateau potentials can be generated by dendritic calcium channels and we do not consider that glutamate spillover needs to be invoked to account for them.

      (5) In the legend of Figure 2: "n=11 dendrites in 11 cells from 9 animals". Why do the authors only study 11 dendrites from 11 cells? Isn't it possible to repeatedly stimulate clusters of synaptic inputs onto the same cells? In principle, could one test many dendrites of the same cell at different distances from the soma? It is also remarkable that there were very few cells per animal.

      The goal always was to record from as many dendrites as possible from the same cells whilst maintaining high standards of cell health. When cell health indicators such as blebbing, input resistance change or resting voltage change were detected, no further dendritic location could be tested with reasonable confidence. In a given 400 µm slice there would be relatively few healthy candidate cells at a suitable depth to attempt to patch-clamp.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This solid study investigates the transdifferentiation of chicken embryonic fibroblasts into muscle and fat cells in 3D to create whole-cut meat mimics. The study is important and provides a method to control muscle, fat, and collagen content within the 3D meat mimics and thus provides a new avenue for customized cultured meat production. Limitations of this study include the use of transgene for transdifferentiation and thus the creation of GMO food.

      We are grateful for the substantial effort that editors and reviewers put into assessing our manuscript and providing insightful feedback. We have tried to address, as much as possible, all comments and criticisms. We believe that we have now a significantly improved manuscript. Below, there is a point-by-point response.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors presented here a novel 3D fibroblast culture and transdifferentiation approach for potential meat production with GelMA hydrogel.

      Strengths:

      (1) Reduced serum concentration for 3D chicken fibroblast culture and transdifferentiation is optimized.

      (2) Efficient myogenic transdifferentiation and lipogenesis as well as controlled fat deposition are achieved in the 3D GelMA.

      Weaknesses:

      (1) While the authors stated the rationale of using fibroblasts instead of myogenic/adipogenic stem cells for meat production, the authors did not comment on the drawbacks/disadvantages of genetic engineering (e.g., forced expression of MyoD) in meat production.

      We thank the reviewer for raising this important issue. We have now described this drawback in the Discussion.

      As a proof-of-concept study, we sought to explore the potential of transgene-based tools for overexpressing a transdifferentiation factor to achieve maximum muscle production. However, it is important to acknowledge that genetically modified meat products derived from genetically engineered cultured cells are unlikely to achieve consumer acceptance or market viability. We are currently testing non-integrating delivery methods, such as modRNAs and chemical cocktails, to induce myogenic transdifferentiation in fibroblasts. We believe that these non-integrating methods would be compatible with meat production and consumer acceptance.

      Please see lines 439-445.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products.”

      (2) While the authors cited one paper to state the properties and applications of GelMA hydrogel in tissue engineering and food processing, concerns/examples of the food safety with GelMA hydrogel are not discussed thoroughly.

      Thank you for pointing out this issue. We now discuss the drawbacks of using GelMA hydrogel for meat production in the main text.

      GelMA-based hydrogels have shown great potential due to their biocompatibility and mechanical tunability. They are widely used in 3D cell culture and tissue engineering for regenerative medicine, but are less common in food processing and agricultural applications. Its photo-crosslinking properties, biocompatibility and degradability allow GelMA to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for cultured meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Future research will carefully consider GelMA hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant cultured meat production (Bomkamp et al., 2022).

      Bomkamp, C., Skaalure, S. C., Fernando, G. F., Ben‐Arye, T., Swartz, E. W., & Specht, E. A. (2022). Scaffolding biomaterials for 3D cultivated meat: prospects and challenges. Advanced Science (Weinh), 9(3), 2102908.

      Jeong, D., Seo, J. W., Lee, H. G., Jung, W. K., Park, Y. H., & Bae, H. (2022). Efficient Myogenic/Adipogenic Transdifferentiation of Bovine Fibroblasts in a 3D Bioprinting System for Steak-Type Cultured Meat Production. Advanced Science (Weinh), 9(31), e2202877.

      Li, Y., Liu, W., Li, S., Zhang, M., Yang, F., & Wang, S. (2021). Porcine skeletal muscle tissue fabrication for cultured meat production using three-dimensional bioprinting technology. Journal of Future Foods, 1(1), 88-97.

      Park, S., Hong, Y., Park, S., Kim, W., Gwon, Y., Jang, K.-J., & Kim, J. (2023). Designing Highly Aligned Cultured Meat with Nanopatterns-Assisted Bio-Printed Fat Scaffolds. Journal of Biosystems Engineering, 48(4), 503-511.

      We discussed the drawbacks of GelMA hydrogel. Please see lines 445-457.

      “Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (3) In Fig. 4C, there seems no significant difference in the Vimentin expression between Fibroblast_MyoD and Myofibroblast. The conclusion of "greatly reduced in the myogenic transdifferentiated cells" is overstated.

      Thanks for pointing out this mistake.

      We revised the wording accordingly. Vimentin expression was reduced in Fibroblast_MyoD cells compared with the original fibroblasts.

      Please see lines 231-233.

      “The fibroblast intermediate filament Vimentin (Tarbit et al., 2019) was abundantly expressed in the fibroblasts but reduced in the myogenic transdifferentiated cells (Figure 4C)”

      (4) The presented cell culture platform is only applied to chicken fibroblasts and should be tested in other species such as pigs and fish.

      Thank you for the suggestion.

      In this pilot cultured meat study, we utilized chicken embryonic fibroblasts. These specific cells were chosen for their near-immortal nature and robustness in culture, as well as the inducible myogenic capacity. In our previous experiments (Ren et al, Cell Reports, 2022, 40:111206), we have tested the myogenic transdifferentiation potential of fibroblasts from mice, pigs, and chickens, and observed varying efficiencies of myogenesis. It is important to note that fibroblast cells derived from different species, or even different tissues within the same species, would exhibit significant variations in their capacities for myogenic and adipogenic transdifferentiation.

      In this proof-of-concept study, we used only one source of fibroblasts to test cultured meat production and confirmed that myogenic/adipogenic transdifferentiation can be manipulated as a feasible means to precisely control muscle, fat and collagen content. We would expect fibroblasts of different origins to display different transdifferentiation efficiencies and thus produce various muscle/fat ratios in meat mimics. That is beyond the scope of the current study.

      Furthermore, we are also testing myogenic/adipogenic transdifferentiation of pig fibroblasts through non-integrating approaches. We believe that only non-transgenic tools are viable solutions for cultured meat production in the future. We added the species information to the Discussion.

      See lines 515-517.

      “This approach can be readily extrapolated to other species such as pigs and presents promising avenues for the large-scale production of customized and versatile meat products that may cater to varying consumer preferences.”

      Reviewer #2 (Public Review):

      The manuscript by Ma et al. tries to develop a protocol for cell-based meat production using chicken fibroblasts as three-dimensional (3D) muscle tissues with fat accumulation. The authors used genetically modified fibroblasts which can be forced to differentiate into muscle cells and formulated 3D tissues with these cells and a biphasic material (hydrogel). The degrees of muscle differentiation and lipid deposition in culture were determined by immunohistochemical, biochemical, and molecular biological evaluations. Notably, the protocol successfully achieved the process of myogenic and lipogenic stimulation in the 3D tissues.

      Overall, the study is reasonably designed and performed including adequate analysis. The manuscript is clearly written with well-supported figures. While it presents valuable results in the field of cultivated meat science and skeletal muscle biology, some critical concerns were identified. First, it is unclear whether some technical approaches were really the best choice for cell-based meat production. Next, more careful evaluations and justifications would be required to properly explain biological events in the results. These points include additional evaluations and considerations with regard to myocyte alignment and lipid accumulation in the differentiated 3D tissues. The present data are very suggestive in general, but further clarifications and arguments would properly support the findings and conclusions.

      We thank the reviewer for the comments. We have performed additional experiments and analyses to address the critical questions. We also revised the text extensively to clarify or discuss some of the concerns, such as cell alignment and the cellular distribution of intramuscular fat. We expect that the revised data and text adequately support the conclusions of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1, the authors used 1% chicken serum. Have the authors tested other lower concentrations? It will be interesting to see the lowest chicken serum concentrations in fibroblast culture and transdifferentiation;

      Thank you for your suggestion.

      Yes, we have tested lower serum concentrations, such as 1% FBS and 0.5% chicken serum. However, the cells were not healthy at these low serum levels, as shown by abnormal cell morphology and almost no cell growth. Please see the revised Supplementary Figure S1D, to which we added the 1% FBS and 0.5% chicken serum data. Hence, 1% chicken serum is optimal in our hands. We will also test specialized serum-free media in future experiments.

      (2) In Figure 2, the authors should quantify the fold expansion of fibroblasts cultured in 3D gel after 1, 3, 5, and 9 days since this data is important for future meat manufacturing. In addition, long-term expansion (e.g., 1 month) in 3D gel should also be shown;

      Thanks for the question. We have quantified cell growth in 3D by measuring the PKH26-stained cells. After implantation into the gel, the cells proliferated exponentially from day 1 to day 9. The cell proliferation data provide a good reference for future meat manufacturing (Figure 2D). We attempted long-term expansion in 3D but could not measure cell proliferation, because the 3D gel always collapsed after 12-15 days in culture for unknown reasons: either the cells grew so densely that they compromised the gel structure, or the gel matrix itself is not strong enough to last long-term. We believe the cells will grow well in the long term if given enough 3D attachment surface, since they grow indefinitely in 2D. We will test different 3D matrices in the future.

      Please see the revised Figure 2D for the quantification of cells.
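
      As a rough illustration of how fold expansion and doubling time follow from such cell counts under exponential growth, a short Python sketch may help. The cell counts below are invented for illustration and are not measurements from Figure 2D, and the function names are hypothetical.

```python
import math

def fold_expansion(n_start, n_end):
    """Fold expansion: final cell count divided by the starting count."""
    return n_end / n_start

def doubling_time(elapsed_days, fold):
    """Doubling time in days, assuming purely exponential growth:
    N(t) = N0 * 2 ** (t / Td)  =>  Td = t * ln(2) / ln(fold)."""
    return elapsed_days * math.log(2) / math.log(fold)

# Hypothetical counts on day 1 and day 9 (NOT data from the study).
fold = fold_expansion(1_000, 16_000)
td = doubling_time(9 - 1, fold)
print(f"fold expansion: {fold:.1f}, doubling time: {td:.2f} days")
```

      The doubling-time formula only holds while growth stays exponential, which the response reports for days 1 to 9.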

      (3) In Figure 3, please also show MyoD staining as it'll be interesting to see the expression of exogenous and endogenous MyoD expression after dox treatment. In Figure G, the hydrogel meat seems very small, please show/discuss the maximum size of hydrogel meat that may be achieved using this approach;

      Thank you for this request. We performed immunostaining with anti-MyoD and anti-Flag antibodies to show the expression of all MyoD (exogenous and endogenous) and of exogenous MyoD only after dox treatment. MyoD and 3xFlag were fused in-frame in the transgene plasmid; thus, anti-Flag staining indicates exogenous MyoD expression, and anti-MyoD staining indicates the combined expression of exogenous and endogenous MyoD.

      As shown in Figure S4, almost 100% of cells were positive for MyoD staining, and 60% of these expressed Flag. These data are consistent with our previous results (Ren et al., 2022, Cell Reports).

      Author response image 1.

      As for the size of the cultured meat based on hydrogel, we discussed the possibility of scalable production of hydrogel-based whole-cut meat mimics. Please see lines 446-449. “Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters.”

      (4) In Figure 5 and Supplementary Figure 6, please quantify the Oil-red O+ fat cells in the 2D and 3D lipogenic induction. Also in Fig. 6B, quantify the oil-red+MHC+ cells;

      Thank you for this advice. We have quantified the Oil Red O-stained images in the Results section “Stimulate the fat deposition in chicken fibroblasts in 3D” using the ImageJ analysis software, and the quantification of the Oil Red O area was added to the corresponding graphs (Figure 5C, Figure S6C and S6F).

      However, due to the unique structure of the 3D matrix, many MHC+ and Oil Red O+ double-positive cells overlap with each other across different Z-stack layers in 3D. This overlap makes it challenging to accurately position and quantify the double-positive cells as the different layers interfere with each other.

      (5) In Figure 7, please show immunostaining images of collagen and other major ECMs;

      Thank you for this question. We tried to stain collagen networks with Picrosirius Red but failed. Instead, we used laminin immunostaining to confirm that the ECM content in the 3D matrix increases steadily during cell culture.

      Please see Figure 7C. Lines 346-348.

      “the laminin protein content was accumulated and increased steadily during 3D culturation (Figure 7C)”

      (6) In Figure 8, please show hierarchical clustering analysis of whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI. A Venn Diagram showing the overlap and distinct gene expression among these groups is also appreciated.

      Thank you for the suggestion.

      We added the hierarchical clustering analysis of whole transcriptomes of 3D_fibroblasts, 3D_MyoD, 3D+FI, and 3D_MyoD+FI, using Euclidean distance with the ward.D clustering method. The result showed that these groups formed two large clusters: 3D+FI clustered separately, while 3D_fibroblasts, 3D_MyoD, and 3D_MyoD+FI were more similar to one another. Please see Figure 8B.
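
      For readers unfamiliar with Ward-type agglomerative clustering, the following pure-Python sketch shows the general idea on invented data. It is not the authors' analysis code. It implements the classical Ward minimum-variance criterion on Euclidean distances, which corresponds to the "ward.D2" option of R's hclust rather than "ward.D", so it only approximates the procedure named above. The sample names, expression values, and function names are all hypothetical.

```python
from itertools import combinations

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(profiles):
    """Agglomerate until one cluster remains; return the merge history."""
    clusters = {name: [vec] for name, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        cost = ward_cost(clusters[a], clusters[b])
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, cost))
    return merges

# Hypothetical 2-gene expression profiles for four samples (invented values).
toy = {"s1": [0.0, 0.0], "s2": [0.0, 1.0], "s3": [5.0, 5.0], "s4": [5.0, 6.0]}
merges = ward_cluster(toy)
for a, b, cost in merges:
    print(f"merge {a} + {b} (cost {cost:.2f})")
```

      In this toy example the two tight pairs merge first and are joined only at a much higher cost, which is the pattern a dendrogram would display as two large clusters.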

      As the reviewer suggested, we also compared the transcriptomes of 3D_MyoD, 3D+FI, and 3D_MyoD+FI to the original 3D_fibroblasts to identify differentially expressed genes (DEGs), and then analyzed the overlapping and distinct DEGs. As shown in the Venn diagram in Figure 8D, the majority of DEGs from 3D_MyoD+FI (3D_MyoD+FI versus 3D_fibroblasts) overlap with those of 3D_MyoD and 3D+FI, indicating that 3D_MyoD+FI is compatible with both myogenic and adipogenic functions.

      Please see the revised Figure 8.
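
      The overlap counts behind a Venn diagram of this kind reduce to set operations on DEG lists. The sketch below illustrates this with placeholder gene identifiers; the gene names, the set contents, and the function name are invented and do not come from Figure 8D.

```python
def overlap_summary(combined, myogenic, adipogenic):
    """Summarize how the DEGs of the combined condition overlap the other two."""
    shared_any = combined & (myogenic | adipogenic)
    return {
        "shared_with_myogenic": combined & myogenic,
        "shared_with_adipogenic": combined & adipogenic,
        "shared_with_both": combined & myogenic & adipogenic,
        "unique_to_combined": combined - myogenic - adipogenic,
        "fraction_shared": len(shared_any) / len(combined),
    }

# Placeholder DEG sets, each defined versus a common baseline (invented IDs).
deg_myod = {"g1", "g2", "g3", "g4"}
deg_fi = {"g4", "g5", "g6"}
deg_myod_fi = {"g2", "g3", "g4", "g5", "g7"}

summary = overlap_summary(deg_myod_fi, deg_myod, deg_fi)
print(f"{summary['fraction_shared']:.0%} of combined-condition DEGs are shared")
```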

      Reviewer #2 (Recommendations For The Authors):

      In this study, the authors demonstrated a new approach for cultivated meat production using chicken fibroblasts. Specifically, the cells were cultured as 3D and induced muscle differentiation and lipid deposition. The manuscript contains a good set of data, which would be valuable to researchers in the fields of both cell-based meat and skeletal muscle biology. From the aspect of cultivated meat science, the rationale behind the idea is understandable, but it remains unclear whether the proposed approach was really the best choice to achieve their final goal. On the other hand, when we read this manuscript as a paper in skeletal muscle biology, the overall approach was not innovative enough and several uncertain issues remain. The authors should add more sufficient justifications, arguments, and discussions.

      (1) When considering their goal to produce edible meat products, the current approach has some concerns. First, there are issues with the approach used for the induction of myogenesis by MyoD transgene. This makes the end products GMO foods, which are not easily acceptable to a wide range of consumers. Next, the hydrogel was used for 3D tissue formation, but it is unclear whether this matrix type is edible, safe, and bio-comparable for cell-based meat production. The authors already discussed these points by excusing that the current work remains proof-of-concept. However, more careful considerations and justifications would be required.

      Thank you for the suggestion.

      We acknowledge that the current transgene-based myogenic induction method is not suitable for mass production of cultured meat because of GMO food concerns. We used the MyoD transgene for myogenic transdifferentiation in the first place because of its ease of genetic manipulation and maximal efficiency. We are currently testing tools that do not integrate into the genome, such as chemical cocktails and modified RNAs, for myogenic transdifferentiation.

      When it comes to the applications of hydrogel in the food industry, certain types of hybrid hydrogels, such as those made from pectin or sodium polyacrylate, are not only edible but also safe for consumption. While GelMA hydrogel is typically utilized in tissue engineering and subsequent implantation in patients for therapeutic regenerative medicine purposes, it has not been commonly employed in food processing. In this study, we cultivated cells within GelMA hydrogel due to its durability and ease of use in cell culture. Moving forward, we plan to investigate alternative types of matrices to develop cultured meat suitable for food applications.

      We have now described the GMO and hydrogel drawbacks in the discussion part. Please see lines 439-457.

      “As a proof-of-concept, we utilized the transgene method to achieve maximum myogenic induction and the final products still retain the foreign transgene fragment in the cells’ genome. It is therefore posing a risk of genetic modified food which is not suitable for mass production. In the next step, other non-transgenic means such as non-integrating vectors, chemical reprogramming, modified RNAs, and recombinant transgene removal techniques will be explored to develop transgene-free end products. Another food safety concern in this study is the use of GelMA hydrogel for culture meat production. Due to its excellent biocompatibility and mechanical flexibility, GelMA-based hydrogel has demonstrated significant potential in scalable 3D cell culture for creating artificial tissue ranging in sizes from millimeters to centimeters. It is widely used in 3D cell culture and tissue engineering for regenerative medicine, but less common in food processing and agricultural applications. Due to its special photo-crosslinking properties, biocompatibility and degradability, it allows this material to be shaped into complex tissue structures by 3D printing or modelling. Many researchers have also used GelMA hydrogel as a scaffold for culture meat production (Jeong et al., 2022; Li et al., 2021; Park et al., 2023). Later research will carefully consider hydrogel as well as other types of scaffold biomaterials for cost-effective and food-safety compliant culture meat production (Bomkamp et al., 2022). ”

      (2) From the view of skeletal muscle biology, the approaches (MyoD overexpression, hydrogel-based 3D tissue formation, and lipogenic induction) have already been tested.

      Thank you for the insightful comments from the perspective of skeletal muscle cell biology. We totally agree that the current approaches, including MyoD overexpression, 3D cell culture, and lipogenic induction, are routine experiments in muscle cell biology. However, we want to highlight that using these classical and robust muscle cell approaches, combined with the unique advantages of fibroblast cells (easily accessible, immortalized, cost-effective, ...), provides a novel and practical avenue for cultured meat production. We address these issues in the Discussion of the revised manuscript.

      Please see lines 511-515.

      “In conclusion, we have effectively utilized immortalized chicken fibroblasts in conjunction with classical myogenic/adipogenic transdifferentiation approaches within 3D hydrogel to establish a cultured meat model. This model allows for the precise regulation of the synthesis of key components found in conventional meat, including muscle, fat, and ECM.”

      (3) The common emphasis in this manuscript is to use the advantages of 3D culture for tissue differentiation. As the authors described, skeletal muscle is a highly aligned tissue. In this study, some results successfully demonstrated advantages in terms of myocyte alignment, maturation, and lipid deposition. However, the current results cannot address whether the entire 3D tissues maintained these advantageous characteristics or not. Because the method for 3D formation does not have any additional modifications to make the cells aligned, like micropatterning, scaffolding, or bioprinting.

      Thank you for the suggestion.

      We agree with the reviewer that skeletal muscle tissues are composed of well-organized, directional bundles of fibers, and that cell alignment would greatly affect meat tenderness and sensory properties. Alignment of the cells in the cultured meat matrix is therefore a desirable attribute. However, achieving it would require sophisticated biomaterial engineering, mainly scaffold manipulation, which is beyond the scope of this study. The hydrogel used in this study formed pores of different sizes in random directions, so we would expect the embedded cells to be entirely non-directional. Nevertheless, we found localized cell alignment in some parts of the gel matrix, which confirms the cell-cell interactions (please see Figure 3D). We describe this feature in the Results section. In the future, we will test the application of physical or electrical stimulation to the matrix to see whether the muscle cells throughout the whole matrix can be aligned together.

      Please see lines 186-190.

      “The separate XY axis views of the orthogonal projections at different depths (Figure 3D) and a multi-angle video (Supplementary Video 2) also showed the several myotubes were aligned together. Nevertheless, many myotubes were oriented in different directions, preventing the entire matrix from aligning in one direction.”

      (4) In the skeletal muscle, fat accumulation mainly occurs in adipocytes between myocytes. This means that "intra-" muscular fat deposition is identified. However, lipid deposition within myocytes also occurred in this preparation (Supplementary Figure 7C). This situation is not "intra-" muscular accumulation, which sounds different from what is going on in normal skeletal muscle tissues. Please explain what happened and what biological situations accounted for this. Also, the authors should clarify better how lipogenesis was induced in the 3D tissues, such as cell types (transdifferentiated myocytes, remained/un-transdifferentiated fibroblasts, or both).

      Thank you for the very insightful question. We have revised the corresponding text to further explain the intramuscular fat distribution in different cell types in cultured meat.

      We totally agree with the reviewer that intramuscular fat accumulation may occur mainly in intramuscular adipocytes. However, under some pathological and physiological conditions in humans and animals, lipid droplets are also abundantly observed inside myofibers (intramyocellular lipids within the myofiber cytoplasm). For instance, high intramyocellular lipid content was found in insulin-resistant patients and, paradoxically, in endurance-trained athletes (doi.org/10.1016/j.tem.2012.05.009), as well as in some farm animals under intensive selective breeding (doi:10.2174/1876142910901010059). In the current study, using Oil Red O staining of lipid droplets, we identified lipid deposition in both the transdifferentiated myocytes and the remaining un-transdifferentiated fibroblasts in the cultured meat. This lipid distribution pattern is comparable to the intramuscular fat storage pattern observed in some humans and animals, in which fat accumulates in both myofibers (intramyocellular lipids) and intramuscular adipocytes (extramyocellular lipids), which reside within the muscle tissue bundle but between myofibers. We reason that the current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and the un-transdifferentiated fibroblasts. It is difficult to compare the absolute amount of lipids between these two cell types via Oil Red O staining, and it is almost impossible to separate them from the 3D meat mimics. Thus, we can only confirm that lipid deposition occurs in both transdifferentiated myocytes and un-transdifferentiated fibroblasts, without knowing which is the major contributor to the intramuscular fat content in the cultured meat.

      Please see lines 486-492.

      “In this study, the deposition of fat in the myotubes/myofibers facilitated the storage of significant lipid quantities in transdifferentiated muscle cells, known as intramyocellular lipids. Additionally, we observed Oil Red O staining in the remaining un-transdifferentiated fibroblasts, resembling cells of intramuscular adipocytes (extramyocellular lipids) found within muscle tissue. Hence, current adipogenic induction treatment caused lipogenesis in both the MyoD-transdifferentiated cells and un-transdifferentiated fibroblasts.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This manuscript is a valuable study of the responses of GPi neurons to DBS stimulation in human PD and dystonia patients and it finds evidence for altered short-term and long-term plasticity in response to DBS between the two patient populations. This data set is of interest to both basic and clinical researchers working in the field of DBS and movement disorders. While there was enthusiasm for the potential significance of these findings, support for their conclusions was incomplete. Their data may be indicative of more interesting and complex interpretations than currently considered in the article.

      The authors would like to express their gratitude to the Editorial Team and Reviewers for their invaluable feedback which helped to improve the manuscript.

      Reviewer #1:

      Summary:

      Sumarac et al investigate differences in globus pallidus internus (GPi) spike activity and short- and long-term plasticity of direct pathway projections in patients with Parkinson's disease (PD) and dystonia. Their main claims are that GPi neurons exhibit distinct characteristics in these two disorders, with PD associated with specific power-frequency oscillations and dystonia showing lower firing rates, increased burstiness, and less regular activity. Additionally, long-term plasticity and synaptic depression appear to differ between the two conditions. The authors suggest that these findings support the concept of hyperfunctional GPi output in PD and hypofunctional output in dystonia, possibly driven by variations in the plasticity of striato-pallidal synapses. Overall enthusiasm is relatively high, but I think the discussion omits discussing findings that don't align well with standard models. 

      Strengths: 

      These types of studies are valuable as the data arise from patients who have dystonia or PD. This could provide unique insights into disease pathophysiology that might not be recapitulated in animal systems work. 

      Thank you for the positive feedback.

      Weaknesses: 

      - The rate model and indirect/direct pathway ideas lack explanatory power; too much of the hypothesis generation and discussion in this manuscript is set in the context of these old ideas. Their data in my view emphasize this somewhat emphatically. Most patients with the 'hypokinetic' movement disorder PD have dystonia as a part of their motor features. Dystonia is a form of excessive muscle activation that on the one hand is 'hyperkinetic' but on the other usually decreases the speed of motor tasks, even in patients with primary dystonia. Similarly, PD patients display a bewildering variety of hyperkinetic manifestations as well (rest tremor, dystonia, dyskinesia). If these are truly independent classifications, i.e. hyper- versus hypo-kinetic, the authors must acknowledge that there is considerable overlap in the spike activity across groups - numerous dystonia patients display higher discharge rates than the majority of the PD sample. Based on the firing rate alone, it would not be possible to distinguish these groups. 

      Thank you for your insightful comments regarding the discussion of the rate model and the distinction between hyperkinetic and hypokinetic movement disorders. We acknowledge that the rate model, primarily derived from a limited number of animal subjects [1], may not fully encapsulate the complexities of Parkinson's disease (PD) and dystonia. Our study aimed to validate animal model findings in humans by correlating single-neuron features with disease symptom severity. However, we concur with the Reviewer’s comment regarding the overlapping motor features in hypokinetic and hyperkinetic disorders. We can speculate that the overlap in neuronal properties may be reflected in the overlap of, for example, hyperkinetic features being also present in PD, as suggested by the Reviewer. Per the Reviewer’s request, we have now acknowledged this notion in the manuscript. Interestingly, hypokinetic symptoms have been reported to occur in dystonia in response to GPi-stimulation and have been associated with beta activity in the LFP [2], which reinforces the notion that neural activity may be more related to specific symptoms rather than diseases as a whole. Supplementing our analyses, in addition to total UPDRSIII scores, we have now provided correlations with only hypokinetic (i.e. bradykinesia) subscores of the UPDRSIII to focus on a more direct assessment of hypokinetic features in PD versus hyperkinetic features in dystonia. We have updated our methods and results accordingly.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [2] R. Lofredi et al., “Pallidal Beta Activity Is Linked to Stimulation-Induced Slowness in Dystonia,” Movement Disorders, vol. 38, no. 5, pp. 894–899, 2023, doi: 10.1002/mds.29347.

      Amendments to the manuscript:

      “Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients.”

      “Given that UPDRSIII includes both hypokinetic and hyperkinetic symptoms of PD, we further sought to disaggregate the score by only considering items 23-26 in UPDRSIII, which assess hypokinetic symptoms of PD.”

      “… with a marginally stronger correlation for PD hypokinetic symptoms only (items 23-26 of UPDRSIII, Spearman's rho=0.32, p=.0330; Supplementary Fig. 3)”

      Supplementary Fig. 3: We provided correlations with hypokinetic (i.e., bradykinesia) subscore of the UPDRSIII. There is very little difference between correlation results of UPDRSIII total (Fig. 1) and the hypokinetic-only subscore (Supplementary Fig. 3).

      “though our results do not change substantially when only hypokinetic PD features are considered (Supplementary Fig. 3).”

      - If beta power is pathognomonic of parkinsonism, the authors found no differences in beta-related spike discharges across the groups. One would have predicted greater beta power in PD than in primary dystonia. This should be discussed explicitly and an interpretation should be provided. 

      We agree with the reviewer that considering the previous LFP literature, one might have expected a difference in single-neuron oscillation power between PD and dystonia. However, while prior studies [3], [4] have reported significant differences in oscillatory power between the two diseases, researchers examined local field potential (LFP) activity only. Other work [5] in non-human primates investigated single-neuron oscillations and reported no differences between PD and dystonia at the single-neuron level, in line with our findings. However, despite the lack of difference in overall power presented here, we provide evidence that the strength of the beta-frequency single-neuron oscillations nevertheless correlates with symptom severity in PD but not dystonia; whereas the strength of the theta-frequency single-neuron oscillations correlates with symptom severity in dystonia but not PD.

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      Amendments to the manuscript:

      “Although previous research has reported differences in the LFP power between PD and dystonia [27,28], a study in non-human primates found no such differences in single-neuron oscillatory strength [8], as reflected in our findings. However, despite a lack of difference in overall power across disorders, we were able to derive disease/frequency-specific relationships with respect to clinical scores (Fig. 1C; oscillatory features).”

      - The study lacks a healthy control group, making it challenging to differentiate disease-specific findings from normal variations in GPi activity and plasticity. Although this is acknowledged in the discussion, this complicates the interpretation of the results. The sample sizes for PD and dystonia patients are relatively small, and the study combines various forms of dystonia, potentially masking subtype-specific differences. A larger and more homogenous sample could enhance the study's reliability.

      Indeed, intraoperative microelectrode recordings cannot be obtained in healthy individuals. We agree with the Reviewer that this limits the interpretation of the data. However, directly comparing clinical correlations with single-neuron readouts between two distinct clinical entities may, to some degree, compensate for the lack of healthy control data. This contrast, while not providing a healthy control, can still point to disease-specific differences. This approach has previously been used for comparisons at the LFP level [6]. While the sample size is indeed small, it is comparable to, or even larger than, that of similar studies that have investigated the relation between symptom severity and single-neuron readouts [7]. The Reviewer is right that we do not differentiate between generalized and cervical dystonia. We chose to do so because our subgroup analysis provided in the Supplementary Material did not suggest specific differences, though there are insufficient data from specific dystonia subtypes to make formal statistical comparisons. Indeed, future studies should investigate specific subtypes further.

      [6] R. Lofredi et al., “Pallidal beta bursts in Parkinson’s disease and dystonia,” Movement Disorders, vol. 34, no. 3, pp. 420–424, 2019, doi: 10.1002/mds.27524.

      [7] A. Gulberti et al., “Subthalamic and nigral neurons are differentially modulated during parkinsonian gait,” Brain, p. awad006, Feb. 2023, doi: 10.1093/brain/awad006.

      Amendments to the manuscript:

      “While we did not observe differences across dystonia subtypes (Supplementary Fig. 1), future studies in larger patient cohorts are warranted. Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”

      - While they mention that data are available on request, sharing data openly would increase transparency and allow for independent validation of the results. It is unclear how sharing deidentified data would compromise patient privacy or present ethical issues of any kind, as claimed by the authors. 

      Much of the data in question were collected under an old Research Ethics Board (REB) protocol which did not address data sharing. However, we have consulted with our REB and gained retroactive permission to post de-identified data which are now available in the Supplementary Material.

      Amendments to the manuscript:

      “The data that support the findings of this study are available in a public repository (see: https://osf.io/nqzd2/)”

      - They appropriately acknowledge several limitations, such as the inability to use pharmacological interventions and the need for further research in the chronic setting. 

      Thank you for the comment.

      - The manuscript highlights differences in GPi activity and plasticity between PD and dystonia but could provide more context on the clinical implications of these findings, particularly regarding what the implications would be for novel paradigms for deep brain stimulation.

      Thank you for the comment. Our finding that striato-pallidal plasticity decays more slowly in dystonia compared to PD may relate to the slower time course of symptom relief associated with GPi-DBS in dystonia, as presently outlined in the discussion. On the other hand, symptoms are also suppressed for longer after the cessation of stimulation in dystonia compared to PD, which may reflect long-term plastic changes [8], [9]. In the context of clinical DBS, plasticity modulation may be facilitated by intermittent stimulation algorithms that achieve the necessary plastic network change by applying stimulation for a defined time, after which stimulation could be switched off to reduce energy consumption and perhaps to mitigate side effects. DBS devices with chronic sensing may enable monitoring of evoked potential amplitudes for future adaptive stimulation applications. Currently available devices are limited by low sampling rates, but future devices may overcome these technical limitations.

      [8] D. Ruge et al., “Deep brain stimulation effects in dystonia: time course of electrophysiological changes in early treatment.,” Mov Disord, vol. 26, no. 10, pp. 1913–1921, Aug. 2011, doi: 10.1002/mds.23731.

      [9] D. Ruge et al., “Shaping reversibility? Long-term deep brain stimulation in dystonia: the relationship between effects on electrophysiology and clinical symptoms.,” Brain, vol. 134, no. Pt 7, pp. 2106–2115, Jul. 2011, doi: 10.1093/brain/awr122.

      Amendments to the manuscript:

      “While further work is certainly required to better understand disease-related differences in plasticity, our findings may nevertheless motivate the development of periodic intermittent (ON/OFF) DBS strategies which periodically modulate synaptic plasticity for therapeutic benefits which outlast stimulation delivery, as have recently been employed in preclinical work [52,53].”

      - While statistical tests are mentioned, the manuscript could benefit from a more detailed presentation of statistical methods, including correction for multiple comparisons and effect sizes. Did the authors consider different recording sites within each patient as independent observations? I think this is not appropriate if that was the case. 

      Thank you for your constructive feedback. In response to the concerns regarding the statistical methods, we have expanded our analysis to provide a more comprehensive statistical overview. Specifically, we implemented the Bonferroni correction for multiple comparisons across each of the seven tests conducted for the differences in single-neuron features between PD and dystonia. The adjustment revealed that only the burst index and coefficient of variation retain statistical significance after post hoc correction, while the firing rate does not. Results of the Bonferroni corrections are now presented in Supplementary Table 3. Reflecting on the initial comment about firing rates between the two disorders, our updated findings underscore the limitation of using firing rates alone to differentiate between PD and dystonia, and instead, our analysis now points to burstiness and firing irregularity as more reliable discriminators. Regarding the clinical correlations, we refined our statistical analysis by employing nonparametric Monte Carlo permutation tests with 5000 permutations, as used in recent work [10], [11]. This method is chosen for its independence from assumptions regarding data distribution. Specifically, we computed and tested the Spearman rho for significance using the permutation test. Then, to address multiple comparisons, we controlled the false discovery rate (FDR) using the Benjamini-Hochberg procedure. Results of these comparisons are now presented in Supplementary Table 4. Lastly, to address the concern regarding recording site independence within patients, we updated our plasticity analysis methodology. In our study, 6 out of 18 patients had multiple recording sites. Thus, to account for this, we employed linear mixed models (LMM) with patient ID as a random factor to appropriately account for the non-independence of these observations.

      [10] R. Lofredi et al., “Dopamine-dependent scaling of subthalamic gamma bursts with movement velocity in patients with Parkinson’s disease,” Elife, vol. 7, p. e31895, Feb. 2018, doi: 10.7554/eLife.31895.

      [11] R. Lofredi et al., “Subthalamic beta bursts correlate with dopamine-dependent motor symptoms in 106 Parkinson’s patients,” npj Parkinsons Dis., vol. 9, no. 1, Art. no. 1, Jan. 2023, doi: 10.1038/s41531-022-00443-3.

      Amendments to the manuscript:

      “For comparing differences in single-neuron features between PD and dystonia, significant results were followed up with post hoc multiple comparisons with a Bonferroni correction. For clinical correlations, non-parametric Monte Carlo permutation tests were used, avoiding assumptions about data distribution. The tested values were randomly shuffled 5,000 times to form a probability distribution, with the p-value reflecting the original sample rank. All tests underwent adjustment for multiple comparisons, controlling the false discovery rate (FDR) at an α-level of 0.05.”
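
      The permutation procedure described in this amendment can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two-sided comparison on the absolute value of rho, the add-one correction of the p-value, and the fixed random seed are all assumptions made for the example. Only the use of Spearman's rho and 5,000 shuffles comes from the text.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Spearman rho and a two-sided Monte Carlo permutation p-value."""
    rng = random.Random(seed)  # fixed seed only for reproducibility (assumption)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)  # Spearman rho = Pearson r on the ranks
    shuffled = list(ry)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero (assumption).
    return observed, (extreme + 1) / (n_permutations + 1)
```

      Shuffling the ranks rather than the raw values gives the same null distribution, because a rank is attached to its observation, and it avoids re-ranking on every iteration.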

      “analyzed using a linear mixed model (LMM) with patient ID as a random factor, normalized fEP amplitudes as the response variable, and epoch as a fixed effect”

      “using a LMM with patient ID as a random factor”

      “However, none of the clinical correlations survived Benjamini-Hochberg FDR-correction for multiple comparisons (Supplementary Table 4).”
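
      For readers less familiar with the two corrections named in this response, the sketch below shows how Bonferroni and Benjamini-Hochberg adjusted p-values are conventionally computed. The function names and the input values in the usage note are illustrative assumptions; they are not the values reported in Supplementary Tables 3 and 4.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      With the made-up inputs `[0.01, 0.04, 0.03, 0.005]`, Bonferroni yields approximately `[0.04, 0.16, 0.12, 0.02]` and Benjamini-Hochberg approximately `[0.02, 0.04, 0.04, 0.02]`. A result is retained when its adjusted value is below the chosen α-level (0.05 in the amended Methods text).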

      “In PD, fEP amplitudes were significantly greater after compared to before HFS (LMM; p = .0075, effect size = 5.42 ± 1.79; Fig. 2C), while in dystonia, the increase approached but did not reach statistical significance (LMM; p = .0708, effect size = 2.82 ± 1.45; Fig. 2C).”
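
      The mixed model named in these amendments can be written out explicitly. The form below is a sketch that assumes a random intercept per patient and no random slopes, which the response does not specify; the symbols are introduced here for illustration only.

```latex
% i indexes patients, j indexes observations (recording site and epoch) within patient i
y_{ij} = \beta_0 + \beta_1\,\mathrm{epoch}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Here $y_{ij}$ is the normalized fEP amplitude, $\mathrm{epoch}_{ij}$ codes pre- versus post-HFS, and $u_i$ is the patient-level random intercept that absorbs the correlation between recording sites from the same patient. Under this reading, the reported effect size would correspond to the estimate of $\beta_1$ with its standard error.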

      All statistics were updated in the results section and the figures.

      “Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”

      - The manuscript could elaborate on the potential mechanisms underlying the observed differences in GPi activity and plasticity and their relevance to the pathophysiology of PD and dystonia. 

      Thank you for your feedback. We have enhanced the manuscript by integrating additional discussions on previous studies related to plasticity in dystonia and PD (e.g., [12], [13]), which highlight excessive plasticity in dystonia. Although these may appear contradictory to our findings of increased plasticity in PD compared to dystonia, we propose (also justified by previous literature) that chronic dopaminergic medication use may lead to synaptic over-sensitization, which has been hypothesized as a biological mechanism underlying levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      Amendments to the manuscript:

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the magnitude of direct pathway plasticity [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      Reviewer #2: 

      Summary: 

      The authors investigated how neuronal activity and metrics of plasticity using local electrical stimulation in the GPi were different between Parkinson's disease and dystonia patients. 

      Strengths: 

      The introduction highlights the importance of the work and the fundamental background needed to understand the rest of the paper. It also clearly lays out the novelty (i.e., that the dynamics of plastic effects in GPi between dystonia and PD have not been directly compared). 

      The methods are clearly described and the results are well organized in the figures. 

      The results are strong with measurements from a large population of patients for each disease group and with distinct findings for each group. 

      Thank you for the kind appraisal.

      Weaknesses: 

      The discussion was hard to follow in several places, making it difficult to fully appreciate how well the authors' claims and conclusions are justified by their data, mostly in relation to the plasticity results. It may help to summarize the relevant findings for each section first and then further expand on the interpretation, comparison with prior work, and broader significance. Currently, it is hard to follow each section without knowing which results are being discussed until the very end of the section. With the current wording in the "Neuronal correlates.." section, it is not always clear which results are from the current manuscript, and where the authors are referring to past work.

      Thank you for this feedback. The main findings are now summarized in a paragraph at the beginning of the Discussion section, before being discussed in comparison to other studies in the literature in subsequent sub-sections. Moreover, throughout the Discussion, findings from our study are now always reflected by a reference to the relevant figure to more easily differentiate current findings from previous literature. Additionally, Discussion sub-sections have been expanded to consider additional literature in response to various comments throughout the Review process (including the subsequent Review comment).

      Amendments to the manuscript:

      Paper findings are referenced to figures which depict the results at hand; discussion sub-sections expanded; and the following text has been added at the start of the Discussion:

      “In particular, we found that GPi neurons exhibited lower firing rates, but greater burstiness and variability in dystonia compared to PD (Fig. 1A). While no differences were found in the power of spiketrain oscillations across disorders (Fig. 1B), we found that PD symptom severity positively correlated with the power of low-beta frequency spiketrain oscillations, whereas dystonia symptom severity positively correlated with the power of theta frequency spiketrain oscillations (Fig. 1C). Dystonia symptom severity moreover correlated negatively with firing rate, and positively with neuronal variability. These results are discussed in greater detail with respect to previous literature in the subsequent Discussion section entitled “Neuronal correlates of PD and dystonia.” In response to electrical stimulation (protocol depicted in Fig. 2A), we found significant increases in the amplitudes of positive-going stimulation-evoked field potential amplitudes (considered to reflect striato-pallidal synaptic strength; as exemplified in Fig. 2B) before versus after HFS in both PD and dystonia (Fig. 2C); with recording sites in PD exhibiting significantly greater increases (Fig. 2D). While changes to evoked potential amplitude before versus after stimulation can be considered to be reflective of long-term plasticity [15,18], the dynamics of evoked potentials during HFS (as depicted in Fig. 2E) can be considered as reflective of short-term synaptic plasticity [18,21]. To this end, our findings are suggestive of faster latency synaptic depression in PD compared to dystonia (Fig. 2F/G). Plasticity findings are discussed in greater detail in the Discussion section entitled “Direct pathway plasticity.”

      Also, I felt that more discussion could be used to highlight the significance of the current results by comparing and/or contrasting them to prior relevant work and mechanisms. The novelty or impact is not very clear as written. Could this be further substantiated in the Discussion? 

      Thank you for the feedback. The discussion has been expanded to include additional literature that is relevant to the findings reported in the manuscript. For example, with regard to the neuronal correlates sub-section, we now highlight the important findings [15] that show changes to the discharge rates and oscillatory tendencies of GPi neurons in non-human primates in response to staged MPTP applications to progressively titrate motor severity; these results substantiate our lack of correlation with firing rates in PD, and presence of a clinical correlation with beta oscillations. We additionally now emphasize human studies that found LFP power differences between PD and dystonia [3], [4]; but simultaneously highlight studies that did not find such differences in spike-train oscillations (in non-human primates) [5], which is reflective of our own findings. With regard to our plasticity sub-section, we have added new content related to previous literature on plasticity in dystonia and PD (also addressed in response to a query from Reviewer #1). For example, we bring to light a variety of previous studies [12], [13] emphasizing excessive plasticity in dystonia. However, while such studies may seem to contradict our findings of greater plasticity in PD compared to dystonia, we additionally provide hypotheses (justified by previous literature) that prolonged use of dopaminergic medication may result in synaptic over-sensitization, thus giving rise to levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression. Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients. While differences in discharge rates were nevertheless observed between PD and dystonia, it may be that the combination of rate and pattern (reflected in the BI and CV) changes best differentiates the two disorders.”

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation (LTP) at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that LTP effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      Some specific comments and questions about the Discussion: 

      Lines 209-211 - This sentence was hard to understand, could it be clarified? 

      Lines 211-213 - What do phasic and tonic components mean exactly? Could this be specifically defined? Are there specific timescales (as referred to in Intro)?

      Lines 215-217 - It's not clear what was delayed in dystonia, and how the authors are trying to contrast this with the faster time course in PD. I think some of this is explained in the introduction, but could also be re-summarized here as relevant to the results discussed. 

      Lines 223-224 - I'm not sure I follow the implication that network reorganization leads to delayed functional benefits. Could this be further elaborated? 

      Reply & Amendments to the manuscript: Thank you for your feedback. We've made the following concise revisions to address the comments:

      We've clarified lines 209-211 to explain that variations in electrical stimulation effects on pathways in PD and dystonia may reveal the operational mechanisms of DBS, despite a common target:

      “The variation in the modulation of these projections / pathways to electrical stimulation may also indicate the mechanism by which DBS operates across PD and dystonia, despite a common stimulation target.”

      In response to the second comment on lines 211-213 about phasic and tonic components, we now specify that phasic refers to dynamic muscle contractions, and tonic to continuous muscle contractions, providing clear definitions relevant to our context:

      “Clinical studies in dystonia have shown that DBS leads to a more rapid improvement in the transient, dynamic muscle contractions (phasic components) of the disorder when compared to the sustained, continuous muscle contractions (tonic or fixed components) [33]”

      For lines 215-217, we've refined our discussion to clearly contrast the delayed response in dystonia with the faster onset in PD:

      “This contrasts with PD, where the maximal clinical response to DBS occurs within a much faster time course [13,36].”

      On lines 223-224, we've expanded the explanation of how network reorganization may lead to delayed functional benefits, highlighting adjustments in neural connectivity and synaptic efficacy in response to stimulation:

      “which involves adjustments in neural connectivity or synaptic efficacy in response to the stimulation [14,35].”

      Could the absence of a relationship between FR and disease in PD be discussed? 

      Thank you for raising this point. Despite observing higher firing rates in PD compared to dystonia, it is unexpected that these rates do not correlate with symptom severity according to the rate model of PD [1]. However, despite the lack of correlations with firing rates, our findings align with similar animal work of Muralidharan et al. [15], which reported that neuronal firing rates within the GPi of rhesus monkeys did not increase linearly with respect to varying intensities of parkinsonian motor severity. We did however show that low beta oscillatory strength within the GPi may play a significant role in the manifestation of motor symptoms in PD; which is also in line with findings of Muralidharan and colleagues. As per the Reviewer’s request, we have included this content into our discussion.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression.”

      “Indeed, Muralidharan and colleagues [25] also showed linear group-level relationships between low-beta frequency spiketrain oscillations and disease severity in parkinsonian non-human primates, despite the lack of linear relationships with spike discharge rates (as discussed above).”

      It wasn't very clear how the direct pathway can be attributed to plasticity changes if the GPi makes up both the direct and indirect pathways. Could this be further clarified? 

      The reviewer brings up an important nuanced point. Recent work from our lab [16] shows that inhibitory evoked fields in STN (which receives inhibitory inputs from GPe; no other inhibitory sources) are persistent with very minimal depression during HFS. On the other hand, inhibitory fields in the SNr (which receives the majority of its inhibitory inputs from striatum; though some come by way of GPe as well per anatomical literature) depress quickly. We have previously also shown these rapidly depressing fields in GPi [17], [18], which also receives the majority of its inhibitory inputs via striatum, though some also from GPe. As such, the disaggregation of striatum-mediated versus GPe-mediated inhibitory fields is achieved based on: (i) the lack of rapidly depressing inhibitory evoked field potentials in STN (which receives inhibitory inputs via GPe and not striatum), but a common presence of rapidly depressing evoked field potentials in SNr and GPi (which both receive most of their inhibitory inputs from striatum); (ii) differences in the morphology of purportedly GPe- (fast latency) versus striatum-mediated (slow latency) evoked field potentials [16]; and (iii) the presence of slow latency caudato-nigral evoked field potentials in slices [19] that are reversed by GABA antagonist application [20]. These points are indeed outlined in the first paragraph of the Discussion sub-section “Direct pathway plasticity.” However, we have now additionally added a point to the Limitations that inhibitory inputs to the GPi also come by way of GPe, though in a lesser abundance.

      [16] L. A. Steiner et al., “Persistent synaptic inhibition of the subthalamic nucleus by high frequency stimulation,” Brain Stimul, vol. 15, no. 5, pp. 1223–1232, 2022, doi: 10.1016/j.brs.2022.08.020.

      [17] L. D. Liu, I. A. Prescott, J. O. Dostrovsky, M. Hodaie, A. M. Lozano, and W. D. Hutchison, “Frequency-dependent effects of electrical stimulation in the globus pallidus of dystonia patients.,” J Neurophysiol, vol. 108, no. 1, pp. 5–17, Jul. 2012, doi: 10.1152/jn.00527.2011.

      [18] L. Milosevic et al., “Modulation of inhibitory plasticity in basal ganglia output nuclei of patients with Parkinson’s disease,” Neurobiology of Disease, vol. 124, pp. 46–56, Apr. 2019, doi: 10.1016/j.nbd.2018.10.020.

      [19] M. Yoshida and W. Precht, “Monosynaptic inhibition of neurons of the substantia nigra by caudato-nigral fibers,” Brain Res, vol. 32, no. 1, pp. 225–228, Sep. 1971, doi: 10.1016/0006-8993(71)90170-3.

      [20] W. Precht and M. Yoshida, “Blockage of caudate-evoked inhibition of neurons in the substantia nigra by picrotoxin,” Brain Res, vol. 32, no. 1, pp. 229–233, Sep. 1971, doi: 10.1016/0006-8993(71)90171-5.

      Amendments to the manuscript:

      “Indeed, GPi receives the greatest abundance of inhibitory inputs from striatum (direct pathway), but it also receives inhibitory inputs by way of GPe (indirect pathway). Although we can functionally disaggregate these pathway-specific responses based on differences in morphology and dynamics of GPe-mediated versus striatum-mediated inhibitory fEPs [21], the possibility of compounded effects cannot be completely ruled out.”

      The mechanisms of short- and long-term plasticity as applied in the protocols used in this work are outlined in reference to previous citations [15, 16, 18]. Because this is a central aspect of the current work and of interpreting the results, it was difficult to appreciate how these protocols provide distinct metrics of short- and long-term plasticity in GPi without some explanation of how they apply to the current work and of the specific mechanisms involved. It would also help to be able to better link how the results fit with the broader conclusions. 

      Short-term plasticity is measured as the dynamic change to the fEP during ongoing HFS. For long-term plasticity analyses, the fEP amplitudes during LFS were compared pre- versus post-HFS. To make this analysis more intuitive we have added a protocol illustration to Fig 2. We have moreover greatly expanded the discussion to include more literature related to disease-specific differences in plasticity, and implications of modulating plasticity using DBS.

      Amendments to the manuscript:

      Added new panel to Fig 2

      Author response image 1.

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      In the Conclusion, it was difficult to understand the sentence about microcircuit interaction (line 232) and how it selectively modulates the efficacy of target synapses. Some further explanation here would be helpful. Also, it was not clear how these investigations (line 237) provide cellular-level support for closed-loop targeting. Could the reference to closed-loop targeting also be further explained? 

      We agree with the reviewer that the current wording may be confusing. We have changed the wording to be clearer. We have additionally added content related to closed-loop DBS based on chronic monitoring of evoked potential responses.

      Amendments to the manuscript:

      “Furthermore, chronic monitoring of evoked fields may allow for tracking of subcortical neuronal projections as indexed by inhibitory fields reported in this study. microcircuit interaction to selectively modulate the efficacy of target synapses.”

      future applications of DBS may also benefit from closed loop tuning of basal-ganglia-thalamo-cortical circuit dynamics and plasticity through chronic monitoring of evoked potential responses [56].

      How is the burst index calculated (Methods)? 

      Thank you for pointing out that the burst index definition was missing from the paper. It has now been added to the manuscript.

      Amendments to the manuscript:

      “The burst index was computed by taking the ratio of the means from a two-component Gaussian mixture model applied to the log interspike interval distribution, a modification of the previous mode-over-mean ISI method [20]”
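
      To make the definition above concrete, the following sketch fits a two-component Gaussian mixture to log interspike intervals with a plain expectation-maximization loop and reports a ratio of the two component means. It is an illustration under stated assumptions, not the authors' implementation: the initialization, the fixed iteration count, the variance floor, and in particular the choice to express the ratio as long-ISI mean over short-ISI mean on the original time scale are all assumptions, since the manuscript text does not specify them.

```python
import math


def fit_two_gaussians(data, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, sds)."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    means = [min(data), max(data)]      # start the components at the extremes
    sds = [max(sd / 2.0, 1e-3)] * 2     # floor avoids a collapsing component
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sds)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and sd of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(var ** 0.5, 1e-3)
    return weights, means, sds


def burst_index(isis):
    """Ratio of the long-ISI to the short-ISI component mean (assumed direction)."""
    log_isis = [math.log(v) for v in isis if v > 0]
    _, means, _ = fit_two_gaussians(log_isis)
    short, long_ = sorted(means)
    # Difference of log-scale means equals the ratio of geometric-mean ISIs.
    return math.exp(long_ - short)
```

      On a synthetic spike train with short intervals near 5 ms and long intervals near 100 ms, this returns a value close to 20. A train with a unimodal interval distribution gives a value close to 1 under this convention.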

      Figures and figure captions are missing some details:

      Fig. 1 - What does shading represent? 

      The shading in Fig. 1 illustrates results that were significant before adjustment for multiple comparisons.

      Amendments to the manuscript:

      “Depicted scatterplots are results that were significant before correction for multiple comparisons”

      Fig. 2 - Can the stimulation artifact be labeled so as not to be confused with the physiological signal? Is A representing the average of all patients or just one example? Are there confidence intervals for this data as it's not clear if the curves are significantly different or not (may not be important to show if just one example)? Same for D. What is being plotted in E? Is this the exponential fitted on data? Can this be stated in the figure caption directly so readers don't have to find it in the text, where it may not be directly obvious which figure the analyses are being applied towards? 

      Thank you for your comments regarding Fig. 2. We have made the following revisions to address the concerns:

      To clarify the presence of stimulation artifacts and differentiate them from the physiological signal, we have updated Panel B and E in the updated Fig. 2 which highlight the stimulation artifacts accordingly.

      Regarding the comment about Panel A (now B in the updated figure), it represents one single example per disease, rather than an average of all patients.

      In response to the comment about what is plotted in Panel E, we have revised the figure caption to explicitly state that it includes the exponential fit on the data.

      Amendments to the manuscript:

      Figure 2 panel B and E now highlight stimulation artifacts.

      Author response image 2.

      Author response image 3.

      The figure captions could use more details, that can be taken from the text, so that readers can understand figures without searching for relevant details across the paper. 

      Thank you for your feedback. We have revised the figure captions accordingly to provide more details.

      Amendments to the manuscript:

      “Fig 1 – GPi spiketrain feature analyses and clinical correlates of PD and dystonia. With respect to (A) rate-based spiketrain features, firing rate was greater in PD while burst index (BI) and coefficient of variation (CV) were greater in dystonia; whereas no differences were found for (B) oscillatory spiketrain features for theta, alpha, low beta, high beta frequencies. MWU statistical results depicted are not corrected for multiple comparisons; after correction using the Bonferroni method, only CV and BI results remain significant (please see Supplementary Table 3). (C) In PD, the power of low beta spiketrain oscillations positively correlated (Spearman correlation) with symptom severity; in dystonia, neuronal firing rate negatively correlated with symptom severity, whereas CV and the power of theta spiketrain oscillations positively correlated with symptom severity. Depicted scatterplots are results that were significant before correction for multiple comparisons; however, none of the results persist after Benjamini-Hochberg correction for false discovery rate (please see Supplementary Table 4).”

      “Fig 2 – Long-term and short-term effects of HFS on striato-pallidal plasticity in PD and dystonia. (A) Schematic of the plasticity protocol to assess long-term plasticity via fEP amplitude comparisons pre- versus post-HFS and short-term plasticity via fEP dynamics during HFS. (B) Highlights example fEP traces for measuring long-term plasticity pre- versus post-HFS, with (C) displaying group-level fEP amplitudes pre- versus post-HFS across diseases. (D) Illustrates the amount of plasticity (i.e., percentage change in fEP amplitudes pre- versus post-HFS) in both PD and dystonia, with PD showing higher levels of plasticity. (E) Provides an example of fEP traces during HFS for assessing short-term plasticity, with (F) depicting group-level decay rates of fEP amplitudes using an exponential fit on the fEP amplitudes over the first 5 stimulus pulses across diseases. (G) Shows the half-life of the fitted exponential (i.e., rate of attenuation of fEP amplitudes) between PD and dystonia, with PD demonstrating faster fEP attenuation.”
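
      The two plasticity metrics named in this caption, percentage change in fEP amplitude (panel D) and the half-life of an exponential fitted over the first stimulus pulses (panels F and G), can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and fitting the exponential by least squares on log amplitudes is an assumption, since the response does not state which fitting routine was used.

```python
import math


def percent_change(pre, post):
    """Percentage change in fEP amplitude from pre- to post-HFS."""
    return 100.0 * (post - pre) / pre


def exponential_half_life(amplitudes):
    """Half-life, in pulses, of a = a0 * exp(-k * pulse) fitted to the data.

    The fit is a straight line through log(amplitude) versus pulse index,
    so all amplitudes must be positive and must decay overall (k > 0).
    """
    xs = list(range(len(amplitudes)))
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    k = -slope  # decay constant per pulse
    return math.log(2) / k
```

      For amplitudes that halve on every pulse, such as `[8, 4, 2, 1, 0.5]`, the half-life is one pulse. Multiplying by the inter-pulse interval of the stimulation train converts the result to time; a shorter half-life corresponds to the faster attenuation described for PD.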

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of reviewers’ comments and our revisions: 

      We thank the reviewers for their thoughtful feedback. This feedback has motivated multiple revisions and additions that, in our view, have greatly improved the manuscript. This is especially true with regard to a major goal of this study: clearly defining existing scientific perspectives and delineating their decoding implications. In addition to building on this conceptual goal, we have expanded existing analyses and have added a new analysis of generalization using a newly collected dataset. We expect the manuscript will be of very broad interest, both to those interested in BCI development and to those interested in fundamental properties of neural population activity and its relationship with behavior.

      Importantly, all reviewers were convinced that MINT provided excellent performance, when benchmarked against existing methods, across a broad range of standard tasks:

      “their method shows impressive performance compared to more traditional decoding approaches” (R1) 

      “The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method.” (R2) 

      “The fact that performance on stereotyped tasks is high is interesting and informative…” (R3)

      This is important. It is challenging to design a decoder that performs consistently across multiple domains and across multiple situations (including both decoding and neural state estimation). MINT does so. MINT consistently outperformed existing lightweight ‘interpretable’ decoders, despite being a lightweight interpretable decoder itself. MINT was very competitive with expressive machine-learning methods, yet has advantages in flexibility and simplicity that more ‘brute force’ methods do not. We made a great many comparisons, and MINT was consistently a strong performer. Of the many comparisons we made, there was only one where MINT was at a modest disadvantage, and it was for a dataset where all methods performed poorly. No other method we tested was as consistent. For example, although the GRU and the feedforward network were often competitive with MINT (and better than MINT in the one case mentioned above), there were multiple other situations where they performed less well and a few situations where they performed poorly. Moreover, no other existing decoder naturally estimates the neural state while also readily decoding, without retraining, a broad range of behavioral variables.

      R1 and R2 were very positive about the broader impacts of the study. They stressed its impact both on decoder design, and on how our field thinks, scientifically, about the population response in motor areas: 

      “This paper presents an innovative decoding approach for brain-computer interfaces” (R1)

      “presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour” (R1)

      “the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field” (R1)

      “The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality” (R2)

      “This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design.” (R2)

      “this work is also broadly impactful for neuroscientific analysis... Thus, MINT will likely impact neuroscience research generally.” (R2)

      We agree with these assessments, and have made multiple revisions to further play into these strengths. As one example, the addition of Figure 1b (and 6b) makes this the first study, to our knowledge, to fully and concretely illustrate this emerging scientific perspective and its decoding implications. This is important, because multiple observations convince us that the field is likely to move away from the traditional perspective in Figure 1a, and towards that in Figure 1b. We also agree with the handful of weaknesses R1 and R2 noted. The manuscript has been revised accordingly. The major weakness noted by R1 was the need to be explicit regarding when we suspect MINT would (and wouldn’t) work well in other brain areas. In non-motor areas, the structure of the data may be poorly matched with MINT’s assumptions. We agree that this is likely to be true, and thus agree with the importance of clarifying this topic for the reader. The revision now does so. R1 also wished to know whether existing methods might benefit from including trial-averaged data during training, something we now explore and document (see detailed responses below). R2 noted two weaknesses: 1) The need to better support (with expanded analysis) the statement that neural and behavioral trajectories are non-isometric, and 2) The need to more rigorously define the ‘mesh’. We agree entirely with both suggestions, and the revision has been strengthened by following them (see detailed responses below).

      R3 also saw strengths to the work, stating that:

      “This paper is well-structured and its main idea is clear.” 

      “The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories.” 

      “The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength.”

      However, R3 also expressed two sizable concerns. The first is that MINT might have onerous memory requirements. The manuscript now clarifies that MINT has modest memory requirements. These do not scale unfavorably as the reviewer was concerned they might. The second concern is that MINT is: 

      “essentially a table-lookup rather than a model.”

      Although we don’t agree, the concern makes sense and may be shared by many readers, especially those who take a particular scientific perspective. Pondering this concern thus gave us the opportunity to modify the manuscript in ways that support its broader impact. Our revisions had two goals: 1) clarify the ways in which MINT is far more flexible than a lookup-table, and 2) better describe the dominant scientific perspectives and their decoding implications.

      The heart of R3’s concern is the opinion that MINT is an effective but unprincipled hack suitable for situations where movements are reasonably stereotyped. Of course, many tasks involve stereotyped movements (e.g. handwriting characters), so MINT would still be useful. Nevertheless, if MINT is not principled, other decode methods would often be preferable because they could (unlike MINT in R3’s opinion) gain flexibility by leveraging an accurate model. Most of R3’s comments flow from this fundamental concern: 

      “This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model.”

      “MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks.”

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not).”

      The manuscript has been revised to clarify that MINT is considerably more flexible than a lookup table, even though a lookup table is used as a first step. Yet, on its own, this does not fully address R3’s concern. The quotes above highlight that R3 is making a standard assumption in our field: that there exists a “movement space and associated neural space”. Under this perspective, one should, as R3 argues, fully explore the movement space. This would perforce fully explore the associated neural subspace. One can then “model the neural subspace and its association to movement”. MINT does not use a model of this type, and thus (from R3’s perspective) does not appear to use a model at all. A major goal of our study is to question this traditional perspective. We have thus added a new figure to highlight the contrast between the traditional (Figure 1a) and new (Figure 1b) scientific perspectives, and to clarify their decoding implications.

      While we favor the new perspective (Figure 1b), we concede that R3 may not share our view. This is fine. Part of the reason we believe this study is timely, and will be broadly read, is that it raises a topic of emerging interest where there is definitely room for debate. If we are misguided – i.e. if Figure 1a is the correct perspective – then many of R3’s concerns would be on target: MINT could still be useful, but traditional methods that make the traditional assumptions in Figure 1a would often be preferable. However, if the emerging perspective in Figure 1b is more accurate, then MINT’s assumptions would be better aligned with the data than those of traditional methods, making it a more (not less) principled choice.

      Our study provides new evidence in support of Figure 1b, while also synthesizing existing evidence from other recent studies. In addition to Figure 2, the new analysis of generalization further supports Figure 1b. Also supporting Figure 1b is the analysis in which MINT’s decoding advantage, over a traditional decoder, disappears when simulated data approximate the traditional perspective in Figure 1a.

      That said, we agree that the present study cannot fully resolve whether Figure 1a or 1b is more accurate. Doing so will take multiple studies with different approaches (indeed we are currently preparing other manuscripts on this topic). Yet we still have an informed scientific opinion, derived from past, present and yet-to-be-published observations. Our opinion is that Figure 1b is the more accurate perspective. This possibility makes it reasonable to explore the potential virtues of a decoding method whose assumptions are well-aligned with that perspective. MINT is such a method. As expected under Figure 1b, MINT outperforms traditional interpretable decoders in every single case we studied. 

      As noted above, we have added a new generalization-focused analysis (Figure 6) based on a newly collected dataset. We did so because R3’s comments highlight a deep point: which scientific perspective one takes has strong implications regarding decoder generalization. These implications are now illustrated in the new Figure 6a and 6b. Under Figure 6a, it is possible, as R3 suggests, to explore “the whole movement space and associated neural space” during training. However, under Figure 6b, expectations are very different. Generalization will be ‘easy’ when new trajectories are near the training-set trajectories. In this case, MINT should generalize well as should other methods. In contrast, generalization will be ‘hard’ when new neural trajectories have novel shapes and occupy previously unseen regions / dimensions. In this case, all current methods, including MINT, are likely to fail. R3 points out that traditional decoders have sometimes generalized well to new tasks (e.g. from center-out to ‘pinball’) when cursor movements occur in the same physical workspace. These findings could be taken to support Figure 6a, but are equally consistent with ‘easy’ generalization in Figure 6b. To explore this topic, the new analysis in Figure 6c-g considers conditions that are intended to span the range from easy to hard. Results are consistent with the predictions of Figure 6b. 

      We believe the manuscript has been significantly improved by these additions. The revisions help the manuscript achieve its twin goals: 1) introduce a novel class of decoder that performs very well despite being very simple, and 2) describe properties of motor-cortex activity that will matter for decoders of all varieties.

      Reviewer #1: 

      Summary: 

      This paper presents an innovative decoding approach for brain-computer interfaces (BCIs), introducing a new method named MINT. The authors develop a trajectory-centric approach to decode behaviors across several different datasets, including eight empirical datasets from the Neural Latents Benchmark. Overall, the paper is well written and their method shows impressive performance compared to more traditional decoding approaches that use a simpler approach. While there are some concerns (see below), the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field. 

      We thank the reviewer for these comments. We share their enthusiasm for the trajectory-centric approach, and we are in complete agreement that this perspective has both scientific and decoding implications. The revision expands upon these strengths.

      Strengths: 

      The adoption of a trajectory-centric approach that utilizes statistical constraints presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour. This is one of the strongest aspects of the paper. 

      Again, thank you. We also expect the trajectory-centric perspective to have a broad impact, given its relevance to both decoding and to thinking about manifolds.

      The thorough evaluation of the method across various datasets serves as an assurance that the superior performance of MINT is not a result of overfitting. The comparative simplicity of the method in contrast to many neural network approaches is refreshing and should facilitate broader applicability. 

      Thank you. We were similarly pleased to see such a simple method perform so well. We also agree that, while neural-network approaches will always be important, it is desirable to also possess simple ‘interpretable’ alternatives.

      Weaknesses:  

      Comment 1) Scope: Despite the impressive performance of MINT across multiple datasets, it seems predominantly applicable to M1/S1 data. Only one of the eight empirical datasets comes from an area outside the motor/somatosensory cortex. It would be beneficial if the authors could expand further on how the method might perform with other brain regions that do not exhibit low tangling or do not have a clear trial structure (e.g. decoding of position or head direction from hippocampus) 

      We agree entirely. Population activity in many brain areas (especially outside the motor system) presumably will often not have the properties upon which MINT’s assumptions are built. This doesn’t necessarily mean that MINT would perform badly. Using simulated data, we have found that MINT can perform surprisingly well even when some of its assumptions are violated. Yet at the same time, when MINT’s assumptions don’t apply, one would likely prefer to use other methods. This is, after all, one of the broader themes of the present study: it is beneficial to match decoding assumptions to empirical properties. We have thus added a section on this topic early in the Discussion: 

      “In contrast, MINT and the Kalman filter performed comparably on simulated data that better approximated the assumptions in Figure 1a. Thus, MINT is not a ‘better’ algorithm – simply better aligned with the empirical properties of motor cortex data. This highlights an important caveat. Although MINT performs well when decoding from motor areas, its assumptions may be a poor match in other areas (e.g. the hippocampus). MINT performed well on two non-motor-cortex datasets – Area2_Bump (S1) and DMFC_RSG (dorsomedial frontal cortex) – yet there will presumably be other brain areas and/or contexts where one would prefer a different method that makes assumptions appropriate for that area.”

      Comment 2) When comparing methods, the neural trajectories of MINT are based on averaged trials, while the comparison methods are trained on single trials. An additional analysis might help in disentangling the effect of the trial averaging. For this, the authors could average the input across trials for all decoders, establishing a baseline for averaged trials. Note that inference should still be done on single trials. Performance can then be visualized across different values of N, which denotes the number of averaged trials used for training. 

      We explored this question and found that the non-MINT decoders are harmed, not helped, by the inclusion of trial-averaged responses in the training set. This is presumably because the statistics of trial-averaged responses don’t resemble what will be observed during decoding. This statistical mismatch, between training and decoding, hurts most methods. It doesn’t hurt MINT, because MINT doesn’t ‘train’ in the normal way. It simply needs to know rates, and trial-averaging is a natural way to obtain them. To describe the new analysis, we have added the following to the text.

      “We also investigated the possibility that MINT gained its performance advantage simply by having access to trial-averaged neural trajectories during training, while all other methods were trained on single-trial data. This difference arises from the fundamental requirements of the decoder architectures: MINT needs to estimate typical trajectories while other methods don’t. Yet it might still be the case that other methods would benefit from including trial-averaged data in the training set, in addition to single-trial data. Alternatively, this might harm performance by creating a mismatch, between training and decoding, in the statistics of decoder inputs. We found that the latter was indeed the case: all non-MINT methods performed better when trained purely on single-trial data.”

      Reviewer #2:

      Summary: 

      The goal of this paper is to present a new method, termed MINT, for decoding behavioral states from neural spiking data. MINT is a statistical method which, in addition to outputting a decoded behavioral state, also provides soft information regarding the likelihood of that behavioral state based on the neural data. The innovation in this approach is neural states are assumed to come from sparsely distributed neural trajectories with low tangling, meaning that neural trajectories (time sequences of neural states) are sparse in the high-dimensional space of neural spiking activity and that two dissimilar neural trajectories tend to correspond to dissimilar behavioral trajectories. The authors support these assumptions through analysis of previously collected data, and then validate the performance of their method by comparing it to a suite of alternative approaches. The authors attribute the typically improved decoding performance by MINT to its assumptions being more faithfully aligned to the properties of neural spiking data relative to assumptions made by the alternatives. 

      We thank the reviewer for this accurate summary, and for highlighting the subtle but important fact that MINT provides information regarding likelihoods. The revision includes a new analysis (Figure 6e) illustrating one potential way to leverage knowledge of likelihoods.

      Strengths:  

      The paper did an excellent job critically evaluating common assumptions made by neural analytical methods, such as neural state being low-dimensional relative to the number of recorded neurons. The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality. 

      Thank you. We also hope that the shift in perspective is the most important contribution of the study. This shift matters both scientifically and for decoder design. The revision expands on this strength. The scientific alternatives are now more clearly and concretely illustrated (especially see Figure 1a,b and Figure 6a,b). We also further explore their decoding implications with new data (Figure 6c-g).

      The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method. The authors also provided nice intuition regarding why MINT may offer performance improvement in some cases and in which instances MINT may not perform as well. 

      Thank you. We were pleased to be able to provide comparisons across so many datasets (we are grateful to the Neural Latents Benchmark for making this possible).

      In addition to providing a philosophical discussion as to the advantages of MINT and benchmarking against alternatives, the authors also provided a detailed description of practical considerations. This included training time, amount of training data, robustness to data loss or changes in the data, and interpretability. These considerations not only provided objective evaluation of practical aspects but also provided insights to the flexibility and robustness of the method as they relate back to the underlying assumptions and construction of the approach. 

      Thank you. We are glad that these sections were appreciated. MINT’s simplicity and interpretability are indeed helpful in multiple ways, and afford opportunities for interesting future extensions. One potential benefit of interpretability is now explored in the newly added Figure 6e. 

      Impact: 

      This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design. However, this work is also broadly impactful for neuroscientific analysis to relate neural spiking activity to observable behavioral features. Thus, MINT will likely impact neuroscience research generally. The methods are made publicly available, and the datasets used are all in public repositories, which facilitates adoption and validation of this method within the greater scientific community. 

      Again, thank you. We have similar hopes for this study.

      Weaknesses (1 & 2 are related, and we have switched their order in addressing them): 

      Comment 2) With regards to the idea of neural and behavioral trajectories having different geometries, this is dependent on what behavioral variables are selected. In the example for Fig 2a, the behavior is reach position. The geometry of the behavioral trajectory of interest would look different if instead the behavior of interest was reach velocity. The paper would be strengthened by acknowledgement that geometries of trajectories are shaped by extrinsic choices rather than (or as much as they are) intrinsic properties of the data. 

      We agree. Indeed, we almost added a section to the original manuscript on this exact topic. We have now done so:

      “A potential concern regarding the analyses in Figure 2c,d is that they require explicit choices of behavioral variables: muscle population activity in Figure 2c and angular phase and velocity in Figure 2d. Perhaps these choices were misguided. Might neural and behavioral geometries become similar if one chooses ‘the right’ set of behavioral variables? This concern relates to the venerable search for movement parameters that are reliably encoded by motor cortex activity [69, 92–95]. If one chooses the wrong set of parameters (e.g. chooses muscle activity when one should have chosen joint angles) then of course neural and behavioral geometries will appear non-isometric. There are two reasons why this ‘wrong parameter choice’ explanation is unlikely to account for the results in Figure 2c,d. First, consider the implications of the left-hand side of Figure 2d. A small kinematic distance implies that angular position and velocity are nearly identical for the two moments being compared. Yet the corresponding pair of neural states can be quite distant. Under the concern above, this distance would be due to other encoded behavioral variables – perhaps joint angle and joint velocity – differing between those two moments. However, there are not enough degrees of freedom in this task to make this plausible. The shoulder remains at a fixed position (because the head is fixed) and the wrist has limited mobility due to the pedal design [60]. Thus, shoulder and elbow angles are almost completely determined by cycle phase. More generally, ‘external variables’ (positions, angles, and their derivatives) are unlikely to differ more than slightly when phase and angular velocity are matched. Muscle activity could be different because many muscles act on each joint, creating redundancy. However, as illustrated in Figure 2c, the key effect is just as clear when analyzing muscle activity. Thus, the above concern seems unlikely even if it can’t be ruled out entirely. 

      A broader reason to doubt the ‘wrong parameter choice’ proposition is that it provides a vague explanation for a phenomenon that already has a straightforward explanation. A lack of isometry between the neural population response and behavior is expected when neural-trajectory tangling is low and output-null factors are plentiful [55, 60]. For example, in networks that generate muscle activity, neural and muscle-activity trajectories are far from isometric [52, 58, 60]. Given this straightforward explanation, and given repeated failures over decades to find the ‘correct’ parameters (muscle activity, movement direction, etc.) that create neural-behavior isometry, it seems reasonable to conclude that no such isometry exists.”

      Comment 1) The authors posit that neural and behavioral trajectories are non-isometric. To support this point, they look at distances between neural states and distances between the corresponding behavioral states, in order to demonstrate that there are differences in these distances in each respective space. This supports the idea that neural states and behavioral states are non-isometric but does not directly address their point. In order to say the trajectories are non-isometric, it would be better to look at pairs of distances between corresponding trajectories in each space. 

      We like this idea and have added such an analysis. To be clear, we like the original analysis too: isometry predicts that neural and behavioral distances (for corresponding pairs of points) should be strongly correlated, and that small behavioral distances should not be associated with large neural distances. Neither prediction holds, providing a strong argument against isometry. The new analysis makes the same larger point, and also reveals some additional facts (e.g. it reveals that muscle-geometry is more related to neural-geometry than is kinematic-geometry). It is described in the following section:

      “We further explored the topic of isometry by considering pairs of distances. To do so, we chose two random neural states and computed their distance, yielding dneural1. We repeated this process, yielding dneural2. We then computed the corresponding pair of distances in muscle space (dmuscle1 and dmuscle2) and kinematic space (dkin1 and dkin2). We considered cases where dneural1 was meaningfully larger than (or smaller than) dneural2, and asked whether the behavioral variables had the same relationship; e.g. was dmuscle1 also larger than dmuscle2? For kinematics, this relationship was weak: across 100,000 comparisons, the sign of dkin1 − dkin2 agreed with dneural1 − dneural2 only 67.3% of the time (with 50% being chance). The relationship was much stronger for muscles: the sign of dmuscle1 − dmuscle2 agreed with dneural1 − dneural2 79.2% of the time, which is far more than expected by chance yet also far from what is expected given isometry (e.g. the sign agrees 99.7% of the time for the truly isometric control data in Figure 2e). Indeed there were multiple moments during this task when dneural1 was much larger than dneural2, yet dmuscle1 was smaller than dmuscle2. These observations are consistent with the proposal that neural trajectories resemble muscle trajectories in some dimensions, but with additional output-null dimensions that break the isometry [60].”
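
      The pairs-of-distances comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `sign_agreement`, the Euclidean distance, and the 10% relative-gap criterion standing in for “meaningfully larger” are all assumptions, since the quoted text does not specify them.

```python
import numpy as np


def sign_agreement(neural, behavior, n_comparisons=100_000, min_rel_gap=0.1, seed=0):
    """Fraction of comparisons in which two behavioral distances are ordered
    the same way as the two corresponding neural distances.

    neural:   array (n_states, n_neurons), one neural state per row
    behavior: array (n_states, n_vars), the behavioral state at the same moments
    """
    rng = np.random.default_rng(seed)
    n_states = neural.shape[0]
    # Each comparison uses two random pairs of states: (a1, b1) and (a2, b2).
    a1, b1, a2, b2 = rng.integers(0, n_states, size=(4, n_comparisons))

    d_neural1 = np.linalg.norm(neural[a1] - neural[b1], axis=1)
    d_neural2 = np.linalg.norm(neural[a2] - neural[b2], axis=1)
    d_behav1 = np.linalg.norm(behavior[a1] - behavior[b1], axis=1)
    d_behav2 = np.linalg.norm(behavior[a2] - behavior[b2], axis=1)

    # Keep comparisons where the neural distances differ "meaningfully".
    # The relative-gap criterion is a placeholder, not the authors' threshold.
    gap = np.abs(d_neural1 - d_neural2)
    keep = gap > min_rel_gap * np.maximum(d_neural1, d_neural2)

    same_sign = np.sign(d_neural1 - d_neural2) == np.sign(d_behav1 - d_behav2)
    return same_sign[keep].mean()
```

      Under this sketch, behavioral data that are a pure rotation of the neural data (a truly isometric control) give agreement near 100%, whereas unrelated data give agreement near the 50% chance level.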

      Comment 3) The approach is built up on the idea of creating a "mesh" structure of possible states. In the body of the paper the definition of the mesh was not entirely clear and I could not find in the methods a more rigorous explicit definition. Since the mesh is integral to the approach, the paper would be improved with more description of this component. 

      This is a fair criticism. Although MINT’s actual operations were well-documented, how those operations mapped onto the term ‘mesh’ was, we agree, a bit vague. The definition of the mesh is a bit subtle because it only emerges during decoding rather than being precomputed. This is part of what gives MINT much more flexibility than a lookup table. We have added the following to the manuscript.

      “We use the term ‘mesh’ to describe the scaffolding created by the training-set trajectories and the interpolated states that arise at runtime. The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations.

      Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”

      We have also added Figure 4d. This new analysis documents the fact that decoded states are near training-set trajectories, which is why the term ‘mesh’ is appropriate.
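
      To make concrete how interpolated states can emerge at decode time rather than being precomputed, here is a rough sketch of interpolating between two candidate states according to the spiking observations. It should be read as an illustration only: the Poisson observation model, the grid search over the mixing weight, and the names `interpolate_state`, `rates_a` and `rates_b` are assumptions made for this example and are not a description of MINT's actual implementation.

```python
import numpy as np


def interpolate_state(rates_a, rates_b, spike_counts, n_grid=101):
    """Pick the point on the segment between two candidate rate vectors that
    best explains the observed spike counts (assumed Poisson).

    rates_a, rates_b: expected spike counts per neuron in the observation window
    spike_counts:     observed spike counts per neuron in that window
    Returns (alpha, rates), where rates = (1 - alpha) * rates_a + alpha * rates_b.
    """
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)

    alphas = np.linspace(0.0, 1.0, n_grid)
    # Candidate rates along the segment: shape (n_grid, n_neurons)
    rates = (1.0 - alphas)[:, None] * rates_a + alphas[:, None] * rates_b
    rates = np.clip(rates, 1e-9, None)  # avoid log(0)

    # Poisson log-likelihood, dropping the term that does not depend on alpha
    loglik = (spike_counts * np.log(rates) - rates).sum(axis=1)
    best = np.argmax(loglik)
    return alphas[best], rates[best]


# Hypothetical example: observations halfway between the two candidates
alpha, rates = interpolate_state([1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [3, 3, 3])
```

      In this sketch nothing about the interpolated state is stored in advance: only the two candidate rate vectors are, and the mixing weight is determined entirely by the observed spikes.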

      Reviewer #3:

      Summary:  

      This manuscript develops a new method termed MINT for decoding of behavior. The method is essentially a table-lookup rather than a model. Within a given stereotyped task, MINT tabulates averaged firing rate trajectories of neurons (neural states) and corresponding averaged behavioral trajectories as stereotypes to construct a library. For a test trial with a realized neural trajectory, it then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior. The method can also interpolate between these tabulated trajectories. The authors mention that the method is based on three key assumptions: (1) Neural states may not be embedded in a low-dimensional subspace, but rather in a high-dimensional space. (2) Neural trajectories are sparsely distributed under different behavioral conditions. (3) These neural states traverse trajectories in a stereotyped order.  

      The authors conducted multiple analyses to validate MINT, demonstrating its decoding of behavioral trajectories in simulations and datasets (Figures 3, 4). The main behavior decoding comparison is shown in Figure 4. In stereotyped tasks, decoding performance is comparable (M_Cycle, MC_Maze) or better (Area2_Bump) than other linear/nonlinear algorithms (Figure 4). However, MINT underperforms for the MC_RTT task, which is less stereotyped (Figure 4).  

      This paper is well-structured and its main idea is clear. The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories. The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength. However, I have several major concerns. I believe several of the conclusions in the paper, which are also emphasized in the abstract, are not accurate or supported, especially about generalization, computational scalability, and utility for BCIs. MINT is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling. These aspects will limit MINT's utility for real-world BCIs and tasks. These properties will also limit MINT's generalizability from task to task, which is important for BCIs and thus is commonly demonstrated in BCI experiments with other decoders without any retraining. Furthermore, MINT's computational and memory requirements can be prohibitive it seems. Finally, as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations. I expand on these concerns below.  

      We thank the reviewer for pointing out weaknesses in our framing and presentation. The comments above made us realize that we needed to 1) better document the ways in which MINT is far more flexible than a lookup-table, and 2) better explain the competing scientific perspectives at play. R3’s comments also motivated us to add an additional analysis of generalization. In our view the manuscript is greatly improved by these additions. Specifically, these additions directly support the broader impact that we hope the study will have.

      For simplicity and readability, we first group and summarize R3’s main concerns in order to better address them. (These main concerns are all raised above, in addition to recurring in the specific comments below. Responses to each individual specific comment are provided after these summaries.)

      (1) R3 raises concerns about ‘computational scalability.’ The concern is that “MINT's computational and memory requirements can be prohibitive.” This point was expanded upon in a specific comment, reproduced below:

      I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract.

      The revised manuscript clarifies that our statement (that computations are simple and scalable) is absolutely accurate. There is no need to compute, or store, a massive lookup table. There are three tables: two of modest size and one that is tiny. This is now better explained:

      “Thus, the log-likelihood of , for a particular current neural state, is simply the sum of many individual log-likelihoods (one per neuron and time-bin). Each individual log-likelihood depends on only two numbers: the firing rate at that moment and the spike count in that bin. To simplify online computation, one can precompute the log-likelihood, under a Poisson model, for every plausible combination of rate and spike-count. For example, a lookup table of size 2001 × 21 is sufficient when considering rates that span 0-200 spikes/s in increments of 0.1 spikes/s, and considering 20 ms bins that contain at most 20 spikes (only one lookup table is ever needed, so long as its firing-rate range exceeds that of the most-active neuron at the most active moment in Ω). Now suppose we are observing a population of 200 neurons, with a 200 ms history divided into ten 20 ms bins. For each library state, the log-likelihood of the observed spike-counts is simply the sum of 200 × 10 = 2000 individual log-likelihoods, each retrieved from the lookup table. In practice, computation is even simpler because many terms can be reused from the last time bin using a recursive solution (Methods). This procedure is lightweight and amenable to real-time applications.”
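
      To make the quoted procedure concrete, the following is a minimal Python sketch of a Poisson log-likelihood lookup table and of the per-state summation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the constant names, and the small rate floor used to avoid log(0) are hypothetical, and the recursive reuse of terms mentioned in the quote is not shown.

```python
import math

BIN_S = 0.020      # 20 ms bin width, as in the quoted example
RATE_STEP = 0.1    # rate resolution in spikes/s
N_RATES = 2001     # rates 0 to 200 spikes/s inclusive
N_COUNTS = 21      # 0 to 20 spikes per bin
MIN_RATE = 1e-3    # assumed floor (spikes/s) so that log(0) never occurs


def build_table():
    """Poisson log-likelihood for every (quantized rate, spike count) pair."""
    table = []
    for i in range(N_RATES):
        lam = max(i * RATE_STEP, MIN_RATE) * BIN_S  # expected spikes in one bin
        table.append([k * math.log(lam) - lam - math.lgamma(k + 1)
                      for k in range(N_COUNTS)])
    return table


def state_log_likelihood(table, rates, counts):
    """Sum table entries over neurons and bins for one candidate library state.

    rates[n][b]  : firing rate (spikes/s) of neuron n in history bin b
    counts[n][b] : observed spike count of neuron n in history bin b
    """
    total = 0.0
    for rate_row, count_row in zip(rates, counts):
        for rate, count in zip(rate_row, count_row):
            i = min(round(rate / RATE_STEP), N_RATES - 1)  # quantize the rate
            k = min(count, N_COUNTS - 1)                   # cap the count
            total += table[i][k]
    return total


TABLE = build_table()
```

      With 200 neurons and ten history bins, `state_log_likelihood` performs 2000 table lookups and additions per candidate state, matching the count given in the quoted text.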

      In summary, the first table simply needs to contain the firing rate of each neuron, for each condition, and each time in that condition. This table consumes relatively little memory. Assuming 100 one-second-long conditions (rates sampled every 20 ms) and 200 neurons, the table would contain 100 x 50 x 200 = 1,000,000 numbers. These numbers are typically stored as 16-bit integers (because rates are quantized), which amounts to about 2 MB. This is modest, given that most computers have (at least) tens of GB of RAM. A second table would contain the values for each behavioral variable, for each condition, and each time in that condition. This table might contain behavioral variables at a finer resolution (e.g. every millisecond) to enable decoding to update in between 20 ms bins (1 ms granularity is not needed for most BCI applications, but is the resolution used in this study). The number of behavioral variables of interest for a particular BCI application is likely to be small, often 1-2, but let’s assume for this example it is 10 (e.g. x-, y-, and z-position, velocity, and acceleration of a limb, plus one other variable). This table would thus contain 100 x 1000 x 10 = 1,000,000 floating point numbers, i.e. an 8 MB table. The third table is used to store the probability of s spikes being observed given a particular quantized firing rate (e.g. it may contain probabilities associated with firing rates ranging from 0 – 200 spikes/s in 0.1 spikes/s increments). This table is not necessary, but saves some computation time by precomputing numbers that will be used repeatedly. This is a very small table (typically ~2000 x 20, i.e. 320 KB). It does not need to be repeated for different neurons or conditions, because Poisson probabilities depend on only rate and count.
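
      The memory figures above can be checked with a few lines of arithmetic. The sketch below uses the table dimensions stated in this response; the 8-byte size for floating-point entries is an assumption inferred from the stated 8 MB and 320 KB totals, and the helper name is hypothetical.

```python
def table_bytes(n_entries, bytes_per_entry):
    """Memory footprint of a dense table, in bytes."""
    return n_entries * bytes_per_entry


# Table 1: firing rates, 100 conditions x 50 time points x 200 neurons,
# stored as 16-bit integers (2 bytes each).
rate_entries = 100 * 50 * 200
rate_bytes = table_bytes(rate_entries, 2)

# Table 2: behavior, 100 conditions x 1000 time points (1 ms) x 10 variables,
# assumed to be stored as 8-byte floats.
behavior_entries = 100 * 1000 * 10
behavior_bytes = table_bytes(behavior_entries, 8)

# Table 3: Poisson lookup, roughly 2000 rates x 20 counts,
# assumed to be stored as 8-byte floats.
poisson_entries = 2000 * 20
poisson_bytes = table_bytes(poisson_entries, 8)

total_bytes = rate_bytes + behavior_bytes + poisson_bytes

print(f"rates:    {rate_bytes / 1e6:.2f} MB")
print(f"behavior: {behavior_bytes / 1e6:.2f} MB")
print(f"poisson:  {poisson_bytes / 1e3:.0f} KB")
print(f"total:    {total_bytes / 1e6:.2f} MB")
```

      Under these assumptions the three tables together occupy roughly 10 MB.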

      (2) R3 raises a concern that MINT “is essentially a table-lookup rather than a model.” R3 states that MINT 

      “is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling.”

      and that,

      “as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations.”

      This concern is central to most subsequent concerns. The manuscript has been heavily revised to address it. The revisions clarify that MINT is much more flexible than a lookup table, even though MINT uses a lookup table as its first step. Because R3’s concern is intertwined with one’s scientific assumptions, we have also added the new Figure 1 to explicitly illustrate the two key scientific perspectives and their decoding implications. 

      Under the perspective in Figure 1a, R3 would be correct in saying that there exist traditional interpretable decoders (e.g. a Kalman filter) whose assumptions better model the data. Under this perspective, MINT might still be an excellent choice in many cases, but other methods would be expected to gain the advantage when situations demand more flexibility. This is R3’s central concern, and essentially all other concerns flow from it. It makes sense that R3 has this concern, because their comments repeatedly stress a foundational assumption of the perspective in Figure 1a: the assumption of a fixed low-dimensional neural subspace where activity has a reliable relationship to behavior that can be modeled and leveraged during decoding. The phrases below accord with that view:

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “it will not generalize… even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space”

      “I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics”

      Thus, R3 prefers a model that 1) assumes a low-dimensional subspace that is fixed across tasks and 2) assumes a consistent ‘association’ between neural activity and kinematics. Because R3 believes this is the correct model of the data, they believe that decoders should leverage it. Traditional interpretable methods do, and MINT doesn’t, which is why they find MINT to be unprincipled. This is a reasonable view, but it is not our view. We have heavily revised the manuscript to clarify that a major goal of our study is to explore the implications of a different, less-traditional scientific perspective.

      The new Figure 1a illustrates the traditional perspective. Under this perspective, one would agree with R3’s claim that other methods have the opportunity to model the data better. For example, suppose there exists a consistent neural subspace – conserved across tasks – where three neural dimensions encode 3D hand position and three additional neural dimensions encode 3D hand velocity. A traditional method such as a Kalman filter would be a very appropriate choice to model these aspects of the data.

      Figure 1b illustrates the alternative scientific perspective. This perspective arises from recent, present, and to-be-published observations. MINT’s assumptions are well-aligned with this perspective. In contrast, the assumptions of traditional methods (e.g. the Kalman filter) are not well-aligned with the properties of the data under this perspective. This does not mean traditional methods are not useful. Yet under Figure 1b, it is traditional methods, such as the Kalman filter, that lack an accurate model of the data. Of course, the reviewer may disagree with our scientific perspective. We would certainly concede that there is room for debate. However, we find the evidence for Figure 1b to be sufficiently strong that it is worth exploring the utility of methods that align with this scientific perspective. MINT is such a method. As we document, it performs very well.

      Thus, in our view, MINT is quite principled because its assumptions are well aligned with the data. It is true that the features of the data that MINT models are a bit different from those that are traditionally modeled. For example, R3 is quite correct that MINT does not attempt to use a biomimetic model of the true transformation from neural activity, to muscle activity, and thence to kinematics. We see this as a strength, and the manuscript has been revised accordingly (see paragraph beginning with “We leveraged this simulated data to compare MINT with a biomimetic decoder”).

      (3) R3 raises concerns that MINT cannot generalize. This was a major concern of R3 and is intimately related to concern #2 above. The concern is that, if MINT is “essentially a lookup table” that simply selects pre-defined trajectories, then MINT will not be able to generalize. R3 is quite correct that MINT generalizes rather differently than existing methods. Whether this is good or bad depends on one’s scientific perspective. Under Figure 1a, MINT’s generalization would indeed be limiting because other methods could achieve greater flexibility. Under Figure 1b, all methods will have serious limits regarding generalization. Thus, MINT’s method for generalizing may approximate the best one can presently do. To address this concern, we have made three major changes, numbered i-iii below:

      i) Large sections of the manuscript have been restructured to underscore the ways in which MINT can generalize. A major goal was to counter the impression, stated by R3 above, that: 

      “for a test trial with a realized neural trajectory, [MINT] then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior”.

      This description is a reasonable way to initially understand how MINT works, and we concede that we may have over-used this intuition. Unfortunately, it can leave the misimpression that MINT decodes by selecting whole trajectories, each corresponding to ‘a behavior’. This can happen, but it needn’t and typically doesn’t. As an example, consider the cycling task. Suppose that the library consists of stereotyped trajectories, each four cycles long, at five fixed speeds from 0.5-2.5 Hz. If the spiking observations argued for it, MINT could decode something close to one of these five stereotyped trajectories. Yet it needn’t. Decoded trajectories will typically resemble library trajectories locally, but may be very different globally. For example, a decoded trajectory could be thirty cycles long (or two, or five hundred) perhaps speeding up and slowing down multiple times across those cycles.

      Thus, the library of trajectories shouldn’t be thought of as specifying a limited set of whole movements that can be ‘selected from’. Rather, trajectories define a scaffolding that outlines where the neural state is likely to live and how it is likely to be changing over time. When we introduce the idea of library trajectories, we are now careful to stress that they don’t function as a set from which one trajectory is ‘declared’ to be the right one:

      “We thus designed MINT to approximate that manifold using the trajectories themselves, rather than their covariance matrix or corresponding subspace. Unlike a covariance matrix, neural trajectories indicate not only which states are likely, but also which state-derivatives are likely. If a neural state is near previously observed states, it should be moving in a similar direction. MINT leverages this directionality.

      Training-set trajectories can take various forms, depending on what is convenient to collect. Most simply, training data might include one trajectory per condition, with each condition corresponding to a discrete movement. Alternatively, one might instead employ one long trajectory spanning many movements. Another option is to employ many sub-trajectories, each briefer than a whole movement. The goal is simply for training-set trajectories to act as a scaffolding, outlining the manifold that might be occupied during decoding and the directions in which decoded trajectories are likely to be traveling.”

      Later in that same section we stress that decoded trajectories can move along the ‘mesh’ in non-stereotyped ways:

      “Although the mesh is formed of stereotyped trajectories, decoded trajectories can move along the mesh in non-stereotyped ways as long as they generally obey the flow-field implied by the training data. This flexibility supports many types of generalization, including generalization that is compositional in nature. Other types of generalization – e.g. from the green trajectories to the orange trajectories in Figure 1b – are unavailable when using MINT and are expected to be challenging for any method (as will be documented in a later section).”

      The section “Training and decoding using MINT” has been revised to clarify the ways in which interpolation is flexible, allowing decoded movements to be globally very different from any library trajectory.

      “To decode stereotyped trajectories, one could simply obtain the maximum-likelihood neural state from the library, then render a behavioral decode based on the behavioral state with the same values of c and k. This would be appropriate for applications in which conditions are categorical, such as typing or handwriting. Yet in most cases we wish for the trajectory library to serve not as an exhaustive set of possible states, but as a scaffolding for the mesh of possible states. MINT’s operations are thus designed to estimate any neural trajectory – and any corresponding behavioral trajectory – that moves along the mesh in a manner generally consistent with the trajectories in Ω.”

      “…interpolation allows considerable flexibility. Not only is one not ‘stuck’ on a trajectory from Φ, one is also not stuck on trajectories created by weighted averaging of trajectories in Φ. For example, if cycling speed increases, the decoded neural state could move steadily up a scaffolding like that illustrated in Figure 1b (green). In such cases, the decoded trajectory might be very different in duration from any of the library trajectories. Thus, one should not think of the library as a set of possible trajectories that are selected from, but rather as providing a mesh-like scaffolding that defines where future neural states are likely to live and the likely direction of their local motion. The decoded trajectory may differ considerably from any trajectory within Ω.”
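
      The simplest case described in the first of the two passages quoted above (take the maximum-likelihood neural state from the library, then read out the behavioral state with the same values of c and k) can be sketched as follows. This is a hedged illustration only: the function names, the nested-list data layout, and the rate floor are hypothetical, and the sketch deliberately omits the interpolation that the second passage describes, so it does not capture MINT's full flexibility.

```python
import math

BIN_S = 0.020  # assumed bin width in seconds


def poisson_loglik(rate_hz, count):
    """Log-probability of `count` spikes in one bin at the given rate."""
    lam = max(rate_hz, 1e-3) * BIN_S  # assumed floor avoids log(0)
    return count * math.log(lam) - lam - math.lgamma(count + 1)


def decode_from_library(neural_lib, behavior_lib, recent_counts):
    """Return (c, k, behavior) for the maximum-likelihood library state.

    neural_lib[c][k][n] : rate (spikes/s) of neuron n at time index k, condition c
    behavior_lib[c][k]  : behavioral state stored for the same c and k
    recent_counts[b][n] : spike counts over the recent history, oldest bin first
    """
    n_hist = len(recent_counts)
    best, best_ll = None, -math.inf
    for c, trajectory in enumerate(neural_lib):
        # k indexes the candidate *current* state; earlier bins supply history.
        for k in range(n_hist - 1, len(trajectory)):
            ll = 0.0
            for b in range(n_hist):
                rates = trajectory[k - n_hist + 1 + b]
                ll += sum(poisson_loglik(r, s)
                          for r, s in zip(rates, recent_counts[b]))
            if ll > best_ll:
                best, best_ll = (c, k), ll
    c, k = best
    return c, k, behavior_lib[c][k]
```

      Because the search runs over every time index k within every condition c, the selected state is a point on a trajectory rather than a whole trajectory, which is consistent with the authors' point that decoding need not follow any single library trajectory from start to finish.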

      This flexibility is indeed used during movement. One empirical example is described in detail:

      “During movement… angular phase was decoded with effectively no net drift over time. This is noteworthy because angular velocity on test trials never perfectly matched any of the trajectories in Φ. Thus, if decoding were restricted to a library trajectory, one would expect growing phase discrepancies. Yet decoded trajectories only need to locally (and approximately) follow the flow-field defined by the library trajectories. Based on incoming spiking observations, decoded trajectories speed up or slow down (within limits).

      This decoding flexibility presumably relates to the fact that the decoded neural state is allowed to differ from the nearest state in Ω. To explore… [the text goes on to describe the new analysis in Figure 4d, which shows that the decoded state is typically not on any trajectory, though it is typically close to a trajectory].”

      Thus, MINT’s operations allow considerable flexibility, including generalization that is compositional in nature. Yet R3 is still correct that there are other forms of generalization that are unavailable to MINT. This is now stressed at multiple points in the revision. However, under the perspective in Figure 1b, these forms of generalization are unavailable to any current method. Hence we made a second major change in response to this concern…

      ii) We explicitly illustrate how the structure of the data determines when generalization is or isn’t possible. The new Figure 1a,b introduces the two perspectives, and the new Figure 6a,b lays out their implications for generalization. Under the perspective in Figure 6a, the reviewer is quite right: other methods can generalize in ways that MINT cannot. Under the perspective in Figure 6b, expectations are very different. Those expectations make testable predictions. Hence the third major change…

      iii) We have added an analysis of generalization, using a newly collected dataset. This dataset was collected using Neuropixels Probes during our Pac-Man force-tracking task. This dataset was chosen because it is unusually well-suited to distinguishing the predictions in Figure 6a versus Figure 6b. Finding a dataset that can do so is not simple. Consider R3’s point that training data should “explore the whole movement space and the associated neural space”. The physical simplicity of the Pac-Man task makes it unusually easy to confirm that the behavioral workspace has been fully explored. Importantly, under Figure 6b, this does not mean that the neural workspace has been fully explored, which is exactly what we wish to test when testing generalization. We do so, and compare MINT with a Wiener filter. A Wiener filter is an ideal comparison because it is simple, performs very well on this task, and should be able to generalize well under Figure 1a. Additionally, the Wiener filter (unlike the Kalman Filter) doesn’t leverage the assumption that neural activity reflects the derivative of force. This matters because we find that neural activity does not reflect dforce/dt in this task. The Wiener filter is thus the most natural choice of the interpretable methods whose assumptions match Figure 1a.

      The new analysis is described in Figure 6c-g and accompanying text. Results are consistent with the predictions of Figure 6b. We are pleased to have been motivated to add this analysis for two reasons. First, it provides an additional way of evaluating the predictions of the two competing scientific perspectives that are at the heart of our study. Second, this analysis illustrates an underappreciated way in which generalization is likely to be challenging for any decode method. It can be tempting to think that the main challenge regarding generalization is to fully explore the relevant behavioral space. This makes sense if a behavioral space has “an associated neural space”. However, we are increasingly of the opinion that it doesn’t. Different tasks often involve different neural subspaces, even when behavioral subspaces overlap. We have even seen situations where motor output is identical but neural subspaces are quite different. These facts are relevant to any decoder, something highlighted in the revised Introduction:

      “MINT’s performance confirms that there are gains to be made by building decoders whose assumptions match a different, possibly more accurate view of population activity. At the same time, our results suggest fundamental limits on decoder generalization. Under the assumptions in Figure 1b, it will sometimes be difficult or impossible for decoders to generalize to not-yet-seen tasks. We found that this was true regardless of whether one uses MINT or a more traditional method. This finding has implications regarding when and how generalization should be attempted.”

      We have also added an analysis (Figure 6e) illustrating how MINT’s ability to compute likelihoods can be useful in detecting situations that may strain generalization (for any method). MINT is unusual in being able to compute and use likelihoods in this way.

      Detailed responses to R3: we reproduce each of R3’s specific concerns below, but concentrate our responses on issues not already covered above.

      Main comments: 

      Comment 1. MINT does not generalize to different tasks, which is a main limitation for BCI utility compared with prior BCI decoders that have shown this generalizability as I review below. Specifically, given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space). 

      First, the authors provide a section on generalization, which is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task. The former is critical for any algorithm, but it does not imply the latter. For example, removing one direction of cycling from the training set as the authors do here is an example of generating poor training data because the two behavioral (and neural) directions are non-overlapping and/or orthogonal while being in the same space. As such, it is fully expected that all methods will fail. For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not). Many BCI studies have indeed shown this generalization ability using a model. For example, in Weiss et al. 2019, center-out reaching tasks are used for training and then the same trained decoder is used for typing on a keyboard or drawing on the 2D screen. In Gilja et al. 2012, training is on a center-out task but the same trained decoder generalizes to a completely different pinball task (hit four consecutive targets) and tasks requiring the avoidance of obstacles and curved movements. There are many more BCI studies, such as Jarosiewicz et al. 2015 that also show generalization to complex real-world tasks not included in the training set. Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement. On the contrary, MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks. So, unlike these prior BCI methods, MINT will likely actually need to include every task in its library, which is not practical. 

      I suggest the authors remove claims of generalization and modify their arguments throughout the text and abstract. The generalization section needs to be substantially edited to clarify the above points. Please also provide the BCI citations and discuss the above limitation of MINT for BCIs. 

      As discussed above, R3’s concerns are accurate under the view in Figure 1a (and the corresponding Figure 6a). Under this view, a method such as that in Gilja et al. or Jarosiewicz et al. can find the correct subspace, model the correct neuron-behavior correlations, and generalize to any task that uses “the same 2D computer screen and associated neural space”, just as the reviewer argues. Under Figure 1b things are quite different.

      This topic – and the changes we have made to address it – is covered at length above. Here we simply want to highlight an empirical finding: sometimes two tasks use the same neural subspace and sometimes they don’t. We have seen both in recent data, and it can be very non-obvious which will occur based just on behavior. It does not simply relate to whether one is using the same physical workspace. We have even seen situations where the patterns of muscle activity in two tasks are nearly identical, but the neural subspaces are fairly different. When a new task uses a new subspace, neither of the methods noted above (Gilja nor Jarosiewicz) will generalize (nor will MINT). Generalizing to a new subspace is basically impossible without some yet-to-be-invented approach. On the other hand, there are many other pairs of tasks (center-out-reaching versus some other 2D cursor control) where subspaces are likely to be similar, especially if the frequency content of the behavior is similar (in our recent experience this is often critical). When subspaces are shared, most methods will generalize, and that is presumably why generalization worked well in the studies noted above.

      Although MINT can also generalize in such circumstances, R3 is correct that, under the perspective in Figure 1a, MINT will be more limited than other methods. This is now carefully illustrated in Figure 6a. In this traditional perspective, MINT will fail to generalize in cases where new trajectories are near previously observed states, yet move in very different ways from library trajectories. The reason we don’t view this as a shortcoming is that we expect it to occur rarely (else tangling would be high). We thus anticipate the scenario in Figure 6b.

      This is worth stressing because R3 states that our discussion of generalization “is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task.” We have heavily revised this section and improved it. However, it was never inaccurate. Under Figure 6b, these two concepts absolutely are mixed up. If different tasks use different neural subspaces, then this requires collecting different “informative training data” for each. One cannot simply count on having explored the physical workspace.

      Comment 2. MINT is shown to achieve competitive/high performance in highly stereotyped datasets with structured trials, but worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped. This shows that MINT is valuable for decoding in repetitive stereotyped use-cases. However, it also highlights a limitation of MINT for BCIs, which is that MINT may not work well for real-world and/or less-constrained setups such as typing, moving a robotic arm in 3D space, etc. This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model. Indeed, the authors acknowledge that the lower performance on MC_RTT (Figure 4) may be caused by the lack of repeated trials of the same type. However, real-world BCI decoding scenarios will also not have such stereotyped trial structure and will be less/un-constrained, in which MINT underperforms. Thus, the claim in the abstract or lines 480-481 that MINT is an "excellent" candidate for clinical BCI applications is not accurate and needs to be qualified. The authors should revise their statements accordingly and discuss this issue. They should also make the use-case of MINT on BCI decoding clearer and more convincing. 

      We discussed, above, multiple changes and additions to the revision that were made to address these concerns. Here we briefly expand on the comment that MINT achieves “worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped”. All decoders performed poorly on this task. MINT still outperformed the two traditional methods, but this was the only dataset where MINT did not also perform better (overall) than the expressive GRU and feedforward network. There are probably multiple reasons why. We agree with R3 that one likely reason is that this dataset is straining generalization, and MINT may have felt this strain more than the two machine-learning-based methods. Another potential reason is the structure of the training data, which made it more challenging to obtain library trajectories in the first place. Importantly, these observations do not support the view in Figure 1a. MINT still outperformed the Kalman and Wiener filters (whose assumptions align with Fig. 1a). To make these points we have added the following:

      “Decoding was acceptable, but noticeably worse, for the MC_RTT dataset… As will be discussed below, every decode method achieved its worst estimates of velocity for the MC_RTT dataset. In addition to the impact of slower reaches, MINT was likely impacted by training data that made it challenging to accurately estimate library trajectories. Due to the lack of repeated trials, MINT used AutoLFADS to estimate the neural state during training. In principle this should work well. In practice AutoLFADS may have been limited by having only 10 minutes of training data. Because the random-target task involved more variable reaches, it may also have stressed the ability of all methods to generalize, perhaps for the reasons illustrated in Figure 1b.

      The only dataset where MINT did not perform the best overall was the MC_RTT dataset, where it was outperformed by the feedforward network and GRU. As noted above, this may relate to the need for MINT to learn neural trajectories from training data that lacked repeated trials of the same movement (a design choice one might wish to avoid). Alternatively, the less-structured MC_RTT dataset may strain the capacity to generalize; all methods experienced a drop in velocity-decoding R2 for this dataset compared to the others. MINT generalizes somewhat differently than other methods, and may have been at a modest disadvantage for this dataset. A strong version of this possibility is that perhaps the perspective in Figure 1a is correct, in which case MINT might struggle because it cannot use forms of generalization that are available to other methods (e.g. generalization based on neuron-velocity correlations). This strong version seems unlikely; MINT continued to significantly outperform the Wiener and Kalman filters, which make assumptions aligned with Figure 1a.”

      Comment 3. Related to 2, it may also be that MINT achieves competitive performance in offline and trial-based stereotyped decoding by overfitting to the trial structure in a given task, and thus may not generalize well to online performance due to overfitting. For example, a recent work showed that offline decoding performance may be overfitted to the task structure and may not represent online performance (Deo et al. 2023). Please discuss. 

      We agree that a limitation of our study is that we do not test online performance. There are sensible reasons for this decision:

      “By necessity and desire, all comparisons were made offline, enabling benchmarked performance across a variety of tasks and decoded variables, where each decoder had access to the exact same data and recording conditions.”

      We recently reported excellent online performance in the cycling task with a different algorithm (Schroeder et al. 2022). In the course of that study, we consistently found that improvements in our offline decoding translated to improvements in our online decoding. We thus believe that MINT (which improves on the offline performance of our older algorithm) is a good candidate to work very well online. Yet we agree this still remains to be seen. We have added the following to the Discussion:

      “With that goal in mind, there exist three important practical considerations. First, some decode algorithms experience a performance drop when used online. One presumed reason is that, when decoding is imperfect, the participant alters their strategy which in turn alters the neural responses upon which decoding is based. Because MINT produces particularly accurate decoding, this effect may be minimized, but this cannot be known in advance. If a performance drop does indeed occur, one could adapt the known solution of retraining using data collected during online decoding [13]. Another presumed reason (for a gap between offline and online decoding) is that offline decoders can overfit the temporal structure in training data [107]. This concern is somewhat mitigated by MINT’s use of a short spike-count history, but MINT may nevertheless benefit from data augmentation strategies such as including time-dilated versions of learned trajectories in the libraries.”

      Comment 4. Related to 2, since MINT requires firing rates to generate the library and simple averaging does not work for this purpose in the MC_RTT dataset (that does not have repeated trials), the authors needed to use AutoLFADS to infer the underlying firing rates. The fact that MINT requires the usage of another model to be constructed first and that this model can be computationally complex, will also be a limiting factor and should be clarified. 

      This concern relates to the computational complexity of computing firing-rate trajectories during training. Usually, rates are estimated via trial-averaging, which makes MINT very fast to train. This was quite noticeable during the Neural Latents Benchmark competition. As one example, for the “MC_Scaling 5 ms Phase”, MINT took 28 seconds to train while GPFA took 30 minutes, the transformer baseline (NDT) took 3.5 hours, and the switching nonlinear dynamical system took 4.5 hours.

      However, the reviewer is quite correct that MINT’s efficiency depends on the method used to construct the library of trajectories. As we note, “MINT is a method for leveraging a trajectory library, not a method for constructing it”. One can use trial-averaging, which is very fast. One can also use fancier, slower methods to compute the trajectories. We don’t view this as a negative – it simply provides options. Usually one would choose trial-averaging, but one does not have to. In the case of MC_RTT, one has a choice between LFADS and grouping into pseudo-conditions and averaging (which is fast). LFADS produces higher performance at the cost of being slower. The operator can choose which they prefer. This is discussed in the following section:

      “For MINT, ‘training’ simply means computation of standard quantities (e.g. firing rates) rather than parameter optimization. MINT is thus typically very fast to train (Table 1), on the order of seconds using generic hardware (no GPUs). This speed reflects the simple operations involved in constructing the library of neural-state trajectories: filtering of spikes and averaging across trials. At the same time we stress that MINT is a method for leveraging a trajectory library, not a method for constructing it. One may sometimes wish to use alternatives to trial-averaging, either of necessity or because they improve trajectory estimates. For example, for the MC_RTT task we used AutoLFADS to infer the library. Training was consequently much slower (hours rather than seconds) because of the time taken to estimate rates. Training time could be reduced back to seconds using a different approach – grouping into pseudo-conditions and averaging – but performance was reduced. Thus, training will typically be very fast, but one may choose time-consuming methods when appropriate.”
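
      As an illustration of why trial-averaging is cheap, the following is a minimal sketch (not the authors' MATLAB/MEX code) of building a small trajectory library by averaging single-trial rates within each condition. The function name `trial_average`, the condition labels, and the toy numbers are all hypothetical, and the sketch assumes trials have already been filtered and aligned to a common length.

```python
from statistics import fmean

def trial_average(trials):
    """Average single-trial rates across trials of one condition.

    trials: list of trials; each trial is a list of time bins;
            each time bin is a list of per-neuron rates (spikes/s).
    Returns one trajectory with the same [time][neuron] layout.
    """
    n_time = len(trials[0])
    n_neurons = len(trials[0][0])
    return [
        [fmean(trial[t][n] for trial in trials) for n in range(n_neurons)]
        for t in range(n_time)
    ]

# Hypothetical toy data: 2 conditions, 2 trials each, 3 time bins, 2 neurons.
rates_by_condition = {
    "reach_left": [
        [[10.0, 2.0], [20.0, 4.0], [30.0, 6.0]],
        [[14.0, 4.0], [24.0, 6.0], [34.0, 8.0]],
    ],
    "reach_right": [
        [[5.0, 40.0], [5.0, 30.0], [5.0, 20.0]],
        [[7.0, 44.0], [7.0, 34.0], [7.0, 24.0]],
    ],
}

# "Training" here is a single pass of averaging, with no parameter optimization.
library = {cond: trial_average(trials) for cond, trials in rates_by_condition.items()}
```

      Swapping `trial_average` for a slower rate estimator would change only how `library` is produced, which is the sense in which the library's construction is separate from its use.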

      Comment 5. I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract. 

      As discussed above, the manuscript has been revised to clarify that our statement was accurate.

      Comment 6. In addition to the above technical concerns, I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics (e.g., fixed points/limit cycles). While it is of course valid and even insightful to propose different assumptions from existing models as the authors do here, they do not actually translate these assumptions into a new model. Without a model and by just tabulating the data, I don't believe we can provide interpretation or advance the understanding of the fundamentals behind neural computations. As such, I am not clear as to how this library building approach can advance neuroscience or how these assumptions are useful. I think the authors should clarify and discuss this point. 

      As requested, a major goal of the revision has been to clarify the scientific motivations underlying MINT’s design. In addition to many textual changes, we have added figures (Figures 1a,b and 6a,b) to outline the two competing scientific perspectives that presently exist. This topic is also addressed by extensions of existing analyses and by new analyses (e.g. Figure 6c-g). 

      In our view these additions have dramatically improved the manuscript. This is especially true because we think R3’s concerns, expressed above, are reasonable. If the perspective in Figure 1a is correct, then R3 is right and MINT is essentially a hack that fails to model the data. MINT would still be effective in many circumstances (as we show), but it would be unprincipled. This would create limitations, just as the reviewer argues. On the other hand, if the perspective in Figure 1b is correct, then MINT is quite principled relative to traditional approaches. Traditional approaches make assumptions (a fixed subspace, consistent neuron-kinematic correlations) that are not correct under Figure 1b.

      We don’t expect R3 to agree with our scientific perspective at this time (though we hope to eventually convince them). To us, the key is that we agree with R3 that the manuscript needs to lay out the different perspectives and their implications, so that readers have a good sense of the possibilities they should be considering. The revised manuscript is greatly improved in this regard.

      Comment 7. Related to 6, there seems to be a logical inconsistency between the operations of MINT and one of its three assumptions, namely, sparsity. The authors state that neural states are sparsely distributed in some neural dimensions (Figure 1a, bottom). If this is the case, then why does MINT extend its decoding scope by interpolating known neural states (and behavior) in the training library? This interpolation suggests that the neural states are dense on the manifold rather than sparse, thus being contradictory to the assumption made. If interpolation-based dense meshes/manifolds underlie the data, then why not model the neural states through the subspace or manifold representations? I think the authors should address this logical inconsistency in MINT, especially since this sparsity assumption also questions the low-dimensional subspace/manifold assumption that is commonly made. 

      We agree this is an important issue, and have added an analysis on this topic (Figure 4d). The key question is simple and empirical: during decoding, does interpolation cause MINT to violate the assumption of sparsity? R3 is quite right that in principle it could. If spiking observations argue for it, MINT’s interpolation could create a dense manifold during decoding rather than a sparse one. The short answer is that empirically this does not happen, in agreement with expectations under Figure 1b. Rather than interpolating between distant states and filling in large ‘voids’, interpolation is consistently local. This is a feature of the data, not of the decoder (MINT doesn’t insist upon sparsity, even though it is designed to work best in situations where the manifold is sparse).

      In addition to adding Figure 4d, we added the following (in an earlier section):

      “The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations. Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”
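
      To make the idea of local interpolation concrete, here is a minimal sketch of choosing an interpolation weight α between two candidate library states by maximizing a Poisson log-likelihood of the observed spike counts. This is an assumption-laden illustration rather than MINT's actual procedure: the grid search, the single time bin, and the names `poisson_loglik` and `best_alpha` are all hypothetical, and rates are assumed to be strictly positive.

```python
import math

def poisson_loglik(counts, rates, dt):
    """Log-likelihood of spike counts given rates (spikes/s) and bin width dt (s)."""
    total = 0.0
    for k, r in zip(counts, rates):
        lam = r * dt  # expected count in the bin
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def best_alpha(counts, state_a, state_b, dt, n_grid=101):
    """Grid-search alpha in [0, 1] for the state (1 - alpha) * a + alpha * b."""
    best_ll, best = -math.inf, 0.0
    for i in range(n_grid):
        alpha = i / (n_grid - 1)
        rates = [(1 - alpha) * a + alpha * b for a, b in zip(state_a, state_b)]
        ll = poisson_loglik(counts, rates, dt)
        if ll > best_ll:
            best_ll, best = ll, alpha
    return best

# Two nearby (hypothetical) library states for a 2-neuron population.
state_a = [10.0, 40.0]
state_b = [30.0, 20.0]
# Observed counts in a 1 s bin lie midway between the two states.
alpha = best_alpha([20, 30], state_a, state_b, dt=1.0)
```

      In this toy case the counts match the midpoint of the two states, so the search settles on α = 0.5. The interpolated state lies on the line segment joining two neighbouring states, which is the mesh-like picture described in the quoted passage.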

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I appreciate the detailed methods section, however, more specifics should be integrated into the main text. For example on Line 238, it should additionally be stated how many minutes were used for training and metrics like the MAE which is used later should be reported here.

      Thank you for this suggestion. We now report the duration of training data in the main text:

      “Decoding R^2 was .968 over ~7.1 minutes of test trials based on ~4.4 minutes of training data.”

      We have also added similar specifics throughout the manuscript, e.g. in the Fig. 5 legend:

      “Results are based on the following numbers of training / test trials: MC\_Cycle (174 train, 99 test), MC\_Maze (1721 train, 574 test), Area2\_Bump (272 train, 92 test), MC\_RTT (810 train, 268 test).”

      Similar additions were made to the legends for Fig. 6 and 8. Regarding the request to add MAE for the multitask network, we did not do so for the simple reason that the decoded variable (muscle activity) has arbitrary units. The raw MAE is thus not meaningful. We could of course have normalized, but at this point the MAE is largely redundant with the correlation. In contrast, the MAE is useful when comparing across the MC_Maze, Area2_Bump, and MC_RTT datasets, because they all involve the same scale (cm/s).

      Regarding the MC_RTT task, AutoLFADS was used to obtain robust spike rates, as reported in the methods. However, the rationale for splitting the neural trajectories after AutoLFADS is unclear. If the trajectories were split based on random recording gaps, this might lead to suboptimal performance? It might be advantageous to split them based on a common behavioural state? 

      When learning neural trajectories via AutoLFADS, spiking data is broken into short (but overlapping) segments, rates are estimated for each segment via AutoLFADS, and these rates are then stitched together across segments into long neural trajectories. If there had been no recording gaps, these rates could have been stitched into a single neural trajectory for this dataset. However, the presence of recording gaps left us no choice but to stitch together these rates into more than one trajectory. Fortunately, recording gaps were rare: for the decoding analysis of MC_RTT there were only two recording gaps and therefore three neural trajectories, each ~2.7 minutes in duration.

      We agree that in general it is desirable to learn neural trajectories that begin and end at behaviorally relevant moments (e.g. in between movements). However, having these trajectories potentially end mid-movement is not an issue in and of itself. During decoding, MINT is never stuck on a trajectory. Thus, if MINT were decoding states near the end of a trajectory that was cut short due to a training gap, it would simply begin decoding states from other trajectories or elsewhere along the same trajectory in subsequent moments. We could have further trimmed the three neural trajectories to begin and end at behaviorally relevant moments, but chose not to as this would have only removed a handful of potentially useful states from the library.

      We now describe this in the Methods:

      “Although one might prefer trajectory boundaries to begin and end at behaviorally relevant moments (e.g. a stationary state), rather than at recording gaps, the exact boundary points are unlikely to be consequential for trajectories of this length that span multiple movements. If MINT estimates a state near the end of a long trajectory, its estimate will simply jump to another likely state on a different trajectory (or earlier along the same trajectory) in subsequent moments. Clipping the end of each trajectory to an earlier behaviorally-relevant moment would only remove potentially useful states from the libraries.”

      Are the training and execution times in Table 1 based on pure Matlab functions or Mex files? If it's Mex files as suggested by the code, it would be good to mention this in the Table caption.

      They are based on a combination of MATLAB and MEX files. This is now clarified in the table caption:

      “Timing measurements taken on a Macbook Pro (on CPU) with 32GB RAM and a 2.3 GHz 8-Core Intel Core i9 processor. Training and execution code used for measurements was written in MATLAB (with the core recursion implemented as a MEX file).”

      As the method most closely resembles a Bayesian decoder it would be good to compare performance against a Naive Bayes decoder. 

      We agree and have now done so. The following has been added to the text:

      “A natural question is thus whether a simpler Bayesian decoder would have yielded similar results. We explored this possibility by testing a Naïve Bayes regression decoder [85] using the MC_Maze dataset. This decoder performed poorly, especially when decoding velocity (R2 = .688 and .093 for hand position and velocity, respectively), indicating that the specific modeling assumptions that differentiate MINT from a naive Bayesian decoder are important drivers of MINT’s performance.”

      Line 199 Typo: The assumption of stereotypy trajectory also enables neural states (and decoded behaviors) to be updated in between time bins. 

      Fixed

      Table 3: It's unclear why the Gaussian binning varies significantly across different datasets. Could the authors explain why this is the case and what its implications might be? 

      We have added the following description in the “Filtering, extracting, and warping data on each trial” subsection of the Methods to discuss how 𝜎 may vary due to the number of trials available for training and how noisy the neural data for those trials is:

      “First, spiking activity for each neuron on each trial was temporally filtered with a Gaussian to yield single-trial rates. Table 3 reports the Gaussian standard deviations σ (in milliseconds) used for each dataset. Larger values of σ utilize broader windows of spiking activity when estimating rates and therefore reduce variability in those rate estimates. However, large σ values also yield neural trajectories with less fine-grained temporal structure. Thus, the optimal σ for a dataset depends on how variable the rate estimates otherwise are.”
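
      The trade-off described above can be illustrated with a minimal sketch of Gaussian filtering applied to a binned spike train. It is not the authors' implementation: the function names, the 1 ms bins, the truncation at 4σ, the edge renormalization, and the simulated 20 spikes/s neuron are all assumptions made for the example.

```python
import math
import random
from statistics import pvariance

def gaussian_kernel(sigma_ms, bin_ms=1.0, truncate=4.0):
    """Normalized Gaussian weights, truncated at `truncate` standard deviations."""
    half = int(truncate * sigma_ms / bin_ms + 0.5)
    weights = [math.exp(-0.5 * (i * bin_ms / sigma_ms) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_spikes(spikes, sigma_ms, bin_ms=1.0):
    """Convert binned spike counts into a smoothed rate estimate (spikes/s)."""
    kernel = gaussian_kernel(sigma_ms, bin_ms)
    half = len(kernel) // 2
    n = len(spikes)
    rates = []
    for t in range(n):
        acc, norm = 0.0, 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < n:  # renormalize near the edges of the trial
                acc += w * spikes[idx]
                norm += w
        rates.append(acc / norm / (bin_ms / 1000.0))
    return rates

# Simulated neuron firing at a constant 20 spikes/s (probability 0.02 per 1 ms bin).
rng = random.Random(0)
spikes = [1 if rng.random() < 0.02 else 0 for _ in range(4000)]

narrow = smooth_spikes(spikes, sigma_ms=10.0)
wide = smooth_spikes(spikes, sigma_ms=50.0)
# The underlying rate is constant, so variance across time is pure estimation noise.
noise_narrow, noise_wide = pvariance(narrow), pvariance(wide)
```

      Because the simulated rate is constant, any fluctuation in the smoothed trace is estimation noise, and the wider kernel yields the smaller `noise_wide`. With real data the same widening would also blur genuine temporal structure, which is why the preferred σ can differ across datasets.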

      An implementation of the method in an open-source programming language could further enhance the widespread use of the tool. 

      We agree this would be useful, but have not yet implemented the method in any other programming languages. Implementation in Python is still a future goal.

      Reviewer #2 (Recommendations For The Authors): 

      - Figures 4 and 5 should show the error bars on the horizontal axis rather than portraying them vertically. 

      [Note that these are now Figures 5 and 6]

      The figure legend of Figure 5 now clarifies that the vertical ticks are simply to aid visibility when symbols have very similar means and thus overlap visually. We don’t include error bars (for this analysis) because they are very small and would mostly be smaller than the symbol sizes. Instead, to indicate certainty regarding MINT’s performance measurements, the revised text now gives error ranges for the correlations and MAE values in the context of Figure 4c. These error ranges were computed as the standard deviation of the sampling distribution (computed via resampling of trials) and are thus equivalent to SEMs. The error ranges are all very small; e.g. for the MC_Maze dataset the MAE for x-velocity is 4.5 +/- 0.1 cm/s (error bars on the correlations are smaller still).

      Thus, for a given dataset, we can be quite certain of how well MINT performs (within ~2% in the above case). This is reassuring, but we also don’t want to overemphasize this accuracy. The main sources of variability one should be concerned about are: 1) different methods can perform differentially well for different brain areas and tasks, 2) methods can decode some behavioral variables better than others, and 3) performance depends on factors like neuron-count and the number of training trials, in ways that can differ across decode methods. For this reason, the study examines multiple datasets, across tasks and brain areas, and measures performance for a range of decoded variables. We also examine the impact of training-set-size (Figure 8a) and population size (solid traces in Fig. 8b, see R2’s next comment below). 
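
      The resampling-based error ranges mentioned above can be sketched as follows. This is a generic bootstrap over trials, not the authors' code: the function name, the number of resamples, and the simulated per-trial errors are hypothetical, and the sketch treats the MAE as a mean of per-trial absolute errors for simplicity.

```python
import math
import random
from statistics import fmean, pstdev, stdev

def bootstrap_mean_and_sd(per_trial_values, statistic=fmean, n_resamples=1000, seed=0):
    """Resample trials with replacement and summarize the sampling distribution.

    Returns (mean of the resampled statistics, their standard deviation).
    The standard deviation plays the role of a standard error.
    """
    rng = random.Random(seed)
    n = len(per_trial_values)
    estimates = []
    for _ in range(n_resamples):
        resample = [per_trial_values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return fmean(estimates), stdev(estimates)

# Hypothetical per-trial absolute velocity errors (cm/s) for 200 test trials.
data_rng = random.Random(1)
abs_errors = [abs(data_rng.gauss(4.5, 1.0)) for _ in range(200)]

mae, mae_sd = bootstrap_mean_and_sd(abs_errors)
# For a simple mean, the bootstrap SD should be close to the usual SEM formula.
analytic_sem = pstdev(abs_errors) / math.sqrt(len(abs_errors))
```

      The comparison with `analytic_sem` shows why the SD of the sampling distribution can be read as an SEM. The same routine applies to statistics without a closed-form standard error by passing a different `statistic`.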

      There is one other source of variance one might be concerned about, but it is specific to the neural-network approaches: different weight initializations might result in different performance. For this reason, each neural-network approach was trained ten times, with the average performance computed. The variability around this average was very small, and this is now stated in the Methods.

      “For the neural networks, the training/testing procedure was repeated 10 times with different random seeds. For most behavioral variables, there was very little variability in performance across repetitions. However, there were a few outliers for which variability was larger. Reported performance for each behavioral group is the average performance across the 10 repetitions to ensure results were not sensitive to any specific random initialization of each network.”

      - For Figure 6, it is unclear whether the neuron-dropping process was repeated multiple times. If not, it should be since the results will be sensitive to which particular subsets of neurons were "dropped". In this case, the results presented in Figure 6 should include error bars to describe the variability in the model performance for each decoder considered. 

      A good point. The results in Figure 8 (previously Figure 6) were computed by averaging over the removal of different random subsets of neurons (50 subsets per neuron count), just as the reviewer requests. The figure has been modified to include the standard deviation of performance across these 50 subsets. The legend clarifies how this was done.

      Reviewer #3 (Recommendations For The Authors): 

      Other comments: 

      (1) [Line 185-188] The authors argue that in a 100-dimensional space with 10 possible discretized values, 10^100 potential neural states need to be computed. But I am not clear on this. This argument seems to hold only in the absence of a model (as in MINT). For a model, e.g., Kalman filter or AutoLFADS, information is encoded in the latent state. For example, a simple Kalman filter for a linear model can be used for efficient inference. This 10^100 computation isn't a general problem but seems MINT-specific, please clarify. 

      We agree this section was potentially confusing. It has been rewritten. We were simply attempting to illustrate why maximum likelihood computations are challenging without constraints. MINT simplifies this problem by adding constraints, which is why it can readily provide data likelihoods (and can do so using a Poisson model). The rewritten section is below:

      “Even with 1000 samples for each of the neural trajectories in Figure 3, there are only 4000 possible neural states for which log-likelihoods must be computed (in practice it is fewer still, see Methods). This is far fewer than if one were to naively consider all possible neural states in a typical rate- or factor-based subspace. It thus becomes tractable to compute log-likelihoods using a Poisson observation model. A Poisson observation model is usually considered desirable, yet can pose tractability challenges for methods that utilize a continuous model of neural states. For example, when using a Kalman filter, one is often restricted to assuming a Gaussian observation model to maintain computational tractability.”
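
      The tractability argument can be illustrated with a minimal sketch that evaluates a Poisson log-likelihood for every state in a small, finite library and returns the most likely one. This is a simplification under stated assumptions, not MINT itself: it uses a single time bin rather than a spike-count history, performs no interpolation, and the library contents and function names are hypothetical.

```python
import math

def poisson_loglik(counts, rates, dt):
    """Poisson log-likelihood of spike counts given rates (spikes/s), bin width dt (s)."""
    return sum(k * math.log(r * dt) - r * dt - math.lgamma(k + 1)
               for k, r in zip(counts, rates))

def most_likely_state(counts, library, dt):
    """Exhaustively score every library state; cost grows linearly with library size.

    library: dict mapping condition -> list of states (per-neuron rates).
    Returns ((condition, index), log-likelihood) of the best-scoring state.
    """
    best_key, best_ll = None, -math.inf
    for cond, trajectory in library.items():
        for k, rates in enumerate(trajectory):
            ll = poisson_loglik(counts, rates, dt)
            if ll > best_ll:
                best_key, best_ll = (cond, k), ll
    return best_key, best_ll

# Hypothetical library: 2 trajectories x 3 samples for a 2-neuron population.
library = {
    "A": [[10.0, 10.0], [20.0, 10.0], [30.0, 10.0]],
    "B": [[10.0, 30.0], [20.0, 30.0], [30.0, 30.0]],
}
n_states = sum(len(traj) for traj in library.values())

# Counts observed in a 100 ms bin: 3 spikes from neuron 1, 1 spike from neuron 2.
key, ll = most_likely_state([3, 1], library, dt=0.1)
```

      Here only `n_states` likelihood evaluations are needed, and the best match is the third state of trajectory "A", whose expected counts equal the observed ones. The same loop over 4 trajectories of 1000 samples each would involve 4000 evaluations.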

      (2) [Figure 6b] Why do the authors set the dropped neurons to zero in the "zeroed" results of the robustness analysis? Why not disregard the dropped neurons during the decoding process? 

      We agree the terminology we had used in this section was confusing. We have altered the figure and rewritten the text. The following, now at the beginning of that section, addresses the reviewer’s query: 

      “It is desirable for a decoder to be robust to the unexpected loss of the ability to detect spikes from some neurons. Such loss might occur while decoding, without being immediately detected. Additionally, one desires robustness to a known loss of neurons / recording channels. For example, there may have been channels that were active one morning but are no longer active that afternoon. At least in principle, MINT makes it very easy to handle this second situation: there is no need to retrain the decoder, one simply ignores the lost neurons when computing likelihoods. This is in contrast to nearly all other methods, which require retraining because the loss of one neuron alters the optimal parameters associated with every other neuron.”

      The figure has been relabeled accordingly; instead of the label ‘zeroed’, we use the label ‘undetected neuron loss’.
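
      The distinction between known and undetected neuron loss can be sketched with a toy likelihood-based lookup. This is a deliberately extreme three-neuron example under stated assumptions and says nothing about how MINT behaves on real populations: the states, the counts, and the `active` mask argument are hypothetical, and the constant log(k!) term is dropped because it does not depend on the candidate state.

```python
import math

def loglik(counts, rates, dt, active=None):
    """Poisson log-likelihood up to a state-independent constant.

    Neurons flagged False in `active` are skipped entirely.
    """
    total = 0.0
    for n, (k, r) in enumerate(zip(counts, rates)):
        if active is not None and not active[n]:
            continue
        total += k * math.log(r * dt) - r * dt
    return total

def decode(counts, states, dt, active=None):
    """Index of the candidate state with the highest log-likelihood."""
    return max(range(len(states)), key=lambda i: loglik(counts, states[i], dt, active))

# Two hypothetical candidate states (rates in spikes/s) for 3 neurons.
states = [[50.0, 20.0, 10.0], [5.0, 10.0, 20.0]]
dt = 0.1  # 100 ms bin

intact = decode([5, 2, 1], states, dt)      # all neurons recorded
undetected = decode([0, 2, 1], states, dt)  # neuron 0 lost, its zeros taken at face value
known = decode([0, 2, 1], states, dt, active=[False, True, True])  # neuron 0 ignored
```

      With the lost neuron's zeros taken at face value, the toy decoder switches to the wrong state; masking that neuron restores the original answer without changing any stored rates. No retraining step appears because the library entries for the remaining neurons are unchanged.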

      (3) Authors should provide statistical significance on their results, which they already did for Fig. S3a,b,c but missing on some other figures/places. 

      We have added error bars in some key places, including in the text when quantifying MINT’s performance in the context of Figure 4. Importantly, error bars are only as meaningful as the source of error they assess, and there are reasons to be careful given this. The standard method for putting error bars on performance is to resample trials, which is indeed what we now report. These error bars are very small. For example, when decoding horizontal velocity for the MC_Maze dataset, the correlation between MINT’s decode and the true velocity had a mean and SD of the sampling distribution of 0.963 +/- 0.001. This means that, for a given dataset and target variable, we have enough trials/data that we can be quite certain of how well MINT performs. However, we want to be careful not to overstate this certainty. What one really wants to know is how well MINT performs across a variety of datasets, brain areas, target variables, neuron counts, etc. It is for this reason that we make multiple such comparisons, which provides a more valuable view of performance variability.

      For Figure 7, error bars are unavailable. Because this was a benchmark, there was exactly one test-set that was never seen before. This is thus not something that could be resampled many times (that would have revealed the test data and thus invalidated the benchmark, not to mention that some of these methods take days to train). We could, in principle, have added resampling to Figure 5. In our view it would not be helpful and could be misleading for the reasons noted above. If we computed standard errors using different train/test partitions, they would be very tight (mostly smaller than the symbol sizes), which would give the impression that one can be quite certain of a given R^2 value. Yet variability in the train/test partition is not the variability one is concerned about in practice. In practice, one is concerned about whether one would get a similar R^2 for a different dataset, or brain area, or task, or choice of decoded variable. Our analysis thus concentrated on showing results across a broad range of situations. In our view this is a far more relevant way of illustrating the degree of meaningful variability (which is quite large) than resampling, which produces reassuringly small but (mostly) irrelevant standard errors.

      Error bars are supplied in Figure 8b. These error bars give a sense of variability across re-samplings of the neural population. While this is not typically the source of variability one is most concerned about, for this analysis it becomes appropriate to show resampling-based standard errors because a natural concern is that results may depend on which neurons were dropped. So here it is both straightforward, and desirable, to compute standard errors. (The fact that MINT and the Wiener filter can be retrained many times swiftly was also key – this isn’t true of the more expressive methods). Figure S1 also uses resampling-based confidence intervals for similar reasons.

      (4) [Line 431-437] Authors state that MINT outperforms other methods with the PSTH R^2 metric (trial-averaged smoothed spikes for each condition). However, I think this measure may not provide a fair comparison and is confounded because MINT's library is built using PSTH (i.e., averaged firing rate) but other methods do not use the PSTH. The author should clarify this. 

      The PSTH R^2 metric was not created by us; it was part of the Neural Latents Benchmark. They chose it because it ensures that a method cannot ‘cheat’ (on the Bits/Spike measure) by reproducing fine features of spiking while estimating rates badly. We agree with the reviewer’s point: MINT’s design does give it a potential advantage in this particular performance metric. This isn’t a confound though, just a feature. Importantly, MINT will score well on this metric only if MINT’s neural state estimate is accurate (including accuracy in time). Without accurate estimation of the neural state at each time, it wouldn’t matter that the library trajectory is based on PSTHs. This is now explicitly stated:

      “This is in some ways unsurprising: MINT estimates neural states that tend to resemble (at least locally) trajectories ‘built’ from training-set-derived rates, which presumably resemble test-set rates. Yet strong performance is not a trivial consequence of MINT’s design. MINT does not ‘select’ whole library trajectories; PSTH R2 will be high only if condition (c), index (k), and the interpolation parameter (α) are accurately estimated for most moments.”

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful and positive assessment of our manuscript. Our findings are perhaps best summarized in the model below, which shows that KDM5 inhibition/loss mediates a viral mimicry and DNA damage response through the generation of R-loops in genomic repeats. This mechanism differs from the better-studied double-stranded RNA-induced “viral mimicry” response. Our studies also suggest that KDM5 inhibition may have a larger therapeutic window than STING agonists, since KDM5 inhibition seemingly does not induce “viral mimicry” in normal breast epithelial cells.

      Author response image 1.

      Model of viral mimicry activation. De-repression of repetitive elements may trigger dsRNA formation, which activates the RIG-I/MDA5 pathway, as well as PKR. Alternatively, de-repression of these elements may induce transcription-replication conflicts (TRCs), resulting in R-loop formation. R-loops can lead to DNA damage, and/or activate the cGAS/STING pathway. Both the MAVS pathway and the cGAS/STING pathway converge to activate type I interferon (IFN) responses, resulting in decreased cell fitness and/or increased immunogenicity.

      We do agree with the assessment that the study would be strengthened by in vivo studies. However, there are 4 different isoforms of KDM5 (3 in females), and existing KDM5-specific inhibitors do not have adequate PK/PD properties for in vivo studies. We would also like to note that most mouse studies have not been proven to accurately predict immunotherapy responses in patients. Future studies in ex vivo tumor models would strengthen the clinical relevance of these studies. In the interim, we have added some normal macrophage studies in Figure S5 and an example of studies in normal T-cells below. Such studies will also be important to ensure that future KDM5 inhibitors do not have adverse effects on the immune system. Here, we observe that KDM5 inhibition appears to have a neutral or slightly negative effect on T-cell viability (Author response image 2a). However, KDM5 inhibition also results in increased CD107a expression in T-cells, indicative of a more cytotoxic phenotype (Author response image 2b). These studies suggest that KDM5 inhibitors do not have significant adverse effects on T-cells or macrophages (Figure S5) in the normal immune environment.

      Author response image 2.

      KDM5 inhibition does not have significant adverse effects on T-cells. a) Fold change in proliferation of T-cells from 2 different human donors (left and right panels on graph) activated with 0.25 µg/ml CD3 and treated with the indicated concentrations of C48 or a positive control (CBLB) compared to vehicle controls. b) FACS plots and histograms of CD107a surface expression (x-axis) versus forward scatter (FSC, y-axis) of T-cells from 2 different human donors activated with 0.25 µg/ml or 0.5 µg/ml CD3 and treated with the indicated concentrations of C48.

      Specific comments and answers to Reviewer #1:

      We have added some additional analysis of data from other breast cancer cell lines to strengthen our points (Figure S2f, Figure S3e, Figure S4g-h, k). We have also uploaded all the data to GEO with the following accession numbers:

      GSE296387: H3K4me3 CUT-and-Tag data

      GSE296584: S9.6 CUT-and-Tag data

      GSE296974: RNA-sequencing data

      Responses to Reviewer #1 (Recommendations for the authors):

      (1) We have not conducted genomic studies comparing KDM5 expression to retroelement activation status in the tumor data sets but recognize that this is important for future studies. Again, there are several KDM5 isoforms and looking at repeat expression in these larger data sets is complex. We have added some data correlating KDM5 expression with ISG signatures in Figure S3j-l as well as in the graph below (Author response image 3). The correlation with ISG and AP signatures is modest, but strongest for KDM5B and C in breast cancer data sets, consistent with our disruption data for these 2 isoforms. As mentioned above, we do agree that future studies of KDM5s along with a broader analysis of other epigenetic modifying enzymes over repeats in various cancer types will shed light on the role of histone modifying enzymes in suppressing “viral mimicry” in tumors.

      Author response image 3.

      Correlation between gene expression and IFN gene set GSVA scores in breast cancer cell lines. a) Pearson correlation score between gene expression and IFN signature (ISG) gene set variation analysis (GSVA) scores in breast cancer cell lines as reported in DepMap. Higher ranks indicate an inverse correlation between expression of the individual gene and the expression of the ISG gene set. Correlation ranks for KDM5A, B and C are highlighted. b) as in a), but comparing gene expression to antigen presentation (AP) GSVA scores.
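      The ranking described in this legend can be illustrated with a minimal sketch. It assumes a genes-by-cell-lines expression matrix and one GSVA score per cell line; the function name and the ranking convention (rank 1 = most positive Pearson correlation, so that high ranks correspond to inverse correlation) are illustrative assumptions, not the analysis code used for the figure.

```python
import numpy as np

def rank_genes_by_correlation(expr, scores):
    """Pearson r between each gene (row of expr) and a per-cell-line score.

    Hypothetical helper: rank 1 is the most positive correlation, so the
    highest ranks mark genes whose expression is inversely correlated
    with the gene-set score.
    """
    expr = np.asarray(expr, dtype=float)
    scores = np.asarray(scores, dtype=float)
    ex = expr - expr.mean(axis=1, keepdims=True)   # center each gene
    sc = scores - scores.mean()                    # center the scores
    r = ex @ sc / (np.linalg.norm(ex, axis=1) * np.linalg.norm(sc))
    order = np.argsort(-r)                         # most positive first
    ranks = np.empty(len(r), dtype=int)
    ranks[order] = np.arange(1, len(r) + 1)
    return r, ranks
```

      With three toy genes across four cell lines, a gene that falls as the score rises receives the highest rank.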

      (2) We apologize for the mislabeling in Figure 2B; this has been corrected in the revised version.

      (3) We agree that blocking the cGAS/STING pathway only partially rescues the ISRE-GFP and HLA-A, B, C phenotype in HCC1428 cells. We have added data (Figure S2f) showing that this rescue is stronger in MCF7 cells. It is possible that the MDA5/MAVS pathway may also contribute to activation of the Type I interferon response. However, we have data that MAVS plays a minor (if any) role in this context, as MAVS KO minimally decreases C48-induced ISRE-GFP activity and HLA-A, B, C surface expression in HCC1428 cells (added Figure S2g).

      Furthermore, there is no significant increase in dsRNA observed (using J2 antibody as a readout in immunofluorescence experiments) with C48 treatment as compared to 5’-azacytidine treatment or ADAR K/O (data not included). However, we have not performed MAVS/PKR K/O experiments to completely rule out the involvement of the dsRNA sensing pathways.

      (4) These experiments were performed on the Operetta imaging system rather than by confocal imaging, and therefore we do not have such images. Quantification of RNaseH1-GFP in the whole cell is reported in the figure, as RNaseH1-GFP signal is increased in both the nucleus and the cytoplasm with C48 treatment. This is not unexpected, as our data suggest that R-loop formation occurs in repetitive regions of the genome that are de-repressed by KDM5 inhibition in the nucleus, and the RNA/DNA hybrids generated from R-loops may activate the cGAS/STING pathway in the cytoplasm.

      (5) Disruption of XPF and XPG is relatively toxic in itself. Complete knockouts in breast cancer cells were not viable, so we partially knocked down XPF using siRNA instead. We do agree that these kinds of rescue studies need to be expanded upon in future studies, but they served as further proof of the conclusions presented here.

      (6) We have provided all the data in GEO, and alternative representations can be made.

      (7) Unfortunately, CUT-and-Tag experiments were not performed in cells expressing siXPF and therefore we cannot provide this data. However, XPF has been previously shown to be responsible for excising R-loops from the genome, rendering them detectable by cGAS/STING in the cytoplasm (Crossley et al, 2022, referenced in the current MS). Therefore, while we demonstrate that XPF knockdown attenuates type I IFN pathway activation upon KDM5 inhibition, it may not necessarily reduce R-loop formation in retroelements; it may just prevent their excision and downstream cGAS/STING activation. We do agree that CUT-and-Tag experiments in cells treated with siXPF versus siControl will have to be performed in the future to test this hypothesis.

      Responses to Reviewer #2 (Recommendations for the authors):

      (1) We have modified the text as well as the figure legend to state that this is a simplistic representation of the pathway in normal cells. As stated in the introduction, these pathways can be modified in tumors. The data presented suggest that the dsRNA pathway can be activated in all breast cancer cell lines tested, whereas more variation is observed in the activation of the STING pathway.  

      (2) The ADAR guides target ADAR p110 and p150 but not ADAR2. This has been clarified in the text.

      (3) The guides have been renamed in the figure as the reviewer suggests.  

      (4) It has been shown by others that KDM5 can occupy the STING promoter (https://pubmed.ncbi.nlm.nih.gov/30080846/), which supports the reviewer’s suggestion that STING upregulation in HMECs may be due to increased H3K4me3 at the STING gene. However, we argue that STING upregulation is not sufficient to activate “viral mimicry” in normal cells because they lack the “tumor-specific R-loops” that result from increased TRCs in tumor cells. It is interesting to note that the S9.6 signal in subtelomeric regions is increased in HMECs, similar to what is observed in tumor cells. However, the S9.6 signal over other repeats is not (Author response image 4), suggesting that C48-induced increases over non-telomeric repeats are tumor specific. This suggests that the tumor-specific increases in R-loop formation, which lead to “viral mimicry” activation, are not driven by those formed in subtelomeric regions. Future studies will have to expand on these findings.

      Author response image 4.

      Percent of S9.6 reads that align to repetitive genome in HMEC cells. (a) % of total aligned S9.6 reads that map to subtelomeric region in HMEC cells treated with DMSO or 2.5 μM C48. (b) % of total aligned S9.6 reads that map to repetitive elements in general in HMEC cells treated as in a).

      (5) Clarification of the R-loop quantification has been added to the figure legend as well as to the Materials and Methods section. Mean fluorescence intensity in the whole cell (this includes both nuclear and cytoplasmic signals) was quantified and normalized to the number of DAPI-stained nuclei per well. As mentioned above, all quantification was performed on the Operetta imaging system.

      (6) We have added some data showing that increases in H3K4me3 are observed in and around ISGs upon KDM5 inhibition (Figure S4f). However, without time course experiments it is difficult to assess whether these are direct effects of the KDM5 inhibitor or indirect effects from activation of Type I IFN (similar to what has previously been reported with 5’-azacytidine induction of “viral mimicry”, https://pubmed.ncbi.nlm.nih.gov/26317465/).

      (7) We have previously included data showing that S9.6 reads in repeats that do not display C48-mediated increases in H3K4me3 also do not increase with C48 treatment (this is now Figure S4o). In addition, we have added some data showing that repeats with increased H3K4me3 and repeats with increased transcription upon C48 treatment also have increased S9.6 reads. Repeats that display increases in both H3K4me3 and mRNA expression have even greater increases in S9.6 signal than repeats that have increases in only one (Figure S4m-n). Taken together, these data suggest that KDM5 inhibition increases H3K4me3 in repeats, thereby allowing for their transcription, which can increase the probability of transcription-replication conflicts (TRCs) and R-loop formation at such loci.

      (8) As mentioned earlier in this response, while we observe increased S9.6 reads in subtelomeric regions of HCC1428 cells upon KDM5 inhibition, we also observe this in normal HMEC cells. Since KDM5 inhibition does not induce viral mimicry in HMEC cells, this suggests that R-loops formed in subtelomeric regions do not dictate the response observed with C48 treatment in breast cancer cells.

      We hope that these answers to the reviewers’ comments, as well as the additional data provided, strengthen our findings.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      We sincerely value the insightful and constructive feedback (italicized) provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. In response to these valuable comments, we have significantly revised the manuscript to enhance clarity and accuracy. Specifically, we have corrected an oversight related to the robot’s velocity and secondary antibody ratios, and addressed previously missing values in Figs. 3E and 4E. Importantly, these corrections did not alter the outcomes of our results. Additionally, we have enriched our manuscript with new data analyses, as reflected in Figures 1B, 1F, 2H-J, 4D, 4F-H, S1A, S1C-E, S3H, S5, and Table 1, ensuring a more comprehensive presentation of our findings. Below are our responses detailing each comment and explaining the modifications integrated into the revised manuscript.

      Reviewer 1:

      (1) To address the question of whether PAG photostimulation biases the cells that respond to the robot, a counterbalanced experiment, in which the BLA activity is initially recorded during the foraging vs. robot test and the PAG stimulation happens at the end of the session, should have been performed.

      In our study, we investigated fear behavior and BLA cell responses to intrinsic dPAG photostimulation (320 pulses) in naïve animals, followed by their reactions to an extrinsic predatory robot. We recognize the reviewer's concern regarding the potential  influence of initial dPAG photostimulation on BLA neuron responses to the robot. We address this issue in our discussion (pg. 13) as follows: “However, it is crucial to consider the recent discovery that optogenetic stimulation of CA3 neurons (3000 pulses) leads to gain-of-function changes in CA3-CA3 recurrent (monosynaptic) excitatory synapses (Oishi et al., 2019). Although there is no direct connection between dPAG neurons and the BLA (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), and no studies have yet demonstrated gain-of-function changes in polysynaptic pathways to our knowledge, the potential for our dPAG photostimulation (320 pulses) to induce similar changes in amygdalar neurons, thereby enhancing their sensitivity to predatory threats, cannot be dismissed.”

      (2) In Figure 3, it is unclear which criteria (e.g. response latency, minimum Z score, spike fidelity) was used to identify the BLA neurons that were indirectly activated by PAG stimulation. A graphic containing at least the distribution of the response latencies for each BLA neuron after PAG laser activation is needed.

      We have specified the criteria for determining the responsiveness of BLA neurons to dPAG stimulation on page 22. This involves analyzing the first 500-ms post-stimulation across five 0.1-s bins. Units were classified as ‘stim cells’ if they showed z-scores greater than 3 (z > 3) in any of the bins during the initial 500-ms period post-stimulation. Neurons activated by both pellet procurement and dPAG stimulation were not included in the 'stim cell' category. Additionally, we have included a graphic in the revised manuscript (Fig. S3C) that presents the distribution of response latencies of BLA neurons to dPAG stimulation.
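      The classification rule above can be sketched as follows. This is a minimal illustration, assuming the z-scores for the five 0.1-s post-stimulation bins have already been computed; the function name and the `pellet_responsive` flag are hypothetical and do not come from the manuscript's analysis code.

```python
import numpy as np

def is_stim_cell(z_post, z_threshold=3.0, pellet_responsive=False):
    """Apply the 'stim cell' criterion to one unit.

    z_post: z-scores of the five 0.1-s bins spanning the first 500 ms
            after dPAG stimulation onset.
    pellet_responsive: hypothetical flag for units that also respond to
            pellet procurement; such units are excluded.
    """
    z_post = np.asarray(z_post, dtype=float)
    if z_post.shape != (5,):
        raise ValueError("expected five 0.1-s bins covering the first 500 ms")
    if pellet_responsive:
        return False
    # A unit qualifies if any single bin exceeds the threshold (z > 3).
    return bool(np.any(z_post > z_threshold))
```

      Note that the comparison is strict, so a bin with z exactly equal to 3 does not qualify under this reading of "z > 3".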

      (3) To strengthen the claim that it is a BLA-PVT-PAG circuit that carries information about predatory threat, a new experiment using CTB and cFos could be used to demonstrate that PAG neurons that project to PVT are recruited during the robot exposure.

      Our study primarily aimed to explore the transmission of threat signals between the dPAG and BLA. We acknowledge that our evidence for the PVT’s intermediary role, derived from CTB injections in the BLA and subsequent CTB+cFos co-labeling analysis in the PVT (Fig. 4G and 4H), is limited. Accordingly, we have moderated the emphasis on the PVT’s involvement in both the abstract and introduction. We now present the PVT’s role as a promising direction for future research in the discussion section of our revised manuscript.

      (4) In Fig 2, the authors' interpretation is that photostimulation of PAG neurons elicits fleeing responses in the rats. However, there is a vast literature demonstrating that the PAG is also involved in nociception. Although this is recognized by the authors in the first part of the introduction and briefly described in the discussion, the authors should more explicitly explain that PAG stimulation produces analgesia and thus is unlikely to underlie the escaping responses observed. This may not be intuitive for a broader audience.

      We appreciate the reviewer's insightful suggestion to elaborate on the PAG involvement in nociception and analgesia, as supported by the literature. While our initial manuscript acknowledged these functions, we have now expanded our discussion to address the PAG’s multifaceted roles (pg. 12): “As mentioned in the introduction, the dPAG is recognized as part of the ascending nociceptive pathway to the BLA (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). The dPAG is also implicated in non-opioid analgesia (e.g., Bagley and Ingram 2020, Cannon et al. 1982, Fields 2000). However, it is essential to emphasize that, despite its roles in pain modulation, the primary behavior observed in dPAG-stimulated, naive rats foraging for food in an open arena was goal-directed escape to the safe nest, underscoring the dPAG’s critical function in survival behaviors.” Note that this aligns with human studies on PAG stimulation (e.g., Carrive and Morgan 2012, Magierek et al. 2003), particularly those by Amano et al. (Amano et al. 1982), which reported patients feeling an urge to escape, similar to being chased, upon PAG stimulation.

      (5) To truly demonstrate the functional links between the PAG and BLA, more experiments are needed. For example, one could record from BLA neurons during the robot surge while performing optogenetic inhibition of the PAG neurons. There is also no evidence that activity in the indirect pathway that connects the PAG to the BLA is indispensable for the expression of defensive responses towards the robot (e.g., causality tests using chemogenetic or optogenetic inactivation).

      We agree that incorporating optogenetic inhibition of PAG neurons while simultaneously recording from BLA neurons during a robot surge would strengthen the evidence for the functional connectivity between the PAG and BLA. Such an experiment would necessitate the transfection and photoinhibition of a wide array of dPAG neurons responsive to predatory threats. This procedure is technically more viable in transgenic mouse models, given their suitability for genetic manipulation. In light of this, and in response to the suggestions in the Joint Public Review, we have revised the abstract, introduction, and discussion to offer a more cautious interpretation of our findings. This revision reflects a careful consideration of both the evidence and the limitations inherent in our study (pg. 13): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG is necessary for the BLA’s response to predatory threats. To establish causality, it is essential to conduct experiments such as optogenetic inhibition to determine whether the dPAG is indispensable for activating BLA neurons and initiating escape behavior in the face of threats. The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993), suggests the need for future studies using transgenic mouse models. Should inactivation of the dPAG negate the BLA's response to predatory threats, it would underscore the dPAG's central role in this defensive mechanism. Conversely, if BLA responses remain unaffected by dPAG inactivation, this could indicate the existence of multiple pathways for antipredatory defense mechanisms.”

      (6) The manuscript lacks information about the number of rats and trials that were used across the experiments (e.g. Fig 2G-J). In some occasions, the authors start the experiments with a specific number of animals and then reduce the N by half without providing a rationale (e.g. Fig. 3). Equally confusing is the experimental timeline. For example: a) Were the pre-robot, robot, and post-robot sessions always performed within the same day? b) It was described that microdrivable arrays were used, but did the same rats experienced the robot test more than one time? c) How many bins were used for normalization during the Z-score calculation and when were the data binned at 100 ms versus 1 s? d) How many trials were used for each analysis? For example, to identify robot cells, did the authors establish a minimum number of trials per animal to calculate the peristimulus time histograms? Having a significant number of trials is critical to make sure that the observed neuronal responses are replicable across the trials. e) How was the neuronal activity related to "pellet retrieval" aligned during robot sessions? Was the activity aligned with the moment in which the rat touches the pellet or when the animal returns to the nest with the pellet? f) How did the authors control for trials in which the rat consumed the pellets in the same local vs. those in which they returned to the nest to eat it? All these points are extremely important for future replicability.

      We apologize for any confusion caused by the initial lack of detail in our experimental procedures. The revised manuscript has been updated with comprehensive methodological details:  

      (i) The study involved thirteen rats (ChR2, n = 9; EYFP, n = 4), subjected to dPAG stimulation using fixed light parameters (473 nm, 20 Hz, 10-ms pulse width, 2 s duration) during Long and Short pellet distance trials (refer to Fig. 2E-G). The stimulation intensity was adjusted to each animal's response (fleeing behavior), ranging from 1-3 mW. Additional testing occurred over multiple days, with incremental adjustments to stimulation parameters (intensity, frequency, duration) after confirming normal baseline foraging behavior (Fig. 2H-J, at x = 0). These details are now clearly depicted in the manuscript.

      (ii) The primary objective was to investigate BLA neuron responses to dPAG opto-stimulation. Six rats were initially tested, with three later assessed for their reactions to dPAG stimulation in the presence of an actual predator, to gauge behavioral effects.

      (iii) Regarding the experimental timeline:

      a) Pre-robot, robot, and post-robot sessions were conducted successively on the same day.

      b) Sessions with the robot predator were repeated until habituation occurred or until unit recordings were deemed invalid due to microdrive limitations or the absence of unit detection. Throughout these sessions, the success rate for pellet retrieval remained consistently low. Specifically, the mean success rate for the dPAG recordings was 2.803% ± 1.311. For the BLA recordings, animals did not succeed in retrieving pellets during any of the robot trials. To provide a more detailed account of the methodology, the manuscript has been updated to include the number of recording days and the units recorded in the "Behavioral Procedures" section.

      c) As described in Materials and Methods, unit recording data were binned at 0.1-s intervals and normalized against a 5-s pre-event baseline (50 bins). For statistical analyses in Figure 1F’s rightmost column, 1-s bins were used to simplify post-hoc analysis corrections.

      d) Each recording session consisted of 5-15 trials. Trials were excluded if rats attempted to procure the pellet within 10 s post-dPAG stimulation or robot activation, ensuring accurate characterization of unit responsiveness. Consequently, the number of trials varied among subjects.

      e) Pellet retrieval was indicated by the animal entering a designated zone 19 cm from the pellet, driven by hunger.

      f) Animals were trained to retrieve pellets and return to their nest for consumption prior to robot testing sessions, as elaborated in the “Baseline foraging” section.
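      The binning and normalization described in item (c) above can be sketched as follows. This is a minimal illustration under stated assumptions: spike counts are summed across trials, converted to a rate, and z-scored against the mean and standard deviation of the 50 pre-event bins. The function name, the 5-s post-event window, and the choice to compute baseline statistics on the trial-averaged histogram are assumptions, not details taken from the manuscript.

```python
import numpy as np

BIN_S = 0.1        # bin width stated in the response
BASELINE_S = 5.0   # pre-event baseline stated in the response (50 bins)

def zscore_peth(spike_times, event_times, post_s=5.0):
    """Trial-averaged peri-event histogram, z-scored to the pre-event baseline."""
    n_pre = int(round(BASELINE_S / BIN_S))   # 50 baseline bins
    n_post = int(round(post_s / BIN_S))
    edges = np.linspace(-BASELINE_S, post_s, n_pre + n_post + 1)
    spike_times = np.asarray(spike_times, dtype=float)

    counts = np.zeros(n_pre + n_post)
    for t0 in event_times:
        # Align spikes to this event and count them per 0.1-s bin.
        counts += np.histogram(spike_times - t0, bins=edges)[0]

    rate = counts / (len(event_times) * BIN_S)   # spikes per second
    baseline = rate[:n_pre]
    sd = baseline.std(ddof=1)
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return (rate - baseline.mean()) / sd
```

      By construction the 50 baseline bins have mean 0 and standard deviation 1, so post-event bins are expressed in units of baseline variability.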

      (7) In the abstract, the authors mention that predictive cues are ambiguous during naturalistic predatory threats, but it is not clear what do they mean by ambiguous. In addition, in the introduction section, the authors describe that the present study will investigate how the dPAG and BLA communicate threat signals. However, the author should clarify right in the beginning that these two regions are not monosynaptically connected with each other and cite the proper references.

      The abstract’s original sentence, “…where predictive cues are ambiguous and do not afford reiterative trial-and-error learning…” has been refined to “…characterized by less explicit cues and the absence of reiterative trial-and-error learning events …” This adjustment more accurately reflects that cues in natural settings often lack the clear and consistent quality of those in controlled experimental settings, which is necessary for the straightforward process of trial-and-error learning.

      Regarding the dPAG and BLA connectivity, the revised introduction (pg. 5) now states: “Considering the lack of direct monosynaptic projections between dPAG and BLA neurons (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), we utilized anterograde and retrograde tracers in the dPAG and BLA, respectively. This was complemented by c-Fos expression analysis following exposure to predatory threats. Our anatomical findings suggest that the paraventricular nucleus of the thalamus (PVT) may be part of a network that conveys predatory threat information from the dPAG to the BLA.”

      (8) In the introduction section, the authors should clarify that the US information is conveyed from the PAG to BLA via the lateral thalamus (posterior intralaminar nucleus, medial geniculate nucleus) or dorsal midline thalamus (paraventricular nucleus of the thalamus). The statement regarding how "the PAG functions as part of the ascending pain transmission pathway, providing footshock US information to the BLA" is misleading because the PAG does not send monosynaptic projections directly to the BLA.

      The revised text (pg. 3) now reads: “…suggest that the dPAG is part of the ascending US pain transmission pathway to the BLA, the presumed site for CS-US association formation (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). This pathway is thought to be mediated through the lateral and dorsal-midline thalamus regions, including the posterior intralaminar nucleus and paraventricular nucleus of the thalamus (Krout and Loewy, 2000; McNally, Johansen, and Blair, 2011; Yeh, Ozawa, and Johansen, 2021; but see Brunzell and Kim, 2001).”

      (9) The author's assumption that threat information flows from the PAG to the BLA, rather than BLA to PAG, based on electrical stimulation and lesion experiments performed in previous studies is problematic for at least three reasons: a) Electrical stimulation can activate fibers of passage as well as presynaptic neurons antidromically. b) The lesion approach may not have targeted 100% of the neurons in PAG, which extends anatomically along the antero-posterior axis of the midbrain for several millimeters in rats. This observation also disagrees with more recent studies using optogenetics and imaging tools demonstrating that the PAG is the downstream target of the BLA-CeA pathway. c) The authors cited prior reports describing the role of the amygdala-PAG pathway in dampening the US response and providing a negative signal to the PAG. However, a series of previous studies demonstrating that the PAG serves as the downstream target of the central nucleus of the amygdala for the expression of defensive response are completely ignored by the authors. Here are just some examples: Massi et al, 2023, PMID: 36652513; Tovote et al 2016, PMID: 27279213; Penzo et al, 2014 PMID: 24523533).

      We recognize the complexities in interpreting findings from electrical stimulation and lesion studies. Our prior work (Kim et al. 2013) supports the conclusion that predatory threat information directionally flows from the dPAG to the BLA, as evidenced by distinct behavioral outcomes from experimental manipulations of dPAG and BLA. Specifically, dPAG stimulation-induced fleeing behavior was blocked by BLA lesions (as well as muscimol inactivation), whereas BLA stimulation-induced fleeing was unaffected by dPAG or combined dPAG+vPAG lesions (refer to Fig. 5A), suggesting a flow from dPAG to BLA. Our manuscript further clarifies that the dPAG optostimulation results confirmed that escape behavior in foraging rats, induced by dPAG electrical stimulation (Kim et al. 2013), was activated by intrinsic dPAG neurons rather than by fibers of passage or current spread to other brain regions.

      Furthermore, the PAG’s anatomical and functional diversity, with distinct segments along its longitudinal axis associated with different defensive behaviors, reinforces our conclusions. The dPAG is implicated in flight responses, while the vPAG is associated with freezing behavior (e.g., Bandler and Shipley 1994, Kim, Rison, and Fanselow 1993, Lefler, Campagner, and Branco 2020, Morgan, Whitney, and Gold 1998). The studies referenced in the critique primarily focus on the role of the BLA-CeA-vPAG circuit in freezing during Pavlovian fear conditioning, in contrast to our emphasis on the dPAG-PVT-BLA circuit and its mediation of escape behavior in response to naturalistic predatory threats.

      We also note that different invasive procedures can yield varying behavioral outcomes. For example, both acute (e.g., optogenetic and muscimol inactivation) and chronic (e.g., surgical ablation) manipulations within the same brain circuit have shown diverse effects across species (Otchy et al. 2015). Moreover, optogenetics comes with its own set of conceptual and technical challenges (Adamantidis et al. 2015), including the difficulty of targeting, quantifying and photo-inhibiting 100% of PAG neurons. Despite the limitations of each technique, our collective evidence from lesions, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recordings (the present study) supports the premise that the dPAG acts upstream of the BLA in processing predatory threat information.

      (10) In the discussion, the authors suggest that the PVT may be the interface between the PAG and the BLA for the expression of antipredatory defensive behavior during their foraging vs. robot test, but previous studies looking at the role of PVT in antipredator defensive behavior and/or approach-avoidance conflict tasks are not cited and discussed in the manuscript (Engelke et al, 2021, PMID: 33947849; Choi et al 2019, PMID: 30979815; Choi and McNally 2017, PMID: 28193686).

      We thank the reviewer for pointing out these pivotal studies, which we have carefully reviewed and integrated into the revised manuscript (pg. 14): “These results, in conjunction with previous research on the roles of the dPAG, PVT, and BLA in producing flight behaviors in naïve rats (Choi and Kim 2010, Daviu et al. 2020, Deng, Xiao, and Wang 2016, Kim et al. 2013, Kim et al. 2018, Kong et al. 2021, Ma et al. 2021, Reis et al. 2021), the anterior PVT’s involvement in cat odor-induced avoidance behavior (Engelke et al. 2021), and the PVT’s regulation of behaviors motivated by both appetitive and aversive stimuli (Choi and McNally 2017, Choi et al. 2019), suggest the involvement of the dPAG→PVT→BLA pathways in antipredatory defensive mechanisms, particularly as rats leave the safety of the nest to forage in an open arena (Figure 4I) (Reis et al. 2023).”

      (11) The authors use the expression "looming robot predator" in many cases throughout the manuscript. However, it is unclear whether the defensive responses observed in the rats are elicited by the looming stimulus produced by the movement of the robot towards the rats. The authors describe that rats do not respond to a stationary robot, but would the sound produced by the movement of the robot elicit defensive responses? Would non-approaching lateral or dorsoventral movements (not associated with looming) be sufficient to induce defensive behavior in the rats? There is a vast literature in the field about defensive behaviors induced by looming stimuli. The authors should empirically demonstrate that the escaping responses induced by the robot are mediated by looming or refrain to use the looming terminology to avoid confusion.

      Our use of "looming robot predator" is based on empirical evidence from a prior parametric study, which identified the forward, or 'looming,' motion of the Robogator as the key stimulus eliciting a flight response in rats (Kim, Choi, and Lee 2016). This reaction significantly decreased when the robot moved backward from the same starting position, producing a similar sound, and was absent when the robot remained stationary. This suggests that neither sound alone nor the mere presence of a novel object provokes goal-directed escape behavior (Kong et al. 2021). This aligns with studies indicating that simulated looming stimuli, like an expanding disk, induce flight or freezing responses in mice (De Franceschi et al. 2016, Yilmaz and Meister 2013).

      It should be noted that the 2013 study by Yilmaz & Meister (Yilmaz and Meister 2013) on the looming disk paradigm showed that not all mice responded to the stimuli (e.g., Figs. 2A and 3A), with those that did exhibiting rapid habituation by the second exposure. This contrasts with our predatory robot paradigm (Choi and Kim 2010), where all rats consistently fled from the looming robotic predator across multiple trials, underscoring the critical role of looming motion in simulating predator attacks that trigger flight behavior in rats.

      Thus, the term "looming" accurately captures the nature of the robot's movement and its effect on eliciting defensive responses in rats. Nonetheless, should the editors agree with the reviewer's suggestion to minimize potential confusion, we are willing to substitute "looming" with "approaching," although we consider the terms to be synonymous in the context of our study.

      (12) If the authors are citing the Rescorla-Wagner model, they should include at least one additional sentence to explain it, as many people in the field are not familiar with this model.

      In response to the request for clarification on the Rescorla-Wagner model, we have added an explanatory sentence (pg. 4): “Fundamentally, the negative feedback circuit between the amygdala and the dPAG serves as a biological implementation of the Rescorla–Wagner (1972) model, a foundational theory of associative learning that emphasizes the importance of prediction errors in reinforcement (i.e., US), as applied to FC (Fanselow 1998).”
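
      For readers unfamiliar with the model, the Rescorla-Wagner update can be sketched in a few lines. This is an illustrative sketch only and is not part of the manuscript's analysis; the function name, the learning-rate value, and the trial count are assumptions chosen for the example. On each trial the associative strength V moves toward the maximum supported by the US (lambda) in proportion to the prediction error (lambda minus V), so new learning shrinks as the US becomes predicted.

```python
def rescorla_wagner(n_trials, alpha_beta=0.3, lam=1.0, v0=0.0):
    """Return associative strength V at the start and after each CS-US trial.

    alpha_beta: combined CS/US learning-rate parameter (assumed value).
    lam: asymptote of learning supported by the US (assumed value).
    """
    v = v0
    history = [v]
    for _ in range(n_trials):
        prediction_error = lam - v          # US received minus US predicted
        v += alpha_beta * prediction_error  # delta-V = alpha*beta*(lambda - V)
        history.append(v)
    return history


if __name__ == "__main__":
    for trial, v in enumerate(rescorla_wagner(5)):
        print(f"after {trial} trials: V = {v:.3f}")
```

      In this sketch the prediction error is largest on the first pairing and decays toward zero, which is the property the negative feedback account relies on.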

      (13) The authors need to include the normality test used to determine whether a parametric or non-parametric statistical analysis was the most appropriate test for each experiment.

      We have included the outcomes of the normality tests, detailed in Table S1.

      (14) In Fig. 1F, the authors show a representative PAG neuron with peristimulus-time histogram and rasters reaching frequencies higher than 100 Hz and sustained firing rates of >50 Hz following robot activation. The authors should include a firing rate analysis (e.g., average firing rate and maximum firing rate before and after robot activation) of the 22 robot-responsive PAG neurons recorded during the session to clarify whether this high firing rate, which is atypical in other brain regions, is commonly observed in the PAG. Showing the isolated waveforms of some representative neurons would help to clarify whether the activity is being recorded from a single-isolated unit instead of multiple units within the same channel.

      In response to the critique, we have expanded our analysis to include both average and maximum firing rates before and after robot activation for the 22 robot-responsive PAG neurons. This detailed firing rate analysis, illustrating their distribution, has been incorporated into the revised manuscript (refer to Figure S1C and S1D). Furthermore, to alleviate concerns regarding the identification of single-unit activity versus potential multi-unit recordings, we have included peri-event raster plots and waveforms for two additional representative neurons in Figure 1F.
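
      A minimal sketch of the kind of firing-rate summary requested here is given below, assuming spike timestamps in seconds for a single sorted unit and a single robot-activation time. The function name, the 2-s window, and the 100-ms bin width are illustrative assumptions, not the parameters used in the manuscript; in practice the values would be averaged over trials.

```python
def firing_rates(spike_times, event_time, window=2.0, bin_size=0.1):
    """Mean and peak firing rate (Hz) before and after one event.

    spike_times: spike timestamps in seconds for a single unit.
    event_time: time of robot activation in seconds.
    window, bin_size: analysis window and bin width in seconds (assumed values).
    Returns {"pre": (mean_hz, peak_hz), "post": (mean_hz, peak_hz)}.
    """
    n_bins = int(round(window / bin_size))

    def summarize(start):
        counts = [0] * n_bins
        for t in spike_times:
            offset = t - start
            if 0 <= offset < window:
                idx = min(int(offset // bin_size), n_bins - 1)
                counts[idx] += 1
        mean_rate = sum(counts) / window   # spikes per second over the window
        peak_rate = max(counts) / bin_size  # rate in the busiest bin
        return mean_rate, peak_rate

    return {
        "pre": summarize(event_time - window),
        "post": summarize(event_time),
    }
```

      Reporting both values per unit separates a brief burst (high peak, modest mean) from sustained high-frequency firing (high peak and high mean).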

      (15) In Figure 2, the authors should indicate when the recordings are performed on anesthetized vs. freely-moving awake animals.

      In the original manuscript, we specified that the optrode recordings depicted in Figure 2B were conducted on anesthetized rats. To enhance clarity and directly address the critique, we have now clearly indicated this condition in Figure 2A as well.

      (16) The optogenetic stimulation parameters used in Fig 2H indicate that 0.5 mW was sufficient to induce behavioral changes. This is surprising because most optogenetic experiments in the field use much higher intensities (> 5 mW). If much lower intensities are sufficient to drive PAG-mediated behaviors, this may be a very important observation that should be conveyed to the field. I recommend the authors clarify if they in fact used 0.5 mW and then discuss that the laser intensity used in the experiments was 10X lower than that required for other brain regions.

      In our study, we indeed observed that 0.5 mW of dPAG stimulation increased the latency to procure the pellet without completely preventing the action. Notably, at 1 mW, more than half of the animals (n = 5/9 rats; Fig. 2H) and at 3 mW, all rats (9/9) failed to procure the pellet and fled from the foraging area to the nest (Fig. 2G). These results indicate that even lower intensities were sufficient to elicit behavioral changes through dPAG stimulation in a large foraging arena, highlighting the dPAG's sensitivity to optogenetic manipulation. This finding is consistent with our earlier research on dPAG electrical stimulation, which required significantly lower intensities to provoke defensive behaviors compared to the BLA. Specifically, the stimulation intensity needed for aversive behavior in the dPAG was substantially lower (dPAG: 65.0 ± 6.85 µA) than for the BLA (BLA: 275.0 ± 24.44 µA) (Kim et al. 2013). Furthermore, Deng et al. (Deng, Xiao, and Wang 2016) showed that 1 mW of blue light could elicit a 60% freezing response, with 2 mW triggering flight behavior within a latency of 0.6 seconds.

      (17) In Fig 2 G-J, how many animals are being used per group and how was the sequence of the experiments performed? This is very important for replicability.

      A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J. The experimental sequence for these animals consisted of successive pre-stimulation, stimulation, post-stimulation, and robot sessions. We have updated the manuscript to include this information.

      (18) For the photostimulation of PAG neurons in Figs. 2 and 3, the authors need to clarify if the same parameters of laser stimulation used during the anesthetized recordings were also used during the behavioral tests. Also, the wavelength corresponding to the blue laser should be 473 nm instead of 437 nm.

      We thank the reviewer for identifying the error. We confirm that the opto-stimulation parameters (473 nm, 10-ms pulse width, 2 s duration) were consistently applied across both anesthetized recordings and behavioral tests. This consistency has been explicitly stated in the revised manuscript to ensure clarity regarding our experimental approach.

      (19) In Fig. 3I, how were the representative trials selected? Instead of picking the most representative trials, the authors should demonstrate the response of the cell during the entire session.

      In response to the critique, we clarify that the color-coded PETH shown in Fig. 3I represents averaged BLA activity across a comprehensive set of trials. This includes 8 pre-stimulation, 10 stimulation, and 8 post-stimulation trials for the robot-activated sessions, with a similar distribution for non-stimulated sessions. This approach was chosen to provide a representative overview of the cell's response throughout the entire session. To address the request for more detailed data, we have added traditional PETHs to the revised manuscript (see Fig. S3H), which depict the cell's response across all trials.

      (20) Fig 4 D should demonstrate a colabeling between the anterograde PAG fibers in the PVT and the retrogradely labeled neurons from BLA instead of PAG fibers only.

      We wish to clarify that Fig. 4D is intended to show the distribution of dPAG terminals within the midline thalamic nuclei, as noted in prior research (Krout and Loewy 2000). Although dPAG terminals are distributed throughout the midline thalamus, our observations have specifically highlighted a notable increase in c-Fos expression within the paraventricular nucleus of the thalamus (PVT) in rats subjected to the robotic predator stimulus, in contrast to those in the foraging-only control condition (Fig. 4E). Addressing the reviewer's point, we direct attention to Fig. 4G, which includes images labeled "Robot-experienced" and "Merge." This figure demonstrates a subset of PVT neurons that were retrogradely labeled with CTB injected into the BLA, anterogradely labeled with AAV injected into the dPAG, and activated (as indicated by c-Fos expression) in response to the robotic predator. This provides specific colabeling evidence between anterograde PAG fibers in the PVT and retrogradely labeled neurons from the BLA, directly addressing the critique.

      (21) The resolution of the cFos images is very low and makes it hard to appreciate.

      We have updated Figs. 4F and 4G with high-resolution versions to ensure the details are more clearly visible. Furthermore, should there be a need for even greater clarity, we are prepared to supply the images as TIFF files, which are known for preserving high image quality.

      Reviewer 2:

      (1) The text is clearly written, and I appreciated the inclusion of interesting citations, such as the one about paintings by cavemen. The authors also do a good job of discussing the underlying theoretical framework and the figures are easy to understand. Although the topic is very interesting, the amount of novel work is somewhat low. Figure 1 shows that dPAG cells are activated by the predator, and this has been shown by many prior reports. Similarly, Figure 2 shows that dPAG activation creates defensive responses, and this too has been shown by many prior reports.

      We appreciate the reviewer’s positive remarks. We acknowledge the rich body of research documenting dPAG neuronal activation by various predator cues such as odors (e.g., fox urine) (Lu et al. 2023), and scenarios involving anesthetized or spontaneously moving rat/cat predators, either physically partitioned or harness-restrained (Bindi et al. 2022, Deng, Xiao, and Wang 2016, Esteban Masferrer et al. 2020). Nevertheless, our study distinguishes itself by examining dPAG neuronal responses to a robotic predator, uniquely designed to replicate consistent looming motions across multiple trials and subjects within an environment that simulates natural foraging conditions, inclusive of a safe nest (cf. Choi and Kim, 2010). This approach allowed us not only to reveal the immediate activation of dPAG neurons in response to a rapidly approaching predator but also to explore the consequent fleeing behavior towards safety, thereby providing new insights into the dPAG's role in mediating goal-directed defensive responses in a more ecologically relevant setting. Furthermore, our investigation extends beyond these findings to assess the impact of dPAG activation on BLA neuronal responses and their functional connectivity during predator-prey interactions, offering a fresh perspective on the neural circuits that support survival behaviors in animals when confronted with naturalistic threats.

      (2) The results in Figure 3 are novel and interesting, but the characterization of BLA activity is incomplete. For example, what are the percentages of BLA cells that are inhibited or activated by all major behaviors observed? These behaviors include approach to pellet, escape from robot, freezing, stretch-attend postures, etc. These same analyses should also be added to dPAG activity in Figure 1. How does BLA single cell encoding of these behaviors relate to their responsivity to dPAG stimulation? And, finally, it is unclear what is the significance of BLA correlated synchronous firing. Is the animal more or less likely to be performing certain behaviors when correlated BLA firing occurs?

      Our analysis, as presented in Figs. 3I, 3K, and S3D-F, selectively focused on BLA cell responses during distinct behaviors such as approaching a pellet and escaping from the robot. These behaviors were selected because their precise temporal markers allow for accurate correlation with BLA cell activity, building on the findings of our previous research (Kim et al. 2018, Kong et al. 2021).

      The robot's motion, programmed to advance a fixed distance before retreating to its starting position, is designed to repeatedly elicit foraging, thus facilitating analysis of neural changes during conflict situations involving food approach and predator avoidance. However, this also leads to the rapid diminution of freezing and stretch-attend postures inside the nest as animals quickly adapt to the robot's movement pattern, rendering a time-stamped analysis of these behaviors unfeasible under our experimental conditions. Including these behaviors in our analysis would be insightful, especially in extended interaction scenarios where the robot advances to the nest opening and remains there before returning in a less predictable manner. Such conditions, however, would likely reduce foraging behavior due to increased fear, deviating from our study's primary objective of elucidating the interactions between the dorsal periaqueductal gray (dPAG) and the basolateral amygdala (BLA).

      Regarding the significance of BLA correlated synchronous firing, our findings, particularly in Figures 3M-O and S4, demonstrate significant synchronous activity among BLA neuronal pairs during encounters with the robot, as opposed to pre-stim, stim, and post-stim sessions. This synchrony is notably prominent among neurons responsive to dPAG stimulation, indicating that BLA neurons involved in processing dPAG signals may play a crucial role in enhancing BLA network coherence to effectively manage predatory threat information (pg. 13).
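
      One common way to quantify synchronous firing between two simultaneously recorded units is a cross-correlogram, which counts how often the second unit fires at each time lag relative to spikes of the first. The sketch below illustrates that general idea only; it is not the authors' procedure, and the function name, the ±100-ms lag range, and the 10-ms bin width are assumptions. Steps such as shuffle correction and significance testing are omitted.

```python
def cross_correlogram(ref_spikes, target_spikes, max_lag=0.1, bin_size=0.01):
    """Count target spikes at each lag relative to every reference spike.

    Spike times and lags are in seconds.
    Returns (lag_bin_centers, counts), covering -max_lag to +max_lag.
    """
    n_bins = int(round(2 * max_lag / bin_size))
    counts = [0] * n_bins
    for r in ref_spikes:
        for t in target_spikes:
            lag = t - r
            if -max_lag <= lag < max_lag:
                idx = min(int((lag + max_lag) // bin_size), n_bins - 1)
                counts[idx] += 1
    centers = [-max_lag + (i + 0.5) * bin_size for i in range(n_bins)]
    return centers, counts
```

      A peak near zero lag that exceeds what shuffled spike trains produce would indicate that the pair tends to fire together.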

      (3) In Figure 4, the authors identify the PVT as a potential region that can mediate dPAG to BLA communication via anatomical tracing. However, functional assays are missing. For example, if the PVT is inhibited chemogenetically, does this result in a smaller number of BLA cells that are activated by dPAG stimulation? Does activation of the dPAG-PVT or the PVT-BLA projections cause defensive behaviors? Functionally showing that the dPAG-PVT-BLA circuit controls defensive actions would be a major advance in the field and would greatly enhance the significance of this paper. It would also provide an anatomical substrate to support the view that the BLA is downstream of the dPAG, which was first demonstrated by the authors in their elegant 2013 PNAS paper.

      We appreciate the reviewer’s constructive critique and valuable suggestions on the necessity for functional validation of the dPAG-PVT-BLA circuit's involvement in mediating defensive behaviors. In light of these comments, we have carefully considered and included a discussion on the importance of these proposed experiments as a direction for future research in our manuscript revision (also see response to Reviewer 1’s critique #5).

      Our initial work in 2013 (Kim et al. 2013) laid the groundwork for identifying BLA neurons responsive to dPAG stimulation and suggested the PVT as a potential relay in this neural circuit. Recognizing the limitations of our current study, which does not include direct functional assays, we have adjusted our manuscript to convey the speculative aspect of the dPAG-PVT-BLA circuit’s role more accurately. Moreover, we have enriched our discussion by citing relevant studies that lend support to our proposed circuit mechanism. These references serve to place our findings within the broader context of existing research and highlight the imperative for subsequent studies to empirically confirm the functional significance of the dPAG-PVT-BLA pathway in driving defensive behaviors.

      Reviewer 3:

      (1) The Introduction refers to a negative feedback amygdala-dPAG from a study of the Johansen group, but in this case, the authors were referring to the ventrolateral and not the dorsal PAG.

      We thank the reviewer for pointing out the need to distinguish between the dPAG and vPAG regions in our introduction. While Johansen et al. (2010) investigated the roles of PAG (including both dPAG and vPAG regions; see their Supplementary Figs. 4, 5, and 10), the differentiation between their specific contributions to the amygdala's negative feedback mechanism was not explicitly detailed in their initial publication. This distinction was further elaborated upon in later work by the same group (Yeh, Ozawa, and Johansen 2021), which specifically illuminated the dPAG's role in conditioned fear memory formation and its neural pathways to the PVT that influence fear learning. To reflect this nuanced understanding, we have revised our introduction (pg. 3): “In parallel, Johansen et al. (2010) found that pharmacological inhibition of the PAG, encompassing both dPAG and vPAG regions, diminishes the behavioral and neural responses in the amygdala elicited by periorbital shock US, thereby impairing the acquisition of auditory FC.”

      (2) In the experiments recording dPAG in response to the predator threat, the authors mentioned cells activated by the predator threat, referred to as "robot cells." Were these cells inhibited in response to threat?

      In the Results and Materials and Methods sections, we report that 23.4% (22 out of 94) of dPAG neurons, termed “robot cells,” showed a significant increase in firing rates (z > 3) within a latency of less than 500 ms during exposure to the looming robot threat, but not during the pre- and post-robot sessions. These cells are highlighted in Figures 1E-G. In contrast, we identified only a single unit exhibiting a decrease in activity (z-score < -3) in response to the robot threat. Given the overwhelming prevalence of cells with excitatory responses to the threat, our discussions and analyses have primarily centered on these excited cells. Nevertheless, to ensure a full depiction of our observations, we have included data on the inhibited unit in the revised manuscript, specifically in Figure S1E.
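
      The thresholding step behind the "robot cell" label can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the use of per-bin spike counts are hypothetical, the baseline is taken as pre-activation bins, and the additional requirement that the unit not respond during pre- and post-robot sessions is not shown.

```python
from statistics import mean, stdev


def is_robot_cell(baseline_counts, response_counts, z_threshold=3.0):
    """Return (flag, peak_z) for one unit.

    baseline_counts: spike counts per bin before robot activation.
    response_counts: spike counts per bin in the first 500 ms after activation.
    z_threshold: excitation criterion (z > 3, as stated in the response).
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return False, 0.0  # z-score undefined without baseline variability
    peak_z = max((c - mu) / sigma for c in response_counts)
    return peak_z > z_threshold, peak_z


if __name__ == "__main__":
    robot_cells, total_cells = 22, 94  # counts reported in the response
    print(f"{100 * robot_cells / total_cells:.1f}%")  # prints 23.4%
```

      The same function with the comparison reversed (minimum z below -3) would flag the single inhibited unit mentioned above.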

      (3) The authors claim that tetrodes were implanted in the dorsal PAG; however, the electrodes' tips shown in the figures are positioned more ventrally in the lateral PAG (see Figures 1B, S5A).

      The PAG is anatomically organized into dorsomedial (dmPAG), dorsolateral (dlPAG), lateral (lPAG), and ventrolateral (vlPAG) columns along the rostro-caudal axis of the aqueduct. The designation "dorsal PAG" (dPAG) traditionally encompasses the dmPAG, dlPAG, and lPAG regions, a classification supported by extensive tract-tracing, neurochemical, and immunohistochemical evidence (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…” Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switch-off behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, electrode placements were strictly within these specified dPAG regions. The electrode tip locations depicted in Figures 1B and S5A correspond with the -6.04 mm template (left panel below) from Paxinos & Watson’s atlas (Paxinos and Watson 1998), situated anterior to the emergence of the vlPAG (right panel below). To enhance clarity in our manuscript, we provide a detailed definition of the dPAG that includes the dmPAG, dlPAG, and lPAG, and support our electrode placement rationale with references to established literature (pg. 5).

      Author response image 1.

      (4) It would be nice to include a series of observations applying inhibitory tools (i.e., optogenetic photo inhibition) in the dPAG and BLA and see how they affect the behavioral responses in the 'approach food-avoid predator' paradigm. Moreover, it would be interesting to explore how inhibiting the dPAG to PVT pathway influences the flee response during the robot surge.

      We appreciate the suggestion to explore the effects of optogenetic inhibition in the dPAG and BLA on behavioral responses within the 'approach food-avoid predator' paradigm, as well as the potential impact of inhibiting the dPAG to PVT pathway on flee responses during robot surge incidents. As mentioned in our response to Reviewer 1’s critique #5, the application of optogenetic inhibition necessitates transfecting, quantifying, and photoinhibiting a comprehensive set of dPAG neurons activated by predatory threats. This approach is more viable in future studies that can leverage transgenic mouse models for their genetic tractability. Following the Joint Public Review’s recommendations, we have revised our manuscript to ensure a more measured interpretation of our data, carefully balancing the evidence from tracer studies against the limitations of our current methodology.

      Furthermore, referencing Reviewer 1’s critique #9, it is important to consider that various invasive techniques can yield different behavioral outcomes. For instance, research by Ölveczky and colleagues (Otchy et al. 2015) demonstrated that acute manipulations (i.e., optogenetic and muscimol inactivation) and chronic surgical ablation of the same brain circuit can produce distinct effects in rats and finches. Despite these methodological constraints, our collective results from lesion, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recording (present) studies cohesively suggest that the dPAG functions upstream of the BLA in processing predatory threat signals.

      (5) The authors should also examine whether 'synaptic' appositions exist between the anterogradely labeled terminals from the dPAG and the double labeled CTB and cFOS neurons in the PVT.

      We appreciate the suggestion to investigate the presence of synaptic appositions, which could potentially offer valuable insights into the synaptic connections and functional interactions within this neural circuit. However, due to the specialized nature of electron microscopy required for these examinations and the extensive resources it entails, this line of inquiry falls beyond the scope of our current study. We hope to address this aspect in future studies, where we can dedicate the necessary resources and expertise to conducting these intricate analyses.

      (6) It is odd to see the projection fields shown in Fig. 4D, where the projection to the PVT looks much sparser compared to other targets in the thalamus and hypothalamus. If the projection to the PVT has such an important function, why does it seem so weak? This should be discussed. Also, because the projection to the PVT seems sparse, the authors should consider alternative paths like the one involving the cuneiform nucleus. The cuneiform nucleus is an important region responding to looming shadows with strong bidirectional links to the dorsolateral periaqueductal gray, providing strong projections to the rostral PVT.

      The perceived scarcity of the dPAG-PVT pathway might not reflect its functional significance accurately. The PVT's small size could make its projections appear less dense in broad anatomical studies. To address this, we have updated Figure 4D with a high-resolution image that offers a detailed view of the PVT region. This enhancement (refer to the updated Fig. 4, bottom) more accurately depicts the projection density within the PVT. It is also critical to consider that the functional impact of neural pathways is not solely dependent on the quantity of projecting neurons. For instance, work by Deisseroth and colleagues (Rajasethupathy et al. 2015) has shown that even relatively sparse monosynaptic projections from the anterior cingulate cortex to the hippocampus can exert significant effects on neural circuit dynamics. Additionally, we have expanded our discussion to consider the potential roles of other circuits, such as the cuneiform nucleus, in driving the behavioral responses observed in our study (pg. 15): “Given the recent significance attributed to the superior colliculus in detecting innate visual threats (Lischinsky and Lin 2019, Wei et al. 2015, Zhou et al. 2019) and the cuneiform nucleus in the directed flight behavior of mice (Bindi et al. 2023, Tsang et al. 2023), further exploration into the communication between these structures and the dPAG-BLA circuitry is warranted.”

      (7) Finally, in the Discussion, it would be nice to comment on how the BLA mediates flee responses. Which pathways are likely involved?

      This excellent suggestion has been incorporated in the discussion (pg. 15): “Future studies will also need to delineate the downstream pathways emanating from the BLA that orchestrate goal-directed flight responses to external predatory threats as well as internal stimulations from the dPAG/BLA circuit. Potential key structures include the dorsal/posterior striatum, which has been associated with avoidance behaviors in response to airpuff in head-fixed mice (Menegas et al. 2018) and flight reactions triggered by auditory looming cues (Li et al. 2021). Additionally, the ventromedial hypothalamus (VMH) has been implicated in flight behaviors in mice, evidenced by responses to the presence of a rat predator (Silva et al. 2013) and upon optogenetic activation of VMH Steroidogenic factor 1 (Kunwar et al. 2015) or the VMH-anterior hypothalamic nucleus pathway (Wang, Chen, and Lin 2015). Investigating the indispensable role of these structures in flight behavior could involve lesion or inactivation studies. Such interventions are anticipated to inhibit flight behaviors elicited by amygdala stimulation and predatory threats, confirming their critical involvement. Conversely, activating these structures in subjects with an inactivated or lesioned amygdala, which would typically inhibit fear responses to external threats (Choi and Kim 2010), is expected to induce fleeing behavior, further elucidating their functional significance.”

      Adamantidis, A., S. Arber, J. S. Bains, E. Bamberg, A. Bonci, G. Buzsaki, J. A. Cardin, R. M. Costa, Y. Dan, Y. Goda, A. M. Graybiel, M. Hausser, P. Hegemann, J. R. Huguenard, T. R. Insel, P. H. Janak, D. Johnston, S. A. Josselyn, C. Koch, A. C. Kreitzer, C. Luscher, R. C. Malenka, G. Miesenbock, G. Nagel, B. Roska, M. J. Schnitzer, K. V. Shenoy, I. Soltesz, S. M. Sternson, R. W. Tsien, R. Y. Tsien, G. G. Turrigiano, K. M. Tye, and R. I. Wilson. 2015. "Optogenetics: 10 years after ChR2 in neurons--views from the community."  Nat Neurosci 18 (9):1202-12. doi: 10.1038/nn.4106.

      Amano, K., T. Tanikawa, H. Kawamura, H. Iseki, M. Notani, H. Kawabatake, T. Shiwaku, T. Suda, H. Demura, and K. Kitamura. 1982. "Endorphins and pain relief. Further observations on electrical stimulation of the lateral part of the periaqueductal gray matter during rostral mesencephalic reticulotomy for pain relief."  Appl Neurophysiol 45 (1-2):123-35.

      Bagley, E. E., and S. L. Ingram. 2020. "Endogenous opioid peptides in the descending pain modulatory circuit."  Neuropharmacology 173:108131. doi: 10.1016/j.neuropharm.2020.108131.

      Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization."  Prog Brain Res 87:269-305. doi: 10.1016/s0079-6123(08)63056-3.

      Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression."  Prog Brain Res 107:285-300. doi: 10.1016/s0079-6123(08)61871-3.

      Bandler, R., and M. T. Shipley. 1994. "Columnar organization in the midbrain periaqueductal gray: modules for emotional expression?"  Trends Neurosci 17 (9):379-89. doi: 10.1016/0166-2236(94)90047-7.

      Bindi, R. P., C. C. Guimaraes, A. R. de Oliveira, F. F. Melleu, M. A. X. de Lima, M. V. C. Baldo, S. C. Motta, and N. S. Canteras. 2023. "Anatomical and functional study of the cuneiform nucleus: A critical site to organize innate defensive behaviors."  Ann N Y Acad Sci 1521 (1):79-95. doi: 10.1111/nyas.14954.

      Bindi, R. P., R. G. O. Maia, F. Pibiri, M. V. C. Baldo, S. L. Poulter, C. Lever, and N. S. Canteras. 2022. "Neural correlates of distinct levels of predatory threat in dorsal periaqueductal grey neurons."  Eur J Neurosci 55 (6):1504-1518. doi: 10.1111/ejn.15633.

      Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections."  J Comp Neurol 351 (4):585-601. doi: 10.1002/cne.903510408.

      Cannon, J. T., G. J. Prieto, A. Lee, and J. C. Liebeskind. 1982. "Evidence for opioid and non-opioid forms of stimulation-produced analgesia in the rat."  Brain Res 243 (2):315-21. doi: 10.1016/0006-8993(82)90255-4.

      Carrive, P., and M. M. Morgan. 2012. "Periaqueductal Gray." In The Human Nervous System, edited by J. K. Mai and G. Paxinos, 367-400. London: Academic Press.

      Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization."  Behav Brain Res 58 (1-2):27-47. doi: 10.1016/0166-4328(93)90088-8.

      Choi, E. A., P. Jean-Richard-Dit-Bressel, C. W. G. Clifford, and G. P. McNally. 2019. "Paraventricular Thalamus Controls Behavior during Motivational Conflict."  J Neurosci 39 (25):4945-4958. doi: 10.1523/JNEUROSCI.2480-18.2019.

      Choi, E. A., and G. P. McNally. 2017. "Paraventricular Thalamus Balances Danger and Reward."  J Neurosci 37 (11):3018-3029. doi: 10.1523/JNEUROSCI.3320-16.2017.

      Choi, J. S., and J. J. Kim. 2010. "Amygdala regulates risk of predation in rats foraging in a dynamic fear environment."  Proc Natl Acad Sci U S A 107 (50):21773-7. doi: 10.1073/pnas.1010079108.

      De Franceschi, G., T. Vivattanasarn, A. B. Saleem, and S. G. Solomon. 2016. "Vision Guides Selection of Freeze or Flight Defense Strategies in Mice."  Curr Biol 26 (16):2150-4. doi: 10.1016/j.cub.2016.06.006.

      De Oca, B. M., J. P. DeCola, S. Maren, and M. S. Fanselow. 1998. "Distinct regions of the periaqueductal gray are involved in the acquisition and expression of defensive responses."  J Neurosci 18 (9):3426-32. doi: 10.1523/JNEUROSCI.18-09-03426.1998.

      Deng, H., X. Xiao, and Z. Wang. 2016. "Periaqueductal Gray Neuronal Activities Underlie Different Aspects of Defensive Behaviors."  J Neurosci 36 (29):7580-8. doi: 10.1523/JNEUROSCI.4425-15.2016.

      Engelke, D. S., X. O. Zhang, J. J. O'Malley, J. A. Fernandez-Leon, S. Li, G. J. Kirouac, M. Beierlein, and F. H. Do-Monte. 2021. "A hypothalamic-thalamostriatal circuit that controls approach-avoidance conflict in rats."  Nat Commun 12 (1):2517. doi: 10.1038/s41467-021-22730-y.

      Esteban Masferrer, M., B. A. Silva, K. Nomoto, S. Q. Lima, and C. T. Gross. 2020. "Differential Encoding of Predator Fear in the Ventromedial Hypothalamus and Periaqueductal Grey."  J Neurosci 40 (48):9283-9292. doi: 10.1523/JNEUROSCI.0761-18.2020.

      Fanselow, M. S. 1998. "Pavlovian conditioning, negative feedback, and blocking: mechanisms that regulate association formation."  Neuron 20 (4):625-7. doi: 10.1016/s0896-6273(00)81002-8.

      Fields, H. L. 2000. "Pain modulation: expectation, opioid analgesia and virtual pain."  Prog Brain Res 122:245-53. doi: 10.1016/s0079-6123(08)62143-3.

      Gross, C. T., and N. S. Canteras. 2012. "The many paths to fear."  Nat Rev Neurosci 13 (9):651-8. doi: 10.1038/nrn3301.

      Herry, C., and J. P. Johansen. 2014. "Encoding of fear learning and memory in distributed neuronal circuits."  Nat Neurosci 17 (12):1644-54. doi: 10.1038/nn.3869.

      Kim, E. J., O. Horovitz, B. A. Pellman, L. M. Tan, Q. Li, G. Richter-Levin, and J. J. Kim. 2013. "Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats."  Proc Natl Acad Sci U S A 110 (36):14795-800. doi: 10.1073/pnas.1310845110.

      Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats."  Sci Adv 4 (4):eaar7328. doi: 10.1126/sciadv.aar7328.

      Kim, J. J., J. S. Choi, and H. J. Lee. 2016. "Foraging in the face of fear: Novel strategies for evaluating amygdala functions in rats." In Living without an amygdala, edited by D. G. Amaral and R. Adolphs, 129-148. The Guilford Press.

      Kim, J. J., R. A. Rison, and M. S. Fanselow. 1993. "Effects of amygdala, hippocampus, and periaqueductal gray lesions on short- and long-term contextual fear."  Behav Neurosci 107 (6):1093-8. doi: 10.1037//0735-7044.107.6.1093.

      Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network."  Elife 10. doi: 10.7554/eLife.72040.

      Krout, K. E., and A. D. Loewy. 2000. "Periaqueductal gray matter projections to midline and intralaminar thalamic nuclei of the rat."  J Comp Neurol 424 (1):111-41. doi: 10.1002/1096-9861(20000814)424:1<111::aid-cne9>3.0.co;2-3.

      Kunwar, P. S., M. Zelikowsky, R. Remedios, H. Cai, M. Yilmaz, M. Meister, and D. J. Anderson. 2015. "Ventromedial hypothalamic neurons control a defensive emotion state."  Elife 4. doi: 10.7554/eLife.06633.

      Lefler, Y., D. Campagner, and T. Branco. 2020. "The role of the periaqueductal gray in escape behavior."  Curr Opin Neurobiol 60:115-121. doi: 10.1016/j.conb.2019.11.014.

      Li, Z., J. X. Wei, G. W. Zhang, J. J. Huang, B. Zingg, X. Wang, H. W. Tao, and L. I. Zhang. 2021. "Corticostriatal control of defense behavior in mice induced by auditory looming cues."  Nat Commun 12 (1):1040. doi: 10.1038/s41467-021-21248-7.

      Lischinsky, J. E., and D. Lin. 2019. "Looming Danger: Unraveling the Circuitry for Predator Threats."  Trends Neurosci 42 (12):841-842. doi: 10.1016/j.tins.2019.10.004.

      Lu, B., P. Fan, M. Li, Y. Wang, W. Liang, G. Yang, F. Mo, Z. Xu, J. Shan, Y. Song, J. Liu, Y. Wu, and X. Cai. 2023. "Detection of neuronal defensive discharge information transmission and characteristics in periaqueductal gray double-subregions using PtNP/PEDOT:PSS modified microelectrode arrays."  Microsyst Nanoeng 9:70. doi: 10.1038/s41378-023-00546-8.

      Magierek, V., P. L. Ramos, N. G. da Silveira-Filho, R. L. Nogueira, and J. Landeira-Fernandez. 2003. "Context fear conditioning inhibits panic-like behavior elicited by electrical stimulation of dorsal periaqueductal gray."  Neuroreport 14 (12):1641-4. doi: 10.1097/00001756-200308260-00020.

      McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit."  Trends Neurosci 34 (6):283-92. doi: 10.1016/j.tins.2011.03.005.

      Menegas, W., K. Akiti, R. Amo, N. Uchida, and M. Watabe-Uchida. 2018. "Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli."  Nat Neurosci 21 (10):1421-1430. doi: 10.1038/s41593-018-0222-1.

      Morgan, M. M., P. K. Whitney, and M. S. Gold. 1998. "Immobility and flight associated with antinociception produced by activation of the ventral and lateral/dorsal regions of the rat periaqueductal gray."  Brain Res 804 (1):159-66. doi: 10.1016/s0006-8993(98)00669-6.

      Otchy, T. M., S. B. Wolff, J. Y. Rhee, C. Pehlevan, R. Kawai, A. Kempf, S. M. Gobes, and B. P. Olveczky. 2015. "Acute off-target effects of neural circuit manipulations."  Nature 528 (7582):358-63. doi: 10.1038/nature16442.

      Paxinos, G., and C. Watson. 1998. The Rat Brain in Stereotaxic Coordinates. San Diego: Academic Press.

      Rajasethupathy, P., S. Sankaran, J. H. Marshel, C. K. Kim, E. Ferenczi, S. Y. Lee, A. Berndt, C. Ramakrishnan, A. Jaffe, M. Lo, C. Liston, and K. Deisseroth. 2015. "Projections from neocortex mediate top-down control of memory retrieval."  Nature 526 (7575):653-9. doi: 10.1038/nature15389.

      Ressler, R. L., and S. Maren. 2019. "Synaptic encoding of fear memories in the amygdala."  Curr Opin Neurobiol 54:54-59. doi: 10.1016/j.conb.2018.08.012.

      Schenberg, L. C., R. M. Povoa, A. L. Costa, A. V. Caldellas, S. Tufik, and A. S. Bittencourt. 2005. "Functional specializations within the tectum defense systems of the rat."  Neurosci Biobehav Rev 29 (8):1279-98. doi: 10.1016/j.neubiorev.2005.05.006.

      Silva, B. A., C. Mattucci, P. Krzywkowski, E. Murana, A. Illarionova, V. Grinevich, N. S. Canteras, D. Ragozzino, and C. T. Gross. 2013. "Independent hypothalamic circuits for social and predator fear."  Nat Neurosci 16 (12):1731-3. doi: 10.1038/nn.3573.

      Tsang, E., C. Orlandini, R. Sureka, A. H. Crevenna, E. Perlas, I. Prankerd, M. E. Masferrer, and C. T. Gross. 2023. "Induction of flight via midbrain projections to the cuneiform nucleus."  PLoS One 18 (2):e0281464. doi: 10.1371/journal.pone.0281464.

      Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear."  Braz J Med Biol Res 36 (5):557-66. doi: 10.1590/s0100-879x2003000500002.

      Walker, D. L., and M. Davis. 1997. "Involvement of the dorsal periaqueductal gray in the loss of fear-potentiated startle accompanying high footshock training."  Behav Neurosci 111 (4):692-702. doi: 10.1037//0735-7044.111.4.692.

      Wang, L., I. Z. Chen, and D. Lin. 2015. "Collateral pathways from the ventromedial hypothalamus mediate defensive behaviors."  Neuron 85 (6):1344-58. doi: 10.1016/j.neuron.2014.12.025.

      Wei, P., N. Liu, Z. Zhang, X. Liu, Y. Tang, X. He, B. Wu, Z. Zhou, Y. Liu, J. Li, Y. Zhang, X. Zhou, L. Xu, L. Chen, G. Bi, X. Hu, F. Xu, and L. Wang. 2015. "Processing of visually evoked innate fear by a non-canonical thalamic pathway."  Nat Commun 6:6756. doi: 10.1038/ncomms7756.

      Yeh, L. F., T. Ozawa, and J. P. Johansen. 2021. "Functional organization of the midbrain periaqueductal gray for regulating aversive memory formation."  Mol Brain 14 (1):136. doi: 10.1186/s13041-021-00844-0.

      Yilmaz, M., and M. Meister. 2013. "Rapid innate defensive responses of mice to looming visual stimuli."  Curr Biol 23 (20):2011-5. doi: 10.1016/j.cub.2013.08.015.

      Zhou, Z., X. Liu, S. Chen, Z. Zhang, Y. Liu, Q. Montardy, Y. Tang, P. Wei, N. Liu, L. Li, R. Song, J. Lai, X. He, C. Chen, G. Bi, G. Feng, F. Xu, and L. Wang. 2019. "A VTA GABAergic Neural Circuit Mediates Visually Evoked Innate Defensive Responses."  Neuron 103 (3):473-488 e6. doi: 10.1016/j.neuron.2019.05.027.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. 
      Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We also rephrased our research questions in the introduction section on Page 5 with tracked changes:

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now better highlight the contribution of the Antúnez et al. (2022) paper, which provides important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows:

      “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled them in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words was embedded into two sentences, so that interchanging the two words creates both a congruent and an incongruent version of each sentence. We have now explicitly specified this design for all sentences, as reflected in the edited manuscript on Page 38. For example, consider the exemplar pair “brother/jacket”:

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.
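      The counterbalancing described above can be illustrated with a minimal sketch. It uses only the one sentence pair quoted in this response; the function name `build_version` and the A/B version labels are hypothetical and are not taken from the authors' materials or code.

```python
from typing import Dict, List, Tuple

# Two sentence frames built around one target pair, taken from the example above.
# "{}" marks the position of the target word.
FRAMES: Tuple[str, str] = (
    "Last night, my lazy {} came to the party one minute before it was over.",
    "Lily says this blue {} will be a big fashion trend this fall.",
)
# TARGETS[i] is the congruent target for FRAMES[i].
TARGETS: Tuple[str, str] = ("brother", "jacket")


def build_version(swap: bool) -> List[Dict[str, str]]:
    """Return the two sentences one participant reads for this target pair.

    swap=False keeps each target in its own frame (congruent);
    swap=True exchanges the two targets (incongruent).
    """
    words = TARGETS[::-1] if swap else TARGETS
    condition = "incongruent" if swap else "congruent"
    return [
        {"sentence": frame.format(word), "target": word, "condition": condition}
        for frame, word in zip(FRAMES, words)
    ]


version_a = build_version(swap=False)  # hypothetical Participant A
version_b = build_version(swap=True)   # hypothetical Participant B

for item in version_a + version_b:
    print(f"{item['condition']:<12} {item['target']:<8} {item['sentence']}")
```

      Because both versions contain exactly the same two target words, any difference between conditions cannot be attributed to properties of the target words themselves, only to how each word fits its sentence frame.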

      (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16, and the relevant passage is copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We now discuss the necessity of using MEG in the edited manuscript, with tracked changes on Page 20. The passage is copied below:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that the group-level correlation between the RIFT effect and reading speed cannot establish causality, which makes it less than ideal for addressing this question. We have incorporated this acknowledgment as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes, and the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We reformulated the last sentence:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We thank the reviewer for this valuable suggestion. EOG recordings were not included in the current study because of restrictions from the University of Birmingham on MEG data collection during the COVID-19 pandemic. In our future studies, we will follow the reviewer's suggestion and incorporate EOG recordings in data collection. This addition will facilitate optimal rejection of eye movement-related artifacts through ICA, as recommended by Dimigen in his methodological paper:

      Dimigen O. 2020. Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. Neuroimage 207:116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements/saccade planning, aligning it with the SWIFT model. Please refer to Page 15 with tracked changes; the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more linked to access or integration have resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs like access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We thank the reviewer for highlighting the confusion regarding the sensor selection procedure in the study. In response, we have added further clarifications about this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We thank the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, the detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have provided additional clarification on this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."

      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This well-written report uses functional neuroimaging in human observers to provide convincing evidence that activity in the early visual cortex is suppressed at locations that are frequently occupied by a task-irrelevant but salient item. This suppression appears to be general to any kind of stimulus, and also occurs in advance of any item actually appearing. The work in its present form will be valuable to those examining attention, perception, learning and prediction, but with a few additional analyses could more informatively rule out potential alternative hypotheses. Further discussion of the mechanistic implications could clarify further the broad extent of its significance. 

      We thank the editor and the reviewers for the positive evaluation of our manuscript and the thoughtful comments. Below we provide a detailed point-by-point reply to the reviewers’ comments.

      In addition to addressing the reviewers' comments, we have improved the figure legends by explicitly describing the type of error bars depicted in the figures, information which was previously only listed in the Materials and Methods section. Specifically, the statement: “Error bars denote within-subject SEM” was added to several figures, as applicable. We believe that briefly reiterating this information in the figure legends enhances clarity and enables readers to interpret the results more accurately and efficiently. We also updated our code and data sharing statement, as well as opened the repository for the public: “Analysis and experiment code, as well as data required to replicate the results reported in this manuscript are available here: https://doi.org/10.17605/OSF.IO/G4RXV. Raw MRI data is available upon request.”

      Public Reviews

      Reviewer #1 (Public review): 

      Summary: 

      The authors investigated if/how distractor suppression derived from statistical learning may be implemented in early visual cortex. While in a scanner, participants conducted a standard additional singleton task in which one location more frequently contained a salient distractor. The results showed that activity in EVC was suppressed for the location of the salient distractor as well as for neighbouring neutral locations. This suppression was not stimulus specific - meaning it occurred equally for distractors, targets and neutral items - and it was even present in trials in which the search display was omitted. Generally, the paper was clear, the experiment was well-designed, and the data are interesting. Nevertheless, I do have several concerns mostly regarding the interpretation of the results. 

      (1) My biggest concern with the study is regarding the interpretation of some of the results. Specifically, regarding the dynamics of the suppression. I appreciate that there are some limitations with what you might be able to say here given the method but I do feel as if you have committed to a single interpretation where others might still be at play. Below I've listed a few alternatives to consider. 

      We agree with the reviewer that there are important alternatives to consider. Adequately addressing these alternatives will substantially increase the inferences we can draw from our data. Therefore, we address each alternative interpretation in detail below.

      (a) Sustained Suppression. I was wondering if there is anything in your results that would speak for or against the suppression being task specific. That is, is it possible that people are just suppressing the HPDL throughout the entire experiment (i.e., also through ITI, breaks, etc., rather than just before and during the search). Since the suppression does not seem volitional, I wonder if participants might apply a blanket suppression to HPDL until they learn otherwise. Since your localiser comes after the task you might be able to see hints of sustained suppression in the HPDL during these trials.

      It is indeed possible that participants suppressed the HPDL throughout the entire experiment, instead of proactively instantiating suppression on each trial. While possible, we believe that this account is less likely to explain the present results, given the utilized analysis approach, a voxel-wise GLM fit to the BOLD data per run (see Materials and Methods for details). Specifically, we derived parameter estimates from this GLM per location to estimate the relative suppression. Sustained suppression would modulate BOLD responses throughout the run, i.e. presumably also during the implicit baseline period used to estimate the contrast parameter estimates per location. Hence, sustained suppression should not result in a differential modulation between locations, as the BOLD response at the HPDL during the baseline period would be equally suppressed as during the trial. Inspired by the reviewer’s comment, we now clarify this critical point in the manuscript’s Discussion section:

      “Third, participants might have suppressed the HPDL consistently throughout the experiment. This sustained suppression account differs from the proactive suppression proposed here. While this alternative is plausible, we believe that it is less likely to account for the present results, given the analysis conducted. Specifically, we computed voxel-wise parameter estimates and contrasted the obtained betas between locations. Under a sustained suppression account, the HPDL would show suppression even during the implicit baseline period, which would obscure the observed BOLD suppression at and near the HPDL.” 
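
      The baseline argument above can be illustrated with a toy calculation. This is only a sketch under a strong simplifying assumption (suppression acts as an additive offset on the signal, and a response estimate is the trial signal minus the implicit baseline); the function name and all numbers are hypothetical and are not taken from the manuscript's GLM.

```python
# Toy illustration only: an additive-offset model of suppression (an assumption),
# with made-up values in arbitrary units.

def beta(trial_signal, baseline_signal):
    """Response estimate expressed relative to the implicit baseline."""
    return trial_signal - baseline_signal

EVOKED = 1.0     # hypothetical stimulus-evoked response
BASELINE = 0.0   # hypothetical baseline level without suppression
OFFSET = 0.25    # hypothetical suppression magnitude

# Far neutral location: no suppression at any time.
beta_nl_far = beta(BASELINE + EVOKED, BASELINE)

# Sustained suppression: the offset lowers the HPDL signal during trials
# AND during the baseline period, so it cancels in the estimate.
beta_hpdl_sustained = beta(BASELINE - OFFSET + EVOKED, BASELINE - OFFSET)

# Trial-locked (proactive) suppression: the offset is present only around
# the trial, so it survives in the estimate.
beta_hpdl_proactive = beta(BASELINE - OFFSET + EVOKED, BASELINE)

# Between-location contrasts (NL-far minus HPDL) under each account.
sustained_difference = beta_nl_far - beta_hpdl_sustained
proactive_difference = beta_nl_far - beta_hpdl_proactive

print("sustained account, NL-far minus HPDL:", sustained_difference)
print("proactive account, NL-far minus HPDL:", proactive_difference)
```

      Under these assumptions only the trial-locked account produces a between-location difference, which is the reasoning the response relies on.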

      (b) Enhancement followed by suppression. Another alternative that wasn't discussed would be an initial transient enhancement of the HPDL which might be brought on by the placeholders followed by more sustained suppression through the search task. Of course, on the whole this would look like suppression, but this still seems like it would hold different implications compared to simply "proactive suppression". This would be something like search and destroy however could be on the location level before the actual onset of the search display.  

      R1 correctly points out that BOLD data, given the poor temporal resolution, do not allow for the detection of potential transient enhancements at the HPDL followed by a later and more pronounced suppression (akin to “search and destroy”). We fully agree with this assessment. However, we also argue that a transient enhancement followed by sustained suppression before search display onset constitutes proactive suppression in line with our interpretation, because suppression would still arise proactively (i.e., before search, and hence distractor, onset). Whether transient enhancement precedes suppression cannot be elucidated by our data, but we believe that it constitutes an interesting avenue for future studies using time-resolved and spatially specific recording methods. We now clarify this important implementational variation in the updated manuscript.

      “Finally, due to the limited temporal resolution of BOLD data, the present data do not elucidate whether the present suppression is preceded by a brief attentional enhancement of the HPDL, as implied by some prior work (Huang et al., 2024). On this account the HPDL would see transient enhancement, followed by sustained suppression, akin to a ‘search and destroy’ mechanism. Critically, we believe that this variation would nonetheless constitute proactive distractor suppression as the suppression would still arise before search onset. Using temporally and spatially resolved methods to explore potential transient enhancements preceding suppression is a promising avenue for future research charting the neural mechanisms underlying distractor suppression.”

      (2) I was also considering whether your effects might be at least partially attributable to priming type effects. This would be on the spatial (not feature) level as it is clear that the distractors are switching colours. Basically, is it possible that on trial n participants see the HPDL with the distractor in it and then on trial n+1 they suppress that location. This would be something distinct from the statistical learning framework and from the repetition suppression discussion you have already included. To test for this, you could look at the trials that follow omission or trials. If there is no suppression or less suppression on these trials it would seem fair to conclude that the suppression is at least in part due to the previous trial. 

      We agree with the reviewer that it is plausible that participants particularly suppress locations which on previous trials contained a distractor. To address this possibility, we conducted a new analysis and adjusted the manuscript accordingly:

      “Second, participants may have suppressed locations that contained the distractor on the previous trial, reflecting a spatial priming effect. This account constitutes a complementary but different perspective than statistical learning, which integrates implicit prior knowledge across many trials. We ruled out that spatial priming explains the present results by contrasting BOLD suppression magnitudes on trials with the distractor at the HPDL and trials where the distractor was not at the HPDL on the previous trial. Results, depicted in Supplementary Figure 4, showed that distractor suppression was statistically significant across both trial types, including trials without a distractor at the HPDL on the preceding trial. This indicates that the observed BOLD suppression is unlikely to be driven by priming and is instead more consistent with statistical learning. Moreover, results did not yield a statistically significant difference between trial types based on the distractor location in the preceding trial. However, these results should not be taken to suggest that spatial priming cannot contribute to distractor suppression; for details see Supplementary Figure 4.” (p. 13).

      We note that this analysis approach slightly differs from the reviewer’s suggestion, which considered omission trials. However, we decided to exclude trials immediately following an omission to ensure that both conditions were matched as closely as possible. In particular, omission trials represent extended rest periods, which could alter participants’ state and especially modulate the visually evoked BOLD responses (e.g., potentially increasing the dynamic range) compared to trials that did not follow omissions. Our analysis approach avoids this difference while still addressing the hypothesis put forward by the reviewer. We now provide the full explanation and results figure of this priming analysis in the figure text of Supplementary Figure 4.

      Reviewer #2 (Public review): 

      The authors of this work set out to test ideas about how observers learn to ignore irrelevant visual information. Specifically, they used fMRI to scan participants who performed a visual search task. The task was designed in such a way that highly salient but irrelevant search items were more likely to appear at a given spatial location. With a region-of-interest approach, the authors found that activity in visual cortex that selectively responds to that location was generally suppressed, in response to all stimuli (search targets, salient distractors, or neutral items), as well as in the absence of an anticipated stimulus. 

      Strengths of the study include: A well-written and well-argued manuscript; clever application of a region of interest approach to fMRI design, which allows articulating clear tests of different hypotheses; careful application of follow-up analyses to rule out alternative, strategy-based accounts of the findings; tests of the robustness of the findings to detailed analysis parameters such as ROI size; and exclusion of the role of regional baseline differences in BOLD responses. 

      We thank the reviewer for the positive evaluation of our manuscript.

      The report might be enhanced by analyses (perhaps in a surface space) that distinguish amongst the multiple "early" retinotopic visual areas that are analysed in the aggregate here. 

      We agree with the reviewer that an exploratory analysis separating early visual cortex (EVC) into its retinotopic areas could be an interesting addition. Our reasoning to combine early visual areas into one mask in the original analyses was as follows: We did not have an a priori reason to expect distinct neural suppression between these early ROIs. Therefore, we did not acquire retinotopy data to reliably separate early visual areas (e.g. V1, V2 and V3), instead opting to increase the number of search task trials. The lack of retinotopy data inherently limits the reliability of the resulting cortical segmentation. However, we now performed an analysis separating early visual cortex into V1 and V2 and report the details as Supplementary Text 1:

      “In an exploratory analysis we investigated whether subdivisions of EVC exhibit different representations of priority signals. In brief, we used FreeSurfer to reconstruct brain surfaces (recon-all) from each subject’s anatomical scan. From these reconstructions we derived V1_exvivo and V2_exvivo labels, which were transformed into volume space using ‘mri_label2vol’ and merged into a bilateral mask for each ROI. We then selected the voxels within each ROI that were most responsive to the four stimulus locations, based on independent localizer data. This voxel selection followed the procedure outlined in the Materials and Methods: Region of Interest (ROI) Definition. To accommodate the subdivision into two ROIs (V1 and V2) compared to the single EVC ROI in the main analysis, we halved the number of voxels selected per location. Finally, we applied the same ROI analysis to investigate distractor suppression during search and omission trials, following the procedure described in Materials and Methods: Statistical Analysis. 

      Results of this more fine-grained ROI analysis are depicted in Supplementary Figure 1. First, the results from V2 qualitatively mirrored our primary ROI analysis. BOLD responses in V2 differed significantly between stimulus types (main effect of stimulus type: F<sub>(2,54)</sub> = 31.11, p < 0.001, 𝜂 = 0.54). Targets elicited larger BOLD responses compared to distractors (t<sub>(27)</sub> = 3.05, p<sub>holm</sub> = 0.004, d = 0.06) and neutral stimuli (t<sub>(27)</sub> = 7.82, p<sub>holm</sub> < 0.001, d = 0.14). Distractors also evoked larger responses than neutral stimuli (t<sub>(27)</sub> = 4.78, p<sub>holm</sub> < 0.001, d = 0.09). These results likely reflect top-down modulation due to target relevance and bottom-up effects of distractor salience. Consistent with the primary ROI analysis, the manipulation of distractor predictability showed a distinct pattern of location-specific BOLD suppression in V2 (main effect of location: F<sub>(1.1,52.8)</sub> = 5.01, p = 0.030, 𝜂 = 0.16). Neural populations with receptive fields at the HPDL showed significantly reduced BOLD responses compared to the diagonally opposite neutral location (NL-far; post hoc test HPDL vs NL-far: t<sub>(27)</sub> = 2.69, p<sub>holm</sub> = 0.022, d = 0.62). Again, this suppression was not confined to the HPDL but also extended to close by neutral locations (NL-near vs NL-far: t<sub>(27)</sub> = 2.79, p<sub>holm</sub> = 0.022, d = 0.65). BOLD responses did not differ between HPDL and NL-near locations (HPDL vs NL-near: t<sub>(27)</sub> = 0.11, p<sub>holm</sub> = 0.915, d = 0.03; BF<sub>10</sub> = 0.13). As in the EVC ROI analysis, this suppression pattern was consistent across distractor, target, and neutral stimuli presented at the HPDL and NL-near locations compared to NL-far.
In sum, neural responses in V2 were significantly modulated by the distractor contingencies, evident as reduced BOLD responses in neural populations with receptive fields at the HPDL and neutral locations near the location of the frequent distractor (NL-near), relative to the neutral location diagonally across the HPDL (NL-far). 

      In V1, BOLD responses also differed significantly between stimulus types (main effect of stimulus type: F<sub>(1.3,35.6)</sub> = 6.69, p = 0.009, 𝜂 = 0.20). Targets elicited larger BOLD responses compared to neutral stimuli (t<sub>(27)</sub> = 3.52, p<sub>holm</sub> = 0.003, d = 0.12) and distractors evoked larger responses than neutral stimuli (t<sub>(27)</sub> = 2.62, p<sub>holm</sub> = 0.023, d = 0.09). However, no difference between targets and distractors was observed (t<sub>(27)</sub> = 0.90, p<sub>holm</sub> = 0.375, d = 0.03; BF<sub>10</sub> = 0.17), suggesting reduced sensitivity to task-related effects in V1. Indeed, analyzing the effect of distractor predictability for BOLD responses in V1 showed a different result than in V2 and the combined EVC ROI. There was no significant main effect of location (F<sub>(2,54)</sub> = 2.20, p = 0.120, 𝜂 = 0.08; BF<sub>10</sub> = 0.77). BOLD responses at NL-near and NL-far were similar (BF<sub>10</sub> = 0.171), with the only reliable difference found between target stimuli at the HPDL and NL-far locations (W = 94, p<sub>holm</sub> = 0.012, r = 0.54).”

      We include the new result figure as Supplementary Figure 5

      We now include reference to these results in the manuscript’s Discussion section:

      “Are representations of priority signals uniform across EVC? A priori we did not have any hypotheses regarding distinct neural suppression profiles across different early visual areas, hence our primary analyses focused on stimulus responses of neural populations in EVC, irrespective of subdivision. However, an exploratory analysis suggests that distractor suppression may show different patterns in V1 compared to V2 (Supplementary Figure 5 and Supplementary Text 1). In brief, results in V2 mirrored those reported for the combined EVC ROI (Figure 4). In contrast, results in V1 appeared to be only partially modulated by distractor contingencies, and if so, the modulation was less robust and not as spatially broad as in V2. This suggests the possibility of different effects of distractor predictability across subdivisions of early visual areas. However, these results should be interpreted with caution. First, our design did not optimize the delineation of early visual areas (e.g., no functional retinotopy), limiting the accuracy of V1 and V2 segmentation. Additionally, analyses were conducted in volumetric space, which further reduces spatial precision. Future studies could improve this by including retinotopy runs to accurately delineate V1, V2, and V3, and by performing analyses in surface space. Higher-resolution functional and anatomical MRI sequences would also help elucidate how distractor suppression is implemented across EVC with greater precision.”

      Furthermore, the study could benefit from an analysis that tests the correlation over observers between the magnitude of their behavioural effects and their neural responses. 

      R2 highlights that behavioral facilitation and neural suppression could be correlated across participants. The rationale is that if neural suppression in EVC is related to the facilitation of behavioral responses, we should expect a positive relationship between neural suppression at the HPDL and RTs across participants. In this analysis we focused on the contrast between HPDL and NL-far, as this contrast was statistically significant in both the RT (Figure 2) and the neural suppression analysis (Figure 4). First, we computed for each participant the behavioural benefit of distractor suppression as: RT<sub>facilitation</sub> = RT<sub>NL-far</sub> – RT<sub>HPDL</sub>. Thereby RT facilitation reflects the response speeding due to a distractor appearing at the high probability distractor location compared to the far neutral location. Next, we computed neural suppression as: BOLD<sub>suppression</sub> = BOLD<sub>NL-far</sub> – BOLD<sub>HPDL</sub>. Thus, positive values reflect the suppression of BOLD responses at the HPDL compared to the NL-far location. The BOLD suppression index was computed for each stimulus type separately, as in the main ROI analysis (i.e. for Targets, Neutrals and Distractors). Finally, we correlated RT<sub>facilitation</sub> with BOLD<sub>suppression</sub> across participants using Pearson correlation. Results showed a small, but not statistically significant correlation between RT facilitation and BOLD suppression for distractor (r<sub>(26)</sub> = 0.22, p = 0.257), target (r<sub>(26)</sub> = 0.10, p = 0.598) and neutral (r<sub>(26)</sub> = 0.13, p = 0.519) stimuli. Thus, while the direction of the correlation was in line with the speculation by the reviewer in the “Recommendations for the authors”, results were not statistically reliable and therefore inconclusive. As also noted in our preliminary reply to the reviewer comments, it was a priori unlikely that this analysis would yield a statistically significant correlation.
An a priori power analysis suggested that, to reach a power of 0.8 at a standard alpha of 0.05, given the present sample size of n=28, the effect size would need to exceed r = 0.75, which seemed unlikely for the correlation of behavioural and neural difference scores. Given the inconclusive nature of the results, we prefer to not include this additional analysis in the manuscript, as we believe that it does not add to the main message of the paper, but we keep it accessible to the interested reader in the public “peer review process”.
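
      The difference-score correlation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the helper names and the five-participant data set are hypothetical, and Pearson's r is written out by hand so the example needs only the standard library. The degrees of freedom equal n − 2, which matches the reported r<sub>(26)</sub> for n = 28.

```python
import math

def difference_scores(far, hpdl):
    """Per-participant difference score: NL-far minus HPDL."""
    return [f - h for f, h in zip(far, hpdl)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for five participants (not the study's data).
rt_nl_far = [652.0, 701.0, 688.0, 730.0, 665.0]   # RT in ms, distractor at NL-far
rt_hpdl = [630.0, 690.0, 650.0, 725.0, 640.0]     # RT in ms, distractor at HPDL
bold_nl_far = [1.20, 0.95, 1.40, 1.10, 1.05]      # BOLD estimate at NL-far
bold_hpdl = [1.05, 0.90, 1.15, 1.12, 0.85]        # BOLD estimate at HPDL

rt_facilitation = difference_scores(rt_nl_far, rt_hpdl)
bold_suppression = difference_scores(bold_nl_far, bold_hpdl)

r = pearson_r(rt_facilitation, bold_suppression)
df = len(rt_facilitation) - 2  # degrees of freedom for the correlation test
print(f"r({df}) = {r:.2f}")
```

      In the study this computation would be repeated separately for distractor, target, and neutral stimuli, since the BOLD suppression index was computed per stimulus type.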

      The study provides an advance over previous studies, which identified enhancement or suppression in visual cortex as a function of search target/distractor predictability, but in less spatially-specific way. It also speaks to open questions about whether such suppression/enhancement is observed only in response to the arrival of visual information, or instead is preparatory, favouring the latter view. The theoretical advance is moderate, in that it is largely congruent with previous frameworks, rather than strongly excluding an opposing view or providing a major step change in our understanding of how distractor suppression unfolds.

      We agree with the reviewer that our results are an advancement of prior work, particularly with respect to narrowing down the role of sensory areas and the proactive nature of distractor suppression. However, we argue that this represents a significant step forward for several reasons. First, to our knowledge, the literature on distractor suppression, and visual search in general, is by no means unanimous with respect to the conclusion that distractor suppression is instantiated proactively (Huang et al., 2021, 2022). Indeed, there are several studies suggesting the opposite account, reactive suppression (Chang et al., 2023), or contributions by both proactive and reactive mechanisms (Sauter et al., 2021; Wang et al., 2019). Moreover, studies in support of proactive distractor suppression did not investigate the involvement of (early) sensory areas during suppression. Conversely, to our knowledge most studies investigating the involvement of sensory cortex during distractor suppression did not address the question whether suppression arises proactively or reactively.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      Minor Points: 

      (1) There are several disconnects between the behaviour and the MR results - i.e. not stimulus specific yet there are no deficits for targets appearing at the HPDL, also no behavioural suppression for the NL-Near but neural suppression found. Nevertheless, the behaviour is used as a way to rule out potential attentional strategies when considering whether there is enhancement in the NL-Far condition. I realise you have a few other points here, but I think it's worth addressing what could be seen as a double standard.

      The reviewer points out an important concern, which we feel could have better been addressed in the manuscript. From our point of view a partial dissociation between neural modulations in EVC and eventual behavioural facilitation is not surprising, given the extensive neural processing beyond EVC required for behaviour. However, this assessment may differ, if one stresses an explicit volitional attentional strategy over an implicit statistical learning account. That said, we clearly do not want to create the impression of using a double standard. The lack of behavioural facilitation for targets at NL-far is not a critical part of our argument against explicit attentional strategies. Therefore, we rephrased the relevant paragraph in the Discussion section to now emphasize the importance of the control analysis excluding participants who reported the correct HPDL in the questionnaire (Figure 5), but nonetheless yielded qualitatively identical results to the main ROI analysis (Figure 4). In our opinion, this control analysis provides more compelling evidence against a volitional attentional strategy account without the risk of creating the impression of applying a double standard in the interpretation of behavioural data. Additionally, we now acknowledge the limitation of relying on behavioral data in ruling out volitional attentional strategies in the updated manuscript:

      “It is well established that attention enhances BOLD responses in visual cortex (Maunsell, 2015; Reynolds & Chelazzi, 2004; Williford & Maunsell, 2006). If participants learned the underlying distractor contingencies, they could deploy an explicit strategy by directing their attention away from the HPDL, for example by focusing attention on the diagonally opposite neutral location. This account provides an alternative explanation for the observed EVC modulations. However, while credible, the current findings are not consistent with such an interpretation. First, there was no behavioral facilitation for target stimuli presented at the far neutral location, contrary to what one might expect if participants employed an explicit strategy. However, given the partial dissociation between neural suppression in EVC and behavioral facilitation, additional neural data analyses are required to rule out volitional attention strategies. Thus, we performed a control analysis that excluded all participants that indicated the correct HPDL location in the questionnaire, thereby possibly expressing explicit awareness of the contingencies. This control analysis yielded qualitatively identical results to the full sample, showing significant distractor suppression in EVC. Therefore, it is unlikely that explicit attentional strategies, and the enhancement of locations far from the HPDL, drive the results observed here. Instead, the current findings are consistent with an account emphasizing the automatic deployment of spatial priors (He et al., 2022) based on implicitly learned statistical regularities.”

      (2) Does the level of suppression change in any way through the experiment? I.e., does it get stronger in the second vs. first half of the experiment? 

      The reviewer asks an interesting question, whether BOLD suppression may change across the experiment. To address this question, we performed an additional analysis testing BOLD suppression in EVC during the first compared to second half of the MRI experiment. Here we defined BOLD suppression as: BOLD<sub>suppression</sub> = ((BOLD<sub>NL-far</sub> – BOLD<sub>HPDL</sub>) + (BOLD<sub>NL-far</sub> – BOLD<sub>NL-near</sub>)) / 2. Thus, in this formulation of BOLD suppression we summarize the two primary BOLD suppression effects observed in our main results (Figure 4). Additionally, as we previously did not observe any significant differences in BOLD suppression magnitudes between different stimulus types (i.e. suppression was similar for target, distractor and neutral stimuli), we collapsed across stimulus types in this analysis.
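
      The combined suppression index defined above can be written as a small function. This is a sketch with a hypothetical function name and made-up example values; only the formula itself comes from the response. Algebraically, the index equals the NL-far response minus the mean of the HPDL and NL-near responses.

```python
def bold_suppression_index(nl_far, hpdl, nl_near):
    """Average of the two suppression effects: (NL-far - HPDL) and (NL-far - NL-near)."""
    return ((nl_far - hpdl) + (nl_far - nl_near)) / 2

# Hypothetical BOLD estimates (arbitrary units) for one participant.
example_index = bold_suppression_index(nl_far=2.0, hpdl=1.0, nl_near=1.5)

# Equivalent form: NL-far minus the mean of the two suppressed locations.
equivalent_form = 2.0 - (1.0 + 1.5) / 2

print(example_index, equivalent_form)
```

      Positive values indicate lower responses at the HPDL and NL-near locations than at NL-far; a value of zero indicates no suppression relative to NL-far.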

      Results, depicted below, showed that during both the initial (Run 1+2) and later part (Run 4+5) of the MRI experiment BOLD suppression was statistically significant (BOLD suppression Run 1+2: W = 331, p = 0.003, r = 0.63; BOLD suppression Run 4+5: W = 320, p = 0.007, r = 0.58), confirming our main results of reliable distractor suppression even in this subset of trials. However, we did not observe any statistically significant differences between early and late runs of the experiment (t<sub>(27)</sub> = -0.21, p = 0.835, d = -0.04). In fact, a Bayesian paired t-test provided evidence for the absence of a difference in BOLD suppression between early compared to later runs (BF<sub>10</sub> = 0.205), suggesting that distractor suppression in EVC was stable throughout the experiment. A qualitatively similar pattern was evident during omission trials, with significant distractor suppression during early runs (t<sub>(27)</sub> = 2.70, p = 0.012, d = 0.51), but not quite a statistically significant modulation for later runs (t<sub>(27)</sub> = 1.97, p = 0.059, d = 0.37). Again, there was no evidence for a difference in suppression magnitudes across the experiment (W = 198, p = 0.920, d = -0.025) and support for the absence of a difference in BOLD suppression between early and late runs (BF<sub>10</sub> = 0.278).

      Author response image 1.

      Analysis of BOLD suppression magnitudes in EVC across the MRI experiment phases. BOLD suppression was comparable between early (Run 1+2) and late (Run 4+5) phases of the MRI experiment, suggesting consistent suppression in EVC following statistical learning. Error-bars denote within-subject SEM. * p < 0.05, ** p < 0.01, = BF<sub>10</sub> < 1/3.

      In sum, results suggest that distractor suppression in EVC was stable across runs and did not change significantly throughout the experiment. This result was a priori likely, given that participants already underwent behavioral training before entering the MRI. This enabled them to establish modified spatial priority maps, containing the high probability distractor location contingencies, already before the first MRI run. While speculative, it is possible that participants may still have consolidated the spatial priority maps during the initial runs, but that this additional consolidation is not evident in the data, as later runs may see less engagement by participants due to increasing fatigue towards the end of the MRI experiment. Indeed, rapid learning and stable suppression throughout the remainder of the experiment is also reported by prior work (Lin et al., 2021). We believe that it is highly interesting for future studies to investigate the development of distractor suppression across learning, with initial exposure to the contingencies inside the MRI. However, as the present results are inconclusive, we prefer to not include this analysis in the main manuscript, as it may not provide significant additional insight into the neural mechanisms underlying distractor suppression.

      (3) In the methods vs. results you have reported the probabilities slightly differently. In the methods you say the HPDL was 6x more likely to contain a distractor whereas in the results you say 4x. Based on the reported trial numbers I think it should be 4, but probably you want to double check that this is consistent and correct throughout.

      We thank the reviewer for bringing this inconsistency to our attention. We have corrected this oversight in the adjusted manuscript: 

      “One of the four locations of interest was designated the high probability distractor location (HPDL), which contained distractor stimuli (unique color) four times more often than any of the remaining three locations of interest. In other words, if a distractor was present on a given trial (42 trials per run), the distractor appeared 57% (24 trials per run) at the HPDL and at one of the other three locations with equal probability (i.e., 14% or 6 trials per run per location).”
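
      The trial counts in the corrected passage can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (42 distractor-present trials per run, 24 at the HPDL, 6 at each of the three other locations); the variable names are illustrative assumptions.

```python
# Trial counts per run, as quoted in the corrected manuscript text.
DISTRACTOR_TRIALS_PER_RUN = 42
HPDL_TRIALS = 24
OTHER_LOCATION_TRIALS = 6   # per location
N_OTHER_LOCATIONS = 3

# The counts should add up to the total number of distractor-present trials.
total = HPDL_TRIALS + N_OTHER_LOCATIONS * OTHER_LOCATION_TRIALS

p_hpdl = HPDL_TRIALS / DISTRACTOR_TRIALS_PER_RUN
p_other = OTHER_LOCATION_TRIALS / DISTRACTOR_TRIALS_PER_RUN
ratio = HPDL_TRIALS / OTHER_LOCATION_TRIALS

print(f"total distractor trials: {total}")
print(f"P(HPDL) = {p_hpdl:.1%}, P(each other location) = {p_other:.1%}")
print(f"HPDL vs. other location ratio: {ratio:.0f}x")
```

      The counts sum to 42 and give a 4:1 ratio, in line with the reviewer's reading of the trial numbers.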

      Reviewer #2 (Recommendations for the authors):

      The authors have performed their analyses in the volume rather than the surface, and have grouped together V1, V2, and V3 as "early visual cortex". As the authors' claims lean heavily on the idea that they are measuring "early" visual responses, the study would be improved by delineating the ROIs within these different retinotopic regions. Such an approach might be facilitated by analysing data on the reconstructed surface.

      Please refer to our reply to this analysis suggested in the Public review.

      The authors rightly tread carefully on the causal link between their neural findings and the behavioural outcomes. The picture might be clarified somewhat further by testing for a positive relationship between behavioural effect sizes and neural effect sizes across participants, e.g. to what extent is the search advantage when distractors are presented at the "HPDL" linked to greater suppression of BOLD at the HPDL region of early visual cortex?

      Please refer to our reply to this analysis suggested in the Public review.

      Some of the claims based on null hypotheses would be better supported by Bayesian tests e.g. page 6 "This pattern of results was the same regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far ..." and "BOLD responses between HPDL and NL-near locations did not reliably differ ..." This is similar to the approach that the authors adopted later in the section "Ruling out attentional modulation".

      We agree with the reviewer that our ROI analyses would benefit from providing evidence for the absence of a modulation. Accordingly, we updated our results by adding equivalent Bayesian tests. Bayes Factors were computed using JASP 0.18.2 (JASP Team, 2024; RRID:SCR_015823) with default settings; i.e. for Bayesian paired t-tests with a Cauchy prior width of 0.707. Qualitative interpretations of BFs were based on Lee and Wagenmakers (2014). We now report the obtained BF in the Results section. 

      “BOLD responses between HPDL and NL-near locations did not reliably differ (HPDL vs NL-near: t<sub>(27)</sub> = 0.47, p<sub>holm</sub> = 0.643, d = 0.08; BF<sub>10</sub> = 0.19).”

      And:

      “Neural responses at HPDL and NL-near did not reliably differ (t<sub>(27)</sub> = 0.21, p<sub>holm</sub> = 0.835, d = 0.04; BF<sub>10</sub> = 0.21).”

      Moreover, we now denote any equivalent results (defined as BF<sub>10</sub> < 1/3) in Fig. 4 and Fig. 5, and included the description of the associated symbol in the figure text (“ = BF<sub>10</sub> < 1/3”).

      Additionally, we now also report the BF for all paired t-tests reported in Supplementary Table 1.

      Finally, we addressed the statement: “This pattern of results was the same regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far”. Our intention was to emphasize that the pattern of results reported in the sentence preceding it was evident for distractor, target, or neutral stimulus, and not to suggest that the magnitude of the effect is the same. Hence, to more accurately reflect the results, we changed this sentence to: “This pattern of results was present regardless whether the distractor, target, or a neutral stimulus presented at the HPDL and NL-near locations compared to NL-far”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Based on previous publications suggesting a potential role for miR-26b in the pathogenesis of metabolic dysfunction-associated steatohepatitis (MASH), the researchers aim to clarify its function in hepatic health and explore the therapeutical potential of lipid nanoparticles (LNPs) to treat this condition. First, they employed both whole-body and myeloid cell-specific miR-26b KO mice and observed elevated hepatic steatosis features in these mice compared to WT controls when subjected to WTD. Moreover, livers from whole-body miR-26b KO mice also displayed increased levels of inflammation and fibrosis markers. Kinase activity profiling analyses revealed distinct alterations, particularly in kinases associated with inflammatory pathways, in these samples. Treatment with LNPs containing miR-26b mimics restored lipid metabolism and kinase activity in these animals. Finally, similar anti-inflammatory effects were observed in the livers of individuals with cirrhosis, whereas elevated miR-26b levels were found in the plasma of these patients in comparison with healthy control. Overall, the authors conclude that miR-26b plays a protective role in MASH and that its delivery via LNPs efficiently mitigates MASH development.

      The study has some strengths, most notably, its employ of a combination of animal models, analyses of potential underlying mechanisms, as well as innovative treatment delivery methods with significant promise. However, it also presents numerous weaknesses that leave the research work somewhat incomplete. The precise role of miR-26b in a human context remains elusive, hindering direct translation to clinical practice. Additionally, the evaluation of the kinase activity, although innovative, does not provide a clear molecular mechanisms-based explanation behind the protective role of this miRNA.

      Therefore, to fortify the solidity of their conclusions, these concerns require careful attention and resolution. Once these issues are comprehensively addressed, the study stands to make a significant impact on the field.

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we have addressed these as well as possible during the revision of our manuscript.

      We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The observation that patients with liver cirrhosis have increased plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.

      The kinase activity analysis revealed that miR-26b particularly affects kinases that play an important role in inflammation and angiogenesis. Strikingly, and supporting these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.

      Reviewer #1 (Recommendations For The Authors):

      A list of recommendations for the authors is presented below:

      (1) The title should emphasize that the majority of experiments were conducted in mice to accurately reflect the scope of the study.

      As suggested we have updated our title to include the statement that we primarily used a murine model:

      “MicroRNA-26b protects against MASH development in mice and can be efficiently targeted with lipid nanoparticles.”

      (2) It would be useful to know more about miR-26b function, including its target genes, tissue-specific expression, and tissue vs. circulating levels. Is it expected that the two strains of the miRNA (i.e., -3p and -5p) act this similarly? Also, miR-26b expression in the liver of individuals with cirrhosis should be determined.

      The function of miR-26b is still rather elusive, making functional studies using this miR very interesting. In a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021), we alluded to several functions of miR-26b that have already been investigated. These were described particularly in carcinogenesis and in the neurological field.

      Regarding target genes, several targets are already described in miRBase. However, for our experiments we feel that determination of the specific target genes is beyond the scope of the current manuscript and rather a focus of follow-up projects.

      Regarding the expression of miR-26b, the liver and blood have rather high and similar expressions of both miR-26b-3p and miR-26b-5p as shown in Author response image 1.

      Author response image 1.

      Expression of miR-26b-3p and -5p. Expression of miR-26b-3p (left) and miR-26b-5p (right), generated using the miRNATissueAtlas 2025 (Rishik et al. Nucleic Acids Research, 2024).

      Unfortunately, due to restrictions in tissue availability and the lack of stored RNA samples, we are unable to measure miR-26b expression in the human livers. However, based on the potency of the miR-26b mimic loaded LNPs in the mice (Revised Supplemental Figure 2A-B), we are confident that these LNPs also resulted in an overexpression of miR-26b in the human livers.

      (3) Please explain the rationale behind primarily using whole-body miR-26b KO mice rather than the myeloid cell-specific KO model for the studies.

      The main goal of our study is the elucidation of the general role of miR-26b in MASH formation. Therefore, we decided to primarily focus on the whole-body KO model. While we used the myeloid cell-specific KO model to highlight that myeloid cells play an important role in the observed phenotypes, we believe the whole-body KO model is more appropriate as the main focus, particularly in light of the LNP targeting used, which is also a whole-body approach. Furthermore, this focus on the whole-body model also reflects a more therapeutically relevant approach.

      (4) The authors claim that treatment with LNPs containing miR-26b "replenish the miR-26b level in the whole-body deficient mouse" but the results of this observation are not presented.

      This is indeed a valid point that we have now addressed. We have measured the miR-26b-3p and miR-26b-5p expression levels in livers from mice after 4 weeks of WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Revised Supplemental Figure 2A-B, mLNP treatment clearly results in an overexpression of miR-26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.

      (5) The number of 3 human donors for the precision-cut liver slices is clearly insufficient and clinical parameters need to be shown. Additionally, inconsistencies in individual values in Figures 8B-E need clarification.

      Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number for these experiments. Clinical parameters are not available, but the liver slices were from healthy tissue.

      We have performed these experiments in duplicate for each individual donor. We have now also specified this in the figure legend to explain the individual values in the graphs:

      “…(3 individual donors, cultured in duplicates).”

      (6) Figure 2D: Please include representative images.

      As suggested we have included representative images in our revised manuscript.

      (7) Address discrepancies in the findings across different experimental settings. For example, the expression levels of the lipid metabolism-related genes are not significantly modulated in whole-body miR-26b KO mice (except for Sra), but they are in the myeloid cell-specific model (but not Sra), and none of them are restored after LNPs injections.

      Although Cd36 is not significantly increased in the whole-body miR-26b KO mice, there is a clear tendency towards increased expression, which is now also validated at the protein level (Revised Figure 1K-L). In the myeloid cell-specific model we see a similar tendency, although the gene expression of Sra is not significantly changed. This could be due to the difference in the model, since only myeloid cells are affected, suggesting that the effects on Sra are to a large extent driven by non-myeloid cells. This would also fit with the tendency towards decreased Sra expression in the mimic-LNP treated mice. Due to the larger variation, this difference did not reach significance, which is rather a statistical issue due to relatively small n-numbers. At this moment, we cannot exclude that these receptors are differentially regulated by different cell-types. For this, future studies are needed focussing on cell-specific targeting of miR-26b in somatic cells, like hepatocytes.

      (8) Figure 4A the images are not representative of the quantification.

      We have selected another representative image that exactly reflects the average Sirius red positive area, so that it matches the quantification.

      (9) Figures 5 and 7: Are there not significantly decreased/increased kinases? A deeper analysis of these kinase alterations is necessary to understand how miR-26b exerts its role. A comparison analysis of these two datasets might clarify this regard.

      We indeed very often see in these kinome analyses that the general tendency of kinase activity is unidirectional. We believe that this is caused by the highly interconnected nature of kinases. Activation of one signalling cascade will also result in the activation of many other cascades. However, it is interesting to see which pathways are affected in our study, and we find it particularly interesting that the tendencies are exactly opposite between both comparisons: KO vs. WT shows increased kinase activities, while KO-LNP vs. KO shows a decrease again. This further shows that the method reflects a true biological effect that is mediated by miR-26b.

      (10) Determinations of the effect of LNPs containing miR-26b in the KO mice are limited to only a few observations (that are not only significant). More extensive findings are needed to conclusively demonstrate the effectiveness of this treatment method. Similar to the experiments with human liver samples (Figures 8A-E).

      We have now elaborated our observations in the mouse model using LNPs by also analysing the effects on inflammation and fibrosis. However, it seems that the treatment time was not long enough to see pronounced changes in these later stages of disease development. Interestingly, the expression of Tgfb was significantly reduced, suggesting that the LNPs already affect fibrotic processes at the gene-expression level. Longer mLNP treatment may therefore also produce effects at the protein level, which remains to be determined in future studies.

      Unfortunately, due to restrictions in tissue availability, we are unable to increase our n-number or read-outs for these experiments at this moment.

      (11) In Figures 8F-H, the observed increase in circulating miR-26b levels in the plasma of cirrhotic individuals seems contradictory to its proposed protective role. This discrepancy requires clarification.

      In the revised discussion (second to last paragraph), we have now elaborated more on the findings in the plasma of cirrhotic individuals in comparison to our murine in-vivo results, to highlight and discuss this discrepancy.

      (12) Figures 8F-H legend mentions using 8-11 patients per group, but the methods section lacks corresponding information about these individuals.

      These patients, together with inclusion/exclusion criteria and definition of cirrhosis are described in the method section 2.14.

      (13) Figure 8G has 7 data points in the cirrhosis group, instead of 8. Any data exclusion should be justified in the methods section.

      As defined in method section 2.15, we have identified outliers using the ROUT = 1 method, which is the reason why Figure 8G only has 7 data points instead of 8.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Peters, Rakateli, et al. aims to characterize the contribution of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. In addition, the authors provide a rescue of the miR-26b using lipid nanoparticles (LNPs), with potential therapeutic implications. In addition, the authors provide useful insights into the role of macrophages and some validation of the effect of miR-26b LNPs on human liver samples.

      Strengths:

      The authors provide a well-designed mouse model, that aims to characterize the role of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on the background of Apoe knock-out. The rescue of the phenotypes associated with the model used using miR-26b using lipid nanoparticles (LNPs) provides an interesting avenue to novel potential therapeutic avenues.

      Weaknesses:

      Although the authors provide a new and interesting avenue to understand the role of miR-26b in MASH, the study needs some additional validations and mechanistic insights in order to strengthen the author's conclusions.

      (1) Analysis of the expression of miRNAs based on miRNA-seq of human samples (see https://ccb-compute.cs.uni-saarland.de/isomirdb/mirnas) suggests that miR-26b-5p is highly abundant both on liver and blood. It seems hard to reconcile that despite miRNA abundance being similar in both tissues, the physiological effects claimed by the authors in Figure 2 come exclusively from the myeloid (macrophages).

      We agree with the reviewer that the effects observed in the whole-body KO model are most likely a combination of cellular effects, particularly since miR-26b is also highly expressed in the liver. However, with the LysM-model we merely want to demonstrate that the myeloid cells at least play an important, though not exclusive, role in the phenotype. In the discussion, we also further elaborate on the fact that the observed changes in the liver can be mediated by hepatic changes.

      To stress this, we have adjusted the conclusion of Figure 2:

      “Interestingly, mice that have a myeloid-specific lack of miR-26b also show increased hepatic cholesterol levels and lipid accumulation demonstrated by Oil-red-O staining, coinciding with an increased hepatic Cd36 expression (Figure 2), demonstrating that myeloid miR-26b plays a major, but not exclusive, role in the observed steatosis.”

      (2) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26a-5p is indeed 4-fold higher than miR-26b-5p both in the liver and blood. Since both miRNAs share the same seed sequence, and most of the supplemental regions (only 2 nt difference), their endogenous targets must be highly overlapped. It would be interesting to know whether deletion of miR-26b is somehow compensated by increased expression of miR-26a-5p loci. That would suggest that the model is rather a depletion of miR-26.

      UUCAAGUAAUUCAGGAUAGGU mmu-miR-26b-5p mature miRNA

      UUCAAGUAAUCCAGGAUAGGCU mmu-miR-26a-5p mature miRNA

      This is a very valid point raised by the reviewer, which we already explored in a previous study describing the mouse model used here (Van der Vorst et al. BMC Genom Data, 2021). In that study, we showed that miR-26a is not affected by the deficiency of miR-26b (Figure 1G in: Van der Vorst et al. BMC Genom Data, 2021).
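
      The reviewer's statement in point (2) that the two mature miRNAs share the same seed and differ at only two positions can be checked directly from the sequences quoted there. The sketch below assumes the common convention that the seed spans nucleotides 2 to 8 of the mature miRNA; the helper names are illustrative.

```python
# Mature sequences exactly as quoted in the reviewer's comment.
MIR_26B_5P = "UUCAAGUAAUUCAGGAUAGGU"   # mmu-miR-26b-5p
MIR_26A_5P = "UUCAAGUAAUCCAGGAUAGGCU"  # mmu-miR-26a-5p


def seed(sequence: str) -> str:
    """Return nucleotides 2-8 (1-based), assumed here to be the seed region."""
    return sequence[1:8]


def mismatch_positions(a: str, b: str) -> list[int]:
    """1-based positions that differ, compared over the shared 5' length."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


print("seed 26b:", seed(MIR_26B_5P))
print("seed 26a:", seed(MIR_26A_5P))
print("mismatch positions:", mismatch_positions(MIR_26B_5P, MIR_26A_5P))
print("length difference:", abs(len(MIR_26B_5P) - len(MIR_26A_5P)))
```

      Aligned from the 5' end, the two sequences have identical seeds and differ at two positions, with miR-26a-5p carrying one additional 3' nucleotide.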

      (3) Similarly, the miRNA-seq expression from isomirdb suggests also that expression of miR-26b-5p is indeed 50-fold higher than miR-26b-3p in the liver and blood. This difference in abundance of the two strands is usually regarded as one of them being the guide strand (in this case the 5p) and the other being the passenger (in this case the 3p). In some cases, passenger strands can be a byproduct of miRNA biogenesis, thus the rescue experiments using LNPs with both strands in equimolar amounts would not reflect the physiological abundance miR-26b-3p. The non-physiological overabundance of miR-26b-3p would constitute a source of undesired off-targets.

      We agree with the reviewer on this aspect and this is something we had to consider while generating the mimic LNPs. However, we believe that we do not observe any undesired off-target effects, as the effects of the mimic LNPs, at least on functional outcomes, are relatively mild and restricted to the expected effects on lipids. Furthermore, the effects on the kinase profile due to the mimic LNP treatment are in line with our expectations. Combined, these results suggest at least that potential off-target effects are minor.

      (4) It would also be valuable to check the miRNA levels on the liver upon LNP treatment, or at least the signatures of miR-26b-3p and miR-26b-5p activity using RNA-seq on the RNA samples already collected.

      This is indeed a valid point that we have now addressed. We have measured the miR-26b-3p and miR-26b-5p expression levels in livers from mice after 4 weeks of WTD with simultaneous injection with either empty LNPs as vehicle control (eLNP) or LNPs containing miR-26b mimics (mLNP) every 3 days. As shown in Supplemental Figure 2A-B, mLNP treatment clearly results in an overexpression of miR-26b in the livers of these mice. We have rephrased the text accordingly by stating that mLNP results in an “overexpression” rather than “replenishment”.

      (5) Some of the phenotypes described, such as the increase in cholesterol, overlap with the previous publication by van der Vorst et al. BMC Genom Data (2021), despite in this case the authors are doing their model in Apoe knock-out and Western-type diet. I would encourage the authors to investigate more or discuss why the initial phenotypes don't become more obvious despite the stressors added in the current manuscript.

      In our previous publication (BMC Genom Data; 2021), we actually did not see any changes in circulating lipid levels. However, in that study we did not evaluate the livers of the mice, so we do not have any information about the hepatic lipid levels.

      As mentioned by the reviewer, we believe that we see much more pronounced phenotypes in the current model because we use the combined stressor of Apoe-/- and Western-type diet, which cannot be compared to the wildtype and chow-fed mice used in the BMC Genom Data manuscript.

      (6) The authors have focused part of their analysis on a few gene markers that show relatively modest changes. Deeper characterization using RNA-seq might reveal other genes that are more profoundly impacted by miR-26 depletion. It would strengthen the conclusions proposed if the authors validated that changes in mRNA abundance (Sra, Cd36) do impact the protein abundance. These relatively small changes or trends in mRNA expression might not translate into changes in protein abundance.

      As suggested by the reviewer we have now also confirmed that the protein expression of CD36 and SRA is significantly increased upon miR-26b depletion, visualized as Figure 1K-L in the revised manuscript. Unfortunately, we do not have enough material left to perform similar analysis for the LysM-model or the LNP-model, although based on the whole-body effects we are confident that at least for CD36/SRA in this case the gene expression matches effects observed on protein level.

      (7) In Figures 5 and 7, the authors run a phosphorylation array (STK) to analyze the changes in the activity of the kinome. It seems that a relatively large number of signaling pathways are being altered, I think that should be strengthened by further validations by Western blot on the collected tissue samples. For quite a few of the kinases, there might be antibodies that recognise phosphorylation. The two figures lack a mechanistic connection to the rest of the manuscript.

      On this point we respectfully have to disagree with the reviewer. We have used a kinase activity profiling approach (PamGene) to analyse the real-time activity of kinases in our lysates. This approach is different than the classical Western blot approach in which only the presence or absence of a specific phosphorylation is detected. Thereby, Western blot analysis does not analyse phosphorylation in real-time, but rather determines whether there has been phosphorylation in the past. Our approach actually determines the real-time, current activity of the kinases, which we believe is a different and perhaps even more reliable read-out measurement. Therefore, validation by Western blot would not strengthen these observations.

      We have particularly tried to connect these observations to the rest of the manuscript by highlighting the observed signalling cascades that are affected, highlighting a role in inflammation and angiogenesis, thereby providing some mechanistic insights.

      Reviewer #2 (Recommendations For The Authors):

      I would encourage the authors to follow-up on some of the more miRNA focused comments made above, which would strengthen the mechanistic part of the work presented.

      I suggest the authors tone down some of the claims made (eg. "clearly increased expression", "exacerbated hepatic fibrosis"), given that some of it might need further validation.

      Wherever needed we have toned down some claims, although we believe that most claims are already written carefully enough and in line with the observed results.

      Some of the panels that are supposed to have the same amount of animals have variable N, despite they come from the same exact number of RNA samples or tissue lysates (eg. 1G and 1H, vs 1I and 1J).

      This is indeed correct and is caused by the fact that some analyses resulted in statistical outliers as identified using the ROUT = 1 method, as also specified in section 2.15 of the method section.

      It would be nice to have representative images of oil-red-o in all the figures where it is quantified (or at least in the supplementary figures).

      As suggested by the reviewer, we have now included representative images for the LysM-model (Revised Figure 2D) and the LNP-model (Revised Figure 6D) as well.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      While the novel compound showed promising potency in the HER2-positive gastric cancer cells and xenograft model, it would be valuable to also evaluate it in HER2-positive breast cancer cell models. The author did not compare the current compounds with other therapeutic strategies targeting HER2 expression at the genetic level. It is unclear why the EGFR inhibitors gefitinib and canertinib, but not HER2-specific inhibitors (i.e. tucatinib), were used as a control in the manuscript.

      We appreciate the reviewer’s insightful comments. Evaluating compound 10 on HER2-positive breast cancer cells is indeed crucial, especially given the established HER2-targeting therapies for breast cancer. In response to this concern, we conducted additional experiments to investigate the impact of compound 10 on HER2-positive breast cancer cell lines AU565 and BT474, specifically assessing its HER2 downregulating activity (Author response image 1).

      Author response image 1.

      HER2 downregulatory effect of compound 10 in HER2-positive breast cancer cell lines, AU565 and BT474.

      The selection of gefitinib (an EGFR tyrosine kinase inhibitor) and canertinib (a pan-HER inhibitor) as positive controls in our manuscript is based on their demonstrated ability to inhibit the protein-protein interaction (PPI) between ELF3 and MED23, as previously reported (J Adv Res. 47, (2023) 173-87. 10.1016/j.jare.2022.08.003; Cancer letters. 325, (2012) 72-9. 10.1016/j.canlet.2012.06.004). In the referenced studies, a SEAP reporter gene assay was used to screen compounds for their capacity to disrupt the ELF3-MED23 PPI. This assay involves GAL4-ELF3 binding to a GAL4 binding site in the SEAP reporter gene, followed by interaction with MED23, leading to RNA polymerase II recruitment and SEAP expression in cells (J Am Chem Soc. 2004, 126(49), 15940. doi: 10.1021/ja0445140). Canertinib exhibited stronger inhibitory activity against ELF3-MED23 PPI compared to gefitinib, but also showed non-specific cytotoxicity. YK1 was subsequently developed based on structural analysis of the interfaces between gefitinib and MED23, and between ELF3 and MED23. Considering the previously validated inhibitory activities of gefitinib and canertinib, these drugs were selected as positive controls in the current study to compare the ELF3-MED23 inhibitory efficacy of novel compounds.

      Reviewer #1 (Recommendations For the Authors):

      (1) It is unclear why compound 5 did not inhibit HER2 overexpression at the mRNA level but did so at the protein level, as compounds 3 and 10 did. Could the authors further explain the potential mechanism for compound 5?

      While the exact mechanism remains unclear, the results indicated that compound 5 likely affects the protein level of HER2 through somewhat non-specific mechanisms rather than by inhibiting the ELF3-MED23 PPI. Based on this assessment, compound 5 was excluded from further investigation.

      (2) The HER2 expression and its downstream signaling pathway assay are unclear about the approach. It needs to be included in the methods or supplementary.

      We investigated the ELF3-MED23 PPI inhibitory activity and its subsequent effect on HER2 downregulation using a comprehensive approach involving multiple techniques to ensure precise and unbiased experimental results.

      To assess PPI inhibition, we employed the following assays:

      · SEAP reporter gene assay

      · Fluorescence polarization (FP)

      · Split-luciferase complementation assay

      · GST-pulldown

      · Immunoprecipitation (IP)

      HER2 expression levels were evaluated through:

      · SEAP reporter gene assay

      · Luciferase promoter assay

      · Quantification of HER2 mRNA using qPCR

      · Measurement of HER2 protein levels via western blot analysis

      To evaluate downstream signaling of HER2, we analyzed:

      · Phosphorylation levels of MAPK (pMAPK) and AKT (pAKT)

      These methods were systematically applied to elucidate the mechanism of action of compound 10 in inhibiting ELF3-MED23 interaction and subsequently downregulating HER2.

      For clarity, we have revised the manuscript to provide a detailed description of the experimental methods to assess PPI, as described below.

      “SEAP assay was performed as previously described to measure ELF3-MED23 PPI-dependent HER2 transcription [29]. In this assay, the GAL4-ELF3 fusion protein binds to one of the five GAL4 binding sites on the reporter gene (pG4IL2SX). The interaction between the GAL4-ELF3 fusion protein and endogenous MED23 induces the expression of the SEAP. Once expressed, SEAP acts as a phosphatase on the substrate 4-MUP (4-methyl umbelliferyl phosphate), resulting in increased fluorescence. The mammalian expression vector, …”

      “FP assay was conducted following a previously described method to evaluate the molecular interaction between ELF3 and MED23 [29]. The FP assay operates on the principle of the molecular rotation dynamics. When a fluorescently labeled small molecule is excited by polarized light, the emitted fluorescence can be polarized or depolarized depending on the molecular status. Free small molecules rotate rapidly, altering the orientation of their fluorescence dipole and emitting depolarized light. However, when these small molecules bind to large molecules, such as proteins, the resulting complex rotates more slowly, and the emitted light retains much of its original polarization. In this study, different concentrations of (His)6-MED23391–582, as the large molecule, and 10 nM of FITC-labeled ELF3129–145 peptide, as the fluorescence-labeled small molecule, were combined in …”
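
      The readout described in the FP passage above can be sketched with the standard definition of fluorescence polarization, P = (I_parallel - I_perpendicular) / (I_parallel + I_perpendicular), commonly reported in millipolarization units (mP). The sketch below is illustrative only: the function name, the optional G-factor argument, and the example intensities are assumptions, not values or code from the manuscript.

```python
def polarization_mp(i_parallel: float, i_perpendicular: float, g_factor: float = 1.0) -> float:
    """Return fluorescence polarization in mP from two emission intensities.

    i_parallel      -- emission intensity parallel to the excitation plane
    i_perpendicular -- emission intensity perpendicular to the excitation plane
    g_factor        -- instrument correction factor (assumed 1.0 here)
    """
    corrected_perp = g_factor * i_perpendicular
    total = i_parallel + corrected_perp
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - corrected_perp) / total


# Hypothetical intensities: a freely rotating labeled peptide emits nearly
# depolarized light, while a protein-bound peptide retains more polarization.
free_peptide = polarization_mp(105.0, 100.0)
bound_peptide = polarization_mp(150.0, 100.0)
print(f"free: {free_peptide:.1f} mP, bound: {bound_peptide:.1f} mP")
```

      In a binding experiment of this kind, polarization would be expected to rise as the labeled peptide binds the larger protein, and to fall again if an inhibitor displaces it.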

      (3) It is confusing to me about the order of the experiments, in which the SAR work came after the synthesis and a series of biochemical studies for the characterization of the synthetic compounds. What is the specific reason for this order?

      We concluded that the current approach is appropriate because the analysis was not intended for structural modification and optimization through SAR (Structure-Activity Relationship) analysis. Instead, the primary objective was to elucidate the structural basis underlying the efficacy of PPI inhibition among compounds sharing the same scaffold. We believe this will provide valuable insights for future design and synthesis of new compounds.

      (4) The yield for each step of the general synthesis needs to be included in the scheme 1.

      Scheme 1 has been updated to include the yield of each step of the synthesis process.

      (5) In line 532, the authors stated 28 compounds, should it be 26?

      ‘Twenty-eight compounds’ includes 26 newly synthesized compounds and 2 positive controls, gefitinib and canertinib.

      (6) Introduction part, lines 74 to 75, "While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression" may not be confirmed in lung cancers.

      HER2 overexpression is usually a direct consequence of gene amplification, although overexpression can occur by other mechanisms [Nat Rev Cancer. 2009;9:463–475. doi: 10.1038/nrc2656.; Cell. 2007;129:1275–1286. doi: 10.1016/j.cell.2007.04.034.]. The levels of HER2 protein expression and gene amplification are linearly associated and highly concordant in breast cancer, colorectal cancer, ovarian cancer, and esophageal adenocarcinoma [World J Gastrointest Oncol. 2019, 11(4): 335–347. doi: 10.4251/wjgo.v11.i4.335; J Clin Oncol. 2002;20:719–26. doi.org/10.1200/JCO.2002.20.3.71; Oncology. 2001;61(Suppl 2):14–21. doi.org/10.1159/000055397; Science. 1989, 244(4905):707-12. doi: 10.1126/science.2470152; Cancer. 2014 Feb 1; 120(3): 415–424. doi: 10.1002/cncr.28435]. As the reviewer mentioned, the linear association between HER2 protein expression and gene amplification has not been fully established for NSCLC [ESMO Open. 2022, 100395. doi: 10.1016/j.esmoop.2022.100395].

      Therefore, we changed the sentence as described below.

      “While HER2 gene amplification is the primary mechanism responsible for HER2 overexpression in most HER2-positive cancers, except in lung cancer [16], high transcription rates of HER2 per gene copy have also been observed to contribute.”

      (7) The abstract part, lines 31 and 32, the detailed experimental data for SEAP needs to be expressed in another way.

      SEAP is a type of reporter gene assay. We revised the manuscript as follows and additionally described the assay in the Methods section.

      “Upon systematic analysis, candidate compound 10 was selected due to its potency in downregulating reporter gene activity of HER2 promoter confirmed by SEAP activity and its effect on HER2 protein and mRNA levels.”

      (8) The author should combine the box for Chalcone, pyrazoline, Licochalcone E, and YK-1, Figures 1 and 2 into a new single Figure.

      We revised the manuscript following the reviewer's comments.

      (9) Provide the list of antibodies and sources for the cell-based and western blot assays.

      Table S1 presents detailed information about the antibodies and dilution ratios used in the cell-based and western blot assays.

      Reviewer #2 (Public Reviews):

      Weaknesses:

      The rationale behind the proposed structural modifications for the three groups of compounds is not clear.

      Reviewer #2 (Recommendations For the Authors):

      (1) Based on previous work experience, it would be interesting to evaluate the in silico mode of interaction of compound 10.

      As suggested by the reviewer, we additionally performed an in silico docking study to identify the mode of interaction of compound 10 (Author response image 2). As shown below, the results indicate that compound 10 shares a similar binding orientation with YK1, forming an H-bond with the H449 residue. Although it does not interact with the D400 residue, it was predicted to form an additional H-bond with S450, which is adjacent to H449, thereby reinforcing the overall binding of compound 10 to MED23. Moreover, compound 10 was predicted to form a pi-pi interaction with F399, which has previously been identified as an important interaction for compounds that show strong PPI inhibitory effects against ELF3 and MED23.

      Author response image 2.

      Docking analysis of compound 10.

      (2) The chalcones presented in this study are structurally similar to those previously presented by the group (ref 29). In said work, most of the compounds exhibited activities with IC50 values between 1.3 and 3 μM, with inhibition values at 10 μM ranging between 80 and 90% in the SEAP assay. These results are similar to those observed in this paper for the same assay. Can an explanation be found?

      Chalcones are inherently flexible molecules, giving them a high chance of occupying critical hotspot residues within the binding interface of ELF3-MED23, irrespective of the side chains introduced to this moiety. However, depending on the type of side chains introduced, the overall drug-like properties of compounds can be significantly altered, while still maintaining their PPI inhibitory effect. The significance of this study lies in our effort to enhance metabolic stability through extensive introduction of methoxy groups and other hydrophobic side chains to the chalcone skeleton, while preserving high PPI inhibitory activity.

      (3) Is the replacement of H and OH by OMe necessary? Does it improve any property (activity, selectivity, bioavailability, solubility, etc.)? Regarding the derivatives of group 2, why did they decide to replace the O-H, which in silico demonstrated favorable hydrogen bond interactions with Asp400? How do these molecules look in the binding site? Perhaps this is a point to discuss since the substitution of OH led to the obtaining of inactive molecules, or is the effect due to substitution with the terminal aromatic ring with 3 OMe?

      We modified the hydroxyl group moiety of YK-1 into a methoxy group to reduce the polarity of the compound, thereby enhancing its cell membrane permeability (Author response image 3) and reducing the likelihood of rapid elimination through phase II metabolic pathways in vivo. Additionally, we considered the potential conversion of the methoxy group back to a hydroxyl group via phase I metabolism in vivo.

      Author response image 3.

      Impact of methoxy group introduction on the TPSA (topological polar surface area) of each molecule. The TPSA of each molecule containing the chalcone structure was calculated using the Molinspiration webserver.

      (4) Lines 134 and 134: "Only compounds are in red."

      We revised the manuscript following the reviewer's comments.

      (5) Line 171: "Chalcone skeleton, shown in red."

      We revised the manuscript following the reviewer's comments.

      (6) Line 350: "N-1-acetyl-4,5-dihydropyrazoline."

      We revised the manuscript following the reviewer's comments.

      (7) Scheme 1. Replace "h" with "hr".

      We revised the manuscript following the reviewer's comments. Scheme 1 has been replaced by a new version.

      (8) Where is "Table S1" in SI?

      Tables S1 and S2 are supposed to be included in SI. We will ensure that Tables S1 and S2 are properly uploaded to the SI section.

      (9) In Figure 6, Graph D, to enhance comprehension, please incorporate red arrows indicating drug administration.

      We revised Figure 6 (D) following the reviewer's comments. Red arrows indicating drug administration have been incorporated, along with a descriptive comment "Drug administration" next to each arrow. Additionally, the figure legend now includes a clear description of these additions.

      Reviewer #3 (Public review):

      Weaknesses:

      Compound 10 potency as PPI inhibitor has been shown in only one cell line NCI-N87.

      Reviewer #3 (Recommendations For the Authors):

      (1) The authors should show this compound 10 is effective in other gastric cancer cells like KATOIII, SNU1.

      We evaluated the HER2 downregulating activity of compound 10 in the gastric cancer cell line SNU216, which is confirmed to express a high level of HER2 protein (Author response image 4).

      Author response image 4.

      HER2 downregulatory effect of compound 10 in HER2-positive gastric cancer cell line, SNU216. (A) Expression levels of HER2 and ELF3 in various gastric cancer cell lines. (B) HER2 downregulation in the SNU216 cell line following treatment with compound 10.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Comment 1: The authors showed increased plasma IL-22 and its expression in the intestine. Are intestinal ILC3s the main source of plasma IL-22?

      Reply: ILC3s are the main source of IL-22 as reported previously (PMID: 30700914). In the small intestine, ILC3s account for about 62% of IL22+ cells. Other IL22+ cells include γδ T, Foxp3+T and CD4+T cells.

      Comment 2: The authors transplanted intestinal ILC3s from NCD mice to DIO mice and showed significant metabolic improvements. However, in Fig. 1, intermittent fasting increased the proportion of IL-22-positive ILC3s rather than changing the total number. Please clarify whether the effect of this transplantation is due to increasing the number of ILC3s or to introducing more IL-22-positive ILC3s (which are decreased in DIO). Do these transplanted ILC3s home by default to the intestine rather than to other tissues?

      Reply: We believe that the transplantation increases the number of ILC3s, leading to the increase in IL-22 levels. The transplanted ILC3s home by default to the intestine rather than to other tissues because ILC3s express several homing receptors such as CCR7, CCR9, and α4β7, which modulate their capacity to migrate to the gut (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). Our observation that ILC3s in adipose tissue remained unchanged by ILC3 cell transplantation (Supplementary Figure 5F) also supports this concept.

      Comment 3: Thermogenesis in this acute cold challenge is mainly by brown adipose tissue. Beiging is a chronic and adaptive response. Based on the data in WAT, there is a beiging phenotype, but the core body temperature in acute cold challenge is not an accurate readout. It would be a missed opportunity by not evaluating thermogenic activity in BAT. More browning genes should be included to strengthen the beiging phenotype of WAT. Moreover, inflammation in WAT can be examined to provide a whole picture of adipose tissue remodeling through this pathway.

      Reply: Per suggestion, we performed additional experiments to measure the levels of inflammation-related genes such as Il4, Il1b, Il6, Il22, Il23, and Il17a. As shown in Supplemental Figure 2D, the expression of these genes was not altered.

      Comment 4: For the SVF beige adipocyte differentiation, 100 ng/mL IL-22 was used. This is highly above the physiological concentration at ~5 pg/mL. Please justify this high concentration used.

      Reply: We agree with the reviewer that the dose of IL-22 used is high. However, the effective dose of 100 ng/ml used in our studies is consistent with the literature. Previous reports have shown that IL-22 directly activates Stat3 in adipose tissue and primary adipocytes, and promotes the expression of genes involved in triglyceride lipolysis (Lipe and Pnpla2) and fatty-acid β-oxidation (Acox1) at a dose of 100 ng/ml (Wang X, Ota N, et al. Nature. 2014). Consistently, other studies have reported that IL-22 at 100 ng/ml significantly reversed the enhanced expression of CCL2, CCL20 and IL1B mRNAs in granulosa cells in vitro (Qi X, et al. Nat Med. 2019).

      Comment 5: The authors showed increased Ucp1 and Cidea expression by IL-22 treatment in SVFs. Please be aware that these increases are likely due to boosted adipogenesis as told by the morphology. Please examine more adipogenic markers to confirm. Is this higher adipogenesis caused by the high concentration of IL-22?

      Reply: Per suggestion, we examined the expression of adipogenic marker genes such as Pparγ and Fabp4. We found that IL-22 did not increase the levels of these adipogenic marker genes relative to the PBS control, as shown in Supplemental Figure 6F.

      Author response image 1.

      Comment 6: In line 201, the authors drew the conclusion that IL-22 increased SVF beige differentiation. To fully support this conclusion, the authors should assure adipogenesis at the same baseline and then compare beiging, or examine the effect of IL-22 on normal adipogenesis to compare with beige differentiation.

      Reply: We examined the expression of adipogenic marker genes such as Pparγ and Fabp4 and found that IL-22 did not increase the expression of these adipogenic marker genes relative to the PBS control.

      Reviewer #2:

      This study aims to investigate the mediatory role of intestinal ILC3-derived IL-22 in intermittent fasting-elicited metabolic benefits.

      Strengths:

      The observation of induction of IL-22 production by intestinal ILC3 is significant, and the scRNAseq provides new information into intestine-resident immune cell profiling in response to repeated fasting and refeeding.

      Weaknesses:

      The experimental design for some studies needs to be improved to enhance the rigor of the overall study. There is a lack of direct evidence showing that the metabolically beneficial effects of IF are mediated by intestinal ILC3 and their derived IL-22. The mechanism by which IL-22 induces a thermogenic program is unknown. The browning effect induced by IF may involve constitutive activation of lipolysis, which was not considered.

      Comment 1: Lack of direct evidence showing that IL-22-expressing ILC3s in intestine is the key contributor to intermittent fasting (IF)-mediated elevation of circulating IL-22 levels. The fraction of IL-22-expressing cells was increased threefold by IF but the increase in circulating IL-22 is moderate (Figs. 1J and 1K).

      Reply: IL-22 in circulation is subject to clearance, degradation, binding to plasma proteins, etc. Thus, circulating levels of IL-22 may be much lower than the amount secreted by the intestinal IL-22-positive ILC3s.

      Comment 2: The loss of fat mass by IF suggests that the active lipolysis may explain the white fat browning which was not considered. This may apply to the observations in IL-22 treated mice as well as IL-22R KO mice.

      Reply: We analyzed the expression of genes related to lipolysis in NCD and NCD-IF mice and found that IF did not alter the levels of these genes in white adipose tissues (Supplementary Figure 2D). We have addressed this concern in line 119, page 6.

      Author response image 2.

      Comment 3: IL-22 administration and adoptive transfer of ILC3 had no significant effect on body weight. Not clear how IL-22 improves insulin sensitivity in this case.

      Reply: Our results are consistent with a previous report showing that IL-22 administration improves insulin sensitivity without a change in body weight (Qi X, et al. Nat Med. 2019). In addition, previous studies have demonstrated that IL-22 can increase Akt phosphorylation in muscle, liver and adipose tissues, leading to improvement in insulin sensitivity (Wang X, et al. Nature. 2014). We have addressed this potential mechanism in lines 192-195, page 9.

      Comment 4: The energy expenditure data look unusual given that there was little increase in oxygen consumption during dark cycle compared to light cycle (Fig.3).

      Reply: The small difference in oxygen consumption between the dark cycle and the light cycle may be due to a technical problem with the system.

      Comment 5: The thermogenic capacity for the whole fat pad needs to consider the expression of UCP1 in certain amount of tissue and the total mass for each individual animal because the mRNA level itself does not reflect the whole tissue capacity.

      Reply: We used the whole subcutaneous adipose tissue from one side for qPCR to reflect the whole tissue capacity.
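
      The normalization raised in Comment 5 can be illustrated with simple arithmetic: whole-depot thermogenic capacity scales with both the UCP1 signal per unit of tissue and the total depot mass. The sketch below only illustrates the reviewer's point; the function name, units, and numbers are hypothetical and are not taken from the manuscript or the authors' analysis.

```python
def whole_depot_ucp1(ucp1_per_mg: float, depot_mass_mg: float) -> float:
    """Scale a per-mg UCP1 signal (arbitrary units) by total depot mass."""
    if ucp1_per_mg < 0 or depot_mass_mg < 0:
        raise ValueError("inputs must be non-negative")
    return ucp1_per_mg * depot_mass_mg


# Hypothetical animals: identical per-mg UCP1 signal, different depot masses.
lean_animal = whole_depot_ucp1(ucp1_per_mg=2.0, depot_mass_mg=150.0)
obese_animal = whole_depot_ucp1(ucp1_per_mg=2.0, depot_mass_mg=600.0)
print(lean_animal, obese_animal)
```

      Under these assumed numbers, the same per-mg signal corresponds to a fourfold difference in whole-depot capacity, which is why a relative mRNA level alone may not reflect the capacity of the whole tissue.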

      Comment 6: The design of studies for the adoptive transfer of ILC3 was concerned. The PBS is not a good control for the group with ILC3 cells (Figs. 2A-2H). Similar issue applies for the co-culture study in which beige only is not an ideal control for Beige+ILC3 (Figs. 2I-2J).

      Reply: We agree with the reviewer that PBS is not an ideal control. Because we could not find a similar immune cell type that has no effect on adipocytes, we designed this experiment based on other studies in which saline or PBS was used as the control in ILC transfer experiments (Sasaki T, et al. Cell Rep. 2019; Wang H, et al. Nat Commun. 2019).

      Comment 7: The induction of thermogenesis by IL-22 treatment may be related to enhanced differentiation rather than direct activation of thermogenic genes (Figs. 4G and 4H).

      Reply: Our observation that IL-22 did not alter the levels of genes related to adipogenesis (Supplemental figure 6F) indicates that IL-22 may not alter the differentiation of adipocytes. We addressed this concern in Lines 211-212, page 10.

      Reviewer #3:

      Chen et al. investigated how intermittent fasting causes metabolic benefits in obese mice and found that intestinal ILC3 and IL-22-IL-22R signaling contribute to the beiging of white adipose tissue (WAT) and consequent metabolic benefits including improved glucose and lipid metabolism in diet-induced obese mice. They demonstrate that intermittent fasting causes increased IL22+ILC3 in small intestines of mice. Adoptive transfer of purified intestinal ILC3 or administration of exogenous IL-22 can lead to increases in UCP1 gene expression and energy expenditure as well as improved glucose metabolism. Importantly, the above metabolic benefits caused by intermittent fasting are abolished in IL-22R-/- mice. Using an in vitro experiment, the authors show that ILC3-derived IL-22 may directly act on adipocytes to promote SVF beige differentiation. Finally, by performing sc-RNA-seq analysis of intestinal immune cells from mice with different treatments, the authors indicate a possible way of intestinal ILC3 being activated by intermittent fasting. Overall, this study provides a new mechanistic explanation for the metabolic benefits of intermittent fasting and reveals the role of intestinal ILC3 in the enhancement of the whole-body energy expenditure and glucose metabolism likely via IL-22-induced beige adipogenesis.

      Although this study presents some interesting findings, particularly IL-22 derived from intestinal ILC3 could induce beiging of WAT by directly acting on adipocytes, the experimental data are not sufficient to support the key claims in the manuscript.

      Comment 1: Only increased UCP1 expression on mRNA level is not enough to support the beiging of WAT. More methods such as western blotting and immunostaining of UCP1 in WAT are needed to confirm the enhanced beige adipogenesis.

      Reply: Additional experiments have been performed to measure the UCP1 protein by Western blot. The data is included in Figure 4I and Supplementary Figure 2E.

      Comment 2: IL-22 is known to modulate metabolic pathways via multiple downstream functions. The use of whole-body knockout of IL-22R could not exclude the indirect effect on the promotion of beiging of WAT. Specific deletion of IL-22R in adipose tissues is therefore needed to confirm the direct effect of IL-22 on adipocytes which is suggested by the in vitro study.

      Reply: We agree with the reviewer that specific deletion of IL-22R in adipose tissues is critical to confirm the direct effect of IL-22 on adipocytes. We will generate AdipoQ-IL-22R-/- mice to test this concept further in vivo.

      Comment 3: The authors failed to show the cellular distribution of IL-22R in adipose tissues. This is important because the mechanism that explains the increased beige adipogenesis could be different based on the expression of IL-22R in adipose progenitor cells or mature adipocytes. So it is not appropriate to conclude that "IL-22 then directly activates IL-22R on adipocytes, leading to subsequent induction of beiging of white adipose tissue" in line 407. Additionally, Oil red O staining is needed for Fig 4G and Fig 5J, and protein levels of UCP1 and adipogenesis-related markers are needed to evaluate beige fat differentiation and the whole adipogenesis.

      Reply: Per suggestion, we have added the expression of IL-22R in adipose progenitor cells or mature adipocytes (Supplementary Figure 6E). In addition, protein levels of UCP1 and adipogenesis-related markers to evaluate the whole adipogenesis (Figure 4I, Supplementary figure 6F) are now included. We have also addressed this issue in lines 207-215, page 10.

      Comment 4: Although the authors provided some hypothesis about how intermittent fasting increases IL-22+ILC3 in small intestines by sc-RNA-seq analysis, some functional assays are needed to identify the factors, for example, how about the levels of macrophage-derived IL-23 or AHR ligands in small intestines and whether they contribute to increased percentages of intestinal IL-22+ILC3 following intermittent fasting.

      Reply: We used flow cytometry sorting of macrophages combined with qPCR experiments to preliminarily demonstrate that intermittent fasting increases the expression of molecules such as Cd44 and Ccl4 (Supplementary Figure 10B), which may contribute to the increase in the proportion of IL-22+ ILC3s in the intestine under intermittent fasting. Our observation that IL-23 mRNA levels were not changed indicates that this molecule may not be the major contributor to the communication between macrophages and ILC3s. Other potential molecules such as AHR ligands remain to be explored.

      Comment 5: What are the differences between adipose ILC3 and intestinal ILC3? Why do transferred ILC3 only migrate to the small intestine but not WAT of recipient mice? It would be better to examine or at least discuss whether other factors from intestinal ILC3 may also contribute to beiging of WAT following intermittent fasting.

      Reply: Intestinal ILC3s specifically express the gut homing receptors CCR7, CCR9, and α4β7 (PMID: 26141583; PMID: 26708278; PMID: 25575242; PMID: 34625492). This may explain why transplanted intestinal ILC3s migrate mainly to the intestine instead of adipose tissue (PMID: 34625492). The proportion of ILC3s in the adipose tissue of mice is very small, and their functions have not yet been clarified. We have addressed this issue in lines 156-158, page 8.

      There are some other factors from intestinal ILC3 which may also contribute to beiging of WAT following intermittent fasting. By secreting IL-22, ILC3 enhanced the intestinal mucosal barrier, leading to reduction of the influx of LPS and PGN into the bloodstream under high-fat diet conditions, and subsequent increase in the beiging of white adipose tissue (Chen H, et al. Acta Pharm Sin B. 2022). We have addressed this potential mechanism in lines 344-347, page 16.

      Comment 6: The sensitivity of the IL-22 ELISA kit used in the manuscript was 8.2 pg/mL, according to the information from the methods, however, in Fig. 1J and Fig. 2B, the IL-22 levels in mouse plasma were lower than 6 pg/mL, which was below the sensitivity of the ELISA kit and also the assay range. Please explain.

      Reply: We have double-checked the original data and found that we have made a mistake in calculating the concentration of IL-22. We have corrected this error (Fig. 1J, Fig. 2B).

      Comment 7: In Fig 7A, the significance of the Hypothesis testing should be marked. In Fig 7F and 7G, the contrast between the two groups is not apparent, other comparing ways could be used to enhance the readability.

      Reply: Per suggestion, we have marked the significance of the hypothesis testing between HFD vs NCD and HFD-IF vs HFD in Fig7A. Shown in Fig 7F and 7G are the top 20 enriched interacting proteins between different cell types. The dot plot displays the average expression level and significance of protein interactions in cell types.

      Comment 8: The total food intake of fasting mice fed with NCD or HFD was less than that of mice without fasting, and the food intake rate the authors showed in Fig S1 represents the value normalized to body weight. So the authors should describe it precisely in line 114.

      Reply: We have revised the statement accordingly in lines 114-115.

      Comment 9: Western blotting analysis has been described in methods, however, there is no corresponding experimental data in the result part.

      Reply: The Western blotting results are now included.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This study presents an important finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants, hereby linking perception and action during implicit adaptation. The evidence supporting the claims of the authors is convincing. The normative approach of the proposed PEA model, which combines ideas from separate lines of research, including vision research and motor learning, opens avenues for future developments. This work will be of interest to researchers in sensory cue integration and motor learning.

      Thank you for the updated assessment. We are also grateful for the insightful and constructive comments from the reviewers, which have helped us improve the manuscript again. We made necessary changes following their comments (trimmed tests, new analysis results, etc) and responded to the comments in a point-by-point fashion below. We hope to publish these responses alongside the public review. Thank you again for fostering the fruitful discussion here.

      Public Reviews:

      Reviewer #1 (Public Review):

      I appreciate the normative approach of the PEA model and am eager to examine this model in the future. However, two minor issues remain:

      (1) Clarification on the PReMo Model:

      The authors state, "The PReMo model proposes that this drift comprises two phases: initial proprioceptive recalibration and subsequent visual recalibration." This description could misinterpret the intent of PReMo. According to PReMo, the time course of the reported hand position is merely a read-out of the *perceived hand position* (x_hat in your paper). Early in adaptation, the perceived hand position is biased by the visual cursor (x_hat in the direction of the cursor); towards the end, due to implicit adaptation, x_hat reduces to zero. This is the same as PEA. I recommend that the authors clarify PReMo's intent to avoid confusion.

      Note, however, the observed overshoot of 1 degree in the reported hand position. In the PReMo paper, we hypothesized that this effect is due to the recalibration of the perceived visual target location (inspired by studies showing that vision is also recalibrated by proprioception, but in the opposite direction). If the goal of implicit adaptation is to align the perceived hand position (x_hat) with the perceived target position (t_hat), then there would be an overshoot of x_hat over the actual target position.

      PEA posits a different account for the overshoot. It currently suggests that the reported hand position combines x_hat (which takes x_p as input) with x_p itself. What is the reasoning underlying the *double occurrence* of x_p?

      There seem to be three alternatives that seem more plausible (and could lead to the same overshooting): 1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase), 2) decreasing sigma_p (assuming that participants pay more attention to the hand during the report phase), 3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration. All these options, at least to me, seem equally plausible and testable in the future.

      For clarification of the PReMo model’s take on Fig4A, we now write:

      “The PReMo model proposes that the initial negative drift reflects a misperceived hand location, which gradually reduces to zero, and the late positive drift reflects the influence of visual calibration of the target (Tsay, Kim, Saxena, et al., 2022). ”

      However, we would like to point out that the PEA model does not predict a zero (perceived hand location) even at the late phase of adaptation: it remains negative, though not as large as during initial adaptation (see Figure 4A, red line). Furthermore, we have not seen any plausible way to use a visually biased target to explain the overshoot of the judged hand location (see below when we address the three alternative hypotheses the reviewer raised).

      We don’t think the “double” use of xp is a problem, simply because there are TWO tasks under investigation when the proprioceptive changes are measured along with adaptation. The first is the reaching adaptation task itself: moving under the influence of the clamped cursor. This task is accompanied by a covert estimation of hand location after the movement (x_hat). Given the robustness of implicit adaptation, this estimation appears mandatory and automatic. The second task is the hand localization task, during which the subject is explicitly asked to judge where the hand is. Here, the perceived hand is based on the two available cues: one is the actual hand location xp, and the other is the influence from the just-finished reaching movement (i.e., x_hat). For Bayesian modeling from a normative perspective, sensory integration is based on the available cues to fulfill the task. For the second task of reporting the hand location, the two cues are xp and x_hat (with a possible effect of the visual target, which is unbiased since it is defined as 0 in model simulation; thus, its presence does not induce any shift effect). xp is used sequentially in this sense. Thus, its dual use is well justified.
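
      The normative cue-combination step described above can be sketched as a reliability-weighted (inverse-variance) average of independent Gaussian cues. This is only an illustration: the function name `combine_cues` and all numerical values below are assumptions chosen for the example, not code or estimates from the manuscript.

```python
def combine_cues(means, sigmas):
    """Inverse-variance weighted average of independent Gaussian cues.

    Returns the combined mean and the combined standard deviation.
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    sigma = (1.0 / total) ** 0.5
    return mean, sigma


# Hypothetical values in degrees, expressed relative to the target (target = 0).
x_hat, sigma_hat = -2.0, 4.0   # covert estimate carried over from the reach (assumed)
x_p, sigma_p = 10.0, 11.0      # current proprioceptive cue (assumed)

reported, reported_sigma = combine_cues([x_hat, x_p], [sigma_hat, sigma_p])
print(f"reported hand location: {reported:.2f} deg (sd {reported_sigma:.2f})")
```

      In this sketch the more reliable cue dominates the report, and a target cue with mean 0 would pull the report toward the target without adding a bias of its own.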

      Our hypothesis is that the reported hand position results from a combination of x_hat from the previous movement and the current hand position xp. However, specifically for the overshoot of the judged hand location in the late part of the adaptation (Fig4A), the reviewer raised three alternative explanations by assuming that the PReMo model is correct. Under the PReMo model, the estimated hand location is only determined by x_hat, and xp is not used in the hand location report phase. In addition, x_hat (with xp used once) and a visual recalibration of the target can explain away the gradual shift from negative to positive (overshoot).

      We don’t think any of them can parsimoniously explain our findings here, and we go through these three hypotheses one by one:

      (1) increasing xp's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase)

      (2) decreasing σp (assuming that participants pay more attention to the hand during the report phase)

      The first two alternative explanations basically assume that xp has a larger contribution (weighting in Bayesian terms) in the hand location report phase than in the adaptation movement phase, whether due to an increase in visual uncertainty (alternative explanation 1) or a reduction in proprioceptive uncertainty (alternative explanation 2). Thus, we assume that the reviewer suggests that a larger weight for xp can explain why the perceived hand location changes gradually from negative to positive. However, per the PReMo model, a larger weight for xp will only affect x_hat, which is already assumed to change from negative to zero. More weight on xp in the hand report phase (compared to the adaptation movement phase) would not explain the change of the reported hand location from negative to positive. This is because no matter how much weight xp has, the PReMo model assumes a saturation for the influence of xp on x_hat. Thus x_hat would not exceed zero in the late adaptation. Then, the PReMo model would rely on the so-called visual shift of the target to explain the overshoot. This leads us to the third alternative the reviewer raised:

      (3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration.

      The PReMo model originally assumed that the perceived target location was biased in order to explain away the positive overshoot of the reported hand location. We assume that the reviewer suggests that the perceived target position, which is shifted to the positive direction, also “biases” the perceived hand position. We also assume that the reviewer suggests that the perceived hand location after a clamp trial (x_hat) is zero, and somehow the shifted perceived target position “biases” the reported hand location after a clamp trial. Unfortunately, we did not see any mathematical formulation of this biasing effect in the original paper (Tsay, Kim, Haith, et al., 2022). We are not able to come up with any formulation of this hypothesized biasing effect based on Bayesian cue integration principles. Target and hand are two separate perceived items; how one relates to another needs justification from a normative perspective when discussing Bayesian models. Note this is not a problem for our PEA model, in which both cues used are about hand localization: one is x_hat and the other is xp.

      We believe that mathematically formulating the biasing effect (Figure 4A) is non-trivial since the reported hand location changes continuously from negative to positive. Thus, quantitative model predictions, like the ones our PEA model presents here, are needed.

      To rigorously test the possible effect of visual recalibration of the target, there are two things to do: 1) use the psychometric method to measure the biased perception of the target, and 2) re-do Tsay et al. 2020 experiment without the target. For 2), compared to the case with the target, the PEA model would predict a larger overshoot, while the PReMo would predict a smaller overshoot or even zero overshoot. This can be left for future studies.

      (2) Effect of Visual Uncertainty on Error Size:

      I appreciate the authors' response about methodological differences between the cursor cloud used in previous studies and the Gaussian blob used in the current study. However, it is still not clear to me how the authors reconcile previous studies showing that visual uncertainty reduced implicit adaptation for small but not large errors (Tsay et al, 2021; Makino, et al 2023) with the current findings, where visual uncertainty reduced implicit adaptation for large but not small errors.

      Could the authors connect the dots here: I could see that the cursor cloud increases potential overlap with the visual target when the visual error is small, resulting in intrinsic reward-like mechanisms (Kim et al, 2019), which could potentially explain attenuated implicit adaptation for small visual errors. However, why would implicit adaptation in response to large visual errors remain unaffected by the cursor cloud? Note that we did verify that sigma_v is increased in (Tsay et al. 2021), so it is unlikely due to the cloud simply failing as a manipulation of visual uncertainty.

      In addition, we also reasoned that testing individuals with low vision could offer a different test of visual uncertainty (Tsay et al, 2023). The advantage here is that both control and patients with low vision are provided with the same visual input-a single cursor. Our findings suggest that uncertainty due to low vision also shows reduced implicit adaptation in response to small but not large errors, contrary to the findings in the current paper. Missing in the manuscript is a discussion related to why the authors' current findings contradict those of previous results.

      For connecting the dots for two previous studies (Tsay et al., 2021, 2023; note that Makino et al., 2023 is not in this discussion since it investigated the weights of multiple cursors, as opposed to visual uncertainty associated with a cursor cloud):

      First, we want to re-emphasize that using the cursor cloud to manipulate visual uncertainty brings some confounds, making it not ideal for studying visuomotor adaptation. For example, in the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning over a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of angles (the error estimate, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (σv in our model), but it additionally affects the mean of the distribution (µ). This unnecessary confound is neatly avoided by using cursor blurring, which is still a cursor with its center (µ) unchanged from a single cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2020, the cursor cloud often overlaps with the visual target; this "target hit" would affect adaptation, possibly via a reward learning mechanism (Kim et al., 2019). This is a second confound that accompanies the cursor cloud. Yes, the cursor cloud was verified as associated with high visual uncertainty (Tsay et al., 2021); however, this verification was done with a psychophysics method against a clean background, not in the context of a hand reaching toward a target, which is the context needed here. Thus, despite the cursor cloud having a sizeable visual uncertainty, our criticisms of it still hold when it is used in error-clamp adaptation.
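
      One way to read the first confound is that the centroid of a randomly sampled cloud differs from the nominal clamp angle on every trial, so the manipulation perturbs the apparent error as well as its uncertainty. The sketch below illustrates this with simulated clouds. The function name, the number of dots, the dot spread, and the clamp angle are all assumptions made for illustration and do not reproduce any of the cited experiments.

```python
import random
from statistics import mean, stdev


def cloud_centroids(clamp, dot_sd, n_dots, n_trials, seed=0):
    """Centroid angle (deg) of a Gaussian dot cloud, one value per simulated trial."""
    rng = random.Random(seed)
    return [
        mean(rng.gauss(clamp, dot_sd) for _ in range(n_dots))
        for _ in range(n_trials)
    ]


# Hypothetical settings: 30 deg clamp, dots drawn with a 10 deg SD, 25 dots per cloud.
centroids = cloud_centroids(clamp=30.0, dot_sd=10.0, n_dots=25, n_trials=2000)
print(f"mean centroid: {mean(centroids):.2f} deg")
print(f"trial-to-trial SD of centroid: {stdev(centroids):.2f} deg")
```

      Under these assumptions the centroid scatters around the clamp angle with a standard deviation of about dot_sd / sqrt(n_dots), whereas a blurred single cursor keeps its center fixed.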

      Second, bearing these confounds of the cursor cloud in mind, we postulate one important factor that has not been considered in any models thus far that might underlie the lack of difference between the single-cursor clamp and the cloud-cursor clamp when the clamp size is large: the cursor cloud might be harder to ignore than a single cursor. For Bayesian sensory integration, the naive model is to consider the relative reliability of cues only. Yes, the cloud is more uncertain in terms of indicating the movement direction than a single cursor. However, given its large spread, it is probably harder to ignore during error-clamp movements. Note that ignoring the clamped cursor is the task instruction, but the large scatter of the cursor cloud is more salient and thus plausibly harder to ignore. This might increase the weighting of the visual cue despite its higher visual uncertainty. This extra confound is arguably minimized by using the blurred cursor as in our Exp4 since the blurred cursor did not increase the visual angle much (Figure 5D; blurred vs single cursor: 3.4mm vs 2.5mm in radius, 3.90° vs 2.87° in spread). In contrast, the visual angle of the dot cloud is at least an order of magnitude larger (cursor cloud vs. single cursor: at least 25° vs. 2.15° in the spread, given a 10° standard deviation of random sampling).

      Third, for the low-vision study (Tsay et al., 2023), the patients indeed show reduced implicit adaptation for a 3° clamp (consistent with our PEA model) but intact adaptation for a 30° clamp (not consistent). Though this pattern appears similar to what happens for normal people whose visual uncertainty is upregulated by cursor cloud (Tsay et al., 2021), we are not completely convinced that the same underlying mechanism governs these two datasets. Low-vision patients indeed have higher visual uncertainty about color, brightness, and object location, but their visual uncertainty about visual motion is still unknown. Due to the difference in impairment among low-vision people (e.g., peripheral or central affected) and the different roles of peripheral and central vision in movement planning and control (Sivak & Mackenzie, 1992), the overall effect of visual uncertainty in low-vision people is unclear. The direction of cursor movement that matters for visuomotor rotation here is likely related to visual motion perception. Unfortunately, the original study did not measure this uncertainty in low-vision patients. We believe our Exp1 offers a valid method for this purpose for future studies. More importantly, we should not expect low-vision patients to integrate visual cues in the same way as normal people, given their long-term adaptation to their vision difficulties. Thus, we are conservative about interpreting the seemingly similar findings across the two studies (Tsay et al., 2021, 2023) as revealing the same mechanism.

      A side note: these two previous studies proposed a so-called mis-localization hypothesis, i.e., the cursor cloud was mislocated for small clamp size (given its overlapping with the target) but not for large clamp size. They suggested that the lack of uncertainty effect at small clamp sizes is due to mislocalization, while the lack of uncertainty effect at large clamp sizes is because implicit adaptation is not sensitive to uncertainty at large angles. Thus, these two studies admit that cursor cloud not only upregulates uncertainty but also generates an unwanted effect of so-called “mis-localization” (overlapping with the target). Interestingly, their hypothesis about less sensitivity to visual uncertainty for large clamps is not supported by a model or theory but merely a re-wording of the experiment results.

      In sum, our current study cannot offer an easy answer to "connect the dots" in the aforementioned two studies due to methodology issues and the specialty of the population. However, for resolving conflicting findings, our study suggests solutions that include using a psychometric test to quantify visual uncertainty for cursor motion (Exp1), a better uncertainty-manipulation method that avoids a couple of confounds (Exp4, blurred cursor), and a falsifiable model. Future work can resolve the difference between studies based on the new insights from the current study.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation's influence by visual eccentricity, a well-established vision science concept, and sensory estimation through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      The authors discussed in the revised version that the proposed model can capture the general implicit motor learning process in addition to the visuomotor rotation task. In the discussion, they emphasize two main principles: the automatic tracking of effector position and the combination of movement cues using Bayesian integration. These principles are suggested as key to understanding and modeling various motor adaptations and skill learning. The proposed model could potentially become a basis for creating new computational models for skill acquisition, especially where current models fall short.

      Weaknesses:

      The proposed model is described as elegant. In this paper, the authors test the model within a limited example condition, demonstrating its relevance to the sensorimotor adaptation mechanisms of the human brain. However, the scope of the model's applicability remains unclear. It has shown the capacity to explain prior data, thereby surpassing previous models that rely on elementary mathematics. To solidify its credibility in the field, the authors must gather more supporting evidence.

      Indeed, our model here is based on one particular experimental paradigm, i.e., the error-clamp adaptation. We used it simply because 1) this paradigm is a rare example in which implicit motor learning can be isolated in a clean way, and 2) there are a few conflicting findings in the literature for us to explain by using a unified model.

      For our model’s broad impact, we believe that as long as people need to locate their effectors during motor learning, the general principle laid out here will be applicable. In other words, repetitive movements with a Bayesian cue combination of movement-related cues can underlie the implicit process of various motor learning. To showcase its broad impact, in upcoming studies, we will extend this model to other motor learning paradigms, starting from motor adaptation paradigms that involve both explicit and implicit processes.

      Reviewer #3 (Public Review):

      (2.1) Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors notice that implicit adaptation (as measured in error-clamp based paradigms) does not saturate at larger perturbations, but decreases again (e.g. Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease importance assigned to visual feedback which could explain lower asymptotes.

      The authors characterize visual uncertainty for 3 rotation sizes in a first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors are tested in different groups of participants. The models' visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than both the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, or a causal inference model (Wei & Körding, 2007) which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable other data set (Tsay et al. 2020, instead of Tsay et al. 2021). Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation), than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular the model fits for experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.
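
      The structure summarized above (a state-space update driven by a Bayesian estimate of hand position, with visual uncertainty growing linearly with clamp size) can be sketched as follows. This is one plausible reading of the description, not the authors' implementation: the update rule, the function names, the retention rate A, the learning rate B, and the slope and intercept of the linear fit are all assumed for illustration. Only the default values of sigma_u and sigma_p are taken from the model estimates quoted in the authors' response.

```python
def cue_weights(sigma_u, sigma_p, sigma_v):
    """Normalized inverse-variance weights for the predicted, proprioceptive, visual cues."""
    precisions = [1.0 / sigma_u ** 2, 1.0 / sigma_p ** 2, 1.0 / sigma_v ** 2]
    total = sum(precisions)
    return tuple(p / total for p in precisions)


def sigma_v_linear(clamp, slope=0.3, intercept=1.8):
    """Visual uncertainty (deg) as a linear function of clamp size; coefficients are assumed."""
    return slope * abs(clamp) + intercept


def simulate_clamp(clamp, n_trials=500, A=0.97, B=0.2, sigma_u=5.048, sigma_p=11.119):
    """Noise-free hand angle (deg) over trials under a constant error clamp."""
    w_u, w_p, w_v = cue_weights(sigma_u, sigma_p, sigma_v_linear(clamp))
    x = 0.0
    hand = []
    for _ in range(n_trials):
        # Perceived hand: predicted cue at the target (0), hand at x, cursor at the clamp.
        x_hat = w_u * 0.0 + w_p * x + w_v * clamp
        # Retain part of the current state and correct against the perceived error.
        x = A * x - B * x_hat
        hand.append(x)
    return hand


def asymptote(clamp, A=0.97, B=0.2, sigma_u=5.048, sigma_p=11.119):
    """Fixed point of the update rule used in simulate_clamp."""
    _, w_p, w_v = cue_weights(sigma_u, sigma_p, sigma_v_linear(clamp))
    return -B * w_v * clamp / (1.0 - A + B * w_p)


for clamp in (2, 16, 95):
    print(clamp, round(simulate_clamp(clamp)[-1], 2), round(asymptote(clamp), 2))
```

      With these assumed parameters the asymptote is largest in magnitude at intermediate clamp sizes and shrinks for very large clamps, because the weight on the visual cue falls faster than the clamp grows.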

      (2.2) Strengths

      In this study the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in a first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are: 1) a learning and 2) a retention rate, as used in popular state space models and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from, and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      (2.3) Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      For a proper citation of the “concave” adaptation function: we assume the reviewer is referring to the study by Morehead, 2017 which tested large clamp sizes up to 135° and 175°. Unsurprisingly, the 135° and 175° conditions lead to nearly zero adaptation, possibly due to the trivial fact that people cannot even see the moving cursor. We have quoted this seminal study from the very beginning. All other error-clamp studies with a block design emphasized an invariant or saturated implicit adaptation with large rotations (e.g., Kim, et al., 2019).

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between movement endpoint and disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      For the issues related to visual uncertainty measurement in Exp1:

      First, our visual uncertainty is about cursor motion direction in the display plane, and the measurement in Exp1 has never been done before. Thus, we do not think our data is comparable to any findings in visual science about fovea/peripheral comparison. We quoted Klein and others’ work (Klein & Levi, 1987; Levi et al., 1987) in vision science since their studies showed that the deviation from the fixation is associated with an increase in visual uncertainty. Their study thus inspired us to conduct Exp1 to probe how our concerned visual uncertainty (specifically for visual motion direction) changes with an increasing deviation from the fixation. Any model and its model parameters should be specifically tailored to the task or context it tries to emulate. In our case, motion direction in a center-out-reaching setting is the modeled context, and all the relevant model parameters should be specified in movement angles. This is particularly important since we need to estimate parameters from one experiment to predict behaviors in another experiment.

      Second, based on previous vision studies, the 1s delay of the reference cursor has minimal impact on the estimate of visual uncertainty. Our Exp1 used a visual paradigm similar to that of White et al. (1992), who showed that delay does not lead to an increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figure 5-6).

      These two problems have been addressed in the revised manuscript, with proper citations listed.

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Reviewer 3 thinks Tsay 2020 dataset is not appropriate for our theorization, but we respectfully disagree. For the three points raised here, we would like to elaborate:

      (1) As we addressed in the previous response, the reported hand location in Figure 4A (Tsay et al., 2020) is not from a test of proprioceptive recalibration as conventionally defined. In the revision, we explicitly state that this dataset is not about proprioceptive recalibration and also delete texts that might mislead people to think so (see Results section). Instead, proprioceptive recalibration is measured by passive movement, as in our Exp3 (Figure 4E). For error-clamp adaptation here, "the remembered position of the target" is the target. Clearly, the participants did not report the target position, which is ever-present. Instead, their reported hand location shows an interestingly continuous change with ongoing adaptation.

      (2) Since the Tsay 2020 dataset is not a so-called proprioceptive recalibration, we need not take the difference between the reported location and the actual hand location. Indeed, the difference would be ~20 degrees, but comparing it to the previously reported proprioceptive recalibration is like comparing apples to oranges. In fact, throughout the paper, we refer to the results in Fig 4A as “reported hand location”, not proprioceptive recalibration. The target direction is defined as zero degree thus its presence will not bias the reported hand in the Bayesian cue combination (as this visual cue has a mean value of 0). Using the target as the reference also simplifies our modeling.

      (3) Exp3 is crucial for our study since it shows our model and its simple Bayesian cue combination principle are applicable not only to implicit adaptation but also to proprioceptive measures during adaptation. Furthermore, it reproduced the so-called proprioceptive recalibration and explained it away with the same Bayesian cue combination as the adaptation. We noticed that this field has accumulated an array of findings on proprioceptive changes induced by visuomotor adaptation. However, currently, there is a lack of a computational model to quantitatively explain them. Our study at least made an initial endeavor to model these changes.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real life situations.

      The largest caveat raised by the reviewer appears to be directed to the error-clamp paradigm in general, not only to our particular study. In essence, this paradigm indeed requires participants to ignore the clamped error; thus, its induced adaptive response can be attributed to implicit adaptation. The original paper that proposed this paradigm (Morehead et al., 2017) has been cited 220 times (According to Google Scholar, at the time of this writing, 06/2024), indicating that the field has viewed this paradigm in a favorable way.

      Furthermore, we agree that this kind of instruction and feedback (invariant clamp) differs from daily life experience, but this does not prevent us from gaining theoretical insights by studying human behaviors under this kind of "artificial" task setting. Consider saccadic adaptation (Deubel, 1987; Kojima et al., 2004): the target jumps while the eye moves towards it. This somewhat artificial manipulation again makes people adapt implicitly, and the adaptation itself would be a "disastrous" strategy in real-life situations. However, scientists have gained an enormous understanding of motor adaptation by using this adaptation, even though it seems counterproductive in real life. Also, think of perceptual learning of task-irrelevant stimuli (Seitz & Watanabe, 2005, 2009): when participants are required to learn to discriminate one type of visual stimuli, the background shows another type of stimuli, which people gradually learn even though they do not even notice its presence. This "implicit" learning can be detrimental to our real life, too, but the paradigm itself has advanced our understanding of the inner workings of the cognitive system.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      L101: There is a typo: (Tsay et al., 2020), 2020) should be corrected to (Tsay et al., 2020).

      Thanks for pointing it out, we corrected this typo.

      L224-228: It would be beneficial to evaluate the validity of the estimated sigma_u and sigma_p based on previous reports.

      We can roughly estimate σu by evaluating the variability of reaching angles during the baseline phase when no perturbation is applied. The standard deviation of the reaching angle in Exp 2 is 5.128° ± 0.190°, which is close to the σu estimated by the model (5.048°). We also used a separate perceptual experiment to test the proprioceptive uncertainty (n = 13, see Figure S6); σp from this experiment is 9.737° ± 5.598°, also close to the σp extracted by the model (11.119°). We added these new analysis results to the final version of the paper.
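
      The baseline-variability check described above amounts to taking the standard deviation of each participant's baseline reach angles and then summarizing across participants. A minimal sketch follows, assuming the data are organized as one list of baseline angles per participant. The function name and the toy numbers are hypothetical, and the response does not state whether its ± values are standard deviations or standard errors, so the sketch simply reports the across-participant standard deviation.

```python
from statistics import mean, stdev


def baseline_sigma_u(baseline_angles_by_participant):
    """Mean and across-participant SD of per-participant baseline reach-angle SDs (deg)."""
    per_participant = [stdev(angles) for angles in baseline_angles_by_participant]
    return mean(per_participant), stdev(per_participant)


# Toy baseline reach angles in degrees (made up for illustration, not experimental data).
toy_data = [
    [-4.0, 1.5, 6.0, -2.5, 0.5],
    [3.0, -5.5, 2.0, -1.0, 4.5],
    [-6.0, 0.0, 5.0, 2.5, -3.5],
]
group_mean, group_sd = baseline_sigma_u(toy_data)
print(f"estimated sigma_u: {group_mean:.2f} deg (SD across participants {group_sd:.2f})")
```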

      L289-298: I found it difficult to understand the update equations of the proprioceptive calibration based on the PEA model. Providing references to the equations or better explanations would be helpful.

      We expanded the process of proprioceptive calibration in Supplementary Text 1 with step-by-step equations and more explanations. 

      Reviewer #3 (Recommendations For The Authors):

      Suggestions (or clarification of previous suggestions) for revisions

      The authors persist in using the Tsay et al 2020 paper despite its many drawbacks, which the authors attempt to address in their reply. But the main drawback is that the results in the 2020 paper are NOT relative to the unseen hand but to the visual target the participants were supposed to move their hand to. If the results were converted to be relative to the unseen hand, the localization biases would be over 20 deg in magnitude.

      The PEA simulations are plotted relative to the unseen hand which makes sense. If the authors want to persist using the Tsay 2020 dataset despite any issues, they at least need to make sure that the simulations are mimicking the same change. That is, the data from Tsay 2020 needs to be converted to the same variable used in the current paper.

      If the main objection to using Tsay 2021 is that the design would lead to forgetting: we found that active localization (or any intervening active movements like no-cursor reach) does lead to some interference or forgetting (a small reduction in overall magnitude of adaptation), but this is not the case for passive localization; see Ruttle et al, 2021 (data on osf). This was also just a suggestion; there may of course be other, more suitable data sets.

      As stated above, changing the reference system is not necessary, nor does it affect our results. Tsay et al 2020 dataset is unique since it shows the gradual change of reported hand location along with error-clamp adaptation. The forgetting (or reduction in proprioceptive bias), even if it exists, would not affect the fitting quality of our model for the Tsay 2020 dataset: if we assume that forgetting is invariant over the adaptation process, the forgetting would only reduce the proprioceptive bias uniformly across trials. This can be accounted for by a smaller weight on . The critical fact is that the model can explain the gradual drift of the proprioceptive judgment of the hand location.

      By the way, Ruttle et al.'s 2021 dataset is not for error-clamp adaptation, and thus we will leave it to test our model extension in the future (after incorporating an explicit process in the model).

      References

      Deubel, H. (1987). Adaptivity of gain and direction in oblique saccades. Eye Movements from Physiology to Cognition. https://www.sciencedirect.com/science/article/pii/B9780444701138500308

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. ELife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Kojima, Y., Iwamoto, Y., & Yoshida, K. (2004). Memory of learning facilitates saccadic adaptation in the monkey. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 24(34), 7531–7539.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Morehead, J. R., Taylor, J. A., Parvin, D. E., & Ivry, R. B. (2017). Characteristics of implicit sensorimotor adaptation revealed by task-irrelevant clamped feedback. Journal of Cognitive Neuroscience, 29(6), 1061–1074.

      Seitz, & Watanabe. (2005). A unified model for perceptual learning. Trends in Cognitive Sciences, 9(7), 329–334.

      Seitz, & Watanabe. (2009). The phenomenon of task-irrelevant perceptual learning. Vision Research, 49(21), 2604–2610.

      Sivak, B., & Mackenzie, C. L. (1992). Chapter 10 The Contributions of Peripheral Vision and Central Vision to Prehension. In L. Proteau & D. Elliott (Eds.), Advances in Psychology (Vol. 85, pp. 233–259). North-Holland.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Kim, H. E., Saxena, A., Parvin, D. E., Verstynen, T., & Ivry, R. B. (2022). Dissociable use-dependent processes for volitional goal-directed reaching. Proceedings. Biological Sciences / The Royal Society, 289(1973), 20220415.

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. ELife, 11, e76639.

      Tsay, J. S., Parvin, D. E., & Ivry, R. B. (2020). Continuous reports of sensed hand position during sensorimotor adaptation. Journal of Neurophysiology, 124(4), 1122–1130.

      Tsay, J. S., Tan, S., Chu, M. A., Ivry, R. B., & Cooper, E. A. (2023). Low Vision Impairs Implicit Sensorimotor Adaptation in Response to Small Errors, But Not Large Errors. Journal of Cognitive Neuroscience, 35(4), 736–748.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants. The evidence supporting the claims of the authors is solid, although a better discussion of the link between the model variables and the outcomes of related behavioral experiments would strengthen the conclusions. The work will be of interest to researchers in sensory cue integration and motor learning.

      Public Reviews:

      Reviewer #1 (Public Review):

      This valuable study demonstrates a novel mechanism by which implicit motor adaptation saturates for large visual errors in a principled normative Bayesian manner. Additionally, the study revealed two notable empirical findings: visual uncertainty increases for larger visual errors in the periphery, and proprioceptive shifts/implicit motor adaptation are non-monotonic, rather than ramp-like. This study is highly relevant for researchers in sensory cue integration and motor learning. However, I find some areas where statistical quantification is incomplete, and the contextualization of previous studies to be puzzling.

      Thank you for your feedback and the positive highlights of our study. We appreciate your insights and will address the concerns in our revisions.

      Issue #1: Contextualization of past studies.

      While I agree that previous studies have focused on how sensory errors drive motor adaptation (e.g., Burge et al., 2008; Wei and Kording, 2009), I don't think the PReMo model was contextualized properly. Indeed, while PReMo should have adopted clearer language - given that proprioception (sensory) and kinaesthesia (perception) have been used interchangeably, something we now make clear in our new study (Tsay, Chandy, et al. 2023) - PReMo's central contribution is that a perceptual error drives implicit adaptation (see Abstract): the mismatch between the felt (perceived) and desired hand position. The current paper overlooks this contribution. I encourage the authors to contextualize PReMo's contribution more clearly throughout. Not mentioned in the current study, for example, PReMo accounts for the continuous changes in perceived hand position in Figure 4 (Figure 7 in the PReMo study).

      There is no doubt that the current study provides important additional constraints on what determines perceived hand position: Firstly, it offers a normative Bayesian perspective in determining perceived hand position. PReMo suggests that perceived hand position is determined by integrating motor predictions with proprioception, then adding a proprioceptive shift; PEA formulates this as the optimal integration of these three inputs. Secondly, PReMo assumed visual uncertainty to remain constant for different visual errors; PEA suggests that visual uncertainty ought to increase (but see Issue #2).

      Thank you for the comments and suggestions. We have now incorporated the citation for (Tsay et al., 2024), to acknowledge their clarification on the terms of perceptual error. We also agree that our model differs in two fundamental ways. One is to ditch the concept of proprioceptive shift and its contribution to the perceived hand location; instead, we resort to a “one-shot” integration of three types of cues with Bayesian rules. This is a more elegant and probably more ecological way of processing hand location per Occam's Razor. The second essential change is to incorporate the dependency of visual uncertainty on perturbation size into the model, as opposed to resorting to a ramp function of proprioceptive changes relative to perturbation size. The ramp function is not well grounded in perception studies. Yes, we acknowledged that PReMo is the first to recognize the importance of perceptual error, but highlighted the model differences in our Discussion.

      We also think the PReMo model has the potential to explain Fig 4A. But the Tsay et al., 2022 paper assumes that “a generic shift in visual space” explains the gradual proprioceptive changes from negative to positive (see page 17 in Tsay et al., 2022). We do not think that invoking this visual mechanism is necessary to explain Fig 4A; instead, the proprioceptive change is a natural result of hand deviations during implicit adaptation. As the hand moves away from the target (in the positive direction) during adaptation, the estimated hand location goes along with it. We believe this is the correct way of explaining the Fig 4A results. As we played around with the PReMo model, we found it hard to use visual shift to explain this part of the data without additional assumptions (at least not with the ones published in Tsay et al., 2022). Furthermore, our PEA model also parsimoniously explains the proprioceptive shift observed in a completely different setting, i.e., the proprioceptive changes measured by the passive method as a function of perturbation size in Exp 3.

      We expanded the discussion about the comparison between the two models, especially about their different views for explaining Fig4A.

      Issue #2: Failed replication of previous results on the effect of visual uncertainty.

      (2a) A key finding of this paper is that visual uncertainty linearly increases in the periphery; a constraint crucial for explaining the non-monotonicity in implicit adaptation. One notable methodological deviation from previous studies is the requirement to fixate on the target: Notably, in the current experiments, participants were asked to fixate on the target, a constraint not imposed in previous studies. In a free-viewing environment, visual uncertainty may not attenuate as fast, and hence, implicit adaptation does not attenuate as quickly as that revealed in the current design with larger visual errors. Seems like this current fixation design, while important, needs to be properly contextualized considering how it may not represent most implicit adaptation experiments.

      First, we don’t think there is any previous study that examined visual uncertainty as a function of perturbation size. Thus, we do not have a replication problem here. Secondly, our data indicate that even without asking people to fixate on the target, people still predominantly fixate on the target during error-clamp adaptation (when they are “free” viewing). For our Exp 1, the fixation on the straight line between the starting position and the target is 86%-95% (as shown in Figure S1 now, also see below). We also collected eye-tracking data in Exp 4, which is a typical error-clamp experiment. More than 95% of fixations fall within ±50 pixels of the center of the screen, even slightly higher than in Exp 1. This is understandable: the typical error-clamp adaptation requires people to ignore the cursor and move the hand towards the target. To minimize the interference of the concurrently moving cursor, people depend on fixating the target, the sole task-relevant visual marker in the workspace, to achieve the task goal.

      In sum, we did not force the participants to fixate on the target in order to manufacture the linear dependency of visual uncertainty; we required them to do so to mimic the eye-tracking pattern in typical error-clamp learning, which was revealed in our pilot experiment. The visual uncertainty effect is sound, and our study is the first to clearly demonstrate it.

      Author response image 1.

      On a side note (but an important one), the high percentage of fixation on the aiming target is also true for conventional visuomotor rotation, which involves strategic re-aiming (shown in Bromberg et al., 2019; de Brouwer et al., 2018, we have an upcoming paper to show this). This is one reason that our new theory would also be applicable to other types of motor adaptation.

      (2b) Moreover, the current results - visual uncertainty attenuates implicit adaptation in response to large, but not small, visual errors - deviates from several past studies that have shown that visual uncertainty attenuates implicit adaptation to small, but not large, visual errors (Tsay, Avraham, et al. 2021; Makino, Hayashi, and Nozaki, n.d.; Shyr and Joshi 2023). What do the authors attribute this empirical difference to? Would this free-viewing environment also result in the opposite pattern in the effect of visual uncertainty on implicit adaptation for small and large visual errors?

      We don’t think any of the mentioned previous studies manipulated visual uncertainty in a parametric way, and none of them provided quantitative measures of visual uncertainty. As we detailed in our Exp4 and in our Discussion, we don’t think the manipulation of visual uncertainty in the Tsay et al., 2021 paper is appropriate (see below for 2d). The Makino et al., 2023 study used multiple clamped cursors to perturb people, and its effect is not easily accounted for, since additional processes might be invoked by this kind of complex visual feedback. More importantly, we do not think this is a direct way of modulating visual uncertainty, nor did they provide any evidence that it is.

      (2c) In the current study, the measure of visual uncertainty might be inflated by brief presentation times of comparison and referent visual stimuli (only 150 ms; our previous study allowed for a 500 ms viewing time to make sure participants see the comparison stimuli). Relatedly, there are some individuals whose visual uncertainty is greater than 20 degrees standard deviation. This seems very large, and less likely in a free-viewing environment.

      For our 2AFC, the reference stimulus is the actual clamped cursor, which lasts for 800 ms. The comparison stimulus is a 150-ms dot representation appearing near the reference. For measuring perception of visual motion, this duration is sufficient, as previous studies used similar durations (Egly & Homa, 1984; Owsley et al., 1995). We think the 20-degree standard deviation is reasonable given that people fixate on the target, with only peripheral vision to process the fast-moving cursor. The steep linear increase in visual uncertainty about visual motion is well documented. The last author of this paper has shown that the uncertainty of visual motion speed (though not of angles) follows the same steep trend (Wei et al., 2010). It is noteworthy that if, instead of using the visual uncertainty measured in Exp1, we fit the adaptation data in Exp2 to “estimate” the visual uncertainty, the two are in fact well aligned with each other (see Figure S7 and Supplementary Text 2). This is strong support that our estimation is valid and accurate. We think this high visual uncertainty is an important message to the field. Thus, we now highlight its magnitude in our Discussion.

      (2d) One important confound between clear and uncertain (blurred) visual conditions is the number of cursors on the screen. The number of cursors may have an attenuating effect on implicit adaptation simply due to task-irrelevant attentional demands (Parvin et al. 2022), rather than that of visual uncertainty. Could the authors provide a figure showing these blurred stimuli (gaussian clouds) in the context of the experimental paradigm? Note that we addressed this confound in the past by comparing participants with and without low vision, where only one visual cursor is provided for both groups (Tsay, Tan, et al. 2023).

      Thank you for raising this important point about types of visual stimuli for manipulating uncertainty. We used Gaussian blur of a single cursor (similar to Burge et al., 2008) instead of a cloud of dots. We now added a figure inset to show how this blur looks.

      Using a cursor cloud (Makino et al., 2023; Tsay et al., 2021) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for visuomotor adaptation. For the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of angles (the error estimate, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it additionally affects the mean of the distribution (mu). This unnecessary confound is avoided by using cursor blurring, which is still a cursor with its center (mu) unchanged from a single cursor. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target; this “target hit” would affect adaptation, possibly via a reward learning mechanism (see Kim et al., 2019). This is a second confound that accompanies the cursor cloud.
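
      The point about the cloud's mean can be illustrated with a small simulation. It is a sketch under assumptions that are not taken from the cited studies: dots are drawn independently from a Gaussian around the clamp angle, and the clamp size, spread, number of dots, and function names are all hypothetical.

```python
import random
import statistics


def cloud_centroid(clamp_deg, spread_deg, n_dots, rng):
    """Mean angle of a cloud of dots drawn around the clamp angle."""
    dots = [rng.gauss(clamp_deg, spread_deg) for _ in range(n_dots)]
    return statistics.mean(dots)


def centroid_jitter(clamp_deg=16.0, spread_deg=10.0, n_dots=25,
                    n_trials=1000, seed=0):
    """SD of the cloud centroid across simulated trials (deg)."""
    rng = random.Random(seed)
    centroids = [cloud_centroid(clamp_deg, spread_deg, n_dots, rng)
                 for _ in range(n_trials)]
    return statistics.stdev(centroids)


if __name__ == "__main__":
    # Expected to be near spread_deg / sqrt(n_dots); a single blurred cursor
    # keeps its center at clamp_deg on every trial, so its jitter is zero.
    print(centroid_jitter())
```

      Under these assumptions the centroid of a finite cloud varies from trial to trial, whereas the center of a blurred single cursor does not, which is the distinction between changing sigma alone and changing both sigma and mu.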

      Issue #3: More methodological details are needed.

      (3a) It's unclear why, in Figure 4, PEA predicts an overshoot in terms of perceived hand position from the target. In PReMo, we specified a visual shift in the perceived target position, shifted towards the adapted hand position, which may result in overshooting of the perceived hand position with this target position. This visual shift phenomenon has been discovered in previous studies (e.g., (Simani, McGuire, and Sabes 2007)).

      Visual shift, as it is called in Simani et al., 2007, is irrelevant for our task here. The data we are modeling are motor adaptation (hand position changes) and so-called proprioceptive changes (hand localization changes), both of which are measured and referenced in extrinsic coordinates, not referenced to a visual target. For instance, the proprioceptive changes are either relative to the actual hand location (Exp 3) or relative to the goal (Fig 4A). We also don’t think visual shift is necessary for explaining the perceptual judgment of an unseen hand (the target shown during the judgment indeed has an effect of reducing the biasing effect of PE, see below for responses to reviewer 3).

      In the PEA model, the reported hand angle is the result of integrating cues from the actual hand position and the estimated hand position (x_hand_hat) from previous movements. This integration process leads to the combined reported hand position potentially overshooting or undershooting, depending on the degree of adaptation. It is the changed proprioceptive cue (because the actively moved hand slowly adapted to the error clamp) leading to the overshoot of the perceived hand position.

      In Results, we now explain these value changes with parentheses. Model details about the mechanisms of cue combination and model predictions can be found in Supplementary Text 1. We believe these detailed explanations can make this apparent.
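
      For orientation, the kind of cue combination referred to above can be written in its generic textbook form: independent Gaussian cues $x_i$ with standard deviations $\sigma_i$ are averaged with weights proportional to their reliabilities. This is only the standard maximum-likelihood rule, not a reproduction of the PEA equations; the index $i$ and the weights $w_i$ are notation introduced here, and the specific cues and parameters of the model are those given in Supplementary Text 1.

```latex
\hat{x} = \sum_i w_i x_i, \qquad
w_i = \frac{1/\sigma_i^{2}}{\sum_j 1/\sigma_j^{2}}, \qquad
\sigma_{\hat{x}}^{2} = \left(\sum_j \frac{1}{\sigma_j^{2}}\right)^{-1}
```

      In this form, a cue whose uncertainty grows loses weight, so the combined estimate is pulled towards the remaining cues.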

      (3b) The extent of implicit adaptation in Experiment 2, especially with smaller errors, is unclear. The implicit adaptation function seems to be still increasing, at least by visual inspection. Can the authors comment on this trend, and relatedly, show individual data points that help the reader appreciate the variability inherent to these data?

      Indeed, the adaptation for small errors appears not completely saturated with our designated number of trials. However, this will not affect our model analysis. Our model fitting for PEA and other competing models is done on the time-series of adaptation, not on the saturated adaptation extent (see Fig 3A). Thus, despite that some conditions might not produce the full range of adaptation, the data is sufficient to constrain the models. We now mention this concern in Results; we also emphasize that the model not only explains the adaptation magnitude (operationally defined as adaptation extent measured at the same time, i.e., the end of the adaptation phase) but also the full learning process.

      In response, we have included individual data points in the revised Figure 3B-D to provide a clear illustration of the extent of implicit adaptation, particularly for small perturbations.

      (3c) The same participants were asked to return for multiple days/experiments. Given that the authors acknowledge potential session effects, with attenuation upon re-exposure to the same rotation (Avraham et al. 2021), how does re-exposure affect the current results? Could the authors provide clarity, perhaps a table, to show shared participants between experiments and provide evidence showing how session order may not be impacting results?

      Thank you for raising the issue of session and re-exposure effects. First, we don’t think Exp1 has an effect on Exp4. Exp1 is a perceptual task and Exp4 is a motor adaptation task. Furthermore, Exp1 used random visual stimuli on both sides; thus, it did not lead to any adaptation effect on its own. Second, Exp4 indeed had three sessions performed on three days, but the session effect does not change our main conclusion about the visual uncertainty. A 3-way repeated-measures ANOVA (3 day x 3 perturbation x 2 visual uncertainty) revealed a significant main effect of day (F(2,36) = 17.693, p<0.001), indicating changes in performance across sessions (see Figure below). Importantly, the effects of perturbation and visual uncertainty (including their interactions) remain the same. The day factor did not interact with them. The main effect of day shows that the overall adaptation effect is reduced across days. Post-hoc pairwise comparisons showed that single-trial learning (STL) performance on Day 1 was significantly higher than on Day 2 (p = 0.004) and Day 3 (p < 0.001), with no significant difference between Day 2 and Day 3 (p = 0.106). Other ANOVA details: significant main effects for perturbation (F(1,36) = 8.872, p<0.001) and visual uncertainty (F(1,18) = 49.164, p<0.001), as well as a significant interaction between perturbation size and visual uncertainty (F(2,36) = 5.160, p = 0.013). There were no significant interactions involving the day factor with any other factors (all p > 0.182). Thus, the overall adaptation decreases over the days, but the day does not affect the interaction of interest between visual uncertainty and perturbation. The fact that this interaction was preserved across sessions strengthens our conclusion about how visual uncertainty systematically affects implicit adaptation.

      Author response image 2.

      (3d) The number of trials per experiment should be detailed more clearly in the Methods section (e.g., Exp 4). Moreover, could the authors please provide relevant code on how they implemented their computational models? This would aid in future implementation of these models in future work. I, for one, am enthusiastic to build on PEA.

      We have clarified the number of trials conducted in each experiment, with detailed information now readily available in the Methods section of the main text. In addition, we have made the code for data analysis and modeling publicly accessible. These resources can be found in the updated "Data Availability" section of our paper.

      (3f) In addition to predicting a correlation between proprioceptive shift and implicit adaptation on a group level, both PReMo and PEA (but not causal inference) predict a correlation between individual differences in proprioceptive shift and proprioceptive uncertainty with the extent of implicit adaptation (Tsay, Kim, et al. 2021). Interestingly, shift and uncertainty are independent (see Figures 4F and 6C in Tsay et al, 2021). Does PEA also predict independence between shift and uncertainty? It seems like PEA does predict a correlation.

      Thank you for raising this insightful question. Our PEA model indeed predicts a positive correlation (although not linear) between the proprioceptive uncertainty and the amplitude of the estimated hand position (x_hand_hat). This prediction is consistent with the simulations conducted using the same parameters that were applied to generate the results depicted in Figure 4B of our manuscript (there is a sign flip as x_hand_hat is negative).

      Author response image 3.

      Regarding the absence of a correlation observed in Tsay et al., 2021, we offer several potential explanations for this discrepancy. First, the variability observed in passive hand localization during motor adaptation (as in Tsay et al., 2021) does not directly equal proprioceptive uncertainty, which typically requires psychophysical testing to assess accurately. Second, our study showed that the proprioceptive bias attenuates during repetitive measurements; in our Exp3, it decreased within a block of three trials. We noticed that the Tsay et al., 2021 study used 36 measurements in a row without interleaving adaptation trials. Thus, the “averaged” proprioceptive bias in Tsay’s study might not reflect the actual bias during adaptation. We also noticed that the study showed large individual differences in both proprioceptive bias and proprioceptive variability (not uncertainty); thus, detecting a positive result, if it were really there, would require a large number of participants, probably larger than their n=30ish sample size. These putative explanations are not included in the revision, which already has a long discussion and has no space for discussing a null result.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation is influenced by visual eccentricity, a well-established vision science concept, and sensory estimation occurs through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      Weaknesses:

      The paper provides a comprehensive account of visuomotor rotation paradigms, addressing incongruent behavioral results with a solid Bayesian integration model. However, its focus is narrowly confined to visuomotor rotation, leaving its applicability to broader motor learning paradigms, such as force field adaptation, saccadic adaptation, and de novo learning paradigms, uncertain. The paper's impact on the broader fields of neuroscience and cognitive science may be limited due to this specificity. While the paper excellently demonstrates that specific behavioral results in visuomotor rotation can be explained by Bayesian integration, a general computational principle, its contributions to other motor learning paradigms remain to be explored. The paper would benefit from a discussion on the model's generality and its limitations, particularly in relation to the undercompensating effects in other motor learning paradigms.

      Thank you for your thoughtful review and recognition of the contributions our work makes towards understanding implicit motor adaptation through the Perceptual Error Adaptation (PEA) model. We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.

      We propose that the same two principles should apply to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation comes from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or that the two principles laid out above should at least be considered when thinking about their computational nature. For skill learning, especially de novo learning, the area still lacks a fundamental computational model that accounts for the skill acquisition process at the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.

      We added more discussion on the possible broad implications of our model in the revision.

      Reviewer #3 (Public Review):

      Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim, et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors notice that implicit adaptation (as measured in error-clamp-based paradigms) does not saturate at larger perturbations, but decreases again (e.g. Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease the importance assigned to visual feedback which could explain lower asymptotes.

      The authors characterize visual uncertainty for 3 rotation sizes in the first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors is tested in different groups of participants. The model's visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate, and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than either the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, or a causal inference model (Wei & Körding, 2007), which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately, compare their data with an unsuitable other data set. Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation) than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular, the model fits in experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.
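
      The structure summarized above (a Bayesian cue-combination step feeding a state-space update with a learning rate and a retention rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the update rule, the clamp sizes, and every numeric value (A, B, sigma_u, sigma_p, and the slope and intercept of the visual-uncertainty line) are hypothetical placeholders. The slope and intercept were picked only so that visual uncertainty grows roughly seven-fold between 4° and 64°, as mentioned in the author response.

```python
# Illustrative sketch only: all parameter values are hypothetical placeholders,
# not the fitted values from the manuscript under review.

def cue_weights(sigma_u, sigma_p, sigma_v):
    """Inverse-variance (reliability) weights for three independent Gaussian cues."""
    precisions = [1.0 / s ** 2 for s in (sigma_u, sigma_p, sigma_v)]
    total = sum(precisions)
    return [p / total for p in precisions]


def sigma_v_linear(clamp_deg, slope=0.3, intercept=1.8):
    """Assumed linear growth of visual uncertainty (deg) with clamp size (deg)."""
    return slope * clamp_deg + intercept


def simulate_adaptation(clamp_deg, n_trials=500, A=0.97, B=0.2,
                        sigma_u=5.0, sigma_p=10.0):
    """Return the adapted hand angle (deg, opposite to the clamp) after n_trials."""
    w_u, w_p, w_v = cue_weights(sigma_u, sigma_p, sigma_v_linear(clamp_deg))
    x_u = 0.0  # motor prediction cue: the movement is aimed at the target
    x_p = 0.0  # proprioceptive cue: actual hand direction, starts on target
    for _ in range(n_trials):
        # Perceived hand location: reliability-weighted average of the three cues.
        x_hand_hat = w_u * x_u + w_p * x_p + w_v * clamp_deg
        # State-space update: retain a fraction A, correct a fraction B of the perceived error.
        x_p = A * x_p - B * x_hand_hat
    return -x_p


for clamp in (2, 4, 8, 16, 32, 64, 95):  # hypothetical clamp sizes in degrees
    print(f"clamp {clamp:>2} deg -> adaptation {simulate_adaptation(clamp):5.2f} deg")
```

      With these placeholder values the simulated extent of adaptation rises for small clamps and falls again for large ones, which is the non-saturating pattern discussed in this review. The exact shape depends entirely on the assumed parameters.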

      Strengths

      In this study, the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods, and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in the first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are 1) learning and 2) retention rate, as used in popular state space models, and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between the movement endpoint and the disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever-present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real-life situations.

      Specific comments:

      A small part of the manuscript relies on replicating or modeling the proprioceptive recalibration in a study we think does NOT measure proprioceptive recalibration (Tsay, Parvin & Ivry, JNP, 2020). In this study, participants reached for a visual target with a clamped cursor, and at the end of the reach were asked to indicate where they thought their hand was. The responses fell very close to the visual target both before and after the perturbation was introduced. This means that the difference between the actual hand position, and the reported/felt hand position gets very large as soon as the perturbation is introduced. That is, proprioceptive recalibration would necessarily have roughly the same magnitude as the adaptation displayed by participants. That would be several times larger than those found in studies where proprioceptive recalibration is measured without a visual anchor. The data is plotted in a way that makes it seem like the proprioceptive recalibration is very small, as they plot the responses relative to the visual target, and not the discrepancy between the actual and reported hand position. It seems to us that this study mostly measures short-term visual memory (of the target location). What is astounding about this study is that the responses change over time to begin with, even if only by a tiny amount. Perhaps this indicates some malleability of the visual system, but it is hard to say for sure.

      Regardless, the results of that study do not form a solid basis for the current work and they should be removed. We would recommend making use of the dataset from the same authors, who improved their methods for measuring proprioception shifts just a year later (Tsay, Kim, Parvin, Stover, and Ivry, JNP, 2021). Although here the proprioceptive shifts during error-clamp adaptation (Exp 2) were tiny, and not quite significant (p<0.08), the reports are relative to the actual location of the passively placed unseen hand, measured in trials separate from those with reach adaptation and therefore there is no visual target to anchor their estimates to.

      Experiment 1 measures visual uncertainty with increased rotation size. The authors cite relevant work on this topic (Levi & Klein etc) which has found a linear increase in uncertainty of the position of more and more eccentrically displayed stimuli.

      First, this is a question where the reported stimuli and effects could greatly benefit from comparisons with the literature in vision science, and the results might even inform it. In order for that to happen, the units for the reported stimuli and effects should (also) be degrees of visual angle (dva).

      As far as we know, all previous work has investigated static stimuli, whereas with moving stimuli, position information from several parts of the visual field is likely integrated over time into a final estimate of position at the end of the trajectory (a Kalman filter type process perhaps). As far as we know, there are no studies in vision science on the uncertainty of the endpoint of moving stimuli. So we think that the experiment is necessary for this study, but there are some areas where it could be improved.

      Then, the linear fit is done in the space of the rotation size, but not in the space of eccentricity relative to fixation, and these do not necessarily map onto each other linearly. If we assume that the eye-tracker and the screen were at the closest distance the manufacturer reports it to work accurately at (45 cm), we would get the largest distances the endpoints are away from fixation in dva. Based on that assumed distance between the participant and monitor, we converted the rotation angles to distances between fixation and the cursor endpoint in degrees visual angle: 0.88, 3.5, and 13.25 dva (ignoring screen curvature, or the absence of it). The ratio between the perturbation angle and retinal distance to the endpoint is roughly 0.221, 0.221, and 0.207 if the minimum distance is indeed used - which is probably fine in this case. But still, it would be better to do the fit in the relevant perceptual coordinate system.
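
      The conversion described above can be sketched as follows. Only the 45 cm viewing distance comes from the text. The 10 cm reach radius and the 16° middle rotation are assumptions that are not stated in this section; they were chosen because they give values close to the reported 0.88, 3.5, and 13.25 dva. The sketch also assumes a flat screen viewed straight on, with fixation on the target.

```python
import math


def rotation_to_dva(rotation_deg, reach_radius_cm=10.0, viewing_distance_cm=45.0):
    """Visual angle (deg) between the fixated target and the cursor endpoint.

    reach_radius_cm is an assumed value; viewing_distance_cm is the 45 cm
    minimum distance mentioned in the review.
    """
    # Target and cursor endpoint lie on the same circle around the start
    # position, so their on-screen separation is a chord of that circle.
    chord_cm = 2.0 * reach_radius_cm * math.sin(math.radians(rotation_deg) / 2.0)
    # Convert the on-screen separation into an angle at the eye.
    return math.degrees(math.atan(chord_cm / viewing_distance_cm))


for rotation in (4, 16, 64):  # 16 deg is an assumed middle rotation size
    dva = rotation_to_dva(rotation)
    print(f"{rotation:>2} deg rotation -> {dva:5.2f} dva (ratio {dva / rotation:.3f})")
```

      The ratio shrinks at the largest rotation because the chord grows with the sine of half the rotation angle, which is the nonlinearity between rotation size and eccentricity that this comment points to.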

      The first distance (4 deg rotation; 0.88 dva offset between fixation and stimulus) is so close to fixation (even at the assumed shortest distance between eye and screen) that it can be considered foveal and falls within the range of noise of eye-trackers + that of the eye for fixating. There should be no uncertainty on or that close to the fovea. The variability in the data is likely just measurement noise. This also means that a linear fit will almost always go through this point, somewhat skewing the results toward linearity. The advantage is that the estimate of the intercept (measurement noise) is going to be very good. Unfortunately, there are only 2 other points measured, which (if used without the closest point) will always support a linear fit. Therefore, the experiment does not seem suitable to test linearity, only to characterize it, which might be sufficient for the current purposes. We'd understand if the effort to do a test of linearity using many more rotations requires too much effort. But then it should be made much clearer that the experiment assumes linearity and only serves to characterize the assumed linearity.

      Final comment after the consultation session:

      There were a lot of discussions about the actual interpretation of the behavioral data from this paper with regards to past papers (Tsay et al. 2020 or 2021), and how it matches the different variables of the model. The data from Tsay 2020 combined both proprioceptive information (Xp) and prediction about hand position (Xu) because it involves active movements. On the other hand, Tsay et al. 2021 is based on passive movements and could provide a better measure of Xp alone. We would encourage you to clarify how each of the variables used in the model is mapped onto the outcomes of the cited behavioral experiments.

      The reviewers discussed this point extensively during the consultation process. The results reported in the Tsay 2020 study reflect both proprioception and prediction. However, having a visual target contributes more than just prediction: it is likely an anchor in the workspace that draws the response to it, such that the report is dominated by short-term visual memory of the target (which is not part of the model). In contrast, in the current Exp 3, as in most other work investigating proprioception, the report is calculated relative to the actual hand direction.

      The solution is fairly simple. In Experiment 3 in the current study, Xp is measured relative to the hand without any visual anchors drawing responses, and this is also consistent with the reference used in the Tsay et al 2021 study and from many studies in the lab of D. Henriques (none of which also have any visual reach target when measuring proprioceptive estimates). So we suggest using a different data set that also measures Xp without any other influences, such as the data from Tsay et al 2021 instead.

      These issues with the data are not superficial and cannot be solved within the model. Data with correctly measured biases (relative to the hand) that are not dominated by irrelevant visual attractors would actually be informative about the validity of the PEA model. Dr. Tsay has so much other data that we recommend using a more to-the-point data set that could actually validate the PEA model.

      As the comments are repetitive in some places, we summarize them into three questions and address them one by one below:

      (1) Methodological concerns about visual uncertainty estimation in Experiment 1: a) the visual uncertainty is measured in movement angles (degrees), while the unit in vision science is degrees of visual angle (dva). This mismatch of units hinders direct comparison between the measured visual uncertainty and the values reported in the literature; b) a 1-second delay between movement endpoint and the reference marker presentation causes an overestimate of visual uncertainty due to potential degradation of visual memory; c) the linear function of visual uncertainty is a result of having only three perturbation sizes.

      a) As noted by the reviewer, our visual uncertainty is about cursor motion direction in the display plane, which has never been measured before. We do not think our data are comparable to any findings in vision science about foveal/peripheral comparison. We cited the work of Klein and others in vision science (Klein & Levi, 1987; Levi et al., 1987) because their studies showed that deviation from the fixation is associated with an increase in visual uncertainty. Their studies thus inspired our Exp1 to probe how the visual uncertainty of interest (specifically for visual motion direction) changes with an increasing deviation from the fixation. We believe that any model and its parameters should be specifically tailored to the task or context it tries to emulate. In our case, motion direction in a center-out reaching setting is the modeled context, and all the relevant model parameters should be specified in movement angles.

      b) The 1s delay of the reference cursor appears to have minimal impact on the estimate of visual uncertainty, based on previous vision studies. Our Exp1 used a visual paradigm similar to that of White et al., 1992, which shows that delay does not lead to an increase in visual uncertainty over a broad range of values (from 0.2s to >1s, see their Figure 5-6). We will add more methodology justifications in our revision.

      c) We agree that if more angles were tested we could be more confident about the linearity of visual uncertainty. However, the linear function is a good approximation of visual uncertainty (as shown in Figure 2C). More importantly, our model performance does not hinge on a strict linear function. Say, if it is a power function with an increasing slope, our model will still predict the major findings presented in the paper, as correctly pointed out by the reviewer. It is the increasing trend of visual uncertainty, which is completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Lastly, without assuming a linear function, we fitted the large dataset of motor adaptation from Exp2 to numerically estimate the visual uncertainty. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p<0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this analysis in the revision. See details in Supplementary text 2 and Figure S7.

      (2) Experiment 3: the reviewer argues that the Tsay et al., 2020 data do not accurately measure proprioceptive recalibration and are thus not suitable for showing our model’s capacity to explain proprioceptive changes during adaptation.

      Response: We agree that the data from Tsay et al., 2020 are not from passive localization, which is the widely accepted method for measuring proprioceptive recalibration, a recalibration effect in the sensory domain. Active localization, as used in Tsay et al., 2020, is hypothesized to be closely related to people’s forward prediction (where people want to go, as the reviewer put it in the comments). However, we want to emphasize that we never equated Tsay’s findings with proprioceptive recalibration: throughout the paper we call them “reported hand location”. We reserved “proprioceptive recalibration” for our own Exp3, which used a passive localization method. Thus, we did not misuse this term. Secondly, as far as we know, localization bias or changes, whether measured by passive or active methods, have not been formally modeled quantitatively. We believe our model can explain both, at least in the error-clamp adaptation setting here. Exp3 is for passive localization: the proprioceptive bias is caused by the biasing effect of the just-perceived hand location (X_hand_hat) from the adaptation trial. The Tsay et al., 2020 data are for active localization, whose bias shows a characteristic change from negative to positive. This can be explained by the just-perceived hand location (X_hand_hat again) and a gradually adapting hand (X_p). We think this is a significant advance in the realm of proprioceptive changes in adaptation. Of course, our idea can be further tested in other task conditions, e.g., conventional visuomotor rotation or even gain adaptation, which should be left for future studies.

      On the technical side, the Tsay et al., 2020 data set is not ideal: when reporting hand location, the participants view the reporting wheel as well as the original target. As correctly pointed out by the reviewer, the presence of the target might provide an anchoring cue for perceptual judgment, which acts as an attractor for localization. If this were the case, our cue combination would predict that this extra attractor effect leads to a smaller proprioceptive effect than that currently reported in their paper. The initial negative bias will be closer to the target (zero), and the later positive bias will be closer to the target too. However, the main trend will remain, i.e. the reported hand location would still show the characteristic negative-to-positive change. The attractor effect of the target can be readily modeled by giving less weight to the just-perceived hand location (X_hand_hat). Thus, we would like to keep the Tsay et al., 2020 data in our paper but add some explanations of the limitations of this dataset as well as how the model would fare with these limitations.

      That being said, our model can account for both passive and active localization during implicit adaptation elicited by error clamp. The dataset from the Tsay et al., 2021 paper is not a good substitute for their 2020 paper in terms of modeling, since that study interleaved some blocks of passive localization trials with adaptation trials. This kind of block design would lead to forgetting of both adaptation (Xp in our model) and the perceived hand (X_hand_hat in our model); the latter is not yet considered in our model. As our Exp3, which also used passive localization, shows, the influence of the perceived hand on proprioceptive bias is short-lived, lasting up to three trials without adaptation trials. Of course, it would be of great interest to design future studies to study how the proprioceptive bias changes over time, and how its temporal changes relate to the perceptual error. Our model provides a testbed to move forward in this direction.

      (3) The reviewer raises concerns about the study's assumption that participants ignore error feedback, questioning the model's applicability to broader contexts and real-world scenarios where ignoring errors might not be viable or common.

      Reviewer 2 raised the same question above. We moved our responses here. “We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, where the participants were explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is implicitly processed and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (motor prediction cue in our model) when forming procedural knowledge for action control.

      We will propose that the same two principles should be applied to various kinds of motor adaptation and motor skill learning, which constitutes motor learning in general. Most of our knowledge about motor adaptation is from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three types all involve localizing one’s effector under the influence of perturbed sensory feedback, and they also have implicit learning. We believe they can be modeled by variants of our model, or at least should consider using the two principles we laid out above to think of their computational nature. For skill learning, especially for de novo learning, the area still lacks a fundamental computational model that accounts for skill acquisition process on the level of relevant movement cues. Our model suggests a promising route, i.e., repetitive movements with a Bayesian cue combination of movement-related cues might underlie the implicit process of motor skills.”

      We also add one more important implication of our model: as stated above, our model also explains that the proprioceptive changes, revealed by active or passive localization methods, are brought by (mis)perceived hand localization via Bayesian cue combination. This new insight, though only tested here using the error-clamp paradigm, can be further utilized in other domains, e.g., conventional visuomotor rotation or force field adaptation. We hope this serves as an initial endeavor in developing some computational models for proprioception studies. Please see the extended discussion on this matter in the revision.

      Recommendations for the authors:

      Revisions:

      All three reviewers were positive about the work and have provided a set of concrete and well-aligned suggestions, which the authors should address in a revised version of the article. These are listed below.

      A few points of particular note:

      (1) There are a lot of discussions about the actual interpretation of behavioral data from this paper or past papers (Tsay et al. 2020 or 2021) and how it matches the different variables of the model.

      (2) There are some discussions on the results of the first experiment, both in terms of how it is reported (providing degrees of visual angle) and how it differs from previous results (importance of the point of fixation). We suggest also discussing a few papers on eye movements during motor adaptation from recent years (work of Anouk de Brouwer and Opher Donchin). Could the authors also discuss why they found results opposite to those of previous visual uncertainty studies (i.e., visual uncertainty attenuates learning with large, but not small, visual errors), rather than the other way around as in Burge et al., Tsay et al. 2021, and Makino & Nozaki 2023 (where visual uncertainty attenuates learning with small, but not large, visual errors)?

      (3) It is recommended by several reviewers to discuss the applicability of the model to other areas/perturbations.

      (4) Several reviewers and I believe that the impact of the paper would be much higher if the code to reproduce all the simulations of the model were made available to the readers. In addition, while I am very positive about the fact that the authors shared the data of their experiments, the metadata seem to be missing; they are essential, because the data are otherwise unusable.

      Thank you for the concise summary of the reviewers’ comments. We have addressed their concerns point by point.

      Reviewer #2 (Recommendations For The Authors):

      L142: The linear increase in visual uncertainty should be substantiated by previous research in vision science. Please cite relevant papers and discuss why the linear model is considered reasonable.

      We cited relevant studies in vision science. Their focus is more on how eccentricity inflates visual uncertainty, similar to our finding that deviations from the fixation direction inflate visual uncertainty about motion direction.

      We also want to add that our model performance does not hinge on a strict linear function of visual uncertainty. Say, if it is a power function with an increasing slope, our model will still predict the major findings presented in the paper. It is the increasing trend of visual uncertainty, which is completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Furthermore, without assuming a linear function, we fitted the large dataset of motor adaptation from Exp2 to numerically estimate the visual uncertainty. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p<0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this new analysis in the revision. See details in Supplementary text 2 and Figure S7.

      L300: I found it challenging to understand the basis for this conclusion. Additional explanatory support is required.

      We unpacked this concluding sentence as follows:

      “The observed proprioceptive bias is formally modeled as a result of the biasing effect of the perceived hand estimate x_hand_hat. In our mini-block of passive localization, the participants neither actively moved nor received any cursor perturbations for three trials in a row. Thus, the fact that the measured proprioceptive bias is reduced to nearly zero at the third trial suggests that the effect of perceived hand estimate x_hand_hat decays rather rapidly.”

      L331: For the general reader, a visual representation of what the blurring mask looks like would be beneficial.

      Thanks for the nice suggestion. We added pictures of a clear and a blurred cursor in Figure 5D.

      L390: This speculation is intriguing. It would be helpful if the authors explained why they consider causal inference to operate at an explicit process level, as the reasoning is not clear here, although the idea seems plausible.

      Indeed, our tentative conclusion here is based only on the model comparison results. It is still possible that causal inference also works for implicit adaptation in addition to explicit adaptation. We make a more modest conclusion in the revision:

      “The causal inference model is also based on the Bayesian principle, so why does it fail to account for implicit adaptation? We postulate that the failure of the causal inference model is due to its neglect of visual uncertainty as a function of perturbation size, as we revealed in Experiment 1. In fact, previous studies advocating the Bayesian principle in motor adaptation have largely focused on experimentally manipulating sensory cue uncertainty to observe its effects on adaptation (Burge et al., 2008; He et al., 2016; Körding & Wolpert, 2004; Wei & Körding, 2010), similar to our Experiment 4. Our findings suggest that causal inference of perturbation alone, without incorporating visual uncertainty, cannot fully account for the diverse findings in implicit adaptation. The increase in visual uncertainty with perturbation size is substantial: our Experiment 1 yielded an approximate seven-fold increase from a 4° perturbation to a 64° perturbation. We have attributed this to the fact that people fixate in the desired movement direction during movements. Interestingly, even in the conventional visuomotor rotation paradigm, where people are required to “control” the perturbed cursor, their fixation is also on the desired direction, not on the cursor itself (de Brouwer, Albaghdadi, et al., 2018; de Brouwer, Gallivan, et al., 2018). Thus, we postulate a similar hike in visual uncertainty in other “free-viewing” perturbation paradigms. Future studies are warranted to extend our PEA model to account for implicit adaptation in other perturbation paradigms.”

      L789: The method of estimating Sigma_hand in the brain was unclear. Since Bayesian computation relies on the magnitude of noise, the cognitive system must have estimates of this noise. While vision and proprioception noise might be directly inferred from signals, the noise of the hand could be deduced from the integration of these observations or an internal model estimate. This process of estimating noise magnitude is theorized in recursive Bayesian integration models (or Kalman filtering), where the size estimate of the state noise (sigma_hand) is updated concurrently with the state estimate (x_hand hat). The equation in L789 and the subsequent explanation appear to assume a static model of noise estimation. However, in practice, the noise parameters, including Sigma_hand, are likely dynamic and updated with each new observation. A more detailed explanation of how Sigma_hand is estimated and of its role in the cognitive process would be helpful.

      This is a great comment. If a Kalman filter is used, the learning rate and the state noise should both be dynamically updated on each trial, under the influence of the observed x_v. Most adaptation models, including ours, assume a constant learning rate, although a dynamic learning rate (B in our model) is worth trying. However, in our error-clamp setting, x_v is constant, so this observation variable cannot dynamically update the Kalman filter; that is why we opted for a “static” Bayesian model to explain our datasets. Sigma_hand can thus be estimated using Bayesian principles as a function of the three available cues, i.e., the proprioceptive cue, the visual cue, and the motor prediction cue. We added a detailed derivation of sigma_hand in Supplementary text 1 of the revision.
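
      The authors' own derivation is in Supplementary text 1 and is not reproduced here. As a generic illustration of a static estimate of this kind, the sketch below assumes the three cues are independent and Gaussian, in which case the standard Bayesian cue-combination rule adds their precisions. The function name and the example noise values are hypothetical and are not taken from the paper.

```python
import math


def combined_sigma(sigmas):
    """Posterior standard deviation when independent Gaussian cues are
    combined optimally: the precisions (1 / sigma^2) of the cues add."""
    if any(s <= 0 for s in sigmas):
        raise ValueError("each cue's standard deviation must be positive")
    total_precision = sum(1.0 / s ** 2 for s in sigmas)
    return math.sqrt(1.0 / total_precision)


# Hypothetical noise levels (degrees) for the proprioceptive, visual and
# motor-prediction cues; these are NOT values from the paper.
sigma_p, sigma_v, sigma_u = 10.0, 20.0, 5.0
sigma_hand = combined_sigma([sigma_p, sigma_v, sigma_u])
print(f"sigma_hand = {sigma_hand:.2f} deg")
```

      Because precisions add, the combined estimate is never noisier than the most reliable single cue, and a very noisy cue contributes little to it.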

      Reviewer #3 (Recommendations For The Authors):

      We observed values in Fig 2C for the 64-degree perturbation that seem to be outliers, i.e., greater than 50 degrees. It is unclear how a psychometric curve could have a "slope" or JNP of over 60, especially considering that the tested range was only 60. Since the data plotted in panel C is a collapse of the signed data in panel B, it is perplexing how such large data points were derived, particularly when the signed uncertainty values do not appear to exceed 30.

      Related to the previous point, we would also recommend connecting individual data points: if the uncertainty increases (linearly or otherwise), then people with low uncertainty at the middle distance should also have low uncertainty at the high distance, and people with high uncertainty at one point, should also have that at other distances. Or perhaps the best way to go about this is to use the uncertainty at the two smaller perturbations to predict uncertainty at the largest perturbation for each participant individually?

      Thank you for your suggestion to examine the consistency of individual levels of visual uncertainty across perturbation sizes. First, a sigma_v of 60 degrees is entirely possible and falls naturally out of the experimental data. It shows that some individuals indeed have large visual uncertainty. Given these potential outliers (which should not be readily removed, as we have no reason to do so), we estimated the linear function of sigma_v with a robust method, i.e., a GLM with a gamma distribution, which suits right-skewed distributions and can capture positive outliers well. Furthermore, we added in our revision a verification test of our estimates of sigma_v: we used Exp2’s adaptation data to estimate sigma_v without assuming its linear dependency. As shown, the model-fitted sigma_v closely matched the estimates from Exp1 (see Supplementary text 2 and Figure S7).
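
      To make the gamma-GLM idea concrete, here is a minimal sketch of fitting a straight line to positive, right-skewed data by iteratively reweighted least squares. It assumes an identity link (the response does not state which link was used), and the data are hypothetical. It is not the authors' MATLAB code.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gamma_identity(x, y, max_iter=100, tol=1e-10):
    """Gamma GLM with an identity link (an assumption), fitted by IRLS.

    For a gamma family the variance is proportional to mu^2, so each
    iteration is a least-squares fit with weights 1 / mu^2.
    """
    if any(v <= 0 for v in y):
        raise ValueError("a gamma model needs strictly positive responses")
    mu = list(y)  # usual starting point: fitted means equal to the data
    intercept, slope = 0.0, 0.0
    for _ in range(max_iter):
        weights = [1.0 / m ** 2 for m in mu]
        new_intercept, new_slope = weighted_line_fit(x, y, weights)
        change = max(abs(new_intercept - intercept), abs(new_slope - slope))
        intercept, slope = new_intercept, new_slope
        mu = [intercept + slope * xi for xi in x]
        if min(mu) <= 0:
            raise ValueError("fitted mean became non-positive")
        if change < tol:
            break
    return intercept, slope


# Hypothetical sigma_v values (degrees) at three perturbation sizes.
size = [4, 4, 4, 16, 16, 16, 64, 64, 64]
sigma_v = [3.0, 5.0, 4.0, 8.0, 12.0, 9.0, 20.0, 35.0, 60.0]
a, b = fit_gamma_identity(size, sigma_v)
print(f"sigma_v ~ {a:.2f} + {b:.3f} * perturbation size")
```

      The 1 / mu^2 weights mean that a large value at a large perturbation pulls the line less than it would in an ordinary least-squares fit.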

      We re-plotted sigma_v with each individual's data points connected, and the data clearly indicate that individuals exhibit consistent levels of visual uncertainty across different perturbation sizes, i.e., those with relatively lower uncertainty at middle distances (in fact, angles) tend to exhibit relatively lower uncertainty at higher distances too, and similarly, those with higher uncertainty at one distance maintain that level of uncertainty at other distances. This is confirmed by a Spearman correlation analysis assessing the consistency of uncertainties across different degrees of perturbation among individuals. We observed significant correlations between perturbation angles, indicating good individual consistency (4 and 16 degrees, rho = 0.759, p<0.001; 16 and 64 degrees, rho = 0.527, p = 0.026).

      Author response image 4.
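
      The rank-correlation check described above can be sketched as follows. This is a plain implementation of Spearman's rho (Pearson correlation of average ranks) on hypothetical per-participant values. It does not reproduce the reported statistics or compute a p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical sigma_v (degrees) for six participants at two perturbation sizes.
sigma_v_16 = [6.0, 9.5, 7.2, 12.0, 8.1, 15.3]
sigma_v_64 = [18.0, 30.0, 35.0, 41.0, 22.0, 64.0]
print(f"rho = {spearman_rho(sigma_v_16, sigma_v_64):.3f}")
```

      A high rho means that participants keep roughly the same rank order of uncertainty from one perturbation size to the next, which is the consistency the reviewer asked about.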

      The illustration in Fig 2A does not seem to show a stimulus that is actually used in the experiment (looks like about -30° perturbation). It would be good to show all possible endpoints with all other visual elements to scale - including the start-points of the PEST procedure.

      Thanks for the suggestion. We updated Fig 2A to show a stimulus of +16 degrees and added a panel showing all the possible endpoints.

      Finally (related to the previous point), in lines 589-591 it says the target is a blue cross. Then in lines 614-616, it says participants are to fixate the blue cross or the start position. The start position was supposed to have disappeared, so perhaps the blue plus moved to the start position (which could be the case, when looking at the bottom panel in Fig 2A, although in the illustration the plus did not move fully to the start position, just toward it to some degree). Perhaps the descriptions need to be clarified, or it should be explained why people had to make an eye movement before giving their judgments. And if people could have made either 1) no eye movement, but stayed at fixation, 2) moved to the blue plus as shown in the last panel in Fig 2A, or 3) fixated on the home position, we'd be curious to know if this affected participants' judgments.

      Thanks for pointing that out. The blue cross serves as the target in the movement task, then disappears with the cursor after 800 ms of frozen time. The blue cross then appears in the discrimination task at the center of the screen, i.e., the start location. Subjects were asked to fixate on the blue cross during the visual discrimination task. Note that this return of fixation to the home position is exactly what is seen in typical error-clamp adaptation: once the movement is over, people guide their hand back to the home position. We performed a pilot study to record the typical fixation pattern during error-clamp adaptation, and Exp1 was intentionally designed to mimic its fixation sequence. We have now updated the description of Figure 2A, emphasizing the stimulus sequence.

      In Figure 4A, the label "bias" is confusing as that is used for recalibrated proprioceptive sense of hand position as well as other kinds of biases elsewhere in the paper. What seems to be meant is the integrated hand position (x-hat_hand?) where all three signals are apparently combined. The label should be changed and/or it should be clarified in the caption.

      Thanks for pointing that out, it should be x_hand_hat, and we have corrected this in the revised version of Figure 4.

      In the introduction, it is claimed that larger perturbations have not been tested with "implicit adaptation" paradigms, but in the same sentence, a paper is cited (Moorehead et al., 2017) that tests a rotation on the same order of magnitude as the largest one tested here (95°), as well as much larger rotations (135° and 175°). With error-clamps. Interestingly, there is no adaptation in those conditions, which seems more in line with the sensory cue integration model. Can the PEA model explain these results as well? If so, this should be included in the paper, and if not, it should be discussed as a limitation.

      First, we double checked our manuscript and found that we never claimed that larger perturbations had not been tested.

      We agree that it is always good to have as many conditions as possible. However, the 135 and 175 degree conditions would lead to minimal adaptation, which would not help much in terms of model testing. We postulate that this lack of adaptation is simply due to the fact that people cannot see the moving cursor, or to some other unknown reason. Our simple model is not designed to cover those kinds of extreme cases.

      Specify the size of the arc used for the proprioceptive tests in Exp 3 and describe the starting location of the indicator (controlled by the left hand). Ideally, the starting location should have varied across trials to avoid systematic bias.

      Thank you for the comments. As detailed in the methods section of our paper, the arc used during these tests lies on a ring with a 10 cm radius centered at the start position. This setup is visually represented as a red arc in Figure 7B.

      After completing each proprioceptive test trial, participants were instructed to position the indicator at approximately -180° on the arc and then relax their left arm. Although the starting location for the subsequent trial remained at about -180°, it was not identical for every trial, thereby introducing slight variability.

      Please confirm that the proprioceptive biases plotted in Fig 4E are relative to the baseline.

      Thank you for bringing this to our attention. Yes, the proprioceptive biases illustrated in Figure 4E are indeed calculated relative to the baseline measurements. We have added this to the Methods section.

      Data availability: the data are available online, but there are some ways this can be improved. First, it would be better to use an open data format, instead of the closed, proprietary format currently used. Second, there is no explanation for what's in the data, other than the labels. (What are the units? What preprocessing was done?) Third, no code is made available, which would be useful for a computational model. Although rewriting the analyses in a non-proprietary language (to increase accessibility) is not a reasonable request at this point in the project, I'd encourage it for future projects. But perhaps Python, R, or Julia code that implements the model could be made available as a notebook of sorts so that other labs could look at (build on) the model starting with correct code - increasing the potential impact of this work.

      Great suggestions. We are also fully supportive of open data and open science. We have now made the following changes:

      (1) Updated our data and code repository to include the experimental data in an open data format (.csv) for broader accessibility.

      (2) The data are now accompanied by detailed descriptions to clarify their contents.

      (3) We have made the original MATLAB (.m) code for data analysis, model fitting and simulation available online.

      (4) We also provide the code in Jupyter Notebook (.ipynb) format.

      These updates can be found in the revised “Data Availability” section of our manuscript.

      References

      Bromberg, Z., Donchin, O., & Haar, S. (2019). Eye Movements during Visuomotor Adaptation Represent Only Part of the Explicit Learning. eNeuro, 6(6). https://doi.org/10.1523/ENEURO.0308-19.2019

      Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 1–19.

      de Brouwer, A. J., Gallivan, J. P., & Flanagan, J. R. (2018). Visuomotor feedback gains are modulated by gaze position. Journal of Neurophysiology, 120(5), 2522–2531.

      Egly, R., & Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology. Human Perception and Performance, 10(6), 778–793.

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Makino, Y., Hayashi, T., & Nozaki, D. (2023). Divisively normalized neuronal processing of uncertain visual feedback for visuomotor learning. Communications Biology, 6(1), 1286.

      Owsley, C., Ball, K., & Keeton, D. M. (1995). Relationship between visual sensitivity and target localization in older adults. Vision Research, 35(4), 579–587.

      Simani, M. C., McGuire, L. M. M., & Sabes, P. N. (2007). Visual-shift adaptation is composed of separable sensory and task-dependent effects. Journal of Neurophysiology, 98(5), 2827–2841.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Chandy, A. M., Chua, R., Miall, R. C., Cole, J., Farnè, A., Ivry, R. B., & Sarlegna, F. R. (2024). Minimal impact of proprioceptive loss on implicit sensorimotor adaptation and perceived movement outcome. bioRxiv : The Preprint Server for Biology. https://doi.org/10.1101/2023.01.19.524726

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.

      Wei, K., Stevenson, I. H., & Körding, K. P. (2010). The uncertainty associated with visual flow fields and their influence on postural sway: Weber’s law suffices to explain the nonlinearity of vection. Journal of Vision, 10(14), 4.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      The study provides a complete comparative interactome analysis of α-arrestins in both humans and Drosophila. The authors have presented interactomes of six human and twelve Drosophila α-arrestins using affinity purification/mass spectrometry (AP/MS). The constructed interactomes helped to find α-arrestin binding partners through common protein motifs. The authors have used bioinformatic tools and experimental data in human cells to identify the roles of TXNIP and ARRDC5: the TXNIP-HDAC2 interaction and the ARRDC5-V-type ATPase interaction. The study reveals the PPI network for α-arrestins and examines the functions of α-arrestins in both humans and Drosophila.

      Comments

      I would like to congratulate the authors and the corresponding authors of this manuscript for bringing together such an elaborate study on α-arrestins and conducting a comparative study in Drosophila and humans.

      Introduction:

      The introduction provides a rationale behind why the comparison between humans and Drosophila is carried out.

      • Even though this is a research manuscript, including existing literature on similar comparison of α-arrestin from other articles will invite a wide readership.

      Results:

      The results cover all the necessary points concluded from the experiments and computational analysis.

      1) The authors could point out the similarity of the α-arrestins in humans and Drosophila. When comparing α-arrestins across the two species, the percentage homology between the human and Drosophila α-arrestins should be calculated.

      Thank you for your insightful feedback. As suggested by the reviewer, we determined the percentage homology of α-arrestin protein sequences from human and Drosophila using Clustal Omega. This homology is now illustrated as a heatmap in revised Figure S5. Please note that only values with a percentage homology of 40% or higher are labeled.
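
      As a rough illustration of how such a heatmap value can be derived, the sketch below computes percent identity between two already-aligned sequences and applies the 40% labeling threshold mentioned above. Percent identity is used here as a common proxy for "percentage homology", and the denominator (columns where neither sequence has a gap) is one of several conventions. Both are assumptions rather than a description of what Clustal Omega reports. The sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence
    has a gap. Both inputs must come from the same alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def heatmap_label(value, threshold=40.0):
    """Text shown in a heatmap cell: only values at or above the threshold."""
    return f"{value:.0f}" if value >= threshold else ""


# Hypothetical aligned fragments (not real α-arrestin sequences).
seq_human = "MVLG-KVKSLTIE"
seq_fly = "MVMGQKIKS-TVE"
pid = percent_identity(seq_human, seq_fly)
print(f"{pid:.1f}% identity, label: '{heatmap_label(pid)}'")
```

      Leaving low values unlabeled keeps a dense all-against-all matrix readable while the color scale still conveys them.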

      • Citing the direct connecting genes from the network in the text will invite citations and a wider readership.

      Figures:

      The images are elaborate and well-made.

      2) The authors could use a directly connected gene-gene network that shows the interactions. This can be used by other readers working on the same topic and ensure reproducibility and citations.

      We appreciate your valuable comment. Based on the reviewer’s suggestion, we have developed a new website in which one can navigate the gene-gene networks of α-arrestins. These directly connected gene-gene networks are housed in the Network Data Exchange (NDEx) project. Additionally, we have included gene ontology and protein class details for α-arrestins’ interactors in this set of networks, offering a more comprehensive view of α-arrestins’ interactomes.

      On page 24 lines 15-18, we have revised the manuscript to introduce the newly developed website, as follows.

      “Lastly, to assist the research community, we have made comprehensive α-arrestin interactome maps on our website (big.hanyang.ac.kr/alphaArrestin_PPIN). Researchers can search and download their interactomes of interest as well as access information on potential cellular functions and protein class associated with these interactomes.”  

      3-1) The co-expression interactions represented as figures should reveal interactions among the α-arrestins and other genes. Which genes within each sub-network does the α-arrestin interact with? The arrows are only pointing at the sub-networks. The figures do not reveal their interaction. Kindly reveal the interaction in the figure with the proper nodes.

      3-2) Figure 2: the network attached in both human and Drosophila is well represented. The green lines from α-arrestin indicate the strength of the interaction. Several smaller expression networks are seen. But "α-arrestin" in both organisms seems highly disconnected from all the genes. Connected genes have edges, not arrows. Showing α-arrestin connected to these gene-gene networks will help in identifying which genes connect with which gene through α-arrestin. This can be used by other readers working on the same topic and ensure reproducibility and citations.

      Thank you for your valuable comment. In response to the reviewer’s recommendation, we’ve added a supplementary figure, Figure S4, which illustrates direct interactions between α-arrestins and protein components of clustered complexes (or sub-networks), in addition to the associations shown between α-arrestins and the clustered complexes in Figure 2. We believe that this newly incorporated information regarding direct protein interactions will invite citations and a wider readership, as the reviewer pointed out.

      On page 12 line 27 to page 13 line 5, we have revised the manuscript to cite the direct interactions between ARRDC3 and proteins involved in ubiquitination-dependent proteolysis, as follows.

      “While the association of ARRDC3 with these ubiquitination-dependent proteolysis complexes is statistically insignificant, ARRDC3 does interact with individual components of these complexes such as NEDD4, NEDD4L, WWP1, and ITCH (Figure S4A). This suggests their functional relevance in this context, as previously reported in both the literature and databases (Nabhan et al., 2010; Shea et al., 2012; Szklarczyk et al., 2015; Warde-Farley et al., 2010; Puca & Brou, 2014; Xiao et al., 2018).”

      Direct interaction between α-arrestins and protein components of clustered complexes are illustrated in the newly added figure, Figure S4.

      4-1) Figure 4. The Protein blot image was blurred. Kindly provide a higher-resolution image.

      4-2) Figure 5. B. - The authors can provide images with higher resolution blot images. The bands were not visible.

      We appreciate your valuable comment. Unfortunately, the protein blot image was scanned from the original film, and the images we provided in the figure represent the highest resolution that we have obtained to date. Raw, uncropped images are shown in Author response images 1 and 2.

      Author response image 1.

      Raw image of Figure 4B

      Author response image 2.

      Raw image of Figure 5B

      5) Figure: 5. A. - I see non-specific amplifications in the gel images. Are these blotting images? or the gel images that were changed to "Grayscale"? Non-specific amplification may imply that the experiment was not repeated and standardized. Was it gel images or blot images?

      We appreciate your insightful comment. The images in Figure 5A represent western blot bands from co-immunoprecipitation assay for analysis of the interaction between TXNIP and HDAC2 proteins. Since immunoblotting using immunoprecipitates can usually detect some non-specific bands from heavy (~ 50 kDa) and light (~25 kDa) chains of the target antibody or from multiple co-immunoprecipitated proteins, we assume that the vague non-specific bands in Figure 5A might be a heavy chain of TXNIP or HDAC2 antibody or an unclear non-specific band. Because target bands showed strong intensity and very clear pattern compared to the non-specific bands in the co-immunoprecipitation assay, we believe that this data is sufficient to support the interaction of TXNIP with HDAC2. Finally, in the revised Figure 5A, we’ve modified the labeling for different experimental conditions, namely siCon and siTXNIP treatments, and added expected size of proteins (kDa), as shown below.

      6) Figure 5. A. RT-PCR analysis: What was your expected size of the amplifications? The ladder indicated is in kDa. Is that right?

      We appreciate your insightful questions. As mentioned above, Figure 5A shows the blotting images of co-immunoprecipitation analysis, and the ladder indicates the molecular weight (kDa) of protein markers. For clearer interpretation, the expected size of target proteins has been added in Figure 5A in the revised manuscript.

      7) How were the band intensities determined?

      Thank you for your question. For quantification of immunoblot results, the densities of target protein bands were analyzed with ImageJ, as we described in the Materials and Methods.

      Discussion:

      The authors have discussed the conclusions they draw from their study, but could say more about the ARRDCs and why they were selected over the other arrestins. The authors have provided future work directions associated with their work.

      8) Why were only ARRDCs presented amongst all the arrestins in the main part of the manuscript?

      We’re grateful for your valuable feedback. We focused on α-arrestins because they were discovered relatively recently, especially compared with the more established visual/β-arrestin proteins in the same arrestin family, and the biological functions of many α-arrestins remain largely unexplored, with notable exceptions in the budding yeast model and a few α-arrestins in mammals and invertebrate species. Most importantly, a comparative study highlighting the shared or unique features of α-arrestins has yet to be undertaken. To gain a more comprehensive understanding of these unexplored α-arrestins across multiple species, we’ve centered our research on the ARRDCs within the arrestin protein family.

      On page 21 lines 8-17, we’ve edited the manuscript to emphasize the importance of a comparative study on α-arrestins, as detailed below.

      “According to a phylogenetic analysis of arrestin family proteins, α-arrestins were shown to be ubiquitously conserved from yeast to human (Alvarez, 2008). However, compared to the more established visual/β-arrestin proteins, α-arrestins have been discovered more recently and much of their molecular mechanisms and functions remain mostly unexplored except for budding yeast model (Zbieralski & Wawrzycka, 2022). Based on the high-confidence interactomes of α-arrestins from human and Drosophila, we identified conserved and specific functions of these α-arrestins. Furthermore, we uncovered newly discovered molecular functions of the human-specific α-arrestins, TXNIP and ARRDC5. We anticipate that the discovery made here will enhance current understanding of α-arrestins.”

      9) The discussion could be elaborated more by utilizing the data.

      We appreciate your insightful feedback. Based on the reviewer’s suggestion, we’ve enhanced the discussion in the manuscript to provide a clearer interpretation of our results. First, we’ve added a description of conserved protein complexes significantly associated with α-arrestins, stated on page 22 lines 5-12 and lines 23-26.

      Page 22 lines 5-12: “The integrative map of protein complexes also highlighted both conserved and unique relationships between α-arrestins and diverse functional protein complexes. For instance, protein complexes involved in ubiquitination-dependent proteolysis, proteasome, RNA splicing, and intracellular transport (motor proteins) were prevalently linked with α-arrestins in both human and Drosophila. To more precisely identify conserved PPIs associated with α-arrestins, we undertook ortholog predictions within the α-arrestins’ interactomes. This revealed 58 orthologous interaction groups that were observed to be conserved between human and Drosophila (Figure 3).”

      Page 22 lines 23-26: “Additionally, interaction between α-arrestins and entities like motor proteins, small GTPase, ATP binding proteins, and endosomal trafficking components were identified to be conserved. Further validation of these interactions could unveil molecular mechanisms consistently associated with these cellular functions.”

      Secondly, we’ve added a description of the role of ARRDC5 in osteoclast maturation, as stated on page 23 lines 22-24.

      “Conversely, depletion of ARRDC5 reduces osteoclast maturation, underscoring the pivotal role of ARRDC5 in osteoclast development and function (Figure S9A and B).”

      Lastly, we examined the association between α-arrestins’ interactomes and human diseases, incorporating our findings into the discussion. The newly introduced figure based on the result is Figure S10.

      On page 24 lines 10-14, we’ve added discussion on Figure S10 as follows.

      “We further explored association between α-arrestins’ interactomes and disease pathways (Figure S10). Notably, the interactomes of α-arrestins in human showed clear links to specific diseases. For instance, ARRDC5 is closely associated with disease resulting from viral infection and cardiovascular conditions. ARRDC2, ARRDC4, and TXNIP share common association with certain neurodegenerative diseases, while ARRDC1 is implicated in cancer.”

      Supplementary figures:

      The authors have brought together a substantial amount of rigorous work for this manuscript.

      10) The reference section needs editing before publication. Maybe the arrangement was disturbed during compiling.

      Thank you for your valuable comment. Based on the reviewer’s suggestion, we have rearranged the reference section to enhance its clarity. Below are excerpts from the updated reference section in the manuscript.

      “Adenuga, D., & Rahman, I. (2010). Protein kinase CK2-mediated phosphorylation of HDAC2 regulates co-repressor formation, deacetylase activity and acetylation of HDAC2 by cigarette smoke and aldehydes. Arch Biochem Biophys, 498(1), 62-73. doi:10.1016/j.abb.2010.04.002

      Adenuga, D., Yao, H., March, T. H., Seagrave, J., & Rahman, I. (2009). Histone Deacetylase 2 Is Phosphorylated, Ubiquitinated, and Degraded by Cigarette Smoke. American Journal of Respiratory Cell and Molecular Biology, 40(4), 464-473. doi:10.1165/rcmb.2008-0255OC

      Akalin, A., Franke, V., Vlahovicek, K., Mason, C. E., & Schubeler, D. (2015). Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics, 31(7), 1127-1129. doi:10.1093/bioinformatics/btu775

      Alvarez, C. E. (2008). On the origins of arrestin and rhodopsin. BMC Evol Biol, 8, 222. doi:10.1186/1471-2148-8-222”

      11) Many important references were missing.

      We appreciate and agree with the reviewer’s comment. In response to the reviewer’s recommendation, we’ve thoroughly reviewed the manuscript and below are sections of the manuscript where around 20 new references have been added.

      On page 8 lines 12-14:

      “Utilizing the known affinities between short linear motifs in α-arrestins and protein domains in interactomes (El-Gebali et al., 2019; UniProt Consortium, 2018)”

      On page 8 lines 19-22:

      “One of the most well-known short-linear motifs in α-arrestin is PPxY, which is reported to bind with high affinity to the WW domain found in various proteins, including ubiquitin ligases (Ingham, Gish, & Pawson, 2004; Macias et al., 1996; Sudol, Chen, Bougeret, Einbond, & Bork, 1995)”

      On page 9 lines 3-6:

      “Next, we conducted enrichment analyses of Pfam protein domains (El-Gebali et al., 2019; Huang da, Sherman, & Lempicki, 2009b) among interactome of each α-arrestin to investigate known and novel protein domains commonly or specifically associated (Figure S3A; Table S5).”

      On page 9 lines 7-10:

      “HECT and C2 domains are well known to be embedded in the E3 ubiquitin ligases such as NEDD4, HECW2, and ITCH along with WW domains (Ingham et al., 2004; Melino et al., 2008; Rotin & Kumar, 2009; Scheffner, Nuber, & Huibregtse, 1995; Weber, Polo, & Maspero, 2019)”

      On page 10 lines 12-16:

      “In fact, the known binding partners, NEDD4, WWP2, WWP1, and ITCH in human and CG42797, Su(dx), Nedd4, Yki, Smurf, and HERC2 in Drosophila, that were detected in our data are related to ubiquitin ligases and protein degradation (C. Chen & Matesic, 2007; Ingham et al., 2004; Y. Kwon et al., 2013; Marin, 2010; Melino et al., 2008; Rotin & Kumar, 2009) (Figure 1E; Figure S2F).”

      On page 13 lines 20-21:

      “Given that α-arrestins are widely conserved in metazoans (Alvarez, 2008; DeWire, Ahn, Lefkowitz, & Shenoy, 2007),”

      On page 14 lines 12-17:

      “The most prominent functional modules shared across both species were the ubiquitin-dependent proteolysis, endosomal trafficking, and small GTPase binding modules, which are in agreement with the well-described functions of α-arrestins in membrane receptor degradation through ubiquitination and vesicle trafficking (Dores et al., 2015; S. O. Han et al., 2013; Y. Kwon et al., 2013; Nabhan et al., 2012; Puca & Brou, 2014; Puca et al., 2013; Shea et al., 2012; Xiao et al., 2018; Zbieralski & Wawrzycka, 2022) (Figure 3).”  

      Reviewer #2

      In this manuscript, the authors present a novel interactome focused on human and fly alpha-arrestin family proteins and demonstrate its application in understanding the functions of these proteins. Initially, the authors employed AP/MS analysis, a popular method for mapping protein-protein interactions (PPIs) by isolating protein complexes. Through rigorous statistical and manual quality control procedures, they established two robust interactomes, consisting of 6 baits and 307 prey proteins for humans, and 12 baits and 467 prey proteins for flies. To gain insights into the gene function, the authors investigated the interactors of alpha-arrestin proteins through various functional analyses, such as gene set enrichment. Furthermore, by comparing the interactors between humans and flies, the authors described both conserved and species-specific functions of the alpha-arrestin proteins. To validate their findings, the authors performed several experimental validations for TXNIP and ARRDC5 using ATAC-seq, siRNA knockdown, and tissue staining assays. The experimental results strongly support the predicted functions of the alpha-arrestin proteins and underscore their importance.

      I would like to suggest the following analyses to further enhance the study:

      1) It would be valuable if the authors could present a side-by-side comparison of the interactomes of alpha-arrestin proteins, both before and after this study. This visual summary network would demonstrate the extent to which this work expanded the existing interactome, emphasizing the overall contribution of this study to the investigation of the alpha-arrestin protein family.

      We greatly appreciate your insightful feedback. In response to the reviewer’s suggestion, we’ve depicted a network of known PPIs associated with α-arrestins (Figure S2C and D). Furthermore, by comparing our high-confidence PPIs to these known sets, we found that the overlaps are statistically significant and the high-confidence PPIs of α-arrestins broaden the existing interactome (Figure S2E).

      From page 7 line 26 to page 8 line 8, we’ve detailed this side-by-side comparison of the existing interactome and the newly discovered high-confidence PPIs of α-arrestins, as outlined below.

      “As a result, we successfully identified many known interaction partners of α-arrestins such as NEDD4, WWP2, WWP1, ITCH and TSG101, previously documented in both literatures and PPI databases (Figure S2C-F) (Colland et al., 2004; Dotimas et al., 2016; Draheim et al., 2010; Mellacheruvu et al., 2013; Nabhan et al., 2012; Nishinaka et al., 2004; Puca & Brou, 2014; Szklarczyk et al., 2015; Warde-Farley et al., 2010; Wu et al., 2013). Additionally, we greatly expanded repertoire of PPIs associated with α-arrestins in human and Drosophila, resulting in 390 PPIs between six α-arrestins and 307 prey proteins in human, and 740 PPIs between twelve α-arrestins and 467 prey proteins in Drosophila (Figure S2E). These are subsequently referred to as ‘high-confidence PPIs’ (Table S3).”
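
      The response does not say which test was used to establish that the overlap with known PPIs is statistically significant. One common choice for this kind of comparison is a one-sided hypergeometric test, sketched below under that assumption. All counts in the example are hypothetical and are not taken from the manuscript.

```python
from math import comb


def overlap_p_value(population, known, detected, overlap):
    """P(X >= overlap) when `detected` items are drawn without replacement
    from `population` items, of which `known` are previously reported."""
    upper = min(known, detected)
    total = comb(population, detected)
    tail = sum(
        comb(known, k) * comb(population - known, detected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 candidate interactions, 40 already known,
# 100 detected in a new screen, 12 of which are among the known ones.
p = overlap_p_value(population=1000, known=40, detected=100, overlap=12)
print(f"p = {p:.3g}")
```

      The result depends strongly on what is counted as the background population of possible interactions, so that choice should be stated alongside any p-value.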

      2) While the authors conducted several analyses exploring protein function, there is a need to further explore the implications of the interactome in human diseases. For instance, it would be beneficial to investigate the association of the newly identified interactome members with specific human diseases. Including such investigations would strengthen the link between the interactome and human disease contexts.

      Thank you for your valuable comment. As suggested by the reviewer, we examined the association between α-arrestins’ interactomes and human diseases, incorporating our findings into the discussion. The newly introduced figure based on the result is Figure S10.

      On page 24 lines 10-14, we’ve added discussion on Figure S10 as follows.

      “We further explored association between α-arrestins’ interactomes and disease pathways (Figure S10). Notably, the interactomes of α-arrestins in human showed clear links to specific diseases. For instance, ARRDC5 is closely associated with disease resulting from viral infection and cardiovascular conditions. ARRDC2, ARRDC4, and TXNIP share common association with certain neurodegenerative diseases, while ARRDC1 is implicated in cancer.”

      Reviewer #3:

      Lee, Kyungtae and colleagues have discovered and mapped out alpha-arrestin interactomes in both human and Drosophila through affinity purification/mass spectrometry and the SAINTexpress method. They found high-confidence interactomes, consisting of 390 protein-protein interactions (PPIs) between six human alpha-arrestins and 307 prey proteins, as well as 740 PPIs between twelve Drosophila alpha-arrestins and 467 prey proteins. To define and characterize these identified alpha-arrestin interactomes, the team employed a variety of widely recognized bioinformatics tools. These included protein domain enrichment analysis, PANTHER for protein class enrichment, DAVID for subcellular localization analysis, COMPLEAT for the identification of functional complexes, and DIOPT to identify evolutionarily conserved interactomes. Through these analyses, they confirmed the roles and associated functions of known alpha-arrestin interactors, such as ubiquitin ligase and protease. Furthermore, they found unexpected biological functions in the newly discovered interactomes, including RNA splicing and helicase, GTPase-activating proteins, and ATP synthase. The authors carried out further study into the role of human TXNIP in transcription and epigenetic regulation, as well as the role of ARRDC5 in osteoclast differentiation. This study holds important value as the newly identified alpha-arrestin interactomes are likely to aid functional studies of this group of proteins. Despite the overall support from data for the paper's conclusions, certain elements related to data quantification, interpretation, and presentation demand more detailed explanation and clarification.

      1) In Figure 1B, it is shown that human alpha-arrestins were N-GFP tagged (N-terminal) and Drosophila alpha-arrestins were C-GFP (C-terminal). However, the rationale of why the authors used different tags for human and fly proteins was not explained in the main text and methods.

      We appreciate your valuable comment. Both N- and C-terminally tagged α-arrestins have been used previously. Given that our study aims to increase the repertoire of α-arrestin interacting proteins, where GFP is added might not be a concern. We note that GFP is a relatively bulky tag, and tagging a protein with GFP can potentially abolish the interaction with some of the binding proteins. Follow-up studies utilizing different approaches for detecting protein-protein interactions, such as BioID and yeast two-hybrid, will allow us to build more comprehensive α-arrestin interactomes.

      2) In Figure 2A, there seems to be an error for labeling the GAL4p/GAL80p complex that includes NOTCH2, NOTCH1 and TSC2.

      Thank you for your comment. We double-checked the COMPLEAT (protein COMPLex Enrichment Analysis Tool) database for the name of the protein complex consisting of NOTCH1, NOTCH2, and TSC2. The database indeed labeled this complex as the “GAL4p/GAL80p complex”. However, given the potential for mis-annotation (since we could not ascertain the relevance of these proteins to the “GAL4p/GAL80p complex”), we chose to exclude this protein complex from the network. The updated protein complex network is illustrated in the revised Figure 2A.

      3) In Figure 5, given that knockdown of TXNIP did not affect the levels and nuclear localization of HDAC2, the authors suggest that TXNIP might modulate HDAC2 activity. However, the ChIP assay suggests a different model - TXNIP-HDAC2 interaction might inhibit the chromatin occupancy of HDAC2, reducing histone deacetylation and increasing global chromatin accessibility. The authors need to propose a model consistent with all of these data.

      We greatly appreciate your detailed feedback. Our data indicate a global decrease in chromatin accessibility (Figure 4C-G) and a diminished interaction between TXNIP and HDAC2 under depletion of TXNIP (Figure 5A). Additionally, we observed an increased occupancy of HDAC2 and subsequent histone deacetylation at TXNIP-target promoter regions (Figure 5C) without any changes in the HDAC2 expression level (Figure 5A) in TXNIP-knockdown cells. From these observations, we infer that the interaction between TXNIP and HDAC2 might suppress the function of HDAC2, a major gene silencer that affects the formation of condensed or accessible chromatin through its deacetylase activity. Although we checked whether TXNIP could induce cytosolic retention of HDAC2 to inhibit the nuclear function of HDAC2, TXNIP knockdown did not alter its subcellular localization (Figure 5B).

      To elucidate the mechanism by which TXNIP inhibits the function of HDAC2, we further investigated the effect of TXNIP on the levels of HDAC2 phosphorylation, which is known to be crucial for its deacetylase activity and the formation of transcriptional repressive complex. However, as shown in the Figure S8C and D, the knockdown of TXNIP did not affect the HDAC2 phosphorylation status, as well as the interaction between HDAC2 and other components in NuRD complex in the immunoblotting and co-IP assays, respectively. The results suggest that TXNIP may inhibit the function of HDAC2 independently of these factors.

      Following the reviewer’s suggestion, we carefully provided a proposed model describing the possible role of TXNIP in transcriptional regulation through interaction with HDAC2 and co-repressor complex in Figure S8E.

      Description of these newly added figures can be found in the revised manuscript from page 18 line 7 to 27, as outlined below.

      “HDAC2 typically operates within the mammalian nucleus as part of co-repressor complexes as it lacks ability to bind to DNA directly (Hassig, Fleischer, Billin, Schreiber, & Ayer, 1997). The nucleosome remodeling and deacetylation (NuRD) complex is one of the well-recognized co-repressor complexes that contains HDAC2 (Kelly & Cowley, 2013; Seto & Yoshida, 2014) and we sought to determine if depletion of TXNIP affects interaction between HDAC2 and other components in this NuRD complex. While HDAC2 interacted with MBD3 and MTA1 under normal condition, the interaction between HDAC2 and MBD3 or MTA1 was not affected upon TXNIP depletion (Figure S8C). Next, given that HDAC2 phosphorylation is known to influence its enzymatic activity and stability (Adenuga & Rahman, 2010; Adenuga, Yao, March, Seagrave, & Rahman, 2009; Bahl & Seto, 2021; Tsai & Seto, 2002), we tested if TXNIP depletion alters phosphorylation status of HDAC2. The result indicated, however, that phosphorylation status of HDAC2 does not change upon TXNIP depletion (Figure S8D). In summary, our findings suggest a model where TXNIP plays a role in transcriptional regulation independent of these factors (Figure S8E). When TXNIP is present, it directly interacts with HDAC2, a key component of transcriptional co-repressor complex. This interaction suppresses HDAC2’s recruitment to target genomic regions, leading to the histone acetylation of target loci possibly through active complex including histone acetyltransferase (HAT). As a result, transcriptional activation of target gene occurs. In contrast, when TXNIP expression is diminished, the interaction between TXNIP and HDAC2 weakens. This restores histone deacetylating activity of HDAC2 in the co-repressor complex, leading to subsequent repression of target gene transcription.”

      4) The authors showed that ectopic expression of ARRDC5 increased osteoclast differentiation and function. Does loss of ARRDC5 lead to defects in osteoclast function and fate determination?

      We appreciate your valuable comment. We have confirmed the endogenous expression of ARRDC5 in osteoclasts and conducted a loss-of-function study using shARRDC5. As determined by qPCR, endogenous ARRDC5 expression was very low in osteoclasts. Even during RANKL-induced osteoclast differentiation, the CT value for ARRDC5 (29-31) was high compared with the CT values (17-24) for the marker genes Cathepsin K, TRAP, and NFATc1. Despite this low endogenous expression, we generated ARRDC5 knockdown cells by infecting BMMs with lentivirus expressing shRNA against ARRDC5 and subsequently differentiated the cells into mature osteoclasts. After five days of differentiation, we observed a significant decrease in the total number of TRAP-positive multinucleated cells (No. of TRAP+ MNCs) in shARRDC5 cells compared to that in the control cells. This result indicates that the loss of ARRDC5 leads to defects in osteoclast differentiation. The results of this loss-of-function study using shARRDC5 are depicted in Figure S9A and B.
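
      To give a sense of scale for the CT values quoted above, the following sketch converts a CT difference into an approximate fold difference in transcript abundance. It assumes ideal amplification (the template doubles every cycle) and comparable primer efficiencies across genes, neither of which is stated in the response; the function name and the choice of range midpoints are illustrative only.

```python
def fold_difference(ct_low_expr: float, ct_high_expr: float,
                    efficiency: float = 2.0) -> float:
    """Approximate fold difference in abundance implied by two CT values.

    Assumes each PCR cycle multiplies the template by `efficiency`
    (2.0 = perfect doubling); this idealisation is an assumption,
    not something reported in the author response.
    """
    return efficiency ** (ct_low_expr - ct_high_expr)


# Midpoints of the CT ranges quoted in the response (illustrative choice).
arrdc5_ct = 30.0   # midpoint of 29-31
marker_ct = 20.5   # midpoint of 17-24

# A gap of roughly ten cycles corresponds to a difference of several hundred-fold.
print(fold_difference(arrdc5_ct, marker_ct))
```

      Under these assumptions, a CT gap of about ten cycles implies that ARRDC5 transcripts are several hundred-fold less abundant than the marker transcripts, which is consistent with the authors' description of very low endogenous expression.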

      In the revised manuscript, the following sentence explaining Figure S9A and B was added on page 19, lines 15-17.

      “Depletion of ARRDC5 using short hairpin RNA (shRNA) impaired osteoclast differentiation, further affirming its crucial role in this differentiation process (Figure S9A and B).”

      5) From Figure 6D, the authors argued that ARRDC5 overexpression resulted in more V-ATPase signals: however, there is no quantification. Quantification of the confocal images will foster the conclusion. Also, western blots for V-ATPase proteins will provide an alternative way to determine the effects of ARRDC5.

      We appreciate your insightful feedback. As suggested by the reviewer, we quantified V-type ATPase signals using confocal images, which were shown in Figure 6D. The ImageJ program was employed for integrated density measurements, and the integrated density of GFP-GFP overexpressing osteoclasts was set to 1 for relative comparison. The result in the revised Figure 6D revealed a significant increase in V-type ATPase signals in GFP-ARRDC5 overexpressing osteoclasts compared to that in GFP-GFP overexpressing osteoclasts, as outlined below.
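
      The normalization described above (setting the GFP-GFP integrated density to 1) can be sketched as follows. This is a minimal illustration, assuming that every measurement is divided by the mean of the control group; the response does not say whether scaling was done per cell or per group, and the readings below are hypothetical values, not data from the study.

```python
from statistics import mean


def relative_density(control: list[float],
                     treated: list[float]) -> tuple[list[float], list[float]]:
    """Scale both groups so that the control group mean equals 1."""
    ref = mean(control)
    if ref == 0:
        raise ValueError("control mean must be non-zero")
    return [v / ref for v in control], [v / ref for v in treated]


# Hypothetical integrated-density readings (arbitrary units).
gfp_gfp = [100.0, 120.0, 80.0]
gfp_arrdc5 = [210.0, 190.0, 200.0]

ctrl_rel, arrdc5_rel = relative_density(gfp_gfp, gfp_arrdc5)
print(mean(ctrl_rel), mean(arrdc5_rel))
```

      Dividing by the control mean keeps the spread of the control group visible, so a statistical comparison between groups remains possible after scaling.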

      We also agree with the reviewer’s comment that Western blot for V-ATPase proteins is an alternative way to determine the effects of ARRDC5 in osteoclast differentiation. Using qPCR and western blot analysis, we confirmed that V-type ATPase expression did not differ between GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts. The corresponding western blot result is shown in the revised Figure S9C.

      In addition, the corresponding qPCR that measures the expression level of V-type ATPase between GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts is shown in Author response image 3.

      Author response image 3.

      Moreover, based on the references, the V-type ATPase is localized at the plasma membrane during osteoclast differentiation (Toyomura et al., 2003). Although mRNA and protein expression levels were similar in both cells, localization of V-ATPase in plasma membrane was significantly increased in GFP-ARRDC5 overexpressing osteoclasts compared to that in GFP-GFP osteoclasts, as shown in the revised Figure 6D above.

      6) The results from Figure 6D did not support the authors' argument that ARRDC5 might control the membrane localization of the V-ATPase, as bafilomycin is the V-ATPase inhibitor. ARRDC5 knockdown experiments will help to determine whether ARRDC5 can control the membrane localization of the V-ATPase in osteoclast.

      Thank you for your insightful comment. V-type ATPase has been reported to play an important role in the differentiation and function of osteoclasts (Feng et al., 2009; Qin et al., 2012). Given that various subunits of the V-type ATPase interact with ARRDC5 (Figure 6A), we speculated that ARRDC5 might be involved in the function of this complex and play a role in osteoclast differentiation and function. As answered above, GFP-ARRDC5 overexpressing osteoclasts showed a similar expression level of V-type ATPase to GFP-GFP cells but exhibited increased V-type ATPase signals at the cell membrane compared to those in GFP-GFP cells (Figure 6D). Additionally, co-localization of ARRDC5 and V-type ATPase was observed in the osteoclast membrane (Figure 6D), as predicted by the human ARRDC5-centric PPI network. On the other hand, bafilomycin A1, a V-type ATPase inhibitor, not only blocked localization of V-type ATPase to the plasma membrane in GFP-ARRDC5 overexpressing osteoclasts, but also reduced ARRDC5 signals (Figure 6D). These results indicate that ARRDC5 plays a role in osteoclast differentiation and function by interacting with V-type ATPase and promoting the localization of V-type ATPase to the plasma membrane in osteoclasts.

      V-type ATPase present in the osteoclast membrane is important for cell fusion, maturation, and function during osteoclast differentiation (Feng et al., 2009; Qin et al., 2012). GFP-ARRDC5 overexpressing osteoclasts showed a significant increase of V-type ATPase signals in the cell membrane compared to GFP-GFP cells (Figure 6D), and also significantly increased cell fusion (No. of TRAP+ MNCs in Figure 6B) and resorption activity (resorption pit formation in Figure 6C). Conversely, ARRDC5 knockdown in osteoclasts (shARRDC5 cells) showed a significant decrease in No. of TRAP+ MNCs compared to that in the control cells, indicating that the loss of ARRDC5 leads to defects in cell fusion during osteoclast differentiation (Figure S9A and B). As described above, the endogenous expression of ARRDC5 was very low in osteoclasts and may be restricted to a specific time point during differentiation. Therefore, to better understand the interaction of ARRDC5 with V-type ATPase in osteoclasts, ARRDC5 overexpression is more suitable than its knockdown.

      Part of the manuscript on page 19 line 21 to page 20 line 6 was edited to support our statement, as outlined below.

      “The V-type ATPase is localized at the osteoclast plasma membrane (Toyomura et al., 2003) and its localization is important for cell fusion, maturation, and function during osteoclast differentiation (Feng et al., 2009; Qin et al., 2012). Furthermore, its localization is disrupted by bafilomycin A1, which is shown to attenuate the transport of the V-type ATPase to the membrane (Matsumoto & Nakanishi-Matsui, 2019). We analyzed changes in the expression level and localization of V-type ATPase, especially V-type ATPase V1 domain subunit (ATP6V1), in GFP-GFP and GFP-ARRDC5 overexpressing osteoclasts. The level of V-type ATPase expression did not change in osteoclasts regardless of ARRDC5 expression levels (Figure S9C). GFP signals were detected at the cell membrane when GFP-ARRDC5 was overexpressed, indicating that ARRDC5 might also localize to the osteoclast plasma membrane (Figure 6D; Figure S9D). In addition, we detected more V-type ATPase signals at the cell membrane in the GFP-ARRDC5 overexpressing osteoclasts, and ARRDC5 and V-type ATPase were co-localized at the osteoclast membrane (Figure 6D; Figure S9D).”

      7) The tables (excel files) do not have proper names for each table S numbers. Please correct the name of excel files for readers.

      We appreciate your valuable comments. In response to the reviewer’s suggestion, we’ve renamed the Excel files with more appropriate titles for easier readability. The renamed tables (Excel files) are listed below.

      Table S1. List of α-arrestins from human and Drosophila
      Table S2. Evaluation sets of α-arrestins PPIs
      Table S3. Summary tables of SAINTexpress results
      Table S4. Protein domains and short linear motifs in the α-arrestin interactomes
      Table S5. Enriched Pfam domains in the α-arrestin interactomes
      Table S6. Subcellular localizations of α-arrestin interactomes
      Table S7. Summary of protein complexes and cellular components associated with α-arrestin
      Table S8. Orthologous relationship of α-arrestin interactomes between human and Drosophila
      Table S9. Summary of ATAC- and RNA-seq read counts before and after processing
      Table S10. Differential accessibility of ACRs and gene expression
      Table S11. Summary of ATAC-seq peaks located in promoters and gene expression level
      Table S12. List of primer sequences used in this study

      8) http://big.hanyang.ac.kr/alphaArrestin_Fly link does not work. Please fix the link.

      We appreciate your comment. In response to the reviewer’s comment, we have made comprehensive α-arrestin interactome maps on our new website (big.hanyang.ac.kr/alphaArrestin_PPIN) and confirmed that users can be re-directed to networks housed in NDEx.

      Author response image 4.

      Screen shot of the first page of the newly developed website.

      Website address: big.hanyang.ac.kr/alphaArrestin_PPIN

      Author response image 5.

      Screen shot of the gene-gene network involving α-arrestin in human.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work from Cui, Pan, Fan, et al explores memory impairment in chronic pain mouse models, a topic of great interest in the neurobiology field. In particular, the work starts from a very interesting observation, that WT mice can be divided into susceptible and unsusceptible to memory impairment upon modelling chronic pain with CCI. This observation represents the basis of the work where the authors identify the sphingosine receptor S1PR1 as down-regulated in the dentate gyrus of susceptible animals and demonstrate through an elegant range of experiments involving AAV-mediated knockdown or overexpression of S1PR1 that this receptor is involved in the memory impairment observed with chronic pain. Importantly for translational purposes, they also show that activation of S1PR1 through a pharmacological paradigm is able to rescue the memory impairment phenotype.

      The authors also link reduced dendritic branching and a reduced number of mature excitatory synapses in the DG to the memory phenotype.

      They then proceed to explore possible mechanisms downstream of S1PR1 that could explain this reduction in dendritic spines. They identify integrin α2 as an interactor of S1PR1 and show a reduction in several proteins involved in actin dynamics, which are crucial for dendritic spine formation and plasticity.

      They thus hypothesize that the interaction between S1PR1 and Integrin α2 is fundamental for the activation of Rac1 and Cdc42 and consequently for the polymerisation of actin; a reduction in this pathway upon chronic pain would thus lead to impaired actin polymerisation, synapse formation, and thus impaired memory.

      The work is of great interest and the experiments are of very good quality with results of great importance. I have however some concerns. The main concern I have relates to the last part of the work, namely Figures 8 and 9, which I feel are not at the same level as the results presented in the previous 7 Figures, which are instead outstanding.

      In particular:

      - In Figure 8, given the reduction in all the proteins tested, the authors need to check some additional proteins as controls. One good candidate could be RhoA, considering the authors say it is activated by S1PR2 and not by S1PR1;

      Thanks for your suggestion. We tested the expression level of RhoA in mice 7 days and 21 days post CCI as negative controls (Supplemental Figure 9).

      - In addition to the previous point, could the authors also show that the number of neurons is not grossly different between susceptible and unsusceptible mice? This could be done by simply staining for NeuN or performing a western blot for a neuronal-specific protein (e.g. Map2 or beta3-tubulin);

      As suggested, we performed immunofluorescence using NeuN antibody to detect the number of neurons in susceptible and unsusceptible mice. The number is not significantly different between the two populations (Supplementary Figure 7).

      - In Figure 8, the authors should also evaluate the levels of activated RAC1 and activated Cdc42, which are much more important than just basal levels of the proteins to infer an effect on actin dynamics. This is possible through kits that use specific adaptors to pulldown GTP-Rac1 and GTP-Cdc42;

      Thanks for your constructive suggestion. An elevated level and hyperactivation of Rac1 protein are both associated with actin dynamics and dendritic development [1]. We agree that showing the levels of activated RAC1 would be a better way to infer its effect on actin dynamics. In Figure 8, however, the purpose of the experiment is to show that the levels of actin organization-related proteins change with the expression level of S1PR1, leading to the conclusion that actin organization was disrupted; it is not to specifically emphasize that S1PR1 activates these proteins. We apologize for the confusion, but we think the current data are sufficient to support the conclusion.

      Thanks again for your advice. Your understanding is greatly appreciated.

      - In Figure 9C, the experiment is performed in an immortalised cell line. I feel this needs to be performed at least in primary hippocampal neurons;

      Thanks for your suggestion. As suggested, we performed the experiment in primary hippocampal neurons. Knockdown of S1pr1 in primary hippocampal neurons induced reduction in the number of branches and filamentous actin. Please refer to the updated Figure 9C.

      - In Figure 9D, the authors use a Yeast two-hybrid system to demonstrate the interaction between S1PR1 and Integrin α2. However, as the yeast two-hybrid system is based on the proximity of the GAL4 activating domain and the GAL4 binding domain, which are used to activate the transcription of reporter genes, the system is not often used when probing the interaction between transmembrane proteins. Could the authors use other transmembrane proteins as negative controls?;

      Thanks for your question. We apologize for the unclear description in the methods. The traditional yeast two-hybrid system can only detect protein interactions that occur in the nucleus and cannot detect interactions between membrane proteins. Here, we utilized the split-ubiquitin membrane-based yeast two-hybrid system. Briefly, in this system, ubiquitin, a protein composed of 76 amino acid residues that can mediate the ubiquitin-dependent degradation of target proteins by proteasomes, is split into two domains, namely Cub at the C-terminus and NubG at the N-terminus, which are fused to and expressed with the bait protein “Bait” and the prey protein “Prey”, respectively. Cub is also fused to a transcription factor. If the Bait and Prey proteins bind, Cub and NubG are brought together and a complete ubiquitin is formed, which would be recognized by the proteasome; the fused transcription factor would then be cut off and enter the cell nucleus to activate the expression of the reporter gene. We then determine whether the Bait and Prey proteins interact with each other through the growth of the yeast.

      Thanks again for pointing this out. We reworded the method in M&M (Line 678-696).

      - In Figure 9E, the immunoblot is very unconvincing. The bands in the inputs are very weak for both ITGA2 and S1PR1, the authors do not show the enrichment of S1PR1 upon its immunoprecipitation and the band for ITGA2 in the IP fraction has a weird appearance. Were these experiments performed on DG lysates only? If so, I suggest the authors repeat the experiment using the whole brain (or at least the whole hippocampus) so as to have more starting material. Alternatively, if this doesn't work, or in addition, they could also perform the immunoprecipitation in heterologous cells overexpressing the two proteins;

      Thanks for the question and suggestion. We used lysates from both dentate gyri of a single mouse as the starting material. We updated the result, which now shows clearer bands (Figure 9E).

      - About the point above, even if the results were convincing, the authors can't say that they demonstrate an interaction in vivo. In co-IP experiments, the interaction is much more likely to occur in the lysate during the incubation period rather than being conserved from the in vivo state. These co-IPs demonstrate the ability of proteins to interact, not necessarily that they do it in vivo. If the authors wanted to demonstrate this, they could perform a Proximity ligation assay in primary hippocampal neurons, using antibodies against S1PR1 and ITGA2.

      Thanks for your concern. Co-immunoprecipitation (Co-IP) is the gold standard to identify protein-protein interactions [2], and it is one of the most efficient techniques to study these protein-protein interactions in vivo [3]. We repeated the experiment and followed the experimental procedure exactly to avoid artifactual protein interactions caused by over-incubation. Over-incubation, particularly at room temperature, may result in non-specific binding and therefore high background, so we performed Co-IPs at 4°C to preserve protein interactions. We agree that the proximity ligation assay is better suited for studies of endogenously expressed proteins in primary cells [4]. Since we optimized the experimental procedure to avoid non-specific binding and, in particular, the Co-IP used proteins from DG lysates, which could validate the specificity of the protein interaction in native tissue, we prefer to keep the Co-IP result in Figure 9E.

      Thanks again for your suggestion. We appreciate your understanding on this matter.

      - In Figure 9H, could the authors increase the N to see if shItga2 causes further KD in the CCI?

      As suggested, we repeated the experiment and increased the N to 6. As shown in the following picture, shItga2 did not cause further KD in the CCI.

      Author response image 1.

      - To conclusively demonstrate that S1PR1 and ITGA2 participate in the same pathway, they could show that knocking down the two proteins at the same time does not have additive effects on behavioral tests compared to the knockdown of each one of them in isolation.

      Thanks for your suggestion. As suggested, we knocked down the two proteins at the same time and did not observe additive effects on behavioral tests compared to the knockdown of each one of them in isolation. Please refer to Figure 9L-O.

      Other major concerns:

      - Supplementary Figure 5: the image showing colocalisation between S1PR1 and CamKII is not very convincing. Is the S1PR1 antibody validated on Knockout or knockdown in immunostaining?;

      S1PR1 is a membrane receptor and the S1P1 antibody (PA1-1040, Invitrogen) shows membranous staining with diffuse dot-like signals (Please refer to the image “A” provided by ThermoFisher Scientific). Here, we utilized the antibody to detect the expression of S1PR1 in DG granule cells. We can see the diffuse dot-like signals aggregated in each single granule cell. CaMKII shows intense staining around the border of the granule cell soma (Image “B”) [5]. According to the images shown in Supplementary Figure 5B, we concluded that S1PR1 is expressed in CaMKII+ cells.

      Besides, as suggested, we validated the S1PR1 antibody on knockdown in immunostaining (Image “C” and “D”). The expression of S1PR1 is significantly decreased compared with the control.

      Author response image 2.

      - It would be interesting to check S1PR2 levels as a control in CCI-chronic animals;

      As suggested, we quantified the S1PR2 levels in Sham and CCI animals, and there is no significant difference between groups (Supplementary Figure 9).

      - Figure 1: I am a bit concerned about the Ns in these experiments. In the chronic pain experiments, the N for Sham is around 8 whereas it is around 20 for CCI animals. Although I understand higher numbers are necessary to see the susceptible and unsusceptible populations, I feel that the same number of Sham animals should then be used;

      Thanks for your concern. In the preliminary experiment, we noticed that the ratio of susceptible and unsusceptible populations is around 1:1. After the behavioral tests, we need to further take samples to investigate molecular and cellular changes of each group. Thus, we set sham around 8 and CCI around 20 to ensure that after characterization into susceptible and unsusceptible groups, each group has relatively equal numbers for further investigations.

      - Figures 1E and 1G have much higher Ns than the other panels. Why is that? If they have performed this high number of animals why not show them in all panels?;

      Thanks for your concern. For Figure 1B, C, D and F, we showed the data for each batch of experiments, while for Figure 1E and 1G, we used data collected from all batches. By showing data from a single batch, we wanted to demonstrate that the ratio of susceptible to unsusceptible mice is relatively stable within a batch and does not rely only on a large pooled sample size.

      - In the experiments where viral injection is performed, the authors should show a zoomed-out image of the brain to show the precision of the injection and how spread the expression of the different viruses was;

      As suggested, we showed the zoomed-out image in Supplementary Figure 6. The viruses are mainly expressed in the hippocampal DG.

      - The authors should check if there is brain inflammation in CCI chronic animals. This would be interesting to explain if this could be the trigger for the effects seen in neurons. In particular, the authors should check astrocytes and microglia. This is of interest also because the pathways altered in Figure 8A are related to viral infection.

      - If the previous point shows increased brain inflammation, it would be interesting for the authors to check whether a prolonged anti-inflammatory treatment in CCI animals administered before the insurgence of memory impairment could stop it from happening;

      - In addition, the authors should speculate on what could be the signal that can induce these molecular changes starting from the site of injury;

      - Also, as the animals are all WT, the authors should speculate on what could render some animals prone to have memory impairments and others resistant.

      Thanks for the above four suggestions. We have observed inflammation, including T cell infiltration and microglia activation, in the hippocampal DG in CCI chronic animals, and we have also used an S1PR1 modulator, which has an anti-lymphocyte-mediated inflammatory effect, to prevent the onset of memory impairment. We also examined the alteration in the numbers of peripheral T-lymphocyte subsets and the serum levels of cytokines. Furthermore, we found a neuron-microglia dialogue in the DG which may promote resilience to memory impairment in CCI animals. Since these are unpublished results, we apologize that we cannot give more detailed information to the public at the current stage. We will publish these data as soon as possible. Thanks for your understanding.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioural tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers segregated chronic pain mice into memory impairment-susceptible and -unsusceptible subpopulations. They discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits.

      Knockdown of S1PR1 in the DG induced a susceptible phenotype, while overexpression or pharmacological activation of S1PR1 promoted resistance to memory impairment and restored normal synaptic structure. The study identifies actin cytoskeleton-related pathways, including ITGA2 and its downstream Rac1/Cdc42 signaling, as key mediators of S1PR1's effects, offering new insights and potential therapeutic targets for chronic pain-related cognitive dysfunction.

      This manuscript consists of a comprehensive investigation and significant findings. The study provides novel insights into the molecular mechanisms of chronic pain-related memory impairment, highlighting the critical role of S1P/S1PR1 signaling in the hippocampal dentate gyrus. The clear identification of S1P/S1PR1 as a potential therapeutic target offers promising avenues for future research and treatment strategies. The manuscript is well-structured, methodologically sound, and presents valuable contributions to the field.

      Strengths:

      (1) The manuscript is well-structured and written in clear, concise language. The flow of information is logical and easy to follow.

      (2) The segregation of mice into memory impairment-susceptible and -unsusceptible subpopulations is innovative and well-justified. The statistical analyses are robust and appropriate for the data.

      (3) The detailed examination of S1PR1 expression and its impact on synaptic plasticity and actin cytoskeleton reorganization is impressive. The findings are significant and contribute to the understanding of chronic pain-related memory impairment.

      Weaknesses:

      (1) Results: While the results are comprehensive, some sections are data-heavy and could be more reader-friendly with summarized key points before diving into detailed data.

      Thanks for the suggestion. We now open each part/paragraph with a sentence summarising what the following experiments investigate, to make the Results more reader-friendly. These sentences are marked in blue in the main text.

      (2) Discussion: There is a need for a more balanced discussion regarding the limitations of the study. For example, addressing potential biases in the animal model or limitations in the generalizability of the findings to humans would strengthen the discussion. Also, providing specific suggestions for follow-up studies would be beneficial.

      As suggested, we expanded the discussion of the limitations of this study and outlined some directions for future research (Line 481-498).

      (3) Conclusion: The conclusion, while concise, could better highlight the study's broader impact on the field and potential clinical implications.

      Thanks. We reworded the conclusion to better highlight the impacts of this study (Line 501-505).

      Reviewer #3 (Public Review):

      Summary of the Authors' Objectives:

      The authors aimed to delineate the role of S1P/S1PR1 signaling in the dentate gyrus in the context of memory impairment associated with chronic pain. They sought to understand the molecular mechanisms contributing to the variability in memory impairment susceptibility and to identify potential therapeutic targets.

      Major Strengths and Weaknesses of the Study:

      The study is methodologically robust, employing a combination of RNA-seq analysis, viral-mediated gene manipulation, and pharmacological interventions to investigate the S1P/S1PR1 pathway. The use of both knockdown and overexpression approaches to modulate S1PR1 levels provides compelling evidence for its role in memory impairment. The research also benefits from a comprehensive assessment of behavioral changes associated with chronic pain.

      However, the study has some weaknesses. The categorization of mice into 'susceptible' and 'unsusceptible' groups based on memory performance requires further validation. Additionally, the reliance on a single animal model may limit the generalizability of the findings. The study could also benefit from a more detailed exploration of the impact of different types of pain on memory impairment.

      Assessment of the Authors' Achievements:

      The authors successfully identified S1P/S1PR1 signaling as a key factor in chronic pain-related memory impairment and demonstrated its potential as a therapeutic target. The findings are supported by rigorous experimental evidence, including biochemical, histological, and behavioral data. However, the study's impact could be enhanced by further exploration of the molecular pathways downstream of S1PR1 and by assessing the long-term effects of S1PR1 manipulation.

      Impact on the Field and Utility to the Community:

      This study is likely to have a significant impact on pain research by providing a novel perspective on the mechanisms underlying memory impairment in chronic pain conditions. The identification of the S1P/S1PR1 pathway as a potential therapeutic target could guide the development of new treatments.

      Additional Context for Readers:

      The study's approach to categorizing susceptibility to memory impairment could inspire new methods for stratifying patient populations in clinical settings.

      Recommendations:

      (1) A more detailed explanation of the k-means clustering algorithm and its application in categorizing mice should be provided.

      As suggested, we explained the k-means clustering algorithm in detail (Line 697-711).
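
      For readers unfamiliar with the technique, the following is a minimal sketch of k-means clustering with two clusters, not the authors' actual analysis code. The two features (a Y-maze score and a Morris water maze score, both in percent) and all values are hypothetical, chosen only to illustrate how animals could be split into two subpopulations from behavioral scores.

```python
import random


def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical scores per mouse: (Y-maze score %, Morris water maze score %).
scores = [
    (48, 22), (50, 25), (45, 20), (52, 24),  # lower-performing animals
    (68, 42), (70, 45), (72, 48), (66, 40),  # higher-performing animals
]
labels, centroids = kmeans(scores, k=2)

# The cluster whose centroid has the lower Y-maze score is called "susceptible" here.
susceptible_cluster = min(range(2), key=lambda c: centroids[c][0])
groups = ["susceptible" if lab == susceptible_cluster else "unsusceptible" for lab in labels]
print(groups)
```

      In this sketch the cluster labels themselves are arbitrary, so the "susceptible" name is attached afterwards to the cluster with the lower centroid. With features on different scales, standardizing each feature before clustering would usually be advisable.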

      (2) The discussion on the potential influence of different pain types or sensitivities on memory impairment should be expanded.

      Thanks for your suggestion. We discussed this point in the limitations of this study (Line 484-491).

      (3) The protocol for behavioral testing should be clarified and the potential for learning or stress effects should be addressed.

      Thanks for your suggestion. We clarified the order of the battery of behavioral tests in this study (Line 537-542). We started with the least stressful test (Y-maze) and left the most stressful one (Morris water maze) for last [6]. In addition, we conducted behavioral assays to show that a one-day rest is enough to reduce carryover effects from the prior test (Y-maze). We examined stress-related behaviors one day after the Y-maze (23 d post CCI) using the open field test (OFT) and the elevated plus maze (EPM). As shown in Author response image 3, these tests did not indicate that the mice were under stress. Thus, the order in which the tests were performed is appropriate for this study.

      Author response image 3.

      (4) Conduct additional behavioral assays for other molecular targets implicated in the study.

      We agree that the effects of other molecular targets on susceptibility to memory impairment would be interesting to know. Our study was designed to focus specifically on ITGA2 this time and we'd like to keep the focus intact, but we have included your point as a consideration for future study (Lines 496-498). Thank you for the suggestion.

      (5) The effective drug thresholds and potential non-specific effects of pharmacological interventions should be discussed in more detail.

      As suggested, we emphasized this point of drug SEW2871 in Line 242-245.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      - In Figure 6E the lines of the different groups are not visible. Showing the errors as error bars for each point would probably be better;

      We apologize for the mistake of using mean±SD here instead of mean±SEM. After changing to mean±SEM, the lines in Figure 6E, Figure 7E and 7L are much clearer. Because there are numerous points, showing error bars for each point looks cluttered, so we prefer to keep the line style.
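
      As a brief illustration of why the switch changes the appearance of the plots: the SEM equals the SD divided by the square root of the sample size, so the band around each mean becomes narrower. The sketch below uses hypothetical values and is not taken from the authors' data.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return the mean, sample SD (n - 1 denominator) and SEM = SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    return mean, sd, sem


# Hypothetical measurements from n = 9 animals at one time point.
values = [40, 44, 36, 42, 38, 46, 34, 40, 40]
mean, sd, sem = mean_sd_sem(values)
print(f"mean={mean:.2f}  SD={sd:.2f}  SEM={sem:.2f}")  # with n = 9, SEM is one third of SD
```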

      - Do the authors have any speculation on why the % time in the quadrant is not further affected in the KD Itga2 in CCI animals (Figure 9K)?;

      In CCI animals, the level of S1PR1 expression is decreased. ITGA2 may participate in the same pathway as S1PR1. Thus, knocking down ITGA2 in CCI animals does not further affect behavior. This is supported by knocking down the two proteins at the same time: no additive effects on behavioral tests were observed compared with the knockdown of either one alone (Figure 9L-O).

      - In the methods, it's unclear if in the multiple infusion, the animals were anaesthetised or kept awake;

      We have clarified this point in the Methods: mice were deeply anesthetized with 1% pentobarbital sodium (40 mg/kg, i.p.) (Line 649-650).

      - As the DG is quite small, could the authors clarify if, when performing western blots, they used the two DGs from one animal for each sample or if they pulled together the DGs of several animals?;

      We used the two DGs from one animal for each sample. The amount of protein extracted from each sample is enough for 20-30 Western blot assays. We have now added this to the Methods for clarity (Line 612).

      - Is it possible to check the correlation between performance in the YM and MWM with S1PR1 levels?;

      We would also be interested in this point. However, our current data cannot reveal this correlation, because it is difficult to manipulate S1PR1 levels using KD and overexpression viruses.

      - EM images have a poor resolution in the figures, could the authors show higher-resolution images?;

      We have inserted 300 DPI images for high resolution output.

      - In line 268 there is a mention of an "ShLamb1"?

      We apologize for the mistake and it was revised.

      Reviewer #3 (Recommendations For The Authors):

      This study explored the role of S1P/S1PR1 signaling within the dentate gyrus (DG) in chronic pain-related memory impairment using a murine model. The authors identified decreased expression of S1PR1 in the DG of mice susceptible to memory deficits. They demonstrated that S1PR1 knockdown increased susceptibility to memory deficits, whereas its overexpression or pharmacological activation mitigated these effects. Further biochemical and immunofluorescence analyses indicated that disruptions in S1P/S1PR1 signaling were related to disruptions in actin cytoskeleton dynamics, influenced by molecular pathways involving ITGA2, Rac1/Cdc42 signaling, and the Arp2/3 complex. These findings offer intriguing insights and suggest a potential therapeutic target for treating memory impairment in chronic pain.

      Major Concerns:

      The following five major concerns are the same as the five recommendations from Reviewer 3 on Page 9-10. Please refer to the answers above.

      (1) The division of subjects into 'susceptible' and 'unsusceptible' categories requires further clarification regarding the methodologies and rationale employed, particularly concerning the use of the k-means clustering algorithm in data analysis. This explanation will strengthen the scientific grounding of the categorization process.

      (2) The categorization of 'susceptible' and 'unsusceptible' groups might also benefit from a more detailed analysis or discussion concerning the influence of different pain sensitivities or types of pain assessments. Although the study mentions that memory impairment stands independent of pain thresholds, a more nuanced exploration could provide deeper insights.

      (3) The article could benefit from more clarity on the protocol of behavioral testing, especially regarding the potential effects of repeated testing on performance outcomes due to learning or stress.

      (4) While the connection between S1P/S1PR1 signaling and the molecular pathways highlighted (ITGA2, Rac1/Cdc42, Arp2/3) is intriguing, only ITGA2 underwent further behavioral validation in vivo. Conducting additional behavioral assays for one or more of the molecular targets could substantially strengthen these findings.

      (5) Discussions regarding effective drug thresholds and the potential for non-specific effects are essential to fully evaluate the implications of pharmacological interventions utilized in the study.

      Minor Concerns:

      (1) Clarification of evidence of the specific infusion sites in pharmacological experiments would enhance the transparency and replicability of these methods.

      For the infusion of the S1PR1 agonist, a guide cannula (internal diameter 0.34 mm, RWD) was unilaterally implanted into the DG of the hippocampus (-1.3 A/P, -1.95 M/L, and -2.02 D/V), as evidenced by Figure 5B.

      (2) It would be beneficial if the manuscript provided details regarding the efficiency and reach of viral transfection within the neuronal population. This information would help in assessing the impact of genetic manipulations.

      S1PR1 immunostaining showed that the efficiency is quite high and the reach of viral transfection is sufficient.

      Author response image 4.

      (3) The manuscript should make explicit the normalization techniques used in quantitative assessments such as Western blotting, including the housekeeping genes or proteins used for this purpose.

      Here, we used housekeeping protein normalization for normalizing Western blot data. GAPDH was used as the internal control. First, the stained blot is imaged, a rectangle is drawn around the target protein in each lane, and the signal intensity inside the rectangle is measured by using ImageJ. The signal intensity obtained can then be normalized by being divided by the signal intensity of the loading internal control (GAPDH) detected on the same blot. The average of the ratios from the control group is calculated, and all individual ratios are divided by this average to obtain a new set of values, which represent the normalized values (Line 619-625).
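
      The two-step normalization described above can be sketched as follows. This is an illustrative example rather than the authors' script; the band intensities, the group names and the function name `normalize_western` are hypothetical.

```python
def normalize_western(target, loading_control, groups, control_group="control"):
    """Two-step Western blot normalization.

    1. Divide each target-band intensity by the loading-control (e.g. GAPDH)
       intensity measured in the same lane.
    2. Divide every ratio by the mean ratio of the control group, so that the
       control group averages 1.
    """
    ratios = [t / lc for t, lc in zip(target, loading_control)]
    control_ratios = [r for r, g in zip(ratios, groups) if g == control_group]
    control_mean = sum(control_ratios) / len(control_ratios)
    return [r / control_mean for r in ratios]


# Hypothetical ImageJ signal intensities, one value per lane.
target = [900.0, 1100.0, 1000.0, 450.0, 600.0, 500.0]      # protein of interest
gapdh = [2000.0, 2200.0, 1900.0, 1800.0, 2100.0, 2000.0]   # loading control, same blot
groups = ["control"] * 3 + ["treated"] * 3

normalized = normalize_western(target, gapdh, groups)
for group, value in zip(groups, normalized):
    print(f"{group}: {value:.3f}")
```

      After this procedure the control lanes average exactly 1, so values in the other groups read directly as fold change relative to control.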

      (4) Details on whether the control groups in behavioral assessments were subjected to handling and experimental conditions comparable to those of the chronic pain groups, barring nerve injury, are crucial for maintaining the integrity of the comparative analysis.

      We agree that the control group and the experimental group should be identical in all respects except for one difference: nerve injury. We have added this point to the Methods (Line 520-522).

      Minor Recommendations:

      The following four minor recommendations are the same as the four minor concerns from Reviewer 3 on Page 12-13. Please refer to the answers above.

      (1) Clarify the specifics of infusion site verification in pharmacological experiments.

      (2) Provide details on the efficiency and neuronal reach of viral transfections.

      (3) Explicitly describe the normalization techniques used in quantitative assessments.

      (4) Ensure that control groups in behavioral assessments undergo comparable handling to maintain analysis integrity.

      References

      (1) Gualdoni, S., et al., Normal levels of Rac1 are important for dendritic but not axonal development in hippocampal neurons. Biology of the Cell, 2007. 99(8): p. 455-464.

      (2) Alam, M.S., Proximity Ligation Assay (PLA). Curr Protoc Immunol, 2018. 123(1): p. e58.

      (3) Song, P., S. Zhang, and J. Li, Co-immunoprecipitation Assays to Detect In Vivo Association of Phytochromes with Their Interacting Partners. Methods Mol Biol, 2021. 2297: p. 75-82.

      (4) Krieger, C.C., et al., Proximity ligation assay to study TSH receptor homodimerization and crosstalk with IGF-1 receptors in human thyroid cells. Frontiers in Endocrinology, 2022. 13.

      (5) Arruda-Carvalho, M., et al., Conditional Deletion of α-CaMKII Impairs Integration of Adult-Generated Granule Cells into Dentate Gyrus Circuits and Hippocampus-Dependent Learning. The Journal of Neuroscience, 2014. 34(36): p. 11919-11928.

      (6) Wolf, A., et al., A Comprehensive Behavioral Test Battery to Assess Learning and Memory in 129S6/Tg2576 Mice. PLoS One, 2016. 11(1): p. e0147733.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to Editors:

      We appreciate the editors’ concern regarding the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process, particularly given the close temporal and spatial proximity of the stimulation windows and the potential for prolonged disruption. We agree that stimulation techniques, such as transcranial magnetic stimulation (TMS), can evoke or modulate neuronal activity both locally within the target region and in remote connected areas of the network, and that this complex interaction makes it more challenging to draw clear conclusions about the causal relationship between stimulation and cognitive function. However, we believe that cause-and-effect relationships in cognitive neuroscience studies using non-invasive brain stimulation (NIBS) can still be robustly established if key assumptions are explicitly tested and confounding factors are rigorously controlled (Bergmann & Hartwigsen, 2021, J Cogn Neurosci).

      In our experiment, we addressed these concerns by including a sham TMS condition, an irrelevant control task, and multiple control time points. The results showed that TMS selectively disrupted the IFG-pMTG interaction during specific time windows of the task related to gesture-speech semantic congruency, but not in the sham TMS condition or the control task (gender congruency effect) (Zhao et al., 2021, JN). This selective disruption provides strong evidence for a causal link between IFG-pMTG connectivity and gesture-speech integration in the targeted time window.

      Regarding the potential for transient artifacts from TMS, we acknowledge that previous research has demonstrated that single-pulse TMS induces brief artifacts (0–10 ms) due to direct depolarization of cortical neurons, which momentarily disrupts electrical activity in the stimulated area (Romero et al., 2019, NC). However, in the case of paired-pulse TMS (ppTMS), the interaction between the first and second pulses is more complex. The first pulse increases membrane conductance in the target neurons via shunting inhibition mediated by GABAergic interneurons. This effectively lowers neuronal membrane resistance, “leaking” excitatory current and diminishing the depolarization induced by the second pulse, leading to a reduction in excitability during the paired-pulse interval. This mechanism suppresses the excitatory response to the second pulse, which is reflected in a reduced motor evoked potential (MEP) (Paulus & Rothwell, 2016, J Physiol).

      Furthermore, ppTMS has been widely used in previous studies to infer causal temporal relationships and explore the neural contributions of both structurally and functionally connected brain regions, across timescales as brief as 3–60 ms. We have reviewed several studies that employed paired-pulse TMS to investigate neural dynamics in regions such as the tongue and lip areas of the primary motor cortex (M1), as well as high-level semantic regions like the pMTG, PFC, and ATL (Table 1). These studies consistently demonstrate the methodological rigor and precision of double-pulse TMS in elucidating the temporal dynamics between different brain regions within short temporal windows.

      Given these precedents and the evidence provided, we respectfully assert the validity of the methods employed in our study. We therefore kindly request the editors to reconsider the assessment that “the methods are insufficient for studying tightly-coupled brain regions over short timescales.” We hope that the editors’ concerns about the complexities of TMS-induced effects have been adequately addressed, and that our study’s design and results provide a clear and convincing causal argument for the role of IFG-pMTG in gesture-speech integration.

      Author response table 1.

      Double-pulse TMS studies on brain regions over 3-60 ms time intervals

      Reference

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Amemiya, T., Beck, B., Walsh, V., Gomi, H., & Haggard, P. (2017). Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Scientific reports, 7(1), 40937.

      Muessgens, D., Thirugnanasambandam, N., Shitara, H., Popa, T., & Hallett, M. (2016). Dissociable roles of preSMA in motor sequence chunking and hand switching—a TMS study. Journal of Neurophysiology, 116(6), 2637-2646.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Pitcher, D. (2014). Facial expression recognition takes longer in the posterior superior temporal sulcus than in the occipital face area. Journal of Neuroscience, 34(27), 9173-9177.

      Bardi, L., Kanai, R., Mapelli, D., & Walsh, V. (2012). TMS of the FEF interferes with spatial conflict. Journal of cognitive neuroscience, 24(6), 1305-1313.

      D’Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48(7), 882-887.

      Pitcher, D., Duchaine, B., Walsh, V., & Kanwisher, N. (2010). TMS evidence for feedforward and feedback mechanisms of face and body perception. Journal of Vision, 10(7), 671-671.

      Gagnon, G., Blanchet, S., Grondin, S., & Schneider, C. (2010). Paired-pulse transcranial magnetic stimulation over the dorsolateral prefrontal cortex interferes with episodic encoding and retrieval for both verbal and non-verbal materials. Brain Research, 1344, 148-158.

      Kalla, R., Muggleton, N. G., Juan, C. H., Cowey, A., & Walsh, V. (2008). The timing of the involvement of the frontal eye fields and posterior parietal cortex in visual search. Neuroreport, 19(10), 1067-1071.

      Pitcher, D., Garrido, L., Walsh, V., & Duchaine, B. C. (2008). Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. Journal of Neuroscience, 28(36), 8929-8933.

      Til Ole Bergmann, Gesa Hartwigsen; Inferring Causality from Noninvasive Brain Stimulation in Cognitive Neuroscience. J Cogn Neurosci 2021; 33 (2): 195–225. https://doi.org/10.1162/jocn_a_01591

      Romero, M.C., Davare, M., Armendariz, M. et al. Neural effects of transcranial magnetic stimulation at the single-cell level. Nat Commun 10, 2642 (2019). https://doi.org/10.1038/s41467-019-10638-7

      Paulus W, Rothwell JC. Membrane resistance and shunting inhibition: where biophysics meets state-dependent human neurophysiology. J Physiol. 2016 May 15;594(10):2719-28. doi: 10.1113/JP271452. PMID: 26940751; PMCID: PMC4865581.

      Staat, C., Gattinger, N., & Gleich, B. (2022). PLUSPULS: A transcranial magnetic stimulator with extended pulse protocols. HardwareX, 13. https://doi.org/10.1016/j.ohx.2022.e00380

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. https://doi.org/10.1523/jneurosci.1355-21.2021.

      Reviewer #1 (Public review):

      Summary:

      The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).

      Strengths:

      A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.

      We sincerely thank the reviewer for recognizing our efforts in conducting three experiments to explore the neural activity linked to the amount of information processed during multisensory gesture-speech integration. In Experiment 1, we observed that the extent of inhibition in the pMTG and LIFG was closely linked to the overlapping gesture-speech responses, as quantified by mutual information. Building on the established roles of the pMTG and LIFG in our previous study (Zhao et al., 2021, JN), we then expanded our investigation to determine whether the dynamic neural engagement between the pMTG and LIFG during gesture-speech processing was also associated with the quality of the information. This hypothesis was further validated through high-temporal resolution EEG, where we examined ERP components related to varying information contents. Notably, we observed a close time alignment between the ERP components and the time windows of the TMS effects, which were associated with the same informational matrices in gesture-speech processing.

      Weaknesses:

      (1) One major issue is that there is a tight anatomical coupling between pMTG and LIFG. Stimulating one area could therefore also result in stimulation of the other area (see Silvanto and Pascual-Leone, 2008). I therefore think it is very difficult to tease apart the contribution of these areas to the speech-gesture integration process, especially considering that the authors stimulate these regions in time windows that are very close to each other in both time and space (and the disruption might last longer over time).

      Response 1: We greatly appreciate the reviewer’s careful consideration. We trust that the explanation provided above has clarified this issue (see Response to Editors for detail).

      (2) Related to this point, it is unclear to me why the HD-TDCS/TMS is delivered in set time windows for each region. How did the authors determine this, and how do the results for TMS compare to their previous work from 2018 and 2023 (which describes a similar dataset+design)? How can they ensure they are only targeting their intended region since they are so anatomically close to each other?

      Response 2: The current study builds on a series of investigations that systematically examined the temporal and spatial dynamics of gesture-speech integration. In our earlier work (Zhao et al., 2018, J. Neurosci), we demonstrated that interrupting neural activity in the IFG or pMTG using TMS selectively disrupted the semantic congruency effect (reaction time costs due to semantic incongruence), without affecting the gender congruency effect (reaction time costs due to gender incongruence). These findings identified the IFG and pMTG as critical hubs for gesture-speech integration. This informed the brain regions selected for subsequent studies.
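
      To make the outcome measure concrete, the sketch below computes a congruency effect as the reaction-time cost of incongruent relative to congruent trials, and then compares that cost between an active and a sham stimulation condition. The reaction times are hypothetical, and expressing the stimulation effect as a simple difference between conditions is an assumption made for illustration, not necessarily the exact formula used in the cited studies.

```python
from statistics import mean


def congruency_effect(rt_congruent, rt_incongruent):
    """RT cost of incongruence: mean incongruent RT minus mean congruent RT (ms)."""
    return mean(rt_incongruent) - mean(rt_congruent)


# Hypothetical reaction times in ms for one participant.
sham = {"congruent": [520, 540, 510, 530], "incongruent": [580, 600, 570, 590]}
active = {"congruent": [525, 545, 515, 535], "incongruent": [555, 575, 545, 565]}

effect_sham = congruency_effect(sham["congruent"], sham["incongruent"])
effect_active = congruency_effect(active["congruent"], active["incongruent"])

# Assumed definition: a negative value means stimulation reduced the congruency effect.
stimulation_effect = effect_active - effect_sham
print(effect_sham, effect_active, stimulation_effect)
```

      The same function could be applied to the gender congruency effect used as the control task; a selective disruption would then show up as a change in the semantic effect without a comparable change in the gender effect.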

      In Zhao et al. (2021, J. Neurosci), we employed a double-pulse TMS protocol, delivering stimulation within one of eight 40-ms time windows, to further examine the temporal involvement of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, confirming the dynamic and temporally staged roles of these regions during gesture-speech integration.

      In Zhao et al. (2023, Frontiers in Psychology), we investigated the semantic predictive role of gestures relative to speech by comparing two experimental conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. We observed time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG only in the second condition, leading to the conclusion that gestures exert a semantic priming effect on co-occurring speech. These findings underscored the semantic advantage of gesture in facilitating speech integration, further refining our understanding of the temporal and functional interplay between these modalities.

      The design of the current study—including the choice of brain regions and time windows—was directly informed by these prior findings. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (mutual information, MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and ERPs to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.
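
      For reference, the two information-theoretic quantities named above can be computed as in the following sketch. It uses the standard Shannon definitions; the joint count table of gesture and speech responses is hypothetical, and how the study derived its response distributions is not specified here, so this is only a generic illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint_counts):
    """Mutual information in bits from a 2-D table of joint counts.

    Rows index the outcomes of one variable (e.g. gesture responses) and
    columns the outcomes of the other (e.g. speech responses).
    """
    total = sum(sum(row) for row in joint_counts)
    pxy = [[c / total for c in row] for row in joint_counts]
    px = [sum(row) for row in pxy]            # marginal distribution over rows
    py = [sum(col) for col in zip(*pxy)]      # marginal distribution over columns
    mi = 0.0
    for i, row in enumerate(pxy):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Hypothetical joint counts of gesture (rows) and speech (columns) responses.
joint = [[30, 5, 5],
         [5, 30, 5],
         [5, 5, 10]]
total = sum(sum(row) for row in joint)
gesture_entropy = entropy([sum(row) / total for row in joint])
speech_entropy = entropy([sum(col) / total for col in zip(*joint)])
mi = mutual_information(joint)
print(f"H(gesture)={gesture_entropy:.3f}  H(speech)={speech_entropy:.3f}  MI={mi:.3f}")
```

      Entropy is larger when responses are spread over many alternatives, and MI is zero when the two modalities are statistically independent and grows as their responses overlap.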

      We acknowledge that the rationale for the design of the current study was not fully articulated in the original manuscript. In the revised version, we provided a more comprehensive and coherent explanation of the logic behind the three experiments, as well as the alignment with our previous findings in Lines 75-102:

      ‘To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibition (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measured by MI.

      Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).

      Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging event-related potentials (ERPs) with high temporal resolution. This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects such as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics.’

      Although the IFG and pMTG are anatomically close, the consistent differentiation of their respective roles, as evidenced by our experiment across various time windows (TWs) and supported by previous research (see Response to editors for details), reinforces the validity of the stimulation effect observed in our study.

      References

      Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. https://doi.org/10.1523/jneurosci.1355-21.2021.

      Zhao, W. (2023). TMS reveals a two-stage priming circuit of gesture-speech integration. Front Psychol 14, 1156087. 10.3389/fpsyg.2023.1156087.

      Bikson, M., Inoue, M., Akiyama, H., Deans, J.K., Fox, J.E., Miyakawa, H., and Jefferys, J.G.R. (2004). Effects of uniform extracellular DC electric fields on excitability in rat hippocampal slices. J Physiol-London 557, 175-190. 10.1113/jphysiol.2003.055772.

      Federmeier, K.D., Mai, H., and Kutas, M. (2005). Both sides get the point: hemispheric sensitivities to sentential constraint. Memory & Cognition 33, 871-886. 10.3758/bf03193082.

      Kelly, S.D., Kravitz, C., and Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language 89, 253-260. 10.1016/s0093-934x(03)00335-3.

      Wu, Y.C., and Coulson, S. (2005). Meaningful gestures: Electrophysiological indices of iconic gesture comprehension. Psychophysiology 42, 654-667. 10.1111/j.1469-8986.2005.00356.x.

      Fritz, I., Kita, S., Littlemore, J., and Krott, A. (2021). Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliates. J Mem Lang 117, 104191. 10.1016/j.jml.2020.104191.

      Gunter, T.C., and Weinbrenner, J.E.D. (2017). When to take a gesture seriously: On how we use and prioritize communicative cues. J Cognitive Neurosci 29, 1355-1367. 10.1162/jocn_a_01125.

      Ozyurek, A., Willems, R.M., Kita, S., and Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. J Cognitive Neurosci 19, 605-616. 10.1162/jocn.2007.19.4.605.

      (3) As the EEG signal is often not normally distributed, I was wondering whether the authors checked the assumptions for their Pearson correlations. The authors could perhaps better choose to model the different variables to see whether MI/entropy could predict the neural responses. How did they correct the many correlational analyses that they have performed?

      Response 3: We greatly appreciate the reviewer’s thoughtful comments.

      (1) Regarding the questioning of normal distribution of EEG signals and the use of Pearson correlation, in Figure 5 of the manuscript, we have already included normal distribution curves to illustrate the relationships between average ERP amplitudes across each ROI or elicited cluster and the three information models.

      Additionally, we performed the Shapiro-Wilk test, a widely accepted method for assessing bivariate normality, on both the MI/entropy and averaged ERP data. The p-values for all three combinations were greater than 0.05, indicating that the sample data from all bivariate combinations were normally distributed (Author response table 2).

      Author response table 2.

      Shapiro-Wilk results of bivariable normality test

      To further consolidate the relationship between entropy/MI and various ERP components, we also conducted a Spearman rank correlation analysis (Author response tables 3-5). While the correlation between speech entropy and ERP amplitude in the P1 component yielded a p-value of 0.061, all other results were consistent with those obtained from the Pearson correlation analysis across the three experiments. Therefore, our conclusion that progressive neural responses reflected the degree of information remains robust. Although the Spearman rank and Pearson correlation analyses yielded similar results, we opted to report the Pearson correlation coefficients throughout the manuscript to maintain consistency.
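
      As a minimal illustration of why the two coefficients can diverge, the sketch below computes Pearson and Spearman correlations on hypothetical values (the `entropy` and `amplitude` lists are invented for illustration and are not data from the study). Spearman is simply the Pearson correlation of the rank-transformed data, so it is insensitive to a single extreme amplitude.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-stimulus values (not from the study).
entropy = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
amplitude = [1.0, 1.2, 1.9, 2.1, 2.4, 6.0]  # monotonic, with one extreme value

r_pearson = pearson(entropy, amplitude)
r_spearman = spearman(entropy, amplitude)
```

      Here the relationship is perfectly monotonic, so the rank-based coefficient is 1 while the Pearson coefficient is pulled below 1 by the extreme value.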

      Author response table 3.

      Comparison of Pearson and Spearman results in Experiment 1

      Author response table 4.

      Comparison of Pearson and Spearman results in Experiment 2

      Author response table 5.

      Comparison of Pearson and Spearman results in Experiment 3

      (2) Regarding the reviewer’s comment ‘choose to model the different variables to see whether MI/entropy could predict the neural responses’, we employed Representational Similarity Analysis (RSA) (Popal et al., 2019) with MI and entropy as continuous variables. This analysis aimed to build a model to predict neural responses based on these feature metrics.

      To capture dynamic temporal features indicative of different stages of multisensory integration, we segmented the EEG data into overlapping time windows (40 ms in duration with a 10 ms step size). The 40 ms window was chosen based on the TMS protocol used in Experiment 2, which also employed a 40 ms time window. The 10 ms step size (equivalent to 5 time points) was used to detect subtle shifts in neural responses that might not be captured by larger time windows, allowing for a more granular analysis of the temporal dynamics of neural activity.
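
      The window arithmetic can be checked with a short sketch. A 10 ms step equal to 5 time points implies a 500 Hz sampling rate, so a 40 ms window spans 20 time points. The epoch length is not stated here; the 1000 ms (500-sample) epoch below is an assumption, chosen because it reproduces the 97 time windows reported for the reshaped data. The function names are illustrative only.

```python
def window_starts(n_samples, win_len, step):
    """Start indices of all full-length windows that fit in the epoch."""
    return list(range(0, n_samples - win_len + 1, step))


def window_means(signal, win_len, step):
    """Average a single-channel signal within each overlapping window."""
    return [sum(signal[s:s + win_len]) / win_len
            for s in window_starts(len(signal), win_len, step)]


SAMPLING_RATE_HZ = 500   # implied by "10 ms = 5 time points"
EPOCH_MS = 1000          # assumption: not stated in the response

win_len = 40 * SAMPLING_RATE_HZ // 1000          # 20 time points per window
step = 10 * SAMPLING_RATE_HZ // 1000             # 5 time points per step
n_samples = EPOCH_MS * SAMPLING_RATE_HZ // 1000  # 500 samples per epoch

starts = window_starts(n_samples, win_len, step)
n_windows = len(starts)  # (500 - 20) / 5 + 1 windows
```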

      Following segmentation, the EEG data were reshaped into a four-dimensional matrix (42 channels × 20 time points × 97 time windows × 20 features). To construct a neural similarity matrix, we averaged the EEG data across time points within each channel and each time window. The resulting matrix was then processed using the pdist function to compute pairwise distances between adjacent data points. This allowed us to calculate correlations between the neural matrix and three feature similarity matrices, which were constructed in a similar manner. These three matrices corresponded to (1) gesture entropy, (2) speech entropy, and (3) mutual information (MI). This approach enabled us to quantify how well the neural responses corresponded to the semantic dimensions of gesture and speech stimuli at each time window.

      To determine the significance of the correlations between neural activity and feature matrices, we conducted 1000 permutation tests. In this procedure, we randomized the data or feature matrices and recalculated the correlations repeatedly, generating a null distribution against which the observed correlation values were compared. Statistical significance was determined if the observed correlation exceeded the null distribution threshold (p < 0.05). This permutation approach helps mitigate the risk of spurious correlations, ensuring that the relationships between the neural data and feature matrices are both robust and meaningful.
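
      The RSA and permutation steps described above can be sketched as follows, for a single channel set and time window. This is a simplified illustration under stated assumptions, not the authors' implementation: Euclidean distance (via the standard-library `math.dist`) stands in for the `pdist` call, the stimulus labels of the feature matrix are the quantity being shuffled, the p-value is one-sided with a +1 correction, and all data values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_distances(patterns):
    """Condensed vector of Euclidean distances between all pairs of items."""
    return [math.dist(patterns[i], patterns[j])
            for i in range(len(patterns))
            for j in range(i + 1, len(patterns))]


def rsa_permutation_test(neural, feature, n_perm=1000, seed=0):
    """Correlate neural and feature distance vectors; test by label shuffling."""
    rng = random.Random(seed)
    neural_rdm = pairwise_distances(neural)
    observed = pearson(neural_rdm, pairwise_distances(feature))
    labels = list(range(len(feature)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the stimulus-to-feature assignment
        shuffled = [feature[i] for i in labels]
        if pearson(neural_rdm, pairwise_distances(shuffled)) >= observed:
            exceed += 1
    # One-sided p-value; +1 keeps the estimate away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 6 stimuli, 3 window-averaged channels each,
# and one feature value (e.g. an entropy score) per stimulus.
feature = [[0.2], [0.9], [1.1], [1.8], [2.5], [3.1]]
neural = [[0.25, 0.4, -0.2], [0.85, 1.9, -0.9], [1.2, 2.1, -1.1],
          [1.7, 3.7, -1.8], [2.6, 4.9, -2.5], [3.0, 6.3, -3.1]]

r_obs, p_val = rsa_permutation_test(neural, feature)
```

      In a full analysis this test would be repeated for every channel and time window, which is why the subsequent clustering step is needed to handle the resulting multiple comparisons.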

      Finally, significant correlations were subjected to clustering analysis, which grouped similar neural response patterns across time windows and channels. This clustering allowed us to identify temporal and spatial patterns in the neural data that consistently aligned with the semantic features of gesture and speech stimuli, thus revealing the dynamic integration of these multisensory modalities across time. Results are as follows:

      (1) Two significant clusters were identified for gesture entropy (Author response image 1 left). The first cluster was observed between 60-110 ms (channels F1 and F3), with correlation coefficients (r) ranging from 0.207 to 0.236 (p < 0.001). The second cluster was found between 210-280 ms (channel O1), with r-values ranging from 0.244 to 0.313 (p < 0.001).

      (2) For speech entropy (Author response image 1 middle), significant clusters were detected in both early and late time windows. In the early time windows, the largest significant cluster was found between 10-170 ms (channels F2, F4, F6, FC2, FC4, FC6, C4, C6, CP4, and CP6), with r-values ranging from 0.151 to 0.340 (p = 0.013), corresponding to the P1 component (0-100 ms). In the late time windows, the largest significant cluster was observed between 560-920 ms (across the whole brain, all channels), with r-values ranging from 0.152 to 0.619 (p = 0.013).

      (3) For mutual information (MI) (Author response image 1 right), a significant cluster was found between 270-380 ms (channels FC1, FC2, FC3, FC5, C1, C2, C3, C5, CP1, CP2, CP3, CP5, FCz, Cz, and CPz), with r-values ranging from 0.198 to 0.372 (p = 0.001).

      Author response image 1.

      Results of RSA analysis.

      These additional findings suggest that even using a different modeling approach, neural responses, as indexed by feature metrics of entropy and mutual information, are temporally aligned with distinct ERP components and ERP clusters, as reported in the current manuscript. This alignment serves to further consolidate the results, reinforcing the conclusion we draw. Considering the length of the manuscript, we did not include these results in the current manuscript.

      (3) In terms of the correction of multiple comparisons, in Experiment 1, two separate participant groups were recruited for HD-tDCS applied over either the IFG or pMTG. FDR correction was performed separately for each group, resulting in six comparisons for each brain region (three information matrices × two tDCS effects: anodal-sham or cathodal-sham). In Experiment 2, six comparisons (three information matrices × two sites: IFG or pMTG) were submitted for FDR correction. In Experiment 3, FDR correction was applied to the seven regions of interest (ROIs) within each component, resulting in five comparisons.
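
      For readers unfamiliar with FDR correction over a family of six comparisons, the sketch below applies the Benjamini-Hochberg step-up procedure. The response does not name the specific FDR procedure, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below that k.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            rejected[i] = True
    return rejected


# Hypothetical family of six comparisons
# (e.g. three information measures x two stimulation effects).
p_family = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
decisions = benjamini_hochberg(p_family)
```

      Note that 0.039 and 0.041 fall below 0.05 but do not survive correction in this example, because their rank-specific thresholds (0.025 and about 0.033) are stricter.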

      Reference:

      Wilk, M.B. (2015). The Shapiro Wilk And Related Tests For Normality.

      Popal, H., Wang, Y., & Olson, I. R. (2019). A guide to representational similarity analysis for social neuroscience. Social cognitive and affective neuroscience, 14(11), 1243-1253.

      (4) The authors use ROIs for their different analyses, but it is unclear why and on the basis of what these regions are defined. Why not consider all channels without making them part of an ROI, by using a method like the one described in my previous comment?

      Response 4: For the EEG data, we conducted both a traditional ROI analysis and a cluster-based permutation approach. The ROIs were defined based on a well-established work (Habets et al., 2011), allowing for hypothesis-driven testing of specific regions. In addition, we employed a cluster-based permutation method, which is data-driven and helps enhance robustness while addressing multiple comparisons. This method serves as a complement to the hypothesis-driven ROI analysis, offering an exploratory, unbiased perspective. Notably, the results from both approaches were consistent, reinforcing the reliability of our findings.

      To make the methods more accessible to a broader audience, we clarified the relationship between these approaches in the revised manuscript in Lines 267-270: ‘To consolidate the data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work40, and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons’

      Additionally, we conducted an RSA analysis without defining specific ROIs, considering all channels in the analysis. This approach yielded consistent results, further validating the robustness of our findings across different analysis methods. See Response 3 for details.

      Reference:

      Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462

      (5) The authors describe that they have divided their EEG data into a "lower half" and a "higher half" (lines 234-236), based on entropy scores. It is unclear why this is necessary, and I would suggest just using the entropy scores as a continuous measure.

      Response 5: To identify ERP components or spatiotemporal clusters that demonstrated significant semantic differences, we split each model into higher and lower halves based on entropy scores. This division allowed us to capture distinct levels of information processing and explore how different levels of entropy or mutual information (MI) related to neural activity. Specifically, the goal was to highlight the gradual activation process of these components and clusters as they correlate with changes in information content. Remarkably, consistent results were observed between the ERP components and clusters, providing robust evidence that semantic information conveyed through gestures and speech significantly influenced the amplitude of these components or clusters. Moreover, the semantic information was shown to be highly sensitive, varying in tandem with these amplitude changes.

      Reviewer #2 (Public review):

      Comment:

      Summary:

      The study is an innovative and fundamental study that clarified important aspects of brain processes for integration of information from speech and iconic gesture (i.e., gesture that depicts action, movement, and shape), based on tDCS, TMS, and EEG experiments. They evaluated their speech and gesture stimuli in information-theoretic ways and calculated how informative speech is (i.e., entropy), how informative gesture is, and how much shared information speech and gesture encode. The tDCS and TMS studies found that the left IFG and pMTG, the two areas that were activated in fMRI studies on speech-gesture integration in the previous literature, are causally implicated in speech-gesture integration. The size of tDCS and TMS effects are correlated with the entropy of the stimuli or mutual information, which indicates that the effects stem from the modulation of information decoding/integration processes. The EEG study showed that various ERP (event-related potential, e.g., N1-P2, N400, LPC) effects that have been observed in speech-gesture integration experiments in the previous literature, are modulated by the entropy of speech/gesture and mutual information. This makes it clear that these effects are related to information decoding processes. The authors propose a model of how the speech-gesture integration process unfolds in time, and how IFG and pMTG interact with each other in that process.

      Strengths:

      The key strength of this study is that the authors used information theoretic measures of their stimuli (i.e., entropy and mutual information between speech and gesture) in all of their analyses. This made it clear that the neuro-modulation (tDCS, TMS) affected information decoding/integration and ERP effects reflect information decoding/integration. This study used tDCS and TMS methods to demonstrate that left IFG and pMTG are causally involved in speech-gesture integration. The size of tDCS and TMS effects are correlated with information-theoretic measures of the stimuli, which indicate that the effects indeed stem from disruption/facilitation of the information decoding/integration process (rather than generic excitation/inhibition). The authors' results also showed a correlation between information-theoretic measures of stimuli with various ERP effects. This indicates that these ERP effects reflect the information decoding/integration process.

      We sincerely thank the reviewer for recognizing our efforts and the innovation of employing information-theoretic measures to elucidate the brain processes underlying the multisensory integration of gesture and speech.

      Weaknesses:

      The "mutual information" cannot fully capture the interplay of the meaning of speech and gesture. The mutual information is calculated based on what information can be decoded from speech alone and what information can be decoded from gesture alone. However, when speech and gesture are combined, a novel meaning can emerge, which cannot be decoded from a single modality alone. For example, a person produces a gesture of writing something with a pen, while saying "He paid". The speech-gesture combination can be interpreted as "paying by signing a cheque". It is highly unlikely that this meaning is decoded when people hear speech only or see gestures only. The current study cannot address how such speech-gesture integration occurs in the brain, and what ERP effects may reflect such a process. Future studies can classify different types of speech-gesture integration and investigate neural processes that underlie each type. Another important topic for future studies is to investigate how the neural processes of speech-gesture integration change when the relative timing between the speech stimulus and the gesture stimulus changes.

      We greatly appreciate Reviewer 2’s thoughtful concern regarding whether "mutual information" adequately captures the interplay between the meanings of speech and gesture. We would like to clarify that the materials used in the present study involved gestures that were performed without actual objects, paired with verbs that precisely describe the corresponding actions. For example, a hammering gesture was paired with the verb “hammer”, and a cutting gesture was paired with the verb “cut”. In this design, all gestures conveyed redundant information relative to the co-occurring speech, creating significant overlap between the information derived from speech alone and that from gesture alone.

      We understand the reviewer’s concern about cases where gestures and speech might provide complementary, rather than redundant, information. To address this, we have developed an alternative metric for quantifying information gains contributed by supplementary multisensory cues, which will be explored in a subsequent study. However, for the present study, we believe that the observed overlap in information serves as a key indicator of multisensory convergence, a central focus of our investigation.

      Regarding the reviewer’s concern about how neural processes of speech-gesture integration may change with varying relative timing between speech and gesture stimuli, we would like to highlight findings from our previous study (Zhao, 2023, Frontiers in Psychology). In that study, we explored the semantic predictive role of gestures relative to speech under two timing conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Interestingly, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures play a semantic priming role for co-occurring speech. Building on this, we designed the present study with gestures deliberately preceding speech at its semantic identification point to reflect this semantic priming relationship. Additionally, ongoing research in our lab is exploring gesture and speech interactions in natural conversational settings to investigate whether the neural processes identified here remain consistent across varying contexts.

      To address potential concerns and ensure clarity regarding the limitations of the MI measurement, we have included a discussion of this in the revised manuscript in Lines 543-547: ‘Furthermore, MI quantifies overlap in gesture-speech integration, primarily when gestures convey redundant meaning. Consequently, the conclusions drawn in this study are constrained to contexts in which gestures serve to reinforce the meaning of the speech. Future research should aim to explore the neural responses in cases where gestures convey supplementary, rather than redundant, semantic information.’ This is followed by a clarification of the timing relationship between gesture and speech: ‘Note that the sequential cortical involvement and ERP components discussed above are derived from a deliberate alignment of speech onset with gesture DP, creating an artificial priming effect with gesture semantically preceding speech. Caution is advised when generalizing these findings to the spontaneous gesture-speech relationships, although gestures naturally precede speech[34].’ (Lines 539-543).

      Reviewer #3 (Public review):

      In this useful study, Zhao et al. try to extend the evidence for their previously described two-step model of speech-gesture integration in the posterior Middle Temporal Gyrus (pMTG) and Inferior Frontal Gyrus (IFG). They repeat some of their previous experimental paradigms, but this time quantifying Information-Theoretical (IT) metrics of the stimuli in a Stroop-like paradigm purported to engage speech-gesture integration. They then correlate these metrics with the disruption of what they claim to be an integration effect observable in reaction times during the tasks following brain stimulation, as well as documenting the ERP components in response to the variability in these metrics.

      The integration of multiple methods, like tDCS, TMS, and ERPs to provide converging evidence renders the results solid. However, their interpretation of the results should be taken with care, as some critical confounds, like difficulty, were not accounted for, and the conceptual link between the IT metrics and what the authors claim they index is tenuous and in need of more evidence. In some cases, the difficulty making this link seems to arise from conceptual equivocation (e.g., their claims regarding 'graded' evidence), whilst in some others it might arise from the usage of unclear wording in the writing of the manuscript (e.g. the sentence 'quantitatively functional mental states defined by a specific parser unified by statistical regularities'). Having said that, the authors' aim is valuable, and addressing these issues would render the work a very useful approach to improve our understanding of integration during semantic processing, being of interest to scientists working in cognitive neuroscience and neuroimaging.

      The main hurdle to achieving the aims set by the authors is the presence of the confound of difficulty in their IT metrics. Their measure of entropy, for example, being derived from the distribution of responses of the participants to the stimuli, will tend to be high for words or gestures with multiple competing candidate representations (this is what would presumptively give rise to the diversity of responses in high-entropy items). There is ample evidence implicating IFG and pMTG as key regions of the semantic control network, which is critical during difficult semantic processing when, for example, semantic processing must resolve competition between multiple candidate representations, or when there are increased selection pressures (Jackson et al., 2021). Thus, the authors' interpretation of Mutual Information (MI) as an index of integration is inextricably contaminated with difficulty arising from multiple candidate representations. This casts doubt on the claims of the role of pMTG and IFG as regions carrying out gesture-speech integration as the observed pattern of results could also be interpreted in terms of brain stimulation interrupting the semantic control network's ability to select the best candidate for a given context or respond to more demanding semantic processing.

      Response 1: We sincerely thank the reviewer for pointing out the confound of difficulty. The primary aim of this study is to investigate whether the degree of activity in the established integration hubs, IFG and pMTG, is influenced by the information provided by gesture-speech modalities and/or their interactions. While we provided evidence for the differential involvement of the IFG and pMTG by delineating their dynamic engagement across distinct time windows of gesture-speech integration and associating these patterns with unisensory information and their interaction, we acknowledge that the mechanisms underlying these dynamics remain open to interpretation. Specifically, whether the observed effects stem from difficulties in semantic control processes, as suggested by the reviewer, or from resolving information uncertainty, as quantified by entropy, falls outside the scope of the current study. Importantly, we view these two interpretations as complementary rather than mutually exclusive, as both may be contributing factors. Nonetheless, we agree that addressing this question is a compelling avenue for future research.

      In the revised manuscript, we have included an additional analysis to assess whether the confounding effects of lexical or semantic control difficulty—specifically, the number of available responses—affect the neural outcomes. To address this, we performed partial correlation analyses, controlling for the number of responses.

      We would like to clarify an important distinction between the measure of entropy derived from the distribution of responses and the concept of response diversity. Entropy, in our analysis, is computed based on the probability distribution of each response, as captured by the information entropy formula. In contrast, response diversity refers to the simple count of different responses provided. Mutual Information (MI), by its nature, is also an entropy measure, quantifying the overlap in responses. For reference, although we observed a high correlation between the three information matrices and the number of responses (gesture entropy & gesture response number: r = 0.976, p < 0.001; speech entropy & speech response number: r = 0.961, p < 0.001; MI & total response number: r = 0.818, p < 0.001), it is crucial to emphasize that these metrics capture different aspects of the semantic information represented. In the revised manuscript, we have provided a table detailing both entropy and response numbers for each stimulus, to allow for greater transparency and clarity.
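
      The distinction between entropy and response count can be made concrete with a small sketch. It assumes Shannon entropy with a base-2 logarithm (the base is not stated here) and uses hypothetical naming responses: both items elicit three distinct responses, yet their entropies differ because the responses are distributed differently.

```python
import math
from collections import Counter


def shannon_entropy(responses):
    """Shannon entropy (bits) of the empirical response distribution."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def response_number(responses):
    """Simple count of distinct responses, ignoring their frequencies."""
    return len(set(responses))


# Hypothetical responses from 10 participants to two stimuli.
item_a = ["hammer"] * 8 + ["knock"] + ["hit"]      # one dominant response
item_b = ["cut"] * 4 + ["slice"] * 3 + ["saw"] * 3  # responses spread evenly

h_a, h_b = shannon_entropy(item_a), shannon_entropy(item_b)
k_a, k_b = response_number(item_a), response_number(item_b)
```

      Both items have a response number of 3, but item A has an entropy of roughly 0.92 bits against roughly 1.57 bits for item B. Across a stimulus set the two measures can still be highly correlated, as reported above, which is why the partial correlation analyses control for response number.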

      Furthermore, we have added a comprehensive description of the partial correlation analysis conducted across all three experiments in the methodology section: for Experiment 1, please refer to Lines 213–222: ‘To account for potential confounds related to multiple candidate representations, we conducted partial correlation analyses between the tDCS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses provided for each gesture and speech, as well as the total number of combined responses. Given that HD-tDCS induces overall disruption at the targeted brain regions, we hypothesized that the neural activity within the left IFG and pMTG would be progressively affected by varying levels of multisensory convergence, as indexed by MI. Moreover, we hypothesized that the modulation of neural activity by MI would differ between the left IFG and pMTG, as reflected in the differential modulation of response numbers in the partial correlations, highlighting their distinct roles in semantic processing[37].’

      Experiment 2: ‘To control for potential confounds, partial correlations were also performed between the TMS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses for each gesture and speech, as well as the total number of combined responses. By doing this, we can determine how the time-sensitive contribution of the left IFG and pMTG to gesture–speech integration was affected by gesture and speech information distribution.’ (Lines 242–246).

      Experiment 3: ‘Additionally, partial correlations were conducted, accounting for the number of responses for each respective metric’ (Lines 292–293).
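
      The logic of these partial correlations can be sketched with the first-order formula $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$, which removes the linear contribution of a single covariate $z$ from the correlation between $x$ and $y$. The example is a simplified illustration with one covariate and constructed, hypothetical data; analyses that control for several response-number variables at once would need the multi-covariate generalization.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: both the information measure and the stimulation
# effect are driven by the number of responses, plus unrelated noise.
n_responses = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]
noise_b = [1, 1, -1, -1, -1, -1, 1, 1]
info_measure = [z + a for z, a in zip(n_responses, noise_a)]
stim_effect = [z + b for z, b in zip(n_responses, noise_b)]

r_raw = pearson(info_measure, stim_effect)
r_partial = partial_corr(info_measure, stim_effect, n_responses)
```

      In this constructed case the raw correlation is 0.84 but the partial correlation is zero, the pattern that would indicate an association fully accounted for by response number.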

      As anticipated by the reviewer, we observed a consistent modulation of response numbers across both regions as well as across the four ERP components and associated clusters. The detailed results are presented below:

      Experiment 1: ‘However, partial correlation analysis, controlling for the total response number, revealed that the initially significant correlation between the Cathodal-tDCS effect and MI was no longer significant (r = -0.303, p = 0.222, 95% CI = [-0.770, 0.164]). This suggests that the observed relationship between Cathodal-tDCS and MI may be confounded by semantic control difficulty, as reflected by the total number of responses. Specifically, the reduced activity in the IFG under Cathodal-tDCS may be driven by variations in the difficulty of semantic control rather than a direct modulation of MI.’ (Lines 310-316) and ‘Importantly, the reduced activity in the pMTG under Cathodal-tDCS was not influenced by the total response number, as indicated by the non-significant correlation (r = -0.253, p = 0.295, 95% CI = [-0.735, 0.229]). This finding was further corroborated by the unchanged significance in the partial correlation between Cathodal-tDCS and MI, when controlling for the total response number (r = -0.472, p = 0.048, 95% CI = [-0.903, -0.041]).’ (Lines 324-328).

      Experiment 2: ‘Notably, inhibition of pMTG activity in TW2 was not influenced by the number of speech responses (r = -0.539, p = 0.087, 95% CI = [-1.145, 0.067]). However, the number of speech responses did affect the modulation of speech entropy on the pMTG inhibition effect in TW2. This was evidenced by the non-significant partial correlation between pMTG inhibition and speech entropy when controlling for speech response number (r = -0.218, p = 0.545, 95% CI = [-0.563, 0.127]).

      In contrast, the interrupted IFG activity in TW6 appeared to be consistently influenced by the confound of semantic control difficulty. This was reflected in the significant correlations with gesture response number (r = -0.480, p = 0.032, 95% CI = [-0.904, -0.056]), speech response number (r = -0.729, p = 0.011, 95% CI = [-1.221, -0.237]), and total response number (r = -0.591, p = 0.008, 95% CI = [-0.993, -0.189]). Additionally, partial correlation analyses revealed non-significant relationships between interrupted IFG activity in TW6 and gesture entropy (r = -0.369, p = 0.120, 95% CI = [-0.810, -0.072]), speech entropy (r = -0.455, p = 0.187, 95% CI = [-1.072, 0.162]), and MI (r = -0.410, p = 0.091, 95% CI = [-0.856, -0.036]) when controlling for response numbers.’ (Lines 349-363)

      Experiment 3: ‘To clarify potential confounds of semantic control difficulty, partial correlation analyses were conducted to examine the relationship between the elicited ERP components and the relevant information matrices, controlling for response numbers. Results consistently indicated modulation by response numbers in the relationship of ERP components with the information matrix, as evidenced by the non-significant partial correlations between the P1 amplitude (P1 component over ML: r = -0.574, p = 0.082, 95% CI = [-1.141, -0.007]) and the P1 cluster (r = -0.503, p = 0.138, 95% CI = [-1.102, 0.096]) with speech entropy; the N1-P2 amplitude (N1-P2 component over LA: r = -0.080, p = 0.746, 95% CI = [-0.554, 0.394]) and N1-P2 cluster (r = -0.179, p = 0.464, 95% CI = [-0.647, 0.289]) with gesture entropy; the N400 amplitude (N400 component over LA: r = 0.264, p = 0.247, 95% CI = [-0.195, 0.723]) and N400 cluster (r = 0.394, p = 0.095, 95% CI = [-0.043, 0.831]) with gesture entropy; the N400 amplitude (N400 component over LA: r = -0.134, p = 0.595, 95% CI = [-0.620, 0.352]) and N400 cluster (r = -0.034, p = 0.894, 95% CI = [-0.524, 0.456]) with MI; and the LPC amplitude (LPC component over LA: r = -0.428, p = 0.217, 95% CI = [-1.054, 0.198]) and LPC cluster (r = -0.202, p = 0.575, 95% CI = [-0.881, 0.477]) with speech entropy.’ (Lines 424-438)

      Based on the above results, we conclude that there is a dynamic interplay between the difficulty of semantic representation and the control pressures that shape the resulting neural responses. Furthermore, while the role of the IFG in control processes remains consistent, the present study reveals a more segmented role for the pMTG. Specifically, although the pMTG is well-established in the processing of distributed speech information, the integration of multisensory convergence, as indexed by MI, did not elicit the same control-related modulation in pMTG activity. A comprehensive discussion of the control process in shaping neural responses, as well as the specific roles of the IFG and pMTG in this process, is provided in the Discussion section (Lines 493-511): ‘Given that control processes are intrinsically integrated with semantic processing[50], a distributed semantic representation enables dynamic modulation of access to and manipulation of meaningful information, thereby facilitating flexible control over the diverse possibilities inherent in a concept. Accordingly, an increased number of candidate responses amplifies the control demands necessary to resolve competing semantic representations. This effect was observed in the present study, where the association of the information matrix with the tDCS effect in IFG, the inhibition of pMTG activity in TW2, disruption of IFG activity in TW6, and modulation of four distinct ERP components collectively demonstrated that response quantity modulated neural activity. These results underscore the intricate interplay between the difficulty of semantic representation and the control pressures that shape the resulting neural responses.

      The IFG and pMTG, central components of the semantic control network, have been extensively implicated in previous research[50-52]. While the role of the IFG in managing both unisensory information and multisensory convergence remains consistent, as evidenced by the confounding difficulty results across Experiments 1 and 2, the current study highlights a more context-dependent function for the pMTG. Specifically, although the pMTG is well-established in the processing of distributed speech information, the multisensory convergence, indexed by MI, did not evoke the same control-related modulation in pMTG activity. These findings suggest that, while the pMTG is critical to semantic processing, its engagement in control processes is likely modulated by the specific nature of the sensory inputs involved.’

      References:

      Tesink, C.M.J.Y., Petersson, K.M., van Berkum, J.J.A., van den Brink, D., Buitelaar, J.K., and Hagoort, P. (2009). Unification of speaker and meaning in language comprehension: An fMRI study. J Cogn Neurosci 21, 2085-2099. 10.1162/jocn.2008.21161.

      Jackson, R.L. (2021). The neural correlates of semantic control revisited. Neuroimage 224, 117444. 10.1016/j.neuroimage.2020.117444.

      Jefferies, E. (2013). The neural basis of semantic cognition: converging evidence from neuropsychology, neuroimaging and TMS. Cortex 49, 611-625. 10.1016/j.cortex.2012.10.008.

      Noonan, K.A., Jefferies, E., Visser, M., and Lambon Ralph, M.A. (2013). Going beyond inferior prefrontal involvement in semantic control: evidence for the additional contribution of dorsal angular gyrus and posterior middle temporal cortex. J Cogn Neurosci 25, 1824-1850. 10.1162/jocn_a_00442.

      In terms of conceptual equivocation, the use of the term 'graded' by the authors seems to be different from the usage commonly employed in the semantic cognition literature (e.g., the 'graded hub hypothesis', Rice et al., 2015). The idea of a graded hub in the controlled semantic cognition framework (i.e., the anterior temporal lobe) refers to a progressive degree of abstraction or heteromodal information as you progress through the anatomy of the region (i.e., along the dorsal-to-ventral axis). The authors, on the other hand, seem to refer to 'graded manner' in the context of a correlation of entropy or MI and the change in the difference between Reaction Times (RTs) of semantically congruent vs incongruent gesture-speech. The issue is that the discourse through parts of the introduction and discussion seems to conflate both interpretations, and the ideas in the main text do not correspond to the references they cite. This is not overall very convincing. What is it exactly the authors are arguing about the correlation between RTs and MI indexes? As stated above, their measure of entropy captures the spread of responses, which could also be a measure of item difficulty (more diverse responses imply fewer correct responses, a classic index of difficulty). Capturing the diversity of responses means that items with high entropy scores are also likely to have multiple candidate representations, leading to increased selection pressures. Regions like pMTG and IFG have been widely implicated in difficult semantic processing and increased selection pressures (Jackson et al., 2021). How is this MI correlation evidence of integration that proceeds in a 'graded manner'? The conceptual links between these concepts must be made clearer for the interpretation to be convincing.

      Response 2: Regarding the concern of conceptual equivocation, we would like to emphasize that this study represents the first attempt to focus on the relationship between information quantity and neural engagement, a question addressed in three experiments. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and ERPs to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.

      Therefore, the incremental engagement of the integration hub of IFG and pMTG along with the informativeness of gesture and speech during multisensory integration is different from the "graded hub," which refers to anatomical distribution. We sincerely apologize for this oversight. In the revised manuscript, we have resolved this conceptual equivocation in Lines 44-60: ‘Consensus acknowledges the presence of 'convergence zones' within the temporal and inferior parietal areas[1], or the 'semantic hub' located in the anterior temporal lobe[2], pivotal for integrating, converging, or distilling multimodal inputs. Contemporary theories frame the semantic processing as a dynamic sequence of neural states[3], shaped by systems that are finely tuned to the statistical regularities inherent in sensory inputs[4]. These regularities enable the brain to evaluate, weight, and integrate multisensory information, optimizing the reliability of individual sensory signals[5]. However, sensory inputs available to the brain are often incomplete and uncertain, necessitating adaptive neural adjustments to resolve these ambiguities[6]. In this context, neuronal activity is thought to be linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations[7,8]. Although the role of 'convergence zones' and 'semantic hubs' in integrating multimodal inputs is well established, the precise functional patterns of neural activity in response to the distribution of unified multisensory information, along with the influence of unisensory signals, remain poorly understood.

      To this end, we developed an analytic approach to directly probe the cortical engagement during multisensory gesture-speech semantic integration.’  

      Furthermore, in the Discussion section, we have replaced the term 'graded' with 'incremental' (Line 456). Additionally, we have included a discussion on the progressive nature of neural engagement, as evidenced by the correlation between RTs and MI indices in Lines 483-492: ‘The varying contributions of unisensory gesture-speech information and the convergence of multisensory inputs, as reflected in the correlation between distinct ERP components and TMS time windows (TMS TWs), are consistent with recent models suggesting that multisensory processing involves parallel detection of modality-specific information and hierarchical integration across multiple neural levels[4,48]. These processes are further characterized by coordination across multiple temporal scales[49]. Building on this, the present study offers additional evidence that the multi-level nature of gesture-speech processing is statistically structured, as measured by information matrix of unisensory entropy and multisensory convergence index of MI, the input of either source would activate a distributed representation, resulting in progressively functioning neural responses.’

      References:

      Damasio, H., Grabowski, T.J., Tranel, D., Hichwa, R.D., and Damasio, A.R. (1996). A neural basis for lexical retrieval. Nature 380, 499-505. 10.1038/380499a0.

      Patterson, K., Nestor, P.J., and Rogers, T.T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience 8, 976-987. 10.1038/nrn2277.

      Brennan, J.R., Stabler, E.P., Van Wagenen, S.E., Luh, W.M., and Hale, J.T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157, 81-94. 10.1016/j.bandl.2016.04.008.

      Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.

      Noppeney, U. (2021). Perceptual Inference, Learning, and Attention in a Multisensory World. Annu Rev Neurosci 44, 449-473. 10.1146/annurev-neuro-100120-085519.

      Ma, W.J., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Annu Rev Neurosci 37, 205-220. 10.1146/annurev-neuro-071013-014017.

      Fischer, B.J., and Pena, J.L. (2011). Owl's behavior and neural representation predicted by Bayesian inference. Nat Neurosci 14, 1061-1066. 10.1038/nn.2872.

      Ganguli, D., and Simoncelli, E.P. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput 26, 2103-2134. 10.1162/NECO_a_00638.

      Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.

      Senkowski, D., and Engel, A.K. (2024). Multi-timescale neural dynamics for multisensory integration. Nat Rev Neurosci 25, 625-642. 10.1038/s41583-024-00845-7.

      Reviewer #2 (Recommendations for the authors):

      I have a number of small suggestions to make the paper easier to understand.

      We sincerely thank the reviewer for their careful reading and thoughtful consideration. All suggestions have been thoroughly addressed and incorporated into the revised manuscript.

      (1) Lines 86-87, please clarify whether "chronometric double-pulse TMS" should lead to either excitation or inhibition of neural activities

      Double-pulse TMS elicits inhibition of neural activities (see responses to editors), which has been clarified in the revised manuscript in Lines 90-93: ‘we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI)’

      (2) Line 106 "validated by replicating the semantic congruency effect". Please specify what the task was in the validation study.

      The description of the validation task has been added in Lines 116-119: ‘To validate the stimuli, 30 participants were recruited to replicate the multisensory index of semantic congruency effect, hypothesizing that reaction times for semantically incongruent gesture-speech pairs would be significantly longer than those for congruent pairs.’

      (3) Line 112. "30 subjects". Are they Chinese speakers?

      Yes, all participants in the present study, including those in the pre-tests, are native Chinese speakers.

      (4) Line 122, "responses for each item" Please specify whether you mean here "the comprehensive answer" as you defined in 118-119.

      Yes, and this information has been added in Lines 136-137: ‘comprehensive responses for each item were converted into Shannon's entropy (H)’

      (5) Line 163 "one of three stimulus types (Anodal, Cathodal or Sham)". Please specify whether the order of the three conditions was counterbalanced across participants. Or, whether the order was fixed for all participants.

      The order of the three conditions was counterbalanced across participants. A clearer description has been added in the revised manuscript in Lines 184-189: ‘Participants were divided into two groups, with each group undergoing HD-tDCS stimulation at different target sites (IFG or pMTG). Each participant completed three experimental sessions, spaced one week apart, during which 480 gesture-speech pairs were presented across various conditions. In each session, participants received one of three types of HD-tDCS stimulation: Anodal, Cathodal, or Sham. The order of stimulation site and type was counterbalanced using a Latin square design to control for potential order effects.’
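
      As an illustration of Latin square counterbalancing for the three stimulation types, the sketch below builds a cyclic square and cycles participants through its rows. The function names and participant IDs are hypothetical, and the response does not state which square or assignment rule was actually used.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i positions."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_orders(participant_ids, conditions):
    """Cycle participants through the rows of the square (an assumed assignment rule)."""
    square = latin_square(conditions)
    return {pid: square[i % len(square)] for i, pid in enumerate(participant_ids)}

stimulation_types = ["Anodal", "Cathodal", "Sham"]
orders = assign_orders(range(1, 7), stimulation_types)  # six hypothetical participant IDs
for pid, order in orders.items():
    print(pid, order)
```

      Each stimulation type appears once in every session position across the three rows, which is the property that controls for order effects.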

      (6) Line 191-192, "difference in reaction time between semantic incongruence and semantic congruent pairs)" Here, please specify which reaction time was subtracted from which one. This information is very crucial; without it, you cannot interpret your graphs.

      (17) Figure 3. Figure caption for (A). "The semantic congruence effect was calculated as the reaction time difference between...". You need to specify which condition was subtracted from what condition; otherwise, you cannot interpret this figure. "difference" is too ambiguous.

      Corrections have been made in the revised manuscript in Lines 208-211: ‘Neural responses were quantified based on the effects of HD-tDCS (active tDCS minus sham tDCS) on the semantic congruency effect, defined as the difference in reaction times between semantic incongruent and congruent conditions (Rt(incongruent) - Rt(congruent))’ and Lines 796-798: ‘The semantic congruency effect was calculated as the reaction time (RT) difference between semantically incongruent and semantically congruent pairs (Rt(incongruent) - Rt(congruent))’.
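
      The two subtractions defined above can be made concrete with a short sketch. The reaction times are invented for illustration, the function names are hypothetical, and using the mean as the per-condition summary is an assumption.

```python
from statistics import mean

def congruency_effect(rt_incongruent, rt_congruent):
    """Semantic congruency effect: mean RT(incongruent) - mean RT(congruent), in ms."""
    return mean(rt_incongruent) - mean(rt_congruent)

def tdcs_effect(active, sham):
    """Stimulation effect on the congruency effect: active tDCS minus sham tDCS."""
    return congruency_effect(*active) - congruency_effect(*sham)

# Hypothetical reaction times in ms: (incongruent trials, congruent trials)
sham = ([560, 580, 570], [520, 530, 540])
cathodal = ([545, 555, 550], [525, 535, 530])

print("sham congruency effect:", congruency_effect(*sham))
print("cathodal tDCS effect:", tdcs_effect(cathodal, sham))
```

      With this sign convention, a negative tDCS effect means the congruency effect was smaller under active stimulation than under sham.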

      (7) Line 363 "progressive inhibition of IFG and pMTG by HD-tDCS as the degree of gesture-speech interaction, indexed by MI, advanced." This sentence is very hard to follow. I don't understand what part of the data in Figure 3 speaks to "inhibition of IFG". And what is "HD-tDCS"? I think it is easier to read if you talk about correlation (not "progressive" and "advanced").

      High-Definition transcranial direct current stimulation (HD-tDCS) was applied to modulate the activity of pMTG and IFG, with cathodal stimulation inducing inhibitory effects and anodal stimulation facilitating neural activity. In Figure 3, we examined the relationship between the tDCS effects on pMTG and IFG and the three information matrices (entropy and MI). Our results revealed significant correlations between MI and the cathodal-tDCS effects in both regions. We acknowledge that the original phrasing may have been unclear, and in the revised manuscript, we have provided a more explicit explanation to enhance clarity in Lines 443-445: ‘Our results, for the first time, revealed that the inhibition effect of cathodal-tDCS on the pMTG and IFG correlated with the degree of gesture-speech multisensory convergence, as indexed by MI’.

      (8) Lines 367-368 I don't understand why gesture is top down and speech is bottom up. Is that because gesture precedes speech (gesture is interpretable at the point of speech onset)?

      Yes, since we employed a semantic priming paradigm by aligning speech onset with the gesture comprehension point, we interpret the gesture-speech integration process as an interaction between the top-down prediction from gestures and the bottom-up processing of speech. In the revised manuscript, we have provided a clearer and more coherent description that aligns with the results. Lines 445-449: ‘Moreover, the gradual neural engagement was found to be time-sensitive and staged, as evidenced by the selectively interrupted time windows (Experiment 2) and the distinct correlated ERP components (Experiment 3), which were modulated by different information contributors, including unisensory entropy or multisensory MI’

      (9) Line 380 - 381. Can you spell out "TW" and "IP"?

      (16) Line 448, NIBS, Please spell out "NIBS".

      "TW" has been spelled out in Line 459: ‘time windows (TW)’, and "IP" in Line 460: ‘identification point (IP)’. The term "NIBS" was replaced with "HD-tDCS and TMS" to specify the techniques employed more clearly: ‘Consistent with this, the present study provides robust evidence, through the application of HD-tDCS and TMS, that the integration hubs for gesture and speech, the pMTG and IFG, operate in an incremental manner.’ (Lines 454-457).

      (10) Line 419, The higher certainty of gesture => The higher the certainty of gesture is

      (13) Line 428, "a larger MI" => "a larger MI is"

      (12) Line 427-428, "the larger overlapped neural populations" => "the larger, the overlapped neural populations"

      Changes have been made in Line 522: ‘The higher the certainty of gesture is’, Line 531: ‘a larger MI is’, and Line 530: ‘the larger, overlapped neural populations’.

      (11) Line 423 "Greater TMS effect over the IFG" Can you describe the TMS effect?

      The TMS effect has been described as ‘Greater TMS inhibitory effect’ (Line 526).

      (14) Line 423 "reweighting effect" What is this? Please describe (and say which experiment it is about).

      A clearer description has been provided in Lines 535-538: ‘As speech entropy increases, indicating greater uncertainty in the information provided by speech, more cognitive effort is directed towards selecting the targeted semantic representation. This leads to enhanced involvement of the IFG and a corresponding reduction in LPC amplitude’.

      (15) Line 437 "the graded functionality of every disturbed period is not guaranteed" (I don't understand this sentence).

      A clearer description has been provided in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’

      References:

      Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006

      Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013

      (18) Figure 4. "TW1", "TW2", etc. are not informative. Either replace them with the actual manuscript or add manuscript information (either in the graph itself or in the figure title).

      Information was added to the figure title, ‘Figure 4. TMS impacts on semantic congruency effect across various time windows (TW).’ (Line 804), and a detailed description of each time window was included in Lines 805-807: ‘(A) Five time windows (TWs) showing selective disruption of gesture-speech integration were chosen: TW1 (-120 to -80 ms relative to speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms).’

      (19) Table 2C.

      The last column is titled "p(xi, yi)". I don't understand why the authors use this label for this column.

      In the formula, at the very end, there is "p(xi|yi). I wonder why it is p(xi|yi), as opposed to p(yi|xi).

      Mutual Information (MI) was calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of the individual entropies of gesture and speech (Entropy(gesture) + Entropy(speech)). Thus, p(xi,yi) was intended to describe the entropy of the combined dataset. We acknowledge the potential ambiguity in the original description. In the revised manuscript, we have changed the label p(xi,yi) to ‘p(xi+yi)’ (Line 848) in Table 2C and updated the relevant MI equation accordingly. We have also provided a clear description of the MI calculation in Lines 143-146: ‘MI was used to measure the overlap between gesture and speech information, calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of their individual entropies (Entropy(gesture) + Entropy(speech)) (see Appendix Table 2C)’.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should try and produce data showing that the confound of difficulty due to the number of lexical or semantic representations is not underlying high-entropy items if they wish to improve the credibility of their claim that the disruption of the congruency effect is due to speech-gesture integration. Additionally, they should provide more evidence either in the form of experiments or references to better justify why mutual information is an index for integration in the first place.

      Response 1: An additional analysis has been conducted to assess whether the number of lexical or semantic representations affects the neural outcomes; please see Response 1 in the Responses to Reviewer 3 (public review) for details.

      Mutual information (MI), a concept rooted in information theory, quantifies the reduction in uncertainty about one signal when the other is known, thereby capturing the statistical dependence between them. MI is calculated as the difference between the individual entropies of each signal and their joint entropy, which reflects the total uncertainty when both signals are considered together. This metric aligns with the core principle of multisensory integration: different modalities reduce uncertainty about each other by providing complementary, predictive information. Higher MI values signify that the integration of sensory signals results in a more coherent and unified representation, while lower MI values indicate less integration or greater divergence between the modalities. As such, MI serves as a robust and natural index for assessing the degree of multisensory integration.
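
      For reference, the textbook definition summarized in the paragraph above can be written as follows. Here X and Y are generic labels for the two signals, and the base-2 logarithm (entropy in bits) is an assumption, since the response does not state the base.

```latex
\[
I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y),
\qquad
H(X) = -\sum_{i} p(x_i)\,\log_2 p(x_i),
\qquad
H(X,Y) = -\sum_{i,j} p(x_i, y_j)\,\log_2 p(x_i, y_j)
\]
```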

      To date, the use of MI as an index of integration has been limited, with one notable study by Tremblay et al. (2016), cited in the manuscript, using pointwise MI to quantify the extent to which two syllables mutually constrain each other. While MI has been extensively applied in natural language processing to measure the co-occurrence strength between words (e.g., Lin et al., 2012), its application as an index of multisensory convergence, particularly in the context of gesture-speech integration as employed in this study, is novel. In the revised manuscript, we have clarified the relationship between MI and multisensory convergence: ‘MI assesses shared information between modalities[25], indicating multisensory convergence and acting as an index of gesture-speech integration’ (Lines 73-74).

      Also, in our study, we calculated MI as per its original definition, by subtracting the entropy of the summed gesture-speech dataset from the sum of the individual entropies of gesture and speech. The detailed calculation method is provided in Lines 136-152: ‘To quantify information content, comprehensive responses for each item were converted into Shannon's entropy (H) as a measure of information richness (Figure 1A bottom). With no significant gender differences observed in both gesture (t(20) = 0.21, p = 0.84) and speech (t(20) = 0.52, p = 0.61), responses were aggregated across genders, resulting in 60 answers per item (Appendix Table 2). Here, p(xi) and p(yi) represent the distribution of 60 answers for a given gesture (Appendix Table 2B) and speech (Appendix Table 2A), respectively. High entropy indicates diverse answers, reflecting broad representation, while low entropy suggests focused lexical recognition for a specific item (Figure 2B). MI was used to measure the overlap between gesture and speech information, calculated by subtracting the entropy of the combined gesture-speech dataset (Entropy(gesture + speech)) from the sum of their individual entropies (Entropy(gesture) + Entropy(speech)) (see Appendix Table 2C). For specific gesture-speech combinations, equivalence between the combined entropy and the sum of individual entropies (gesture or speech) indicates absence of overlap in response sets. Conversely, significant overlap, denoted by a considerable number of shared responses between gesture and speech datasets, leads to a noticeable discrepancy between combined entropy and the sum of gesture and speech entropies. Elevated MI values thus signify substantial overlap, indicative of a robust mutual interaction between gesture and speech.’
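
      One way to read the overlap-based calculation quoted above is sketched below. It rests on an assumption that goes beyond the text: the combined term is computed over summed response proportions, p(xi + yi) = p(xi) + p(yi), without renormalization, which is the reading under which non-overlapping response sets give exactly zero. This may not match the authors' exact procedure, and it is not the same quantity as the joint-entropy definition of MI. All names and answer sets are hypothetical.

```python
from collections import Counter
from math import log2

def proportions(answers):
    """Map each distinct answer to its share of all answers for one item."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}

def entropy(p):
    """-sum(p * log2(p)) over the values of a proportion dictionary."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def overlap_mi(gesture_answers, speech_answers):
    """Entropy(gesture) + Entropy(speech) - Entropy(gesture + speech).

    ASSUMPTION: the combined term sums the two proportions for shared
    answers, i.e. p(xi + yi) = p(xi) + p(yi), without renormalising.
    """
    p_g, p_s = proportions(gesture_answers), proportions(speech_answers)
    combined = {a: p_g.get(a, 0.0) + p_s.get(a, 0.0) for a in set(p_g) | set(p_s)}
    return entropy(p_g) + entropy(p_s) - entropy(combined)

# Hypothetical answer sets of 60 responses each (NOT items from the study)
gesture = ["cut"] * 30 + ["saw"] * 20 + ["chop"] * 10
speech_overlapping = ["cut"] * 50 + ["slice"] * 10
speech_disjoint = ["slice"] * 50 + ["carve"] * 10

print(overlap_mi(gesture, speech_overlapping))  # positive: "cut" is shared
print(overlap_mi(gesture, speech_disjoint))     # approximately zero: nothing shared
```

      Under this reading the value grows only through answers that appear in both response sets, which matches the statement that elevated values signify substantial overlap.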

      Additional examples are outlined in Appendix Table 2 (Lines 841-848).

      This novel application of MI as a multisensory convergence index offers new insights into how different sensory modalities interact and integrate to shape semantic processing.

      References:

      Tremblay, P., Deschamps, I., Baroni, M., and Hasson, U. (2016). Neural sensitivity to syllable frequency and mutual information in speech perception and production. Neuroimage 136, 106-121. 10.1016/j.neuroimage.2016.05.018.

      Lin, W., Wu, Y., & Yu, L. (2012). Online Computation of Mutual Information and Word Context Entropy. International Journal of Future Computer and Communication, 167-169.

      (2) Finally, if the authors wish to address the graded hub hypothesis as posited by the controlled semantic cognition framework (e.g., Rice et al., 2015), they would have to stimulate a series of ROIs progressing gradually through the anatomy of their candidate regions showing the effects grow along this spline, more than simply correlate MI with RT differences.

      Response 2: We appreciate the reviewer’s thoughtful consideration. The incremental engagement of the integration hub of IFG and pMTG along with the informativeness of gesture and speech during multisensory integration is different from the concept of "graded hub," which refers to anatomical distribution. See Responses to reviewer 3 (public review) response 2 for details.

      (3) The authors report significant effects with p values as close to the threshold as p=0.049 for the pMTG correlation in Experiment 1, for example. How confident are the authors these results are reliable and not merely their 'statistical luck'? Especially in view of sample sizes that hover around 22-24 participants, which have been called into question in the field of non-invasive brain stimulation (e.g., Mitra et al., 2021)?

      Response 3: In Experiment 1, a total of 52 participants were assigned to two groups, each undergoing HD-tDCS stimulation over either the inferior frontal gyrus (IFG) or posterior middle temporal gyrus (pMTG), yielding 26 participants per group for correlation analysis. Power analysis, conducted using G*Power, indicated that a sample size of 26 participants per group would provide sufficient power (0.8) to detect a large effect size (0.5) at an alpha level of 0.05, justifying the chosen sample size. To control for potential statistical artifacts, we compared the results to those from the unaffected control condition.
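
      The kind of calculation behind such a power statement can be approximated with the Fisher z transformation, as sketched below. This is a rough illustration only: it assumes the effect size of 0.5 is a correlation coefficient and takes n = 26 from the response. The response does not state which test family or number of tails was selected in G*Power, so the values printed here need not match the reported power of 0.8.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(n, r, alpha=0.05, two_sided=True):
    """Approximate power to detect a correlation r with n observations (Fisher z)."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    noncentrality = atanh(r) * sqrt(n - 3)  # expected z statistic under the alternative
    # The negligible probability of rejecting in the wrong tail is ignored
    return normal.cdf(noncentrality - z_crit)

for two_sided in (True, False):
    power = correlation_power(n=26, r=0.5, two_sided=two_sided)
    print(f"two_sided={two_sided}: power ~ {power:.2f}")
```

      Comparing the one-tailed and two-tailed outputs shows how strongly the conclusion depends on settings that the response leaves unspecified.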

      In Experiment 1, participants performed a gender categorization task, in which they responded as accurately and quickly as possible to the gender of the voice they heard, while gender congruency (e.g., a male gesture paired with a male voice or a female gesture with a male voice) was manipulated. This manipulation served as a direct control, enabling the investigation of automatic and implicit semantic interactions between gesture and speech. This information was provided in the manuscript in Lines 167-172: ‘An irrelevant factor of gender congruency (e.g., a man making a gesture combined with a female voice) was created[22,23,35]. This involved aligning the gender of the voice with the corresponding gender of the gesture in either a congruent (e.g., male voice paired with a male gesture) or incongruent (e.g., male voice paired with a female gesture) manner. This approach served as a direct control mechanism, facilitating the investigation of the automatic and implicit semantic interplay between gesture and speech[35]’. Correlation analyses were conducted to examine the tDCS effects on gender congruency, comparing reaction times for gender-incongruent versus congruent trials. No significant correlations were found between the tDCS effects on gender congruency and MI for either the IFG (Cathodal-tDCS effect with MI: r = 0.102, p = 0.677; Anodal-tDCS effect with MI: r = 0.178, p = 0.466) or the pMTG (Cathodal-tDCS effect with MI: r = -0.201, p = 0.410; Anodal-tDCS effect with MI: r = -0.232, p = 0.338).

      Moreover, correlations between the tDCS effect on semantic congruency and gesture entropy, speech entropy, and mutual information (MI) were examined, yielding p-values of 0.290, 0.725, and 0.049, respectively.

      The absence of a tDCS effect on gender congruency, coupled with the lack of significance when correlated with the other information matrices, supports the robustness of the significant finding at p = 0.049.

      (4) The distributions of entropy for gestures and speech are very unequal. Whilst entropy for gestures has high variability (.12-4.3), that of speech is very low (ceiling effect?) with low variance. Can the authors comment on whether they think this might have affected their analyses or results in any way? For example, do they think this could be a problem when calculating MI, which integrates both measures? L130-131.

      Response 4: We sincerely thank the reviewer for raising this insightful question. The core premise of the current study is that brain activity is modulated by the degree of information provided. Accordingly, the 20 entropy values for gesture and speech represent a subset of the overall entropy distribution, with the degree of entropy correlating with a distributed pattern of neural activity, regardless of the scale of variation. This hypothesis aligns with previous studies suggesting that neuronal activity is linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations (Fischer & Pena, 2011; Ganguli & Simoncelli, 2014).

      Importantly, we conducted another EEG experiment with 30 subjects. Given the inherent differences between gesture and speech, it is important to note that speech, being more structurally distinct, tends to exhibit lower variability than gesture. To prevent an imbalance in the distribution of gesture and speech, we manipulated the information content of each modality. Specifically, we created three conditions for both gesture and speech (i.e., 0.75, 1, and 1.25 times the identification threshold), thereby ensuring comparable variance between the two modalities: gesture (mean entropy = 2.91 ± 1.01) and speech (mean entropy = 1.82 ± 0.71) (Author response table 6).

      Full-factorial RSA analysis revealed an early P1 effect (0-100 ms) for gesture and a late LPC effect (734-780 ms) for speech (Author response image 2b). Crucially, the identified clusters showed significant correlations with both gesture (Author response image 2c1) and speech entropy (Author response image 2c3), respectively. These findings replicate the results of the present study, demonstrating that, irrespective of the variance in gesture and speech entropy, both modalities elicited ERP amplitude responses in a progressive manner that aligned with their respective information distributions.

      Regarding the influence on MI values, since MI was calculated based on the overlapping responses between gesture and speech, a reduction in uncertainty during speech comprehension would naturally result in a smaller contribution to the MI value. However, as hypothesized above, the MI values were also assumed to represent a subset of the overall distribution, where the contributions of both gesture and speech are expected to follow a normal distribution. This hypothesis was further supported by our replication experiment. When the contributions of gesture and speech were balanced, a correlation between MI values and N400 amplitude was observed (Author response image 2c2), consistent with the results reported in the present manuscript. These findings not only support the idea that the correlation between MI and ERP components is unaffected by the subset of MI values but also confirm the replicability of our results.

      Author response table 6.

      Quantitative entropy for each gesture stimulus (BD: before discrimination point; DP: discrimination point; AD: after discrimination point) and speech stimulus (BI: before identification point; IP: identification point; AI: after identification point).

      Author response image 2.

      Results of group-level analysis and full-factorial RSA. a: The full-factorial representational similarity analysis (RSA) framework is illustrated schematically. Within the general linear model (GLM), the light green matrix denotes the representational dissimilarity matrix (RDM) for gesture semantic states, the light blue matrix represents speech semantic states, and the light red matrix illustrates the semantic congruency effect. The symbol ‘e’ indicates the random error term. All matrices, including the neural dissimilarity matrix, are structured as 18 × 18 matrices, corresponding to 18 conditions (comprising 3 gesture semantic states, 3 speech semantic states, and 2 congruency conditions). b: Coding strength for gesture states, speech states and congruency effect. Shaded clusters represent regions where each factor exhibited significant effects. Clusters with lower opacity correspond to areas where the grand-mean ERP amplitudes across conditions showed the highest correlation with unimodal entropy or MI. c1-c6: Topographical correlation maps illustrate the four significant RSA clusters (top), accompanied by the highest correlations between ERP amplitudes within the significant RSA clusters and the information matrices (bottom). Black dots represent electrodes exhibiting significant correlations, while black stars highlight the electrode with the highest correlation coefficient.
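      The GLM described in panel a can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the ordering of the 18 conditions, the binary (same/different) coding of the three model RDMs, the ordinary least squares fit on the vectorized upper triangle, and the simulated neural dissimilarities are all hypothetical.

```python
import itertools
import random

# Hypothetical ordering: 3 gesture states x 3 speech states x 2 congruency levels = 18
CONDITIONS = list(itertools.product(range(3), range(3), range(2)))
# Index pairs of the upper triangle of an 18 x 18 matrix (diagonal excluded)
PAIRS = list(itertools.combinations(range(len(CONDITIONS)), 2))


def model_rdm(factor):
    """Vectorized binary RDM: 1 if two conditions differ on `factor`, else 0 (assumed coding)."""
    return [float(CONDITIONS[i][factor] != CONDITIONS[j][factor]) for i, j in PAIRS]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def fit_rsa_glm(neural):
    """OLS fit of: neural = b0 + b1*gesture + b2*speech + b3*congruency + e."""
    columns = [[1.0] * len(PAIRS)] + [model_rdm(f) for f in range(3)]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * y for p, y in zip(ci, neural)) for ci in columns]
    return solve(xtx, xty)  # [intercept, gesture, speech, congruency]


# Simulated neural dissimilarities with known weights plus Gaussian noise (the 'e' term)
rng = random.Random(0)
gesture, speech, congruency = (model_rdm(f) for f in range(3))
neural = [0.5 * g + 0.2 * s + 0.1 * c + rng.gauss(0, 0.01)
          for g, s, c in zip(gesture, speech, congruency)]
betas = fit_rsa_glm(neural)
```

      In a time-resolved analysis, a fit of this kind would be repeated at each time point, and the resulting coefficients would play the role of the coding strength plotted in panel b.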

      (5) L383: Why are the authors calling TW2 pre-lexical and TW6 post-lexical? I believe they must provide evidence or references justifying calling these periods pre- and post-lexical. This seems critical given the argument they're trying to make in this paragraph.

      Response 5: The time windows (TWs) selected for the current study were based on our previous work (Zhao et al., 2021, J. Neurosci). In that study, we employed a double-pulse TMS protocol, delivering stimulation across eight 40-ms time windows: three windows preceding the speech identification point (TWs 1-3) and five windows following it (TWs 4-8). The pre-lexical time windows (TWs 1-3) occur before speech identification, while the post-lexical time windows (TWs 4-8) occur after this point. In the revised manuscript, we have made that clear in Lines 462-466:

      “In TW2 of gesture-speech integration, which precedes the speech identification point[23] and represents a pre-lexical stage, the suppression effect observed in the pMTG was correlated with speech entropy. Conversely, during TW6, which follows the speech identification point[23] and represents a post-lexical stage, the IFG interruption effect was influenced by both gesture entropy, speech entropy, and their MI”

      Reference:

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.

      (6) Below, I recommend the authors improve their description of the criteria employed to select ROIs. This is important for several reasons. For example, the lack of a control ROI presumably not implicated in integration makes the interpretation of the specificity of the results difficult. Additionally, other regions have been proposed more consistently by recent evidence as multimodal integrators, like for example, the angular gyrus (Humphreys, 2021), or the anterior temporal lobe. The inclusion of IFG as a key region for integration and the oversight of angular gyrus seems to me unjustified in the light of recent evidence.

      Response 6: We appreciate the reviewer’s thoughtful consideration. The selection of IFG and pMTG as ROIs was based on a meta-analysis of multiple fMRI studies on gesture-speech integration, in which these two locations were consistently identified as activated. See Table 2 for details of the studies and coordinates of brain locations reported.

      Author response table 7.

      Meta-analysis of previous studies on gesture-speech integration.

      Based on the meta-analysis of previous studies, we selected the IFG and pMTG as ROIs for gesture-speech integration. The rationale for selecting these brain regions is outlined in the introduction in Lines 65-68: ‘Empirical studies have investigated the semantic integration between gesture and speech by manipulating their semantic relationship[15-18] and revealed a mutual interaction between them[19-21] as reflected by the N400 latency and amplitude[14] as well as common neural underpinnings in the left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG)[15,22,23]’.

      And further described in Lines 79-80: ‘Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG’. And Lines 87-90: ‘Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to assess whether the activity of these regions was associated with relevant informational matrices’.

      In the Methods section, we clarified the selection of coordinates in Lines 193-199: ‘Building on a meta-analysis of prior fMRI studies examining gesture-speech integration[22], we targeted Montreal Neurological Institute (MNI) coordinates for the left IFG at (-62, 16, 22) and the pMTG at (-50, -56, 10). In the stimulation protocol for HD-tDCS, the IFG was targeted using electrode F7 as the optimal cortical projection site[36], with four return electrodes placed at AF7, FC5, F9, and FT9. For the pMTG, TP7 was selected as the cortical projection site[36], with return electrodes positioned at C5, P5, T9, and P9.’

      The selection of IFG or pMTG as integration hubs for gesture and speech has also been validated in our previous studies. Specifically, Zhao et al. (2018, J. Neurosci) applied TMS to both areas. Results demonstrated that disrupting neural activity in the IFG or pMTG via TMS selectively impaired the semantic congruency effect (reaction time costs due to semantic incongruence), while leaving the gender congruency effect unaffected. These findings identified the IFG and pMTG as crucial hubs for gesture-speech integration, guiding the selection of brain regions for our subsequent studies.

      In addition, Zhao et al. (2021, J. Neurosci) employed a double-pulse TMS protocol across eight 40-ms time windows to explore the temporal dynamics of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, further supporting the dynamic and temporally staged involvement of these regions in gesture-speech integration.

      While we have a solid rationale for selecting the IFG and pMTG as key regions, we acknowledge the reviewer's point that the involvement of additional functionally and anatomically connected brain areas cannot be excluded. We have noted this as a limitation in the discussion in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’

      References:

      Willems, R.M., Ozyurek, A., and Hagoort, P. (2009). Differential roles for left inferior frontal and superior temporal cortex in multimodal integration of action and language. Neuroimage 47, 1992-2004. 10.1016/j.neuroimage.2009.05.066.

      Drijvers, L., Jensen, O., and Spaak, E. (2021). Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Human Brain Mapping 42, 1138-1152. 10.1002/hbm.25282.

      Drijvers, L., and Ozyurek, A. (2018). Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditions. Brain and Language 177, 7-17. 10.1016/j.bandl.2018.01.003.

      Drijvers, L., van der Plas, M., Ozyurek, A., and Jensen, O. (2019). Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noise. Neuroimage 194, 55-67. 10.1016/j.neuroimage.2019.03.032.

      Holle, H., and Gunter, T.C. (2007). The role of iconic gestures in speech disambiguation: ERP evidence. J Cognitive Neurosci 19, 1175-1192. 10.1162/jocn.2007.19.7.1175.

      Kita, S., and Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48, 16-32. 10.1016/S0749-596x(02)00505-3.

      Bernardis, P., and Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia 44, 178-190. 10.1016/j.neuropsychologia.2005.05.007.

      Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.

      Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.

      Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016.

      Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006

      Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013

      (7) Some writing is obscure or unclear, in part due to superfluous words like 'intricate neural processes' on L74. Or the sentence in L47 - 48 about 'quantitatively functional mental states defined by a specific parser unified by statistical regularities' which, even read in context, fails to provide clarity about what a quantitatively functional mental state is, or how it is defined by specific parsers (or what these are), and what is the link to statistical regularities. In some cases, this lack of clarity leads to difficulties assessing the appropriateness of the methods, or the exact nature of the claims. For example, do they mean degree of comprehension instead of comprehensive value? I provide some more examples below:

      Response 7: We appreciate the reviewer’s thoughtful consideration. The revised manuscript now includes a clear description and a detailed explanation of the association with the statistical logic, addressing the concerns raised in Lines 47-55: ‘Contemporary theories frame the semantic processing as a dynamic sequence of neural states[3], shaped by systems that are finely tuned to the statistical regularities inherent in sensory inputs[4]. These regularities enable the brain to evaluate, weight, and integrate multisensory information, optimizing the reliability of individual sensory signals [5]. However, sensory inputs available to the brain are often incomplete and uncertain, necessitating adaptive neural adjustments to resolve these ambiguities[6]. In this context, neuronal activity is thought to be linked to the probability density of sensory information, with higher levels of uncertainty resulting in the engagement of a broader population of neurons, thereby reflecting the brain’s adaptive capacity to handle diverse possible interpretations[7,8].’

      References:

      Brennan, J.R., Stabler, E.P., Van Wagenen, S.E., Luh, W.M., and Hale, J.T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157, 81-94. 10.1016/j.bandl.2016.04.008.

      Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.

      Noppeney, U. (2021). Perceptual Inference, Learning, and Attention in a Multisensory World. Annual Review of Neuroscience, Vol 44, 2021 44, 449-473. 10.1146/annurev-neuro-100120-085519.

      Ma, W.J., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Annu Rev Neurosci 37, 205-220. 10.1146/annurev-neuro-071013-014017.

      Fischer, B.J., and Pena, J.L. (2011). Owl's behavior and neural representation predicted by Bayesian inference. Nat Neurosci 14, 1061-1066. 10.1038/nn.2872.

      Ganguli, D., and Simoncelli, E.P. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput 26, 2103-2134. 10.1162/NECO_a_00638.

      Comment 7.1: a) I am not too sure what they mean by 'response consistently provided by participants for four to six consecutive instances' [L117-118]. They should be clearer with the description of these 'pre-test' study methods.

      Response 7.1: Thank you for this insightful question. An example of a participant's response to the gesture 'an' is provided below (Table 3). Initially, within 240 ms, the participant provided the answer "an," which could potentially be a guess. To ensure that the participant truly comprehends the gesture, we repeatedly present it until the participant’s response stabilizes, meaning the same answer is given consistently over several trials. While one might consider fixing the number of repetitions (e.g., six trials), this could lead to participants predicting the rule and providing the same answer out of habit. To mitigate this potential bias, we allow the number of repetitions to vary flexibly between four and six trials. 

      We understand that the initial phrase might be ambiguous. In the revised manuscript, we have changed it to: ‘For each gesture or speech, the action verb consistently provided by participants across four to six consecutive repetitions—with the number of repetitions varied to mitigate learning effects—was considered the comprehensive response for the gesture or speech.’ (Lines 130-133)

      Author response table 8.

      Example of participant's response to the gesture 'an'

      Comment 7.2: b) I do not understand the paragraph in L143 - 146. This is important to rephrase for clarification. What are 'stepped' neural changes? What is the purpose of 'aggregating' neural responses with identical entropy / MI values?

      Response 7.2: It is important to note that the 20 stimuli exhibit 20 increments of gesture entropy values, 11 increments of speech entropy values, and 19 increments of mutual information values (Appendix Table 3). This discrepancy arises from the calculation of entropy and mutual information, where the distributions were derived from the comprehensive set of responses contributed by all 30 participants. As a result, these values were impacted not only by the distinct nameabilities of the stimuli but also by the entirety of responses provided. Consequently, in the context of speech entropy, 9 items have a nameability of 1, signifying unanimous comprehension among all 30 participants and resulting in an entropy of 0. Moreover, stimuli 'ning' and 'jiao' share an identical distribution, leading to an entropy of 0.63 for both. Regarding MI, a value of 0.66 is computed for the combinations of stimuli 'sao' (gesture entropy: 4.01, speech entropy: 1.12, Author response image 3) and 'tui' (gesture entropy: 1.62, speech entropy: 0, Author response image 4). This indicates that these two sets of stimuli manifest an equivalent degree of integration.

      Author response image 3.

      Example of gesture answers (gesture sao), speech answers (speech sao), and mutual information (MI) for the ‘sao’ item

      Author response image 4.

      Example of gesture answers (gesture tui), speech answers (speech tui), and mutual information (MI) for the ‘tui’ item
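      The link between the distribution of participants' answers and entropy can be illustrated with a short sketch. It assumes Shannon entropy in bits (base-2 logarithm), which is not stated in this response, and the response sets below are hypothetical rather than the study's data. MI is not computed here because its exact formula is not given in this response.

```python
from collections import Counter
from math import log2


def shannon_entropy(responses):
    """Shannon entropy (in bits, an assumed unit) of the distribution of naming responses."""
    counts = Counter(responses)
    total = len(responses)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# Hypothetical answers from 30 participants (illustrative only, not the study's data)
unanimous = ["tui"] * 30                          # every participant gives the same verb
split = ["sao"] * 15 + ["ca"] * 10 + ["mo"] * 5   # three competing answers

entropy_unanimous = shannon_entropy(unanimous)    # unanimous naming carries no uncertainty
entropy_split = shannon_entropy(split)            # more dispersed answers give higher entropy
```

      Two stimuli whose answers are distributed identically receive the same entropy regardless of which verbs were produced, which is why 20 stimuli can yield fewer than 20 distinct entropy values.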

      To precisely assess whether lower entropy/MI corresponds to a smaller or larger neural response, neural responses (ERP amplitude or TMS inhibition effect) with identical entropy or MI values were averaged before undergoing correlational analysis. We understand that the original phrasing might be ambiguous and have clarified the description in the revised manuscript in Lines 157-160: ‘To determine whether entropy or MI values corresponds to distinct neural changes, the current study first aggregated neural responses (including inhibition effects of tDCS and TMS or ERP amplitudes) that shared identical entropy or MI values, prior to conducting correlational analyses.’
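      The aggregation step described above can be sketched as follows. The data are hypothetical, and the choice of a Pearson coefficient is an assumption made for illustration; this response does not specify which correlation measure was used.

```python
from collections import defaultdict
from math import sqrt


def aggregate_by_value(info_values, neural_responses):
    """Average neural responses over stimuli that share an identical entropy or MI value."""
    groups = defaultdict(list)
    for value, response in zip(info_values, neural_responses):
        groups[value].append(response)
    xs = sorted(groups)
    ys = [sum(groups[x]) / len(groups[x]) for x in xs]
    return xs, ys


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed measure, for illustration)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: five stimuli, two of which share a speech entropy of 0
speech_entropy = [0.0, 0.0, 0.63, 1.12, 1.5]
erp_amplitude = [1.0, 3.0, 2.5, 3.5, 4.0]

xs, ys = aggregate_by_value(speech_entropy, erp_amplitude)  # four unique entropy values remain
r = pearson_r(xs, ys)
```

      Averaging first means each distinct entropy or MI value contributes a single point to the correlation, so stimuli with tied values do not weight that value more heavily.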

      Comment 7.3: c) The paragraph in L160-171 is confusing. Is it an attempt to give an overview of all three experiments? If so, consider moving to the end or summarising what each experiment is at the beginning of the paragraph giving it a name (i.e., TMS). Without that, it is unclear what each experiment is counterbalancing or what 'stimulation site' refers to, for example, leading to a significant lack of clarity.

      Response 7.3: We are sorry for the ambiguity. In the revised manuscript, we have moved the relevant phrasing to the beginning of each experiment.

      ‘Experiment 1: HD-tDCS protocol and data analysis

      Participants were divided into two groups, with each group undergoing HD-tDCS stimulation at different target sites (IFG or pMTG). Each participant completed three experimental sessions, spaced one week apart, during which 480 gesture-speech pairs were presented across various conditions. In each session, participants received one of three types of HD-tDCS stimulation: Anodal, Cathodal, or Sham. The order of stimulation site and type was counterbalanced using a Latin square design to control for potential order effects’ (Lines 183-189)

      ‘Experiment 2: TMS protocol and data analysis

      Experiment 2 involved 800 gesture-speech pairs, presented across 15 blocks over three days, with one week between sessions. Stimulation was administered at three different sites (IFG, pMTG, or Vertex). Within the time windows (TWs) spanning the gesture-speech integration period, five TWs that exhibited selective disruption of integration were selected: TW1 (-120 to -80 ms relative to the speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms)[23] (Figure 1C). The order of stimulation site and TW was counterbalanced using a Latin square design.’ (Lines 223-230)

      ‘Experiment 3: Electroencephalogram (EEG) recording and data analysis

      Experiment 3, comprising a total of 1760 gesture-speech pairs, was completed in a single-day session.’ (Lines 249-250)
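      The Latin square counterbalancing mentioned for Experiments 1 and 2 can be illustrated with a cyclic construction. This is a sketch under assumptions: the excerpts above do not say which Latin square was used or how participants were assigned to its rows, so the function and the assignment below are hypothetical.

```python
def cyclic_latin_square(conditions):
    """Build a Latin square in which row i is the condition list rotated left by i positions."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


# Stimulation types from Experiment 1; each row is one possible session order
stimulation = ["Anodal", "Cathodal", "Sham"]
orders = cyclic_latin_square(stimulation)

# Hypothetical assignment: participants cycle through the rows in turn
for participant in range(1, 7):
    order = orders[(participant - 1) % len(orders)]
    print(participant, order)
```

      Every condition appears exactly once in each row and each column, so across participants each stimulation type occupies each session position equally often.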

      Comment 7.4: d) L402-406: This sentence is not clear. What do the authors mean by 'the state of [the neural landscape] constructs gradually as measured by entropy and MI'? How does this construct a neural landscape? The authors must rephrase this paragraph using clearer language since in its current state it is very difficult to assess whether it is supported by the evidence they present.

      Response 7.4: We are sorry for the ambiguity. In the revised manuscript, we have provided a clearer description in Lines 483-492: ‘The varying contributions of unisensory gesture-speech information and the convergence of multisensory inputs, as reflected in the correlation between distinct ERP components and TMS time windows (TMS TWs), are consistent with recent models suggesting that multisensory processing involves parallel detection of modality-specific information and hierarchical integration across multiple neural levels[4,48]. These processes are further characterized by coordination across multiple temporal scales[49]. Building on this, the present study offers additional evidence that the multi-level nature of gesture-speech processing is statistically structured, as measured by information matrix of unisensory entropy and multisensory convergence index of MI, the input of either source would activate a distributed representation, resulting in progressively functioning neural responses’

      References:

      Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.

      Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.

      Senkowski, D., and Engel, A.K. (2024). Multi-timescale neural dynamics for multisensory integration. Nat Rev Neurosci 25, 625-642. 10.1038/s41583-024-00845-7.

      (8) Some writing suffers from conceptual equivocation. For example, the link between 'multimodal representation' and gesture as a type of multimodal extralinguistic information is not straightforward. What 'multimodal representations' usually refer to in semantic cognition is not the co-occurrence of gesture and speech, but the different sources or modalities that inform the structure of a semantic representation or concept (not the fact we use another modality vision to perceive gestures that enrich the linguistic auditory communication of said concepts). See also my comment in the public review regarding the conceptual conflation of the graded hub hypothesis.

      Response 8: We aimed to clarify that the integration of gesture and speech, along with the unified representation it entails, is not merely a process whereby perceived gestures enhance speech comprehension. Rather, there exists a bidirectional influence between these two modalities, affecting both their external forms (Bernardis et al., 2006) and their semantic content (Kita et al., 2003; Kelly et al., 2010). Given that multisensory processing is recognized as an interplay of both top-down and bottom-up mechanisms, we hypothesize that this bidirectional semantic influence between gesture and speech operates similarly. Consequently, we recorded neural responses—specifically the inhibitory effects observed through TMS/tDCS or ERP components—beginning at the onset of speech, which marks the moment when both modalities are accessible.

      We prioritize gesture for two primary reasons. Firstly, from a naturalistic perspective, speech and gesture are temporally aligned; gestures typically precede their corresponding speech segments by less than one second (Morrel-Samuels and Krauss, 1992). This temporal alignment has prompted extensive research aimed at identifying the time windows during which integration occurs (Obermeier et al., 2011, 2015). Results indicate that local integration of gesture and speech occurs within a time frame extending from -200 ms to +120 ms relative to gesture-speech alignment, where -200 ms indicates that gestures occur 200 ms before speech onset, and +120 ms signifies gestures occurring after the identification point of speech.

      Secondly, in our previous study (Zhao, 2023), we investigated this phenomenon by manipulating gesture-speech alignment across two conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Notably, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures serve a semantic priming function for co-occurring speech.

      We recognize that our previous use of the term "co-occurring speech" may have led to ambiguity. Therefore, in the revised manuscript, we have replaced those sentences with a detailed description of the properties of each modality in Lines 60-62: ‘Even though gestures convey information in a global-synthetic way, while speech conveys information in a linear segmented way, there exists a bidirectional semantic influence between the two modalities[9,10]’

      Conceptual conflation of the graded hub hypothesis has been clarified in the Response to Reviewer 3 (public review) response 2.

      References:

      Bernardis, P., & Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia, 44(2), 178-190

      Kelly, S. D., Ozyurek, A., & Maris, E. (2010b). Two sides of the same coin: speech and gesture mutually interact to enhance comprehension. Psychological Science, 21(2), 260-267. doi:10.1177/0956797609357327

      Kita, S., & Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16-32. doi:10.1016/s0749-596x(02)00505-3

      Obermeier, C., & Gunter, T. C. (2015). Multisensory Integration: The Case of a Time Window of Gesture-Speech Integration. Journal of Cognitive Neuroscience, 27(2), 292-307. doi:10.1162/jocn_a_00688

      Obermeier, C., Holle, H., & Gunter, T. C. (2011). What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help. Journal of Cognitive Neuroscience, 23(7), 1648-1663. doi:10.1162/jocn.2010.21498

      Morrel-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3), 615-622. doi:10.1037/0278-7393.18.3.615

      Hostetter, A., and Mainela-Arnold, E. (2015). Gestures occur with spatial and Motoric knowledge: It's more than just coincidence. Perspectives on Language Learning and Education 22, 42-49. doi:10.1044/lle22.2.42.

      McNeill, D. (2005). Gesture and thought (University of Chicago Press). 10.7208/chicago/9780226514642.001.0001.

      Zhao, W. (2023). TMS reveals a two-stage priming circuit of gesture-speech integration. Front Psychol 14, 1156087. 10.3389/fpsyg.2023.1156087.

      (9) The last paragraph of the introduction lacks a conductive thread. The authors describe three experiments without guiding the reader through a connecting thread underlying the experiments. Feels more like three disconnected studies than a targeted multi-experiment approach to solve a problem. What is each experiment contributing to? What is the 'grand question' or thread unifying these?

      Response 9: The present study introduced three experiments to explore the neural activity linked to the amount of information processed during multisensory gesture-speech integration. In Experiment 1, we observed that the extent of inhibition in the pMTG and LIFG was closely linked to the overlapping gesture-speech responses, as quantified by mutual information. Building on the established roles of the pMTG and LIFG in our previous study (Zhao et al., 2021, JN), we then expanded our investigation to determine whether the dynamic neural engagement between the pMTG and LIFG during gesture-speech processing was also associated with the quality of the information. This hypothesis was further validated through high-temporal resolution EEG, where we examined ERP components related to varying information qualities. Notably, we observed a close time alignment between the ERP components and the time windows of the TMS effects, which were associated with the same informational matrices in gesture-speech processing.

      Linkage of the three experiments has been clarified in the introduction in Lines 75-102: ‘

      To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibitory (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measured by MI.

      Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).

      Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging high-temporal event-related potentials (ERPs). This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics’

      References:

      Bikson, M., Inoue, M., Akiyama, H., Deans, J.K., Fox, J.E., Miyakawa, H., and Jefferys, J.G.R. (2004). Effects of uniform extracellular DC electric fields on excitability in rat hippocampal slices. J Physiol-London 557, 175-190. 10.1113/jphysiol.2003.055772.

      Federmeier, K.D., Mai, H., and Kutas, M. (2005). Both sides get the point: hemispheric sensitivities to sentential constraint. Memory & Cognition 33, 871-886. 10.3758/bf03193082.

      Kelly, S.D., Kravitz, C., and Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language 89, 253-260. 10.1016/s0093-934x(03)00335-3.

      Wu, Y.C., and Coulson, S. (2005). Meaningful gestures: Electrophysiological indices of iconic gesture comprehension. Psychophysiology 42, 654-667. 10.1111/j.1469-8986.2005.00356.x.

      Fritz, I., Kita, S., Littlemore, J., and Krott, A. (2021). Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliates. J Mem Lang 117, 104191. 10.1016/j.jml.2020.104191.

      Gunter, T.C., and Weinbrenner, J.E.D. (2017). When to take a gesture seriously: On how we use and prioritize communicative cues. J Cognitive Neurosci 29, 1355-1367. 10.1162/jocn_a_01125.

      Ozyurek, A., Willems, R.M., Kita, S., and Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. J Cognitive Neurosci 19, 605-616. 10.1162/jocn.2007.19.4.605.

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.

      (10) The authors should provide a clearer figure to appreciate their paradigm, illustrating clearly the stimulus presentation (gesture and speech).

      Response 10: To reduce ambiguity, unnecessary arrows were deleted from Figure 1.

      Comment 11.1: (11) Required methodological clarifications to better assess the strength of the evidence presented:

      a) Were the exclusion criteria only handedness and vision? Did the authors exclude based on neurological and psychiatric disorders? Psychoactive drugs? If not, do they think the lack of these exclusion criteria might have influenced their results?

      Response 11.1: Upon registration, each participant is required to complete a questionnaire alongside the consent form and handedness questionnaire. This procedure is designed to exclude individuals with potential neurological or psychiatric disorders, as well as other factors that may affect their mental state or reaction times. Consequently, all participants reported in the manuscript do not have any of the aforementioned neurological or psychiatric disorders. The questionnaire is attached below:

      Author response image 4.

      Comment 11.2: b) Are the subjects from the pre-tests (L112-113) and the replication study (L107) a separate sample or did they take part in Experiments 1-3?

      Response 11.2: The participants in each pre-test and experiment were independent, resulting in a total of 188 subjects. Since the stimuli utilized in this study were previously validated and reported (Zhao et al., 2021), the 90 subjects who participated in the three pre-tests are not included in the final count for the current study, leaving a total of 98 participants reported in the manuscript in Lines 103-104: ‘Ninety-eight young Chinese participants signed written informed consent forms and took part in the present study’.

      Comment 11.3: c) L176. The authors should explain how they selected ROIs. This is very important for the reasons outlined above.

      Response 11.3: Please see Response to Comment 6 for details.

      Comment 11.4: d) The rationale for Experiment 1 and its analysis approach should be explicitly described. Why perform Pearson correlations? What is the conceptual explanation of the semantic congruency effect and why should it be expected to correlate with the three information-theoretic metrics? What effects could the authors expect to find and what would they mean? There is a brief description in L187-195 but it is unclear.

      Response 11.4: We thank the reviewer for their rigorous consideration. The semantic congruency effect is widely used as an index of multisensory integration. Therefore, the effects of HD-tDCS on the IFG and pMTG, as measured by changes in the semantic congruency effect, serve as an indicator of altered neural responses to multisensory integration. In correlating these changes with behavioral indices of information degree, we aimed to assess whether the integration hubs (IFG and pMTG) function progressively during multisensory gesture-speech integration. The rationale for using Pearson correlations is based on the hypothesis that the 20 sets of stimuli used in this study represent a sample from a normally distributed population. Thus, even with changes in the sample (e.g., using another 20 values), the gradual relationship between neural responses and the degree of information would remain unchanged. This hypothesis is supported by the findings from another experiment (see details in Response to Comment 4).

      In the revised manuscript, we have provided a clear description of the rationale for Experiment 1 in Lines 206-219: ‘To examine the relationship between the degree of information and neural responses, we conducted Pearson correlation analyses using a sample of 20 sets. Neural responses were quantified based on the effects of HD-tDCS (active tDCS minus sham tDCS) on the semantic congruency effect, defined as the difference in reaction times between semantic incongruent and congruent conditions (Rt(incongruent) - Rt(congruent)). This effect served as an index of multisensory integration[35] within the left IFG and pMTG. The variation in information was assessed using three information-theoretic metrics. To account for potential confounds related to multiple candidate representations, we conducted partial correlation analyses between the tDCS effects and gesture entropy, speech entropy, and MI, controlling for the number of responses provided for each gesture and speech, as well as the total number of combined responses. Given that HD-tDCS induces overall disruption at the targeted brain regions, we hypothesized that the neural activity within the left IFG and pMTG would be progressively affected by varying levels of multisensory convergence, as indexed by MI.’
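
The partial correlation described above can be illustrated with a minimal sketch. It assumes the common residualization approach (regress both variables on the covariates, then correlate the residuals); the response does not state which implementation was used, and the function name `partial_corr` and all data below are hypothetical.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Pearson correlation between x and y after regressing out the covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(covariates, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    # Design matrix with an intercept column plus the covariates.
    design = np.column_stack([np.ones(len(x)), z])
    # Residuals of x and y after an ordinary least-squares fit on the design.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Hypothetical example: 20 stimulus sets, three response-count covariates
# (gesture responses, speech responses, combined responses).
rng = np.random.default_rng(0)
n_sets = 20
response_counts = rng.integers(3, 12, size=(n_sets, 3)).astype(float)
mi = 0.1 * response_counts[:, 2] + rng.normal(size=n_sets)
tdcs_effect = -5.0 * mi + 2.0 * response_counts[:, 0] + rng.normal(size=n_sets)

r_partial = partial_corr(tdcs_effect, mi, response_counts)
```

The sketch returns only the coefficient; a significance test would additionally need the degrees of freedom reduced by the number of covariates.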

      Additionally, in the introduction, we have rephrased the relevant rationale in Lines 75-86: ‘To investigate the neural mechanisms underlying gesture-speech integration, we conducted three experiments to assess how neural activity correlates with distributed multisensory integration, quantified using information-theoretic measures of MI. Additionally, we examined the contributions of unisensory signals in this process, quantified through unisensory entropy. Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation[26], thereby increasing or decreasing cortical excitability in the targeted brain area, respectively. This experiment aimed to determine whether the overall facilitation (Anodal-tDCS minus Sham-tDCS) and/or inhibition (Cathodal-tDCS minus Sham-tDCS) of these integration hubs is modulated by the degree of gesture-speech integration, as measured by MI.’

      Reference:

      Kelly, S.D., Creigh, P., and Bartolotti, J. (2010). Integrating speech and iconic gestures in a Stroop-like task: Evidence for automatic processing. Journal of Cognitive Neuroscience 22, 683-694. 10.1162/jocn.2009.21254.

      Comment 11.5: e) The authors do not mention in the methods if FDR correction was applied to the Pearson correlations in Experiment 1. There is a mention in the Results Figure, but it is unclear if it was applied consistently. Can the authors confirm, and explicitly state the way they carried out FDR correction for this family of tests in Experiment 1? This is especially important in the light of some of their results having a p-value of p=.049.

      Response 11.5: FDR correction was applied to Experiment 1, and all reported p-values were corrected using this method. In the revised manuscript, we have included a reference to FDR correction in Lines 221-222: ‘False discovery rate (FDR) correction was applied for multiple comparisons.’

      In Experiment 1, since two separate participant groups (each N = 26) were recruited for the HD-tDCS over either the IFG or pMTG, FDR correction was performed separately for each group. Therefore, for each brain region, six comparisons (three information matrices × two tDCS effects: anodal-sham or cathodal-sham) were submitted for FDR correction.

      In Experiment 2, six comparisons (three information matrices × two sites: IFG or pMTG) were submitted for FDR correction. In Experiment 3, FDR correction was applied to the seven regions of interest (ROIs) within each component, resulting in five comparisons.
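
As an illustration of correcting one family of six comparisons, the sketch below implements the Benjamini-Hochberg procedure. This is an assumption: the response says only "FDR correction" and does not name the procedure. The function name `fdr_bh` and the p-values are hypothetical, not results from the study.

```python
import numpy as np

def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Hypothetical family: three information metrics x two tDCS contrasts.
raw_p = [0.01, 0.04, 0.03, 0.005, 0.2, 0.5]
adjusted_p = fdr_bh(raw_p)
significant = adjusted_p < 0.05
```

Running the correction separately per participant group, as described above, means calling the function once per family rather than pooling all p-values.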

      The confidence of a p-value of 0.049 was clarified in Response to Comment 3.

      Comment 11.6: f) L200. What does the abbreviation 'TW' stand for in this paragraph? When was it introduced in the main text? The description is in the Figure, but it should be moved to the main text.

      Comment 11.7: g) How were the TWs chosen? Is it the criterion in L201-203? If so, it should be moved to the start of the paragraph. What does the word 'selected' refer to in that description? Selected for what? The explanation seems to be in the Figure, but it should be in the main text. It is still not a complete explanation. What were the criteria for assigning TWs to the IFG or pMTG?

      Response 11.6 & 11.7: Since the two comments are related, we will provide a synthesized response. 'TW' refers to time window, the selection of which was based on our previous study (Zhao et al., 2021, J. Neurosci). In Zhao et al. (2021), we employed the same experimental protocol—using inhibitory double-pulse transcranial magnetic stimulation (TMS) over the IFG and pMTG in one of eight 40-ms time windows relative to the speech identification point (IP; the minimal length of lexical speech), with three time windows before the speech IP and five after. Based on this previous work, we believe that these time windows encompass the potential gesture-speech integration process. Results demonstrated a time-window-selective disruption of the semantic congruency effect (i.e., reaction time costs driven by semantic conflict), with no significant modulation of the gender congruency effect (i.e., reaction time costs due to gender conflict), when stimulating the left pMTG in TW1, TW2, and TW7, and when stimulating the left IFG in TW3 and TW6. Based on these findings, the present study selected the five time windows that showed a selective disruption effect during gesture-speech integration.

      Note that in the present study, we applied stimulation to both the IFG and pMTG across all five time windows, and further correlated the TMS disruption effects with the three information matrices.

      We recognize that the rationale for the choice of time windows was not sufficiently explained in the original manuscript. In the revised manuscript, we have added the relevant description in Lines 223-228: ‘Stimulation was administered at three different sites (IFG, pMTG, or Vertex). Within the time windows (TWs) spanning the gesture-speech integration period, five TWs that exhibited selective disruption of integration were selected: TW1 (-120 to -80 ms relative to the speech identification point), TW2 (-80 to -40 ms), TW3 (-40 to 0 ms), TW6 (80 to 120 ms), and TW7 (120 to 160 ms)[23] (Figure 1C). The order of stimulation site and TW was counterbalanced using a Latin square design.’
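
The time-window grid can be written out explicitly. The sketch below assumes a regular 40-ms grid starting 120 ms before the speech identification point, which reproduces the five stated windows; the bounds it gives for TW4, TW5, and TW8 are inferred from that grid, and the helper `tw_bounds` is hypothetical.

```python
def tw_bounds(k, width_ms=40, first_start_ms=-120):
    """Return (start, end) in ms relative to the speech identification point."""
    if not 1 <= k <= 8:
        raise ValueError("time window index must be between 1 and 8")
    start = first_start_ms + (k - 1) * width_ms
    return start, start + width_ms

# The five windows selected for the present study.
selected = {f"TW{k}": tw_bounds(k) for k in (1, 2, 3, 6, 7)}
```

On this grid, TW1 to TW3 fall before the identification point and TW4 to TW8 after it, matching the "three before, five after" layout of the earlier protocol.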

      Comment 11.8: h) Again, the rationale for the Pearson correlations of semantic congruency with information-theoretic metrics should be explicitly outlined. What is this conceptually?

      Response 11.8: Given that the rationale behind Experiment 1 and Experiment 2 is similar—both investigating the correlation between interrupted neural effects and the degree of information—we believe that the introduction of the Pearson correlation between semantic congruency and information-theoretic metrics, as presented in Experiment 1 (see Response to Comment 11.4 for details), is sufficient for both experiments.

      Comment 11.9: i) What does 'gesture stoke' mean in the Figure referring to Experiment 3? Figure 1D is not clear. What are the arrows referring to?

      Response 11.9: According to McNeill (1992), gesture phases differ based on whether the gesture depicts imagery. Iconic and metaphoric gestures are imagistic and typically consist of three phases: a preparation phase, a stroke phase, and a retraction phase. Figure 4 provides an example of these three phases using the gesture ‘break’. In the preparation phase, the hand and arm move away from their resting position to a location in gesture space where the stroke begins. As illustrated in the first row of Figure 4, during the preparation phase of the ‘break’ gesture, the hands, initially in a fist and positioned downward, rise to a center-front position. In the stroke phase, the meaning of the gesture is conveyed. This phase occurs in the central gesture space and is synchronized with the linguistic segments it co-expresses. For example, in the stroke phase of the ‘break’ gesture (second row of Figure 4), the two fists move 90 degrees outward before returning to a face-down position. The retraction phase involves the return of the hand from the stroke position to the rest position. In the case of the ‘break’ gesture, this involves moving the fists from the center front back into the resting position (see third row of Figure 4).

      Therefore, in studies examining gesture-speech integration, gestures are typically analyzed starting from the stroke phase (Habets et al., 2011; Kelly et al., 2010), a convention also adopted in our previous studies (Zhao et al., 2018, 2021, 2023). We acknowledge that this should be explained explicitly, and in the revised manuscript, we have added the following clarification in Lines 162-166: ‘Given that gestures induce a semantic priming effect on concurrent speech[33], this study utilized a semantic priming paradigm in which speech onset was aligned with the DP of each gesture[23,33], the point at which the gesture transitions into a lexical form[34]. The gesture itself began at the stroke phase, a critical moment when the gesture conveys its primary semantic content[34].’

      Additionally, Figure 1 has been revised in the manuscript to eliminate ambiguous arrows (see Response 10 for details).

      Author response image 5.

      An illustration of the gesture phases of the 'break' gesture.

      References:

      Habets, B., Kita, S., Shao, Z. S., Ozyurek, A., & Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. Journal of Cognitive Neuroscience, 23(8), 1845-1854. doi:10.1162/jocn.2010.21462

      Kelly, S. D., Creigh, P., & Bartolotti, J. (2010). Integrating Speech and Iconic Gestures in a Stroop-like Task: Evidence for Automatic Processing. Journal of Cognitive Neuroscience, 22(4), 683-694. doi:10.1162/jocn.2009.21254

      Comment 11.10: j) L236-237: "Consequently, four ERP components were predetermined" is very confusing. Were these components predetermined? Or were they determined as a consequence of the comparison between the higher and lower halves for the IT metrics described above in the same paragraph? The description of the methods is not clear.

      Response 11.10: The components selected were based on a comparison between the higher and lower halves of the information metrics. By stating that these components were predetermined, we aimed to emphasize that the components used in our study are consistent with those identified in previous research on semantic processing. We acknowledge that the phrasing may have been unclear, and in the revised manuscript, we have provided a more explicit description in Lines 267-276: ‘To consolidate the data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work[40], and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons.

      For the traditional ROI analysis, grand-average ERPs at electrode Cz were compared between the higher (≥50%) and lower (<50%) halves for gesture entropy (Figure 5A1), speech entropy (Figure 5B1), and MI (Figure 5C1). Consequently, four ERP components were determined: the P1 effect observed within the time window of 0-100 ms[27,28], the N1-P2 effect observed between 150-250ms[27,28], the N400 within the interval of 250-450ms[14,28,29], and the LPC spanning from 550-1000ms[30,31].’

      Reference: Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462.

      (12) In the Results section for Experiment 2 (L292-295), it is not clear what the authors mean when they mention that a more negative TMS effect represents a stronger interruption of the integration effect. If I understand correctly, the correlation reported for pMTG was for speech entropy, which does not represent integration (that would be MI).

      Response 12: Since the TMS effect was defined as active TMS minus Vertex TMS, the inhibitory TMS effect is inherently negative. A greater inhibitory TMS effect corresponds to a larger negative value, such that a more negative TMS effect indicates a stronger disruption of the integration process. We acknowledge that the previous phrasing was somewhat ambiguous. In the revised manuscript, we have rephrased the sentence as follows: ‘a larger negative TMS effect signifies a greater disruption of the integration process’ (Lines 342-343).
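
The sign convention can be made concrete with a small worked sketch. It assumes the TMS effect is computed on the semantic congruency effect (incongruent minus congruent reaction time), as defined for Experiment 1; the function names and reaction times are hypothetical.

```python
def congruency_effect(rt_incongruent_ms, rt_congruent_ms):
    """Semantic congruency effect: reaction-time cost of semantic conflict."""
    return rt_incongruent_ms - rt_congruent_ms

def tms_effect(active_rts, vertex_rts):
    """Congruency effect under active TMS minus that under Vertex TMS.

    Each argument is an (incongruent, congruent) pair of mean RTs in ms.
    """
    return congruency_effect(*active_rts) - congruency_effect(*vertex_rts)

# Hypothetical mean reaction times in ms.
vertex = (560, 520)  # congruency effect of 40 ms under control stimulation
active = (545, 525)  # congruency effect shrinks to 20 ms under active TMS
effect = tms_effect(active, vertex)  # 20 - 40 = -20
```

A value further below zero means the congruency effect shrank more under active stimulation, which is what "a larger negative TMS effect" refers to.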

      Multisensory integration transcends simple data amalgamation, encompassing complex interactions at various hierarchical neural levels and the parallel detection and discrimination of raw data from each modality (Benetti et al., 2023; Meijer et al., 2019). Therefore, we regard the process of gesture-speech integration as involving both unisensory processing and multisensory convergence. The correlation of gesture and speech entropy reflects contributions from unisensory processing, while the mutual information (MI) index indicates the contribution of multisensory convergence during gesture-speech integration. The distinction between these various source contributions will be the focus of Experiment 2 and Experiment 3, as described in the revised manuscript Lines 87-102: ‘Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing [23], Experiment 2 was designed to further assess whether the activity of these regions was associated with relevant informational matrices. Specifically, we applied inhibitory chronometric double-pulse transcranial magnetic stimulation (TMS) to specific temporal windows associated with integration processes in these regions[23], assessing whether the inhibitory effects of TMS were correlated with unisensory entropy or the multisensory convergence index (MI).

      Experiment 3 complemented these investigations by focusing on the temporal dynamics of neural responses during semantic processing, leveraging high-temporal event-related potentials (ERPs). This experiment investigated how distinct information contributors modulated specific ERP components associated with semantic processing. These components included the early sensory effects as P1 and N1–P2[27,28], the N400 semantic conflict effect[14,28,29], and the late positive component (LPC) reconstruction effect[30,31]. By integrating these ERP findings with results from Experiments 1 and 2, Experiment 3 aimed to provide a more comprehensive understanding of how gesture-speech integration is modulated by neural dynamics’.  

      References:

      Benetti, S., Ferrari, A., and Pavani, F. (2023). Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 17, 1108354. 10.3389/fnhum.2023.1108354.

      Meijer, G.T., Mertens, P.E.C., Pennartz, C.M.A., Olcese, U., and Lansink, C.S. (2019). The circuit architecture of cortical multisensory processing: Distinct functions jointly operating within a common anatomical network. Prog Neurobiol 174, 1-15. 10.1016/j.pneurobio.2019.01.004.

      (13) I find the description of the results for Experiment 3 very hard to follow. Perhaps if the authors have decided to organise the main text by describing the components from earliest to latest, the Figure organisation should follow suit (i.e., organise the Figure from the earliest to the latest component, instead of gesture entropy/speech entropy / mutual information). This might make the description of the results easier to follow.

      Response 13: As suggested, we have reorganized the results of Experiment 3 based on components from earliest to latest, together with an updated Figure 5.

      The results are detailed in Lines 367-423: ‘Topographical maps illustrating amplitude differences between the lower and higher halves of speech entropy demonstrate a central-posterior P1 amplitude (0-100 ms, Figure 5B). Aligning with prior findings[27], the paired t-tests demonstrated a significantly larger P1 amplitude within the ML ROI (t(22) = 2.510, p = 0.020, 95% confidence interval (CI) = [1.66, 3.36]) when contrasting stimuli with higher 50% speech entropy against those with lower 50% speech entropy (Figure 5D1 left). Subsequent correlation analyses unveiled a significant increase in the P1 amplitude with the rise in speech entropy within the ML ROI (r = 0.609, p = 0.047, 95% CI = [0.039, 1.179], Figure 5D1 right). Furthermore, a cluster of neighboring time-electrode samples exhibited a significant contrast between the lower 50% and higher 50% of speech entropy, revealing a P1 effect spanning 16 to 78 ms at specific electrodes (FC2, FCz, C1, C2, Cz, and CPz, Figure 5D2 middle) (t(22) = 2.754, p = 0.004, 95% confidence interval (CI) = [1.65, 3.86], Figure 5D2 left), with a significant correlation with speech entropy (r = 0.636, p = 0.035, 95% CI = [0.081, 1.191], Figure 5D2 right).

      Additionally, topographical maps comparing the lower 50% and higher 50% gesture entropy revealed a frontal N1-P2 amplitude (150-250 ms, Figure 5A). In accordance with previous findings on bilateral frontal N1-P2 amplitude[27], paired t-tests displayed a significantly larger amplitude for stimuli with lower 50% gesture entropy than with higher 50% entropy in both ROIs of LA (t(22) = 2.820, p = 0.011, 95% CI = [2.21, 3.43]) and RA (t(22) = 2.223, p = 0.038, 95% CI = [1.56, 2.89]) (Figure 5E1 left).  Moreover, a negative correlation was found between N1-P2 amplitude and gesture entropy in both ROIs of LA (r = -0.465, p = 0.039, 95% CI = [-0.87, -0.06]) and RA (r = -0.465, p = 0.039, 95% CI = [-0.88, -0.05]) (Figure 5E1 right). Additionally, through a cluster-permutation test, the N1-P2 effect was identified between 184 to 202 ms at electrodes FC4, FC6, C2, C4, C6, and CP4 (Figure 5E2 middle) (t(22) = 2.638, p = 0.015, 95% CI = [1.79, 3.48], (Figure 5E2 left)), exhibiting a significant correlation with gesture entropy (r = -0.485, p = 0.030, 95% CI = [-0.91, -0.06], Figure 5E2 right).

      Furthermore, in line with prior research[42], a left-frontal N400 amplitude (250-450 ms) was discerned from topographical maps of gesture entropy (Figure 5A). Specifically, stimuli with lower 50% values of gesture entropy elicited a larger N400 amplitude in the LA ROI compared to those with higher 50% values  (t(22) = 2.455, p = 0.023, 95% CI = [1.95, 2.96], Figure 5F1 left). Concurrently, a negative correlation was noted between the N400 amplitude and gesture entropy (r = -0.480, p = 0.032, 95% CI = [-0.94, -0.03], Figure 5F1 right) within the LA ROI. The identified clusters showing the N400 effect for gesture entropy (282 – 318 ms at electrodes FC1, FCz, C1, and Cz, Figure 5F2 middle) (t(22) = 2.828, p = 0.010, 95% CI = [2.02, 3.64], Figure 5F2 left) also exhibited significant correlation between the N400 amplitude and gesture entropy (r = -0.445, p = 0.049, 95% CI = [-0.88, -0.01], Figure 5F2 right).

      Similarly, a left-frontal N400 amplitude (250-450 ms) [42] was discerned from topographical maps for MI (Figure 5C). A larger N400 amplitude in the LA ROI was observed for stimuli with lower 50% values of MI compared to those with higher 50% values (t(22) = 3.00, p = 0.007, 95% CI = [2.54, 3.46], Figure 5G1 left). This was accompanied by a significant negative correlation between N400 amplitude and MI (r = -0.504, p = 0.028, 95% CI = [-0.97, -0.04], Figure 5G1 right) within the LA ROI. The N400 effect for MI, observed in the 294–306 ms window at electrodes F1, F3, Fz, FC1, FC3, FCz, and C1 (Figure 5G2 middle) (t(22) = 2.461, p = 0.023, 95% CI = [1.62, 3.30], Figure 5G2 left), also showed a significant negative correlation with MI (r = -0.569, p = 0.011, 95% CI = [-0.98, -0.16], Figure 5G2 right).

      Finally, consistent with previous findings[30], an anterior LPC effect (550-1000 ms) was observed in topographical maps comparing stimuli with lower and higher 50% speech entropy (Figure 5B). The reduced LPC amplitude was evident in the paired t-tests conducted in ROIs of LA (t(22) = 2.614, p = 0.016, 95% CI = [1.88, 3.35]); LC (t(22) = 2.592, p = 0.017, 95% CI = [1.83, 3.35]); RA (t(22) = 2.520, p = 0.020, 95% CI = [1.84, 3.24]); and ML (t(22) = 2.267, p = 0.034, 95% CI = [1.44, 3.10]) (Figure 5H1 left). Simultaneously, a marked negative correlation with speech entropy was evidenced in ROIs of LA (r = -0.836, p = 0.001, 95% CI = [-1.26, -0.42]); LC (r = -0.762, p = 0.006, 95% CI = [-1.23, -0.30]); RA (r = -0.774, p = 0.005, 95% CI = [-1.23, -0.32]) and ML (r = -0.730, p = 0.011, 95% CI = [-1.22, -0.24]) (Figure 5H1 right). Additionally, a cluster with the LPC effect (644 - 688 ms at electrodes Cz, CPz, P1, and Pz, Figure 5H2 middle) (t(22) = 2.754, p = 0.012, 95% CI = [1.50, 4.01], Figure 5H2 left) displayed a significant correlation with speech entropy (r = -0.699, p = 0.017, 95% CI = [-1.24, -0.16], Figure 5H2 right).’

      (14) In the Discussion (L394 - 395) the authors mention for the first time their task being a semantic priming paradigm. This idea of the task as a semantic priming paradigm allowing top-down prediction of gesture over speech should be presented earlier in the paper, perhaps during the final paragraph of the introduction (as part of the rationale) or during the explanation of the task. The authors mention top-down influences earlier and this is impossible to understand before this information about the paradigm is presented. It would also make the reading of the paper significantly clearer. Critically, an appropriate description of the paradigm is missing in the Methods (what are the subjects asked to do? It states that it replicates an effect in Ref 28, but this manuscript does not contain a clear description of the task). To further complicate things, the 'Experimental Procedure' section of the methods states this is a semantic priming paradigm of gestures onto speech (L148) and proceeds to provide two seemingly irrelevant references (for example, the Pitcher reference is to a study that employed faces and houses as stimuli). How is this a semantic priming paradigm? The study where I found the first mention of this paradigm seems to clearly classify it as a Stroop-like task (Kelly et al, 2010).

      Response 14: We appreciate the reviewer’s thorough consideration. The experimental paradigm employed in the current study differs from the Stroop-like task utilized by Kelly et al. (2010). In their study, the video presentation started with the stroke phase of the gesture, while speech occurred 200 ms after the gesture onset.

      As detailed in our previous study (Zhao et al., 2023, Frontiers in Psychology), we confirmed the semantic predictive role of gestures in relation to speech by contrasting two experimental conditions: (1) gestures preceding speech by a fixed 200 ms interval, and (2) gestures preceding speech at the semantic identification point of the gesture. Our findings revealed time-window-selective disruptions in the semantic congruency effect in the IFG and pMTG, but only in the second condition, suggesting that gestures exert a semantic priming effect on concurrent speech.

      This work highlighted the semantic priming role of gestures in the integration of speech found in Zhao et al. (2021, Journal of Neuroscience). In the study, a comparable approach was adopted by segmenting speech into eight 40-ms time windows based on the speech discrimination point, while manipulating the speech onset to align with the gesture identification point. The results revealed time-window-selective disruptions in the semantic congruency effect, providing support for the dynamic and temporally staged roles of the IFG and pMTG in gesture-speech integration.

      Given that the present study follows the same experimental procedure as our prior work (Zhao et al., 2021, Journal of Neuroscience; Zhao et al., 2023, Frontiers in Psychology), we refer to this design as a "semantic priming" of gesture upon speech. We agree with the reviewer that a detailed description should be clarified earlier in the manuscript. To address this, we have added a more explicit description of the semantic priming paradigm in the methods section of the revised manuscript in Lines 162-166: ‘Given that gestures induce a semantic priming effect on concurrent speech[33], this study utilized a semantic priming paradigm in which speech onset was aligned with the DP of each gesture[23,33], the point at which the gesture transitions into a lexical form[34]. The gesture itself began at the stroke phase, a critical moment when the gesture conveys its primary semantic content [34].’

      The task participants completed was outlined immediately following the explanation of the experimental paradigm: ‘Gesture–speech pairs were presented randomly using Presentation software (www.neurobs.com). Participants were asked to look at the screen but respond with both hands as quickly and accurately as possible merely to the gender of the voice they heard’ (Lines:177-180).

      Wrongly cited references have been corrected.

      (15) L413-417: How do the authors explain that they observe this earlier ERP component and TMS effect over speech and a later one over gesture in pMTG when in their task they first presented gesture and then speech? Why mention STG/S when they didn't assess this?

      (19) L436-440: This paragraph makes the timing of the findings represented in Figure 6 even more confusing. If gesture precedes speech in the paradigm, why are the first TMS and ERP results observed in speech?

      Response 15 & 19: Since these two aspects are closely related, we offer a comprehensive explanation. Although gestures were presented before speech, the integration process occurs once both modalities are available. Consequently, ERP and TMS measurements were taken after speech onset to capture the integration of the two modalities. Neural responses were used as the dependent variable to reflect the degree of integration—specifically, gesture-speech semantic congruency in the TMS study and high-low semantic variance in the ERP study. Therefore, the observed early effect can be interpreted as an interaction between the top-down influence of gesture and the bottom-up processing of speech.

      To isolate the pure effect of gesture, neural activity would need to be recorded from gesture onset. However, if one aims to associate the strength of neural activity with the degree of gesture information, recording from the visual processing areas would be more appropriate.

      To avoid unnecessary ambiguity, the phrase "involved STG/S" has been removed from the manuscript.

      (16) L427-428: I find it hard to believe that MI, a behavioural metric, indexes the size of overlapped neural populations activated by gesture and speech. The authors should be careful with this claim or provide evidence in favour.

      Response 16: Mutual information (MI) is a behavioral metric that indexes the distribution of overlapping responses between gesture and speech (for further details, please see the Response to Comment 1). In the present study, MI was correlated with neural responses evoked by gesture and speech, with the goal of demonstrating that neural activity progressively reflects the degree of information conveyed, as indexed by MI.
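
For readers unfamiliar with the metric, the following is a minimal sketch of how mutual information can be computed from paired categorical responses. It uses the standard Shannon definition, MI(X;Y) = H(X) + H(Y) - H(X,Y); the exact procedure used in the study is not described in this response, so the function names and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between paired label lists xs and ys.

    Hypothetical helper: uses MI(X;Y) = H(X) + H(Y) - H(X,Y).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired (same length)")
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


# Toy example (illustrative only): identical responses share all information,
# while unrelated responses share none.
identical = mutual_information(["a", "a", "b", "b"], ["a", "a", "b", "b"])
unrelated = mutual_information(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

Under this definition, MI is high when responses to one modality predict responses to the other and zero when the two are statistically independent.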

      (17) Why would you have easier integration (reduced N400) with larger gesture entropy in IFG (Figure 6(3))? Wouldn't you expect more difficult processing if entropy is larger?

      (18) L431-432: The claim that IFG stores semantic information is controversial. The authors provide two references from the early 2000s that do not offer support for this claim (the IFG's purported involvement according to these is in semantic unification, not storage).

      Response 17 & 18: As outlined in the Responses to Comment 1 of the public review, we have reinterpreted the IFG as a semantic control region. Additionally, we have clarified the role of the IFG in relation to the various stages of gesture-speech integration in Lines 533-538: ‘Last, the activated speech representation would disambiguate and reanalyze the semantic information and further unify into a coherent comprehension in the pMTG[12,37]. As speech entropy increases, indicating greater uncertainty in the information provided by speech, more cognitive effort is directed towards selecting the targeted semantic representation. This leads to enhanced involvement of the IFG and a corresponding reduction in LPC amplitude’.

      (20) Overall, the grammar makes some parts of the discussion hard to follow (e.g. the limitation in L446-447: 'While HD tDCS and TMS may impact functionally and anatomically connected brain regions, the graded functionality of every disturbed period is not guaranteed')

      Response 20: A clearer description has been provided in the revised manuscript in Lines 552-557: ‘Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions[55,56], whether the absence of influence in certain TWs can be attributed to compensation by other connected brain areas, such as angular gyrus[57] or anterior temporal lobe[58], warrants further investigation. Therefore, caution is needed when interpreting the causal relationship between inhibition effects of brain stimulation and information-theoretic metrics (entropy and MI).’

      References:

      Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.

      Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016

      Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006

      Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013

      (21) Inconsistencies between terminology employed in Figures and main text (e.g., pre-test study in text, gating study in Figure?)

      Response 21: Consistency has been ensured by changing ‘gating study’ to ‘pre-tests’ in Figure 1 (Line 758).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their thoughtful evaluation of our manuscript. We considered all the comments and prepared the revised version. The following are our responses to the reviewers’ comments. All references, including those in the original manuscript are included at the end of this point-by-point response.

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer asks whether the microbial species described in this article are ubiquitously associated with Drosophila. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, yeasts such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These statements have been incorporated into our revised manuscript (lines 391-397). Nevertheless, the term “core” in the manuscript and title may lead to misunderstanding, as this general association does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we replaced “core” with “key,” a term that is more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species detected in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions would be needed, such as artificially introducing wild flies onto fresh bananas in the laboratory. Nevertheless, the microbes potentially originate from wild flies, as supported by the literature cited in our response to Weakness 1).

      Alternative sources of microbes also merit consideration. For example, microbes may have been introduced to unfermented bananas by penetration through peel injuries (lines 1300-1301). In addition, they could be introduced by insects other than flies, given that rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. An explanation of these possibilities has been incorporated into the DISCUSSION (lines 414-427) of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. For the traps where adult flies were caught, we identified the species of the drosophilids as shown in Table S1, thereby showing the presence of either or both of D. melanogaster and D. simulans. We added these descriptions to MATERIALS AND METHODS (lines 511-512 and 560-562) and DISCUSSION (lines 378-379).

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of the genes showed low expression levels. The RNA-seq data of the yeast-fed larvae are shown in Author response table 1. While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this may reflect intra-sample variability rather than the difference in the nutritional conditions. Similar expression profiles were observed in the bacteria-fed larvae as well (data not shown). Therefore, it is difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptide gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.
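
As background for the TPM values reported in this table, the sketch below shows the standard transcripts-per-million normalization: read counts are divided by gene length, then rescaled so that each sample sums to one million. The authors' actual quantification pipeline is not described in this response, so the function name and the example numbers are hypothetical.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM for one sample.

    counts     -- read counts per gene (hypothetical input)
    lengths_kb -- gene (transcript) lengths in kilobases, same order as counts
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Reads per kilobase: corrects for longer genes collecting more reads.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Rescale so the values in one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Illustrative only: two genes with the same per-kilobase coverage
# receive the same TPM despite different raw counts.
example = tpm([10, 20], [1.0, 2.0])
```

Because TPM is a within-sample proportion, genes with very low counts yield small and noisy values, which is consistent with the difficulty of statistical comparison noted above.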

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences in the gene expression profiles of the larvae fed on different microbial species within bacteria or fungi, or between those fed on bacteria and those fed on fungi. For example, the gene expression profiles of larvae fed on the various supportive microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on various nonsupportive microbes.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria. Thus, it is challenging to discuss the potential differential impacts of yeast and bacteria on larval growth, if any.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which are obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisks in the label highlight “LAB + AAB” or “LAB” samples clustered separately from the other samples in those conditions; “**” indicates a sample in a “LAB + AAB” condition (Lactiplantibacillus plantarum + Acetobacter orientalis), and “*” indicates a sample in a “LAB” condition (Leuconostoc mesenteroides). Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; Sa. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; BY4741, Saccharomyces cerevisiae BY4741 strain.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with the microbiota in food samples. We found that adult flies and early-stage foods, as well as larvae and late-stage foods, harbored similar microbial species (Figure 1F). Additionally, previous studies examining the gut microbiota in wild adult flies have detected microbes belonging to the same species or taxa as those isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we have cited the study by Dodge et al., 2023 in our revised manuscript and discussed the possibility that predominant microbes in adult flies may show a propensity for colonization (lines 410-413).

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer asks whether the supportive microbes in our study stimulate gut signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). Based on our RNA-seq data, this is unlikely. The aforementioned study demonstrated that seven protease genes are upregulated through Imd pathway stimulation by a bacterium that promotes larval growth. In our RNA-seq analysis, these seven genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response table 2A; Le. mes + A. ori in Author response table 2B). Rather, they exhibited a tendency to be upregulated by the presence of non-supportive microbes (St. bac or Pi. klu in Author response table 2A; La. pla in Author response table 2B).

      Author response table 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes colonize the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates that have already undergone microbial fermentation and are abundant in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We have mentioned this point in the DISCUSSION of the revised manuscript (lines 408-410).

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be a subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second-instar stage. This observation implies that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We mentioned a previously reported interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 433-436). LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of lactic acid to AAB culture plates, and the co-inoculation of AAB with LAB mutant strains defective in lactate production to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations. The explanation of the universality of our findings has been included in the revised DISCUSSION (lines 391-397). We have also added descriptions on the implication of compositional shifts occurring in adult microbiota (lines 404-413), possible inoculation routes of different microbes (lines 414-427), and hypotheses on the mechanism of larval growth promotion by yeasts (lines 469-476), all of which could be the focus of our future study.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This duration was determined from pilot experiments. A shorter collection time resulted in a lower likelihood of obtaining traps visited by adult flies, whereas a longer collection time caused overcrowding of larvae as well as deaths of adults from drowning in the liquid seeping out of the fruits. These procedural details have been included in the MATERIALS AND METHODS section of the revised manuscript (lines 523-526).

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We have included these possibilities in the DISCUSSION section of the revised manuscript (lines 417-421).

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestion regarding shifts in the adult microbiome. We have included in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages or after adult eclosion (lines 404-413).

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used?

      In this metabolomic analysis, LC-MS/MS with triple quadrupole MS monitors the formation of fragment ions from precursor ions specific to each target compound. The use of PFPP columns, which provide excellent separation of amino acids and nucleobases, allows chromatographic peaks of many structural isomers to be separated into independent peaks. In addition, all measured compounds are compared with data from a standard library to confirm retention time agreement. Structural isomers were separated either by retention time on the column or by compound-specific MRM signals (in fact, leucine and isoleucine have both unique MRM channels and column separations). Detailed MRM conditions are identical to those of the previously published study (Oka et al., 2017). These details have been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 810-824).

      Were standard curves produced?

      Since relative quantification of metabolite amounts was performed in this study, no standard curve was generated to determine absolute concentrations. However, a standard compound of known concentration (single point) was measured to confirm retention time and relative area values.

      Were internal, deuterated controls used?

      Deuterium-labeled internal standards were not used in this study, because it is not realistic to obtain deuterium-labeled versions of all compounds when a large number of compounds are measured. However, an internal standard (L-methionine sulfone) was added to the extraction solvent to calculate the recovery rate. This has been included in the revised ‘LC-MS/MS measurement’ section in MATERIALS AND METHODS (lines 824-825).
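
To illustrate the kind of calculation implied here, the sketch below shows one common way a single spiked internal standard can be used: its peak area in the extract is compared with its area in a reference injection to estimate recovery, and analyte peak areas are expressed relative to it. This is an assumption for illustration only; the authors' exact formulas are not given in this response, and all function names and numbers are hypothetical.

```python
def recovery_rate(is_area_sample, is_area_reference):
    """Fraction of the internal standard recovered after extraction.

    is_area_sample    -- internal-standard peak area measured in the extract
    is_area_reference -- peak area of the same amount injected without extraction
    """
    if is_area_reference <= 0:
        raise ValueError("reference area must be positive")
    return is_area_sample / is_area_reference


def relative_abundance(analyte_area, is_area_sample):
    """Analyte peak area relative to the internal standard (unitless)."""
    if is_area_sample <= 0:
        raise ValueError("internal-standard area must be positive")
    return analyte_area / is_area_sample


# Illustrative numbers only.
rec = recovery_rate(80.0, 100.0)
rel = relative_abundance(40.0, 80.0)
```

Values obtained this way are comparable across samples but are not absolute concentrations, which would require a standard curve.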

      Reviewer #1 (Recommendations For The Authors):

      Additional comments 1. The authors should do a better job of presenting their data. It took me quite a while to understand the protocol of Figure 1. Panel 1A, B, C could be improved. For instance, 1A suggests that flies are transferred to the lab while this is in fact the banana trap. Indicate 'Banana trap colonized by flies' rather 'wild-type flies in the trap'. 1C: should indicate that the food suspension comes from the banana trap. 1B,D,D: do not use pale color as legend. Avoid the use of indices in Figure 2 (Y1 rather than Y1). Grey colors are difficult to distinguish in Figure 2. Etc. It is a pain for reviewers that figure legends are on the verso of each figure and not just below.

      We thank the reviewer for the detailed suggestions to improve the clarity and comprehensibility of our figures. We have improved the figures according to the suggestions. As for the figure legends, we have placed them below each respective figure whenever possible.

      1. Clarify in the text if 'sample' means food substratum or flies/larvae (ex. line 116 and elsewhere).

      We have revised our use of the word “sample” throughout the manuscript to eliminate the confusion.

      1. Line 170 - clarify what you mean by fermented food.

      We have replaced “fermented larval foods” with “fermented bananas” in our revised manuscript (line 165).

      1. Line 199 - what is the meaning of 'stocks'.

      We have replaced “stocks” with “strains” (line 195).

      1. Line 320 - explain more clearly what the yeast-conditioned banana-agar plate and cell suspension supernatant are, and what the goals of using these media are. This will help in understanding the subsequent text.

      We have added a supplemental figure illustrating the sample preparation for the metabolomic analysis (Figure S6), with the following legend describing the procedure (lines 1335-1346): “Sample preparation process for the metabolomic analysis. We suspected that the supportive live yeast cells may release critical nutrients for larval growth, whereas the non-supportive yeasts may not. To test this possibility, we made three distinct sample preparations of individual yeast strains (yeast cells, yeast-conditioned banana-agar plates, and cell suspension supernatants). Yeast cells were for the analysis of intracellular metabolites, whereas yeast-conditioned banana-agar plates and cell suspension supernatants were for that of extracellular metabolites. The samples were prepared as the following procedures. Yeasts were grown on banana-agar plates for 2 days at 25°C, and then scraped from the plates to obtain “yeast cells.” Next, the remaining yeasts on the resultant plates were thoroughly removed, and a portion from each plate was cut out (“yeast-conditioned banana agar”). In addition, we suspended yeast cells from the agar plates into sterile PBS, followed by centrifugation and filtration to eliminate the yeast cells, to prepare “cell suspension supernatants.”

      1. Figure 5 is difficult to understand. Provide more explanation. Consider moving the 'all metabolites panel' to Supp. Better explain what this holidic medium is.

      The holidic medium is a medium that has been commonly used in the Drosophila research community, which contains ~40 known nutrients, and supports the larval development to pupariation (Piper et al., 2014; Piper et al., 2017). We have introduced this explanation to the RESULTS section of the manuscript (lines 322-327). However, the scope of our research reaches beyond the analysis of the holidic medium components, because feeding the holidic medium alone causes a significant delay in larval growth, suggesting a lack of nutritional components (Piper et al., 2014). Thus, we believe the "All Metabolites" panels should be placed alongside the corresponding “The holidic medium components” panels.

      1. I could not access Figure 6 when downloading the PDF. The page is white and an error message appears - it is problematic to review a paper lacking a figure.

      We regret any inconvenience caused, perhaps due to a system error. Please refer to the Author response image 2, which is identical to Figure 6 of our original manuscript.

      Author response image 2.

      Supportive yeasts facilitate larval growth by providing nutrients, including branched-chain amino acids, by releasing them from their cells (Figure 6 from the original manuscript). (A and B) Growth of larvae feeding on yeasts on banana agar supplemented with leucine and isoleucine. (A) The mean percentage of the live/dead individuals in each developmental stage. n=4. (B) The percentage of larvae that developed into second instar or later stages. The “Not found” population in Figure 6A was omitted from the calculation. Each data point represents data from a single tube. Unique letters indicate significant differences between groups (Tukey-Kramer test, p < 0.05). (C) The biosynthetic pathways for leucine and isoleucine with S. cerevisiae gene names are shown. The colored dots indicate enzymes that are conserved in the six isolated species, while the white dots indicate those that are not conserved. Abbreviations of genera are given in the key in the upper right corner. LEU2 is deleted in BY4741. (D-G) Representative image of Phloxine B-stained yeasts. The right-side images are expanded images of the boxed areas. The scale bar represents 50 µm. (H) Summary of this study. H. uvarum is predominant in the early-stage food and provides Leu, Ile, and other nutrients that are required for larval growth. In the late-stage food, AAB directly provides nutrients, while LAB and yeasts indirectly contribute to larval growth by enabling the stable larva-AAB association. The host larva responds to the nutritional environment by dramatically altering gene expression profiles, which leads to growth and pupariation. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; Pi. klu, Pichia kluyveri; St. bac, Starmerella bacillaris; GF, germ-free.

      1. Line 323 - Consider rewriting this sentence (too long, explain what the holidic medium is and why this is interesting). "In the yeast-conditioned banana-agar plates, which were anticipated to contain yeast-derived nutrients, many well-known nutrients included in a chemically defined synthetic (holidic) medium for Drosophila melanogaster (Piper et al., 2014, 2017) were not increased compared to the sterile banana-agar plates; instead, they exhibited drastic decreases irrespective of the yeast species."

      We thank the reviewer for the suggestion to improve the readability of our manuscript. We have rewritten the sentence in the revised manuscript (lines 320-328) as follows: “The yeast-conditioned banana-agar plates were expected to contain yeast-derived nutrients. On the contrary, the result revealed a depletion of various metabolites originally present in the sterile banana agar (Figure 5A). This result prompted us to focus on the metabolites in the chemically defined (holidic) medium for Drosophila melanogaster (Piper et al., 2014; Piper et al., 2017). This medium contains ~40 known nutrients, and supports the larval development to pupariation, albeit at the half rate compared to that on a yeast-containing standard laboratory food (Piper et al., 2014; Piper et al., 2017). Therefore, the holidic medium could be considered to contain the minimal essential nutrients required for larval growth. Our analysis indicated a substantial reduction of these known nutrients in the yeast-conditioned plates compared to their original quantities (Figure 5B).”

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      1. It should be clearly shown (or stated) that isolated microbes, such as H. uvarum and Pa. agglomerans, are indigenous microbes in wild Drosophila melanogaster in their outdoor sampling.

      We thank the reviewer for the suggestions. Addressing the presence of isolated microbes within wild D. melanogaster adults is important, but is not feasible with our data for the following reasons. Our microbiota analysis of adults was conducted using pooled individuals of multiple Drosophila species, rather than using D. melanogaster exclusively. Moreover, the microbial isolation and the analysis of adult microbiota were carried out in two independent samplings (Figures 1A and 1E in the original manuscript, respectively). As a result, the microbial species detected in the adults were slightly different from those isolated from the food samples collected in the previous sampling. Nevertheless, it is worth noting that H. uvarum dominated in 2 out of the 3 adult samples, constituting >80% of the fungal composition. Pantoea agglomerans was not detected in the adults, although Enterobacterales accounted for >59% in 2 out of the 3 samples. Therefore, these isolated microbial species, or at least their phylogenetically related species, are presumed to be indigenous to wild D. melanogaster.

      If the reviewer’s suggestion was to state the dominance of H. uvarum and Pantoea agglomerans in early-stage foods, we have added a supplemental figure showing the species-level microbial compositions corresponding to Figure 1B of the original manuscript (Figure S1), and further revised the manuscript (lines 180-186).

      1. The reviewer supposes that the indigenous microbes of flies may differ from what they usually eat. In this study, the authors use banana-based food, but is it justified in terms of the natural environment of the places where those microbes were isolated? In other words, did sampled wild flies eat bananas outside the laboratory at Kyoto University?

      Drosophila spp. inhabit human residential areas and feed on various fermented fruits and vegetables. In the areas surrounding Kyoto University, they can be found in garbage in residential dwellings as well as supermarkets. In this regard, fruits are natural food sources of wild Drosophila in the area.

      Among various fruits, bananas were selected based on the following two reasons. Firstly, bananas were commonly used in previous Drosophila studies as a trap bait or a component of Drosophila food (Anagnostou et al., 2010; Stamps et al., 2012; Consuegra et al., 2020). Secondly, and rather practically, bananas can be obtained in Japan all year at a relatively low cost. Previous studies have used various fruits such as grapes (Quan and Eisen, 2018), figs (Pais et al., 2018), and raspberries (Cho and Rohlfs, 2023). However, these fruits are only available during limited seasons and are more expensive per volume than bananas. Thus, they were not practical for our study, which required large amounts of fruit-based culture media. We have included a brief explanation regarding this point in MATERIALS AND METHODS (lines 514-518).

      1. In Fig. 6B, the Leu and Ile experiment, is the added amount of those amino acids appropriate in the context that they mention "...... supportive yeasts had concentrations of both leucine and isoleucine that were at least four-fold higher than those of non-supportive yeasts"?

      We acknowledge that the supplementation should ideally be carried out in a quantity equivalent to the difference between the released amounts of supportive and non-supportive species. However, achieving this has been highly challenging. Previous studies determined the amount of amino acid supplementation by quantifying their concentration in the bacteria-conditioned media (Consuegra et al., 2020; Henriques et al., 2020). However, we found that quantifying the exact concentrations of the amino acids is not feasible with our yeasts. As shown in Figure 5B in the original manuscript, the amino acid contents were markedly reduced in the yeast-conditioned banana agar compared to the agar without yeasts, presumably because of the uptake by the yeasts. Thus, the amino acids released from yeast cells on the banana-agar plate are not expected to accumulate in the medium. As this reviewer pointed out, in the cell suspension supernatants of the supportive yeasts, concentrations of both leucine and isoleucine were at least four-fold higher compared to those of non-supportive yeasts (Figures 5G-H in the original submission). However, this measurement does not give the absolute amount of either amino acid available for larvae. Given these constraints, we opted for the amino acid concentrations in the holidic medium, which support larval growth under axenic conditions (Piper et al., 2014). We also showed that the supplementation of the amino acids at that concentration to the banana-agar plate was not detrimental to larval growth (Figures 6A-B in the original manuscript). These rationales have been included in the revised ‘Developmental progression with BCAA supplementation’ section in MATERIALS AND METHODS of our manuscript (lines 840-847).

      1. In addition to the above, other amino acids or nutrients could be included as control experiments.

      As mentioned in our manuscript (lines 365-368), we did supplement other amino acids, lysine and asparagine, which failed to rescue the larval growth.

      1. In the experiment of Fig. 2E, how about examining larval development using heat-killed LAB or yeast with live AAB? The reviewer speculates that one possibility is that AAB needs nutrients from LAB.

      We did not feed larvae with heat-killed LAB and live AAB for the following reasons. LAB grows very poorly on banana agar compared to yeasts, and preparation of LAB required many banana-agar plates even when we fed live bacteria to larvae. Adding dead LAB to banana-agar tubes would require far more plates, but this preparation is impractical. Furthermore, heat-killing may not allow the investigation of the contribution of heat-unstable or volatile compounds.

      As for the reviewer's suggestion regarding the addition of heat-killed yeast with AAB, heat-killed yeast itself promotes larval growth, as shown in Figures 4G and 4H in the original manuscript, so the contribution of yeast cannot be examined using this method.

      Recommendations for improving the writing and presentation.

      1. It would be good to mention that during sample collection, other insects (other than Drosophila species) were not found in the food if this is true.

      Insects other than Drosophila spp. were found in several traps in the sampling shown in Figures 1C-F. These insects, rove beetles (Staphylinidae) and sap beetles (Nitidulidae), seemed to share a niche with Drosophila in nature. Therefore, we believe that the contamination of these insects did not interfere with our goal of obtaining larval food samples. We added these descriptions and explanations to MATERIALS AND METHODS (lines 527-531).

      1. There are many different kinds of bananas. Detailed information on the bananas used should be provided.

      We had included the information on the banana in MATERIALS AND METHODS section (line 622).

      1. Concerning the place of sample collection, detailed longitude, and latitude information can be provided (this is easily obtained from Google Maps). When the collection was performed should also be mentioned. This may suggest the environment of the "wild flies" they collected.

      We added a table listing the dates of our collections, along with the longitude and latitude of each sampling place (Table S1A).

      1. The reviewer could not find how the authors conducted heat killing of yeast.

      We added the following procedure to the ‘Quantification of larval development’ section in MATERIALS AND METHODS (lines 680-688). “When feeding heat-killed yeasts to larvae, yeasts were added to the banana-agar tubes and subsequently heated as following procedures. The yeasts were revived from frozen stocks on banana-agar plates, incubated at 25°C, and then streaked on fresh agar plates. After 2-day incubation, yeast cells were scraped from the plates and suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions were added to banana-agar tubes prepared as described, and after centrifugation at 3,000 x g for 5 min, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts, which compensates for the reduced amount due to their inability to proliferate. The tubes were subsequently heated at 80°C for 30 min before adding germ-free larvae.”

      1. The reviewer prefers that all necessary information on how to see figures be provided in figure legends. For example, an explanation of some abbreviations is missing.

      We carefully re-examined the figure legends and added necessary information.

      1. Many of the figures are not kind to readers, i.e., one needs to refer to the legends and main text very frequently. Adding subheadings (titles) to each figure may help.

      We added subheadings to our figures to improve the comprehensibility.

      Reviewer #3 (Recommendations For The Authors):

      I have some minor questions/suggestions about the manuscript that, if addressed, may increase the clarity and quality of the work.

      1. Please, when referring to microbial species in the abbreviated form, use only the first letter of the genus. For example, P. agglomerans should be used, not Pa. agglomerans.

      We are concerned about the potential confusion caused by using only the first letter of genera, since several genera mentioned in our work share the first letters, such as P (Pichia and Pantoea), S (Starmerella, Saccharomyces, and Saccharomycopsis), or L (Lactiplantibacillus and Leuconostoc). Therefore, we used only the unabbreviated form of the above seven genera in our revised manuscript. We have also made every effort to avoid abbreviations in our figures and tables, but found it necessary to retain two-letter abbreviations when spaces are particularly limiting.

      1. In lines 294-298, how exactly was the experiment where yeasts were killed by anti-fungal agents performed? If these agents killed the yeast, how was the microbial growth on plates required to have biomass for fly inoculation obtained? Please, clarify this section.

      The yeasts were grown on normal banana-agar plates before the addition onto the anti-fungal agents-containing banana agar. We added the following procedure to MATERIALS AND METHODS (lines 689-695). “When feeding yeasts on banana agar supplemented with antifungal agents, the yeasts were individually grown on normal banana agar twice before being suspended in PBS at the concentration of 400 mg of yeast cells in 500 µL of PBS. 125 µL of the suspensions was introduced onto the anti-fungal agents (10 mL/L 10% p-hydroxybenzoic acid in 70% ethanol and 6 mL/L propionic acid, following the concentration described in Kanaoka et al., 2023)-containing banana agar in 1.5 mL tubes. After centrifugation, the supernatants were removed. The amount of cells in each tube is ~50x compared to that when feeding live yeasts.”

      1. In lines 557-558, please clarify how rDNA copy numbers can be calculated in this way.

      Considering the results of the ITS and 16S sequencing analysis, it was highly likely that rDNAs from bananas and Drosophila were amplified along with microbial rDNA in this qPCR. To estimate the microbial rDNA copy number, we assumed that the proportion of microbial rDNA within the total amplification products remains consistent between the qPCR and the corresponding sequencing analysis, because the template DNA samples and amplified regions were shared between the analyses. Based on this, the copy number of microbial rDNA was estimated by multiplying the qPCR results with the microbial rDNA ratio observed in the ITS or 16S sequencing analysis of each sample. This methodology has been detailed in the MATERIALS AND METHODS section (lines 609-615).
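      The estimate described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name, argument names, and the example numbers are hypothetical, and it assumes only what the response states, namely that the microbial share of sequencing reads equals the microbial share of qPCR amplification products.

```python
def estimate_microbial_rdna_copies(qpcr_total_copies, microbial_reads, total_reads):
    """Scale a total qPCR rDNA copy number by the microbial read fraction.

    qpcr_total_copies: rDNA copies measured by qPCR (microbial + banana + fly).
    microbial_reads:   reads assigned to microbes in the ITS or 16S run.
    total_reads:       all reads in the same run for the same sample.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= microbial_reads <= total_reads:
        raise ValueError("microbial_reads must lie between 0 and total_reads")
    # Assumption: the microbial proportion is the same in qPCR and sequencing.
    microbial_fraction = microbial_reads / total_reads
    return qpcr_total_copies * microbial_fraction


# Hypothetical sample: 1e6 total copies, 2,500 of 10,000 reads are microbial.
example_copies = estimate_microbial_rdna_copies(1.0e6, 2500, 10000)
```

      With these made-up inputs, one quarter of the reads are microbial, so one quarter of the qPCR copy number is attributed to microbes.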

      1. In lines 609-611, how did you check for cells left from the previous day? Microscopy? Or do you mean that if there was liquid still in the sample you would not add more bacterial cultures? Please, clarify.

      We observed with the naked eye from outside the tubes to determine if additional AAB should be introduced. Since we placed AAB on the banana agar in a lump, we examined whether the lumps were gone or not. We have added these procedures in MATERIALS AND METHODS (lines 671-673).

      1. In Figure 2A, it is hard to differentiate between the gray tones. Please, improve this.

      We have distinguished the plots for different conditions by changing the shape of the markers on the graphs.

      1. In the legend of Figure 4, line 1101, I believe the panel letters are incorrect.

      We have corrected the manuscript (lines 1241-1242) from “heat-killed yeasts on banana agar (H and I) or live yeasts on a nutritionally rich medium (J and K)” to “heat-killed yeasts on banana agar (G and H) or live yeasts on a nutritionally rich medium (I and J).”

      1. In Figure S1, authors showed that bananas that were not inoculated still had detectable rDNA signal. Is this really because bacteria can penetrate the peel? Or could this be the “reagent microbiome”? Alternatively, could these microbes have been introduced during sample prep, such as cutting the bananas?

      The detection of rDNA in bananas that were not inoculated with microbes was unlikely to be due to microbial contamination during experimental manipulation. The reviewer pointed out the possibility that the “reagent microbiome”, presumably the microbes in PBS, are detected from the uninoculated bananas. This seems to be unlikely, considering the PBS was sterilized by autoclaving before use. To ensure that no viable microbe was left in the autoclaved PBS, we applied a portion of the PBS onto a banana-agar plate and confirmed no colony was formed after incubation for a few days. DNA derived from dead microbes might be present in the PBS, but the PBS-added bananas were incubated for 4 days, so it is also unlikely that a detectable amount of DNA remained until sample collection. Furthermore, we believe that no contamination occurred during sample preparation. Banana peels were treated with 70% ethanol before removing them extremely carefully to avoid touching the fruit inside. All tools were sterilized before use. Taking all of these into account, we speculate that the microbes were already present in the bananas before peeling. We added the details of the sample preparation processes in MATERIALS AND METHODS (lines 518-521 and 540).

      Other major revisions

      1. We deposited our yeast genome annotation data in the DDBJ Annotated/Assembled Sequences database, and the accession numbers have been added to the ‘Data availability’ section in MATERIALS AND METHODS (lines 868-873).

      2. The bacterial composition data in Figure 1B was corrected, because in the original version, the data for Place 3 and Place 4 was plotted in reverse. The original and revised plots are shown side by side in Author response image 3. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p5, lines 117-120).

      Author response image 3.

      Comparison of the original and revised version of bacterial composition graph in Figure 1B. Comparison of the original (left) and revised (right) version of the graph at the bottom of Figure 1B, which shows the result of bacterial composition analysis. The color key, which is unmodified, is placed below the revised version.

      1. The plot data and labels in the RNA-seq result heatmaps (Figures 3A and 4C) have been corrected. In these figures, row Z-scores of log2(TPM + 1) were to be plotted, as indicated by the key in each figure. However, in the original version, row Z-scores of TPM were erroneously plotted. Thus, Figures 3A and 4C of the original version have been replaced with the correct plots, and the original and revised plots are shown side by side in Author response images 4A and 4B. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 222-226 and p9, lines 277-281).

      Author response image 4.

      Comparison of the original and revised version of Figures 3A and 4C. (A and B) Comparison of the original (left) and revised (right) version of Figures 3A (A) or 4C (B).
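      For readers unfamiliar with the quantity plotted in the corrected heatmaps, the following is a minimal sketch of a row Z-score of log2(TPM + 1) for a single gene. It is illustrative only and not the authors' analysis code: the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the response does not say which estimator was used.

```python
import math
import statistics


def row_zscores_log2_tpm(tpm_row):
    """Return Z-scores of log2(TPM + 1) across samples for one gene (one row)."""
    if len(tpm_row) < 2:
        raise ValueError("at least two samples are needed to compute a Z-score")
    # Log-transform first; the pseudocount of 1 keeps TPM = 0 finite.
    logged = [math.log2(value + 1.0) for value in tpm_row]
    mean = statistics.fmean(logged)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(logged)
    if sd == 0:
        # A gene with identical values in every sample has no variation to scale.
        return [0.0 for _ in logged]
    return [(x - mean) / sd for x in logged]


# Hypothetical gene with TPM values 0, 1, and 3 in three samples.
z_scores = row_zscores_log2_tpm([0.0, 1.0, 3.0])
```

      Applying the Z-score to raw TPM values instead, as in the original figures, gives different values because the log transform compresses highly expressed samples before scaling.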

      1. The keys in the original Figures 3D and 4F indicate that log2(fold change) was used to plot all data. However, when plotting the data from the previous study (Zinke et al., 2002), their “fold change value” was used. We have corrected the keys, plots, and legend of Figure 3D to reflect the different nature of the data from our RNA-seq analysis and those from microarray analysis by Zinke et al. The original and revised plots are shown side by side in Author response image 5. We hope that the reviewers agree that this replacement of the plots does not affect our conclusion (p7, lines 228-230 and p9, lines 277-284).

      Author response image 5.

      Comparison of the original and revised version of Figures 3D and 4F. (A and B) Comparison of the original (left) and revised (right) version of Figures 3D (A) or 4F (B).

      1. The labels in Figure S5C and S5D (Figure S4C and S4D in the original version) have been corrected (they are "Pichia kluyveri > Supportive" and "Starmerella bacillaris > Supportive" rather than "Non-support. > H. uva" and "Non-support. > K. hum"). Additionally, we have reintroduced the circle indicating the number of “dme04070: Phosphatidylinositol signaling system” DEGs in Figure S5D, which was missing in Figure S4D of the original version. The original and revised figures are shown in Author response image 6.

      Author response image 6.

      Comparison of the original and revised version of Figures S5C and S5D. (A and B) Comparison of the original (left) and revised (right) versions of Figures S5C (A) or S5D (B). The original figures corresponding to the aforementioned figures were Figures S4C and S4D, respectively.

      1. The "Fermentation stage" column in Table 1, which indicated whether each microbe was considered an early-stage microbe or a late-stage microbe, has been removed to avoid confusion. This is because some of the microbes (Hanseniaspora uvarum, Pichia kluyveri, and Pantoea agglomerans) were employed in both of the feeding experiments using the microbes detected from the early-stage foods (Figures 2A, 2B, S2A, and S2B) and those from the late-stage foods (Figures 2C, 2D, S2C, and S2D).

      2. The leftmost column in Table S7 has been edited to indicate species names rather than “Sample IDs,” because the IDs were not used anywhere else in the paper.

      References

      Chandler, J. A., Lang, J., Bhatnagar, S., Eisen, J. A. and Kopp, A. (2011). Bacterial communities of diverse Drosophila species: Ecological context of a host-microbe model system. PLoS Genetics 7, e1002272.

      Chandler, J. A., Eisen, J. A. and Kopp, A. (2012). Yeast communities of diverse Drosophila species: Comparison of two symbiont groups in the same hosts. Applied and Environmental Microbiology 78, 7327–7336.

      Cho, H. and Rohlfs, M. (2023). Transmission of beneficial yeasts accompanies offspring production in Drosophila—An initial evolutionary stage of insect maternal care through manipulation of microbial load? Ecology and Evolution 13, e10184.

      Consuegra, J., Grenier, T., Akherraz, H., Rahioui, I., Gervais, H., da Silva, P. and Leulier, F. (2020). Metabolic Cooperation among Commensal Bacteria Supports Drosophila Juvenile Growth under Nutritional Stress. iScience 23, 101232.

      Dodge, R., Jones, E. W., Zhu, H., Obadia, B., Martinez, D. J., Wang, C., Aranda-Díaz, A., Aumiller, K., Liu, Z., Voltolini, M., et al. (2023). A symbiotic physical niche in Drosophila melanogaster regulates stable association of a multi-species gut microbiota. Nat Commun 14, 1557.

      Erkosar, B., Storelli, G., Mitchell, M., Bozonnet, L., Bozonnet, N. and Leulier, F. (2015). Pathogen Virulence Impedes Mutualist-Mediated Enhancement of Host Juvenile Growth via Inhibition of Protein Digestion. Cell Host & Microbe 18, 445–455.

      Hanson, M. A. and Lemaitre, B. (2020). New insights on Drosophila antimicrobial peptide function in host defense and beyond. Current Opinion in Immunology 62, 22–30.

      Henriques, S. F., Dhakan, D. B., Serra, L., Francisco, A. P., Carvalho-Santos, Z., Baltazar, C., Elias, A. P., Anjos, M., Zhang, T., Maddocks, O. D. K., et al. (2020). Metabolic cross-feeding in imbalanced diets allows gut microbes to improve reproduction and alter host behaviour. Nat Commun 11, 4236.

      Oka, M., Hashimoto, K., Yamaguchi, Y., Saitoh, S., Sugiura, Y., Motoi, Y., Honda, K., Kikko, Y., Ohata, S., Suematsu, M., et al. (2017). Arl8b is required for lysosomal degradation of maternal proteins in the visceral yolk sac endoderm of mouse embryos. Journal of Cell Science jcs.200519.

      Pais, I. S., Valente, R. S., Sporniak, M. and Teixeira, L. (2018). Drosophila melanogaster establishes a species-specific mutualistic interaction with stable gut-colonizing bacteria. PLOS Biology 16, e2005710.

      Piper, M. D. W., Blanc, E., Leitão-Gonçalves, R., Yang, M., He, X., Linford, N. J., Hoddinott, M. P., Hopfen, C., Soultoukis, G. A., Niemeyer, C., et al. (2014). A holidic medium for Drosophila melanogaster. Nature Methods 11, 100–105.

      Piper, M. D. W., Soultoukis, G. A., Blanc, E., Mesaros, A., Herbert, S. L., Juricic, P., He, X., Atanassov, I., Salmonowicz, H., Yang, M., et al. (2017). Matching Dietary Amino Acid Balance to the In Silico-Translated Exome Optimizes Growth and Reproduction without Cost to Lifespan. Cell Metab 25, 610–621.

      Quan, A. S. and Eisen, M. B. (2018). The ecology of the drosophila-yeast mutualism in wineries. PLOS ONE 13, e0196440.

      Solomon, G. M., Dodangoda, H., McCarthy-Walker, T. T., Ntim-Gyakari, R. R. and Newell, P. D. (2019). The microbiota of Drosophila suzukii influences the larval development of Drosophila melanogaster. PeerJ 7, e8097.

      Zinke, I., Schütz, C. S., Katzenberger, J. D., Bauer, M. and Pankratz, M. J. (2002). Nutrient control of gene expression in Drosophila: microarray analysis of starvation and sugar-dependent response. The EMBO Journal 21, 6162–6173.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful and constructive comments on our work, which have helped to strengthen the manuscript. In response to the additional suggestions provided by the reviewers, we have made revisions by adding or replacing five main figures and three supplementary figures, refining the text, and clarifying certain conclusions. Detailed responses to the reviewers’ points can be found below.

      Additional experiments, textual changes, or modulation of claims are needed to address weaknesses in the SOD1 portion of the study. Specifically:

      A) These studies require an assessment of the on-target efficacy of the inhibitors at the relevant concentration ranges. Ideally, they should have minimal effects against SOD1 knockout cell lines (an acute challenge at a time point before the growth defects become apparent) and show better efficacy in SOD1-overexpressing lines. Key experiments (changes in superoxide, OCR profiling, DNA alkaline comet assay) would be more convincing if they were carried out with SOD1 knockout lines to compare against the inhibitor effects (3-4 days after introducing sgSOD1 when growth defects are not apparent). In addition, SOD activity should be measured directly following inhibitor treatment.

      We agree with the reviewers that distinguishing the on- vs. off-target effects of the pharmacologic SOD1 inhibitors is a critical point to address. We have validated that SOD activity is reduced following treatment with ATN-224 in Figure 2 – Figure supplement 1A.

      Nevertheless, we acknowledge that the potential for off-target effects of these inhibitors cannot be completely ruled out. To address this concern, we have incorporated a discussion regarding the potential off-target effects of both LCS-1 and ATN-224.

      B) Assays should be included to support that SOD1 activity is altered. ATN-224 and LCS-1 are used to inhibit SOD1 function in the majority of the experiments, which should be supported by SOD activity assays to confirm SOD inhibition. Further, the concentration of ATN-224 used in this paper (12.5 uM) is beyond the concentration of what has been reported to inhibit SOD1 function in human blood cells. In Figure 4D, the authors demonstrate comparable SOD1 total protein levels in WT and PPM1D-mutant cells. However, the authors should further address whether PPM1D-mutation alters SOD1 activity via SOD activity assays.

      We thank the reviewers for these suggestions. We have performed SOD activity assays, which confirmed that SOD activity is inhibited upon treatment with ATN-224 at two concentrations (6.25 and 12.5 uM). We also performed these assays on LCS-1-treated cells but, in our hands, did not see reduced SOD activity. However, LCS-1 has been shown to inhibit SOD activity in other publications, including PMID: 21930909 and PMID: 32424294. From these assays, we have also found that PPM1D-mutant cells had increased SOD activity at baseline, despite having similar levels of SOD1 protein. These data have been added to Figure 2–Figure supplement 1A.

      C) Some conclusions are not fully supported by the data provided. The authors claimed that "upon inhibition of SOD1, there was an increase in ROS that was specific to the mutant cells" in Figure 2E. Comparison of ROS levels among untreated, ATN-224, and LCS-1 of PPM1D-mutant cells should have been made and the statistics analysis among these groups should have been provided. Moreover, in Figure 2-Figure Supplement 1E, LCS-1 treatment does not increase ROS levels in PPM1D mutant LCLs. Performing these experiments with control and SOD1 deletion cells would have strengthened the results. Along with this point, the authors should comment on why SOD2 is not identified as a top hit in the CRISPR screen, as SOD2 deletion accumulates superoxide in cells.

      After performing additional statistical analyses for Figure 2E, we found that the minor increase in ROS levels in the mutant cells after SOD1 inhibition was not statistically significant. We have revised the text accordingly.

      As for why SOD2 was not identified as a top hit, we postulate that this may be due to inherent dependency of the WT cell lines on SOD2.

      D) Fig. 1 - SOD1 appears to be clustered with several other genes in the volcano plot (including FANC proteins). Did any other ROS-detoxifying enzymes show similar fitness scores? The effects of the SOD1 sgRNA are striking, however, it would be useful to see qPCR or immunoblot data confirming robust depletion.

      Thank you for your suggestion. We have validated the loss of SOD1 protein expression after SOD1 sgRNA deletion by immunoblot and have added this data to Figure 1– figure supplement 1D. While other ROS-detoxifying enzymes were not significantly enriched in the top 37 hits, interestingly, the Fanconi Anemia pathway also has roles in counteracting oxidative stress. FA-deficient cells have mitochondrial dysfunction and redox imbalance, and several of the FA family proteins are implicated in mitophagy. Therefore, there may be an interesting interplay between SOD1 and the FA pathway that is worth highlighting in the discussion of our manuscript even though there was no experimental investigation performed.

      E) Fig. 2 - What are the relative SOD1 levels in the mutant PPM1D vs. WT cell lines? The effects of the chemical inhibitors are stronger in MOLM-13 than in the other two lines. These data could also point to whether LCS-1 and ATN-224 cytotoxicity are on-target or off-target at these concentrations, which is a key issue not currently addressed in these studies. This is a particular concern as the OCI-AML2 line shows a stronger growth defect with CRISPR SOD1 KO (in Fig 1) but the smallest effects with these chemical inhibitors. The authors should also include SOD1 levels for Figure 1D and Figure 4-Figure supplement 1C.

      SOD1 protein expression is similar between WT and PPM1D-mutant cell lines and the loss of SOD1 after SOD1 sgRNA deletion was validated by immunoblot. These data have been added to Figure 1- figure supplement 1D and Figure 4D.

      F) Does SOD1 co-expression in PPM1D-mutant patient AML correspond to poorer disease outcomes? This can be evaluated in publicly available patient datasets and would support the idea of SOD1 synthetic lethality.

      Unfortunately, there are no publicly available patient datasets with sufficient cases of de novo PPM1D-mutant AML to assess this question.

      G) While endogenous mitochondrial superoxide levels are elevated in PPM1D mutant lines, it is entirely unclear why SOD1 inhibition should affect mitochondrial superoxide as it detoxifies cytosolic superoxide. Also unclear why the DCFDA signal (which measures total hydroperoxides) is increased under SOD1 inhibition - SOD1 dismutates superoxide radicals into hydrogen peroxide, therefore unless SOD2 is compensating for SOD1 loss, one might expect hydroperoxides to be lower (unless some entirely different oxidase is increasing their levels). None of these outcomes appear to be considered. Finally, it is not explained how lipid peroxidation, which requires the production of hydroxyl or similarly high-potency radicals, is being caused by increased superoxide or peroxides. One possibility is there is an increase in labile iron, in which case this phenotype would be rescued by the iron chelator desferal, and by the lipophilic antioxidant, ferrostatin.

      We measured intracellular labile iron levels by flow cytometry by staining the cells with FerroOrange at baseline and after SOD1 inhibition with our pharmacologic inhibitors (ATN-224 at 12.5 uM and LCS-1 at 1.25 uM). Across the three leukemia cell lines, we saw variable results in iron levels with no appreciable patterns (see below). Therefore, we cannot make conclusions about the contribution of labile iron to our observed phenotypes.

      Author response image 1.

      H) Do the sgSOD1 cells also show similar increases in MitoSox green, DCFDA, and BODIPY signal? These experiments would clarify whether the effects of the inhibitors are directly related to SOD1 loss or if they represent off-target effects from the inhibitors and/or compensatory changes in SOD2.

      We do not observe changes in SOD2 in the several contexts in which we have examined this. We cannot exclude off-target effects of the inhibitors and have clarified this in the text.

      I) The authors may want to assess whether Rac1 or NADPH oxidase activity is altered in the SOD1 KO in WT vs. PPM1D cells. Their results may be the consequence of compromised ROS-driven survival signaling or DNA repair rather than direct ROS-induced damage, which is not caused directly by superoxide (or hydrogen peroxide).

      We appreciate the reviewer’s recommendations. However, due to time constraints, we regret not being able to assess Rac1 or NADPH oxidase activity. Nevertheless, we recognize the possibility of altered ROS-driven signaling rather than ROS-induced damage as a driver of our phenotype and have incorporated this possibility into our discussion.

      J) Fig. 3 - the effects on mitochondrial respiratory parameters, while statistically significant, do not seem biologically striking. Also, these data are shown for OCI-AML2 cells which show the smallest cytotoxic effects with the SOD1 inhibitors among the 3 lines tested. They do however show the most robust growth defect with sgSOD1. This discrepancy could suggest that mitochondrial dysfunction does not underlie the observed growth defect and/or the inhibitor cytotoxicity is not on-target. Ideally, mitochondrial profiling should also be carried out on this cell line with inducible SOD1 depletion. Have the authors assessed whether the mitochondrial Bcl family proteins are affected by the inhibitors?

      We assessed a few members of the mitochondrial Bcl-family proteins including MCL-1, BCL-2, and BCL-XL during the revision process. PPM1D-mutant cells have mildly increased expression of these anti-apoptotic proteins at baseline and the expression is not altered by pharmacologic SOD1 inhibition (see Author response image 2 below). Due to time constraints, we were unable to perform seahorse assays and mitochondrial profiling in the SOD1-deletion cells.

      Author response image 2.

      K) Fig. 4 - Currently the data in this figure do not support the authors' claim that PPM1D-mutant cells have impaired antioxidant defense mechanisms, leading to an elevation in ROS levels and reliance on SOD1 for protection. It should be noted that oxidative stress specifically refers to adverse cellular effects of increasing ROS, not baseline levels of various redox parameters. Ideally, levels of GSSG/GSH would be a better measure of potential redox stress tolerance than the total antioxidant capacity assay. Finally, oxidative stress can be assessed by challenging the wt and mutant PPM1D cell lines with oxidant stressors such as paraquat which elevates superoxide, or drugs like erastin which elevate mitochondrial ROS. The immunoblot shows negligible changes in the antioxidant proteins assayed. Again, this blot should include SOD2 which is the most relevant antioxidant in the context of mitochondrial superoxide.

      We measured intracellular glutathione levels by flow cytometry and found that PPM1D-mutant cells had a greater proportion of cells with low levels of GSH. These data have been added as Figure 4D. We have also repeated the western blot to look at the antioxidant proteins catalase, SOD1, and thioredoxin after SOD1 deletion and pharmacologic SOD1 inhibition. We evaluated SOD2 protein levels in these experiments, as suggested. Smooth muscle actin (SMA) is included in the antibody cocktail as a loading control. However, it is unclear to us why PPM1D-mutant cells consistently have significantly higher levels of SMA. Therefore, we included a separate loading control, vinculin. The repeated western blots showed a clearer difference between WT and PPM1D-mutant cells in the levels of these antioxidant proteins: PPM1D-mutant cells have decreased levels of catalase and thioredoxin. These blots also show that SOD2 levels may be mildly increased in the PPM1D-mutant cells at baseline but are not significantly upregulated upon SOD1 inhibition. We have replaced the original immunoblot from Figure 4D with the revised blots, which more clearly demonstrate the reduced levels of catalase and thioredoxin (now Figure 4E).

      L) Fig. 5 - These data support that DNA breaks are elevated in PPM1D mutant vs. wt cells. However, the data with the chemical SOD1 inhibitor again do not convince us that the enhanced levels are due to on-target effects on SOD1. Use of the alkaline comet assay is appropriate for these studies and the 8-oxoguanine data do indicate contributions from oxidative DNA base damage. But these are unlikely to result directly from altered superoxide levels, as this species cannot directly oxidize DNA bases or cause DNA strand breaks.

      Thank you to the reviewers for raising this point. We have performed comet assays in SOD1-deletion cells to look at levels of DNA damage. Consistent with the reviewers’ point, we do not see a significant increase in DNA breaks after SOD1 deletion. We have removed the data using the SOD1 inhibitor and instead show the COMET analysis in the PPM1D-mut and SOD1-KO cells (see Figure 5F). We now make the point that increased DNA damage with SOD1 loss cannot explain the vulnerability of the double-mutant cells.

      M) Instead of using NAC, which elevates glutathione synthesis but also has several known side effects, the authors may want to determine whether Tempol, a SOD mimetic can rescue the effects of SOD1 knockout or inhibition. This would directly prove that SOD1 functional loss underlies the observed growth defect and cytotoxicity from genetic SOD1 knockdown or chemical inhibition.

      This is an excellent suggestion; we have added comments to this effect into the discussion.

      N) It is recommended the discussion focus more strongly on how the signaling function of superoxide vs. its reactions with other molecular entities to induce genotoxic outcomes could be contributing to the observed phenotypes. The discussion of FANC proteins, which were targets with similar fitness scores but not experimentally investigated at all, is an unwarranted digression.

      Thank you for this recommendation. We have expanded the discussion to focus more on the signaling functions of superoxide. However, considering the role of the Fanconi Anemia pathway in mitigating DNA damage and oxidative stress, we believe the discussion of the FANC proteins is important due to the possible intersection with SOD1. Therefore, we have refined this portion of the discussion to focus more on the interplay between SOD1 and FA.

      O) The complete lack of consideration of SOD2 in these studies is a missed opportunity as it reduces mitochondrial superoxide levels but elevates hydrogen peroxide levels. It would be very interesting to see whether SOD1 inhibition leads to compensatory increases in SOD2. SOD2 can be easily measured by immunoblot. Furthermore, measuring total superoxide via hydroethidium in a flow cytometric assay vs. mitochondrial ROS in PPM1D mut vs. wt cells and under SOD1 knockout would enable a determination of which species dominates (cytosolic or mitochondrial). These experiments are required to fill some logical gaps in the interpretation of their redox data.

      During the revision process, we have included SOD2 in our studies and have found that loss of SOD1 via genetic deletion and pharmacologic inhibition does not lead to compensatory increases in SOD2 (Figure 4D). Additionally, we have measured cytoplasmic superoxide levels using dihydroethidium to differentiate between cytoplasmic vs. mitochondrial superoxide. We found that at baseline levels, the mutant cells also harbored more cytoplasmic superoxide. We have added this figure as Figure 2C and moved the original mitochondrial superoxide data to Figure 2-figure supplement 1C.

      P) Given the DNA breaks observed in PPM1D mutant cells, it is highly recommended that the authors assess whether iron levels are elevated in mut vs. wt cells and whether desferal can rescue observed SOD1 inhibition defects. Also, it has been reported that PPM1D promotes homologous recombination by forming a stable complex with BRCA1-BARD1, thereby enhancing their recruitment to double-strand break sites. The authors should comment on why there is no difference in repair via HR in WT and PPM1D mutant cells in Figure 5C.

      Please see comment G regarding our findings about iron levels.

      The reviewers pose an interesting question as to why there is no difference in HR repair between WT and mutant cells, given the reported role of PPM1D in promoting HR. We have addressed this question in the main text. We believe that several factors can limit the extent of HR enhancement in PPM1D-mutant cells. For example, HR is typically confined to the S/G2 phase and thus may be constrained by cell cycling, among other regulatory mechanisms.

      Other comments:

      A) The authors described in the Method section that "The CRISPR Screen PPM1D mutant Cas9-expressing OCI-AML2 cell lines were transduced with lentivirus library supernatant." The authors need to provide information on whether the MOI of the CRISPR screen has been well controlled to ensure that the majority of the cell population has a single copy of sgRNA transduction.

      We performed a lentiviral titer curve prior to the screen to determine the volume of viral supernatant to add for a multiplicity of infection (MOI) of 0.3. This important detail has been added to our Methods.
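
      As a rough illustration of why a low MOI addresses the reviewer's concern, the sketch below assumes that the number of sgRNA integrations per cell follows a Poisson distribution with mean equal to the MOI. This is a common modeling assumption and is not stated in the response; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations (Poisson assumption)."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def single_copy_fraction(moi: float) -> float:
    """Among transduced cells (k >= 1), the fraction carrying exactly one integration."""
    transduced = 1.0 - poisson_pmf(0, moi)
    return poisson_pmf(1, moi) / transduced


if __name__ == "__main__":
    moi = 0.3  # value reported in the response
    print(f"Fraction of cells transduced: {1.0 - poisson_pmf(0, moi):.3f}")
    print(f"Transduced cells with one sgRNA: {single_copy_fraction(moi):.3f}")
```

      Under this assumption, roughly a quarter of cells are transduced at an MOI of 0.3, and most of those carry a single sgRNA; the exact proportions in a real screen depend on how well the titer curve reflects the screening conditions.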

      B) The study convincingly shows differences between parental leukemic cells and the PPM1D mutants but one important control is missing in experiments related to Fig. 2 and 3. All PPM1D mutant clones used in this study were subjected to the blasticidin selection of the transduced cells to generate cells stably expressing Cas9 and subsequently, the clones with successful PPM1D targeting were expanded. The authors should demonstrate that increased ROS production is not just a consequence of the lentiviral transduction and antibiotic selection and that it corresponds to increased PPM1D activity in PPM1D mutant cells. To do that, authors could compare PPM1D clones to parental cells that underwent the same selection procedure (OCI-AML2-Cas9 cells and OCI-AML3-Cas9 cells).

      It is true that the parental OCI-AML2 and OCI-AML3 cell lines underwent four days of blasticidin selection to create the cell lines stably expressing Cas9. However, after the four-day period, the blasticidin was removed from the cell culture media. From there, we introduced the PPM1D mutations into the Cas9-expressing “WT” cell lines using the RNP-based CRISPR/Cas9 delivery method, and single cells were then sorted into 96-well plates. Clones were expanded and validated using Sanger sequencing, TIDE analysis, and western blot. In all of our assays, we compare the WT Cas9 cells to the PPM1D-mutant Cas9 cells. Additionally, the cells have been expanded and passaged several times after blasticidin selection. Therefore, we believe it is unlikely that there are residual ROS-inducing effects from the antibiotic treatment.

      C) The authors mention that they identified 3530 genes differentially expressed in parental and PPM1D mutant cells (line 267) but it is unclear what was the threshold for statistical significance. They mention FDR<0.05 in the Methods but show GSEA analysis with FDR<0.25 in Figure 4A. Source data for Fig. 4 is missing and the list of differentially expressed genes is not shown.

      The source data files for Figures 1 and 4 will be uploaded with the revised manuscript. Upon reviewing the source data, we noticed an error in the number of differentially expressed genes. We have corrected this in line 274 and you will see that this correlates with Figure 4-source data 1. For the thresholds, we used an FDR<0.05 for the differential gene expression analysis, and an FDR <0.25 in the GSEA, which is an appropriate threshold for GSEA. We have clarified these thresholds in the methods section.
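
      To make the two thresholds concrete, the sketch below applies a Benjamini-Hochberg adjustment to a handful of made-up p-values and counts how many pass FDR<0.05 versus FDR<0.25. The choice of Benjamini-Hochberg is an assumption for illustration only: the response does not name the correction used for the differential expression analysis, and GSEA derives its FDR q-values from permutations rather than from this procedure.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, not taken from the manuscript.
    pvals = [0.01, 0.04, 0.03, 0.20]
    qvals = benjamini_hochberg(pvals)
    print("adjusted:", [round(q, 4) for q in qvals])
    print("pass FDR<0.05:", sum(q < 0.05 for q in qvals))
    print("pass FDR<0.25:", sum(q < 0.25 for q in qvals))
```

      With these illustrative inputs, the stricter cutoff retains far fewer features than the more permissive one, which is why the two analyses can report different thresholds without being inconsistent.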

      D) Include a definition of MFI in Figure legend Fig.2 and also in the Methods section. The unit should be indicated at both the x and y axes.

      We have defined MFI in the figure legends and methods sections and have updated the figures accordingly.

      E) Legend to Figure 2 - Figure Supplement 1 E should define the grey and pink columns (likely WT and mutants LCLs).

      Thank you. We have defined the grey and pink columns as WT and PPM1D-mutant cell lines, respectively for Figure 2 – Figure supplement 2D and E.

      F) Reporter assays in Fig. 5 convincingly show that NHEJ capacity is reduced in PPM1D mut cells. In the text, the authors state that this might reflect the impact of PPM1D on LSD1 (line 365). Although this might be the case, other options are equally possible. It would be appropriate to include a reference to the ability of PPM1D to counteract gH2AX and ATM which generate the most upstream signals in DDR.

      Thank you to the reviewers for raising this excellent point. We have revised the text to incorporate the impact of PPM1D on γH2AX and ATM on NHEJ.

      G) The authors correctly state that truncation of PPM1D leads to protein stabilization (line 85) and that it is present in U2OS cells (line 355). These observations were first reported by Kleiblova et al., 2013, and therefore one reviewer believes that this reference should be included. This study also identified a truncating PPM1D mutation in colon adenocarcinoma HCT116 cells, and the role of PPM1D mutation in promoting the growth of colon cancer has subsequently been tested in an animal model (Burocziova et al., 2019).

      Thank you. We have added this reference to our text in line 360.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.

      Strengths:

      It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.

      Weaknesses:

      Some drawbacks of the study are the following. First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process so understanding the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds. So while the conclusions are generally supported, the scope of the work is limited.

      Thank you for your thoughtful review and acknowledgment of the thoroughness of our analysis.

      First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study.

      We acknowledge your concerns regarding the limitations of our study, particularly regarding the small number of mice per group and the examination of only one time point post-wounding. We agree that a more comprehensive analysis across multiple time points would provide a deeper understanding of the temporal changes induced by infection. While our primary focus in this study was to elucidate the foundational responses to bacteria-infected wounds, we attempted to augment our analysis by incorporating publicly available datasets of similar nature. However, these datasets lacked power in terms of cell number and populations. Nonetheless, we have bolstered our analysis by applying a cross-entropy test on the integrated dataset and reporting its significance (Figure S1F), ensuring the robustness of our single-cell RNA sequencing datasets.

      Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study.

      We also recognize the significance of comparing infected wounds to unwounded skin to establish a baseline for transcriptional changes. While we attempted to incorporate publicly available unwounded skin samples into our analysis, we encountered limitations in the number of cells, particularly within the immune population. This constraint is addressed in the Limitations section of the manuscript.

      Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds.

      Regarding the concern about differences between murine and human wound healing mechanisms, we took measures during tissue isolation to mitigate this issue, extracting incisions of the wounds rather than contracted tissues. Despite the primary mode of wound closure in mice being contraction, we believe our analysis still offers valuable insights into cellular responses to infection relevant to human wound healing.

      We appreciate your constructive criticism of our study. Despite these constraints, we believe our work provides valuable insights into the transcriptional changes induced by infection in healing wounds.

      Reviewer #2 (Public Review):

      Summary:

      The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.

      Strengths:

      The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.

      Weaknesses:

      The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We are thankful for your acknowledgment of the thoroughness of our analysis and the cautious nature of our conclusions.

      The analysis is purely descriptive, and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing.

      Regarding your concern about the purely descriptive nature of our analysis and the lack of functional validation of identified factors, we agree on the importance of understanding the functional roles of transcriptional changes in wound healing. To address this limitation, we plan to conduct functional experiments, such as perturbation assays or in vivo validation studies, to validate the roles of specific factors identified in our analysis.

      The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We acknowledge the importance of comparing wounded tissue to unwounded skin to establish a baseline for understanding transcriptional changes. This point is noted and acknowledged in the limitations section of our manuscript.

      We appreciate your feedback and assure you that we will consider your suggestions in future iterations of our research.

      Recommendations For The Authors:

      We are grateful for the positive overall assessment of our revised work by the reviewers. Critical comments on specific aspects of our work are listed verbatim below followed by our responses.

      Reviewer 1 (Recommendations for the Authors):

      (1) The figures are a bit cluttered and hard to parse out. The different parts of the figure seem to be scattered all over the place with no consistent order.

      Thank you for your feedback regarding the figures in our manuscript. We acknowledge your concern that some panels may appear cluttered and challenging to navigate. In response, we made concerted efforts to declutter certain panels, taking into account page size constraints and ensuring a minimum font size for readability.

      (2) I didn't really understand what the last sentence on page 6 meant. Is this meant to say that these could be biomarkers of infection?

      We thank the reviewer for noting this lack of clarity. We revised the statement.

      Updated manuscript (lines 111-113)

      “Overall, the persistent E. faecalis infection contributed to higher Tgfb1 expression, whilst Pdgfa levels remained low, correlating with delayed wound healing.”

      (3) A reference on page 19 didn't format correctly.

      We thank the reviewer for catching the typos. We corrected the reference formatting.

      Updated manuscript (lines 503-505)

      “We confirm the immune-suppressive role of E. faecalis in wound healing, consistent with previous findings in different experimental settings (Chong et al., 2017; Kao et al., 2023; Tien et al., 2017).”

      (4) The title doesn't really address the scope of the finding which goes beyond immunomodulatory.

      The reviewer is correct! We therefore revised the title to cover all aspects of the study as:

      “Decoding the complexity of delayed wound healing following Enterococcus faecalis infection”

      Reviewer 2 (Recommendations for the Authors):

      (1) On page 6, the expression of Tgfb1 is described as "aggravated" by wounding alone. I am not sure whether this means Tgfb1 levels are increased or decreased. It appears from the data that it is increased, which was confusing to me since I interpreted "aggravated" as meaning decreased. So perhaps a different more straightforward word could be used to describe the data.

      We modified this ambiguous statement to:

      Updated manuscript (lines 105-106)

      “By contrast, wounding alone resulted in higher transforming growth factor beta 1 (Tgfb1) expression.”

      (2) On page 7, the authors state that "cells from infected wounds...demonstrated distinct clustering patterns compared to cells from uninfected wounds (Figure S1F)" but when I look at the data in this figure, I cannot really see a difference. Perhaps the differences could be more clearly highlighted?

      Thank you for pointing out this issue. We used the cross-entropy test for statistical comparison, employing UMAP embedding space data. Because the data underwent batch correction based on infection status, the UMAP plots for each condition may appear visually similar. However, it is important to note that the number of cells per cluster varies significantly between the infected and uninfected conditions. This influences the selection of points (cells) and their nearest neighbours for statistical testing within each cluster in the embedding space. To address this concern, we have included a table indicating the number of cells per cell type alongside the plot (Figure S1F), providing additional context for the interpretation of our results.

      Author response table 1.

      Author response image 1.

      (3) On page 8, Zeb2hi cells are described as "immunosuppressive" and yet the genes highlighted as expressed include Cxcl2 and IL1b, which I would classify as inflammatory, not immunosuppressive. Can the authors be a bit more clear on why they describe the phenotype of these cells as "immunosuppressive"?

      We agree with the reviewer that this is a bit counterintuitive. Conventionally, CXCL2 is thought to be a chemoattractant for neutrophil recruitment. However, the infection-specific keratinocyte cluster expressing Cxcl2, Il1b, and Wfdc17 along with Zeb2 and Thbs1 indicates myeloid-derived suppressor cell-like features, which play immunosuppressive roles during infection and in cancer (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021).

      Updated manuscript (lines 159-163)

      “As the barrier to pathogens, keratinocytes secrete a broad range of cytokines that can induce inflammatory responses (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021). However, Zeb2hi keratinocytes co-expressing Cxcl2, Il1b, and Wfdc17, indicate myeloid-derived suppressor cell-like phenotype which implies an immunosuppressive environment (Hofer et al., 2021; Veglia et al., 2021).”

      (4) On pages 8-9, Keratinocytes are described to express MHC class II. I find this quite unexpected since class II is usually thought to be expressed primarily by APCs such as DCs and B cells. Is there a precedent for keratinocytes to express class II? The authors should acknowledge that this is unexpected and in need of further validation, or support the claim with references in which class II expression has been previously observed on keratinocytes (and is thus not unexpected)

      Although MHC class II expression is found predominantly on immune cells, an antigen-presenting role for keratinocytes has been reported in many studies (Banerjee et al., 2004; Black et al., 2007; Carr et al., 1986; Gawkrodger et al., 1987; Jiang et al., 2020; Li et al., 2022; Oh et al., 2019; Tamoutounour et al., 2019). Therefore, the antigen-presenting role of keratinocytes is known and expected, and we think that it should be further investigated in the context of wound infection.

      Updated manuscript (lines 177-179)

      “These genes are associated with the major histocompatibility complex (MHC) class II, suggesting a self-antigen presenting keratinocyte population, which have a role in costimulation of T cell responses (Meister et al., 2015; Tamoutounour et al., 2019).”

      REFERENCES

      Alshetaiwi, H., Pervolarakis, N., McIntyre, L. L., Ma, D., Nguyen, Q., Rath, J. A., Nee, K., Hernandez, G., Evans, K., Torosian, L., Silva, A., Walsh, C., & Kessenbrock, K. (2020). Defining the emergence of myeloid-derived suppressor cells in breast cancer using single-cell transcriptomics. Science Immunology, 5(44), eaay6017. https://doi.org/10.1126/sciimmunol.aay6017

      Banerjee, G., Damodaran, A., Devi, N., Dharmalingam, K., & Raman, G. (2004). Role of keratinocytes in antigen presentation and polarization of human T lymphocytes. Scandinavian Journal of Immunology, 59(4), 385–394. https://doi.org/10.1111/j.0300-9475.2004.01394.x

      Black, A. P. B., Ardern-Jones, M. R., Kasprowicz, V., Bowness, P., Jones, L., Bailey, A. S., & Ogg, G. S. (2007). Human keratinocyte induction of rapid effector function in antigen-specific memory CD4+ and CD8+ T cells. European Journal of Immunology, 37(6), 1485–1493. https://doi.org/10.1002/eji.200636915

      Carr, M. M., McVittie, E., Guy, K., Gawkrodger, D. J., & Hunter, J. A. (1986). MHC class II antigen expression in normal human epidermis. Immunology, 59(2), 223–227.

      Gawkrodger, D. J., Carr, M. M., McVittie, E., Guy, K., & Hunter, J. A. (1987). Keratinocyte expression of MHC class II antigens in allergic sensitization and challenge reactions and in irritant contact dermatitis. The Journal of Investigative Dermatology, 88(1), 11–16. https://doi.org/10.1111/1523-1747.ep12464641

      Jiang, Y., Tsoi, L. C., Billi, A. C., Ward, N. L., Harms, P. W., Zeng, C., Maverakis, E., Kahlenberg, J. M., & Gudjonsson, J. E. (2020). Cytokinocytes: The diverse contribution of keratinocytes to immune responses in skin. JCI Insight, 5(20), e142067. https://doi.org/10.1172/jci.insight.142067

      Li, D., Cheng, S., Pei, Y., Sommar, P., Kärner, J., Herter, E. K., Toma, M. A., Zhang, L., Pham, K., Cheung, Y. T., Liu, Z., Chen, X., Eidsmo, L., Deng, Q., & Xu Landén, N. (2022). Single-Cell Analysis Reveals Major Histocompatibility Complex II‒Expressing Keratinocytes in Pressure Ulcers with Worse Healing Outcomes. The Journal of Investigative Dermatology, 142(3 Pt A), 705–716. https://doi.org/10.1016/j.jid.2021.07.176

      Oh, S., Chung, H., Chang, S., Lee, S.-H., Seok, S. H., & Lee, H. (2019). Effect of Mechanical Stretch on the DNCB-induced Proinflammatory Cytokine Secretion in Human Keratinocytes. Scientific Reports, 9(1), 5156. https://doi.org/10.1038/s41598-019-41480-y

      Siriwach, R., Ngo, A. Q., Higuchi, M., Arima, K., Sakamoto, S., Watanabe, A., Narumiya, S., & Thumkeo, D. (2022). Single-cell RNA sequencing identifies a migratory keratinocyte subpopulation expressing THBS1 in epidermal wound healing. iScience, 25(4), 104130. https://doi.org/10.1016/j.isci.2022.104130

      Tamoutounour, S., Han, S.-J., Deckers, J., Constantinides, M. G., Hurabielle, C., Harrison, O. J., Bouladoux, N., Linehan, J. L., Link, V. M., Vujkovic-Cvijin, I., Perez-Chaparro, P. J., Rosshart, S. P., Rehermann, B., Lazarevic, V., & Belkaid, Y. (2019). Keratinocyte-intrinsic MHCII expression controls microbiota-induced Th1 cell responses. Proceedings of the National Academy of Sciences of the United States of America, 116(47), 23643–23652. https://doi.org/10.1073/pnas.1912432116

      Veglia, F., Sanseviero, E., & Gabrilovich, D. I. (2021). Myeloid-derived suppressor cells in the era of increasing myeloid cell diversity. Nature Reviews. Immunology, 21(8), 485–498. https://doi.org/10.1038/s41577-020-00490-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors investigate the tolerance of aminoglycosides in E. coli mutants deleted in the Krebs cycle and respiratory chain enzymes. The motivation for this study is unclear. Transport of aminoglycosides is pmf-dependent, as the authors correctly note, and knocking out energy-producing components leads to tolerance of aminoglycosides, this has been well established. In S. aureus, clinically relevant "small colony" strains selected for in the course of therapy with aminoglycosides acquire null mutations in the biosynthesis of heme or ubiquinone, and have been studied in detail. In E. coli, such knockouts have not been reported in clinical isolates, probably due to severe fitness costs.

      Response: We sincerely appreciate the time and consideration the reviewer dedicated to evaluating our manuscript. It is important to highlight that while the transport of aminoglycosides is PMF-dependent, recent studies underscore the potential role of metabolic mutations in antibiotic tolerance, a facet that warrants further investigation. For instance, the study by Heinemann’s and Michiels’ groups explored genomic changes in E. coli strains (including uropathogenic UTI89 strains) subjected to daily antibiotic exposure (Van den Bergh et al., 2022). Notably, mutations predominantly occurred in genes of the nuo operon, a key component of E. coli energy metabolism, suggesting a link between metabolic adaptations and antibiotic tolerance. Furthermore, the research by Collins’ group revealed previously unrecognized genes related to central metabolism (e.g., icd, gltD, sucA) that contribute to antibiotic resistance in E. coli cells exposed to multiple antibiotics, including aminoglycosides (Lopatkin et al., 2021). These findings are corroborated by the presence of similar mutations in clinical E. coli pathogens, as evidenced by the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection (Lopatkin et al., 2021). The clinical relevance of metabolic mutations in antibiotic tolerance is increasingly recognized, yet their underlying mechanisms remain enigmatic. Therefore, elucidating the role of metabolic pathways in conferring antibiotic tolerance is critical. We have updated the introduction to clearly convey our motivation in this study (see page 4).

      At the same time, single-cell analysis has shown that individual cells with a decrease in the expression of Krebs cycle enzymes are tolerant of antibiotics and have lower ATP (Manuse et al., PLoS Biol 19: e3001194). The authors of the study under review report that knocking out ICD, isocitrate dehydrogenase that catalyzes the rate-limiting step in the Krebs cycle, has little effect on aminoglycoside tolerance and actually leads to an increase in the level of ATP over time. This observation does not seem to make much sense and contradicts previous reports, specifically that E. coli ICD is tolerant of antibiotics and, not surprisingly, produces less ATP (Kabir and Shimizu, Appl Microbiol Biotechnol. 2004; 65(1):84-96; Manuse et al., PLoS Biol 19: e3001194). Mutations in other Krebs cycle enzymes, unlike ICD, do lead to a dramatic increase in tolerance of aminoglycosides according to the paper under review. This is all very confusing.

      Response: Although our data cannot be directly compared to that of Kabir and Shimizu (Mohiuddin Kabir and Shimizu, 2004), due to the utilization of entirely different experimental procedures and measurement techniques, we can draw some parallels to the study conducted by Lewis’ group (Manuse et al., 2021), despite certain differences in experimental protocols. Furthermore, the reviewer has made strong assertions regarding our manuscript based on the findings of Lewis’ group. Thus, we believe it's pertinent to expand our response regarding that study.

      In the study of Lewis’ group, bacterial cells were inoculated at a ratio of 1:100 into LB medium from an overnight culture (approximately 16 hours). Subsequently, the cultures were incubated at 37°C for approximately 2 hours, and ATP levels were measured using the BacTiter Glo kit (Promega, Madison, WI, USA). ATP levels were then normalized to cell density, determined through optical density measurements, and represented on a linear diagram. As demonstrated in Supplementary Figure S1c of their paper, there was a 10-15% reduction in normalized ATP levels in the icd mutant compared to the wild type. In our experiments, cells were grown for 24 hours in overnight cultures, diluted 100-fold in fresh media, and ATP levels were measured at 3, 4, 5, and 6 hours using the same kit. ATP levels were normalized to cell counts quantified by flow cytometry. Upon analyzing our data of the icd mutant for around 3 hours (the time point closest to that of the study of Lewis’ group), we observed a reduction of approximately 15-20% (without statistical significance) in the icd mutant compared to the wild-type (see raw data, linear plot, and logarithmic plot below; Author response image 1), which aligns with the findings of Lewis’ group.
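
      The normalization and comparison described above can be illustrated with a minimal sketch. It assumes that ATP is read as a luminescence signal and divided by the cell count of the same sample, and that the reduction is computed from replicate means. The function names and all numbers are hypothetical and are not the study's data.

```python
import numpy as np

def normalized_atp(luminescence, cell_counts):
    """Return the per-cell ATP signal: luminescence divided by cell count."""
    luminescence = np.asarray(luminescence, dtype=float)
    cell_counts = np.asarray(cell_counts, dtype=float)
    return luminescence / cell_counts

def percent_reduction(mutant, wild_type):
    """Percent reduction of the mutant mean relative to the wild-type mean."""
    return 100.0 * (1.0 - np.mean(mutant) / np.mean(wild_type))

# Hypothetical replicate values, for illustration only.
wt = normalized_atp([1000.0, 1100.0, 900.0], [1.0e6, 1.0e6, 1.0e6])
icd = normalized_atp([800.0, 900.0, 850.0], [1.0e6, 1.0e6, 1.0e6])
reduction = percent_reduction(icd, wt)
print(f"Reduction relative to wild type: {reduction:.1f}%")
```

      Because the two studies normalize to different quantities (optical density versus flow-cytometry cell counts), a calculation of this kind only makes the percentages comparable in form, not in absolute value.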

      We further investigated the gentamicin tolerance of both wild-type and icd mutant strains of E. coli BW25113 (Author response image 2). Our findings indicate that the increased sensitivity of the icd mutant of the MG1655 strain to gentamicin is similar to the observation in the other E. coli strain.

      Author response image 1.

      ATP levels in the icd mutant. ATP levels of both the mutant and wild-type strains were measured at t=3 hours of cell growth and normalized to cell counts. The figure presents the raw data (a), linear plot (b), and logarithmic plot (c) of the same dataset. This data corresponds to the first panel of Figure 3B in the manuscript.

      Author response image 2.

      Gentamicin tolerance of wild-type and icd mutant strains of E. coli BW25113. Both wild type and mutant strains were treated with gentamicin (50 µg/ml) for 5 hours at the mid-exponential phase. Cells were plated before and after treatment for CFU/ml counts. The dashed line represents the limit of detection. CFU: Colony forming units.
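
      For readers unfamiliar with the survival readout in this caption, the following sketch shows one way a survival fraction could be derived from CFU/ml counts before and after treatment. Reporting counts below the limit of detection at the limit itself is an assumption made here for illustration; the function name and the numbers are hypothetical.

```python
def survival_fraction(cfu_before, cfu_after, limit_of_detection):
    """Fraction of CFU/ml surviving treatment.

    Counts below the limit of detection are reported at the limit,
    so the returned fraction is an upper bound in that case.
    Returns (fraction, censored).
    """
    if cfu_before <= 0:
        raise ValueError("cfu_before must be positive")
    censored = cfu_after < limit_of_detection
    effective_after = limit_of_detection if censored else cfu_after
    return effective_after / cfu_before, censored

# Hypothetical counts, for illustration only.
frac, censored = survival_fraction(1.0e8, 1.0e4, 1.0e2)
print(frac, censored)
```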

      We think that there are two primary reasons why our study cannot contradict the findings of the Lewis group:

      Firstly, our study cannot be directly compared to theirs, as they did not comprehensively explore the impact of gene deletions on cell metabolism beyond the measurement of ATP levels at a single time point (Manuse et al., 2021). Our study encompasses various metabolic parameters such as cellular ATP, redox status, proton motive force (PMF), intracellular pH, and drug uptake throughout the exponential and/or early stationary phase. Additionally, we conducted proteomic analysis for five different strains including mutants and wild type. Moreover, we performed pathway enrichment analysis grounded in the statistical background of the entire genome, encompassing various functional pathway classification frameworks such as Gene Ontology annotations, KEGG pathways, and Uniprot keywords. The results of these pathway enrichment analyses are now available in the Supplementary File (see Supplementary Tables 11-17 in the current manuscript). Thus, we believe it is unjust to deem our study contradictory compared to the Lewis group's study, which does not have a comprehensive analysis of the metabolism of the mutant strains they investigated.

      Secondly, our study cannot be compared to that specific study (Manuse et al., 2021) due to the utilization of a distinct antibiotic (ciprofloxacin). Cell tolerance is heavily reliant on the mechanism of action of the antibiotic used. Therefore, the reviewer should have focused on studies closely related to aminoglycoside tolerance. Our study is not confusing or contradictory, as Lewis’ group also demonstrated that the tolerance of the icd mutant to gentamicin was significantly reduced while the tolerance of other TCA cycle mutant strains was increased in a different study (Shan et al., 2015). However, they did not delve into the metabolism of these mutant strains, as we did. We now mention this point in our manuscript (see pages 14-15).

      Apart from the confusing data, it is not clear what useful information may be obtained from the choice of the experimental system. The authors examine exponentially growing cells of E. coli for tolerance of aminoglycosides. The population at this stage of growth is highly susceptible to aminoglycosides, and only some rare persister cells can survive. However, the authors do not study persisters. A stationary population of E. coli is tolerant of aminoglycosides, and this is clinically relevant, but this is not the subject of the study.

      Response: Respectfully, we must express our disagreement with the reviewer's comments. Our experimental system is meticulously organized and logically structured. Mutant strains such as gltA, sucA, and nuoI deletions exhibit increased tolerance to all aminoglycosides tested, with their fractions clearly increasing around the mid-exponential phase between 3-4 hours (refer to Figure 2B in our manuscript). This surge in tolerance is evident at the population level as well (as depicted in Figure 1A in our manuscript, where certain mutant strains demonstrate complete survival to streptomycin, with survival fractions nearing 1). Given the pronounced increase observed around the mid-exponential phase, we primarily characterize the metabolism of these cells during this growth phase.

      It's essential to note that any investigation into antibiotic tolerance and/or resistance holds immense significance, regardless of the growth phase under scrutiny, as antibiotic tolerance/resistance poses a substantial healthcare challenge. Additionally, metabolic mutant strains do not necessarily entail severe fitness costs, as evidenced by Figure S2A published by the Lewis group (Manuse et al., 2021), a finding consistent with our study (see Figure 2B in our manuscript). This phenomenon could confer a survival advantage to bacterial cells, as they may acquire metabolic mutations to bolster their tolerance without incurring significant fitness costs. Furthermore, numerous studies suggest that bacterial cells may opt for the evolutionary pathway leading to increased tolerance before acquiring resistance mechanisms (Levin-Reisman et al., 2017; Santi et al., 2021). The presence of metabolic mutations in clinical E. coli pathogens has also been confirmed through the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection by Collins’ group (Lopatkin et al., 2021). Consequently, comprehending the tolerance mechanisms of metabolic mutations holds paramount importance.

      References

      Levin-Reisman I, Ronin I, Gefen O, Braniss I, Shoresh N, Balaban NQ. 2017. Antibiotic tolerance facilitates the evolution of resistance. Science 355:826–830. doi:10.1126/science.aaj2191

      Lopatkin AJ, Bening SC, Manson AL, Stokes JM, Kohanski MA, Badran AH, Earl AM, Cheney NJ, Yang JH, Collins JJ. 2021. Clinically relevant mutations in core metabolic genes confer antibiotic resistance. Science 371. doi:10.1126/science.aba0862

      Manuse S, Shan Y, Canas-Duarte SJ, Bakshi S, Sun WS, Mori H, Paulsson J, Lewis K. 2021. Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biol 19. doi:10.1371/journal.pbio.3001194

      Mohiuddin Kabir M, Shimizu K. 2004. Metabolic regulation analysis of icd-gene knockout Escherichia coli based on 2D electrophoresis with MALDI-TOF mass spectrometry and enzyme activity measurements. Appl Microbiol Biotechnol 65:84–96. doi:10.1007/s00253-004-1627-1

      Santi I, Manfredi P, Maffei E, Egli A, Jenal U. 2021. Evolution of antibiotic tolerance shapes resistance development in chronic Pseudomonas aeruginosa infections. mBio 12. doi:10.1128/mBio.03482-20

      Shan Y, Lazinski D, Rowe S, Camilli A, Lewis K. 2015. Genetic basis of persister tolerance to aminoglycosides in Escherichia coli. mBio 6. doi:10.1128/mBio.00078-15

      Van den Bergh B, Schramke H, Michiels JE, Kimkes TEP, Radzikowski JL, Schimpf J, Vedelaar SR, Burschel S, Dewachter L, Lončar N, Schmidt A, Meijer T, Fauvart M, Friedrich T, Michiels J, Heinemann M. 2022. Mutations in respiratory complex I promote antibiotic persistence through alterations in intracellular acidity and protein synthesis. Nat Commun 13:546. doi:10.1038/s41467-022-28141-x

      Reviewer #2 (Public Review):

      Summary:

      This interesting study challenges a dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.

      Strengths:

      This is a complete study. These results are surprising and are based on various read-outs, such as ATP levels, pH measurement, membrane potential, and the uptake of fluorophore-labeled gentamicin. Utilizing a proteomic approach, the authors show instead that in tolerant mutants, there is a decrease in the levels of proteins associated with ribosomes (targets of AG), causing tolerance.

      Response: We sincerely appreciate the reviewer for taking the time to read our manuscript and offer valuable suggestions.

      Weaknesses:

      The use of a single high concentration of aminoglycoside: my main comment on this study concerns the use of an AG concentration well above the MIC (50 µg/ml or 25 µg/ml for uptake experiments), which is 10 times higher than the concentrations previously used (Kohanski, Taber) in studies showing a link with PMF. This significant difference may explain the discrepancies in results. Indeed, a high concentration of AG can mask the effects of a metabolic disruption and lead to less specific uptake. However, this concentration highlights a second molecular level of tolerance. Adding experiments using lower concentrations (we propose 5 µg/ml to compare with the literature) would provide a more comprehensive understanding of AG tolerance mechanisms during a decrease in metabolism.

      Another suggestion would be to test iron limitation (using an iron chelator such as DIP), which has been shown to induce AG tolerance. Can the authors demonstrate if this iron limitation leads to a decrease in ribosomal proteins? This experiment would validate their hypothesis in the case of a positive result. Otherwise, it would help distinguish two types of molecular mechanisms for AG tolerance during a metabolic disruption: (i) PMF and uptake at low concentrations, (ii) ribosomal proteins at high concentrations.

      Response: While we acknowledge the intriguing possibility of exploring whether iron limitation results in a reduction of ribosomal proteins, we believe that this topic falls slightly outside the scope of our current study. This area warrants independent investigation since our current research did not specifically focus on iron-limited environments (LB medium is iron-rich; see Abdul-Tehrani et al., 1999; Rodríguez-Rojas et al., 2015). However, we fully concur with the notion that experimental outcomes may be contingent upon the concentration of aminoglycosides (AG). Hence, we repeated the critical experiments using a lower concentration of gentamicin (5 µg/mL), as suggested by the reviewer. Before delving into a discussion of these results, we wish to emphasize two key points. Firstly, the majority of our metabolic measurements, including ATP levels, redox activities, intracellular pH, and metabolomics, were conducted in mutant and wild-type cells in the absence of drugs. Our objective was to elucidate the impact of genetic perturbations of the TCA cycle on cell metabolism. Secondly, it's important to emphasize that our study does not invalidate the hypothesis that AG uptake is proton motive force (PMF)-dependent. We observed similar drug uptake across the strains tested, which is reasonable considering that their energy metabolism and PMF are not significantly altered compared to the wild type (at least we did not observe a consistent trend in their metabolic levels). Consequently, our study does not necessarily contradict previous claims (Taber et al., 1987). We have now clarified this point in the manuscript (see pages 1 and 13).

      When we employed a lower gentamicin concentration, we still noted a significant elevation in tolerance among the gltA, sucA, and nuoI mutant strains compared to the wild type. Also, it remained evident that the observed tolerance in the mutant strains cannot be ascribed to differences in drug uptake or impaired PMF, as the levels of drug uptake and the disruption of PMF by gentamicin (at lower concentrations) in the mutant strains were comparable to those of the wild type. Moreover, since our metabolic measurements and proteomics analyses failed to reveal any notable alterations in energy metabolism in these strains, the consistency in drug uptake levels across both mutant and wild-type strains, even at lower concentrations, further bolsters the validity of our findings obtained at higher gentamicin concentrations. The new results have been incorporated into the Supplementary file (see Supplementary Figures S1, S5, S7, and S9) and discussed throughout the manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Line 120: Luria-Bertani (LB), use Lysogeny Broth.

      Line 180: "RSG dye can be reduced by bacterial reductases of PMF" to be reformulated.

      Response: The suggested corrections have been incorporated into the manuscript.

      References

      Abdul-tehrani H, Hudson AJ, Chang Y, Timms AR, Hawkins C, Williams JM, Harrison PM, Guest JR, Andrews SC. 1999. Ferritin Mutants of Escherichia coli Are Iron Deficient and Growth Impaired, and fur Mutants are Iron Deficient, Journal of Bacteriology.

      Rodríguez-Rojas A, Makarova O, Müller U, Rolff J. 2015. Cationic Peptides Facilitate Iron-induced Mutagenesis in Bacteria. PLoS Genet 11. doi:10.1371/journal.pgen.1005546

      Taber HW, Mueller JP, Miller PF, Arrow AS. 1987. Bacterial Uptake of Aminoglycoside Antibiotics. Microbiol Rev 51:439–457. doi:10.1128/mr.51.4.439-457.1987

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The current work explored the link between the pulvinar intrinsic organisation and its functional and structural connectivity patterns of the cortex using different dimensional reduction techniques. Overall they find relationships between pulvinar-cortical organization and cortico-cortical organization, and little evidence for clustered organization. Moreover, they investigate PET maps to understand how neurotransmitter/receptor distributions vary within the pulvinar and along its structural and functional connectivity axes.

      Strengths:

      There is a replication dataset and different modalities are compared against each other to understand the structural and functional organisation of the pulvinar complex.

      Weaknesses:

      (1) What is the motivation of the study and how does this work extend previous assessments of the organization of the complete thalamus within the gradient framework?

      Thank you for raising this central question. As already mentioned in the main text, the pulvinar is one of the largest and most prototypical associative nuclei, yet its organizational principles in the human brain remain relatively unexplored. The substantial body of anatomical research conducted in primate species suggests the coexistence of multiple overlapping corticotopic representations on the pulvinar complex.

      Existing connectivity-based parcellation studies of pulvinar organization often overlook these organizational principles, as the resulting parcellation may reflect a linear combination of single overlapping connectopies rather than accurately capturing their distinct and unique spatial arrangement.

      Investigations of thalamic connectivity have already revealed overarching organizational principles within the thalamus, which are partially reflected in its cytoarchitectonic subdivisions. These principles are associated with core and matrix thalamic neuronal subpopulations, and their distinct contributions to large-scale connectivity networks.

      Since gradient selection relies on the explained variance of the diffusion embeddings, and pulvinar-cortical connectivity likely accounts for only a limited portion of the variance in thalamocortical connectivity, we chose to focus specifically on the pulvinar nucleus. This approach was intended to ensure that the local connectivity principles of the pulvinar are not overshadowed by the broader connectotopical organization of the entire thalamus.

      This rationale aligns with findings in topographically organized regions of the cerebral cortex, such as M1, S1 or visual areas. In these regions, distinct principles of topographical organization are not readily apparent when analyzing whole-brain connectivity embedding but emerge when dimensionality reduction is applied to region-specific connectivity data.

      (2) Why is the current atlas chosen for the delineation of the pulvinar and individualized maps not considered? Given the size of the pulvinar, more validation of the correctness of the atlas may be helpful.

      To improve signal-to-noise ratio and in alignment with previous studies, we performed diffusion embedding on the group-level, averaged connectivity matrices rather than estimating gradients at the individual subject level.

      The decision to use a standard-space atlas for pulvinar delineation, rather than individualized parcellation, was driven by technical considerations: 1) functional MRI data were already transformed to MNI space; and 2) individualized parcellation of thalamic nuclei can result in varying pulvinar volumes across subjects, complicating the averaging of connectivity data. By using a standard-space atlas, we ensured that connectivity was consistently extracted from the same set of voxels across all subjects.
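
      The group-level procedure described above can be sketched as follows: subject-level connectivity matrices sampled from the same voxels are averaged, and a diffusion-map embedding is computed on the average. This is a simplified NumPy illustration, not the pipeline used in the study. The cosine-similarity affinity, the absence of thresholding and of a diffusion-time parameter, the function names, and the random input are all assumptions, and the eigenvalue share returned here is only a rough proxy for explained variance.

```python
import numpy as np

def group_average(connectivity):
    """Average subject matrices of shape (n_subjects, n_voxels, n_targets)."""
    return np.mean(np.asarray(connectivity, dtype=float), axis=0)

def diffusion_embedding(profiles, n_components=3):
    """Simplified diffusion map of voxel-wise connectivity profiles.

    Returns (embedding, eigenvalue_share), where eigenvalue_share is each
    retained eigenvalue divided by the sum of all non-trivial eigenvalues.
    Assumes every voxel has a non-zero profile.
    """
    # Cosine similarity between voxel profiles, with negatives set to zero.
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / norms
    affinity = np.clip(unit @ unit.T, 0.0, None)

    # D^-1/2 A D^-1/2 is symmetric and shares its eigenvalues with the
    # row-stochastic Markov matrix D^-1 A.
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    sym = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]

    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Convert to right eigenvectors of the Markov matrix and drop the
    # trivial first component, which is constant across voxels.
    right = eigvecs * inv_sqrt[:, None]
    keep = slice(1, n_components + 1)
    embedding = right[:, keep] * eigvals[keep]
    share = eigvals[keep] / eigvals[1:].sum()
    return embedding, share

# Hypothetical data: 5 subjects, 40 pulvinar voxels, 100 cortical targets.
rng = np.random.default_rng(0)
subjects = rng.random((5, 40, 100))
embedding, share = diffusion_embedding(group_average(subjects), n_components=3)
print(embedding.shape, share)
```

      Averaging before embedding requires that each row index refers to the same voxel in every subject, which is the practical reason given above for working with a standard-space atlas.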

      We selected the AAL3 atlas (Rolls et al., 2020) over other existing thalamic atlases for practical reasons: the atlas incorporates an ex-vivo thalamic parcellation (Iglesias et al., 2018) with a specific delineation of pulvinar nuclei, which was necessary for subsequent analyses. In the revised version of the manuscript, to validate our findings, we replicated the pulvinar gradient using a different pulvinar delineation from a recent, thalamus-specific atlas (Su et al., 2019). Notably, the spatial distribution of pulvinar connectivity and coexpression gradients remained consistent, regardless of the choice of the thalamic atlas, underscoring the robustness of our results.

      (3) Overall the study feels a little incremental and a repetition of what others have done already in the thalamus. It would be good to know how focusing only on the pulvinar changes interpretation, for example by comparing thalamic and pulvinar gradients?

      The authors acknowledge the existing body of literature that has examined thalamic connectivity through the lens of the connectivity gradient framework. While these studies may provide valuable insights into the functional topography of the pulvinar complex, given its prominent role within the thalamus, we contend that a focused analysis of pulvinar connectivity offers a unique opportunity to uncover the specific organizational principles of this nuclear complex. By isolating the pulvinar, we aimed to avoid the potential overshadowing of its local connectivity patterns by the broader connectotopical organization of the entire thalamus. However, as we believe that our findings are best interpreted within the broader context of general thalamic connectivity organization, we have included an additional paragraph in the Discussion section, which explores the similarities and differences between thalamic and pulvinar gradients, offering a more integrative perspective on our results.

      “In recent years, different works have explored the spatial arrangement of thalamic connectivity within a connectivity gradient framework. Diffusion embedding of thalamocortical functional connectivity has revealed a principal, medio-lateral gradient that was found to be correlated with thalamic structural subdivisions, and a secondary, antero-posterior gradient associated with thalamic functional subfields and showing a progression from unimodal sensorimotor cortical networks to multimodal attention and associative networks. Interestingly, the principal thalamic gradient shows a medio-lateral arrangement on the pulvinar axis while the secondary gradients correspond more to a ventral-dorsal pulvinar axis (Yang et al. 2020). In particular, further independent investigations have suggested that the progressing pattern of thalamic connectivity from unimodal to transmodal cortices is strongly associated with the local density of core and matrix cell types, thus establishing a link between molecular properties and functional connectivity dynamics (Müller et al. 2020; Huang et al. 2024). Our findings complement and expand the existing literature by revealing a similar arrangement of cortical connectivity patterns on the pulvinar complex, and elucidating its relationship to in-vivo estimates of molecular markers of neurotransmission. We found that the gradient associated with unimodal-transmodal cortical connectivity accounted for the highest percentage of variance in cortico-pulvinar connectivity, in line with its well-acknowledged role as an associative nucleus. It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei. Nevertheless, the extensive and intricate association of pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.”

      (4) Could it be that the gradient patterns stem from lacking anatomical and functional resolutions (or low SNR) therefore generating no sharp boundaries?

      The gradient organization described in our results aligns with anatomical evidence in non-human primates (Shipp, 2003), and with existing neuroimaging studies in humans, which report limited correspondence between connectivity-based hard clustering solutions and histological delineation of pulvinar nuclei. However, we recognize the critical importance of assessing the impact of SNR on connectivity measures derived from functional and structural MRI. In the revised manuscript, we have included an additional analysis to investigate the potential impact of local noise on gradient reconstruction. This analysis involved sampling voxel-wise SNR estimates in the pulvinar from both BOLD and diffusion-weighted MRI data, averaging these estimates to generate group-level, modality-specific SNR maps. We then assessed spatial correlations between these maps and the gradient embeddings using the same methodological framework employed throughout the study. Our findings indicate that functional connectivity gradients are weakly, but significantly correlated with SNR, with the strongest correlation observed for the third gradient (left hemisphere G<sub>FC</sub>1 r = -0.30, SA-corrected p < 0.001, G<sub>FC</sub>2 r = 0.22, SA-corrected p = 0.05, G<sub>FC</sub>3 r = 0.55, SA-corrected p < 0.001; right hemisphere G<sub>FC</sub>1 r = -0.41, SA-corrected p < 0.001, G<sub>FC</sub>2 r = 0.22, SA-corrected p = 0.008, G<sub>FC</sub>3 r = 0.52, SA-corrected p = 0.017). In contrast, structural connectivity gradients showed no significant correlation with SNR (left hemisphere G<sub>SC</sub>1 r = 0.06, SA-corrected p = 0.82, G<sub>SC</sub>2 r = -0.33, SA-corrected p = 0.01; right hemisphere G<sub>SC</sub>1 r = 0.40, SA-corrected p = 0.28, G<sub>SC</sub>2 r = -0.19, SA-corrected p = 0.31).
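
      The kind of test reported above, a voxel-wise spatial correlation between a gradient and an SNR map evaluated against a null distribution, can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis used in the study. In particular, the surrogate maps below are plain random shuffles, which do not preserve spatial autocorrelation; an SA-corrected p-value would require surrogates generated by a spatial-autocorrelation-preserving method. All names and data are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def null_p_value(gradient, snr, surrogates):
    """Two-sided p-value of r(gradient, snr) against surrogate SNR maps."""
    observed = pearson_r(gradient, snr)
    null = np.array([pearson_r(gradient, s) for s in surrogates])
    # Add-one correction so that the p-value is never exactly zero.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + len(null))
    return observed, float(p)

# Hypothetical voxel-wise values for 200 pulvinar voxels.
rng = np.random.default_rng(0)
gradient = rng.normal(size=200)
snr = 0.5 * gradient + rng.normal(size=200)

# Naive stand-in: shuffled maps do NOT preserve spatial autocorrelation.
surrogates = [rng.permutation(snr) for _ in range(999)]
r, p = null_p_value(gradient, snr, surrogates)
print(f"r = {r:.2f}, p = {p:.3f}")
```

      With spatially smooth maps, naive shuffling tends to understate p-values, which is why autocorrelation-preserving surrogates are preferred for brain maps.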

      Reviewer #1 (Recommendations for the authors):

      (1) Please add more literature on thalamus gradients and interpret this with care.

      Thank you for the suggestion. We have added the following paragraph in the Discussion section:

      “In recent years, different works have explored the spatial arrangement of thalamic connectivity within a connectivity gradient framework. Diffusion embedding of thalamocortical functional connectivity has revealed a principal, medio-lateral gradient that was found to be correlated with thalamic structural subdivisions, and a secondary, antero-posterior gradient associated with thalamic functional subfields and showing a progression from unimodal sensorimotor cortical networks to multimodal attention and associative networks. Interestingly, the principal thalamic gradient shows a medio-lateral arrangement on the pulvinar axis while the secondary gradients correspond more to a ventral-dorsal pulvinar axis (Yang et al. 2020). In particular, further independent investigations have suggested that the progressing pattern of thalamic connectivity from unimodal to transmodal cortices is strongly associated with the local density of core and matrix cell types, thus establishing a link between molecular properties and functional connectivity dynamics (Müller et al. 2020; Huang et al. 2024). Our findings complement and expand the existing literature by revealing a similar arrangement of cortical connectivity patterns on the pulvinar complex, and elucidating its relationship to in-vivo estimates of molecular markers of neurotransmission. We found that the gradient associated with unimodal-transmodal cortical connectivity accounted for the highest percentage of variance in cortico-pulvinar connectivity, in line with its well-acknowledged role as an associative nucleus. It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei. Nevertheless, the extensive and intricate association of pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.

      As regards structural connectivity, existing accounts describe a medio-lateral organization of thalamocortical connections, corresponding to an antero-posterior gradient on the cortical mantle. This gradient organization appears to be anchored to genetic markers of different cell types (Oldham and Ball 2023). In line with their findings, we describe a principal axis of structural connectivity in the pulvinar complex that is arranged on the medio-lateral axis, and we reinforce the notion of a deep relationship between structural connections and molecular expression of neurotransmission markers. On the other hand, the patterns of connectivity with the cerebral cortex do not correspond to a clear antero-posterior axis on the cerebral cortex, probably reflecting the predominance of local connectivity over the global thalamic structural topography. Further investigations are warranted to ascertain whether the structural gradients of the pulvinar complex may be in continuity with this general cortico-thalamic connectivity gradient.”

      (2) Please state the motivation of the work more clearly and what makes it different from related literature.

      Thank you for pointing us to this lack of clarity. We have added the following paragraph in the Introduction section:

      “In particular, investigations of thalamic connectivity within the gradient framework have uncovered general organizational principles within the thalamus, which are partially reflected in thalamic cytoarchitectonic subdivisions. These principles have been linked to core and matrix thalamic neuronal subpopulations, and to their differential contribution to large-scale connectivity networks (Müller et al., 2020; Yang et al., 2020). However, given the remarkable functional complexity and diversity of the pulvinar complex, these global spatial organization patterns likely capture only part of its functional topography. With this in mind, isolating pulvinar connectivity from the remaining thalamocortical connectome would ensure that local organizational principles are not obscured by the global connectotopic structure of the entire thalamus.”

      (3) Why did the authors opt for a whole brain labelling atlas, would a thalamus-specific atlas not be more suitable?

      Despite being a large-scale whole brain atlas, the labeling atlas of choice (AAL3) incorporates a thalamus-specific parcellation from previous work (Iglesias et al., 2018), derived from ex-vivo data and including subdivision of the pulvinar complex into anterior, inferior, lateral and medial nuclei. In the revised version of the manuscript, to validate our findings, we replicated the pulvinar gradient using a different pulvinar delineation from a recent, thalamus-specific atlas (Su et al., 2019). We show these results in Supplementary Figure 1. Notably, the spatial distribution of pulvinar connectivity and coexpression gradients remained consistent, regardless of the choice of the thalamic atlas, underscoring the robustness of our results.

      (4) How did the authors account for the potential low sensitivity of subcortical signals in the PET data?

      We acknowledge the inherent limitations in spatial sensitivity that are a common drawback of PET imaging. However, the PET data employed in the present study were derived from a high-quality dataset collected across multiple studies, predominantly acquired using high-resolution scanners (Hansen et al., 2022; see supplementary material at https://static-content.springer.com/esm/art%3A10.1038%2Fs41593-022-01186-3/MediaObjects/41593_2022_1186_MOESM3_ESM.xlsx for technical details). Furthermore, the reliability of neurotransmission marker measurements at the subcortical level has been validated against genetic transcription markers (Hansen, Markello, et al., 2022; Hansen, Shafiei, et al., 2022), ensuring robust and biologically meaningful results.

      (5) What about SNR of the metrics within the pulvinar?

      The referee raises a crucial and complex point, prompting us to conduct additional analyses. We recognize the critical importance of assessing the impact of SNR on connectivity measures derived from functional and structural MRI. In the revised manuscript, we have included an additional analysis to investigate the potential impact of local noise on gradient reconstruction. Therefore, we have incorporated the following text into the manuscript:

      Results (5. Reliability and Reproducibility):

      “To assess the influence of local noise on functional and structural connectivity gradients, we calculated the spatial correlation between gradient values and averaged voxel-wise estimates of signal-to-noise ratio (SNR) from functional and structural MRI data, respectively. We found that functional connectivity gradients are weakly, but significantly correlated with the SNR, with the strongest correlation observed for the third gradient (left hemisphere G<sub>FC</sub>1 r= -0.30, SA-corrected p < 0.001, G<sub>FC</sub>2 r= 0.22, SA-corrected p = 0.05, G<sub>FC</sub>3 r= 0.55, SA-corrected p < 0.001; right hemisphere G<sub>FC</sub>1 r= -0.41, SA-corrected p < 0.001, G<sub>FC</sub>2 r= 0.22, SA-corrected p = 0.008, G<sub>FC</sub>3 r= 0.52, SA-corrected p = 0.017). In contrast, structural connectivity gradients were not significantly associated with SNR (left hemisphere G<sub>SC</sub>1 r= 0.06, SA-corrected p = 0.82, G<sub>SC</sub>2 r= -0.33, SA-corrected p = 0.01; right hemisphere G<sub>SC</sub>1 r= 0.40, SA-corrected p = 0.28, G<sub>SC</sub>2 r=-0.19, SA-corrected p = 0.31) (Supplementary Figure 5).”

      Methods (4. Reliability and reproducibility assessment):

      “To evaluate the possible influence of SNR on connectivity-derived diffusion embeddings, we performed a voxel-wise, modality-specific SNR assessment to investigate the correlation between the spatial distribution of noise and diffusion embeddings. For each subject, we separately calculated voxel-wise SNR maps for the left and right pulvinar, using both functional (BOLD) volumes and DWI data. For BOLD volumes, we employed the widely accepted definition of temporal signal to noise (tSNR) (Murphy et al., 2006):

      tSNR = T<sub>mean</sub> / T<sub>std</sub>

      where T<sub>mean</sub> and T<sub>std</sub> are, respectively, the mean and the standard deviation of each voxel’s signal across the time series.

      For the DWI data, we applied a similar approach (Cai et al., 2021) that allows estimation of SNR from multiple b=0 diffusion weighted volumes:

      SNR = S<sub>mean</sub> / S<sub>std</sub>

      where S is the voxel’s signal intensity, and the mean (S<sub>mean</sub>) and standard deviation (S<sub>std</sub>) were computed across all the b0-weighted volumes (18 for HCP dataset; 7 for LEMON dataset). Individual pulvinar SNR maps were then averaged to generate group-level estimates of SNR spatial distribution. The resulting, modality-specific average SNR maps were correlated with the diffusion gradients derived from the corresponding modality, following the same approach described in the previous section (Pearson’s correlation; p-values corrected using spatial null models for spatial autocorrelation, and Benjamini-Hochberg correction for FDR).”
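      As an illustration only (this is not the authors' code), the voxel-wise SNR maps described above can be sketched with NumPy, assuming the usual mean-over-standard-deviation definition. The function names, the array layout (last axis indexes time points or b0 volumes) and the use of the sample standard deviation (`ddof=1`) are assumptions made for this sketch.

```python
import numpy as np

def temporal_snr(signal, eps=1e-12):
    """Voxel-wise SNR: mean divided by standard deviation along the last axis.

    signal: array shaped (..., n_volumes). For BOLD data the last axis is
    time; for DWI data it indexes the b=0 volumes (layout is an assumption).
    """
    signal = np.asarray(signal, dtype=float)
    mean = signal.mean(axis=-1)
    std = signal.std(axis=-1, ddof=1)  # sample std; ddof is an assumption
    # Voxels with (near-)zero variance get SNR 0 instead of inf/nan.
    return np.where(std > eps, mean / np.maximum(std, eps), 0.0)

def group_average(subject_maps):
    """Average a list of same-shaped subject-level SNR maps."""
    return np.mean(np.stack(subject_maps, axis=0), axis=0)

# Toy example: 3 "subjects", 5 voxels, 200 time points each.
rng = np.random.default_rng(0)
subject_maps = [
    temporal_snr(100.0 + rng.normal(0.0, 2.0, size=(5, 200)))
    for _ in range(3)
]
group_snr = group_average(subject_maps)
print(group_snr.shape)
```

      The group-level map produced this way is what would then be correlated, voxel by voxel, with a gradient map from the same modality.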

      (6) The numbers of the screeplot / numbers in figures are quite small and not so easy to read.

      Thank you for highlighting this point. We have fixed this issue in the revised version of the Figures.

      (7) How do you know the pulvinar mask is not also picking up on the cortical spinal tract?

      To ensure that pulvinar masks did not pick up streamlines from the corticospinal tracts, we performed a thorough visual inspection of the tractograms that were employed for structural connectivity estimation. For each subject-specific tractogram, we randomly subsampled 10000 streamlines after transformation into MNI standard space and summed up these results to generate a group-level tractogram in standard space. The resulting track-density images (Author response image 1) demonstrate only minimal involvement of descending/ascending tracts from/to the brainstem and spinal cord, confirming the specificity of the pulvinar masks.

      Author response image 1.

      Group-level structural connectivity of the pulvinar complex. Track-density images have been normalized and overlaid on the MNI152 standard template.

      (8) There is no mention of whether the within-pulvinar gradients that are then correlated with PET patterns, or the correlations across gradients, are tested for spatial autocorrelation. I believe it is only mentioned for the cortex.

      Thanks for providing us with the opportunity to clarify this important aspect, which is mentioned in the Methods section (3. Gradient analysis and statistics):

      “To account for the spatial autocorrelation (SA) properties of gradient maps, for all the correlations described, statistical significance was assessed using the permutational approach described in Burt et al. (2020). Briefly, this method takes as input geometric distance matrices for SA estimation and involves the generation of a given number of SA-preserving permuted surrogate maps, which are then employed as nulls to estimate a permutational null distribution of the test statistic (Burt et al. 2020). Pairwise Euclidean distances between left or right pulvinar voxel coordinates were employed for pulvinar null models, while for cortical parcellated connectivity data Euclidean distances were estimated between centroids of each cortical ROI. In both cases, 1000 surrogates were generated to estimate the null distribution. Statistical tests were controlled for false discovery rate (FDR) using Benjamini and Hochberg’s correction.”
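      A minimal sketch of the two statistical steps named in this passage is given below: a p-value computed against a set of surrogate maps, followed by Benjamini-Hochberg adjustment. It does not reproduce the generation of SA-preserving surrogates (Burt et al. 2020); the surrogates are simply taken as input, and the toy example uses plain shuffles, which do not preserve spatial autocorrelation. All function names, the two-sided test and the "+1" finite-sample correction are assumptions of this sketch.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def surrogate_p_value(x, y, surrogates):
    """Two-sided p-value of corr(x, y) against corr(surrogate, y).

    surrogates: iterable of maps playing the role of x under the null.
    """
    observed = pearson_r(x, y)
    null = np.array([pearson_r(s, y) for s in surrogates])
    # "+1" keeps the p-value strictly positive with a finite null sample.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + null.size)
    return observed, float(p)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Toy example with naive shuffles standing in for the 1000 surrogates.
rng = np.random.default_rng(0)
gradient = rng.normal(size=300)
receptor = 0.3 * gradient + rng.normal(size=300)
shuffles = [rng.permutation(gradient) for _ in range(1000)]
r, p = surrogate_p_value(gradient, receptor, shuffles)
print(r, p, benjamini_hochberg([p, 0.04, 0.20]))
```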

      However, to enhance readability, we have highlighted this concept in the Results section (3. The unimodal-to-transmodal gradient (G<sub>FC</sub>1) aligns with receptor expression on the dorso-ventral pulvinar axis):

      “To take into account the effects of spatial autocorrelation, we corrected the resulting p-values using a method based on SA-preserving spatial null models (Burt et al. 2020)”.

      (9) I don't fully understand why the mappings are so patchy of the structural connectivity gradient? Maybe some normalisation went wrong? Other papers on thalamic gradients show smoother patterns.

      We thank the Reviewer for the observation. After thoroughly reviewing the related code, we found no normalization errors. However, we identified a visualization issue, which has been addressed in the revised version. Specifically, the structural gradient representations shown in the figures were based on the averaged values of left and right pulvinar gradients, both of which include structural connectivity to either the ipsilateral or contralateral cerebral cortex. Since ipsilateral connectivity is more prominently represented than contralateral connectivity, this led to asymmetric gradient patterns between ipsilateral and contralateral cortical gradients, resulting in a patchy representation when gradients were averaged between left and right pulvinar. To resolve this, we adjusted the visualization by flipping the right pulvinar gradient representations along the x axis, aligning all the ipsilateral cortical connectivity on the left side and all the contralateral connectivity on the right. This adjustment produced smoother, more readable, and interpretable visualizations. Additionally, it allowed the asymmetry between ipsilateral and contralateral connections to be more clearly appreciated.
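      The effect of the flip can be illustrated with a toy example (not the authors' code). It assumes that each map is stored as an array whose first axis runs left to right (x) on a grid symmetric about the midline; the function name and the two-element arrays are hypothetical.

```python
import numpy as np

def average_ipsi_aligned(left_map, right_map, x_axis=0):
    """Mirror the right-pulvinar map across x, then average with the left.

    After the flip, ipsilateral cortex sits on the same side in both maps,
    so averaging no longer mixes ipsilateral with contralateral values.
    """
    left = np.asarray(left_map, dtype=float)
    flipped_right = np.flip(np.asarray(right_map, dtype=float), axis=x_axis)
    return 0.5 * (left + flipped_right)

# Index 0 = left cortex, index 1 = right cortex (toy values).
left_pulvinar = np.array([3.0, 1.0])   # strong ipsilateral (left) weights
right_pulvinar = np.array([1.0, 3.0])  # strong ipsilateral (right) weights

naive = 0.5 * (left_pulvinar + right_pulvinar)                  # asymmetry washed out
aligned = average_ipsi_aligned(left_pulvinar, right_pulvinar)   # asymmetry kept
print(naive, aligned)
```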

      (10) The final statement of the abstract is misleading as we at this point don't know how making spatial pattern maps in the pulvinar may help understand the role of the pulvinar in health and disease.

      We appreciate the Reviewer’s suggestion and have revised the statement accordingly:

      “Our findings represent a significant step forward in advancing the understanding of pulvinar anatomy and function, offering an exploratory framework to investigate the role of this structure in both health and disease.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore and better understand the complex topographical organization of the human pulvinar, a brain region crucial for various high-order functions such as perception and attention. They sought to move beyond traditional histological subdivisions by investigating continuous 'gradients' of cortical connections along the dorsoventral and mediolateral axes. Using advanced imaging techniques and a comprehensive PET atlas of neurotransmitter receptors, the study aimed to identify and characterize these gradients in terms of structural connections, functional coactivation, and molecular binding patterns. Ultimately, the authors sought to provide a more nuanced understanding of pulvinar anatomy and its implications for brain function in both healthy and diseased states.

      Strengths:

      A key strength of this study lies in the authors' effort to comprehensively combine multimodal data, encompassing both functional and structural connectomics, alongside the analysis of major neurotransmitter distributions. This approach enabled a more nuanced understanding of the overarching organizational principles of the pulvinar nucleus within the broader context of whole-brain connectivity. By employing cortex-wide correlation analyses of multimodal embedding patterns derived from 'gradients,' which provide spatial maps reflecting the underlying connectomic and molecular similarities across voxels, the study offers a thorough characterization of the functional neuroanatomy of the pulvinar.

      Weaknesses:

      Despite its strengths, the current manuscript falls short in presenting the authors' unique perspectives on integrating the diverse biological principles derived from the various neuroimaging modalities. The findings are predominantly reported as correlations between different gradient maps, without providing the in-depth interpretations that would allow for a more comprehensive understanding of the pulvinar's role as a central hub in the brain's network. Another limitation of the study is the lack of clarity regarding the application of pulvinar and its subnuclei segmentation maps to individual brains prior to BOLD signal extraction and gradient reconstruction. This omission raises concerns about the precision and reproducibility of the findings, leaving their robustness less transparently evaluable.

      We thank the Reviewer for the valuable comments. While commonalities and discrepancies between structural and functional connectivity have been extensively explored in the literature, the relationship between functional connectivity and modulatory neurotransmission remains poorly understood. Specifically, while the role of thalamic modulatory neurotransmission has been thoroughly investigated in experimental animal models from an electrophysiological perspective, it remains relatively underexplored in the human brain. In our study, we identified significant associations between the spatial distribution of serotonergic, noradrenergic, dopaminergic and mu-opioid systems and functional pulvinar-cortical connectivity to specific functional networks. Evidence from pharmacological challenge studies using resting-state fMRI suggests that these neurotransmission systems may modulate network-specific thalamocortical connectivity directly or influence neural gain in cortico-cortical connectivity, a process partially dependent on thalamocortical connections to associative thalamic nuclei. However, the limitations of spatial and receptor specificity inherent to this approach, coupled with the predominantly correlational nature of our study design, prevented us from drawing more definitive conclusions on the biological relationship between neurotransmitter expression and functional connectivity. As regards the lack of clarity concerning signal extraction, we have now clarified that all the relevant steps of time series extraction were performed in standard space, without any further registration to individual subjects.

      Reviewer #2 (Recommendations for the authors):

      In line with the weaknesses that I raised above, my recommendations to the authors are two-fold:

      (1) Please provide readers with a more holistic viewpoint to better digest all the correlation analyses. For instance, in p18, the summary says:

      "G<sub>FC</sub>1, G<sub>RC</sub>1, and G<sub>SC</sub>2 substantially delineate multiscale differences between the ventral and dorsal aspects of the pulvinar. Moving along the ventral-dorsal axis of the pulvinar complex, more ventral regions showed higher functional connectivity to unimodal sensory processing networks, higher levels of 5HTT and NAT expression, and preferentially higher structural connectivity to modality-independent or low-level sensory processing cortices."

      We already knew of the existence of the dorsoventral axis in the pulvinar, as the authors specified in the introduction. Beyond this simple report of a phenomenological observation, one could provide a more integrated discussion to pinpoint what commonalities or discrepancies the G<sub>FC</sub>, G<sub>RC</sub>, and G<sub>SC</sub> maps show and the potential common principles explaining their biological relationship (e.g., the high expression of 5HTT and NAT and functional connectivity). Such digested perspectives will grant the study unique insights into the functional system of the pulvinar.

      We have expanded on this topic in the Discussion section (Neurochemical correlates of pulvinar-cortical topographical organization) as follows:

      “Indeed, while commonalities and discrepancies between structural and functional connectivity have been extensively investigated, the relationship between functional connectivity and modulatory neurotransmission remains poorly understood. Our findings reveal stronger associations between pulvinar-cortical connectivity to specific functional networks and the spatial distribution of markers of serotonergic, noradrenergic, dopaminergic and opioid systems. Pharmacological challenge studies using resting-state functional MRI suggest that each of these neurotransmission systems may either directly modulate thalamocortical connectivity or influence neuronal gain in cortico-cortical functional connectivity, which is known to depend, in part, on cortical connections to associative thalamic nuclei, including the pulvinar.”

      (2) Specify the details if there was a QC procedure to check the signal extraction from the pulvinar subnuclei by applying the segmentation atlas at each individual.

      Preprocessed BOLD volumes were available in standard-space, and time series were extracted for each voxel within a standard-space mask of the pulvinar complex. All volumes underwent visual inspection to ensure the accuracy of the registration process. Regarding the pulvinar subnuclei, these structures were not segmented at the individual level.

      Reviewer #3 (Public review):

      Summary of the Study:

      The authors investigate the organization of the human pulvinar by analyzing DWI, fMRI, and PET data. The authors explore the hypothesis of the "replication principle" in the pulvinar.

      Strengths and Weaknesses of the Methods and Results:

      The study effectively integrates diverse imaging modalities to provide a view of the pulvinar's organization. The use of analysis techniques, such as diffusion embedding-driven gradients combined with detailed interpretations of the pulvinar, is a strength.

      Even though the study uses the best publicly available resolution possible with current MR-technology, the pulvinar is densely packed with many cell bodies, requiring even higher spatial resolution. In addition, the model order selection of gradients may vary with the acquired data quality. Therefore, the pulvinar's intricate organization needs further exploration with even higher spatial resolution to capture gradients closer to the biological organization of the pulvinar.

      Appraisal of the Study's Aims and Conclusions:

      The authors delineate the gradient organization of the pulvinar. The study provides a basis for understanding the pulvinar's role in mediating brain network communication.

      Impact and Utility of the Work:

      This work contributes to the field by offering insights into pulvinar organization.

      We thank the Reviewer for their positive assessment and constructive feedback. The Authors agree with the Reviewer that the spatial resolution of currently available in-vivo imaging methods is limited, and that gradient representation would indeed benefit from higher resolution data. However, we also note that the resolution of structural and functional volumes used in our study is consistent with existing literature on pulvinar connectivity. Additionally, the PET data employed in our work include multi-centric studies collected worldwide from healthy populations, and are primarily acquired using high-resolution scanners that allow spatial resolution up to 2 mm<sup>2</sup>. Notwithstanding, further investigations employing finer resolution imaging techniques, such as ultra-high field fMRI, may provide more detailed insights into pulvinar topographical organization at a finer scale.

      Reviewer #3 (Recommendations for the authors):

      (1) The HCP data contains genetically related datasets. Please mention whether the data-selection criteria for the selected 210 healthy subjects followed the genetically unrelated criteria.

      The HCP sample employed in this study consists of an initial cohort of 100 unrelated subjects, as provided in the HCP database, along with an additional random sample of 110 subjects. Subjects were selected without following a genetic criterion, as the family structure of the HCP dataset was part of a restricted access subset that we did not have access to at the time of processing. Subsequently, we obtained access to this information and determined that 178 out of 210 subjects (85%) are genetically unrelated. Of the remaining genetically related subjects, 22 (~10% of the total sample) were included with another subject from the same family group (11 pairs); 6 (3%) were included with two other family members (2 triplets); and 4 (2%) were all part of the same family group. This information has been included in the Methods section for clarity.

      (2) The study uses HCP data with an fMRI resolution of 2mm isotropic and diffusion MRI with 1.25mm. Additionally, the LEMON dataset includes 1.7mm isotropic DWI data and fMRI with 2.3mm isotropic resolution. Furthermore, the available PET data from the Hansen et al. 2022b study has a rather coarser spatial resolution. Therefore, it may be important to mention in the discussion that the pulvinar is densely packed with cell bodies and that their gradient organization might be better reflected with even higher spatial resolution or improved measurement techniques used in the study.

      We have revised the concluding section of the Discussion into a paragraph titled “Future perspectives and limitations”, and added the following text:

      “One notable limitation of this study lies in the relatively small size of the pulvinar complex compared to other larger cortical or subcortical structures. The high cellular density of the pulvinar poses a challenge for the relatively coarse resolution of currently available imaging techniques. Although the generally high quality of both the main and validation datasets, including rs-fMRI data (Uǧurbil et al. 2013; Babayan et al. 2019), aligns with current standards for imaging investigations of pulvinar connectivity, higher-resolution imaging approaches may offer more granular insights. Advanced techniques, such as ultra-high-field fMRI, hold promise for uncovering the fine-scale topographical organization of the pulvinar complex.”

      (3) The functional multiplicity of the Pulvinar nuclei among other thalamus nuclei is also illustrated in https://doi.org/10.1038/s42003-022-04126-w

      We thank the Reviewer for suggesting this important reference. We have added the following text in the Discussion section:

      “It is noteworthy that, in analyses of thalamocortical gradients, the pulvinar complex is situated towards the “sensorimotor” extreme of the unimodal-to-transmodal thalamic gradient (Yang et al., 2020). This likely reflects its prominent connectivity to visual and sensory areas compared to other thalamic nuclei. Nevertheless, the extensive and intricate association of the pulvinar with multiple cortical networks is strongly evident in various functional connectivity investigations (Basile et al., 2021; Kumar et al., 2017, 2022). By isolating pulvinar-cortical from broader thalamocortical connectivity, our analysis was able to provide additional insights into the spatial organization of its connectivity with different cortical networks, highlighting the pulvinar's remarkable functional diversity and complexity.”

      (4) In addition to DWI/DSI and PET, the study also uses fMRI, which allows for functional interaction in time. It may be worth reflecting in the discussion that the observed gradient organization of the pulvinar could have detailed aspects in the temporal domain, which might not be fully captured in the time-averaged embeddings.

      We thank the Reviewer for their insightful observation. The authors recognize that the exploration of brain temporal dynamics is a compelling area of research due to its extensive correlation with multiple hierarchical aspects of brain information processing. Examining the functional organization of the pulvinar complex lies beyond the scope of the present work and will be the subject of further investigation. On the other hand, it is possible that certain aspects of the spatial organization of pulvinar connectivity may be influenced by temporal dynamics of cortico-thalamic information processing. Intrinsic timescales have been consistently shown to progressively increase from unimodal to multimodal associative cortical regions. Furthermore, cortico-thalamic connectivity in matrix-rich regions has been correlated with cortical time scales.

      To address this point, we have added the following lines to the Discussion section:

      “In this context, it could be hypothesized that the observed gradient organization of the pulvinar may also exhibit specific patterns in the temporal domain. Indeed, multiple investigations have linked the temporal dynamics of cortical regions to different aspects of information processing (Rossi-Pool et al., 2021; Soltani et al., 2021). Notably, intrinsic neural timescales of functional activity have been associated with the functional specialization and gradient organization of the cerebral cortex (Golesorkhi et al., 2021), with shorter timescales in unimodal sensory regions and longer ones in transmodal networks (Ito et al., 2020; Murray et al., 2014). Moreover, thalamocortical connectivity has been shown to correlate with these patterns of intrinsic time scale (Müller et al., 2020). In addition, modulatory neurotransmitters such as serotonin and dopamine have been demonstrated to play a significant role in modulating functional cortical dynamics across different timescales (Hansen, Shafiei, et al., 2022; Luppi et al., 2023). Exploring how the spatial organization of the pulvinar relates to temporal dynamics and timescale modulation could provide valuable insights and represents a promising avenue for future investigations.”

      (5) The K-means clustering (Supplementary Figure 1) used has limitations, particularly with respect to the structure of the data. Another aspect is the reproducibility of the model-order selection. Did the reliability and reproducibility assessment produce a similar number of clusters with the LEMON data as with the HCP data?

      We acknowledge the limitations of k-means clustering, particularly regarding the stability and reproducibility of the model order. To address these concerns, we iteratively ran the clustering algorithm 50 times on bootstrap resamples to enhance the stability of the silhouette score estimates. In addition, we have now replicated the analysis on the secondary dataset, as suggested by the Reviewer (Author response image 2). The silhouette plots show a similar number of clusters between the two datasets for functional connectivity gradients, with minor differences observed in the results for structural connectivity gradients and multimodal gradient clustering. Notably, we did not find a high degree of similarity between the results of gradient clustering and histologically defined nuclei, further underscoring the distinct organizational patterns identified through our analysis.

      This reinforces the relevance of using gradient-based approaches to reveal insights into the functional and structural organization of the pulvinar complex that may not align strictly with discrete, histologically defined subdivisions.

      Author response image 2.

      K-means clustering of pulvinar gradients on the secondary dataset (LEMON) and their correspondence with histological pulvinar nuclei. Panels on the left show the silhouette plots for left and right pulvinar clustering solutions; error bars are standard error calculated across 50 resamples. Panels on the right show matrix plots of Dice similarity coefficients for pulvinar clusters against histological nuclei (AAL3 atlas). INF: inferior; ANT: anterior; LAT: lateral; MED: medial.
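      For readers unfamiliar with the Dice similarity coefficient used in the matrix plots, a minimal NumPy sketch follows. It is an illustration under stated assumptions, not the authors' implementation: cluster and nucleus assignments are taken to be integer label vectors over the same set of voxels, and the function names and toy labels are hypothetical.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two boolean masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 0.0  # convention chosen here for two empty masks
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def dice_matrix(cluster_labels, nucleus_labels):
    """Rows: clusters; columns: nuclei; entries: pairwise Dice overlap."""
    cluster_labels = np.asarray(cluster_labels)
    nucleus_labels = np.asarray(nucleus_labels)
    clusters = np.unique(cluster_labels)
    nuclei = np.unique(nucleus_labels)
    return np.array([
        [dice(cluster_labels == c, nucleus_labels == n) for n in nuclei]
        for c in clusters
    ])

# Toy example: 4 voxels, 2 clusters, 2 nuclei.
overlap = dice_matrix([0, 0, 1, 1], [0, 0, 0, 1])
print(overlap)
```

      A value of 1 means a cluster and a nucleus occupy exactly the same voxels; values near 0 mean they barely overlap.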

      (6) The pulvinar correlates of the unimodal-transmodal cortical gradient (Figure 4) show an association with almost the entire brain (Figure 4C, violin plot). It would be interesting to back this association with known anatomical connectivity studies in animals that show connections to these network areas. To my limited knowledge, I am not aware of pulvinar tracer studies showing such extensive connectivity across the entire cortex.

      As our structural connectivity estimates are based on tractography, they are subject to the known limitation of potentially overestimating anatomical connectivity. A technical clarification is warranted: since structural connectivity is grouped by networks, it is strongly influenced by connections to specific cortical regions within each network. This explains the uneven and asymmetric distribution of structural gradient-weighted connectivity observed in our results and does not imply widespread connectivity across the entire cortex.

      Nonetheless, structural connectivity of the pulvinar to cortical regions in primates encompasses a remarkably broad array of cortical areas, including predominantly occipital (Adams et al., 2000; Benevento, 1976; Casanova et al., 1989), temporal (Berman & Wurtz, 2010; Gattass et al., 2018; Homman-Ludiye et al., 2020) and parietal cortices (Asanuma et al., 1985; Baleydier & Morel, 1992). Additionally, to a more limited extent, connections to the cingulate gyrus and portions of the lateral prefrontal cortex have also been documented (Baleydier & Mauguiere, 1985; Baleydier & Mauguiere, 1987). These connectivity patterns are in line with prior accounts of structural connectivity of the human pulvinar (Arcaro et al., 2015; Basile et al., 2021; Leh et al., 2008; Tamietto et al., 2012), and with the patterns identified in our work (Author response image 1). Such findings provide further validation of the structural connectivity profiles explored in the present study.

      References

      Adams, M. M., Hof, P. R., Gattass, R., Webster, M. J., & Ungerleider, L. G. (2000). Visual cortical projections and chemoarchitecture of macaque monkey pulvinar. The Journal of Comparative Neurology, 419(3), 377–393. https://doi.org/10.1002/(SICI)1096-9861(20000410)419:3<377::AID-CNE9>3.0.CO;2-E

      Arcaro, M. J., Pinsk, M. A., & Kastner, S. (2015). The anatomical and functional organization of the human visual pulvinar. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.1575-14.2015

      Asanuma, C., Andersen, R. A., & Cowan, W. M. (1985). The thalamic relations of the caudal inferior parietal lobule and the lateral prefrontal cortex in monkeys: Divergent cortical projections from cell clusters in the medial pulvinar nucleus. Journal of Comparative Neurology, 241(3), 357–381. https://doi.org/10.1002/cne.902410309

      Baleydier, C., & Mauguiere, F. (1985). Anatomical evidence for medial pulvinar connections with the posterior cingulate cortex, the retrosplenial area, and the posterior parahippocampal gyrus in monkeys. Journal of Comparative Neurology. https://doi.org/10.1002/cne.902320207

      Baleydier, C., & Mauguiere, F. (1987). Network organization of the connectivity between parietal area 7, posterior cingulate cortex and medial pulvinar nucleus: A double fluorescent tracer study in monkey. Experimental Brain Research, 66(2). https://doi.org/10.1007/BF00243312

      Baleydier, C., & Morel, A. (1992). Segregated thalamocortical pathways to inferior parietal and inferotemporal cortex in macaque monkey. Visual Neuroscience, 8(5), 391–405. https://doi.org/10.1017/S0952523800004922

      Basile, G. A., Bertino, S., Bramanti, A., Anastasi, G. P., Milardi, D., & Cacciola, A. (2021). In Vivo Super-Resolution Track-Density Imaging for Thalamic Nuclei Identification. Cerebral Cortex. https://doi.org/10.1093/cercor/bhab184

      Benevento. (1976). The Cortical Projections of the Inferior Pulvinar and Adjacent Lateral Pulvinar in the Rhesus Monkey ( Macaca. October, 108, 1–24.

      Berman, R. A., & Wurtz, R. H. (2010). Functional Identification of a Pulvinar Path from Superior Colliculus to Cortical Area MT. The Journal of Neuroscience, 30(18), 6342–6354. https://doi.org/10.1523/JNEUROSCI.6176-09.2010

      Cai, L. Y., Yang, Q., Hansen, C. B., Nath, V., Ramadass, K., Johnson, G. W., Conrad, B. N., Boyd, B. D., Begnoche, J. P., Beason-Held, L. L., Shafer, A. T., Resnick, S. M., Taylor, W. D., Price, G. R., Morgan, V. L., Rogers, B. P., Schilling, K. G., & Landman, B. A. (2021). PreQual: An automated pipeline for integrated preprocessing and quality assurance of diffusion weighted MRI images. Magnetic Resonance in Medicine, 86(1), 456. https://doi.org/10.1002/mrm.28678

      Casanova, C., Freeman, R. D., & Nordmann, J. P. (1989). Monocular and binocular response properties of cells in the striate-recipient zone of the cat’s lateral posterior-pulvinar complex. Journal of Neurophysiology. https://doi.org/10.1152/jn.1989.62.2.544

      Gattass, R., Soares, J. G. M., & Lima, B. (2018). Comparative Pulvinar Organization Across Different Primate Species (pp. 37–37). https://doi.org/10.1007/978-3-319-70046-5_8

      Golesorkhi, M., Gomez-Pilar, J., Tumati, S., Fraser, M., & Northoff, G. (2021). Temporal hierarchy of intrinsic neural timescales converges with spatial core-periphery organization. Communications Biology, 4(1), 277. https://doi.org/10.1038/s42003-021-01785-z

      Hansen, J. Y., Markello, R. D., Tuominen, L., Nørgaard, M., Kuzmin, E., Palomero-Gallagher, N., Dagher, A., & Misic, B. (2022). Correspondence between gene expression and neurotransmitter receptor and transporter density in the human brain. NeuroImage, 264, 119671. https://doi.org/10.1016/j.neuroimage.2022.119671

      Hansen, J. Y., Shafiei, G., Markello, R. D., Smart, K., Cox, S. M. L., Nørgaard, M., Beliveau, V., Wu, Y., Gallezot, J.-D., Aumont, É., Servaes, S., Scala, S. G., DuBois, J. M., Wainstein, G., Bezgin, G., Funck, T., Schmitz, T. W., Spreng, R. N., Galovic, M., … Misic, B. (2022). Mapping neurotransmitter systems to the structural and functional organization of the human neocortex. Nature Neuroscience, 25(11), 1569–1581. https://doi.org/10.1038/s41593-022-01186-3

      Homman-Ludiye, J., Mundinano, I. C., Kwan, W. C., & Bourne, J. A. (2020). Extensive Connectivity Between the Medial Pulvinar and the Cortex Revealed in the Marmoset Monkey. Cerebral Cortex, 30(3), 1797–1812. https://doi.org/10.1093/cercor/bhz203

      Iglesias, J. E., Insausti, R., Lerma-Usabiaga, G., Bocchetta, M., Van Leemput, K., Greve, D. N., van der Kouwe, A., Fischl, B., Caballero-Gaudes, C., & Paz-Alonso, P. M. (2018). A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology. NeuroImage, 183, 314–326. https://doi.org/10.1016/j.neuroimage.2018.08.012

      Ito, T., Hearne, L. J., & Cole, M. W. (2020). A cortical hierarchy of localized and distributed processes revealed via dissociation of task activations, connectivity changes, and intrinsic timescales. NeuroImage, 221, 117141. https://doi.org/10.1016/j.neuroimage.2020.117141

      Kumar, V. J., Beckmann, C. F., Scheffler, K., & Grodd, W. (2022). Relay and higher-order thalamic nuclei show an intertwined functional association with cortical-networks. Communications Biology, 5(1), 1–17. https://doi.org/10.1038/s42003-022-04126-w

      Kumar, V. J., van Oort, E., Scheffler, K., Beckmann, C. F., & Grodd, W. (2017). Functional anatomy of the human thalamus at rest. NeuroImage, 147, 678–691. https://doi.org/10.1016/j.neuroimage.2016.12.071

      Leh, S. E., Chakravarty, M. M., & Ptito, A. (2008). The Connectivity of the Human Pulvinar: A Diffusion Tensor Imaging Tractography Study. International Journal of Biomedical Imaging, 2008, 1–5. https://doi.org/10.1155/2008/789539

      Luppi, A. I., Hansen, J. Y., Adapa, R., Carhart-Harris, R. L., Roseman, L., Timmermann, C., Golkowski, D., Ranft, A., Ilg, R., Jordan, D., Bonhomme, V., Vanhaudenhuyse, A., Demertzi, A., Jaquet, O., Bahri, M. A., Alnagger, N. L. N., Cardone, P., Peattie, A. R. D., Manktelow, A. E., … Stamatakis, E. A. (2023). In vivo mapping of pharmacologically induced functional reorganization onto the human brain’s neurotransmitter landscape. Science Advances, 9(24), eadf8332. https://doi.org/10.1126/sciadv.adf8332

      Müller, E. J., Munn, B., Hearne, L. J., Smith, J. B., Fulcher, B., Arnatkevičiūtė, A., Lurie, D. J., Cocchi, L., & Shine, J. M. (2020). Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222, 117224. https://doi.org/10.1016/j.neuroimage.2020.117224

      Murphy, K., Bodurka, J., & Bandettini, P. A. (2006). How long to scan? The relationship between fMRI temporal signal to noise and necessary scan duration. NeuroImage, 34(2), 565. https://doi.org/10.1016/j.neuroimage.2006.09.032

      Murray, J. D., Bernacchia, A., Freedman, D. J., Romo, R., Wallis, J. D., Cai, X., Padoa-Schioppa, C., Pasternak, T., Seo, H., Lee, D., & Wang, X.-J. (2014). A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience, 17(12), 1661–1663. https://doi.org/10.1038/nn.3862

      Oldham, S., & Ball, G. (2023). A phylogenetically-conserved axis of thalamocortical connectivity in the human brain. Nature Communications, 14(1), 6032. https://doi.org/10.1038/s41467-023-41722-8

      Rolls, E. T., Huang, C.-C., Lin, C.-P., Feng, J., & Joliot, M. (2020). Automated anatomical labelling atlas 3. NeuroImage, 206, 116189. https://doi.org/10.1016/j.neuroimage.2019.116189

      Rossi-Pool, R., Zainos, A., Alvarez, M., Parra, S., Zizumbo, J., & Romo, R. (2021). Invariant timescale hierarchy across the cortical somatosensory network. Proceedings of the National Academy of Sciences, 118(3), e2021843118. https://doi.org/10.1073/pnas.2021843118

      Shipp, S. (2003). The functional logic of cortico-pulvinar connections. Philosophical Transactions of the Royal Society B: Biological Sciences, 358(1438), 1605–1624. https://doi.org/10.1098/rstb.2002.1213

      Soltani, A., Murray, J. D., Seo, H., & Lee, D. (2021). Timescales of cognition in the brain. Current Opinion in Behavioral Sciences, 41, 30–37. https://doi.org/10.1016/j.cobeha.2021.03.003

      Su, J. H., Thomas, F. T., Kasoff, W. S., Tourdias, T., Choi, E. Y., Rutt, B. K., & Saranathan, M. (2019). Thalamus Optimized Multi Atlas Segmentation (THOMAS): Fast, fully automated segmentation of thalamic nuclei from structural MRI. NeuroImage, 194, 272–282. https://doi.org/10.1016/j.neuroimage.2019.03.021

      Tamietto, M., Pullens, P., de Gelder, B., Weiskrantz, L., & Goebel, R. (2012). Subcortical Connections to Human Amygdala and Changes following Destruction of the Visual Cortex. Current Biology, 22(15), 1449–1455. https://doi.org/10.1016/j.cub.2012.06.006

      Yang, S., Meng, Y., Li, J., Li, B., Fan, Y.-S., Chen, H., & Liao, W. (2020). The thalamic functional gradient and its relationship to structural basis and cognitive relevance. NeuroImage, 218, 116960. https://doi.org/10.1016/j.neuroimage.2020.116960

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you and the two Reviewers for the thoughtful evaluation of the manuscript and the support for publication. We have addressed all points raised by the two Reviewers.

      - We have extensively streamlined the manuscript. Repetitive passages regarding the respective kinase cascades have been removed.

      - We improved the presentation of the main Figures (mainly labeling and font size):

      - Figure 1: C, D, E, F

      - Figure 2: C, E, F, G, I

      - Figure 3: D

      - Figure 4: F

      - Figure 5: A, B, C, D, E

      - We integrated new SI-data related to kinase functions, expression and the ‘cell-type comparisons’ of the KinCon reporter system (Figure Supplement 4, 5).

      Below you will find a detailed point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Regarding the issue of the use of the word "dynamics," as described in the public review, here are a few examples of ambiguous use in different sentences:

      - Line 27: dynamics of full-length protein kinases. Is this referring to the dynamics of conformational interconversion between inactive and active states?

      - Line 138: dynamic functioning of kinases. It is not clear what this means.

      - Line 276: ... alters KinCon dynamics. Not clear if they are measuring time-dependent process or a single point.

      - Figure legend 4F: dynamics of CDK4/6 reporters. Again, not clear how the assay is measuring dynamics.

      In my opinion, the authors use proper terminology that describes their assay in which the term dynamics is not used: Title: "... impact of protein and small molecule interactions on kinase conformations" and Line 89 "... reporter can be used to track conformational changes of kinases...".

      We have replaced the “dynamics” sections. 

      - Line 27: The understanding of the structural dynamics of…

      - Line 91: This reporter can be used to track dynamic changes of kinases conformations…

      - Line 139: Conventional methods often fall short in capturing the dynamics of kinases within their native cellular environments…

      - Line 146: Such insights into the molecular structure dynamics of kinases in intact cells…

      - Line 199: In order to enhance our understanding of kinase structure dynamics…

      - Line 276: These findings underline that indeed the trimeric complex formation alters….

      - Figure Legend 4F: Quantification of alterations of CDK4/6 KinCon reporter bioluminescence signals…

      The authors state that KinCon has predictive capabilities (abstract and line 142). What do  the authors mean by this?

      Previously we have benchmarked the suitability of the KinCon reporter for target engagement assays of wt and mutated kinase activities. With this we determined specificities of melanoma drugs for mutated BRAF variants (Mayrhofer 2020, PNAS). 

      The authors indicate that KinCon is a highly sensitive assay. Can the authors elaborate on what high sensitivity means?  

      By sensitivity, we mean that we can detect conformational dynamics of the reporter at low expression levels of the hybrid protein in the cell line of choice.

      - Line 209: Immunoblotting of cell lysates following luminescence measurements showed expression levels of the reporters in the range and below the endogenous expressed kinases (Figure 1E).  …

      - Line 219:   Using this readout, we showed that at expression levels of the BRAF KinCon reporter below the immunoblotting detection limit, one hour of drug exposure exclusively converted BRAF-V600E to the more closed conformation (Figure 1F, G, Figure Supplement 1B). 

      - Line 221: These data underline that at expression levels far below the endogenous kinase, protein activity conformations can be tracked in intact cells. …

      For example, can they discuss how other fluorescence-based approaches that are less sensitive would not be able to accomplish the same type of results or derive similar conclusions? Can they provide a resolution metric both in space and time? Given that the authors state that this is a technical report, this information is of relevance.

      We highlight the key pros and cons of the KinCon reporter technology in the following sections:

      - Line 529: The KinCon technology, introduced here, seeks to address the previously mentioned challenges. It has the potential to become a valuable asset for tracking kinase functions in living cells which are hard to measure solely via phosphotransferase activities. Overall, it offers an innovative solution for understanding kinase activity conformations, which could pave the way for more novel intervention strategies for kinase entities with limited pharmaceutical targeting potential. So far, this relates to the tracking of kinase-scaffold and pseudo-kinase functions.

      - Line 535: Key advantages of the KinCon reporter technology is the robustness of the system to track kinase conformations at varying expression levels. However, in contrast to fluorescence-based reporter read-outs subcellular analysis and cell sorting are still challenging due to comparable low levels of light emission

      The authors nicely describe how KinCon works in Figure 1B and part of 1C. I do think that the bottom of panel 1C needs to be revised, as well as the text describing the potential scenarios of potency, efficacy, and synergism.

      One issue with this part of Figure 1C is that it is not clear what the x-axis in the 3 plots refers to. Is this time? Is this concentration of a small molecule, inhibitor, or binding partner? This was confusing also in the context of the term dynamics used throughout the text. The terms potency, efficacy, and synergism should be subtitles, or the panels and the x-axis should be better defined, especially for a non-specialized reader.

      Related to this part of Figure 1C is the text. The authors mention potency, effectiveness, and synergy (Line 195). Can the authors use more fundamental terminology related to these three scenarios, for example, changes in activation constant, and percent of protein activates? Also, why synergy is only related to effectiveness? Can synergy also be associated with potency?

      Thank you for bringing this up, we have revised Figure 1C to better reflect the mentioned effects of potency. To avoid confusion, we removed the illustration for drug synergism. Accordingly, we have integrated the axis descriptions for the presented dose-response curves.   

      Thus, we have further streamlined the text in the introduction – examples are shown below:

      - Line 195: Light recordings and subsequent calculations of time-dependent dosage variations of bioluminescence signatures of parallel implemented KinCon configurations aid in establishing dose-response curves. These curves are used for discerning pharmacological characteristics such as drug potency, effectiveness of drug candidates, and potential drug synergies (Figure 1C)

      - Figure 1C:  Shown is the workflow for the KinCon reporter construct engineering and analyses using KinCon technology. The kinase gene of interest is inserted into the multiple cloning site of a mammalian expression vector which is flanked by respective PCA fragments (-F[1], -F[2]) and separated with interjacent flexible linkers. Expression of the genetically encoded reporter in indicated multi-well formats allows to vary expression levels and define a coherent drug treatment plan. Moreover, it is possible to alter the kinase sequence (mutations) or to co-express or knock-down the respective endogenous kinase, interlinked kinases or proteinogenic regulators of the respective pathway. After systematic administration of pathway modulating drugs or drug candidates, analyses of KinCon structure dynamics may reveal alterations in potency, efficacy, and potential synergistic effects of the tested bioactive small molecules (schematic dose response curves are depicted)
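      As a reading aid for the terms potency and efficacy used above, the following is a minimal sketch of schematic dose-response curves based on the standard Hill (four-parameter logistic) equation. It is not the authors' analysis code. The function name, parameter names, doses, and all numeric values are hypothetical and chosen only for illustration: a shift in `ec50` corresponds to a change in potency, and a shift in `plateau` to a change in efficacy.

```python
def hill_response(dose, baseline, plateau, ec50, hill=1.0):
    """Signal at a given dose under a four-parameter Hill model.

    baseline: signal without drug (hypothetical RLU fold change of 1.0)
    plateau:  signal at saturating dose (sets the efficacy)
    ec50:     dose giving the half-maximal effect (sets the potency)
    hill:     Hill slope
    """
    if dose <= 0:
        return baseline
    return baseline + (plateau - baseline) / (1.0 + (ec50 / dose) ** hill)


# Hypothetical dose series in micromolar.
doses_um = [0.01, 0.1, 1.0, 10.0, 100.0]

# Reference compound: signal drops from 1.0 to 0.5, half-maximal at 1 uM.
reference = [hill_response(d, 1.0, 0.5, 1.0) for d in doses_um]

# Higher potency: same plateau, but a tenfold lower EC50.
more_potent = [hill_response(d, 1.0, 0.5, 0.1) for d in doses_um]

# Higher efficacy: same EC50, but a lower plateau.
more_efficacious = [hill_response(d, 1.0, 0.2, 1.0) for d in doses_um]

for name, curve in [("reference", reference),
                    ("more potent", more_potent),
                    ("more efficacious", more_efficacious)]:
    print(name, [round(v, 3) for v in curve])
```

      In this sketch a decreasing signal stands in for a reporter that moves to a more closed conformation with increasing dose; an opening conformation would be modeled with a plateau above the baseline.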

      Lastly, the use of these three cartoons gives the impression that the experimental results to come will follow a similar representation. Instead, the results are presented in bar plots for many different conditions. I think this will lead to confusion for a broad audience.

      The bottom panel of Figure 1C is not a depiction of real experiments but rather an illustration of fitted dose-response curves. We would like to point to previous demonstrations of dose-response curves using BRAF KinCon data and ERK phosphorylation (Röck 2019, Sci. Advances).

      We further agree with the reviewer and have therefore added a new part in the methods section addressing the evaluation of data extensively. 

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown. In these cases, absolute bioluminescence values without any normalization are shown. Otherwise, data was indicated as RLU (relative light unit) fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      For a non-expert reader, can the authors clarify the use of tracking basal conformations vs. transient over-expression of the various KinCon constructs? Moreover, the authors use the term transient over-expression for 10, 16, 24, and 48 h (Line 203). This, to a non-expert reader, does not seem transient.

      We have revised the manuscript to clarify it:

      - Line 207: We showed that transient over-expression of these KinCon reporters for a time frame of 10h, 16h, 24h or 48h in HEK293T cells delivers consistently increasing signals for all KinCon reporters (Figure 1E, Figure Supplement 1A). 

      - Figure 1E) Representative KinCon experiments of time-dependent expressions of indicated KinCon reporter constructs in HEK293T cells are shown (mean ±SEM). Indicated KinCon reporters were transiently over-expressed in 24-well format in HEK293T cells for 10h, 16h, 24h and 48h each.

      Regarding Figure 1E and similar graphical representations: Why is the signal (RLU) nonlinear with time? If the fluorescence of the KinCon construct is linearly related to its expression or concentration inside the cell, one would expect a linear increase. Have the authors plotted RLU/Expression band intensity to account for changes in protein concentration? For instance, some of the results within Figure 3 are normalized to concentration on reporter expression level.

      Our intention was to show that varying expression levels can be used for the illustrated target engagement assays. Indeed, the observed increases in RLU might be due to factors such as:

      - Doubling times of cells

      - Cell density

      - Media composition (which changes over time)

      - Reporter protein stabilities

      - Abundance of interactors of kinases

      For the results with LKB1, the authors claim that intermediate fold change in fluorescence (Figure 2E) is due to a partially closed intermediate state (Line 262). Can the authors discard the possibility by which there is a change in populations of active and inactive that on average give intermediate values?

      Based on our experience with KinCon reporter conformation states of kinases we tested so far, we assume that the presented data reflects an intermediate state. We agree that it needs further validation. We have changed the text accordingly:

      - Line 264: Upon interaction with LKB1 this conformation shifts to a partially closed intermediate state.

      The authors claim in Line 274 that mutations located at the interface of the LKB1/STRADalpha complex affect interactions and hypothesize that allosteric communication between LKB1 and STRADalpha is essential for function. Given that these mutations are at the interaction interface, why would the authors postulate an allosteric mechanism that evokes an effect distant from the interaction/active site? Could it be that function requires surface contacts alone that are disrupted by the mutations?

      We agree with the reviewer and changed our argumentation for this point:

      - Line 276: These findings underline that indeed the trimeric complex formation alters the opening and closing of the tested full-length kinase structures using the applied KinCon reporter read out

      I was unable to find text to explain the following: Figure 2I shows the mutation R74A as n.s., but in the text, only W308C is mentioned to not change fluorescence. Could the authors clarify why R74A is not discussed in the text?  Maybe this reviewer missed the text in which it was discussed.

      We adapted the manuscript and included the R74A mutation as follows:

      - Line 296: Among these mutations, only the W308C and R74A mutation prevented significant closing of the LKB1 conformation when co-expressed with STRAD𝛼 and MO25 (Figure 2I).

      In Figure 2I, the individual measurements of the LKB1-R74A KinCon are now highlighted in red to better emphasize the deviations. In the case of the R74A mutation, the effect seen might be due to the high deviation between the experiments. These deviations are much higher than for either the wt or the W308 mutant, and can also be seen in the LKB1-R74A-KinCon only condition (white). Even though no significant closing of the LKB1 conformation could be observed for R74A, we believe the effect is still present, since the trend toward conformation closing upon complex formation remains visible. Further replicates would be necessary to validate this interpretation.

      Similarly, the authors state in line 326 that the study included an analysis of RIPK2. However, I was unable to find results, graphs, or additional text discussing RIPK2.

      The RIPK2 conformation was analyzed in Figure 3C (page 12).

      Some figures of RLU use absolute values, percentages, and fold change. Is there a reason why the authors use different Y-axis values? These should be explained and justified in Methods. Similarly, bars for wt in Figures 3D, G, or 4D, E, F show no errors. How are the authors normalizing the data and repeats so that there is no error, and are they treating the rest of the data (i.e., mutants and/or treated with small molecules) in the same way?

      We have changed the Y-axis values. Throughout the manuscript we now show RLU fold change, except for selected experiments in which only absolute RLU values are shown (such as Figure 1E, F). We have also integrated a paragraph into the methods section (Line 655). Figure 3D was changed as well.

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown.  In these cases absolute bioluminescence values without any normalisation are shown.  Otherwise, data was indicated as RLU fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      The data is generally normalized on wt or untreated conditions, when the cells were treated with small molecules for target engagement assays. 
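      To make the fold-change normalization described above concrete, the following is a minimal sketch, not the authors' analysis code. It assumes, as one common convention, that each independent experiment is normalized to its own control condition; the response does not state whether normalization was done per experiment, and all RLU values and names below are hypothetical. Under this assumption the control condition equals 1.0 in every replicate, which is one way a control bar can end up without an error bar.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw RLU readings from three independent experiments.
experiments = [
    {"control": 1000.0, "treated": 600.0},
    {"control": 2000.0, "treated": 1000.0},
    {"control": 500.0, "treated": 350.0},
]


def fold_changes(experiments, condition, reference="control"):
    """Divide each reading by the reference reading of the same experiment."""
    return [e[condition] / e[reference] for e in experiments]


def mean_and_sem(values):
    """Return the mean and the standard error of the mean."""
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return mean(values), sem


control_fc = fold_changes(experiments, "control")
treated_fc = fold_changes(experiments, "treated")

# The control is 1.0 in every replicate, so its SEM is zero by construction.
print("control:", mean_and_sem(control_fc))
print("treated:", mean_and_sem(treated_fc))
```

      If the raw readings were instead pooled across experiments before normalization, the control condition would retain its own spread.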

      Lastly, the section starting in Line 472 reads more like a discussion of results from different types of inhibitors used in this study than results on its own. The authors should consider a new subtitle such as results or make this section a discussion.

      We agree with the reviewer, and this part of the results was split into a new section of the results:

      - Line 455: “Effect of different kinase inhibitor types on the KinCon reporter system”.

      Reviewer #2 (Recommendations For The Authors):

      I have a few suggestions, since the paper is a distillation of a vast amount of work and tells a useful story.

      (1) The work is very solid, uses examples from the literature, and also extends into new experimental space. An obvious weakness is mentioned by the authors for the CDK data, in that measurements with Cyclin D (the activating subunit) are not characterized, although Cyclin D might be assumed to be present.

      We performed experiments with the CDK4/6 KinCon reporters and co-expressed CyclinD with a ratio of 1:3 (HEK293T cells, expression for 48h). However, in the context of inhibitor treatments we could not track conformation changes in these initial experiments. The cells were treated with the indicated CDK4/6i [1µM] for 3h. This seems to not impact the conformation of CDK4/6 wt or mutated KinCon reporters. There is a tendency that CyclinD co-expression promotes CDK4/6 conformation opening (data not shown).

      Author response image 1.

      Bioluminescence signal of CDK4/6 KinCon reporters with co-expressed CyclinD3 (HEK293T, expression for 48h) upon exposure to indicated CDK4/6i [1µM] or DMSO for 3h (mean ±SEM, n=3 ind. experiments). No significant changes using the current setting.

      (2) The work with the trimeric LKB1 complex involves pseudokinase, STRADalpha, whose conformation is also examined as a function of LKB1 status; since STRAD is an activator of LKB1. A future goal should be the evaluation of the complex in the presence of STRAD inhibitory/activating small molecules.

      Thank you for this great idea, we are currently compiling a FWF grant application to get support for such a R&D project.

      Minor points

      • Have any of the data been repeated in a different cell background? This came to mind because HeLa cells lack LKB1, which might be a useful place to test the LKB1 data in a different context.

      This experiment was performed and we show it in Figure Supplement 5. Further, we followed the advice of the reviewer and performed suggested experiments. We integrated the colon cancer cell line SW480 into the experimental setup. Overall, three cell settings showed the same pattern of KinCon reporter analyses for LKB1-STRADα-MO25 complex formation utilizing the LKB1- and STRADα-KinCon reporters.  

      • The study picks up the PKA Cushings Syndrome field, which makes sense, and data are presented for L206R. PMID 35830806 explains how different patient mutations drive different signaling outcomes through distinct complex formations, and it would be interesting to discuss how mutations in KinCon complexes, especially those with mutations, could affect sub-cellular localization. Could the authors explain if this was done for any of the proteins, whose low experimental expression is a clear advantage, but is presumably hard to maintain across experiments?

      The feedback of the reviewer motivated us to perform subcellular fractionation experiments. They were performed with PKAc wt and L206R KinCon reporters as well as BRAF wt and V600E reporters. We were not able to see major differences between the wt and mutated reporter constructs in respect to their nucleus: cytoplasm localizations (Figure Supplement 4). For your information, in a R+D project with the mitochondrial kinase PINK1 we see localization of the reporter as expected almost exclusively at the mitochondria fraction. 

      - Line 495: In this context of activating kinase mutations we showed that using PKAc (wt and L206R) and BRAF (wt and V600E) reporters as example we could not track alterations of cytoplasmic and nuclear localization (Figure Supplement 4). Furthermore, subcellular localization of PKAc KinCon reporters did not change when L206R mutant was introduced (Figure Supplement 4). As a control BRAF wt and V600E KinCon reporters were used and also no changes in localization was observed.

      • I suggest changing PMs (Figure 2 and others) simply to mutation, I read this as plasma membrane constantly.

      We agree and we have changed it to “patient mutation” in Figure 2C, Figure 3E, Figure 4B.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the contributions of the long noncoding RNA snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that the nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and as a general rule the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

      Weaknesses:

      Despite efforts at generating a liver-specific knockout, the phenotypic characterization is not focused on the key readouts. Notably missing are rigorous lipid flux studies and targeted gene expression/protein measurement that would underpin why the loss of Snhg3 protects from lipid accumulation. Along those lines, claims linking the Snhg3 to MAFLD would be better supported with careful interrogation of markers of fibrosis and advanced liver disease. In other areas, significance is limited since the presented data is either not clear or rigorous enough. Finally, there is an important conceptual limitation to the work since PPARG is not established to play a major role in the liver.

      We thank the reviewer for the detailed comment. In this study, hepatocyte-specific Snhg3 deficiency decreased body and liver weight and alleviated hepatic steatosis in DIO mice, whereas overexpression induced the opposite effect (Figures 2 and 3). Furthermore, we investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq and revealed, using GSEA, that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). We validated the expression of some DEGs involved in fatty acid metabolism by RT-qPCR. The results showed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and were downregulated in Snhg3-HKI mice compared to the controls (Figure 4C), respectively. Please see the first paragraph on p8.

      As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and HSCs) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hint at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively induced a decrease or increase in hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the hepatic Cd36 and Cidea/c increase and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans and that, in clinical studies, administration of PPARγ agonists did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discuss this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      Hepatotoxicity accelerates the development of progressive inflammation, oxidative stress and fibrosis (Roehlen et al., 2020). Chronic liver injury including MASLD can progress to liver fibrosis with the formation of a fibrous scar. Injured hepatocytes can secrete fibrogenic factors or exosomes containing miRNAs that activate HSCs, the major source of the fibrous scar in liver fibrosis (Kisseleva and Brenner, 2021). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). In this study, no hepatic fibrosis phenotype was seen in Snhg3-HKO and Snhg3-HKI mice (figures supplement 1D and 2D). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as collagen type I alpha 1/2 (Col1a1 and Col1a2), but had no effects on the pro-inflammatory factors, including transforming growth factor β1 (Tgfβ1), tumor necrosis factor α (Tnfα), interleukin 6 and 1β (Il6 and Il1β) (figures supplement 3A and B). Inflammation is an absolute requirement for fibrosis because factors from injured hepatocytes alone are not sufficient to directly activate HSCs and lead to fibrosis (Kisseleva and Brenner, 2021). Additionally, previous studies indicated that exposure to HFD for more than 24 weeks causes less severe fibrosis (Alshawsh et al., 2022). In future, the effect of Snhg3 on hepatic fibrosis in mice needs to be elucidated by prolonged high-fat feeding or by adopting methionine- and choline-deficient (MCD) diet feeding. Please see the second paragraph of the Discussion section on p13.

      References

      ALSHAWSH, M. A., ALSALAHI, A., ALSHEHADE, S. A., SAGHIR, S. A. M., AHMEDA, A. F., AL ZARZOUR, R. H. & MAHMOUD, A. M. 2022. A Comparison of the Gene Expression Profiles of Non-Alcoholic Fatty Liver Disease between Animal Models of a High-Fat Diet and Methionine-Choline-Deficient Diet. Molecules, 27. DOI:10.3390/molecules27030858, PMID:35164140

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      KISSELEVA, T. & BRENNER, D. 2021. Molecular and cellular mechanisms of liver fibrosis and its regression. Nat Rev Gastroenterol Hepatol, 18, 151-166. DOI:10.1038/s41575-020-00372-7, PMID:33128017

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      ROEHLEN, N., CROUCHET, E. & BAUMERT, T. F. 2020. Liver Fibrosis: Mechanistic Concepts and Therapeutic Perspectives. Cells, 9. DOI:10.3390/cells9040875, PMID:32260126

      Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al found LncRNA Snhg3 was one of the most down-regulated Snhgs by a high-fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on an HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3 interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how LncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study. There are, however, several potential issues to consider before jumping to a conclusion.

      (1) First of all, it's important to ensure the robustness and rigor of each study. The manuscript was not carefully put together. The image qualities for several figures were poor, making it difficult for the readers to evaluate the results with confidence. The biological replicates and numbers of experimental repeats for cell-based assays were not described. When possible, the entire immunoblot imaging used for quantification should be presented (rather than showing n=1 representative). There were multiple mislabels in figure panels or figure legends (e.g., Figure 2I, Figure 2K, and Figure 3K). The b-actin immunoblot image was reused in Figure 4J, Figure 5G, and Figure 7B with different exposure times. These might be from the same cohort of mice. If the immunoblots were run at different times, the loading control should be included on the same blot as well.

      We thank the reviewer for the detailed comment. We have provided clearer figures in the revised manuscript; please check them.

      The biological replicates and numbers of experimental repeats for cell-based assays have been updated; please check them in the manuscript.

      The entire immunoblot images used for quantification have been provided in the primary data. Please check them.

      The original Figure 2I, Figure 2K, and Figure 3K have been revised and replaced with new Figure 2F, Figure 2H, and Figure 3H, and their corresponding figure legends have also been corrected in the revised manuscript.

      The protein levels of CD36, PPARγ and β-ACTIN were examined at the same time, and we have revised the manuscript accordingly; please check revised Figure 7B and 7C.

      (2) The authors can do a better job in explaining the logic for how they came up with the potential function of each component of the signaling cascade. Snhg3 is down-regulated by HFD. However, the evidence presented indicates its involvement in promoting steatosis. In Figure 1C, one would expect PPARg expression to be up-regulated (when Snhg3 was down-regulated). If so, the physiological observation conflicts with the proposed mechanism. In addition, SND1 is known to regulate RNA/miRNA processing. How do the authors rule out this potential mechanism? How about the hosting snoRNA, Snord17? Is it involved in the progression of NASLD?

      We thank the reviewer for the detailed comment. Our results showed that the expression of Snhg3 was decreased in DIO mice, which led us to speculate that the downregulation of Snhg3 in DIO mice might be a protective stress response to a high nutritional state, but the specific details need to be clarified. This is probably similar to fibroblast growth factor 21 (FGF21) and growth differentiation factor 15 (GDF15), whose endogenous expression and circulating levels are elevated in obese humans and mice despite their beneficial effects on obesity and related metabolic complications (Keipert and Ost, 2021). Although FGF21 can be induced by oxidative stress and be activated in obese mice and in NASH patients, elevated FGF21 paradoxically protects against oxidative stress and reduces hepatic steatosis (Tillman and Rolph, 2020). We have added this content to the Discussion section; please see the second paragraph on p12.

      SND1 has multiple roles through associating with different types of RNA molecules, including mRNA, miRNA, circRNA, dsRNA and lncRNA. SND1 binds negative-sense SARS-CoV-2 RNA and promotes viral RNA synthesis (Schmidt et al., 2023). SND1 is also involved in hypoxia by negatively regulating hypoxia‐related miRNAs (Saarikettu et al., 2023). Furthermore, a recent study revealed that lncRNA SNAI3-AS1 can competitively bind to SND1 and perturb the m6A-dependent recognition of Nrf2 mRNA 3'UTR by SND1, thereby reducing the mRNA stability of Nrf2 (Zheng et al., 2023). Huang et al. also reported that circMETTL9 can directly bind to and increase the expression of SND1 in astrocytes, leading to enhanced neuroinflammation (Huang et al., 2023). However, whether SND1/lncRNA-Snhg3 has a histone methylation-independent role in hepatic lipid metabolism needs to be further investigated. We also discuss this limitation in the manuscript; please refer to the third paragraph of the Discussion section on p17.

      Snhg3 serves as the host gene for the intronic U17 snoRNAs, which are H/ACA snoRNAs. A previous study found that the cholesterol trafficking phenotype was not due to reduced Snhg3 expression, but rather to haploinsufficiency of U17 snoRNA. Upregulation of hypoxia-upregulated mitochondrial movement regulator (HUMMR) in U17 snoRNA-deficient cells promoted the formation of ER-mitochondrial contacts, resulting in decreased cholesterol esterification and facilitating cholesterol trafficking to mitochondria (Jinn et al., 2015). Additionally, disruption of U17 snoRNA caused resistance to lipid-induced cell death and general oxidative stress in cultured cells. Furthermore, knockdown of U17 snoRNA in vivo protected against hepatic steatosis and lipid-induced oxidative stress and inflammation (Sletten et al., 2021). We determined the expression of hepatic U17 snoRNA and its effect on SND1 and PPARγ. The results showed that the expression of U17 snoRNA was decreased in the liver of DIO Snhg3-HKO mice and unchanged in the liver of DIO Snhg3-HKI mice, but overexpression of U17 snoRNA had no effect on the expression of SND1 and PPARγ (figure supplement 5A-C), indicating that Snhg3-induced hepatic steatosis was independent of U17 snoRNA. We also discuss this in the revised manuscript; please refer to the Discussion section on p15.

      References

      HUANG, C., SUN, L., XIAO, C., YOU, W., SUN, L., WANG, S., ZHANG, Z. & LIU, S. 2023. Circular RNA METTL9 contributes to neuroinflammation following traumatic brain injury by complexing with astrocytic SND1. J Neuroinflammation, 20, 39. DOI:10.1186/s12974-023-02716-x, PMID:36803376

      JINN, S., BRANDIS, K. A., REN, A., CHACKO, A., DUDLEY-RUCKER, N., GALE, S. E., SIDHU, R., FUJIWARA, H., JIANG, H., OLSEN, B. N., SCHAFFER, J. E. & ORY, D. S. 2015. snoRNA U17 regulates cellular cholesterol trafficking. Cell Metab, 21, 855-67. DOI:10.1016/j.cmet.2015.04.010, PMID:25980348

      KEIPERT, S. & OST, M. 2021. Stress-induced FGF21 and GDF15 in obesity and obesity resistance. Trends Endocrinol Metab, 32, 904-915. DOI:10.1016/j.tem.2021.08.008, PMID:34526227

      SAARIKETTU, J., LEHMUSVAARA, S., PESU, M., JUNTTILA, I., PARTANEN, J., SIPILA, P., POUTANEN, M., YANG, J., HAIKARAINEN, T. & SILVENNOINEN, O. 2023. The RNA-binding protein Snd1/Tudor-SN regulates hypoxia-responsive gene expression. FASEB Bioadv, 5, 183-198. DOI:10.1096/fba.2022-00115, PMID:37151849

      SCHMIDT, N., GANSKIH, S., WEI, Y., GABEL, A., ZIELINSKI, S., KESHISHIAN, H., LAREAU, C. A., ZIMMERMANN, L., MAKROCZYOVA, J., PEARCE, C., KREY, K., HENNIG, T., STEGMAIER, S., MOYON, L., HORLACHER, M., WERNER, S., AYDIN, J., OLGUIN-NAVA, M., POTABATTULA, R., KIBE, A., DOLKEN, L., SMYTH, R. P., CALISKAN, N., MARSICO, A., KREMPL, C., BODEM, J., PICHLMAIR, A., CARR, S. A., CHLANDA, P., ERHARD, F. & MUNSCHAUER, M. 2023. SND1 binds SARS-CoV-2 negative-sense RNA and promotes viral RNA synthesis through NSP9. Cell, 186, 4834-4850 e23. DOI:10.1016/j.cell.2023.09.002, PMID:37794589

      SLETTEN, A. C., DAVIDSON, J. W., YAGABASAN, B., MOORES, S., SCHWAIGER-HABER, M., FUJIWARA, H., GALE, S., JIANG, X., SIDHU, R., GELMAN, S. J., ZHAO, S., PATTI, G. J., ORY, D. S. & SCHAFFER, J. E. 2021. Loss of SNORA73 reprograms cellular metabolism and protects against steatohepatitis. Nat Commun, 12, 5214. DOI:10.1038/s41467-021-25457-y, PMID:34471131

      TILLMAN, E. J. & ROLPH, T. 2020. FGF21: An Emerging Therapeutic Target for Non-Alcoholic Steatohepatitis and Related Metabolic Diseases. Front Endocrinol (Lausanne), 11, 601290. DOI:10.3389/fendo.2020.601290, PMID:33381084

      ZHENG, J., ZHANG, Q., ZHAO, Z., QIU, Y., ZHOU, Y., WU, Z., JIANG, C., WANG, X. & JIANG, X. 2023. Epigenetically silenced lncRNA SNAI3-AS1 promotes ferroptosis in glioma via perturbing the m(6)A-dependent recognition of Nrf2 mRNA mediated by SND1. J Exp Clin Cancer Res, 42, 127. DOI:10.1186/s13046-023-02684-3, PMID:37202791

      (3) The role of PPARg in fatty liver diseases might be a rodent-specific phenomenon. PPARg agonist treatment in humans may actually reduce ectopic fat deposition by increasing fat storage in adipose tissues. The relevance of the findings to human diseases should be discussed.

      We thank the reviewer for the detailed comment. As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and hepatic stellate cells (HSCs)) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets, as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hinted at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively decreased or increased hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the increase in hepatic Cd36 and Cidea/c and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans or in clinical studies, and that PPARγ agonist administration did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discuss this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      References

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As a general strategy for the revision, I would advise the authors to focus on strengthening the analysis of the liver with the two most important figures being Figure 2 and Figure 3. The mechanism as it stands is problematic which reduces the impact of the animal studies despite substantial efforts from the authors. Consider removing or toning down some of the studies focused on mechanisms in the nucleus, including changing the title.

      We thank the reviewer for the detailed comment. In this study, hepatocyte-specific Snhg3 deficiency decreased body and liver weight, alleviated hepatic steatosis and promoted hepatic fatty acid metabolism in DIO mice, whereas overexpression induced the opposite effect. We investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq, and GSEA revealed that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). RT-qPCR analysis confirmed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and were downregulated in Snhg3-HKI mice compared to the controls (Figure 4C). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as Col1a1 and Col1a2, but had no effect on pro-inflammatory factors, including Tgfβ1, Tnfα, Il6 and Il1β (figure supplement 3A and B). The results indicated that Snhg3 is involved in hepatic steatosis through regulating fatty acid metabolism. Furthermore, PPARγ was selected to study its role in Snhg3-induced hepatic steatosis through integrated analysis of the data from CUT&Tag-Seq, ATAC-Seq and RNA-Seq. Finally, inhibition of PPARγ with T0070907 alleviated the Snhg3-induced increases in Cd36 and Cidea/c and improved Snhg3-aggravated hepatic steatosis. In summary, we confirmed that SND1/H3K27me3/PPARγ is partially responsible for Snhg3-induced hepatic steatosis. As the reviewer suggested, we replaced the title with “LncRNA-Snhg3 Aggravates Hepatic Steatosis via PPARγ Signaling”.

      (1) How is steatosis changing in the liver? Is this due to a change in fatty acid uptake, lipogenesis/synthesis, beta-oxidation, trig secretion, etc..? The analysis in Figures 2 and 3 is mostly focused on metabolic chamber studies which seem distracting, particularly in the absence of a mechanism and given a liver-specific perturbation. The authors should use a combination of targeted gene expression, protein blots, and lipid flux measurements to provide better insights here. The histology in Figure 2H suggests a very dramatic effect but does not match with lipid measurements in 2I.

      We thank the reviewer for the detailed comment. The pathogenesis of MASLD has not been entirely elucidated. Multiple factors, such as genetic and epigenetic factors, nutritional factors, insulin resistance, lipotoxicity, the microbiome, fibrogenesis and hormones secreted from the adipose tissue, are recognized to be involved in the development and progression of MASLD (Buzzetti et al., 2016, Lee et al., 2017, Rada et al., 2020, Sakurai et al., 2021, Friedman et al., 2018). In this study, we investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq, and GSEA revealed that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). We validated the expression of some DEGs involved in fatty acid metabolism by RT-qPCR. The results showed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and were downregulated in Snhg3-HKI mice compared to the respective controls (Figure 4C). Additionally, we re-analyzed the metabolic chamber data using CalR, and the results showed that there were no obvious differences in heat production, total oxygen consumption, carbon dioxide production or RER between DIO Snhg3-HKO or DIO Snhg3-HKI mice and the corresponding control mice (figure supplement 1C and 2C). Unfortunately, we did not measure lipid flux due to limited experimental conditions. In summary, however, our results indicated that Snhg3 is involved in hepatic steatosis by regulating fatty acid metabolism. Please see the first paragraph on p8.

      Additionally, we determined the hepatic TC levels in another batch of DIO Snhg3-HKO and control mice and found no difference in hepatic TC (shown below) between DIO Snhg3-HKO and control mice fed an HFD for 18 weeks. Perhaps an apparent difference in TC requires a longer period of high-fat diet feeding.

      Author response image 1.

      Hepatic TC contents in DIO Snhg3-Flox and Snhg3-HKO mice.

      References

      BUZZETTI, E., PINZANI, M. & TSOCHATZIS, E. A. 2016. The multiple-hit pathogenesis of non-alcoholic fatty liver disease (NAFLD). Metabolism, 65, 1038-48. DOI:10.1016/j.metabol.2015.12.012, PMID:26823198

      FRIEDMAN, S. L., NEUSCHWANDER-TETRI, B. A., RINELLA, M. & SANYAL, A. J. 2018. Mechanisms of NAFLD development and therapeutic strategies. Nat Med, 24, 908-922. DOI:10.1038/s41591-018-0104-9, PMID:29967350

      LEE, J., KIM, Y., FRISO, S. & CHOI, S. W. 2017. Epigenetics in non-alcoholic fatty liver disease. Mol Aspects Med, 54, 78-88. DOI:10.1016/j.mam.2016.11.008, PMID:27889327

      RADA, P., GONZALEZ-RODRIGUEZ, A., GARCIA-MONZON, C. & VALVERDE, A. M. 2020. Understanding lipotoxicity in NAFLD pathogenesis: is CD36 a key driver? Cell Death Dis, 11, 802. DOI:10.1038/s41419-020-03003-w, PMID:32978374

      SAKURAI, Y., KUBOTA, N., YAMAUCHI, T. & KADOWAKI, T. 2021. Role of Insulin Resistance in MAFLD. Int J Mol Sci, 22. DOI:10.3390/ijms22084156, PMID:33923817

      (2) Throughout the manuscript the authors make claims about liver disease models, but this is not well supported since markers of advanced liver disease are not examined. The authors should stain and show expression for fibrosis and inflammation.

      We thank the reviewer for the detailed comment. Metabolic dysfunction-associated fatty liver disease (MASLD) is characterized by excess liver fat in the absence of significant alcohol consumption. It can progress from simple steatosis to metabolic dysfunction-associated steatohepatitis (MASH) and fibrosis and eventually to chronic progressive diseases such as cirrhosis, end-stage liver failure, and hepatocellular carcinoma (Loomba et al., 2021). As the reviewer suggested, we examined the effect of Snhg3 on liver fibrosis and inflammation. The results showed that no hepatic fibrosis phenotype was seen in Snhg3-HKO and Snhg3-HKI mice (figures supplement 1D and 2D). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as collagen type I alpha 1/2 (Col1a1 and Col1a2), but had no effect on pro-inflammatory factors, including Tgf-β, Tnf-α, Il-6 and Il-1β (figure supplement 3A and 3B). Inflammation is an absolute requirement for fibrosis because factors from injured hepatocytes alone are not sufficient to directly activate HSCs and lead to fibrosis (Kisseleva and Brenner, 2021). Additionally, previous studies indicated that exposure to HFD for more than 24 weeks causes less severe fibrosis (Alshawsh et al., 2022). In future, the effect of Snhg3 on hepatic fibrosis in mice needs to be elucidated by prolonged high-fat feeding or by adopting methionine- and choline-deficient diet (MCD) feeding. Please see the second paragraph of the Discussion section on p13.

      References

      ALSHAWSH, M. A., ALSALAHI, A., ALSHEHADE, S. A., SAGHIR, S. A. M., AHMEDA, A. F., AL ZARZOUR, R. H. & MAHMOUD, A. M. 2022. A Comparison of the Gene Expression Profiles of Non-Alcoholic Fatty Liver Disease between Animal Models of a High-Fat Diet and Methionine-Choline-Deficient Diet. Molecules, 27. DOI:10.3390/molecules27030858, PMID:35164140

      KISSELEVA, T. & BRENNER, D. 2021. Molecular and cellular mechanisms of liver fibrosis and its regression. Nat Rev Gastroenterol Hepatol, 18, 151-166. DOI:10.1038/s41575-020-00372-7, PMID:33128017

      LOOMBA, R., FRIEDMAN, S. L. & SHULMAN, G. I. 2021. Mechanisms and disease consequences of nonalcoholic fatty liver disease. Cell, 184, 2537-2564. DOI:10.1016/j.cell.2021.04.015, PMID:33989548

      (3) Publicly available datasets show that PPARG protein is not expressed in the liver (Science 2015 347(6220):1260419, PMID: 25613900). Are the authors sure this is not an effect on another PPAR isoform like alpha? ChIP and RNA-seq pathway readouts do not distinguish between different isoforms.

      We thank the reviewer for the detailed comment. As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and hepatic stellate cells (HSCs)) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets, as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hinted at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively decreased or increased hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the increase in hepatic Cd36 and Cidea/c and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans or in clinical studies, and that PPARγ agonist administration did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discuss this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      PPARα, most highly expressed in the liver, transcriptionally regulates lipid catabolism by regulating the expression of genes mediating triglyceride hydrolysis, fatty acid transport, and β-oxidation. Activators of PPARα decrease plasma triglycerides by inhibiting their synthesis and accelerating their hydrolysis (Chen et al., 2023). Mice with deletion of the Pparα gene exhibited more hepatic steatosis under HFD induction. As the reviewer suggested, we investigated the effect of Snhg3 on Pparα expression. The result showed that neither deficiency nor overexpression of Snhg3 affected the mRNA level of Pparα, as shown below, indicating that Snhg3-induced lipid accumulation is independent of PPARα. Additionally, the exon, upstream 2k, 5’-UTR and intron regions of Pparγ, not Pparα, were enriched with the H3K27me3 mark (fold_enrichment = 4.15697) in the liver of DIO Snhg3-HKO mice using the CUT&Tag assay (table supplement 8), which was further confirmed by ChIP (Figure 6F and G). Therefore, we chose PPARγ to study its role in Snhg3-induced hepatic steatosis through integrated analysis of the data from CUT&Tag-Seq, ATAC-Seq and RNA-Seq.

      Author response image 2.

      The mRNA levels of hepatic Pparα expression in DIO Snhg3-HKO mice and Snhg3-HKI mice compared to the controls.

      References

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      (4) Previous work suggests that SNHG3 regulates its neighboring gene MED18 which is an important regulator of global transcription. Could some of the observed effects be due to changes in MED18 or other neighboring genes?

      We thank the reviewer for the detailed comment. Previous work suggested that human SNHG3 promotes progression of gastric cancer by regulating neighboring MED18 gene methylation (Xuan and Wang, 2019). Here, we studied the effect of mouse Snhg3 on Med18, and the result showed that Snhg3 had no effect on the mRNA levels of Med18 (shown below). Additionally, we tested the effect of mouse Snhg3 on its neighboring gene, regulator of chromosome condensation 1 (Rcc1). Although deficiency of Snhg3 reduced the mRNA level of Rcc1, overexpression of Snhg3 did not affect the mRNA level of Rcc1, as shown below. RCC1, the only known guanine nucleotide exchange factor in the nucleus for Ran, a nuclear Ras-like G protein, directly participates in cellular processes such as nuclear envelope formation, nucleocytoplasmic transport, and spindle formation (Ren et al., 2020). RCC1 also regulates chromatin condensation in the late S and early M phases of the cell cycle. Many studies have found that RCC1 plays an important role in tumors. Whether Rcc1 mediates the effect of Snhg3 on MASLD needs to be further investigated.

      Author response image 3.

      Hepatic Rcc1 and Med18 mRNA levels in DIO Snhg3-HKO mice and Snhg3-HKI mice compared with controls.

      References

      REN, X., JIANG, K. & ZHANG, F. 2020. The Multifaceted Roles of RCC1 in Tumorigenesis. Front Mol Biosci, 7, 225. DOI:10.3389/fmolb.2020.00225, PMID:33102517

      XUAN, Y. & WANG, Y. 2019. Long non-coding RNA SNHG3 promotes progression of gastric cancer by regulating neighboring MED18 gene methylation. Cell Death Dis, 10, 694. DOI:10.1038/s41419-019-1940-3, PMID:31534128

      (5) The claim that Snhg3 regulates SND1 protein stability seems subtle. There is data inconsistency between different panels regarding this regulation including Figure 5I, Figure 6A, and Figure 7E. In addition, is ubiquitination happening in the nucleus where Snhg3 is expressed?

      We thank the reviewer for the detailed comment. The effect of Snhg3 on SND1 expression was confirmed by western blotting; please see Figure 5I, Figure 6A, Figure 7E and the corresponding primary data. The effect of Snhg3 on SND1 protein stability does appear subtle, which indicates that there may be other mechanisms by which Snhg3 promotes SND1, such as riboregulation. We have added this point to the Discussion; please see the second paragraph on p16.

      Additionally, we did not identify the sites at which SND1 is ubiquitinated. Our results showed that Snhg3 was localized mainly in the nucleus (Figure 1D) and that Snhg3 also promoted the nuclear localization of SND1 (Figure 5O). We have revised the diagram of Snhg3 action in Figure 8G; please see the revised manuscript.

      (6) The authors show that the loss of Snhg3 changes the global H3K27me3 level. Few enzymes modify H3K27me3 levels. Did the authors check for an interaction between EZH2, Jmjd3, UTX, and Snhg3/SND1?

      We thank the reviewer for the detailed comment. It is crucial to ascertain whether SND1 itself functions as a new demethylase or whether it influences other enzymes that modify H3K27me3, such as the demethylases Jmjd3 and ubiquitously transcribed tetratricopeptide repeat on chromosome X (UTX) or the methyltransferase enhancer of zeste homolog 2 (EZH2). The precise mechanism by which SND1 regulates H3K27me3 is still unclear and requires further investigation. We have added this limitation to the Discussion; please see the third paragraph on p17.

      (7) Can the authors speculate if the findings related to Snhg3/SND1 extend to humans?

      We thank the reviewer for the detailed comment. Since the sequence of Snhg3 is not conserved between mice and humans, the findings in this manuscript may not be applicable to humans, but the details need to be explored further.

      (8) As a general rule, the figures are too small or difficult to read, with limited details in the figure legends, which limits evaluation. For example, the labels in Figure 1B and almost all of Figure 4 cannot be read. In Figure 2, the snapshots of mice or livers cannot be seen. Which figure supports the claim that snhg3KI are more 'hyper-accessible'? Can the authors clarify what Figure 4H is referring to?

      We thank the reviewer for the detailed comment. We have provided high-quality figures in our revised manuscript.

      The ‘hyper-accessible’ state in the liver of Snhg3-HKI mice was inferred from the differentially accessible regions (DARs): 4305 DARs were more accessible in Snhg3-HKI mice, whereas only 2505 DARs were more accessible in control mice (please refer to table supplement 3).

      The heatmap for Cd36 in Figure 4H was derived from hepatic RNA-seq of DIO Snhg3-HKI and control WT mice. To avoid ambiguity, we have removed it.

      (9) Authors stated that upon Snhg3 knock out, more genes are upregulated (1028) than downregulated (365). This description does not match Figure 4A. It seems in Figure 4A there are equal numbers of up and downregulated genes.

      We thank the reviewer for the detailed question. We apologize for this mistake and have corrected it.

      (10) Provide a schematic of the knockout and KI strategy in the supplement.

      We thank the reviewer for the detailed comment. We have included the knockout and KI strategies in figure supplement 1A and B, and 2A.

      Reviewer #2 (Recommendations For The Authors):

      (1) Metabolic cage data need to be reanalyzed with CalR (particularly when the body weights are significantly different).

      We thank the reviewer for the detailed comment. We reanalyzed the metabolic cage data using CalR (Mina et al., 2018). The results showed no obvious differences in heat production, total oxygen consumption, carbon dioxide production, or the respiratory exchange ratio (RER) between DIO Snhg3-HKO and control mice. Similarly, there were no differences in heat production, total oxygen consumption, carbon dioxide production, or RER between DIO Snhg3-HKI mice and WT mice. Please see figure supplement 1C and 2C, and Mouse Calorimetry in Materials and Methods.

      Reference

      MINA, A. I., LECLAIR, R. A., LECLAIR, K. B., COHEN, D. E., LANTIER, L. & BANKS, A. S. 2018. CalR: A Web-Based Analysis Tool for Indirect Calorimetry Experiments. Cell Metab, 28, 656-666 e1. DOI:10.1016/j.cmet.2018.06.019, PMID:30017358

      (2) ITT in Figure 2F should also be presented as % of the initial glucose level, which would reveal that there is no difference between WT and KO.

      We thank the reviewer for the detailed comment. We repeated the ITT experiment and included the new data in the revised manuscript; please see Figure 2C.

      (3) The fasting glucose results are inconsistent between ITT and GTT. Is there any difference in fasting glucose?

      We thank the reviewer for the questions. The difference between GTT and ITT was due to the different fasting times: mice were fasted for 6 h in ITT and for 16 h in GTT. Snhg3 does not appear to affect fasting glucose levels after either short or longer fasting; please refer to Figures 2C and 3C.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      [...] The experiments are well-designed and carefully conducted. The conclusions of this work are in general well supported by the data. There are a couple of points that need to be addressed or tested.

      1) It is unclear how LC phasic stimulation used in this study gates cortical plasticity without altering cellular responses (at least at the calcium imaging level). As the authors mentioned that Polack et al 2013 showed a significant effect of NE blockers in membrane potential and firing rate in V1 layer2/3 neurons during locomotion, it would be useful to test the effect of LC silencing (coupled to mismatch training) on both cellular response and cortical plasticity or applying NE antagonists in V1 in addition to LC optical stimulation. The latter experiment will also address which neuromodulator mediates plasticity, given that LC could co-release other modulators such as dopamine (Takeuchi et al. 2016 and Kempadoo et al. 2016). LC silencing experiment would establish a causal effect more convincingly than the activation experiment.

      Regarding the question of how phasic stimulation could alter plasticity without affecting the response sizes or activity in general, we believe there are possibilities supported by previous literature. It has been shown that catecholamines can gate plasticity by acting on eligibility traces at synapses (He et al., 2015; Hong et al., 2022). In addition, all catecholamine receptors are metabotropic and influence intracellular signaling cascades, e.g., via adenylyl cyclase and phospholipases. Catecholamines can gate LTP and LTD via these signaling pathways in vitro (Seol et al., 2007). Both of these influences on plasticity at the molecular level do not necessitate or predict an effect on calcium activity levels. We have now expanded on this in the discussion of the revised manuscript.

      While a loss of function experiment could add additional corroborating evidence that LC output is required for the plasticity seen, we did not perform loss-of-function experiments for three reasons:

      1. The effects of artificial activity changes around physiological set point are likely not linear for increases and decreases. The problem with a loss of function experiment here is that neuromodulators like noradrenaline affect general aspects of neuronal function. This is apparent in Polack et al., 2013: during the pharmacological blocking experiment, the membrane hyperpolarizes, membrane variance becomes very low, and the cells are effectively silenced (Figure 7 of (Polack et al., 2013)), demonstrating an immediate impact on neuronal function when noradrenaline receptor activation is presumably taken below physiological/waking levels. In light of this, if we reduce LC output/noradrenergic receptor activation and find that plasticity is prevented, this could be the result of a direct influence on the plasticity process, or, the result of a disruption of another aspect of neuronal function, like synaptic transmission or spiking. We would therefore challenge the reviewer’s statement that a loss-of-function experiment would establish a causal effect more convincingly than the gain-of-function experiment that we performed.

      2. The loss-of-function experiment is technically more difficult both in implementation and interpretation. Control mice show no sign of plasticity in locomotion modulation index (LMI) on the 10-minute timescale (Figure 4J), thus we would not expect to see any effect when blocking plasticity in this experiment. We would need to use dark-rearing and coupled-training of mice in the VR across development to elicit the relevant plasticity ((Attinger et al., 2017); manuscript Figure 5). We would then need to silence LC activity across days of VR experience to prevent the expected physiological levels of plasticity. Applying NE antagonists in V1 over the entire period of development seems very difficult. This would leave optogenetically silencing axons locally, which in addition to the problems of doing this acutely (Mahn et al., 2016; Raimondo et al., 2012), has not been demonstrated to work chronically over the duration of weeks. Thus, a negative result in this experiment will be difficult to interpret, and likely uninformative: We will not be able to distinguish whether the experimental approach did not work, or whether local LC silencing does nothing to plasticity.

      Note that pharmacologically blocking noradrenaline receptors during LC stimulation in the plasticity experiment is also particularly challenging: they would need to be blocked throughout the entire 15 minute duration of the experiment with no changes in concentration of antagonist between the ‘before’ and ‘after’ phases, since the block itself is likely to affect the response size, as seen in Polack et al., 2013, creating a confound for plasticity-related changes in response size. Thus, we make no claim about which particular neuromodulator released by the LC is causing the plasticity.

      3. There are several loss-of-function experiments reported in the literature using different developmental plasticity paradigms alongside pharmacological or genetic knockout approaches. These experiments show that chronic suppression of noradrenergic receptor activity prevents ocular dominance plasticity and auditory plasticity (Kasamatsu and Pettigrew, 1976; Shepard et al., 2015). Almost absent from the literature, however, are convincing gain-of-function plasticity experiments.

      Overall, we feel that loss-of-function experiments may be a possible direction for future work but, given the technical difficulty and – in our opinion – limited benefit that these experiments would provide in light of the evidence already provided for the claims we make, we have chosen not to perform these experiments at this time. Note that we already discuss some of the problems with loss-of-function experiments in the discussion.

      2) The cortical responses to NE often exhibit an inverted U-curve, with higher or lower doses of NE showing more inhibitory effects. It is unclear how responses induced by optical LC stimulation compare or interact with the physiological activation of the LC during the mismatch. Since the authors only used one frequency stimulation pattern, some discussion or additional tests with a frequency range would be helpful.

      This is correct: we do not know how the artificial activation of LC axons relates to physiological activation, e.g. under mismatch. The stimulation strength is intrinsically consistent in our study in the sense that the stimulation level to test for changes in neuronal activity is similar to that used to probe for plasticity effects. We suspect that the artificial activation results in much stronger LC activity than seen during mismatch responses, given that no sign of the plasticity in LMI seen in high ChrimsonR occurs in low ChrimsonR or control mice (Figure 4J). Note that our conclusions do not rely on the assumption that the stimulation is matched to physiological levels of activation during the visuomotor mismatches that we assayed. The hypothesis that we put forward is that increasing levels of activation of the LC (reflecting increasing rates or amplitude of prediction errors across the brain) will result in increased levels of plasticity. We know that LC axons can reach levels of activity far higher than that seen during visuomotor mismatches, for instance during air puff responses, which constitute a form of positive prediction error (unexpected tactile input) (Figures 2C and S1C). The visuomotor mismatches used in this study were only used to demonstrate that LC activity is consistent with prediction error signaling. We have now expanded on these points in the discussion as suggested.

      Reviewer #1 (Recommendations For The Authors):

      1) In Figure 3E, there is a rebound response of ChrimsonR at the offset of the mismatch. Is that common? If so, what does it mean? If not, maybe replace it with a more common example trace.

      This trace in fact represents the population average, so this offset response (or ‘rebound’) reflects a significant component of the population response to visual flow onset (i.e., mismatch offset), only under conditions of LC stimulation. See our response to reviewer 2 concerning this element of the response.

      2) It would be helpful to have some discussions on how a mismatch signal reaches and activates LC from cortical neurons.

      We have now added a short segment on this to the discussion.

      Reviewer #2 (Public Review):

      [...] The study provides very compelling data on a timely and fascinating topic in neuroscience. The authors carefully designed experiments and corresponding controls to exclude any confounding factors in the interpretation of neuronal activity in LC axons and cortical neurons. The quality of the data and the rigor of the analysis are important strengths of the study. I believe this study will have an important contribution to the field of system neuroscience by shedding new light on the role of a key neuromodulator. The results provide strong support for the claims of the study. However, I also believe that some results could have been strengthened by providing additional analyses and experimental controls. These points are discussed below.

      Calcium signals in LC axons tend to respond with pupil dilation, air puffs, and locomotion as the authors reported. A more quantitative analysis such as a GLM model could help understand the relative contribution (and temporal relationship) of these variables in explaining calcium signals. This could also help compare signals obtained in the sensory and motor cortical domains. Indeed, the comparison in Figure 2 seems a bit incomplete since only "posterior versus anterior" comparisons have been performed and not within-group comparisons. I believe it is hard to properly assess differences or similarities between calcium signal amplitude measured in different mice and cranial windows as they are subject to important variability (caused by different levels of viral expression for instance). The authors should at the very least provide a full statistical comparison between/within groups through a GLM model that would provide a more systematic quantification.

      To provide a more detailed comparison of responses, we have expanded on the analysis in Figure 2 to include comparative heatmaps from anterior and posterior imaging sites, as well as statistical comparisons of the response curves as a function of time. This shows how similar the responses are in the two regions.

      Beyond this, we are not sure how a regression analysis (GLM or otherwise) would help support the main point we aim to make here. The responses in anterior and posterior regions are similar, which supports a broadcast model of LC function in the cortex, rather than specialized routing of prediction error signals to cortical areas. Linear contributions of the signals are apparent from the stimulus triggered responses, and while non-linear interactions between the different variables are certainly an interesting question, they go beyond the point we aim to make and would also not be captured by a regression analysis. In addition, we have refined our language replacing descriptors of ‘the same’ or ‘indistinguishable’ between the two regions with ‘similar’, to highlight that while we find no evidence of a difference, our analysis does not cover all possible differences that might appear when looking at non-linear interactions.

      Previous studies using stimulation of the locus coeruleus or local iontophoresis of norepinephrine in sensory cortices have shown robust response modulations (see McBurney-Lin et al., 2019, https://doi.org/10.1016/j.neubiorev.2019.06.009 for a review). The weak modulations observed in this study seem at odds with these reports. Given that the density of ChrimsonR-expressing axons varies across mice and that there are no direct measurements of their activation (besides pupil dilation), it is difficult to appreciate how they impact the local network. How does the density of ChrimsonR-expressing axons compare to the actual density of LC axons in V1? The authors could further discuss this point.

      In terms of estimating the percentage of cortical axons labelled based on our axon density measurements: we refer to cortical LC axonal immunostaining in the literature to make this comparison.

      In motor cortex, an average axon density of 0.07 µm/µm² has been reported (Yin et al., 2021), and 0.09 µm/µm² in prefrontal cortex (Sakakibara et al., 2021). Density of LC axons varies by cortical area, with higher density in motor cortex and medial areas than sensory areas (Agster et al., 2013): V1 axon density is roughly 70% of that in cingulate cortex (adjacent to motor and prefrontal cortices) (Nomura et al., 2014). So, we approximate a maximum average axon density in V1 of approximately 0.056 µm/µm².

      Because these published measurements were made from images of tissue volumes with a larger z-depth (~10 µm) than our reported measurements (~1 µm), they appear much larger than the range reported in our manuscript (0.002 to 0.007 µm/µm²). We repeated the measurements in our data using images of volumes with 10 µm z-depth and find that the density of labelled axons in high ChrimsonR-expressing mice ranges from 0.012 to 0.039 µm/µm². This corresponds to between 20% and 70% of the density we would expect based on previous work. Note that this is potentially a significant underestimate and should therefore be treated as a lower bound: analyses in the literature use images from immunostaining, where the signal to background ratio is very high. In contrast, we did not transcardially perfuse our mice, leading to significant background (especially in the pia/L1, where axon density is high - (Agster et al., 2013; Nomura et al., 2014)), and the intensity of the tdTomato is not especially high. We are therefore likely missing some narrow, dim, and superficial fibers in our analysis.
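
      The arithmetic behind this estimate can be reproduced in a few lines. The sketch below is only an illustration: it assumes that the expected V1 density is obtained by averaging the two published densities and scaling by 70%, which matches the quoted value of 0.056 but is not stated explicitly in the response. All variable names are hypothetical.

```python
# Published LC axon densities (µm of axon per µm² of tissue), as quoted above.
motor_density = 0.07        # motor cortex (Yin et al., 2021)
prefrontal_density = 0.09   # prefrontal cortex (Sakakibara et al., 2021)
v1_fraction = 0.70          # V1 relative to cingulate cortex (Nomura et al., 2014)

# Assumption: the reference density is the mean of the two published values.
reference_density = (motor_density + prefrontal_density) / 2
expected_v1_density = v1_fraction * reference_density

# Range measured in high ChrimsonR-expressing mice at 10 µm z-depth.
measured_low, measured_high = 0.012, 0.039

# Fraction of the expected V1 density that is labelled.
fraction_low = measured_low / expected_v1_density
fraction_high = measured_high / expected_v1_density

print(f"expected V1 density: {expected_v1_density:.3f} µm/µm²")
print(f"labelled fraction: {fraction_low:.0%} to {fraction_high:.0%}")
```

      Under this assumption the labelled fraction comes out at roughly one fifth to just over two thirds of the expected density, in line with the range given above.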

      We also can quantify how our variance in axonal labelling affects our results: For the dataset in Figure 3, there doesn’t appear to be any correlation between the level of expression and the effect of stimulating the axons on the mismatch or visual flow responses for each animal (Author response image 1), while there is a significant correlation between the level of expression and the pupil dilation, consistent with the dataset shown in Figure 4. Thus, even in the most highly expressing mice, there is no clear effect on average response size at the level of the population. We have added these correlations to the revised manuscript as a new Figure S3.

      **Author response image 1.**

      Correlations between axon density and average effect of laser stimulation on stimulus responses and pupil dilation (data from manuscript Figure 3). Grey points show control mice, blue points show low ChrimsonR-expressing mice, and purple points show high ChrimsonR-expressing mice.

      To our knowledge, there has not yet been any similar experiment reported utilizing local LC axonal optogenetic stimulation while recording cortical responses, so when comparing our results to those in the literature, there are several important methodological differences to keep in mind. The vast majority of the work demonstrating an effect of LC output/noradrenaline on responses in the cortex has been done using unit recordings, and while results are mixed, these have most often demonstrated a suppressive effect on spontaneous and/or evoked activity in the cortex (McBurney-Lin et al., 2019). In contrast to these studies, we do not see a major effect of LC stimulation either on baseline or evoked calcium activity (Figure 3), and, if anything, we see a minor potentiation of transient visual flow onset responses (see also Author response image 2). There could be several reasons why our stimulation does not have the same effect as these older studies:

      1. Recording location: Unit recordings are often very biased toward highly active neurons (Margrie et al., 2002) and deeper layers of the cortex, while we are imaging from layer 2/3 – a layer notorious for sparse activity. In one of the few papers to record from superficial layers, it has been demonstrated that deeper layers in V1 are affected differently by LC stimulation methods compared to more superficial ones (Sato et al., 1989), with suppression more common in superficial layers. Thus, some differences between our results and those in the majority of the literature could simply be due to recording depth and the sampling bias of unit recordings.

      2. Stimulation method: Most previous studies have manipulated LC output/noradrenaline levels by either iontophoretically applying noradrenergic receptor agonists, or by electrically stimulating the LC. Arguably, even though our optogenetic stimulation is still artificial, it represents a more physiologically relevant activation compared to iontophoresis, since the LC releases a number of neuromodulators including dopamine, and these will be released in a more physiological manner in the spatial domain and in terms of neuromodulator concentration. Electrical stimulation of the LC as used by previous studies differs from our optogenetic method in that LC axons will be stimulated across much wider regions of the brain (affecting both the cortex and many of its inputs), and it is not clear whether the cause of cortical response changes is in cortex or subcortical. In addition, electrical LC stimulation is not cell type specific.

      3. Temporal features of stimulation: Few previous studies had the same level of temporal control over manipulating LC output that we had using optogenetics. Given that electrical stimulation generates electrical artifacts, coincident stimulation during the stimulus was not used in previous studies. Instead, the LC is often repeatedly or tonically stimulated, sometimes for many seconds, prior to the stimulus being presented. Iontophoresis also does not have the same temporal specificity and will lead to tonically raised receptor activity over a time course determined by washout times.

      4. State specificity: Most previous studies have been performed under anesthesia – which is known to impact noradrenaline levels and LC activity (Müller et al., 2011). Thus, the acute effects of LC stimulation are likely not comparable between anesthetized and awake animals.

      Due to these differences, it is hard to infer why our results differ from those of other papers. The study with the most similar methodology to ours is (Vazey et al., 2018), which used optogenetic stimulation directly in the mouse LC while recording spiking in deep layers of the somatosensory cortex with extracellular electrodes. Like us, they found that phasic optogenetic stimulation alone did not alter baseline spiking activity (Figure 2F of Vazey et al., 2018), and they found that in layers 5 and 6, short latency transient responses to foot touch were potentiated and recruited by simultaneous LC stimulation. While this finding appears more overt than the small modulations we see, it is qualitatively not so dissimilar from our finding that transient responses appear to be slightly potentiated when visual flow begins (Author response image 2). Differences in the degree of the effect may be due to differences in the layers recorded, the proportion of the LC recruited, or the fact that anesthesia was used in Vazey et al., 2018.

      Note that we only used one set of stimulation parameters for optogenetic stimulation, and it is always possible that using different parameters would result in different effects. We have now added a discussion on the topic to the revised manuscript.

      In the analysis performed in Figure 3, it seems that red light stimulations used to drive ChrimsonR also have an indirect impact on V1 neurons through the retina. Indeed, figure 3D shows a similar response profile for ChrimsonR and control with calcium signals increasing at laser onset (ON response) and offset (OFF response). With that in mind, it is hard to interpret the results shown in Figure 3E-F without seeing the average calcium time course for Control mice. Are the responses following visual flow caused by LC activation or additional visual inputs? The authors should provide additional information to clarify this result.

      This is a good point. When we plot the average difference between the stimulus response alone and the optogenetic stimulation + stimulus response, we do indeed find that there is a transient increase in response at the visual flow onset (and the offset of mismatch, which is where visual flow resumes), and this is only seen in ChrimsonR-expressing mice (Author response image 2). We therefore believe that these enhanced transients at visual flow onset could be due to the effect of ChrimsonR stimulation, and indeed previous studies have shown that LC stimulation can reduce the onset latency and latency jitter of afferent-evoked activity (Devilbiss and Waterhouse, 2004; Lecas, 2004), an effect which could mediate the differences we see. We have added this analysis to the revised manuscript in Figure 3 and added discussion accordingly.

      **Author response image 2.**

      Difference in responses to visual stimuli caused by optogenetic stimulation, calculated by subtracting the average response when no laser was presented from the average response when the laser was presented concurrent with the visual stimulus. Pink traces show the response difference for ChrimsonR-expressing mice, and grey shows the same for control mice. Black blocks below indicate consecutive timepoints after stimulation showing a significant difference between ChrimsonR and control as determined by hierarchical bootstrapping (p<0.05).
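
      The subtraction described in this caption can be sketched as follows. This is a minimal illustration on made-up numbers, not the analysis code used for the figure; the function names and the toy trials are hypothetical, and each trial is assumed to be an equal-length list of samples aligned to stimulus onset.

```python
from statistics import mean


def mean_trace(trials):
    """Average equal-length trials timepoint by timepoint."""
    return [mean(samples) for samples in zip(*trials)]


def response_difference(laser_trials, no_laser_trials):
    """Mean response with laser minus mean response without laser."""
    with_laser = mean_trace(laser_trials)
    without_laser = mean_trace(no_laser_trials)
    return [a - b for a, b in zip(with_laser, without_laser)]


# Toy data (made up): two trials per condition, four timepoints each.
laser = [[0.0, 0.4, 0.2, 0.0], [0.0, 0.6, 0.2, 0.0]]
no_laser = [[0.0, 0.2, 0.2, 0.0], [0.0, 0.2, 0.2, 0.0]]

diff = response_difference(laser, no_laser)
print(diff)
```

      In this toy example the difference is non-zero only at the second timepoint, which is how a transient enhancement at stimulus onset would show up in such a trace.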

      Some aspects of the described plasticity process remained unanswered. It is not clear over which time scale the locomotion modulation index changes and how many optogenetic stimulations are necessary or sufficient to saturate this index. Some of these questions could be addressed with the dataset of Figure 3 by measuring this index over different epochs of the imaging session (from early to late) to estimate the dynamics of the ongoing plasticity process (in comparison to control mice). Also, is there any behavioural consequence of plasticity/update of functional representation in V1? If plasticity gated by repeated LC activations reproduced visuomotor responses observed in mice that were exposed to visual stimulation only in the virtual environment, then I would expect to see a change in the locomotion behaviour (such as a change in speed distribution) as a result of the repeated LC stimulation. This would provide more compelling evidence for changes in internal models for visuomotor coupling in relation to its behavioural relevance. An experiment that could confirm the existence of the LC-gated learning process would be to change the gain of the visuomotor coupling and see if mice adapt faster with LC optogenetic activation compared to control mice with no ChrimsonR expression. Authors should discuss how they imagine the behavioural manifestation of this artificially-induced learning process in V1.

      Regarding the question of plasticity time course: Unfortunately, owing to the paradigm used in Figure 3, the time course of the plasticity will not be quantifiable from this experiment. This is because in the first 10 minutes, the mouse is in closed loop visuomotor VR experience, undergoing optogenetic stimulation (this is the time period in which we record mismatches). We then shift to the open loop session to quantify the effect of optogenetic stimulation on visual flow responses. Since the plasticity is presumably happening during the closed loop phase, and we have no read-out of the plasticity during this phase (we do not have uncoupled visual flow onsets to quantify LMI in closed loop), it is not possible to track the plasticity over time.

      Regarding the behavioral relevance of the plasticity: The type of plasticity we describe here is consistent with predictive, visuomotor plasticity in the form of a learned suppression of responses to self-generated visual feedback during movement. Intuitive purposes of this type of plasticity would be 1) to enable better detection of external moving objects by suppressing the predictable (and therefore redundant) self-generated visual motion and 2) to better detect changes in the geometry of the world (near objects have a larger visuomotor gain than far objects). In our paradigm, we have no intuitive read-out of the mouse’s perception of these things, and it is not clear to us that they would be reflected in locomotion speed, which does not differ between groups (manuscript Figure S5). Instead, we would need to turn to other paradigms for a clear behavioral read-out of predictive forms of sensorimotor learning: for instance, sensorimotor learning paradigms in the VR (such as those used in (Heindorf et al., 2018; Leinweber et al., 2017)), or novel paradigms that reinforce the mouse for detecting changes in the gain of the VR, or moving objects in the VR, using LC stimulation during the learning phase to assess if this improves acquisition. This is certainly a direction for future work. In the case of a positive effect, however, the link between the precise form of plasticity we quantify in this manuscript and the effect on the behavior would remain indirect, so we see this as beyond the scope of the manuscript. We have added a discussion on this topic to the revised manuscript.

      Finally, control mice used as a comparison to mice expressing ChrimsonR in Figure 3 were not injected with a control viral vector expressing a fluorescent protein alone. Although it is unlikely that the procedure of injection could cause the results observed, it would have been a better control for the interpretation of the results.

      We agree that this indeed would have been a better control. However, we believe that this is fortunately not a major problem for the interpretation of our results for two reasons:

      1. The control and ChrimsonR expressing mice do not show major differences in the effect of optogenetic LC stimulation at the level of the calcium responses for all results in Figure 3, with the exception of the locomotion modulation indices (Figure 3I). Therefore, in terms of response size, there is no major effect compared to control animals that could be caused by the injection procedure, apart from marginally increased transient responses to visual flow onset – and, as the reviewer notes, it is difficult to see how the injection procedure would cause this effect.

      2. The effect on locomotion modulation index (Figure 3I) was replicated with another set of mice in Figure 4C, for which we did have a form of injected control (‘Low ChrimsonR’), which did not show the same plasticity in locomotion modulation index (Figure 4E). We therefore know that at least the injection itself is not resulting in the plasticity effect seen.

      Reviewer #2 (Recommendations For The Authors):

      In experiments where axonal imaging was performed on LC axons, the authors should indicate the number of mice used in addition to the number of Field of View (FoV). Indeed, samples (FoVs) are not guaranteed to be independent as LC axons can span large cortical areas and the same axon can end up in different FoVs. Please provide statistics across mice/cranial windows to confirm the robustness of the results.

      All information requested regarding animal numbers in axonal imaging is provided in the statistical Table S1, as well as in the text and figures (e.g., Figure 2A). Samples will be independent in time (as different FoVs were imaged on different days), but it is indeed possible that axon segments from different FoVs within an animal come from the same axon.

      Averaging across animals greatly reduces statistical power. We have therefore implemented hierarchical bootstrapping instead: bootstrapping first occurs at the level of animal and then at the level of FoV. All p-values that were reported as significant in the manuscript remained significant with this test, with no major reduction in significance level, with the exception of Figure S2B, where statistical significance was lost (p = 0.04 with the rank-sum test, p = 0.07 with hierarchical bootstrapping). We therefore conclude that sampling from the same animals across days is not responsible for the significance of the results reported.
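
      The two-level resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the function names, the choice of the mean as the statistic, the two-sided p-value computed from the bootstrap distribution of the difference in means, and the example values are all hypothetical.

```python
import random
from statistics import mean


def hierarchical_bootstrap_means(data, n_boot=2000, seed=0):
    """Return bootstrap means; `data` maps animal ID -> list of per-FoV values."""
    rng = random.Random(seed)
    animals = list(data)
    means = []
    for _ in range(n_boot):
        pooled = []
        # Level 1: resample animals with replacement.
        for animal in rng.choices(animals, k=len(animals)):
            fovs = data[animal]
            # Level 2: resample FoVs with replacement within the chosen animal.
            pooled.extend(rng.choices(fovs, k=len(fovs)))
        means.append(mean(pooled))
    return means


def bootstrap_p_value(group_a, group_b, n_boot=2000, seed=0):
    """Two-sided p-value for a difference in means (assumed test statistic)."""
    means_a = hierarchical_bootstrap_means(group_a, n_boot, seed)
    means_b = hierarchical_bootstrap_means(group_b, n_boot, seed + 1)
    diffs = [a - b for a, b in zip(means_a, means_b)]
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(frac_le, frac_ge))


# Hypothetical example values, not data from the study.
control = {"m1": [0.1, 0.2, 0.0], "m2": [0.15, 0.05], "m3": [0.1, 0.12]}
stim = {"m4": [0.6, 0.7], "m5": [0.65, 0.8, 0.75], "m6": [0.7, 0.62]}
p = bootstrap_p_value(stim, control)
```

      Resampling animals before FoVs keeps FoVs from the same animal together in each bootstrap sample, so the spread of the bootstrap distribution reflects between-animal variability rather than treating every FoV as an independent observation.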

      References

      Agster, K.L., Mejias-Aponte, C.A., Clark, B.D., Waterhouse, B.D., 2013. Evidence for a regional specificity in the density and distribution of noradrenergic varicosities in rat cortex. Journal of Comparative Neurology 521, 2195–2207. https://doi.org/10.1002/cne.23270

      Attinger, A., Wang, B., Keller, G.B., 2017. Visuomotor Coupling Shapes the Functional Development of Mouse Visual Cortex. Cell 169, 1291-1302.e14. https://doi.org/10.1016/j.cell.2017.05.023

      Devilbiss, D.M., Waterhouse, B.D., 2004. The Effects of Tonic Locus Ceruleus Output on Sensory-Evoked Responses of Ventral Posterior Medial Thalamic and Barrel Field Cortical Neurons in the Awake Rat. J. Neurosci. 24, 10773–10785. https://doi.org/10.1523/JNEUROSCI.1573-04.2004

      He, K., Huertas, M., Hong, S.Z., Tie, X., Hell, J.W., Shouval, H., Kirkwood, A., 2015. Distinct Eligibility Traces for LTP and LTD in Cortical Synapses. Neuron 88, 528–538. https://doi.org/10.1016/j.neuron.2015.09.037

      Heindorf, M., Arber, S., Keller, G.B., 2018. Mouse Motor Cortex Coordinates the Behavioral Response to Unpredicted Sensory Feedback. Neuron 0. https://doi.org/10.1016/j.neuron.2018.07.046

      Hong, S.Z., Mesik, L., Grossman, C.D., Cohen, J.Y., Lee, B., Severin, D., Lee, H.-K., Hell, J.W., Kirkwood, A., 2022. Norepinephrine potentiates and serotonin depresses visual cortical responses by transforming eligibility traces. Nat Commun 13, 3202. https://doi.org/10.1038/s41467-022-30827-1

      Kasamatsu, T., Pettigrew, J.D., 1976. Depletion of brain catecholamines: failure of ocular dominance shift after monocular occlusion in kittens. Science 194, 206–209. https://doi.org/10.1126/science.959850

      Lecas, J.-C., 2004. Locus coeruleus activation shortens synaptic drive while decreasing spike latency and jitter in sensorimotor cortex. Implications for neuronal integration. European Journal of Neuroscience 19, 2519–2530. https://doi.org/10.1111/j.0953-816X.2004.03341.x

      Leinweber, M., Ward, D.R., Sobczak, J.M., Attinger, A., Keller, G.B., 2017. A Sensorimotor Circuit in Mouse Cortex for Visual Flow Predictions. Neuron 95, 1420-1432.e5. https://doi.org/10.1016/j.neuron.2017.08.036

      Mahn, M., Prigge, M., Ron, S., Levy, R., Yizhar, O., 2016. Biophysical constraints of optogenetic inhibition at presynaptic terminals. Nat Neurosci 19, 554–556. https://doi.org/10.1038/nn.4266

      Margrie, T.W., Brecht, M., Sakmann, B., 2002. In vivo, low-resistance, whole-cell recordings from neurons in the anaesthetized and awake mammalian brain. Pflugers Arch. 444, 491–498. https://doi.org/10.1007/s00424-002-0831-z

      McBurney-Lin, J., Lu, J., Zuo, Y., Yang, H., 2019. Locus coeruleus-norepinephrine modulation of sensory processing and perception: A focused review. Neurosci Biobehav Rev 105, 190–199. https://doi.org/10.1016/j.neubiorev.2019.06.009

      Müller, C.P., Pum, M.E., Amato, D., Schüttler, J., Huston, J.P., De Souza Silva, M.A., 2011. The in vivo neurochemistry of the brain during general anesthesia. Journal of Neurochemistry 119, 419–446. https://doi.org/10.1111/j.1471-4159.2011.07445.x

      Nomura, S., Bouhadana, M., Morel, C., Faure, P., Cauli, B., Lambolez, B., Hepp, R., 2014. Noradrenalin and dopamine receptors both control cAMP-PKA signaling throughout the cerebral cortex. Front Cell Neurosci 8. https://doi.org/10.3389/fncel.2014.00247

      Polack, P.-O., Friedman, J., Golshani, P., 2013. Cellular mechanisms of brain-state-dependent gain modulation in visual cortex. Nat Neurosci 16, 1331–1339. https://doi.org/10.1038/nn.3464

      Raimondo, J.V., Kay, L., Ellender, T.J., Akerman, C.J., 2012. Optogenetic silencing strategies differ in their effects on inhibitory synaptic transmission. Nat Neurosci 15, 1102–1104. https://doi.org/10.1038/nn.3143

      Sakakibara, Y., Hirota, Y., Ibaraki, K., Takei, K., Chikamatsu, S., Tsubokawa, Y., Saito, T., Saido, T.C., Sekiya, M., Iijima, K.M., n.d. Widespread Reduced Density of Noradrenergic Locus Coeruleus Axons in the App Knock-In Mouse Model of Amyloid-β Amyloidosis. J Alzheimers Dis 82, 1513–1530. https://doi.org/10.3233/JAD-210385

      Sato, H., Fox, K., Daw, N.W., 1989. Effect of electrical stimulation of locus coeruleus on the activity of neurons in the cat visual cortex. Journal of Neurophysiology. https://doi.org/10.1152/jn.1989.62.4.946

      Seol, G.H., Ziburkus, J., Huang, S., Song, L., Kim, I.T., Takamiya, K., Huganir, R.L., Lee, H.-K., Kirkwood, A., 2007. Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity. Neuron 55, 919–929. https://doi.org/10.1016/j.neuron.2007.08.013

      Shepard, K.N., Liles, L.C., Weinshenker, D., Liu, R.C., 2015. Norepinephrine is necessary for experience-dependent plasticity in the developing mouse auditory cortex. J Neurosci 35, 2432–2437. https://doi.org/10.1523/JNEUROSCI.0532-14.2015

      Vazey, E.M., Moorman, D.E., Aston-Jones, G., 2018. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proceedings of the National Academy of Sciences 115, E9439–E9448. https://doi.org/10.1073/pnas.1803716115

      Yin, X., Jones, N., Yang, J., Asraoui, N., Mathieu, M.-E., Cai, L., Chen, S.X., 2021. Delayed motor learning in a 16p11.2 deletion mouse model of autism is rescued by locus coeruleus activation. Nat Neurosci 24, 646–657. https://doi.org/10.1038/s41593-021-00815-7

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies the mitotic localization mechanism for Aurora B and INCENP (parts of the chromosomal passenger complex, CPC) in Trypanosoma brucei. The mechanism is different from that in the more commonly studied opisthokonts and there is solid support from RNAi and imaging experiments, targeted mutations, immunoprecipitations with crosslinking/mass spec, and AlphaFold interaction predictions. The results could be strengthened by biochemically testing proposed direct interactions and demonstrating that the targeting protein KIN-A is a motor. The findings will be of interest to parasitology researchers as well as cell biologists working on mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. Please note that the conserved glycine residue in the Switch II helix in KIN-A was mistakenly labelled as G209 in the original manuscript. We now corrected it to G210 in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The CPC plays multiple essential roles in mitosis such as kinetochore-microtubule attachment regulation, kinetochore assembly, spindle assembly checkpoint activation, anaphase spindle stabilization, cytokinesis, and nuclear envelope formation, as it dynamically changes its mitotic localization: it is enriched at inner centromeres from prophase to metaphase but it is relocalized at the spindle midzone in anaphase. The business end of the CPC is Aurora B and its allosteric activation module IN-box, which is located at the C-terminal part of INCENP. In most well-studied eukaryotic species, Aurora B activity is locally controlled by the localization module of the CPC, Survivin, Borealin, and the N-terminal portion of INCENP. Survivin and Borealin, which bind the N terminus of INCENP, recognize histone residues that are specifically phosphorylated in mitosis, while anaphase spindle midzone localization is supported by the direct microtubule-binding capacity of the SAH (single alpha helix) domain of INCENP and other microtubule-binding proteins that specifically interact with INCENP during anaphase, which are under the regulation of CDK activity. One of these examples includes the kinesin-like protein MKLP2 in vertebrates.

      Trypanosoma is an evolutionarily interesting species to study mitosis since its kinetochore and centromere proteins do not show any similarity to other major branches of eukaryotes, while orthologs of Aurora B and INCENP have been identified. Combining molecular genetics, imaging, biochemistry, cross-linking IP-MS (IP-CLMS), and structural modeling, this manuscript reveals that two orphan kinesin-like proteins KIN-A and KIN-B act as localization modules of the CPC in Trypanosoma brucei. The IP-CLMS, AlphaFold2 structural predictions, and domain deletion analysis support the idea that (1) KIN-A and KIN-B form a heterodimer via their coiled-coil domain, (2) Two alpha helices of INCENP interact with the coiled-coil of the KIN-A-KIN-B heterodimer, (3) the conserved KIN-A C-terminal CD1 interacts with the heterodimeric KKT9-KKT11 complex, which is a submodule of the KKT7-KKT8 kinetochore complex unique to Trypanosoma, (4) KIN-A and KIN-B coiled-coil domains and the KKT7-KKT8 complex are required for CPC localization at the centromere, (5) CD1 and CD2 domains of KIN-A support its centromere localization. The authors further show that the ATPase activity of KIN-A is critical for spindle midzone enrichment of the CPC. The imaging data of the KIN-A rigor mutant suggest that dynamic KIN-A-microtubule interaction is required for metaphase alignment of the kinetochores and proliferation. Overall, the study reveals novel pathways of CPC localization regulation via KIN-A and KIN-B by multiple complementary approaches.

      Strengths:

      The major conclusion is collectively supported by multiple approaches, combining site-specific genome engineering, epistasis analysis of cellular localization, AlphaFold2 structure prediction of protein complexes, IP-CLMS, and biochemical reconstitution (the complex of KKT8, KKT9, KKT11, and KKT12).

      We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      • The predictions of direct interactions (e.g. INCENP with KIN-A/KIN-B, or KIN-A with KKT9-KKT11) have not yet been confirmed experimentally, e.g. by domain mutagenesis and interaction studies.

      Thank you for this point. It is true that we do not have evidence for direct interactions between KIN-A and KKT9-KKT11. However, the interaction between INCENP and KIN-A/KIN-B is strongly supported by our cross-linking IP-MS of native complexes. Furthermore, we show that deletion of the INCENPCPC1 N-terminus, which is predicted to interact with KIN-A:KIN-B, abolishes kinetochore localization.

      • The criteria used to judge a failure of localization are not clearly explained (e.g., Figure 5F, G).

      As suggested by the reviewer in recommendation #14, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      • It remains to be shown that KIN-A has motor activity.

      We thank the reviewer for this important comment. Indeed, motor activity remains to be demonstrated using an in vitro system, which is beyond the scope of this study. What we show here is that the motor domain of KIN-A effectively co-sediments with microtubules and that spindle localization of KIN-A is abolished upon deletion of the motor domain. Moreover, mutation of a conserved Glycine residue in the Switch II region (G210) to Alanine (‘rigor mutation’, (Rice et al., 1999)) renders KIN-A incapable of translocating to the central spindle, suggesting that its ATPase activity is required for this process. To clarify this point in the manuscript, we have replaced all instances of ‘motor activity’ of KIN-A with ‘ATPase activity’ where we refer to experiments performed using the KIN-A rigor mutant. In addition, we have included a Multiple Sequence Alignment (MSA) of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9 in Figure 6A and S6A, showing the conservation of key motifs required for ATP coordination and tubulin interaction. In the corresponding paragraph in the main text, we describe these data as follows:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      • The authors imply that KIN-A, but not KIN-B, interacts with microtubules based on microtubule pelleting assay (Fig. S6), but the substantial insoluble fractions of 6HIS-KIN-A and 6HIS-KIN-B make it difficult to conclusively interpret the data. It is possible that these two proteins are not stable unless they form a heterodimer.

      This is indeed a possibility. We are currently aiming at purifying full-length recombinant KIN-A and KIN-B (along with the other CPC components), which will allow us to perform in vitro interaction studies and to investigate biochemical properties of this complex (including the role of the motor domains of KIN-A and KIN-B) within the framework of an in-depth follow-up study. To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      • For broader context, some prior findings should be introduced, e.g. on the importance of the microtubule-binding capacity of the INCENP SAH domain and its regulation by mitotic phosphorylation (PMID 8408220, 26175154, 26166576, 28314740, 28314741, 21727193), since KIN-A and KIN-B may substitute for the function of the SAH domain.

      We have modified the introduction to include the following text and references mentioned by the reviewer: ‘The localization module comprises Borealin, Survivin and the N-terminus of INCENP, which are connected to one another via a three-helical bundle (Jeyaprakash et al., 2007, 2011; Klein et al., 2006). The two modules are linked by the central region of INCENP, composed of an intrinsically disordered domain and a single alpha helical (SAH) domain. INCENP harbours microtubule-binding domains within the N-terminus and the central SAH domain, which play key roles for CPC localization and function (Samejima et al., 2015; Kang et al., 2001; Noujaim et al., 2014; Cormier et al., 2013; Wheatley et al., 2001; Nakajima et al., 2011; Fink et al., 2017; Wheelock et al., 2017; van der Horst et al., 2015; Mackay et al., 1993).’

      Reviewer #2 (Public Review):

      How the chromosomal passenger complex (CPC) and its subunit Aurora B kinase regulate kinetochore-microtubule attachment, and how the CPC relocates from kinetochores to the spindle midzone as a cell transitions from metaphase to anaphase are questions of great interest. In this study, Ballmer and Akiyoshi take a deep dive into the CPC in T. brucei, a kinetoplastid parasite with a kinetochore composition that varies greatly from other organisms.

      Using a combination of approaches, most importantly in silico protein predictions using alphafold multimer and light microscopy in dividing T. brucei, the authors convincingly present and analyse the composition of the T. brucei CPC. This includes the identification of KIN-A and KIN-B, proteins of the kinesin family, as targeting subunits of the CPC. This is a clear advancement over earlier work, for example by Li and colleagues in 2008. The involvement of KIN-A and KIN-B is of particular interest, as it provides a clue for the (re)localization of the CPC during the cell cycle. The evolutionary perspective makes the paper potentially interesting for a wide audience of cell biologists, a point that the authors bring across properly in the title, the abstract, and their discussion.

      The evolutionary twist of the paper would be strengthened 'experimentally' by predictions of the structure of the CPC beyond T. brucei. Depending on how far the authors can extend their in-silico analysis, it would be of interest to discuss a) available/predicted CPC structures in well-studied organisms and b) structural predictions in other euglenozoa. What are the general structural properties of the CPC (e.g. flexible linkers, overall dimensions, structural differences when subunits are missing etc.)? How common is the involvement of kinesin-like proteins? In line with this, it would be good to display the figure currently shown as S1D (or similar) as a main panel.

      We thank the reviewer for her/his encouraging assessment of our manuscript and the appreciation of the evolutionary relevance of our work. As suggested, we have moved the phylogenetic tree previously shown in Fig. S1D to the main Fig. 1F. Our AF2 analysis of CPC proteins and (sub)complexes from other kinetoplastids failed to predict reliable interactions among CPC proteins except for that between Aurora B and the IN box. It therefore remains unclear whether CPC structures are conserved among kinetoplastids. Because components of CPC remain unknown in other euglenozoa (other than Aurora B and INCENP), we cannot perform structural predictions of CPC in diplonemids or euglenids.

      It remains unclear how common the involvement of kinesin-like proteins with the CPC is in other eukaryotes, partly because we could not identify an obvious homolog of KIN-A/KIN-B outside of kinetoplastids. Addressing this question would require experimental approaches in various eukaryotes (e.g. immunoprecipitation and mass spectrometry of Aurora B) as we carried out in this manuscript using Trypanosoma brucei.

      Reviewer #3 (Public Review):

      Summary:

      The protein kinase, Aurora B, is a critical regulator of mitosis and cytokinesis in eukaryotes, exhibiting a dynamic localisation. As part of the Chromosomal Passenger Complex (CPC), along with the Aurora B activator, INCENP, and the CPC localisation module comprised of Borealin and Survivin, Aurora B travels from the kinetochores at metaphase to the spindle midzone at anaphase, which ensures its substrates are phosphorylated in a time- and space-dependent manner. In the kinetoplastid parasite, T. brucei, the Aurora B orthologue (AUK1), along with an INCENP orthologue known as CPC1, and a kinetoplastid-specific protein CPC2, also displays a dynamic localisation, moving from the kinetochores at metaphase to the spindle midzone at anaphase, to the anterior end of the newly synthesised flagellum attachment zone (FAZ) at cytokinesis. However, the trypanosome CPC lacks orthologues of Borealin and Survivin, and T. brucei kinetochores also have a unique composition, being comprised of dozens of kinetoplastid-specific proteins (KKTs). Of particular importance for this study are KKT7 and the KKT8 complex (comprising KKT8, KKT9, KKT11, and KKT12). Here, Ballmer and Akiyoshi seek to understand how the CPC assembles and is targeted to its different locations during the cell cycle in T. brucei.

      Strengths & Weaknesses:

      Using immunoprecipitation and mass-spectrometry approaches, Ballmer and Akiyoshi show that AUK1, CPC1, and CPC2 associate with two orphan kinesins, KIN-A and KIN-B, and with the use of endogenously expressed fluorescent fusion proteins, demonstrate for the first time that KIN-A and KIN-B display a dynamic localisation pattern similar to other components of the CPC. Most of these data provide convincing evidence for KIN-A and KIN-B being bona fide CPC proteins, although the evidence that KIN-A and KIN-B translocate to the anterior end of the new FAZ at cytokinesis is weak - the KIN-A/B signals are very faint and difficult to see, and cell outlines/brightfield images are not presented to allow the reader to determine the cellular location of these faint signals (Fig S1B).

      We thank the reviewer for their thorough assessment of our manuscript and the insightful feedback to further improve our study. To address the point above, we have acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      They then demonstrate, by using RNAi to deplete individual components, that the CPC proteins have hierarchical interdependencies for their localisation to the kinetochores at metaphase. These experiments appear to have been well performed, although only images of cell nuclei were shown (Fig 2A), meaning that the reader cannot properly assess whether CPC components have localised elsewhere in the cell, or if their abundance changes in response to depletion of another CPC protein.

      We chose to show close-ups of the nucleus to highlight the different localization patterns of CPC proteins under the different RNAi conditions. In none of these conditions did we observe mis-localization of CPC subunits to the cytoplasm. To clarify this point, we added the following sentence in the legend for Figure 2A:

      ‘A) Representative fluorescence micrographs showing the localization of YFP-tagged Aurora BAUK1, INCENPCPC1, KIN-A and KIN-B in 2K1N cells upon RNAi-mediated knockdown of indicated CPC subunits. Note that nuclear close-ups are shown here. CPC proteins were not detected in the cytoplasm. RNAi was induced with 1 μg/mL doxycycline for 24 h (KIN-B RNAi) or 16 h (all others). Cell lines: BAP3092, BAP2552, BAP2557, BAP3093, BAP2906, BAP2900, BAP2904, BAP3094, BAP2899, BAP2893, BAP2897, BAP3095, BAP3096, BAP2560, BAP2564, BAP3097. Scale bars, 2 μm.’

      Ballmer and Akiyoshi then go on to determine the kinetochore localisation domains of KIN-A and KIN-B. Using ectopically expressed GFP-tagged truncations, they show that coiled-coil domains within KIN-A and KIN-B, as well as a disordered C-terminal tail present only in KIN-A, but not the N-terminal motor domains of KIN-A or KIN-B, are required for kinetochore localisation. These data are strengthened by immunoprecipitating CPC complexes and crosslinking them prior to mass spectrometry analysis (IP-CLMS), a state-of-the-art approach, to determine the contacts between the CPC components. Structural predictions of the CPC structure are also made using AlphaFold2, suggesting that coiled coils form between KIN-A and KIN-B, and that KIN-A/B interact with the N termini of CPC1 and CPC2. Experimental results show that CPC1 and CPC2 are unable to localise to kinetochores if they lack their N-terminal domains consistent with these predictions. Altogether these data provide convincing evidence of the protein domains required for CPC kinetochore localisation and CPC protein interactions. However, the authors also conclude that KIN-B plays a minor role in localising the CPC to kinetochores compared to KIN-A. This conclusion is not particularly compelling as it stems from the observation that ectopically expressed GFP-NLS-KIN-A (full length or coiled-coil domain + tail) is also present at kinetochores during anaphase unlike endogenously expressed YFP-KIN-A. Not only is this localisation probably an artifact of the ectopic expression, but the KIN-B coiled-coil domain localises to kinetochores from S to metaphase and Fig S2G appears to show a portion of the expressed KIN-B coiled-coil domain colocalising with KKT2 at anaphase. It is unclear why KIN-B has been discounted here.

      As the reviewer points out, a small fraction of GFP-NLS-KIN-B317-624 is indeed detectable at kinetochores in anaphase, although most of the protein shows diffuse nuclear staining. There are various explanations for this phenomenon: It is conceivable that the KIN-B motor domain may contribute to microtubule binding and translocation of the CPC from kinetochores onto the spindle in anaphase. In our experiments, ectopically expressed KIN-B317-624 likely outcompetes a fraction of endogenous KIN-B for binding to KIN-A, which could interfere with this translocation process, leaving a population of CPC ‘stranded’ at kinetochores in anaphase. Another possibility, hinted at by the reviewer, is that the C-terminus of KIN-B interacts with receptors at the kinetochore/centromere. Although we do not discount this possibility, we nevertheless decided to focus on KIN-A in this study, because the anaphase kinetochore retention phenotype for both full-length GFP-NLS-KIN-A and -KIN-A309-862 is much stronger than for KIN-B317-624. Two additional reasons were that (i) KIN-A is highly conserved within kinetoplastids, whereas KIN-B orthologs are missing in some kinetoplastids, and (ii) no convincing interactions between KIN-B and kinetochore proteins were predicted by AF2.

      To address the reviewer’s point, we decided to include KIN-B in the title of this manuscript, which now reads: ‘Dynamic localization of the chromosomal passenger complex is controlled by the orphan kinesins KIN-A and KIN-B in the kinetoplastid parasite Trypanosoma brucei’.

      Moreover, we modified the corresponding paragraph in the results section as follows:

      ‘Intriguingly, unlike endogenously YFP-tagged KIN-A, ectopically expressed GFP fusions of both full-length KIN-A and KIN-A310-862 clearly localized at kinetochores even in anaphase (Figs. 2, F and H). Weak anaphase kinetochore signal was also detectable for KIN-B317-624 (Fig. S2F). GFP fusions of the central coiled-coil domain or the C-terminal disordered tail of KIN-A did not localize to kinetochores (data not shown). These results show that kinetochore localization of the CPC is mediated by KIN-A and KIN-B and requires both the central coiled-coil domain as well as the C-terminal disordered tail of KIN-A.’

      Next, using a mixture of RNAi depletion and LacI-LacO recruitment experiments, the authors show that kinetochore proteins KKT7 and KKT9 are required for AUK1 to localise to kinetochores (other KKT8 complex components were not tested here) and that all components of the KKT8 complex are required for KIN-A kinetochore localisation. Further, both KKT7 and KKT8 were able to recruit AUK1 to an ectopic locus in the S phase, and KKT7 recruited KKT8 complex proteins, which the authors suggest indicates it is upstream of KKT8. However, while these experiments have been performed well, the reciprocal experiment to show that KKT8 complex proteins cannot recruit KKT7, which could have confirmed this hierarchy, does not appear to have been performed. Further, since the LacI fusion proteins used in these experiments were ectopically expressed, they were retained (artificially) at kinetochores into anaphase; KKT8 and KIN-A were both able to recruit AUK1 to LacO foci in anaphase, while KKT7 was not. The authors conclude that this suggests the KKT8 complex is the main kinetochore receptor of the CPC - while very plausible, this conclusion is based on a likely artifact of ectopic expression, and for that reason, should be interpreted with a degree of caution.

      We previously showed that RNAi-mediated depletion of KKT7 disrupts kinetochore localization of KKT8 complex members, whereas kinetochore localization of KKT7 is unaffected by disruption of the KKT8 complex (Ishii and Akiyoshi, 2020). Moreover, in contrast to the KKT8 complex, KKT7 remains at kinetochores in anaphase (Akiyoshi and Gull, 2014). These data show that KKT7 is upstream of the KKT8 complex. In this context, the LacI-LacO tethering approach can be very useful to probe whether two proteins (or domains of proteins) could interact in vivo either directly or indirectly. However, a recruitment hierarchy cannot be inferred from such experiments because the data just shows whether X can recruit Y to an ectopic locus (but not whether X is upstream of Y or vice versa). Regarding the retention of Aurora BAUK1 at kinetochores in anaphase upon ectopic expression of GFP-KKT8-LacI, we agree with the reviewer that these data need to be carefully interpreted. Nevertheless, the notion that the KKT7-KKT8 complex recruits the CPC to kinetochores is also strongly supported by IP-MS, RNAi experiments, and AF2 predictions. For clarification and to address the reviewer’s point, we re-formulated the corresponding paragraph in the main text:

      ‘We previously showed that KKT7 lies upstream of the KKT8 complex (Ishii and Akiyoshi, 2020). Indeed, GFP-KKT72-261-LacI recruited tdTomato-KKT8, -KKT9 and -KKT12 (Fig. S4E). Expression of both GFP-KKT72-261-LacI and GFP-KKT8-LacI resulted in robust recruitment of tdTomato-Aurora BAUK1 to LacO foci in S phase (Figs. 4, E and F). Intriguingly, we also noticed that, unlike endogenous KKT8 (which is not present in anaphase), ectopically expressed GFP-KKT8-LacI remained at kinetochores during anaphase (Fig. 4F). This resulted in a fraction of tdTomato-Aurora BAUK1 being trapped at kinetochores during anaphase instead of migrating to the central spindle (Fig. 4F). We observed a comparable situation upon ectopic expression of GFP-KIN-A, which is retained on anaphase kinetochores together with tdTomato-KKT8 (Fig. S4F). In contrast, Aurora BAUK1 was not recruited to LacO foci marked by GFP-KKT72-261-LacI in anaphase (Fig. 4E).’

      Further IP-CLMS experiments, in combination with recombinant protein pull-down assays and structural predictions, suggested that within the KKT8 complex, there are two subcomplexes of KKT8:KKT12 and KKT9:KKT11, and that KKT7 interacts with KKT9:KKT11 to recruit the remainder of the KKT8 complex. The authors also assess the interdependencies between KKT8 complex components for localisation and expression, showing that all four subunits are required for the assembly of a stable KKT8 complex and present AlphaFold2 structural modelling data to support the two subcomplex models. In general, these data are of high quality and convincing with a few exceptions. The recombinant pulldown assay (Fig. 4H) is not particularly convincing as the 3rd eluate gel appears to show a band at the size of KKT11 (despite the labelling indicating no KKT11 was present in the input) but no pulldown of KKT9, which was present in the input according to the figure legend (although this may be mislabeled since not consistent with the text). The text also states that 6HIS-KKT8 was insoluble in the absence of KKT12, but this is not possible to assess from the data presented.

      We thank the reviewer for pointing out an error in the text: ‘Removal of both KKT9 and KKT11 did not impact formation of the KKT8:KKT12 subcomplex’ should read ‘Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex’. Regarding the very faint band perceived to be KKT11 in the 3rd eluate: This band runs slightly lower than KKT11 and likely represents a bacterial contaminant (which we have seen also in other preps in the past). We have made a note of this in the corresponding legend (new Fig. 4I). Moreover, we provide the estimated molecular weights for each subunit, as suggested by the reviewer in recommendation #14 (see below):

      ‘(I) Indicated combinations of 6HIS-tagged KKT8 (~46 kDa), KKT9 (~39 kDa), KKT11 (~29 kDa) and KKT12 (~23 kDa) were co-expressed in E. coli, followed by metal affinity chromatography and SDS-PAGE. The asterisk indicates a common contaminant.’

      The corresponding paragraph in the results section now reads:

      ‘To validate these findings, we co-expressed combinations of 6HIS-KKT8, KKT9, KKT11 and KKT12 in E. coli and performed metal affinity chromatography (Fig. 4I). 6HIS-KKT8 efficiently pulled down KKT9, KKT11 and KKT12, as shown previously (Ishii and Akiyoshi, 2020). In the absence of KKT9, 6HIS-KKT8 still pulled down KKT11 and KKT12. Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex. In contrast, 6HIS-KKT8 could not be recovered without KKT12, indicating that KKT12 is required for formation of the full KKT8 complex. These results support the idea that the KKT8 complex consists of KKT8:KKT12 and KKT9:KKT11 subcomplexes.’

      It is also surprising that data showing the effects of KKT8, KKT9, and KKT12 depletion on KKT11 localisation and abundance are not presented alongside the reciprocal experiments in Fig S4G-J.

      YFP-KKT11 is delocalized upon depletion of KKT8 and KKT9 (see below). Unfortunately, we were unsuccessful in our attempts at deriving the corresponding KKT12 RNAi cell line, rendering this set of data incomplete. Because these data are not of critical importance for this study, we decided not to invest more time in attempting further transfections.

      Author response image 1.

      The authors also convincingly show that AlphaFold2 predictions of interactions between KKT9:KKT11 and a conserved domain (CD1) in the C-terminal tail of KIN-A are likely correct, with CD1 and a second conserved domain, CD2, identified through sequence analysis, acting synergistically to promote KIN-A kinetochore localisation at metaphase, but not being required for KIN-A to move to the central spindle at anaphase. They then hypothesise that the kinesin motor domain of KIN-A (but not KIN-B which is predicted to be inactive based on non-conservation of residues key for activity) determines its central spindle localisation at anaphase through binding to microtubules. In support of this hypothesis, the authors show that KIN-A, but not KIN-B can bind microtubules in vitro and in vivo. However, ectopically expressed GFP-NLS fusions of full-length KIN-A or KIN-A motor domain did not localise to the central spindle at anaphase. The authors suggest this is due to the GFP fusion disrupting the ATPase activity of the motor domain, but they provide no evidence that this is the case. Instead, they replace endogenous KIN-A with a predicted ATPase-defective mutant (G209A), showing that while this still localises to kinetochores, the kinetochores were frequently misaligned at metaphase, and that it no longer concentrates at the central spindle (with concomitant mis-localisation of AUK1), causing cells to accumulate at anaphase. From these data, the authors conclude that KIN-A ATPase activity is required for chromosome congression to the metaphase plate and its central spindle localisation at anaphase. While potentially very interesting, these data are incomplete in the absence of any experimental data to show that KIN-A possesses ATPase activity or that this activity is abrogated by the G209A mutation, and the conclusions of this section are rather speculative.

      Thank you for this important comment, which relates to a similar point raised by Reviewer 1 (see above). Indeed, ATPase and motor activity of KIN-A remain to be demonstrated biochemically using recombinant proteins, which is beyond the scope of this study. We generated MSAs of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9, which are now presented in Figure 6A and S6A. These clearly show that key motifs required for ATP or tubulin binding in other kinesins are highly conserved in KIN-A (but not KIN-B). This includes the conserved glycine residue in the Switch II helix (G234 in human Kinesin-1, G210 in T. brucei KIN-A), which forms a hydrogen bond with the γ-phosphate of ATP, and upon mutation has been shown to impair ATPase activity and trap the motor head in a strong microtubule-binding (‘rigor’) state (Rice et al., 1999; Sablin et al., 1996). The prominent rigor phenotype of KIN-AG210A is consistent with KIN-A having ATPase activity. In addition to the data in Fig. 6A and S6A, we made the following changes to the main text:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).

      Ectopically expressed GFP-KIN-A and -KIN-A2-309 partially localized to the mitotic spindle but failed to concentrate at the midzone during anaphase (Figs. 2, F and G), suggesting that N-terminal tagging of the KIN-A motor domain may interfere with its function. To address whether the ATPase activity of KIN-A is required for central spindle localization of the CPC, we replaced one allele of KIN-A with a C-terminally YFP-tagged G210A ATP hydrolysis-defective rigor mutant (Fig. 6A) (Rice et al., 1999) and used an RNAi construct directed against the 3’UTR of KIN-A to deplete the untagged allele. The rigor mutation did not affect recruitment of KIN-A to kinetochores (Figs. S6, C and D). However, KIN-AG210A-YFP marked kinetochores were misaligned in ~50% of cells arrested in metaphase, suggesting that ATPase activity of KIN-A promotes chromosome congression to the metaphase plate (Figs. S6, E-H).’

      Impact:

      Overall, this work uses a wide range of cutting-edge molecular and structural predictive tools to provide a significant amount of new and detailed molecular data that shed light on the composition of the unusual trypanosome CPC and how it is assembled and targeted to different cellular locations during cell division. Given the fundamental nature of this research, it will be of interest to many parasitology researchers as well as cell biologists more generally, especially those working on aspects of mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the reviewer for his/her feedback and thoughtful and thorough assessment of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Why did the authors omit KIN-B from the title?

      We decided to add KIN-B in the title. Please see our response to Reviewer #3 (public review).

      (2) Abstract, line 28, "Furthermore, the kinesin motor activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset." This must be revised - see public review.

      We changed this section of the abstract as follows:

      ‘Furthermore, the ATPase activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset. Thus, KIN-A constitutes a unique ‘two-in-one’ CPC localization module in complex with KIN-B, which directs the CPC to kinetochores (from S phase until metaphase) via its C-terminal tail, and to the central spindle (in anaphase) via its N-terminal kinesin motor domain.’

      (3) Line 87-90. The findings by Li et al., 2008 (KIN-A and KIN-B interacting with Aurora B and epistasis analysis) should be introduced more comprehensively in the Introduction section.

      We added the following sentence in the introduction:

      ‘In addition, two orphan kinesins, KIN-A and KIN-B, have been proposed to transiently associate with Aurora BAUK1 during mitosis (Li et al., 2008; Li, 2012).’

      (4) Figure 1B. The way the Trypanosoma cell cycle is defined should be briefly explained in the main text, rather than just referring to the figure.

      The ‘KN’ annotation of the trypanosome cell cycle is explained in the Figure 1 legend. We have now also added a brief description in the main text:

      ‘We next assessed the localization dynamics of fluorescently tagged KIN-A and KIN-B over the course of the cell cycle (Figs. 1, B-E). T. brucei possesses two DNA-containing organelles, the nucleus (‘N’) and the kinetoplast (‘K’). The kinetoplast is an organelle found uniquely in kinetoplastids, which contains the mitochondrial DNA and replicates and segregates prior to nuclear division. The ‘KN’ configuration serves as a good cell cycle marker (Woodward and Gull, 1990; Siegel et al., 2008).’

      (5) Line 118. Throughout the paper, it is not clear why GFP-NLS fusion was used instead of GFP fusion. Please justify the fusion of NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. We have since consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (6) Line 121, "Unexpectedly". It is not clear why this was unexpected.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with AuroraAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      (7) Line 127-129. Defining homologs and orthologs is tricky - there are many homologs and paralogs of kinesin-like proteins. The method to define the presence or absence of KIN-A/KIN-B homologs should be described in the Materials and Methods section.

      Due to the difficulty in defining true orthologs for kinesin-like proteins, we took a conservative approach: reciprocal best BLAST hits. We first searched for KIN-A homologs using BLAST in the TriTryp database or using hmmsearch with manually prepared hmm profiles. When the top hit in a given organism retrieved T. brucei KIN-A in a reciprocal BLAST search against the T. brucei proteome, we considered the hit a true ortholog. We modified the Materials and Methods section as below.

      ‘Searches for homologous proteins were done using BLAST in the TriTryp database (Aslett et al., 2010) or using hmmsearch using manually prepared hmm profiles (HMMER version 3.0; Eddy, 1998). The top hit was considered as a true ortholog only if the reciprocal BLAST search returned the query protein in T. brucei.’
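
      The reciprocal best hit criterion described above can be sketched as follows. This is a minimal illustration and not the script used in the study: the protein identifiers, bit scores, and function names are hypothetical, and the hits are assumed to have already been parsed into (query, subject, bit score) tuples from the forward search (T. brucei query against another proteome) and the reverse search (that proteome's hits against T. brucei).

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> Dict[str, str]:
    """Keep a forward top hit only if its own top hit is the original query."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical identifiers and scores, for illustration only.
forward = [
    ("Tb_KIN-A", "X_001", 410.0),
    ("Tb_KIN-A", "X_002", 150.0),
    ("Tb_KIN-B", "X_002", 90.0),
]
reverse = [
    ("X_001", "Tb_KIN-A", 405.0),
    ("X_001", "Tb_kinesinZ", 200.0),
    ("X_002", "Tb_kinesinZ", 160.0),
    ("X_002", "Tb_KIN-B", 85.0),
]

orthologs = reciprocal_best_hits(forward, reverse)
print(orthologs)
```

      In this toy input, X_002 is the top forward hit for Tb_KIN-B, but its own best match in T. brucei is a different kinesin, so it is not called an ortholog. This is the conservative behaviour the criterion is meant to provide for large paralogous families such as kinesins.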

      (8) Line 156. For non-experts of Trypanosoma cell biology, it is not clear how the nucleolar localization is defined.

      The nucleolus in T. brucei is discernible as a DAPI-dim region in the nucleus.

      (9) Fig.2G and Fig.S2F. These data imply that the coiled-coil and C-terminal tail domains of KIN-A/KIN-B are important for anaphase spindle midzone enrichment. However, it is odd that this was not mentioned. This reviewer recommends that the authors quantify the midzone localization data of these constructs and discuss the role of the coiled-coil domains.

      One possibility is that KIN-A and KIN-B need to form a complex (via their coiled-coil domains) to localize to the spindle midzone. Another likely possibility, which is discussed in the manuscript, is that N-terminal tagging of KIN-A impairs motor activity. This is supported by the fact that the central spindle localization is also disrupted in full-length GFP-KIN-A. We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      (10) Line 288-289, "pLDDT scores improved significantly for KIN-A CD1 in complex with KKT9:KKT11 (>80) compared to KIN-A CD1 alone (~20) (Figs. S3, A and B)." I can see that pLDDT score is about 20 at KIN-A CD1 from Figs S3A, but the basis of pLDDT > 80 upon inclusion of KKT9:KKT11 is missing.

      We added the pLDDT and PAE plots for the AF2 prediction of KIN-A700-800 in complex with KKT9:KKT11 in Fig. S5B.

      (11) Fig. 5A. Since there is no supporting biochemical data for KIN-A-KKT9-KKT11 interaction, it is important to assess the stability of AlphaFold-based structural predictions of the KIN-A-KKT9-KKT11 interaction. Are there significant differences among the top 5 prediction results, and do these interactions remain stable after the "simulated annealing" process used in the AlphaFold predictions? Are predicted CD1-interacting regions/amino residues in KKT9 and KKT11 evolutionarily conserved?

      See above. The interaction was predicted in all 5 predictions as shown in Fig. S5B. Conservation of the CD1-interacting regions in KKT9 and KKT11 is shown below:

      Author response image 2.

      KKT9 (residues ~53 – 80 predicted to interact with KIN-A in T. brucei)

      Author response image 3.

      KKT11 (residues 61-85 predicted to interact with KIN-A in T. brucei)

      (12) Line 300, Fig. S5D and E, "failed to localize at kinetochores". From this resolution of the microscopy images, it is not clear if these proteins fail to localize at kinetochores as the KKT and KIN-A310-716 signals overlap. Perhaps, "failed to enrich at kinetochores" is a more appropriate statement.

      We changed this sentence according to the reviewer’s suggestion.

      (13) Line 309 and Fig 5D and F, "predominantly localized to the mitotic spindle". From this image shown in Fig 5D, it is not clear if KIN-A∆CD1-YFP and Aurora B are predominantly localized to the spindle or if they are still localized to centromeres that are misaligned on the spindle. Without microtubule staining, it is also not clear how microtubules are distributed in these cells. Please clarify how the presence or absence of kinetochore/spindle localization was defined.

      As shown in Fig. S5E and S5F, deletion of CD1 clearly impairs kinetochore localization of KIN-A (kinetochores marked by tdTomato-KKT2). Moreover, misalignment of kinetochores, as observed upon expression of the KIN-AG210A rigor mutant, would result in an increase in 2K1N cells and proliferation defects, which is not the case for the KIN-A∆CD1 mutant (Fig. 5H, Fig. S5I). KIN-A∆CD1-YFP appears to localize diffusely along the entire length of the mitotic spindle, whereas we still observe kinetochore-like foci in the rigor mutant. Unfortunately, we do not have suitable antibodies that would allow us to distinguish spindle microtubules from the vast subpellicular microtubule array present in T. brucei and hence need to rely on tagging spindle-associated proteins such as MAP103.

      (14) Fig. 5F, G, S5F. Along the same lines, it would be helpful to show example images for each category - "kinetochores", "kinetochores + spindle", and "spindle".

      As suggested by the reviewer, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      (15) Line 332 and Fig. S6A. The experiment may be repeated in the presence of ATP or nonhydrolyzable ATP analogs.

      We thank the reviewer for the suggestion. We envisage such experiments for an in-depth follow-up study.

      (16) Line 342, "motor activity of KIN-A". Until KIN-A is shown to have motor activity, the result based on the rigor mutant does not show that the motor activity of KIN-A promotes chromosome congression. The result suggests that the ATPase activity of KIN-A is important.

      We changed that sentence as suggested by the reviewer.

      (17) Line 419 -. The authors base their discussion on the speculation that KIN-A is a plus-end directed motor. Please justify this speculation.

      Indeed, the notion that KIN-A is a plus-end directed motor remains a hypothesis, which is based on sequence alignments with other plus-end directed motors and the observation that the KIN-A motor domain is involved in translocation of the CPC to the central spindle in anaphase. We have modified the corresponding section in the discussion as follows:

      ‘It remains to be investigated whether KIN-A truly functions as a plus-end directed motor. The role of the KIN-B in this context is equally unclear. Since KIN-B does not possess a functional kinesin motor domain, we deem it unlikely that the KIN-A:KIN-B heterodimer moves hand-over-hand along microtubules as do conventional (kinesin-1 family) kinesins. Rather, the KIN-A motor domain may function as a single-headed unit and drive processive plus-end directed motion using a mechanism similar to the kinesin-3 family kinesin KIF1A (Okada and Hirokawa, 1999).’

      (18) Line 422-423, "plus-end directed motion using a mechanism similar to kinesin-3 family kinesins (such as KIF1A)." Please cite a reference supporting this statement.

      See above. We cited Okada and Hirokawa (1999).

      Reviewer #2 (Recommendations For The Authors):

      Please provide a quantification of data shown in Figure 2F-H and described in lines 151-166.

      We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      It appears as if the paper more or less follows a chronological order of the experiments that were performed before AF multimer enabled the insightful and compelling structural analysis. That is a matter of style, but in some cases, the writing could be updated, shortened, or re-arranged into a more logical order. Concrete examples:

      (i) Line 144: "we did not include CPC2 for further analysis in this study" Although CPC2 features at a prominent and interesting position in the predicted structures of the kinetoplastid CPC, shown in later main figures.

      We attempted RNAi-mediated depletion of CPC2 using two different shRNA constructs. However, we cannot exclude the possibility that the knockdown of CPC2 was less efficient compared with the other CPC subunits. For this reason, we decided to remove all the data on CPC2 from Fig. S2.

      (ii) The work with the KIN-A motor domain only and KIN-A ∆motor domain (Fig 2) begs the question about a more subtle mutation to interfere with the motor domain. Which is ultimately presented in Fig 6. I think that the final paragraph and Figure 6 follow naturally after Figure 2.

      We appreciate the suggestion. However, we would like to keep Figure 6 there.

      (iii) The high-confidence structural predictions in Fig 3 and Fig 4 are insightful. The XL-MS descriptions that precede them are not so helpful (Fig 3A and 4G and in the text). To emphasize their status as experimental support for the predicted structures, which is very important, it would be good to discuss the XL-MS after presenting the models.

      As suggested, we have re-arranged the text and/or figures such that the AF2 predictions are discussed first and the CLMS data are brought in afterwards.

      Figure 1A prominently features an arbitrary color code and a lot of protein IDs without a legend. That is not a very convincing start. Figure S1 is more informative, containing annotated protein names and results of the KIN-A and KIN-B IPs. Please improve Figure 1A, for example by presenting a modified version of Figure S1. In all these types of figures, please list both protein names and gene IDs.

      We agree with the reviewer that the IP-MS data in Fig. S1 is more informative and hence decided to swap the heatmaps in Fig. 1A and Fig. S1A. We further annotated the heatmap corresponding to the Aurora BAUK1 IP-MS (now presented in Fig. S1) as suggested by the reviewer.

      The visualization of the structural predictions is not consistent among figures:

      (i) The structure in Fig 4I is important and could be displayed larger. The pLDDT scores, and especially those of the non-displayed models, do not add much information and should not be a main panel. If the authors want to display the pLDDT scores, I recommend a panel (main or supplement) of the structure colored for local prediction confidences, as in Fig 5A.

      (ii) In Figure 5A itself, it is hard to follow the chains in general, and KIN-A in particular, since the structure is pLDDT-coloured. Please present an additional panel colored by chain (consistent with Fig 4I, as mentioned above).

      (iii) The summarizing diagram, currently displayed as Fig 4J, should be placed after Fig 5A and take the discovered KIN-A - KKT9-11 connection into account. Ideally, it also covers the suspected importance of the motor domain and serves as a summarising diagram.

      We thank the reviewer for the constructive comments. For each structure prediction, we now present two images side by side: one coloured by chain and one coloured by pLDDT. We recently re-ran AF2 for the full CPC and also for the KKT7N-KKT8 complex, and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction. We also increased the size of the structures shown in Fig. 4. Furthermore, we decided to remove the summarizing diagram from Fig. 4 and instead made a new main Fig. 7, which shows a more detailed schematic, which also takes into account the proposed function of the KIN-A motor domain, as suggested by the reviewer, and other points addressed in the Discussion.
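
      As an illustration of the cross-link colouring rule mentioned above, the sketch below classifies each cross-link by whether the two linked residues lie within 30 angstroms of each other in a predicted model. It rests on assumptions not stated in the response: distances are taken between Cα atoms, coordinates are supplied as a dictionary keyed by (chain, residue number), and all coordinates, chain labels, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Residue = Tuple[str, int]             # (chain_id, residue_number)
Coord = Tuple[float, float, float]    # x, y, z in angstroms

MAX_DISTANCE = 30.0  # distance constraint quoted in the text (angstroms)


def crosslink_distance(ca: Dict[Residue, Coord], a: Residue, b: Residue) -> float:
    """Euclidean distance between two residues (assumed: C-alpha atoms)."""
    return math.dist(ca[a], ca[b])


def classify_crosslinks(
    ca: Dict[Residue, Coord],
    links: List[Tuple[Residue, Residue]],
    max_distance: float = MAX_DISTANCE,
) -> Dict[Tuple[Residue, Residue], bool]:
    """Return True for each cross-link that satisfies the distance constraint."""
    return {
        (a, b): crosslink_distance(ca, a, b) <= max_distance
        for a, b in links
    }


# Hypothetical coordinates, for illustration only.
ca_coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 25): (10.0, 20.0, 5.0),
    ("B", 90): (40.0, 30.0, 0.0),
}
links = [(("A", 10), ("B", 25)), (("A", 10), ("B", 90))]

result = classify_crosslinks(ca_coords, links)
for link, ok in result.items():
    print(link, "satisfied" if ok else "violated")
```

      In practice the coordinates would be read from the predicted structure file, and the boolean could drive the colour of each link in the plot.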

      The methods section for the structural predictions lacks essential information. Predictions can only be reproduced if the version of AF2 multimer v2.x is specified and key parameters are mentioned.

      As suggested, we have added the details in the Materials and Methods section as follows.

      ‘Structural predictions of KIN-A/KIN-B, KIN-A310-862/KIN-B317-624, CPC1/CPC2/KIN-A300-599/KIN-B 317-624, and KIN-A700-800/KKT9/KKT11 were performed using ColabFold version 1.3.0 (AlphaFold-Multimer version 2), while those of AUK1/CPC1/CPC2/KIN-A1-599/KIN-B, KKT71-261/KKT9/KKT11/KKT8/KKT12, KKT9/KKT11/KKT8/KKT12, and KKT71-261/KKT9/KKT11 were performed using ColabFold version 1.5.3 (AlphaFold-Multimer version 2.3.1) using default settings, accessed via https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.3/AlphaFold2.ipynb.’

      Line 121, please explain the "Unexpectedly" by including a reference to the work from Li and colleagues. A statement with some details would be useful, as the difference between both studies appears to be crucial for the novelty of this paper. Alternatively, refer to this being covered in the discussion.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with AuroraAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      Line 285 refers to "conserved" regions in the C-terminal part of KIN-A, referring to Figure 5. Please expand the MSA in Figure 5B to get an idea about the conservation/variation outside CD1 and CD2.

      We now present the full MSA for KIN-A proteins in kinetoplastids in Fig. S5A.

      Please specify what is meant by Line 367-369 for someone who is not familiar with the work by Komaki et al. 2022. Either clarify in the text or clarify in the text with data to support it.

      We updated the corresponding section in the discussion as follows:

      ‘Komaki et al. recently identified two functionally redundant CPC proteins in Arabidopsis, Borealin Related Interactor 1 and 2 (BORI1 and 2), which engage in a triple helix bundle with INCENP and Borealin using a conserved helical domain but employ an FHA domain instead of a BIR domain to read H3T3ph (Komaki et al., 2022).’

      Data presented in Figure S6A, the microtubule co-sedimentation assay, is not convincing since a substantial amount of KIN-A/B is pelleted in the absence of microtubules. Did the authors spin the proteins in BRB80 before the assay to continue with soluble material and reduce sedimentation in the absence of microtubules? If the authors want to keep the wording in lines 331-332, the MT-binding properties of KIN-A and KIN-B need to be investigated in more detail, for example with a titration and a quantification thereof. Otherwise, they should change the text and replace "confirms" with "is consistent with". In any case, the legend needs to be expanded to include more information.

      To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      We have also updated the main text in the results section:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      Details:

      The readability of the pAE plots could be improved by arranging sequences according to their position in the structure. For example in Fig4I, KKT8 could precede KKT12. If it is easy to update this, the authors might want to do so.

      We re-ran the AF2 predictions for the KKT7N – KKT8 complex in Fig. 4/S4 and changed the order according to the reviewer’s suggestion (KKT9:KKT11:KKT8:KKT12).

      The same paper is referred to as Je Van Hooff et al. 2017 and as Van Hooff et al. 2017

      Thank you for pointing this out. We have corrected the citation.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please state at the end of the introduction/start of the results section that this work was performed in procyclic trypanosomes. Given that the cell cycles of procyclic and bloodstream forms differ, this is important.

      We added this information at the end of the introduction:

      ‘Here, by combining biochemical, structural and cell biological approaches in procyclic form T. brucei, we show that the trypanosome CPC is a pentameric complex comprising Aurora BAUK1, INCENPCPC1, CPC2 and the two orphan kinesins KIN-A and KIN-B.’

      (2) Please define NLS at first use (line 118), and for clarity, explain the rationale for using GFP with an NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. We have since consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (3) Lines 148-150 - it would strengthen this claim if KIN-A/B protein levels were assessed by Western blot.

      We now present a Western blot in Fig. S2C, showing that bulk KIN-B levels are clearly reduced upon KIN-A RNAi. The same is also true, to some extent, for KIN-A levels upon KIN-B RNAi, although this is less obvious, possibly due to the lower efficiency of KIN-B RNAi compared to KIN-A RNAi as judged by fluorescence microscopy (quantified in Fig. 2D and 2E).

      (4) Line 253 - the text mentions the removal of both KKT9 and KKT11, which is not consistent with the figure (Fig 4H) - do you mean the removal of either KKT9 or KKT11?

      Yes, we thank the reviewer for pointing out this mistake in the text, which has now been corrected.

      (5) Line 337 - please include a reference for the G209A ATPase-defective rigor mutant - has this been shown to result in KIN-A being inactive previously?

      Please see above our answer in public review.

      (6) It is not always obvious when fluorescent fusion proteins are being expressed endogenously or ectopically, or when they are being expressed in an RNAi background or not without tracing the cell lines in Table S1 - please ensure this is clearly stated throughout the manuscript.

      We have now made sure that this is clearly stated in the main text as well as in the figure legends.

      (7) Line 410 - 'KIN-A C-terminal tail is stuffed full of conserved CDK1CRK3 sites' - what does 'stuffed full' really mean (this is rather imprecise) and what are the consensus sites - are these CDK1 consensus sites that are assumed to be conserved for CRK3? I'm not aware of consensus sites for CRK3 having been determined, but if they have, this should be referenced.

      We have modified the corresponding section in the discussion as follows:

      ‘In support of this, the KIN-A C-terminal tail harbours many putative CRK3 sites (10 sites matching the minimal S/T-P consensus motif for CDKs) and is also heavily phosphorylated by Aurora BAUK1 in vitro (Ballmer et al. 2024). Finally, we speculate that the interaction of KIN-A motor domain with microtubules, coupled to the force generating ATP hydrolysis and possibly plus-end directed motion, eventually outcompetes the weakened interactions of the CPC with the kinetochore and facilitates the extraction of the CPC from chromosomes onto spindle microtubules during anaphase. Indeed, deletion of the KIN-A motor domain or impairment of its motor function through N-terminal GFP tagging causes the CPC to be trapped at kinetochores in anaphase. Central spindle localization is additionally dependent on the ATPase activity of the KIN-A motor domain as illustrated by the KIN-A rigor mutant.’
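      The "minimal S/T-P consensus motif" count mentioned above can be illustrated with a short sketch. This is not the authors' analysis code: the function name and the toy sequence are hypothetical, and the sequence is not the KIN-A tail. It simply counts serine or threonine residues that are immediately followed by proline.

```python
import re

def count_minimal_cdk_sites(sequence: str) -> int:
    """Count S/T-P motifs: a serine or threonine directly followed by proline."""
    # Two-residue matches cannot overlap, so a plain findall is sufficient.
    return len(re.findall(r"[ST]P", sequence.upper()))

def minimal_cdk_site_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the phospho-acceptor S/T in each S/T-P motif."""
    return [m.start() + 1 for m in re.finditer(r"[ST]P", sequence.upper())]

# Hypothetical toy sequence for illustration only (NOT the KIN-A C-terminal tail).
toy_sequence = "MSPKRTPAASPLLTPQ"
print(count_minimal_cdk_sites(toy_sequence))
print(minimal_cdk_site_positions(toy_sequence))
```

      A motif match of this kind only identifies putative sites; it says nothing about whether a given site is actually phosphorylated by CRK3.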

      (8) Lines 412-416: this proposal is written rather definitively - given no motor activity has been demonstrated for KIN-A, please make clear that this is still just a theory.

      See above.

      (9) Fig 1: KKT2 is not highlighted in Fig 1A - given this has been used for colocalization in Fig 1C-E, was it recovered, and if not, why not? Fig 1B-E: the S phase/1K1N terminology is somewhat misleading. Not all S phase cells will have elongated kinetoplasts - usually an asterisk is used to signify replicated DNA, not kinetoplast shape. If it is to be used here for elongation, then for consistency, N should be used for G2/mitotic cells.

      Fig. 1A (now Fig. S1A) only shows the top 30 hits. KKT2 was indeed recovered with Aurora BAUK1 (see Table S2) and is often used as a kinetochore marker in trypanosomes by our lab and others since the signal of fluorescently tagged KKT2 is relatively bright and KKT2 localizes to centromeres throughout the cell cycle.

      (10) A general comment for all image figures is that these do not have accompanying brightfield images and it is therefore difficult to know where the cell body is, or sometimes which nuclei and kinetoplasts belong to which cell where DNA from more than one cell is within the image. It would be beneficial if brightfield images could be added, or alternatively, the cell outlines were traced onto DAPI or merged images. Also, brightfield images would allow the stage of cytokinesis (pre-furrowing/furrowing/abscission) in anaphase cells to be determined.

      Since this study primarily addresses the recruitment mechanism of the CPC to kinetochores and to the central spindle from S phase to metaphase and in anaphase, respectively, and CPC proteins are not observed outside of the nucleus during these cell cycle stages, we did not present brightfield images in the figures. However, this point is particularly valid for discerning the localization of KIN-A and KIN-B to the new FAZ tip from late anaphase onwards. Hence, we acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      (11) Fig 2A: legend should state that the micrographs show the localisation of the proteins within the nucleus as whole cells are not shown. 2C: can INCENP not be split into 2 lines - the 'IN' looks like 1N at first glance, which is confusing.

      We have applied the suggested change in Fig. 2.

      (12) Fig 3 (and other AF2 figures): Could the lines for satisfied & not satisfied in the key be thicker so they more closely resemble the lines in the figure and are less likely to be confused with the disordered regions of the CPC components?

      We have now made those lines thicker.

      (13) Why were different E value thresholds used in Fig 3 and Fig 4?

      The CLMS data in Fig. 3 and Fig. 4 now both use the same E value threshold of E-3 (previously E-4 was used in Fig. 4). To determine a sensible significance threshold, we included some yeast protein sequences (‘false positives’) in the database used in pLink2 for identification of crosslinked peptides. Note that we recently also re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and got improved predictions. Hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 angstrom distance constraints were fulfilled or not in the AF2 prediction.

      (14) Fig 4H legend - please give the expected sizes of these recombinant proteins & check the 3rd elution panel (see public review comments).

      See above response in public review.

      (15) Fig 4I - please explain what the colours of the PAE plot and the values in the key signify, as well as how the Scored Residue values are arrived at. Please also define the pLDDT in the legend.

      We have cited DeepMind’s 2021 methods paper, in which the outputs of AlphaFold are explained in detail. We also added a short description of the pLDDT and PAE scores and the corresponding colour coding in the legends of Fig. 3 and Fig. 4, respectively.

      From figure 3 legend:

      ‘(B) Cartoon representation showing two orientations of the trypanosome CPC, coloured by protein on the left (Aurora BAUK1: crimson, INCENPCPC1: green, CPC2: cyan, KIN-A: magenta, and KIN-B: yellow) or according to their pLDDT values on the right, assembled from AlphaFold2 predictions shown in Figure S3. The pLDDT score is a per-residue estimate of the confidence in the AlphaFold prediction on a scale from 0 – 100. pLDDT > 70 (blue, cyan) indicates a reasonable accuracy of the model, while pLDDT < 50 (red) indicates a low accuracy and often reflects disordered regions of the protein (Jumper et al., 2021). BS3 crosslinks in (B) were mapped onto the model using PyXlinkViewer (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å) (Schiffrin et al., 2020).’

      From Figure 4 legend:

      ‘(G) AlphaFold2 model of the KKT7 – KKT8 complex, coloured by protein (KKT7 residues 1-261: green, KKT8: blue, KKT12: pink, KKT9: cyan and KKT11: orange) (left) and by pLDDT (center). BS3 crosslinks in (H) were mapped onto the model using PyXlinkViewer (Schiffrin et al., 2020) (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å). Right: Predicted Aligned Error (PAE) plot of model shown on the left (rank_2). The colour indicates AlphaFold’s expected position error (blue = low, red = high) at the residue on the x axis if the predicted and true structures were aligned on the residue on the y axis (Jumper et al., 2021).’
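      The colour coding of crosslinks described in both legends amounts to a simple distance test between the two Cα atoms. The sketch below illustrates that test only; it is not PyXlinkViewer's implementation. The function name and coordinates are hypothetical, and treating a distance of exactly 30 Å as satisfied is an assumption.

```python
import math

CUTOFF_ANGSTROM = 30.0  # Cα-Cα Euclidean distance threshold quoted in the legends

def classify_crosslink(ca_a, ca_b, cutoff=CUTOFF_ANGSTROM):
    """Return ('satisfied' | 'violated', distance) for one crosslinked residue pair.

    ca_a and ca_b are (x, y, z) Cα coordinates in angstroms taken from a model.
    Assumption: a distance equal to the cutoff counts as satisfied.
    """
    distance = math.dist(ca_a, ca_b)
    status = "satisfied" if distance <= cutoff else "violated"
    return status, distance

# Hypothetical coordinates for illustration only.
print(classify_crosslink((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))   # short link
print(classify_crosslink((0.0, 0.0, 0.0), (0.0, 0.0, 45.0)))  # long link
```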

      (16) Fig 6 legend - Line 730 should say (F) not (C).

      Thank you for pointing out this typo.

      (17) Fig S1A - a key is missing for the colours. Fig S1B/C - cell outlines or a brightfield image are really needed here - see earlier comment. Fig S1D - there doesn't seem to be a method for how this tree was generated.

      See above response in public review regarding Fig. S1A and S1B/C. The tree in Fig. S1D is based on Butenko et al. (2020).

      (18) Fig S2: A: how was protein knockdown validated (especially for CPC2 where there was little obvious phenotype)? Fig S2B: the y-axis should read proportion of cells, not percentage. Fig S2E - NLS should be labelled.

      Thank you for pointing out the mistake in the labelling.

      (19) Fig S3: PAE plots should be labelled with protein names, not A-E. Similarly, the pLDDT plots should be labelled as in Fig 4I.

      We have corrected the labelling in Fig. S3.

      (20) Fig S5A-D - cell cycle stage labels are missing from images.

      Thank you for pointing out the missing cell cycle stage labels.

      Addition by editor:

      In line 126 the statement that KIN-A and KIN-B "associate with Aurora-AUK1, INCENP-CPC1 and CPC2 throughout the cell cycle" seems too strong. There is no direct evidence for this. Please re-phrase as "likely associate" or "suggest... that ... may...".

      We have modified that sentence according to the editor’s suggestion.

      References:

      Akiyoshi, B., and K. Gull. 2014. Discovery of Unconventional Kinetochores in Kinetoplastids. Cell. 156. doi:10.1016/j.cell.2014.01.049.

      Butenko, A., F.R. Opperdoes, O. Flegontova, A. Horák, V. Hampl, P. Keeling, R.M.R. Gawryluk, D. Tikhonenkov, P. Flegontov, and J. Lukeš. 2020. Evolution of metabolic capabilities and molecular features of diplonemids, kinetoplastids, and euglenids. BMC Biol. 18:1–28. doi:10.1186/S12915-020-0754-1.

      Cormier, A., D.G. Drubin, and G. Barnes. 2013. Phosphorylation regulates kinase and microtubule binding activities of the budding yeast chromosomal passenger complex in vitro. J Biol Chem. 288:23203–23211. doi:10.1074/JBC.M113.491480.

      Endow, S.A., F.J. Kull, and H. Liu. 2010. Kinesins at a glance. J Cell Sci. 123:3420. doi:10.1242/JCS.064113.

      Fink, S., K. Turnbull, A. Desai, and C.S. Campbell. 2017. An engineered minimal chromosomal passenger complex reveals a role for INCENP/Sli15 spindle association in chromosome biorientation. J Cell Biol. 216:911–923. doi:10.1083/JCB.201609123.

      van der Horst, A., M.J.M. Vromans, K. Bouwman, M.S. van der Waal, M.A. Hadders, and S.M.A. Lens. 2015. Inter-domain Cooperation in INCENP Promotes Aurora B Relocation from Centromeres to Microtubules. Cell Rep. 12:380–387. doi:10.1016/J.CELREP.2015.06.038.

      Ishii, M., and B. Akiyoshi. 2020. Characterization of unconventional kinetochore kinases KKT10/19 in Trypanosoma brucei. J Cell Sci. doi:10.1242/jcs.240978.

      Jeyaprakash, A.A., C. Basquin, U. Jayachandran, and E. Conti. 2011. Structural Basis for the Recognition of Phosphorylated Histone H3 by the Survivin Subunit of the Chromosomal Passenger Complex. Structure. 19:1625–1634. doi:10.1016/J.STR.2011.09.002.

      Jeyaprakash, A.A., U.R. Klein, D. Lindner, J. Ebert, E.A. Nigg, and E. Conti. 2007. Structure of a Survivin–Borealin–INCENP Core Complex Reveals How Chromosomal Passengers Travel Together. Cell. 131. doi:10.1016/j.cell.2007.07.045.

      Jumper, J., R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. doi:10.1038/s41586-021-03819-2.

      Kang, J.S., I.M. Cheeseman, G. Kallstrom, S. Velmurugan, G. Barnes, and C.S.M. Chan. 2001. Functional cooperation of Dam1, Ipl1, and the inner centromere protein (INCENP)-related protein Sli15 during chromosome segregation. J Cell Biol. 155:763–774. doi:10.1083/JCB.200105029.

      Klein, U.R., E.A. Nigg, and U. Gruneberg. 2006. Centromere targeting of the chromosomal passenger complex requires a ternary subcomplex of Borealin, Survivin, and the N-terminal domain of INCENP. Mol Biol Cell. 17:2547–2558. doi:10.1091/MBC.E05-12-1133.

      Komaki, S., E.C. Tromer, G. De Jaeger, N. De Winne, M. Heese, and A. Schnittger. 2022. Molecular convergence by differential domain acquisition is a hallmark of chromosomal passenger complex evolution. Proc Natl Acad Sci U S A. 119. doi:10.1073/PNAS.2200108119.

      Li, Z. 2012. Regulation of the Cell Division Cycle in Trypanosoma brucei. Eukaryot Cell. 11:1180. doi:10.1128/EC.00145-12.

      Li, Z., J.H. Lee, F. Chu, A.L. Burlingame, A. Günzl, and C.C. Wang. 2008. Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei. PLoS One. 3. doi:10.1371/journal.pone.0002354.

      Mackay, A.M., D.M. Eckley, C. Chue, and W.C. Earnshaw. 1993. Molecular analysis of the INCENPs (inner centromere proteins): separate domains are required for association with microtubules during interphase and with the central spindle during anaphase. J Cell Biol. 123:373–385. doi:10.1083/JCB.123.2.373.

      Marchetti, M.A., C. Tschudi, H. Kwon, S.L. Wolin, and E. Ullu. 2000. Import of proteins into the trypanosome nucleus and their distribution at karyokinesis. J Cell Sci. 113 (Pt 5):899–906. doi:10.1242/JCS.113.5.899.

      Nakajima, Y., A. Cormier, R.G. Tyers, A. Pigula, Y. Peng, D.G. Drubin, and G. Barnes. 2011. Ipl1/Aurora-dependent phosphorylation of Sli15/INCENP regulates CPC-spindle interaction to ensure proper microtubule dynamics. J Cell Biol. 194:137–153. doi:10.1083/JCB.201009137.

      Noujaim, M., S. Bechstedt, M. Wieczorek, and G.J. Brouhard. 2014. Microtubules accelerate the kinase activity of Aurora-B by a reduction in dimensionality. PLoS One. 9. doi:10.1371/JOURNAL.PONE.0086786.

      Okada, Y., and N. Hirokawa. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science. 283:1152–1157. doi:10.1126/SCIENCE.283.5405.1152.

      Rice, S., A.W. Lin, D. Safer, C.L. Hart, N. Naber, B.O. Carragher, S.M. Cain, E. Pechatnikova, E.M. Wilson-Kubalek, M. Whittaker, E. Pate, R. Cooke, E.W. Taylor, R.A. Milligan, and R.D. Vale. 1999. A structural change in the kinesin motor protein that drives motility. Nature. 402:778–784. doi:10.1038/45483.

      Sablin, E.P., F.J. Kull, R. Cooke, R.D. Vale, and R.J. Fletterick. 1996. Crystal structure of the motor domain of the kinesin-related motor ncd. Nature. 380:555–559. doi:10.1038/380555a0.

      Samejima, K., M. Platani, M. Wolny, H. Ogawa, G. Vargiu, P.J. Knight, M. Peckham, and W.C. Earnshaw. 2015. The Inner Centromere Protein (INCENP) Coil Is a Single α-Helix (SAH) Domain That Binds Directly to Microtubules and Is Important for Chromosome Passenger Complex (CPC) Localization and Function in Mitosis. J Biol Chem. 290:21460–21472. doi:10.1074/JBC.M115.645317.

      Schiffrin, B., S.E. Radford, D.J. Brockwell, and A.N. Calabrese. 2020. PyXlinkViewer: A flexible tool for visualization of protein chemical crosslinking data within the PyMOL molecular graphics system. Protein Sci. 29:1851–1857. doi:10.1002/PRO.3902.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      This publication applies 3D super-resolution STORM imaging to understanding the role of developmental neural activity in the clustering of retinal inputs to the mouse dorsal lateral geniculate nucleus (dLGN). The authors argue that retinal ganglion cell (RGC) synaptic boutons start forming clusters early in postnatal development (P2). They then argue that these clusters contribute to eye-specific segregation of retinal inputs by activity-dependent stabilization of nearby boutons from the same eye. The data provided is N=3 animals for each condition of P2, P4, and P8 animals in wild-type mice and in mice where early patterns of structured retinal activity are blocked.

      Strengths:

      The 3D STORM imaging of pre- and postsynaptic elements provides convincing high-resolution localization of synapses.

      The experimental design of comparing ipsilateral and contralateral RGC axon boutons in a region of the dLGN that is known to become contralateral is elegant. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling.

      Weaknesses:

      Based on previous literature, it is known that synapse density, synapse clustering, and synaptic specificity increase during postnatal development. Previous work has also shown that both the changes in synaptic clustering and synaptic specificity are affected by retinal activity. The data and analysis provided by the authors add little unambiguous evidence that advances this understanding.

      We agree with the reviewer that previous literature shows that synapse density, synapse clustering, and synaptic specificity increase during postnatal development and that these processes are affected by retinal activity. The majority of studies on synaptic refinement have been performed after eye-opening, when eye-specific segregation is already complete. In contrast, most studies of eye-specific segregation focus on axonal refinement phenotypes. To our knowledge, only a small number of experiments have examined retinogeniculate synaptic properties at the nanoscale during eye-specific segregation (1-4). Our broad goal is to understand the mechanisms of synaptogenesis and competition at the earliest stages of eye-specific refinement, when spontaneous retinal activity is a major driver of activity-dependent remodeling. We hope that readers will appreciate that there is still much to discover in this fascinating model system of synaptic competition.

      General problem 1: Most of the statistical analysis is limited to ANOVA comparison of axons from the contralateral and ipsilateral retina in the contralateral dLGN. The hypothesis that ipsilateral and contralateral axons would be statistically identical in the contralateral dLGN is not a plausible hypothesis so rejecting the hypothesis with P < X does not advance the authors' arguments beyond what was already known.

      General problem 2: Most of the interpretation of data is qualitative. While error bars are provided, these error bars are not used to draw conclusions. Given the small sample size (N=3), there is a large degree of uncertainty regarding the magnitude of changes (synapse size, number, specificity). The authors base their conclusions on the averages of these values when the likely degree of uncertainty could allow for the opposite interpretation.

      We appreciate the reviewer’s concerns regarding the use of ANOVA for statistical testing in the original submission. We have generated new figures that show confidence intervals for each analysis in the manuscript and these are included in the response to reviewers document below. To address the underlying concern that our N=3 sample size limits the interpretation of our results, we have revised the manuscript to be cautious in our interpretations and to discuss additional possibilities that are consistent with the anatomical data.

      General problem 3: Two of the four results sections depend on using the frequency of single active zone vGlut2 clusters near multiple active zone vGlut2 clusters as a proxy for synaptic stabilization of the single active zone vGlut2 clusters by the multiple active zone vGlut2 clusters. The authors argue that the increased frequency of same-eye single active zone clusters relative to opposite-eye single active zone clusters means that multiple active zone vGlut2 clusters are selectively stabilizing single active zone clusters. There are other plausible explanations for this observation that are not eliminated. An increased frequency of nearby single active zone clusters would also occur if RGC axons form more than one synapse in the dLGN. Eye-specific segregation is, by definition, a relative increase in the frequency of nearby boutons from the same eye. The authors were, therefore, guaranteed to observe a non-random relationship between boutons from the same eye. The authors do compare their measures to a random model, but I could not find a description of the model. I would expect that the model would need to account for RGC arbor size, arbor structure, bouton number, and segregation independent of multi-active-zone vGlut2 clusters. The most common randomization for the type of analysis described here, a shift in the positions of single-active zone boutons, would not be adequate.

      In discussing the claimed cluster-induced stabilization of nearby boutons, the authors state that the specificity increases with age due to activity-dependent refinement. Their quantification does not support an increase in specificity with age. In fact, the high degree of clustering "specificity" they observe at P2 argues for the trivial same axon explanation.

      We agree with the reviewer that individual RGC axons form multiple synapses and that, over time, eye-specific segregation must increase the frequency of like-eye synapses relative to opposite-eye synapses. Indeed, our previous study of eye-specific refinement showed that at P8, the density of eye-specific inputs had increased for the dominant-eye and decreased for the non-dominant-eye (1). However, at postnatal day 4, contralateral and ipsilateral input densities were the same in the future contralateral-eye territory. One of our goals in this study was to determine if the process of synaptic clustering begins at these earliest stages of synaptic competition and, if so, whether it is influenced by retinal wave activity. It is plausible that the RGC axons from the same eye could initially form synapses randomly and, at some later stage, synapses may be selectively added to produce mature glomeruli. Consistent with this possibility, previous analysis of JAM-B RGC axon refinement showed the progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5).

      Regarding the randomization that we employed, we performed a repositioning of synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. We agree that this type of randomization cannot account for the fine scale structure of axons and dendrites, which we did not have access to in this four-color volumetric super-resolution data set. To address this, we have performed additional clustering analyses surrounding both single-active zone and multi-active zone synapses. This new analysis showed that there is a modest clustering effect around single-active zone synapses compared to complete randomization described above. We now present this information using a normalized clustering index for direct comparison of clustering between multi-active zone and single-active zone synapses. We have measured effect sizes and confidence intervals, which we present in point-by-point responses below. We have restructured the manuscript figures and discussion to provide a balanced interpretation of our results and the limitations of our study.
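      The kind of randomization and normalized clustering index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: somas are approximated as spheres, edge effects are handled with a fixed margin, the index is taken as the observed mean neighbor count divided by the mean count after repositioning, and all function names and parameters are hypothetical.

```python
import math
import random

def mean_neighbors(focal, others, radius):
    """Mean number of `others` lying within `radius` of each focal centroid."""
    if not focal:
        return 0.0
    total = 0
    for f in focal:
        # `o is not f` skips the focal point itself if the same object is in both lists.
        total += sum(1 for o in others if o is not f and math.dist(f, o) <= radius)
    return total / len(focal)

def random_position(rng, box, somas, margin):
    """Draw a uniform point inside the box, outside soma spheres and an edge margin."""
    while True:
        p = tuple(rng.uniform(margin, side - margin) for side in box)
        if all(math.dist(p, centre) > r for centre, r in somas):
            return p

def clustering_index(focal, others, box, somas, radius, margin,
                     n_shuffles=200, seed=0):
    """Observed mean neighbor count divided by the mean count after repositioning."""
    rng = random.Random(seed)
    observed = mean_neighbors(focal, others, radius)
    shuffled = []
    for _ in range(n_shuffles):
        repositioned = [random_position(rng, box, somas, margin) for _ in others]
        shuffled.append(mean_neighbors(focal, repositioned, radius))
    expected = sum(shuffled) / len(shuffled)
    return observed / expected if expected > 0 else float("inf")
```

      An index above 1 would indicate more neighbors than expected from uniform repositioning. As the response itself notes, a null model of this type cannot capture the fine-scale structure of axons and dendrites.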

      Analysis of specific claims:

      Result Section 1

      Most of the figures show mean, error bars, and asterisks, but not the three data points from which these statistics are derived. Large changes in variance from condition to condition suggest that displaying the data points would provide more useful information.

      We thank the reviewer for their suggestion. We have updated all figures to display the means of all biological replicates as individual data points.

      Claim 1: Contralateral density increases more than ipsilateral in the contralateral region over the course of development. This claim is supported by the qualitative comparison of means and error bars in Figure 2D. The argument could be made quantitative by providing a confidence interval for synapse density increase for dominant and non-dominant synapse density. A confidence interval could then be generated for the difference in this change between the two groups. Currently, the most striking effect is a big difference in variance between P4 and P8 for dominant eye complex synapses. Given that N=3, I assume there is one extreme outlier here.

      We appreciate the comment and believe the reviewer was referring to the data presented in the original Figure 1D, rather than Figure 2D.

      We agree with the reviewer that our comment on the change in synapse density across ages was not quantitatively supported by the figure as we did not perform a proper age-wise statistical comparison. We have removed this claim in the revised manuscript.

      We also appreciate the suggestions to clarify the presentation of our statistical analyses and to utilize confidence interval measurements wherever possible. We present Author response image 1 below, showing the density of multi-AZ synapses in the contralateral-eye territory over time (P2-P8), for both CTB(+) contralateral (black) and CTB(-) ipsilateral inputs (red) featuring 5/95% confidence intervals:

      Author response image 1.

      More broadly, the reviewer has raised the concern that the low number of biological replicates (N=3) presents challenges in the use of ANOVA for statistical testing. We agree with the concern and have revised the manuscript to be cautious in our statistical tests and resulting claims. We have chosen to use paired T-tests to compare measurements of eye-specific synapse properties because these measurements were always made within each individual biological replicate (paired measurements). Below, we discuss our logic for this change and the effects on the results we present in the revised manuscript.

      Considering the above image:

      (1) ANOVA: In our initial submission, we used an ANOVA test which showed P<0.05 for the CTB(+) P4 vs. P8 comparison above, leading to our statement about an age-dependent increase in multi-AZ density. However, the figure above shows that P8 data has higher variance. Thus, the homogeneity of variance assumption of ANOVA may lead to false positives in this comparison.

      (2) Confidence interval for N=3: We calculated confidence intervals for P4 and P8 data (5/95% CI shown above). Overlap between the two groups indicates the true mean values of the two groups could be identical. However, the P8 confidence intervals (as well as other confidence intervals across other comparisons in the manuscript) also include the value of 0. Taken at face value, this would indicate that there might be no multi-active zone synapses in the mouse dLGN at all. This failure arises because the low number of biological replicates (N=3 data points) precludes a reliable confidence interval measurement. CI measurements require sufficient sample sizes to estimate the true population variance.

      (3) Difficulty in achieving sufficient sample sizes for CI analysis in ultrastructural studies of the brain: volumetric STORM experiments are technically complex and make use of sample preparation and analysis methods that are similar to volumetric electron microscopy (physical ultrathin sectioning and computational 3D stack alignment). For these technical reasons, it is difficult to collect imaging data from >10 mice for each group of data (e.g. age and tissue location) in one single project. Because of the technical challenges, most ultrastructural studies published to date present results from single biological replicates. In our STORM dataset, we collected imaging data of N=3 biological replicates for each age and genotype. We agree that in the future the collection of additional replicates will be important for improving the reliability of statistical comparisons in super-resolution and electron-microscopy studies. Continued advances in the throughput of imaging/analysis should help to make this easier over time. 

      (4) The use of paired T-tests: In this study, we have eye-specific CTB(+) and CTB(-) synapse imaging data from the same STORM fields within single biological replicates. When there is only one measurement from each replicate (e.g. synapse density, ratio of total synapses), using paired tests to compare these groups increases statistical power and does not assume similar variance. However, this limits our analysis to comparisons within each age, and not between ages. Accordingly, we have revised our discussion of the results and interpretations throughout the manuscript. When there are thousands of measurements of synapses from each replicate (e.g. Figure 2A-B on synapse volumes), we use a mixed linear model to analyze the variance. In the revised figures we present the results using standard error of the mean and link measurements from within the same individual replicates to show the paired data structure. In cases where specific comparisons are made across ages, we present 5/95% confidence interval measurements.
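      Points (2) and (4) can be made concrete with a small numerical sketch. It assumes a Student t-based interval (the response does not state how the 5/95% intervals were computed), and all data values are hypothetical. With N=3 there are only two degrees of freedom, so the two-sided 90% critical value is 2.920 and the interval becomes wide enough to include 0 even when every measurement is positive. The paired statistic, in contrast, depends only on the within-replicate differences.

```python
import math
import statistics

# Two-sided 90% (5/95%) Student t critical value for 2 degrees of freedom (N = 3).
T_CRIT_90_DF2 = 2.920

def ci_90_n3(values):
    """5/95% t-based confidence interval for the mean of exactly three replicates."""
    if len(values) != 3:
        raise ValueError("this sketch hard-codes the critical value for N = 3")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    half_width = T_CRIT_90_DF2 * sem
    return mean - half_width, mean + half_width

def paired_t_statistic(a, b):
    """t statistic for paired measurements (one pair per biological replicate)."""
    diffs = [x - y for x, y in zip(a, b)]
    sem_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / sem_diff

# Hypothetical densities for three replicates (illustration only).
low, high = ci_90_n3([1.0, 2.0, 6.0])
print(low, high)  # the interval spans 0 although all three values are positive

# Hypothetical paired CTB(+) and CTB(-) measurements from the same replicates.
print(paired_t_statistic([5.0, 6.0, 8.0], [3.0, 4.5, 5.5]))
```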

      Claim 2: The fraction of multiple-active zone vGlut2 clusters increases with age. This claim is weakly supported by a qualitative reading of panel 1E. The error bars overlap so it is difficult to know what the range of possible increases could be. In the text, the authors report mean differences without confidence intervals (or any other statistics). The reported results should, therefore, be interpreted as a description of their three mice and not as evidence about mice in general.

      We appreciate the reviewer’s concern that statistical accuracy of our synapse density comparisons over age is limited by the small sample size as discussed above. We have removed all strong claims about age-dependent changes in the density of multi-active zone and single-active zone synapses. Instead, we focus our analyses on comparisons between CTB(+) and CTB(-) synapse measurements, which are paired within each biological replicate. To specifically address the reviewer’s concern about figure panel 1E, we present Author response image 2 with confidence intervals below.

      Author response image 2.

      Figure S1. Panel A makes the point that the study could not be done without STORM by comparing the STORM images to "Conventional" images. The images are over-saturated low-resolution images. A reasonable comparison would be to a high-quality confocal image acquired with a high NA objective (~1.4) and low laser power (PSF ~ 0.2 x 0.2 x 0.6 um) that was acquired over the same amount of time it takes to acquire a STORM volume.

      We agree with the reviewer that the presentation of low-resolution conventional images is not necessary. We have deleted the panel and modified the text accordingly.

      Result section 2.

      Claim 1: The ipsi/contra (in contra LGN) difference in VGluT2 cluster volume increases with development. While there are many p-values listed, the main point is not directly quantified. A reasonable way to quantify the relative increase in volume could be in the form: the non-dominant volumes were 75%-95%(?) of the dominant volume at P2 and 60%-80% (?) at P8. The difference in change was -5 to 15%(?).

      We thank the reviewer for their helpful suggestion to improve the clarity of the results presented in this analysis of eye-specific synapse volumes. In our original report, we found differences in eye-specific VGluT2 volume at each time point (P2/P4/P8) in control mice (1). The original measurements used the entire synapse population. Here, we aimed to determine whether eye-specific differences in VGluT2 volumes were present for both multi-AZ synapses and single-AZ synapses, and whether one population may have a greater contribution to the previous population measurement that we reported. We found that at P4 (a time when the overall eye-specific synapse density is equivalent for both eyes in the dLGN), WT multi-AZ synapses showed a greater difference (372%) in eye-specific VGluT2 volume compared with single-AZ synapses (135%). In β2KO mice multi-AZ synapses showed a greater difference (110%) in eye-specific VGluT2 volume compared with single-AZ synapses (41%). In our initial manuscript submission, we included statistical comparisons of eye-specific volume differences across ages, but we did not highlight these differences in our discussion of the results. For clarity, we have removed all statistical comparisons across ages in the revised manuscript. We have modified the text to focus on eye-specific VGluT2 volume differences at P4 described above. To specifically address the reviewer’s question, we provide the percentage differences between multi- and single-AZ eye-specific synapses for each age/genotype below:

      Author response table 1.

      Claim 2: Complex synapses (vGlut2 clusters with multiple active zones) represent clusters of simple synapses and not single large boutons with multiple active zones. The authors argue that because vGlut2 cluster volume scales roughly linearly with active zone number, the vGlut2 clusters are composed of multiple boutons each containing a single active zone. Their analysis does not rule out the (known to be true) possibility that RGC bouton sizes are much larger in boutons with multiple active zones. The correlation of volume and active zone number, by itself, does not resolve the issue. A good argument for multiple boutons might be that the variance is smallest in clusters with 4 active zones (looks like it in the plot) since they would be the average of four active zones to vesicle pool ratios. It is very likely that the multi-active zone vGlut2 clusters represent some clustering and some multi-synaptic boutons. The reference cited by the authors as evidence for the presence of single active zone boutons in young tissue does not rule out the existence of multiple active zone boutons.

      We agree with the reviewer’s comments on the challenges of classifying multi-active zone synapses in STORM images as single terminals versus aggregates of terminals. To help address this, we have performed electron microscopy imaging of genetically labeled RGC axons and identified the existence of single retinogeniculate terminals with multiple active zones. Our EM imaging was limited to 2D sections and does not rule out the clustering of small, single-active zone synapses within 3D volumes. Future volumetric EM reconstructions will be informative for this question. We have significantly updated the figures and text to discuss the new results and provide a careful interpretation of the nature of multi-AZ synapses in STORM imaging data.

      Several arguments are made that depend on the interpretation of "not statistically significant" (n.s.) meaning that "two groups are the same" instead of "we don't know if they are different". This interpretation is incorrect and materially impacts the conclusions.

      Several arguments are made that interpret statistical significance for one group and a lack of statistical significance for another group meaning that the effect was bigger in the first group. This interpretation is incorrect and materially impacts the conclusions.

      We thank the reviewer for raising these concerns. We have extensively revised the manuscript text to report the data in a more precise way without overinterpreting the results. All references to “N.S.” and associated conclusions have been either removed or substantiated with 5/95% confidence interval testing.

      Result Section 3.

      Claim 1: Complex synapses stabilize simple synapses. There are alternative explanations (mentioned above) for the observed clustering that negate the conclusions. 1) Boutons from the same axon tend to be found near one another. 2) Any form of eye-specific segregation would produce non-random associations in the analysis as performed. The authors compare each observation to a random model, but I cannot determine from the text if the model adequately accounts for alternative explanations.

      We thank the reviewer for their suggestion to consider alternative explanations for our results. We agree that our study does not provide direct molecular mechanistic data demonstrating synaptic stabilization effects. We have significantly revised the manuscript to be more cautious in our interpretations and specifically address alternative biological mechanisms that are consistent with the non-random arrangement of retinogeniculate synapses in our data.

      We agree with the reviewer that individual RGC axons form multiple synapses, however, nascent synapses might not always form close together. If synapses are initially added randomly within RGC axons, eye-specific segregation may conclude with a still-random pattern of dominant-eye inputs. At some later stage, synapses may be selectively refined to produce mature glomeruli. Consistent with this, individual RGCs undergo progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5). One of our goals in this work was to determine if the process of synaptic clustering begins at the earliest stages of synapse formation and, if so, whether it is influenced by retinal wave activity.

      To measure synaptic clustering in our STORM data, we used a randomization of single-AZ synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. Multi-AZ centroid positions were held fixed. Comparing the randomized result to the original distribution, we found a higher fraction of single-AZ synapses associated with multi-AZ synapses, arguing for a non-random clustering effect. However, we agree with the reviewer’s concern that this type of randomization cannot account for the fine scale structure of axons, which we did not have access to in this four-color volumetric super-resolution data set. Thus, there could still be errors in a purely volumetric randomization (e.g. the assignment of synapses to regions in the volume that would not be synaptic locations in the original neuropil), which would effectively decrease the measured degree of clustering after the randomization. To address this, we have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume.

      We now present the revised results as a “clustering index” for both multi-AZ and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, and 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, any measured differences in the degree of clustering reflect the synapse type.
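
      As an illustration only, the steps above can be sketched in Python. This is not the authors' code: the function names, the uniform randomization inside a rectangular box (with no soma or edge correction), the number of shuffles, and the definition of the index as the observed fraction divided by the mean randomized fraction are all assumptions made for the sketch. Only the 1.5 μm search radius and the choice to hold multi-AZ positions fixed come from the text.

```python
import math
import random


def fraction_near(targets, references, radius, same_set=False):
    """Fraction of target points within `radius` of at least one reference point.

    When `same_set` is True, targets and references are the same list and a
    point is never counted as its own neighbour.
    """
    if not targets:
        return 0.0
    near = 0
    for i, t in enumerate(targets):
        for j, r in enumerate(references):
            if same_set and i == j:
                continue
            if math.dist(t, r) <= radius:
                near += 1
                break
    return near / len(targets)


def randomize(n, bounds, rng):
    """Draw n points uniformly inside a box given as [(lo, hi), ...] per axis.

    Assumption: a plain box; the response describes additional corrections
    for soma volumes and edge effects that are not modelled here.
    """
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def clustering_index(single_az, references, bounds, radius=1.5,
                     n_shuffles=200, seed=0, same_set=False):
    """Observed fraction divided by the mean fraction after randomization.

    Assumption: the ratio form of the index is hypothetical; the response
    does not give the exact formula.
    """
    rng = random.Random(seed)
    observed = fraction_near(single_az, references, radius, same_set)
    random_fractions = []
    for _ in range(n_shuffles):
        shuffled = randomize(len(single_az), bounds, rng)
        # Multi-AZ references stay fixed; single-AZ references move with the shuffle.
        refs = shuffled if same_set else references
        random_fractions.append(fraction_near(shuffled, refs, radius, same_set))
    expected = sum(random_fractions) / len(random_fractions)
    return observed / expected if expected > 0 else math.nan
```

      Calling `clustering_index(single_az, multi_az, bounds)` gives the index around multi-AZ synapses, and `clustering_index(single_az, single_az, bounds, same_set=True)` gives the index around single-AZ synapses. Because both calls use the same kind of randomization, the two values can be compared directly.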

      We have updated Figure 3 in the revised manuscript to present the relative clustering index described above. We have updated the results, discussion, and methods sections accordingly.

      The authors claim that specificity increases over time. Figure 3b (middle) shows that the number of synapses near complex synapses might increase with time (needs confidence interval for effect size), but does not show that specificity (original relative to randomized) increases with time. The fact that nearby simple synapse density is always (P2) very different from random suggests a primarily non-activity-dependent explanation. The simplest explanation is that same-side boutons could be from the same axon whereas different-side axons could not be.

      We have significantly revised the analysis and presentation of results in Figure 3 to include a comparative measurement of synaptic clustering between multi-AZ and single-AZ synapses (discussed above). The data presented in the original Figure 3B have been moved to Supplemental Figure 4. Statistical comparisons in Figure S4 between the original and randomized synapse distributions are limited to within-age measurements. Cross-age comparisons were not performed or presented. To address the reviewer’s question concerning CI analysis in the original Figure 3B, we provide Author response image 3 below showing 5/95% confidence intervals for WT mice:

      Author response image 3.

      Claim 2: vGlut2 clusters more than 1.5 um away from multi-active zone vGlut2 clusters are not statistically significantly different in size than vGlut2 clusters within 1.5 um of multi-active zone vGlut2 clusters. Therefore "activity-dependent synapse stabilization mechanisms do not impact simple synapse vesicle pool size". The specific measure of 1.5 um from multi-active zone vGlut2 clusters does not represent all possible synapse stabilization mechanisms.

      We agree with the reviewer that this specific measure does not capture all possible synapse stabilization mechanisms. We have modified the text in the revised manuscript throughout to be more cautious in our data interpretation and have included additional discussion of alternative mechanisms consistent with our results.

      Result Section 4.

      Claim: The proximity of complex synapses with nearby simple synapses to other complex synapses with nearby simple synapses from the same eye is used to argue that activity is responsible for all this clustering.

      It is difficult to derive anything from the quantification besides 'not-random'. That is a problem because we already know that axons from the left and right eye segregate during the period being studied. All the measures in Section 4 are influenced by eye-specific segregation. Given this known bias, demonstrating a non-random relationship (P<X) doesn't mean anything. The test will reveal any non-random spatial relationship between same-eye and opposite-eye synapses.

      The results can be stated as: If you are a contralateral complex synapse, contralateral complex synapses that are also close to contralateral simple synapses will, on average, be slightly closer to you than contralateral complex synapses that are not close to contralateral simple synapses. That would be true if there is any eye-specific segregation (which there is).

      We appreciate the reviewer’s comments that our anatomical data are consistent with several possible mechanisms, suggesting the need for alternative interpretations of the results. In the original writing, we interpreted our results in the context of activity-dependent mechanisms of like-eye stabilization and opposite-eye competition. However, our results are also consistent with other mechanisms, including non-random molecular specification of eye-specific inputs onto subregions of postsynaptic target cells (e.g. distinct relay neuron dendrites). We have rewritten the manuscript to be more cautious in our interpretations and to provide a balanced discussion of alternative possibilities.

      Regarding the concern that the data in section four are influenced by eye-specific segregation, we previously found that synapse density from both eyes is equivalent in the contralateral region at the P4 time point presented (1), which is consistent with binocular axonal overlap at this age. Within our imaging volumes, ipsilateral and contralateral inputs were broadly intermingled throughout the volume, and we did not find evidence for regional segregation within the imaging fields. By these metrics, retraction of ipsilateral inputs from the contralateral territory has not yet occurred.

      It is an overinterpretation of the data to claim that the lack of a clear correlation between vGlut2 cluster volume and distance to vGlut2 clusters with multiple active zones provides support for the claim that "presynaptic protein organization is not influenced by mechanisms governing synaptic clustering".

      We agree with the reviewer that our original language was imprecise in referring to presynaptic protein organization broadly. We have revised this text to present a more accurate description of the results.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes in this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of presynaptic organization based on Bassoon clustering, the complex and the simple synapse. By analyzing the relative densities and distances between these proteins over age, the authors conclude that the complex synapses promote the clustering of simple synapses nearby to form the future mature glomerular synaptic structure.

      Strengths:

      The data presented are of good quality and provide an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises. Using this approach, the authors find that simple synapses cluster close to complex synapses over age and that complex synapse density increases with age.

      Weaknesses:

      From these data, the authors conclude that the complex synapse serves to "promote clustering of like-eye synapses and prohibit synapse clustering from the opposite eye". However, the authors show no causal data to support these ideas. There are a number of issues that the authors should consider:

      (1) Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps simple synapses mature into complex synapses. Simple synapses may also represent ones that are in the process of being eliminated as previously described by Campbell and Shatz, JNeurosci 1992 (consider citing). Can the authors distinguish these scenarios from the ones that they conclude?

      We thank the reviewer for their thoughtful commentary and suggestions to improve our manuscript. We agree with the reviewer that our original interpretation of synaptic clustering by activity-dependent stabilization and punishment mechanisms is not directly supported by causal data. We have extensively revised the manuscript to take a more cautious view of the results and to discuss alternative mechanisms that are consistent with our data.

      During eye-specific circuit development, there is indeed increased synaptogenesis and, ultimately, RGC terminals are closely clustered within synaptic glomeruli. This process involves the selective addition and elimination of synapses. Bouton clustering has been shown to occur within individual RGC axons after eye-opening in the mouse (5). The convergence of other RGC types into clustered boutons has been shown at eye-opening by light and electron microscopy (3). There is also qualitative evidence that synaptic clusters may form earlier during eye-specific segregation in the cat (4). Our data provide additional evidence that synaptic clustering begins prior to eye-opening in the mouse (P2-P8). Although synapse numbers also increase during this period, the distribution of synapse addition is non-random. 

      Single-active zone synapses (we previously called these “simple”) may indeed mature into multi-active zone synapses (we previously called these “complex”). At the same time, single-active zone synapses may be eliminated. We believe that each of these events occurs as part of the synaptic refinement process. Our STORM images are static snapshots of eye-specific refinement, and we cannot infer the dynamic developmental trajectory of an individual synapse in our data. Future live imaging experiments in vivo/in situ will be needed to track the maturation and pruning of individual connections. We have expanded our discussion of these limitations and future directions in the manuscript.

      (2) The argument that "complex" synapses are the aggregate of "simple" synapses (Fig 2, S2) is not convincing.

      We agree with the reviewer’s concern about the ambiguous identity of complex synapses. To clarify the nature of multi-active zone synapses, we have performed RGC-specific dAPEX2 labeling to visualize retinogeniculate terminals by electron microscopy (EM). These experiments revealed the presence of synaptic terminals with multiple active zones. We have added images and text to the results section describing these findings. Our 2D EM images do not rule out the possibility that some multi-active zone synapses observed in STORM images are in fact clusters of individual RGC terminals. We have revised the text to provide a more accurate discussion of the nature of multi-active zone synapses.  

      (3) The authors use β2KO mice, which have disrupted retinal waves, to assess changes in the organization of synaptic proteins in retinal terminals. However, β2-nAChRs are also expressed in the dLGN and other areas of the brain and glutamatergic synapse development has been reported in the CNS independent of the disruption in retinal waves. This issue should be considered when interpreting the total reduced retinal synapse density in the dLGN of the mutant.

      We thank the reviewer for their suggestion to consider non-retinal effects of the germline deletion of the beta 2 subunit of the nicotinic acetylcholine receptor. Previously, Xu and colleagues reported the development of a conditional transgenic mouse model lacking β2-nAChR expression specifically in the retina (6). These retina-specific β2-nAChR mutant mice (Rx-β2cKO) have disrupted retinal wave properties and defects in eye-specific axonal segregation in binocular anterograde tracing experiments. This work suggests that the defects seen in germline β2-nAChR KO mice arise from defects in retinal wave activity rather than the loss of nicotinic receptors elsewhere in the brain. Additionally, the development of brainstem cholinergic inputs to the dLGN is delayed until the closure of the eye-specific segregation period (7), further suggesting a limited role for cholinergic transmission in the retinogeniculate refinement process.

      (4) Outside of a total synapse density difference between WT and β2KO mice, the changes in the spatial organization of synaptic proteins over development do not seem that different. In fact % simple synapses near complex synapses from the non-dominant eye in the mutant is not that different from WT at P8 (Fig 3C), an age when eye-specific segregation is very different between the genotypes. Can the authors explain this discrepancy?

      We thank the reviewer for their question concerning differences between synapse organization in WT versus β2KO mice. In the original presentation of Figure 3C, the percentage of non-dominant eye single-AZ synapses near multi-AZ synapses increased at P4 in WT mice, but this did not occur in β2KO mice. This is consistent with our previous results showing that there is an increase in non-dominant eye synaptic density at this age, which does not occur in β2KO mice (1). At P8, this clustering effect is lost in WT as eye-specific segregation has taken place and non-dominant eye inputs have been eliminated. However, in β2KO mice, the overall synapse density is still low at this age. We interpret this result as a failure of synaptogenesis in the β2KO line, which leads to increased growth of individual RGC axons (8) and eye-specific overlap at P8 (9, 10). Evidence in support of this interpretation comes from live dynamic imaging studies of RGC axon branching in Xenopus and Zebrafish, showing that synapse formation stabilizes local axon branching and that disruptions of synapse formation or neurotransmission lead to enlarged axons (11-13).

      Our anatomical results do not provide a specific biological mechanism for the remaining clustering observed in the β2KO mice. We have revised our discussion of the fact that individual RGC axons may form multiple synaptic connections leading to clustering, which may be independent of changes in retinal wave properties in the β2KO mouse. We have also extensively revised the analysis and presentation of results in Figure 3 to directly compare synaptic clustering around both multi-AZ synapses and single-AZ synapses within the same imaging volumes.

      (5) The authors use nomenclature that has been previously used and associated with other aspects of retinogeniculate properties. For example, the phrases "simple" and "complex" synapses have been used to describe single boutons or aggregates of boutons from numerous retinal axons, whereas in this manuscript the phrases are used to describe vesicle clusters/release sites with no knowledge of whether they are from single or multiple boutons. Likewise, the word "glomerulus" has been used in the context of the retinogeniculate synapse to refer to a specific pattern of bouton aggregates that involves inhibitory and neuromodulatory inputs. It is not clear how the release sites described by the authors fit in this picture. Finally, the word "punishment" is associated with a body of literature regarding the immune system and retinogeniculate refinement, which is not addressed in this study. This double use of the phrases can lead to confusion in the field and should be clarified by clear definitions of how they are used in the current study.

      We appreciate the reviewer’s concern that the terminology we used in the initial submission may cause confusion. We have revised the text throughout for clarity. “Simple” synapses are now referred to as “single-active zone synapses”. “Complex” synapses are now referred to as “multi-active zone synapses”. We have removed all text that previously referred to synaptic clusters in STORM images as glomeruli. We agree that we have not provided causal evidence for synaptic stabilization and punishment mechanisms, which would require additional molecular genetic studies. We have restructured the manuscript to remove these references and discuss our anatomical results impartially.  

      Reviewer #3 (Public Review):

      This manuscript is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label a number of active zones, and anti-Homer to identify postsynaptic densities. In their previous study, they compared the detailed synaptic structure across the development of synapses made with contra-projecting vs ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new analysis on the same data set in which they classify synapses into "complex" vs. "simple" and assess the number and spacing of these synapses. From these measurements, they make conclusions regarding the processes that lead to synapse competition/stabilization.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is also a plus.

      Weaknesses:

      The lack of details provided for the classification scheme as well as the interpretation of small effect sizes limit the interpretations that can be made based on these findings.

      We thank the reviewer for their reading of the manuscript and helpful comments to improve the work. We provide details on how single-active zone and multi-active zone synapses are classified in the methods section. We agree with the suggestion to be more careful in interpreting the results. We have extensively revised the manuscript to 1) include additional electron microscopy data demonstrating the presence of multi-active zone retinogeniculate synapses, 2) extend the synaptic clustering analysis to both single-active zone and multi-active zone synapses for comparison, and 3) improve the clarity and accuracy of the discussion throughout the manuscript.

      (1) The criteria to classify synapses as simple vs. complex are critical for all of the analysis in this study. Therefore, these criteria should be much more explicit and tested for robustness. As stated in the methods, classification is based on the number of active zones, which are designated by the number of Bassoon clusters associated with a Vglut2 cluster (line 697). A second part of the criteria is the size of the presynaptic terminal as assayed by "greater Vglut2 signal" (line 116). So how are these thresholds determined? For Bassoon clusters, is one voxel sufficient? Two? If it's one, how often do they see a Bassoon positive voxel with no Vglut2 cluster, which may therefore represent "noise"? No distribution of Bassoon volumes is provided that might be the basis for selecting this number of sites. Unfortunately, the images are not helpful. For example, does P8 WT in Figure 1B have 7 or 2? According to Figure 2C, it appears the numbers are closer to 2-4.

      The Vglut volume measurements also do not seem to provide a clear criterion. Figure 2 shows that the distributions of Vglut2 cluster volumes for complex and for simple synapses are significantly overlapping.

      The authors need to clarify the quantitative approach used for this classification strategy and test how sensitive the results of the study are to the thresholds it uses.

      We thank the reviewer for their question concerning the STORM data analysis. Here we provide a brief overview of the complete analysis details, which are provided in the methods section.

      Our raw STORM data sets consisted of spectrally separate volumetric imaging channels of VGluT2, Bassoon, and Homer1 signals. For each of these channels, raw STORM data were processed in five steps: 1) the corresponding low-resolution conventional image of each physical section was applied to the STORM data to filter artifacts in the STORM image that do not appear in the conventional image; 2) STORM images were thresholded using a 2-factor Otsu threshold that removes low-intensity background noise while preserving all single-molecule localizations that correspond to genuine antibody labeling as well as non-specific antibody labeling in the tissue; 3) the MATLAB function “conncomp” was applied to identify connected-component voxels in 3D across the image stack, and clusters were kept for further analysis only if they were connected across at least 2 continuous physical sections (140 nm Z depth); 4) for every connected component (clusters corresponding to genuine antibody labeling and background labeling), we measured the volume and signal density (intensity/volume); 5) a threshold was applied to retain clusters that have a higher volume and lower signal density. We exclude signals that have low volume and high density, which correspond to single antibody labels. This analysis retains larger clusters that correspond to synaptic objects and excludes non-specific antibody background.
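
      Steps 3 to 5 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB pipeline: the sparse dictionary input, the 26-connectivity, the function names, and the default cut-offs `small_volume` and `high_density` are all hypothetical. Only the rule that a cluster must span at least 2 physical sections, and the exclusion of clusters that are both small and dense, follow the description.

```python
from collections import deque
from itertools import product

# All 26 neighbour offsets of a voxel in 3D (assumed connectivity).
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_clusters(voxels):
    """Group above-threshold voxels {(z, y, x): intensity} into connected clusters."""
    unvisited = set(voxels)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    queue.append(neighbour)
                    members.append(neighbour)
        clusters.append(members)
    return clusters


def summarize(members, voxels):
    """Volume (voxel count), signal density (intensity/volume), and z-section span."""
    volume = len(members)
    intensity = sum(voxels[m] for m in members)
    z_span = len({m[0] for m in members})
    return {"volume": volume, "density": intensity / volume, "z_span": z_span}


def filter_clusters(voxels, min_sections=2, small_volume=10, high_density=5.0):
    """Keep clusters spanning >= min_sections that are not both small and dense.

    `small_volume` and `high_density` are hypothetical illustration values.
    """
    kept = []
    for members in label_clusters(voxels):
        stats = summarize(members, voxels)
        if stats["z_span"] < min_sections:
            continue  # not connected across enough physical sections
        if stats["volume"] < small_volume and stats["density"] > high_density:
            continue  # low volume and high density: treated as single antibody labels
        kept.append(stats)
    return kept
```

      In this sketch a connected cluster that occupies two distinct z indices necessarily spans two adjacent sections, so counting distinct z values is enough to apply the section rule.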

      The average size of WT synaptic Bassoon clusters ranges from 55-3532 voxels (0.00092-0.059 μm³), with a median size of 460 voxels (0.0077 μm³).

      The average size of WT synaptic VGluT2 clusters ranges from 50-73752 voxels (0.00084-1.2 μm³), with a median size of 980 voxels (0.016 μm³).

      The average size of WT synaptic Homer1 clusters ranges from 63-7118 voxels (0.0010-0.12 μm³), with a median size of 654 voxels (0.011 μm³).

      In practice, any Bassoon/VGluT2/Homer1 clusters with <10 voxels are immediately filtered at the Otsu thresholding step (2) above.

      The reviewer is correct that we often see Bassoon(+) clusters that are not associated with VGluT2, and these may reflect synapses of non-retinal origin or retinogeniculate synapses that lack VGluT2 expression. To identify retinogeniculate synapses containing VGluT2, we performed a synapse pairing analysis that measured the association between VGluT2 and Bassoon clusters after the synapse cluster filtering described above. We first measured the centroid-centroid distance from each VGluT2 cluster to the closest cluster in the Bassoon channel. We next quantified the signal intensity of the Bassoon channel within a 140 nm shell surrounding each VGluT2 cluster. A 2D histogram was plotted based on the measured centroid-centroid distances and opposing channel signal densities of each cluster. Paired clusters with closely positioned centroids and high intensities of apposed channel signal were identified using the OPTICS algorithm (14).
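
      The two pairing features described above (nearest centroid distance and apposed-channel signal within a 140 nm shell) can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the function names, the sparse voxel dictionaries, the voxel dimensions passed as `voxel_nm`, and the decision to count overlapping voxels as lying at distance 0 are all hypothetical. The subsequent OPTICS clustering of the 2D feature space is not shown.

```python
import math


def nearest_centroid_distance(centroid, other_centroids):
    """Distance from one cluster centroid to the closest centroid in the other channel."""
    return min(math.dist(centroid, c) for c in other_centroids)


def to_nm(voxel, voxel_nm):
    """Convert a (z, y, x) voxel index to physical coordinates in nanometres."""
    return tuple(i * s for i, s in zip(voxel, voxel_nm))


def shell_signal(cluster_voxels, other_channel, shell_nm=140.0,
                 voxel_nm=(70.0, 15.0, 15.0)):
    """Sum other-channel intensity within `shell_nm` of any voxel of the cluster.

    `other_channel` maps (z, y, x) -> intensity. The default voxel size is a
    hypothetical value; overlapping voxels count as distance 0 (an assumption).
    """
    cluster_nm = [to_nm(v, voxel_nm) for v in cluster_voxels]
    total = 0.0
    for voxel, intensity in other_channel.items():
        position = to_nm(voxel, voxel_nm)
        if min(math.dist(position, c) for c in cluster_nm) <= shell_nm:
            total += intensity
    return total
```

      Each VGluT2 cluster then contributes one point (distance, shell signal) to the 2D histogram, and paired clusters are the ones with a short distance and a high shell signal.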

      In the original Figure 1B, the multi-active zone synapse in WT at P8 had two Bassoon clusters. To clarify this, we have revised the images in Figure 1 to include arrowheads that point to individual active zones. We have also revised Supplemental Figure 1 to show volumetric renderings of individual example synapses that help illustrate the 3D structure of these multi-active zone inputs. All details about synapse analysis and synapse pairing are provided in the methods section.

      (2) Effect sizes are quite small and all comparisons are made on medians of distributions. This leads to an n=3 biological replicates for all comparisons. Hence this small n may lead to significant results based on ANOVAS/t-tests, but the statistical power of these effects is quite weak. To accurately represent the variance in their data, the authors should show all three data points for each category (with a SD error bar when possible). They should also include the number of synapses in each category (e.g. the numerators in Figure 1D and the denominators for Figure 1E). For other figures, there are additional statistical questions described below.

      We thank the reviewer for their suggestion to improve the presentation of our results. We have added all three data points (individual biological replicates) to each figure plot when applicable. We have also included a supplemental table (Table S1) listing total eye-specific synapse numbers of each type (mAZ and sAZ) and AZ number for each biological replicate in both genotypes.

      (3) The authors need to add a caveat regarding their classification of synapses as "complex" vs. "simple" since this is a terminology that already exists in the field and it is not clear that these STORM images are measuring the same thing. For example, in EM studies, "complex" refers to multiple RGCs converging on the same single postsynaptic site. The authors here acknowledge that they cannot assign different AZs to different RGCs so this comparison is an assumption. In Figure 2 they argue this is a good assumption based on the finding that the Vglut2 volume per active zone is constant and therefore each represents a single RGC. However, the authors should acknowledge that they are actually seeing quite different percentages than those in EM studies. For example, in Monavarfeshani et al, eLife 2018, there were no complex synapses found at P8. (Note this study also found many more complex vs. simple synapses in the adult - 70% vs. the 20% found in the current study - but this difference could be a developmental effect). In the future, the authors may want to take another data set in the adult dLGN to make a direct comparison based on numbers and see if their classification method for complex/simple maps onto the one that currently exists in the literature.

      We appreciate the reviewer’s comment that the use of the terms “complex” and “simple” may cause confusion. We have significantly revised the manuscript for clarity: 1) We now refer to “complex” synapses as “multi-active zone synapses” and “simple” synapses as “single-active zone synapses”. 2) We have performed electron microscopy analysis of dAPEX2-labeled retinogeniculate projections to confirm the existence of large synaptic terminals with multiple active zones. 3) We have expanded our discussion of previous electron microscopy results describing a lack of axonal convergence at P8 (3). 4) We have added a discussion on how individual RGCs may form multiple synapses in close proximity within their axonal arbor, which would create a clustering effect.

      We agree that it will be informative to collect a STORM data set in the adult mouse dLGN and we look forward to working on this project to compare with EM results in the future.  

      (4) Figure 3 assays the relative distribution of simple vs. complex synapses. They found that a larger percentage of simple synapses were within 1.5 microns of complex synapses than you would expect by chance for both ipsi and contra projecting RGCs, and hence conclude that complex synapses are sites of synaptic clustering. In contrast, there was no clustering of ipsi-simple to contra-complex synapses and vice versa. The authors also argue that this clustering decreases between P4 and P8 for ipsi projecting RGCs.

      This analysis needs much more rigor before any conclusions can be drawn. First, the authors need to justify the 1.5-micron criteria for clustering and how robust their results are to variations in this distance. Second, these age effects need to be tested for statistical significance with an ANOVA (all the stats presented are pairwise comparisons to means expected by random distributions at each age). Finally, the authors should consider what n's to use here - is it still grouped by biological replicate? Why not use individual synapses across mice? If they do biological replicates, then they should again show error bars for each data point in their biological replicates. And they should include the number of synapses that went into these measurements in the caption.

      We appreciate the suggestion to improve the rigor of our analysis of synaptic clustering presented in Figure 3. We have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume. 

      We now present the revised results as a “clustering index” for both multi-AZ synapses and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. 4) Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, the measured differences in the degree of clustering reflect a synapse type-specific effect.
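      The randomization steps above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the uniform shuffle within a box, the number of shuffles, and the definition of the index as observed fraction divided by randomized fraction are all assumptions made for the example, since the response does not state how the comparison in step 3 is computed.

```python
import math
import random

def fraction_near(points, centers, radius, same_set=False):
    """Fraction of `points` lying within `radius` of at least one center.

    Set same_set=True when `points` and `centers` are the same list, so a
    point is never counted as its own neighbor.
    """
    if not points:
        return 0.0
    near = 0
    for i, p in enumerate(points):
        for j, c in enumerate(centers):
            if same_set and i == j:
                continue
            if math.dist(p, c) <= radius:
                near += 1
                break
    return near / len(points)

def clustering_index(singles, multis, bounds, radius=1.5, n_shuffles=200, seed=0):
    """Observed fraction of single-AZ synapses near multi-AZ synapses,
    divided by the mean fraction after shuffling single-AZ positions
    (multi-AZ positions stay fixed). The ratio form is an assumption."""
    rng = random.Random(seed)
    observed = fraction_near(singles, multis, radius)
    shuffled = []
    for _ in range(n_shuffles):
        rand = [tuple(rng.uniform(0.0, b) for b in bounds) for _ in singles]
        shuffled.append(fraction_near(rand, multis, radius))
    expected = sum(shuffled) / len(shuffled)
    return observed / expected if expected > 0 else float("inf")

# Hypothetical toy data (micrometers): single-AZ synapses placed close to
# two multi-AZ centroids inside a small 10 x 10 x 7 box.
multis = [(3.0, 3.0, 3.0), (7.0, 7.0, 4.0)]
singles = [(3.5, 3.0, 3.0), (3.0, 3.8, 3.0), (2.6, 3.1, 3.4),
           (7.4, 7.0, 4.0), (7.0, 6.5, 4.2), (6.8, 7.2, 3.7)]
index = clustering_index(singles, multis, bounds=(10.0, 10.0, 7.0))
```

      An index above 1 would indicate more single-AZ synapses near multi-AZ synapses than the shuffled positions produce. The same `fraction_near` call with `same_set=True` gives the single-AZ-centered measurement.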

      We have also updated Supplemental Figure 3 showing the results of varying the search radius from 1-4 μm for both contralateral- and ipsilateral-eye synapses. The results showed that a search radius of 1.5 μm resulted in the largest difference between the original synapse distribution and a randomized synapse distribution (shuffling of single-active zone synapse position while holding multi-active zone synapse position fixed).

      Finally, we have removed all statistical comparisons of single measurements (means or ratios) across ages from the manuscript. We focus our statistical analysis on paired data comparisons within individual biological replicates.

      For the analysis of synapse clustering, we grouped the data by biological replicates (N=3) to look for a global effect on synapse clustering. In the revised manuscript, we added data points for each replicate in the figure and included the number of synapses in Supplementary Table 1.

      (5) Line 211-212 - the authors conclude that the absence of clustered ipsi-simple synapses indicates a failure to stabilize (Figure 3). Yet, the link between this measurement and synapse stabilization is not clear. In particular, the conclusion that "isolated" synapses are the ones that will be eliminated seems to be countered by their finding in Figure 3D/E which shows that there is no difference in vesicle pool volume between near and far synapses. If isolated synapses are indeed the ones that fail to stabilize by P8, wouldn't you expect them to be weaker/have fewer vesicles? Also, it's hard to tell if there is an age-dependent effect since the data presented in Figures 3D/E are merged across ages.

      We thank the reviewer for their suggestion to clarify the results in Figure 3. Based on the measured eye-specific differences in vesicle pool size and organization, we also expected that synapses outside of clusters would show a reduced vesicle population. However, across all ages, we found no differences in the vesicle pool size of single-active zone synapses based on their proximity to multi-active zone synapses. Below, we show cumulative distributions of these results across all ages (P2/P4/P8) for WT mice CTB(+) data. Statistical tests (Kolmogorov-Smirnov tests) show no significant differences. P = 0.880, 0.767, 0.494 respectively. Separate 5/95% confidence interval calculations showed overlap between far and near populations at each age.

      Author response image 4.
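      For readers who want to reproduce this kind of near-versus-far comparison, the sketch below computes a two-sample Kolmogorov-Smirnov statistic with an asymptotic p-value using only the Python standard library. It is an illustration under stated assumptions: the response does not say which software was used, the small-sample correction is the common textbook approximation, and the example values are hypothetical rather than taken from the data set.

```python
import bisect
import math

def ks_two_sample(a, b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical CDFs. The p-value uses
    the asymptotic Kolmogorov distribution with a small-sample correction,
    so it is approximate for small samples.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / n
        cdf_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges poorly here; p is effectively 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical vesicle-pool volumes for "near" and "far" single-AZ synapses.
near = [0.11, 0.14, 0.09, 0.20, 0.17, 0.13, 0.16, 0.12]
far = [0.10, 0.15, 0.08, 0.21, 0.18, 0.12, 0.19, 0.13]
d_stat, p_value = ks_two_sample(near, far)
```

      A large p-value here only means the test did not detect a difference between the two distributions; it does not establish that they are identical.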

      To clarify the presentation of the results, we have changed the text to state that the “vesicle pool size of sAZ synapses is independent of their distance to mAZ synapses”. We have removed references to stabilization and punishment from the results section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Because none of the phenomena being measured can be expected to behave randomly (given what is already known about the system) and the sample size is small, I believe quantification of the data requires confidence intervals for effect sizes. Resolving the multi-bouton vs multi-active zone bouton with EM would also help.

      We thank the reviewer for their thorough reading of the manuscript and many helpful suggestions. We provide analysis with confidence intervals in a point-by-point response below. In the manuscript we revised our results and focused our statistical analyses on comparisons within the same biological replicate (paired effects). In addition, we have performed electron microscopy of RGC inputs to the dLGN at postnatal day 8 to demonstrate the presence of retinogeniculate synapses with multiple active zones.

      Figure 1:

      Please show data points in scatter bar plots and not just error bars.

      We have updated all plots to show data points for independent biological replicates.

      Please describe the image processing in more detail and provide an image in which the degree of off-target labeling can be evaluated.

      We have updated the description of the image processing in the methods sections. We have made all the code used in this analysis freely available on GitHub (https://github.com/SpeerLab). We have uploaded the raw STORM images of the full data set to the open-access Brain Imaging Library (16). These images can be accessed here: https://api.brainimagelibrary.org/web/view?bildid=ace-dud-lid (WTP2A data for example). All 18 datasets are currently searchable on the BIL by keyword “dLGN” or PI last name “Speer” and a DOI for the grouped dataset is pending.

      How does panel 1D get very small error bars with N = 3? Please provide scatter plots.

      We have updated panel 1D to show the means for each independent biological replicate.

      Line 129: over what volume is density measured? What are the n's? What is the magnitude (with confidence intervals) of increase?

      The volume we collected from each replicate was ~80 μm × 80 μm × 7 μm (total volume ~44,800 μm³). N=3 biological replicates for each age, genotype, and tissue location. Because of concerns with the use of ANOVA for low sample numbers, we have removed a majority of the age-wise comparisons from the manuscript and instead focus on within-replicate paired data comparisons. Author response image 5 below shows 5/95% confidence intervals for WT data (left panel) and β2KO data (right panel):

      Author response image 5.

      The 5/95% CI range for the increase in synapse density from P2 to P8 for CTB(+) synapses is approximately -0.001 to 0.037 synapses/μm³.
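      The volume and density arithmetic quoted above can be checked with a short sketch. The function names are hypothetical, the synapse count of 448 is an invented example rather than a value from Table S1, and the default dimensions are the approximate 80 μm × 80 μm × 7 μm stated in the response.

```python
def imaging_volume_um3(x_um=80.0, y_um=80.0, z_um=7.0):
    """Imaging volume in cubic micrometers for a rectangular field."""
    return x_um * y_um * z_um

def synapse_density(count, volume_um3):
    """Synapses per cubic micrometer."""
    if volume_um3 <= 0:
        raise ValueError("volume must be positive")
    return count / volume_um3

def count_for_density(density_per_um3, volume_um3):
    """Number of synapses that a given density implies in this volume."""
    return density_per_um3 * volume_um3

volume = imaging_volume_um3()                 # 80 * 80 * 7 = 44800.0
example_density = synapse_density(448, volume)  # hypothetical count
upper_bound_count = count_for_density(0.037, volume)
```

      Converting a density bound back to a count per imaging volume, as in the last line, is one way to read the magnitude of an interval expressed in synapses/μm³.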

      Line 131: You say that non-dominant increases and then decreases. It appears that the error bars argue that you do not have enough information to reliably determine how much or little density changes.

      Line 140: No confidence intervals. It appears the error bars allow both for the claimed effect of increased fraction and the opposite effect of decreased density.

      Because of concerns with the use of ANOVA for low sample numbers, we have removed age-wise comparisons of single-measurements (means and ratios) from the manuscript and instead focus on within-replicate paired data comparisons.

      Line 144: Confidence intervals would be a reasonable way to argue that fraction is not changed in KO: normal fraction XX%-XX%. KO fraction XX%-XX%.

      Author response image 6 shows panels for WT (left) and β2KO mice (right) with 5/95% CIs.

      Author response image 6.

      In the revised manuscript, we have updated the text to report the measurements, but we do not draw conclusions about changes over development.

      I find it hard to estimate magnitudes on a log scale.

      We appreciate the reviewer’s concern with the presentation of results on a log scale. Because the measured synapse properties are distributed logarithmically, we have elected to present the data on a log scale so that the distribution(s) can be seen clearly. Lognormal distributions enable us to use a mixed linear model for statistical analysis.

      Line 156: Needs confidence interval for difference.

      Line 158: Needs confidence interval for difference of differences.

      Line 160: Needs confidence interval for difference of differences.

      Why only compare at P4 where there is the biggest difference? The activity hypothesis would predict an even bigger effect at P8.

      Below is a table listing the mean volume (log₁₀ μm³) and [5/95%] confidence intervals for comparisons of VGluT2 signal between CTB(+) and CTB(-) synapses from Figure 2A and 2B:

      Author response table 2.

      Based on the values given above, the mean difference of differences and [5/95%] confidence intervals are listed below:

      Author response table 3.

      We added these values to the manuscript. We have also reported the difference in median values on a linear scale (as below) so that the readers can have a straightforward understanding of the magnitude.

      Author response table 4.

      We elected to highlight the results at P4 based on our previous finding that the synapse density from each eye-of-origin is similar at this time point (1).

      At P8, there is a decrease in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to an increase in VGluT2 volume within non-dominant eye synapses that survive competition between P4-P8.

      At P8 in the mutant, there is an increase in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to delayed synaptic maturation in β2KO mice.

      Line 171: The correct statistical comparison was not performed for the claim. Lack of * at P2 does not mean they are the same. Why do you get the same result for KO?

      We have revised the statistical analysis, figure presentation, and text to remove discussion of changes in the number of active zones per synapse over development based on ANOVA. We now report eye-specific differences at each time point using paired T-test analysis, which is mathematically equivalent to comparing the 5/95% confidence interval in the difference.
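      The stated equivalence between a paired t-test and a confidence interval on the paired difference can be illustrated as follows. This is a sketch under assumptions: the replicate values are hypothetical, the interval is taken to be a two-sided 90% interval (5% to 95%), and the critical value 2.920 is the tabulated Student's t value for 2 degrees of freedom, which applies only when there are exactly three pairs.

```python
import math
import statistics

# Two-sided 90% critical value of Student's t with 2 degrees of freedom.
T_CRIT_DF2_90 = 2.920

def paired_summary(x, y, t_crit=T_CRIT_DF2_90):
    """Return (t statistic, (lower, upper)) for the paired differences y - x.

    The default t_crit is valid only for three pairs (df = 2).
    """
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = mean_d / sem
    return t_stat, (mean_d - t_crit * sem, mean_d + t_crit * sem)

def interval_excludes_zero(ci):
    lower, upper = ci
    return lower > 0 or upper < 0

# Hypothetical per-replicate measurements for two synapse populations.
ctb_neg = [1.0, 2.0, 3.0]
ctb_pos = [2.0, 3.5, 4.2]
t_stat, ci = paired_summary(ctb_neg, ctb_pos)
```

      Because the interval is the mean difference plus or minus `t_crit` times the standard error, it excludes zero exactly when the absolute t statistic exceeds `t_crit`, which is the test decision at the matching significance level.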

      Line 175: Qualitative claim. Correlation coefficients and magnitudes of correlation coefficients are not reported.

      Linear fit slopes and R-squared values are given below:

      Author response table 5.

      The values are added to the manuscript to support the conclusions.
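      For reference, a slope and R-squared value of the kind reported in the table can be obtained from an ordinary least-squares fit. The sketch below is illustrative only: the fitting method used by the authors is not stated, and the x and y values are hypothetical stand-ins (for example, active zone number and VGluT2 volume).

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). For a simple linear fit,
    R-squared equals the squared Pearson correlation coefficient.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data: number of active zones vs. vesicle pool volume.
az_number = [1, 2, 3, 4]
vglut2_volume = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r_squared = linear_fit(az_number, vglut2_volume)
```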

      Line 177: n.s. does not mean that you have demonstrated the values are the same. An argument for similarity could be made by calculating a confidence interval for a potential range of differences. Example: Complex were 60%-170% of Simple.

      Author response image 7 with 5/95% CI is shown below (WT and B2KO):

      Author response image 7.

      Comparing the difference between multi-AZ synapse and single-AZ synapse revealed that the difference in average VGluT2 cluster volume per AZ is:

      Author response table 6.

      The values are added to the manuscript for discussion.

      Line 178: There is no reason to think that the vesicle pool for a single bouton does not scale with active zone number within the range of uncertainty presented here.

      We have collected EM images of multi-AZ synapses and modified our discussion and conclusions in the revised text.

      Line 196: "non-random clustering increased progressively" is misleading. The density of the boutons increases for both the Original and Randomized. Given the increase in variance at P8, it is unlikely that the data supports the claim that the non-randomness increased. Would be easy to quantify with confidence intervals for a measure of specificity (O/R).

      We have revised the manuscript to remove analysis and discussion of changes in clustering over development. We have modified this section of the manuscript and figures to present a normalized clustering index that describes the non-random clustering effect present at each time point.

      Line 209: Evidence is for correlation, not causation and there is a trivial potential explanation for correlation.

      We appreciate the reviewer’s concern with over interpretation of the results. We have changed the text to more accurately reflect the data.

      Line 238-239: Authors failed to show effect is activity-dependent. Near/Far distinction is not necessarily a criterion for the effect of activity. The claim is likely false in other systems.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to more accurately reflect the data. 

      Line 265-266: Assumes previous result is correct and measure of vGlut2 provides information about all presynaptic protein organization.

      We thank the reviewer for pointing out the incorrect reference to all presynaptic protein organization. We have corrected the text to reference only the VGluT2 and Bassoon signals that were measured.

      Line 276: There are many other interpretations that include trivial causes. It is unclear what the measure indicates about the biology and there is no interpretable magnitude of effect.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to remove references to mechanisms of synaptic stabilization.

      Line 289: Differences cannot be demonstrated by comparing P-values. Try comparing confidence intervals for effect size or generate a confidence interval for the difference between the two groups.

      5/95% confidence intervals are given below for Figure 4C/D:

      Author response table 7.

      We have added these values to the manuscript to support our conclusion.

      Line 305: "This suggests that complex synapses from the non-dominant-eye do not exert a punishment effect on synapses from the dominant-eye" Even if all the other assumptions in this claim were true, "n.s." just means you don't know something. It cannot be compared with an asterisk to claim a lack of effect.

      We thank the reviewer for raising this concern. We have modified the text to remove references to synaptic punishment mechanisms in the results section.

      Below are the 5/95% confidence intervals for the results in Figure 4F:

      Author response table 8.

      We have added these values to the manuscript to support our conclusion.

      Line 308: "mechanisms that act locally". 6 microns is introduced based on differences in curves above(?). I don't see any analysis that would argue that longer-distance effects were not present.

      The original reference referred to the differences in the cumulative distribution measurements between multi-active zone synapses versus single-active zone synapses in their distance to the nearest neighboring multi-active zone synapse. For clarity, we have deleted the reference to the 6 micron distance in the revised text.

      Reviewer #2 (Recommendations For The Authors):

      (1) This data set would be valuable to the community. However, unless the authors can show experiments that manipulate the presence of complex synapses to test their concluding claims, the manuscript should be rewritten with a reassessment of the conclusions that is more grounded in the data.

      We thank the reviewer for their careful reading of the manuscript and we agree the original interpretations were not causally supported by the experimental results. We have made substantial changes to the text throughout the introduction, results, and discussion sections so that the conclusions accurately reflect the data.

      (2) To convincingly address the claim that "complex synapse" are aggregates of simple synapses, the authors should perform experiments at the EM level showing what the bouton correlates are to these synapses.

      We thank the reviewer for their suggestion to perform EM to gain a better understanding of retinogeniculate terminal structure. We generated an RGC-specific transgenic line expressing the EM reporter dAPEX2 localized to mitochondria. We have collected EM images of retinogeniculate terminals that demonstrate the presence of multiple active zones within individual synapses. These results are now presented in Figure 1. The text has been updated to reflect the new results.

      (3) Experiments using the conditional β2KO mice would help address questions of the contribution of β2-nAChRs in dLGN to the synaptic phenotype.

      We appreciate the reviewer’s concern that the germline β2KO model may show effects that are not retina-specific. To address this, Xu and colleagues generated a retina-specific conditional β2KO transgenic and characterized wave properties and defective eye-specific segregation at the level of bulk axonal tracing (6). The results from the conditional mutant study suggest that the main effects on eye-specific axon refinement in the germline β2KO model are likely of retinal origin through impacts on retinal wave activity. Additionally, anatomical data shows that brainstem cholinergic axons innervate the dLGN toward the second half of eye-specific segregation and are not fully mature at P8 when eye-specific refinement is largely complete (7). We agree with the reviewer that future synaptic studies of previously published wave mutants, including the conditional reporter line, would be needed to conclusively assess a contribution of non-retinal nAChRs. These experiments will take significant time and resources and we respectfully suggest this is beyond the scope of the current manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to be more transparent that they are using the same data set from the previous publication (right now it does not appear until line 471) and clarify what was found in that study vs what is being tested here.

      We thank the reviewer for their thoughtful reading of the manuscript and helpful recommendations to improve the clarity of the work. We have edited the text to make it clear that this study is a reanalysis of an existing data set. We have revised the text to discuss the results from our previous study and more clearly define how the current analysis builds upon that initial work. 

      (2) The authors restricted their competition argument in Figure 4 to complex synapses, but why not include the simple ones? This seems like a straightforward analysis to do.

      We appreciate the reviewer’s suggestion to measure spatial relationships between “clustered” and “isolated” single-AZ synapses as we have done for multi-AZ synapses in Figure 4. However, we are not able to perform a direct and interpretable comparison with the results shown for multi-AZ synapses. First, we would need to classify “clustered” and “isolated” single-AZ synapses. This classification convolves two effects: 1) a distance threshold to define clustering and 2) subsequent distance measurements between clustered synapses.

      If we apply an equivalent 1.5 μm distance threshold (or any other threshold) to define clustered synapses, the distance from each “clustered” single-AZ synapse to the nearest other single-AZ synapse will always be smaller than the defined threshold (1.5 μm). Alternatively, if all of the single-AZ synapses within each local 1.5 μm shell are excluded from the subsequent intersynaptic distance measurements, this will set a hard lower boundary on the distance between synaptic clusters (1.5 μm minimum). The two effects discussed above were separated in our original analysis of multi-AZ synapses defined as “clustered” and “isolated” based on their relationship to single-AZ synapses, but these effects cannot be separated when analyzing single-AZ distributions alone.
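      The confound described above can be seen in a small simulation. This is a hypothetical sketch, not the authors' analysis: the points are uniformly random, the box size is arbitrary, and "clustered" is defined here as having another point within the threshold.

```python
import math
import random

def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def split_by_threshold(points, threshold=1.5):
    """Split nearest-neighbor distances into 'clustered' (<= threshold)
    and 'isolated' (> threshold) groups."""
    distances = nearest_neighbor_distances(points)
    clustered = [d for d in distances if d <= threshold]
    isolated = [d for d in distances if d > threshold]
    return clustered, isolated

# Hypothetical single-AZ positions (micrometers) in a 20 x 20 x 7 box.
rng = random.Random(1)
points = [(rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 7))
          for _ in range(200)]
clustered, isolated = split_by_threshold(points)
```

      By construction, every "clustered" distance is at or below the threshold and every "isolated" distance is above it, so comparing nearest-neighbor distances between the two groups mostly recovers the threshold itself.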

      (3) The Discussion seems much too long and speculative from the current data that is represented - particularly without verification of complex synapses actually being inputs from different RGCs. Along the same lines, figure captions are misleading. For example, for Figure 4 - the title indicates that the complex synapses are driving the rearrangements. But of course, these are static images. The authors should use titles that are more reflective of their findings rather than this interpretation.

      We thank the reviewer for these helpful suggestions. We have changed each of the figure captions to more accurately reflect the results. We have deleted all of the speculative discussion and revised the remaining text to improve the accuracy of the presentation.

      (4) In the future, the authors may want to consider an analysis as to whether ipsi and contra projection contribute to the same synapses

      We agree with the reviewer that it is of interest to investigate the contribution of binocular inputs to retinogeniculate synaptic clusters during development. At maturity, some weak binocular input remains in the dominant-eye territory (15). To look for evidence of binocular synaptic interactions, we measured the percentage of the total small single-active zone synapses that were within 1.5 micrometers of larger multi-active zone synapses of the opposite eye. On average, ~10% or less of the single-active zone synapses were near multi-active zone synapses of the opposite eye. This analysis is presented in Supplemental Figure S3C/D.

      It is possible that some large mAZ synapses might reflect the convergence of two or more smaller inputs from the two eyes. Our current analyses do not rule this out. However, previous EM studies have found limited evidence for convergence of multiple RGCs (3) at P8 and our own EM images show that larger terminals with multiple active zones are formed by a single RGC bouton. Future volumetric EM reconstructions with eye-specific labels will be informative to address this question.

      References

      (1) Zhang C, Yadav S, Speer CM. The synaptic basis of activity-dependent eye-specific competition. Cell Rep. 2023;42(2):112085.

      (2) Bickford ME, Slusarczyk A, Dilger EK, Krahe TE, Kucuk C, Guido W. Synaptic development of the mouse dorsal lateral geniculate nucleus. J Comp Neurol. 2010;518(5):622-35.

      (3) Monavarfeshani A, Stanton G, Van Name J, Su K, Mills WA, 3rd, Swilling K, et al. LRRTM1 underlies synaptic convergence in visual thalamus. Elife. 2018;7.

      (4) Campbell G, Shatz CJ. Synapses formed by identified retinogeniculate axons during the segregation of eye input. J Neurosci. 1992;12(5):1847-58.

      (5) Hong YK, Park S, Litvina EY, Morales J, Sanes JR, Chen C. Refinement of the retinogeniculate synapse by bouton clustering. Neuron. 2014;84(2):332-9.

      (6) Xu HP, Burbridge TJ, Chen MG, Ge X, Zhang Y, Zhou ZJ, et al. Spatial pattern of spontaneous retinal waves instructs retinotopic map refinement more than activity frequency. Dev Neurobiol. 2015;75(6):621-40.

      (7) Sokhadze G, Seabrook TA, Guido W. The absence of retinal input disrupts the development of cholinergic brainstem projections in the mouse dorsal lateral geniculate nucleus. Neural Dev. 2018;13(1):27.

      (8) Dhande OS, Hua EW, Guh E, Yeh J, Bhatt S, Zhang Y, et al. Development of single retinofugal axon arbors in normal and beta2 knock-out mice. J Neurosci. 2011;31(9):3384-99.

      (9) Rossi FM, Pizzorusso T, Porciatti V, Marubio LM, Maffei L, Changeux JP. Requirement of the nicotinic acetylcholine receptor beta 2 subunit for the anatomical and functional development of the visual system. Proc Natl Acad Sci U S A. 2001;98(11):6453-8.

      (10) Muir-Robinson G, Hwang BJ, Feller MB. Retinogeniculate axons undergo eye-specific segregation in the absence of eye-specific layers. J Neurosci. 2002;22(13):5259-64.

      (11) Fredj NB, Hammond S, Otsuna H, Chien C-B, Burrone J, Meyer MP. Synaptic Activity and Activity-Dependent Competition Regulates Axon Arbor Maturation, Growth Arrest, and Territory in the Retinotectal Projection. J Neurosci. 2010;30(32):10939.

      (12) Hua JY, Smear MC, Baier H, Smith SJ. Regulation of axon growth in vivo by activity-based competition. Nature. 2005;434(7036):1022-6.

      (13) Rahman TN, Munz M, Kutsarova E, Bilash OM, Ruthazer ES. Stentian structural plasticity in the developing visual system. Proc Natl Acad Sci U S A. 2020;117(20):10636-8.

      (14) Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 1999;28(2):49–60.

      (15) Bauer J, Weiler S, Fernholz MHP, Laubender D, Scheuss V, Hübener M, et al. Limited functional convergence of eye-specific inputs in the retinogeniculate pathway of the mouse. Neuron. 2021;109(15):2457-68.e12.

      (16) Benninger K, Hood G, Simmel D, Tuite L, Wetzel A, Ropelewski A, et al. Cyberinfrastructure of a Multi-Petabyte Microscopy Resource for Neuroscience Research. Practice and Experience in Advanced Research Computing; Portland, OR, USA: Association for Computing Machinery; 2020. p. 1–7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the use of antimony has been discontinued in India, the observation that there are Leishmania parasites that are resistant to antimony in circulation has been cited as evidence that these resistant parasites are now a distinct strain with properties that ensure their transmission and persistence. It is of interest to determine what are the properties that favor the retention of their drug resistance phenotype even in the absence of the selective pressure that would otherwise be conferred by the drug. The hypothesis that these authors set out to test is that these parasites have developed a new capacity to acquire and utilize lipids, especially cholesterol which affords them the capacity to grow robustly in infected hosts.

      We sincerely appreciate Reviewer 1's thoughtful and positive evaluation of our manuscript. We acknowledge that the reviewer has a few major concerns, and we would like to address them one by one in the following section.

      Major issues:

      (1) There are several experiments for which they do not provide sufficient details, but proceed to make significant conclusions.

      Experiments in section 5 are poorly described. They supposedly isolated PVs from infected cells. No details of their protocol for the isolation of PVs are provided. They reference a protocol for PV isolation that focused on the isolation of PVs after L. amazonensis infection. In the images of infection that they show, by 24 hrs, infected cells harbor a considerable number of parasites. Is it at the 24 hr time point that they recover PVs? What is the purity of PVs? The authors should provide evidence of the success of this protocol in their hands. Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We would like to thank the reviewer for correctly pointing out the lack of details regarding PV isolation and its purity. The reviewer raises multiple questions, which we answer one by one below:

      Firstly, “Is it at the 24 hr time point that they recover PVs?”

      In the ‘Methods’ section of the original submission (Lines 606-611), there is a separate section on “Parasitophorous vacuole (PV) Isolation and cholesterol measurement”, where it is clearly mentioned, “24Hrs LD infected KCs were lysed by passing through a 22-gauge syringe needle to release cellular contents. Parasitophorous vacuoles (PV) were then isolated using a previously outlined protocol [Ref: 73].” However, we acknowledge that further details would enrich this section, and we have therefore included the following details in the Methods section of the revised manuscript (Lines 663-678): “Parasitophorous vacuoles (PV) were isolated using a previously outlined protocol with slight modifications [76]. 10⁷ KCs were seeded in a 100 mm plate and allowed to adhere for 24Hrs. Following this, infection was performed with Leishmania donovani (LD) for 24Hrs; the infected KCs were then harvested by gentle scraping and lysed through five successive passages through an insulin needle to ensure membrane disruption while preserving organelle integrity. The lysate was centrifuged at 200 × g for 10mins at 4°C to remove intact cells and large debris. The resulting supernatant was carefully collected and subjected to a discontinuous sucrose density gradient (60%, 40%, and 20%). The gradient was centrifuged at 700 × g for 25mins at 4°C to facilitate organelle separation. The interphase between the 40% and 60% sucrose layers, enriched with PVs, was carefully collected and subjected to a final centrifugation step at 12,000 × g for 25mins at 4°C. The supernatant was discarded, and the resulting pellet was enriched for purified parasitophorous vacuoles, suitable for downstream biochemical and molecular analyses. Cholesterol and protein contents in PV were determined by an Amplex Red assay kit and Bradford assay, respectively. Resulting data were represented as micrograms of cholesterol per microgram of protein.”

      Secondly, What is the purity of PVs? Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We thank the reviewer for pointing out this critical lack of data in the submitted manuscript. In the revised manuscript, we now provide data on the purity of the isolated fraction, obtained by confocal imaging and Western blot of the PV and cytoplasmic fractions. As the reviewer rightly points out, we needed to assess the purity of the isolated PVs in our experiment. As suggested, we have included the results of this experiment in Figure 3C i, C ii and C iii. Our results show efficient PV isolation, with LAMP-1-positive staining demarcating LD amastigotes, which was further validated by Western blot showing significant enrichment of LAMP-1 specifically in the PV fraction. This has been included in the revised manuscript (Lines 225-234), which reads: “Parasitophorous vacuole fractions were isolated from LD-S and LD-R-infected KCs at 24Hrs p.i. using a previously established protocol [35]. Following isolation, PV purity was confirmed through LAMP-1 staining which showed a significant enrichment around isolated PV in Confocal microscopy (Figure 3C i). Purity of isolated PV fractions was further confirmed by Western blot which showed an enhanced enrichment of LAMP-1 for LD-R-PV fraction as compared to LD-S-PV fraction, while PV excluded cellular fraction showed residual LAMP-1 expression confirming the purity of the isolated PV fractions (Figure 3C ii, iii). Following isolation, protein concentration was measured for isolated PV fractions using the Bradford assay, and PV fractions from both LD-S- and LD-R-infected KCs were normalized accordingly.”

      (2) In section 6 they evaluate the mechanism of LDL uptake in macrophages. Several approaches and endocytic pathway inhibitors are employed. The authors must be aware that the role of cytochalasin D in the disruption of fluid phase endocytosis is controversial. Although they reference a study that suggests that cytochalasin D has no effect on fluid-phase endocytosis, other studies have found the opposite (doi: 10.1371/journal.pone.0058054). It wasn't readily evident what concentrations were used in their study. They should consider testing more than 1 concentration of the drug before they make their conclusions on their findings on fluid phase endocytosis.

      We thank the reviewer for this insightful comment and we apologise for not mentioning the Cytochalasin-D concentration. To clarify, LDL uptake by LD-R infected KCs is LDL-receptor independent, as clearly shown in Section 6, Figure 4A, Figure S4A, Figure S4B i and Figure S4B ii in the submitted manuscript. In Figure 4F and Figure S4D of the submitted manuscript, as referred to by the Reviewer, Cytochalasin-D was used at a concentration of 2.5µg/ml. At this concentration, we did not observe any effect of Cytochalasin-D on LDL-receptor independent fluid phase endocytosis, as intracellular LD-R amastigotes were able to take up LDL successfully and proliferate in infected Kupffer cells, unlike Latrunculin-A (5µM) treatment, which completely inhibited intracellular proliferation of LD-R amastigotes by blocking only receptor independent fluid phase endocytosis (Video 2A and 2B and Figure 4E in the submitted manuscript). In fact, the study referred to by the reviewer (doi: 10.1371/journal.pone.0058054) used a concentration of 4µg/ml Cytochalasin-D, which did affect both LDL-receptor dependent and receptor independent endocytosis in bone marrow derived macrophages. We would also like to clarify that during our preliminary experiments we also tested a higher concentration of Cytochalasin-D (5µg/ml). However, even at this higher concentration there was no significant effect of Cytochalasin-D on LD-R induced LDL-receptor independent fluid phase endocytosis, as observed from the intracellular LD-R amastigote count. Thus, we strongly believe that Cytochalasin-D does not have any impact on LD-R induced fluid phase endocytosis even at higher concentration. We have now included this data as Figure 4F and Figure S4E in the revised manuscript. Further, to avoid any confusion for readers, the concentrations of all inhibitors used in the study are now mentioned in the Results section (Line 278 and 284), as well as in the revised figure labels.

      (3) In Figure 5 they present a blot that shows increased Lamp1 expression from as early as 4 hrs after infection with LD-R and by 12 hrs after infection of both LD-S and LD-R. Increased Lamp1 expression after Leishmania infection has not been reported by others. By what mechanism do they suggest is causing such a rapid increase (at 4hrs post-infection) in Lamp-1 protein? As they report, their RNA seq data did not show an increase in LAMP1 transcription (lines 432-434).

      We would like to express our gratitude to the reviewer for highlighting the novelty of this observation. Indeed, to the best of our knowledge, no similar findings (we could not find reference to any quantitative Western blot for LAMP-1) have been reported previously in primary macrophages infected with Leishmania donovani (LD). Firstly, we would like to point out, as stated in the Methods section (Lines 556–566) of the submitted manuscript: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This indicates that our actual study points correspond to approximately the 8th and 28th hours post-infection. We mention this only to prevent any potential confusion about the time points.

      Regarding LAMP-1 expression, although we could not find any previous reports of its expression in LD-infected primary macrophages, we would like to mention a previous report (doi.org/10.1128/mBio.01464-20) that shows a similar punctate LAMP-1 upregulation (as observed by us in Figure 5A i of the submitted manuscript) in response to Leishmania infection in nonphagocytic fibroblasts. It is tempting to speculate that the increased LAMP-1 expression observed in LD-R infected macrophages might be due to increased lysosomal biogenesis, required for degrading the increased endocytosed LDL into bioavailable cholesterol. However, since the RNA seq data showed no change in LAMP-1 expression (Figure 6 of the submitted manuscript), we can only speculate that this occurs through some post-transcriptional or post-translational modification. Further work will be required to investigate this mechanism in detail, which is beyond the scope of this work. That is why, in the submitted manuscript (Line 432-435), we discussed this as follows: “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation.”

      However, we agree with the reviewer that this might not be enough for the clarification. Hence in the revised manuscript this has been updated in the Discussion section (Line 465-472) as follows, “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. How, LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation. It is possible and has been earlier reported that LD infection can regulate host proteins expression through post transcriptional and post translational modifications [61-63]. It is tempting to speculate that LD-R amastigote might be promoting an increased lysosomal biogenesis through any such mechanism to increase supply of bioavailable cholesterol through action of lysosomal acid hydrolases on LDL.”

      (4) In Figure 6, amongst several assays, they reported on studies where NPC-1 is knocked down in PECs. They failed to provide any evidence of the success of the knockdown, but nonetheless showed greater LD-R after NPC-1 was knocked down. They should provide more details of such experiments.

      Although we understand the concern raised by the reviewer, the statement in question is factually incorrect. We would like to point out that in Figure 6F i of the submitted manuscript (Figure 6G ii in the revised manuscript), we demonstrated decreased NPC-1 staining following transfection with NPC-1-specific siRNA, whereas no such reduction was observed with scrambled RNA. Similar immunofluorescence data confirming LDL-receptor knockdown were also provided in Figure S4B i of the submitted manuscript (Figure S4B ii in the revised manuscript). However, we acknowledge that the reviewer may be referring to the lack of quantitative validation of the knockdown via Western blot. We would like to clarify that, although we already had this data, we did not include it in order to avoid duplication and to reduce the data density of the manuscript. As suggested by the reviewer, we have now included Western blots for both NPC-1 and LDL-receptor knockdown in the revised manuscript as Figure 6G i and Figure S4B i, which again confirm efficient knockdown of NPC-1 and LDLr, as we observed with IFA.

      Additionally, as suggested by the reviewer, we also noticed lack of details in Methods section of the  submitted manuscript, concerning siRNA mediated Knock down (KD). Therefore, we have included more details in the revised manuscript (Line 821-828), which read as, “For all siRNA transfections, Lipofectamine® RNAiMAX Reagent (Life Technologies, 13778100) specifically designed for knockdown assays in primary cells was used according to the manufacturer's instructions with slight modifications. PECs were seeded into 24-well plates at a density of 1x10<sup>5</sup> per well, and incubated at 37°C with 5% CO2. The transfection complex, comprising (1µl Lipofectamine® RNAiMAX and 50µl Opti MEM) and (1 µl siRNA and 50µl Opti MEM) mixed together directly added to the incubated PECs. Gene silencing was checked by IFA and by Western blot as mentioned previously.”

      Minor issues

      (1) There is an implication that parasite replication occurs well before 24hrs post-infection?

      Studies on Leishmania parasite replication have reported on the commencement of replication after 24hrs post-infection of macrophages (PMCID: PMC9642900). Is this dramatic increase in parasite numbers that they observed due to early parasite replication?

      We thank the reviewer for this insightful comment and appreciate the opportunity to clarify our findings. Indeed, as rightly assumed by the Reviewer and as our data suggest, we believe that this increase in intracellular amastigote number is a consequence of early replication of Leishmania donovani. As already mentioned in our response to Point 3 raised by Reviewer 1, we would again like to highlight that the Methods section (Lines 562–566) clearly states: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This effectively means that our actual study points correspond to approximately the 8th and 28th hours post-infection; we mention this to avoid any confusion regarding experimental time points.

      Regarding the specific concern about Leishmania parasite replication, we would like to point out that the study referred to by the reviewer, on the commencement of replication after 24hrs, was conducted on Leishmania major, which may differ significantly from Leishmania donovani owing to species- and strain-specific characteristics (PMCID: PMC9642900). In fact, the doubling time of Leishmania donovani (LD) has previously been reported to be approximately 11.4 hours (doi: 10.1111/j.1550-7408.1990.tb01147.x). Moreover, multiple studies have indicated an exponential increase in intracellular LD amastigote number (more than two-fold increase) by 24Hrs post infection (doi:10.1128/AAC.0119607, doi.org/10.1016/j.ijpara.2011.07.013). We made a similar observation for both infected PEC and KC (as depicted in Figure 1C and Figure S1C in the submitted and revised manuscript), indicating that active replication is happening in this time frame for Leishmania donovani. Hence it was an informed decision on our part to focus on the 24Hrs time point for the analysis of intracellular LD proliferation.
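
      The arithmetic behind this argument can be checked with a short sketch. It assumes unrestricted exponential growth at the cited doubling time of approximately 11.4 hours, which is a simplification; the function names are illustrative and not taken from the manuscript.

```python
import math


def fold_increase(hours: float, doubling_time_h: float) -> float:
    """Expected fold increase after `hours` of unrestricted exponential growth."""
    return 2.0 ** (hours / doubling_time_h)


def implied_doubling_time(hours: float, fold: float) -> float:
    """Doubling time implied by an observed fold increase over `hours`."""
    if fold <= 1.0:
        raise ValueError("fold increase must be greater than 1")
    return hours * math.log(2.0) / math.log(fold)


# With the cited doubling time, 24 hours allows slightly more than two doublings.
print(round(fold_increase(24, 11.4), 2))
```

      Under these assumptions a 24-hour window corresponds to a little over a fourfold increase, which is consistent with the "more than two-fold increase" by 24Hrs reported in the studies cited above.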

      (2) Several of the fluorescence images in the paper are difficult to see. It would be helpful if a blown-up (higher magnification image of images in Figure 1 (especially D) for example) is presented.

      We apologise for the inconvenience. We have provided zoomed images for several other figures in the submitted and revised manuscript, such as Figure 4, Figure 5, Figure 6 and Figure 8. However, this was not feasible for all figures (for example, Figure 1D) due to lack of space and figure arrangement requirements. To accommodate the Reviewer’s request, we have provided a blown-up image for Figure 1D iii in the revised manuscript.

      (3) The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection. Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.

      Reviewer has raised two independent concerns and we would like to address them individually.

      Firstly, “The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection.”

      We have already provided a detailed justification for the time point selection in our response to Reviewer 1, Minor Comment 1. As mentioned there, we observed a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post-infection in KC, whereas the number did not increase proportionally (did not double) after that (Figure 1C in the revised manuscript). This early stage of rapid replication of LD amastigotes therefore likely coincides with a critical period of lipid acquisition by intracellular amastigotes (Video 3A and 3B and Figure 4E in the submitted and revised manuscript), and thus 24Hrs-infected KCs were specifically selected. In this regard, we would further like to add that at 72Hrs post-infection, a notable number of infected Kupffer cells began detaching from the wells, with extracellular amastigotes probably egressing. This phenomenon potentially reflects the severe impact of prolonged infection on Kupffer cell viability and adhesion properties, as shown in Video 2 in the revised manuscript and Author response image 1. This observation further influenced our decision to conclude all infection studies in Kupffer cells by 48Hrs post-infection, which made it necessary to end the infection period at 24Hrs in order to allow Amp-B treatment for another 24Hrs (Figure 8 and Figure S5 in the submitted and revised manuscript). We acknowledge that we should have been clearer about our selection of infection time points and, as the Reviewer has suggested, we have included this information in the revised manuscript (Line 134-141) for the reader's clear understanding. This reads as, “Interestingly, as compared to a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post infected KC in response to LD-R infection, the number of intracellular amastigotes although increased significantly did not doubled from 24Hrs to 48Hrs p.i. suggesting exponential LD amastigote replication between 4Hrs and 24Hrs time frame and slowing down after that (Figure 1Ci, ii). Moreover, it was also noticed that at 72Hrs p.i. a notable number of infected-KC began detaching from the wells with extracellular amastigotes probably egressing out from the infected-KCs (Video 2). Thus, 24Hrs time point was selected to conduct all further infection studies involving KCs.”

      Author response image 1.

      Representative images of Kupffer cells infected with Leishmania donovani at 72Hrs post-infection showing a significant morphological change. Infected cells exhibit a rounded morphology and progressive detachment. Scale bar 10µm.

      Secondly “Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.”

      We apologize for not providing an explanation regarding the selection of the 11-day time point for Apoe<sup>-/-</sup> experiments (Figure 2 of the submitted and revised manuscript). Our rationale for this choice is based on both previous literature and the specific objectives of our study. A previous report suggests that Leishmania donovani infection in hypercholesterolemic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks post-infection compared to C57BL/6 mice, leading to more efficient parasite clearance. This is owing to the unique membrane composition of Apoe<sup>-/-</sup> mice, which rectifies Leishmania-mediated defective antigen presentation at a later stage of infection (DOI 10.1194/jlr.M026914). Additionally, previous studies have indicated that Leishmania donovani infection is well-established in vivo within 6 to 11 days post-infection in murine models (doi: 10.1128/AAC.47.5.1529-1535.2003). Given that in this experiment we specifically aimed to assess the early infection status (parasite load) in diet-induced hypercholesterolemic mice, we would argue that the selection of the 11-day time point was rational and well-aligned with our study objectives, as time points within this window are optimal for capturing the initial parasite burden, which depends on initial lipid utilization, before host-driven immune clearance mechanisms could significantly alter infection dynamics. We have included this explanation in the revised manuscript (Line 170-179) as suggested by the Reviewer, and this reads as, “Previous report has suggested that LD infection in hypercholesteremic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks’ post-infection compared to wild type BL/6 mice, leading to more efficient parasite clearance. This is owing to unique membrane composition of Apoe<sup>-/-</sup> which rectifies leishmania mediated defective antigen presentation at a later stage of LD infection [20]. Additionally, previous studies have also indicated that LD infection is well-established in mice within 6 to 11 days post-infection in murine models [33]. Thus to evaluate impact of initial lipid utilization on LD amastigote replication in vivo, BL/6 and diet-induced hypercholesterolemic Apoe<sup>-/-</sup> mice were infected with GFP expressing LD-S or LD-R promastigotes and sacrificed 11 days p.i.”

      Reviewer #2 (Public review):

      Summary:

      This study by Pradhan et al. offers critical insights into the mechanisms by which antimony-resistant Leishmania donovani (LD-R) parasites alter host cell lipid metabolism to facilitate their own growth and, in the process, acquire resistance to amphotericin B therapy. The authors illustrate that LD-R parasites enhance LDL uptake via fluid-phase endocytosis, resulting in the accumulation of neutral lipids in the form of lipid droplets that surround the intracellular amastigotes within the parasitophorous vacuoles (PV) that support their development and contribute to amphotericin B treatment resistance. The evidence provided by the authors supporting the main conclusions is compelling, presenting rigorous controls and multiple complementary approaches. The work represents an important advance in understanding how intracellular parasites can modify host metabolism to support their survival and escape drug treatment.

      We sincerely thank the reviewer for appreciating our work and for finding the evidence compelling in addressing the emergence of drug resistance in infection with intracellular protozoan pathogens.

      Strengths:

      (1) The study utilizes clinical isolates of antimony-resistant L. donovani and provides interesting mechanistic information regarding the increased LD-R isolate virulence and emerging amphotericin B resistance.

      (2) The authors have used a comprehensive experimental approach to provide a link between antimony-resistant isolates, lipid metabolism, parasite virulence, and amphotericin B resistance. They have combined the following approaches:

      a) In vivo infection models involving BL/6 and Apoe-/- mice.

      b) Ex-vivo infection models using primary Kupffer cells (KC) and peritoneal exudate macrophages (PEC) as physiologically relevant host cells.

      c) Various complementary techniques to ascertain lipid metabolism including GC-MS, Raman spectroscopy, microscopy.

      d) Applications of genetic and pharmacological tools to show the uptake and utilization of host lipids by the infected macrophage resident L. donovani amastigotes.

      (3) The outcome of this study has clear clinical significance. Additionally, the authors have supported their work by including patient data showing a clear clinical significance and correlation between serum lipid profiles and treatment outcomes.

      (4) The present study effectively connects the basic cellular biology of host-pathogen interactions with clinical observations of drug resistance.

      (5) Major findings in the study are well-supported by the data:

      a) Intracellular LD-R parasites induce fluid-phase endocytosis of LDL independent of LDL receptor (LDLr).

      b) Enhanced fusion of LDL-containing vesicles with parasitophorous vacuoles (PV) containing LD-R parasites both within infected KCs and PECs cells.

      c) Intracellular cholesterol transporter NPC1-mediated cholesterol efflux from parasitophorous vacuoles is suppressed by the LD-R parasites within infected cells.

      d) Selective exclusion of inflammatory ox-LDL through MSR1 downregulation.

      e) Accumulation of neutral lipid droplets contributing to amphotericin B resistance.

      Weaknesses:

      The weaknesses are minor:

      (1) The authors do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation.

      (2) The study could have benefited from a more detailed analysis of how lipid droplets physically interfere with amphotericin B access to parasites.

      We have addressed both these concerns in the revised Version of this work as elaborated in the following section.

      Impact and significance:

      This work makes several fundamental advances:

      (1) The authors were able to show the link between antimony resistance and enhanced parasite proliferation.

      (2) They were also able to reveal how parasites can modify host cell metabolism to support their growth while avoiding inflammation.

      (3) They were able to show a certain mechanistic basis for emerging amphotericin B resistance.

      (4) They suggest therapeutic strategies combining lipid droplet inhibitors with current drugs.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Experimental suggestions:

      a) The authors could have provided a more detailed analysis of lipid droplet composition. This is a critically missing piece in this nice study.

      We completely agree with the Reviewer that a more detailed analysis of lipid droplet composition, the dynamics of its formation and the mechanism of lipid transfer to amastigotes residing within the PV would be worthy of further investigation. We are already conducting investigations in this direction and have very promising initial results, which we are willing to share with the Reviewer as unpublished communication if requested. Since we plan to address these questions independently, we hope the Reviewer will understand our hesitation to include these data in the present work, which is already data dense. We sincerely believe that the existence of lipid droplet contact sites with the PV, along with the specific lipid types transferred to amastigotes and the mechanism of transfer, requires special attention and could stand as an independent work by itself.

      b) The macrophages (PEC, KC) could have been treated with latex beads as a control, which would indicate that cholesterol and lipids are indeed utilized by the Leishmania parasitophorous vacuole (PV) and essential for its survival and proliferation.

      We thank the reviewer for this nice suggestion, which we believe will further strengthen the conclusion of this work. We have now included this data as Figure 5E in the revised manuscript. Our data showed that infected KC harbouring both LD-R amastigotes and Fluorescent Latex Beads, showed a concentrated staining of Cholesterol around amastigotes, with no positive Cholesterol staining around internalized latex beads similar to LD-S amastigotes. This observation clearly confirmed specific lipid uptake in LD-R-PV, which can not be replicated by phagocytosed Latex Beads.

      c) HMGCoA reductase is an important enzyme for the mevalonate pathway and cholesterol synthesis. The authors have not commented on this enzyme in either host or parasite. Additionally, western blots of these enzymes along with SREBP2 could have been performed.

      We appreciate the concern and see why the reviewer is suggesting this. Regarding HMGCoA reductase, we already have real-time qPCR data that align with our RNAseq data (Figure 6A i in the submitted and revised manuscript), showing significant downregulation specifically in LD-R infected KC as compared to uninfected control. We are including this data as Author response image 2. We did not initially check the level of HMGCoA reductase at the protein level, as several previous reports have suggested that HMGCoA reductase remains under transcriptional control of SREBP2 (doi.org/10.1016/j.cmet.2011.03.005, doi: 10.1194/jlr.C066712, doi:10.1194/jlr.RA119000201), which acts as the master regulator of the mevalonate pathway and cholesterol synthesis (doi.org/10.1161/ATVBAHA.122.317320), and SREBP2 remains significantly downregulated in response to LD-R infection (Figure 6B i and Figure 6C in the submitted and revised manuscript). However, as suggested by the Reviewer, we have now added this data to the revised manuscript as Figure 6D. The Western blot data further confirmed the expected significant downregulation of HMGCoA reductase in response to LD-R infection.

      Author response image 2.

      qPCR Analysis of HMGCR Expression Following Leishmania donovani Infection: Quantitative PCR analysis showing the relative expression of hmgcr (3-hydroxy-3-methylglutaryl-CoA reductase) in Kupffer cells after 24 hours of Leishmania donovani (LD) infection compared to uninfected control cells. Gene expression levels are normalized to β-actin as an internal control, and fold change is represented relative to the uninfected condition.
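
      The legend states that expression is normalized to β-actin and shown as fold change relative to the uninfected condition, but it does not name the calculation. A common choice for this kind of presentation is the comparative Ct (2^-ΔΔCt) method; the sketch below assumes that method, and the Ct values in it are hypothetical rather than data from the study.

```python
def fold_change_ddct(ct_target_infected: float, ct_ref_infected: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the comparative Ct method (assumed, not stated in the source).

    ct_target_*: Ct of the gene of interest (e.g. hmgcr)
    ct_ref_*:    Ct of the reference gene (e.g. beta-actin)
    """
    delta_infected = ct_target_infected - ct_ref_infected   # delta-Ct, infected sample
    delta_control = ct_target_control - ct_ref_control      # delta-Ct, uninfected control
    delta_delta = delta_infected - delta_control             # delta-delta-Ct
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold one cycle later after infection.
print(fold_change_ddct(26.0, 18.0, 25.0, 18.0))
```

      A value below 1 indicates downregulation relative to the uninfected control; the method also assumes roughly equal amplification efficiency for the target and reference genes.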

      d) The authors should discuss the expression pattern of any enzyme of the mevalonate pathway that they have found to be dysregulated in the transcript data.

      As per the reviewer’s suggestion, we looked into the RNA seq data and observed that apart from hmgcr, hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase), another key enzyme in the mevalonate pathway, is significantly downregulated in host PECs in response to LD-R infection compared to LD-S infection. We have discussed this in the revised manuscript (Line 484-490), which reads as, “Further RNA sequencing data also revealed a significant downregulation of hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase) in LD-R infected PECs as compared to LD-S infection. Downregulation of HMGCS which catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA), which serves as an intermediate in both cholesterol biosynthesis and ketogenesis further supports our observation that LD-R-infected PECs preferentially rely on endocytosed low-density lipoprotein (LDL)-derived cholesterol rather than de novo synthesized cholesterol to support their metabolic needs.”

      e) The authors have followed a previously published protocol by Real F (reference 73) to enrich for parasitophorous vacuole (PV). However, they do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation. The authors should at least show Western blot data for LAMP1 for different fractions of density gradient from which they enriched the PV.

      As we previously stated in our response to Reviewer 1, in the revised manuscript we have included a detailed analysis of purity for different fractions during PV isolation. We sincerely thank the reviewer for highlighting this important concern and for suggesting an approach to conduct the experiment. We have included this data as Figure 3C i, ii and iii in the revised manuscript. Our imaging and Western blot data showed a significant enrichment of LAMP-1 in the PV fraction, and we believe this result further reinforces the conclusions of our study on increased cholesterol.

      (2) Presentation improvements:

      a) Add a clear timeline for infection experiments.

      As suggested by the Reviewer, we have included schematic timelines for all the animal infection experiments (Figure 2Ci and Figure 7A,Fi) in the revised manuscript.

      b) Provide more details on patient sample collection and analysis.

      We have included more details on the sample collection in the Method section of the revised manuscript (Line 830-835), “Blood samples were collected from a total of 22 individuals spanning a diverse age range (8 to 70 years) by RMRI, Bihar, India. Among these, nine samples were obtained from healthy individuals residing in endemic regions to serve as controls. Serum was isolated from each blood sample through centrifugation, and the lipid profile was subsequently analysed using a specialized diagnostic kit (Coral Clinical System) following the manufacturer's protocol.”

      c) Consider reorganizing figures to better separate mechanistic and clinical findings.

      We would like to thank the reviewer for this suggestion. We felt that a major arrangement altering the sequence of the Figures as presented in the Original Submission will impact smooth flow of the story and hence, we did not disturb that. However, as suggested by the Reviewer we have performed major rearrangement within Figure 2, Figure 5 and Figure 6 and Figure 9 of the revised manuscript for a better representation of the data and convenience of the reader. Also, if the reviewer has specific suggestion regarding rearrangement of any particular figure, we will be happy to consider that.

      (3) Technical clarifications needed:

      a) Specify exact concentrations used for inhibitors.

      We apologise for this oversight. We have now clearly stated the concentrations of all the inhibitors used in this study in the Results section and in the figures of the revised manuscript. For ease of reference, the revised section (Line 281-287) reads as, “Finally, we infected the KCs with GFP expressing LD-R for 4Hrs, washed and allowed the infection to proceed in presence of fluorescent red-LDL and Latrunculin-A (5µM), a compound which specifically inhibits fluid phase endocytosis by inducing actin depolymerization [41]. Real-time fluorescence tracking demonstrated that Latrunculin-A treatment not only prevented the uptake of fluorescent red-LDL but also severely impacted intracellular proliferation of LD-R amastigotes (Video 2A and 2B and Figure 4E). In contrast, treatment with Cytochalasin-D, which alters cellular F-actin organization but does not affect fluid phase endocytosis [41], had no effect on the intracellular proliferation of LD-R irrespective of Cytochalasin-D concentrations (2.5µg/ml and 5µg/ml respectively) (Figure 4F and Figure S4D).”

      b) Include more details on image analysis methods.

      Please note that in specific sections of the submitted manuscript (Line numbers 574-579, 653-658 and 1047-1049), we paid special attention to describing the image analysis process. However, we agree that in some cases more details will be appreciated by the reader. Hence, we have included an additional Image Analysis section in the Methods of the revised manuscript. This section (Line 727-739) reads as, “Image processing and analysis were conducted using Fiji (ImageJ). For optimal visualization, Giemsa-stained macrophages (MΦs) were represented in grayscale to enhance contrast and structural clarity. To improve the distinction of different fluorescent signals, pseudo-colors were assigned to fluorescence images, ensuring better differentiation between various cellular components. For colocalization analysis (Figures 3, Figure 5, Figure 6, and Figure S2), we utilized the RGB profile plot plugin in ImageJ, which allows for the precise assessment of signal overlap by generating fluorescence intensity profiles across selected regions of interest. This approach provided quantitative insights into the spatial relationship between labelled molecules within infected cells. Additionally, for analyzing the distribution of cofilin in Figure 4, the ImageJ surface plot plugin was employed. This tool enabled three-dimensional visualization of fluorescence intensity variations, facilitating a more detailed examination of cofilin localization and its potential reorganization in response to infection.”
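
      The quoted section describes judging signal overlap from per-channel intensity profiles along a region of interest. One way such profiles could be summarized numerically is a Pearson correlation between two channels; this is an added illustration, not a step the authors describe, and the profile values below are made up.

```python
import math


def pearson(profile_a, profile_b):
    """Pearson correlation between two equal-length intensity profiles."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_a = sum(profile_a) / len(profile_a)
    mean_b = sum(profile_b) / len(profile_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a in profile_a))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for b in profile_b))
    if spread_a == 0 or spread_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / (spread_a * spread_b)


# Hypothetical red and green intensities sampled along the same line.
red = [10, 40, 90, 40, 10]
green = [12, 38, 85, 42, 9]
print(round(pearson(red, green), 3))
```

      Values near 1 indicate that the two signals rise and fall together along the line; values near 0 indicate no linear relationship.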

      c) Clarify statistical analysis procedures.

      We had already provided a dedicated Statistical Analysis section in the Methods of the original submission and had also indicated the groups being compared in the Figures and Figure Legends of the submitted manuscript. Furthermore, as suggested by the Reviewer, we have now added further clarification regarding the statistical analysis performed in the revised manuscript (Lines 737-749). In the revised manuscript this section reads: “All statistical analyses were performed using GraphPad Prism 8 on raw datasets to ensure robust and reproducible results. For datasets involving comparisons across multiple conditions, one-way or two-way analysis of variance (ANOVA) was conducted, followed by Tukey’s post hoc test to assess pairwise differences while controlling for multiple comparisons. A 95% confidence interval (CI) was applied to determine the statistical reliability of the observed differences. For non-parametric comparisons across multiple groups, Wilcoxon rank-sum tests were employed, maintaining a 95% confidence interval, which is particularly useful for analysing skewed data distributions. In cases where only two groups were compared, Student’s t-test was used to determine statistical significance, ensuring an accurate assessment of mean differences. All quantitative data are represented as mean ± standard error of the mean (SEM) to illustrate variability within experimental replicates. Statistical significance was determined at P ≤ 0.05. Notation for significance levels: *P ≤ 0.05; **P ≤ 0.001; ***P ≤ 0.0001.”
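      To make the quoted procedure concrete, the following is a minimal sketch of three of its ingredients: the mean ± SEM summary, the one-way ANOVA F statistic, and the significance notation. It is an illustration only, not the authors' workflow, which was run in GraphPad Prism 8. The function names and the "ns" label for non-significant results are assumptions introduced here, and the post hoc Tukey test is not implemented.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group of replicates."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def significance_stars(p):
    """Map a P value to the notation quoted above.

    The 'ns' label for P > 0.05 is an assumption, not part of the source.
    """
    if p <= 0.0001:
        return "***"
    if p <= 0.001:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group MS / within-group MS."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

      The F statistic would still need to be compared against the F distribution with (k − 1, n − k) degrees of freedom to obtain a P value, which a statistics package handles directly.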

      (4) Minor corrections:

      a) Methods section could benefit from more details on Raman spectroscopy analysis.

      We agree with this suggestion of the Reviewer. To provide more clarity, we have incorporated additional details in the Raman methodology section of the revised manuscript (Lines 638-649). The updated section reads as follows: “For confocal Raman spectroscopy, spectral data were acquired from individual cells at 1000× magnification using a 100 × 100 μm scanning area, following previously established specifications. After spectral acquisition, distinct Raman shifts corresponding to specific biomolecular signatures were extracted for further analysis. These included: Cholesterol (535–545 cm⁻¹), Nuclear components (780–790 cm⁻¹), Lipid structures (1262–1272 cm⁻¹), Fatty acids (1436–1446 cm⁻¹). Following spectral extraction, pseudo-color mapping was applied to highlight the spatial distribution of each biomolecular component within the cell. These processed spectral images are presented in Figure 3D1, where the first four panels illustrate the individual biomolecular distributions. A merged composite image was then generated to visualize the co-localization of these biomolecules within the cellular microenvironment, with the final panel specifically representing the spatial distribution of key biomolecules.”

      b) In the methods section line 609, page 14, the authors cite Real F protocol as reference 73 for PV enrichment. However, in the very next section on GC-MS analysis (lines 615-616, page 15), they state they have used reference 74 for PV enrichment. Can they explain why a discrepancy in PV isolation references this? Reference 74 does not mention anything related to PV isolation.

      Response: We sincerely apologise for this confusion, which probably arose from the way we wrote this section. We confirm that our PV isolation protocol is based on the published protocol of Real F (reference 73). The next section of the submitted manuscript described the GC-MS analysis, which was performed according to the protocol in reference 74. In the revised manuscript, we have removed this ambiguity by placing each reference in its proper place. The revised section (Lines 663-678) reads:

      “GC-MS analysis of LD-S and LD-R-PV

      Following a 24Hrs infection period, KCs were harvested, washed with phosphate-buffered saline (PBS), and pelleted. Subsequent to this, PV isolation was carried out using the previously described protocol [35]. After PV isolation Bradford assay was carried out for normalizing the protein concentration. The resulting equal volume of PV pellet was suspended in 20 ml of dichloromethane: methanol (2:1, vol/vol) and incubated at 4°C for 24 hours. After centrifugation (11,000 g, 1 hour, 4°C), the supernatant was checked through thin layer chromatography (TLC) and subsequently evaporated under vacuum. The residue and pellet were saponified with 30% potassium hydroxide (KOH) in methanol at 80°C for 2 hours. Sterols were extracted with n-hexane, evaporated, and dissolved in dichloromethane. A portion of the clear yellow sterol solution was treated with N, O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) and heated at 80°C for 1 hour to form trimethylsilyl (TMS) ethers. Gas chromatography/mass spectrometry (GC/MS) analysis was performed using a Varian model 3400 chromatograph equipped with DB5 columns (methyl-phenylsiloxane ratio, 95/5; dimensions, 30 m by 0.25 mm). Helium was used as the gas carrier (1 ml/min). The column temperature was maintained at 270°C, with the injector and detector set at 300°C. A linear gradient from 150 to 180°C at 10°C/min was used for methyl esters, with MS conditions set at 280°C, 70 eV, and 2.2 kV[77].”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      This study investigates the hypoxia rescue mechanisms of neurons by non-neuronal cells in the brain from the perspective of exosomal communication between brain cells. Through multi-omics combined analysis, the authors revealed this phenomenon and logically validated this intercellular rescue mechanism under hypoxic conditions through experiments. The study proposed a novel finding that hemoglobin maintains mitochondrial function, expanding the conventional understanding of hemoglobin. This research is highly innovative, providing new insights for the treatment of hypoxic encephalopathy.

      Overall, the manuscript is well organized and written, however, there are some minor/major points that need to be revised before this manuscript is accepted.

      We thank the reviewer for the detailed analysis of our study. Please find our answers to the points raised by the reviewer below.

      Major points:

      (1) Hypoxia can induce endothelial cells to release exosomes carrying hemoglobin; however, how are neurons able to actively take up these exosomes? Is it possible for other cells to take up these exosomes as well? This point needs to be clarified in this study.

      We sincerely appreciate the reviewer’s valuable comments. Regarding the question of how neurons actively take up extracellular vesicles (EVs) carrying hemoglobin mRNA, existing studies suggest that EVs can enter cells via three main pathways: direct fusion, receptor-mediated endocytosis, and phagocytosis (PMID: 25288114). Our experimental results show that neurons are able to actively take up EVs from endothelial cells without any treatment, and hypoxic conditions did not significantly increase the uptake of endothelial EVs by neurons (Fig. 5A and I). As for the specific uptake mechanism, there is currently no definitive conclusion. Some studies have found that hypoxic-ischemic injury may induce neurons to upregulate Cav-1, which could enhance the uptake of endothelial-derived EVs via Cav-1-mediated endocytosis (PMID: 31740664), but this mechanism still requires further validation.

      Regarding whether other cell types also take up these EVs, we focused on neurons based on existing literature and our own data, which show that the increased hemoglobin in the brain under hypoxic conditions is primarily found in neurons (Fig. 4H-J, PMID: 19116637). Moreover, we observed that, under hypoxic conditions, almost all non-neuronal supporting cells in the brain transcribe hemoglobin in large amounts and release it via EVs (Fig. 3J). Furthermore, we would like to emphasize that although neurons do not transcribe hemoglobin, we observed substantial expression of hemoglobin within neurons. This suggests that it may serve as an important protective mechanism for the brain. Therefore, the focus of our study is on the protective effect of EVs carrying hemoglobin mRNA on neurons, and the uptake by other cell types was not explored. We greatly appreciate the reviewer’s question, and we believe this is an intriguing avenue for further investigation. This could provide new insights for interventions in hypoxic brain injury, and we plan to delve into this topic in future studies.

      (2) The expression of hemoglobin in neurons is important for mitochondrial homeostasis, but its relationship with mitochondrial homeostasis needs to be further elucidated in the study.

      We sincerely appreciate the reviewer’s valuable comments. We fully agree with the importance of hemoglobin expression in neurons for mitochondrial homeostasis. In this study, we have confirmed through in vitro experiments that when neurons are treated with conditioned medium from endothelial cells, they exhibit increased hemoglobin expression. This, in turn, enhances their resistance to hypoxia by restoring mitochondrial membrane potential and increasing mitochondrial numbers, thereby effectively improving neuronal viability. Notably, this protective effect disappears when EVs are removed from the endothelial-conditioned medium or when hemoglobin in endothelial cells is disrupted, further supporting the notion that endothelial cells transfer hemoglobin via EVs, helping neurons express hemoglobin under hypoxic conditions and exert protective effects.

      In summary, hemoglobin primarily helps maintain mitochondrial membrane potential, thereby supporting the restoration of energy metabolism and production under hypoxic conditions, which effectively improves the neuronal resistance to hypoxia. Although we were unable to explore the specific mechanisms of hemoglobin’s role in mitochondrial homeostasis in detail within this study, we recognize the importance of this aspect and plan to further investigate how hemoglobin regulates mitochondrial homeostasis and function in neurons in future research.

      Once again, we greatly appreciate the reviewer’s insightful comments. We will continue to optimize our research direction and look forward to further elucidating these important biological mechanisms in future studies.

      Minor points:

      (1) In Figures 1-3, the authors use "Endo" to represent endothelial cells, while in Figures 4-7, the abbreviation "EC" is used. Please standardize the format.

      Thank you for the reviewer’s suggestion. We will use “EC” consistently to refer to endothelial cells throughout the manuscript to ensure uniformity.

      (2) In all qPCR statistical results, please italicize the gene names on the axis.

      Thank you for the reviewer’s valuable suggestion. We will make sure to italicize the gene names on the axis in all qPCR statistical results to adhere to the formatting requirements.

      (3) In the Western blot result of Figure 3C, what type of cell-derived exosomes does the Control group represent, and why can it be used as a control group for brain-derived exosomes?

      Thank you for the reviewer’s insightful question. In Fig. 3C, the control group (Control) represents the cell lysate sample, which serves as a positive control in the EVs Western blot analysis. In this experiment, the positive control is primarily used to validate the specificity of the antibody and the accuracy of the experimental procedure. We used cell lysate as the control to confirm that the antibody can detect EV-associated markers in the cell lysates, thus providing a comparative basis for the identification of brain-derived EVs.

      (4) In Figure 4F, the morphology of hemoglobin in the Con group and the H28d group is not entirely consistent with Figure 4H. Is this difference due to different experimental batches?

      Thank you for the reviewer’s careful observation. The observed difference may indeed be due to variations between different experimental batches. To ensure consistency of the results, we have updated the representative immunofluorescence images, which are now presented in Fig. 4H.

      (5) Supplement the transcription and expression levels of hemoglobin in neurons under different treatment conditions after medium exchange with exosome removal and medium exchange after HBA1 interference.

      Thank you for the reviewer’s valuable suggestions. We have added the experimental data regarding the exchange of culture medium after the removal of EVs. As shown in Fig. S6, the endothelial-derived medium without EVs does not enhance the hemoglobin levels in neurons under hypoxic conditions. Additionally, we have included the detection results of hemoglobin expression in neurons after HBA1 interference, as shown in Fig. S7E-F. The results indicate that the culture medium derived from HBA1-interfered endothelial cells also fails to help neurons increase hemoglobin expression under hypoxic conditions.

      (6) Figure S3 should be split to separately explain the increased exosome release induced by hypoxia, the non-toxic effect of endothelial cell culture medium on neurons, and the successful screening of the HBA1 interference plasmid.

      Thank you for the reviewer’s suggestions. Based on your feedback, we have split the original Fig. S3 into multiple parts to present the different experimental results more clearly. Specifically, the hypoxia-induced increase in EV release is now shown in Fig. S4, the non-toxic effects of endothelial cell culture medium on neurons are shown in Fig. S5, and the successful screening of the HBA1 interference plasmid is presented in Fig. S7.

      (7) Regarding the extracellular vesicles/exosomes, it should be expressed consistently in the whole manuscript.

      Thank you for the reviewer’s reminder. We will ensure that the term “extracellular vesicles” is used consistently throughout the manuscript.

      (8) In lines 70 and 80, the O2 should be changed to "O<sub>2</sub>".

      Thank you for the reviewer’s careful observation. We have corrected the formatting of “O2” to “O₂” in lines 70 and 80.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study with a lot of data. Some of these ideas are intriguing. But a few major points require further consideration.

      We thank the reviewer for the detailed assessment of our study and pinpointing its current weaknesses. Please find our answers to all comments below.

      Major points:

      (1) What disease is this model of whole animal hypoxia supposed to mimic? If one is focused on the brain, can one just use a model of focal or global cerebral ischemia?

      Thank you for the reviewer’s insightful question. The chronic hypoxia model we employed is designed to mimic the multi-organ damage caused by systemic hypoxia, which is relevant to clinical conditions such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. In contrast to focal or global cerebral ischemia models, the focus of our study is on how the brain, under extreme systemic hypoxia, utilizes endothelial cell-derived extracellular vesicles (EVs) to transfer hemoglobin mRNA, thereby protecting neurons and aiding the brain’s response to hypoxia-induced damage.

      We understand the reviewer’s concern that focal or global ischemia models are typically used to simulate localized brain hypoxia or ischemic injury. However, the core of our research is to explore the brain’s overall adaptive mechanisms under systemic hypoxic conditions. By using a systemic hypoxia model, we can more comprehensively simulate the effects of global hypoxia on the brain and uncover how the brain engages specific molecular mechanisms for self-protection. This approach offers a novel perspective on brain hypoxic-ischemic diseases and holds potential clinical applications, particularly in the study of stroke, vascular cognitive impairment and dementia (VCID), and related conditions.

      Additionally, we have observed that hemoglobin significantly increases in the brain in an animal model of focal ischemia (as shown in Author response image 1 below). This finding further supports the idea that hemoglobin upregulation may be a universal protective mechanism for the brain’s response to hypoxic damage. While this part of the research is still ongoing, preliminary results suggest that both systemic hypoxia and focal ischemia might trigger protective effects through hemoglobin regulation.

      Author response image 1.

      The expression level of Hba-a1 in the brain of VCID mouse.

      Therefore, the core of our study is to elucidate the brain’s self-protection mechanisms under systemic hypoxia, rather than focusing solely on cerebral ischemia models. We believe this approach provides new insights into the prevention and treatment of brain hypoxic-ischemic diseases, with significant clinical application potential.

      In light of this, we have added a related discussion to the manuscript, clearly explaining the rationale for choosing the systemic hypoxia model. The updated content can be found on P11, Line 13-21 as follows: “To investigate this phenomenon, we employed a chronic hypoxia model in which mice were exposed to 7% oxygen for 28 days. This model aims to mimic systemic hypoxia-induced multi-organ damage, a condition observed in diseases such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. The primary goal of this model is to explore how the brain adapts under extreme low-oxygen conditions and employs specific mechanisms to protect itself from hypoxia-induced damage. This approach provides valuable insight into diseases related to hypoxic-ischemic injury in the brain, including stroke and vascular dementia, offering a novel perspective for potential clinical applications.”

      (2) If this model subjects the entire animal to hypoxia, then other organs will also be hypoxic. Should one also detect endothelial upregulation and release of extracellular vesicles containing hemoglobin mRNA in non-CNS organs? Where do these vesicles go? Into blood?

      Thank you for the reviewer’s valuable feedback. Indeed, in a whole-body hypoxia model, other organs are also affected by hypoxia. Therefore, future research may need to investigate the upregulation of endothelial cells in organs other than the central nervous system, as well as the release of EVs containing hemoglobin mRNA from these organs. However, in this study, we isolated EVs from the brain tissue in situ following perfusion with physiological saline, a method that effectively eliminates the influence of EVs from blood or other organs. As a result, our primary focus was on studying how EVs released by brain endothelial cells are actively taken up by neurons to exert neuroprotective effects. The potential for these EVs to enter the bloodstream and their subsequent fate is indeed a topic worthy of further investigation. Future research could offer new insights into the cross-organ effects of systemic hypoxia.

      (3) What other mRNA are contained in the vesicles released from brain endothelial cells?

      Thank you for the reviewer’s valuable suggestions. We have further analyzed EVs derived from brain endothelial cells, and in addition to hemoglobin mRNA, these EVs also contain a variety of other mRNAs, including Vwf, Hbb-bt, Hba-a1, Hbb-bs, Hba-a2, Acer2, Angpt2, Ldha, Gm42418, Slc16a1, Cxcl12, B2m, Ctla2a, Ccnd1, and Hmgcs2 (Log2FC > 1.2). The biological processes associated with these mRNAs primarily involve: cell-substrate adhesion, regulation of cellular amide metabolic process, negative regulation of cell migration, negative regulation of cell motility, and negative regulation of cellular component movement. These processes may be closely related to the neuroprotective effects of endothelial cell EVs in a hypoxic environment, especially in terms of regulating cell behavior and maintaining cell structure and function. Additionally, these EVs contain multiple key factors associated with intracellular metabolism, movement, and migration, which may collectively influence neuronal function and survival. Notably, our study also found that mRNA of various hemoglobin subunits ranks among the top five in terms of abundance in the mRNA secreted by hypoxic endothelial EVs, further emphasizing the importance of hemoglobin mRNA in endothelial-derived EVs. Therefore, future research may explore the functions of these mRNAs and reveal how they act in concert to protect neurons from hypoxia-induced damage.

      We have updated and added these results in Fig. S4, and have further elaborated on the findings in the revised figure. Once again, we thank the reviewer for the attention and valuable suggestions regarding our work.
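      The Log2FC > 1.2 cutoff mentioned above can be sketched as a simple filter over per-transcript abundances. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the abundance values are invented for the example and are not data from the study, and "Actb" is included only as a hypothetical unchanged transcript. Abundances are assumed to be positive, normalized values.

```python
import math


def log2_fold_change(treated, control):
    """log2 of the treated/control abundance ratio (both must be > 0)."""
    return math.log2(treated / control)


def enriched_transcripts(abundance, threshold=1.2):
    """Return (gene, log2FC) pairs with log2FC > threshold, largest first.

    abundance maps gene -> (control, hypoxia) normalized abundance.
    """
    changes = {
        gene: log2_fold_change(hypoxia, control)
        for gene, (control, hypoxia) in abundance.items()
    }
    hits = [(gene, fc) for gene, fc in changes.items() if fc > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical abundances (control, hypoxia); NOT values from the study.
example = {
    "Hba-a1": (10.0, 80.0),   # 8-fold increase, log2FC = 3
    "Vwf": (20.0, 50.0),      # 2.5-fold increase, log2FC ~ 1.32
    "Actb": (100.0, 110.0),   # essentially unchanged, filtered out
}
hits = enriched_transcripts(example)
```

      Note that a threshold of 1.2 on the log2 scale corresponds to roughly a 2.3-fold increase in abundance.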

      (4) Where do the endothelial vesicles go? Only to neurons? Or to other cells as well?

      Thank you for the reviewer’s important question. As previously mentioned, the focus of this study is to investigate how EVs carrying hemoglobin mRNA influence neuronal function. Through a combined analysis of single-cell transcriptomics and EV transcriptomics from brain tissue, we found that, besides neurons, almost all types of supportive cells in the brain and their secreted EVs contain a significant amount of hemoglobin mRNA (Fig. 3J, 4B). Notably, although neurons do not transcribe hemoglobin mRNA themselves, under hypoxic conditions, neurons significantly increase hemoglobin expression, resulting in a phenomenon where the transcription and expression levels of hemoglobin in neurons are inconsistent. This phenomenon has been observed both in our study and others (Fig. 4H-J, PMID: 19116637). This observation led us to focus on the active uptake of EVs by neurons and the potential neuroprotective effects they might bring.

      Regarding whether other cell types take up these EVs and what the functional consequences might be, our current research is focused on neurons, but this is indeed an important area for further investigation. Given that non-neuronal supportive cells may also transfer hemoglobin mRNA via EVs under hypoxic conditions, future research will further explore the uptake of EVs by different cell types and their roles in hypoxic adaptation.

      We are particularly interested in the hemoglobin expression in neurons under hypoxic conditions and consider neurons to be the primary expressers of hemoglobin, providing a new perspective for exploring the neuroprotective role of hemoglobin. We plan to delve deeper into these issues in future studies.

      (5) Neurons can express endogenous hemoglobin. Is it useful to subject neurons to hypoxia and then see how much the endogenous mRNA goes up? How large is the magnitude of endogenous hemoglobin gene upregulation compared to the hypothesized exogenous mRNA that is supposed to be donated from endothelial vesicles?

      Thank you for the reviewer’s valuable question. We have observed that, in the absence of treatment with endothelial cell-derived conditioned medium, there is no significant change in the transcription and expression levels of endogenous hemoglobin in neurons under hypoxic conditions (Fig. 5I, 6C-D). However, when neurons were treated with endothelial cell-conditioned medium, under the same hypoxic conditions, the transcription levels of hemoglobin increased by approximately 1.2-fold, and the expression levels increased by approximately 3.8-fold (Fig. 6B-D). Additionally, we have added pre-treatment experiments involving EV depletion from the endothelial culture medium and HBA interference. The results show that, after either of these two pre-treatments, the conditioned medium lost its ability to enhance the transcription and expression of hemoglobin in neurons under hypoxic conditions (Fig. S6, S7D-F), further emphasizing the important role of endothelial EVs in this process. This finding indicates that endothelial-derived EVs significantly promote hemoglobin expression in neurons, and this effect is far greater than the upregulation of endogenous hemoglobin in neurons. Therefore, while neurons can express endogenous hemoglobin, exogenous hemoglobin significantly enhances its expression, which may help neurons tolerate the hypoxic environment and provide additional protection.

      (6) Finally, it may be useful to provide more information and data to explain how the expression of this exogenous endothelial-derived hemoglobin binds to neuronal mitochondria to alter function.

      Thank you for the reviewer’s valuable suggestion. As we previously mentioned, hemoglobin plays a protective role in neurons by maintaining mitochondrial membrane potential, helping neurons restore energy metabolism and energy production under hypoxic conditions. We fully agree on the importance of this research direction. Several studies have shown that when hemoglobin is expressed in neurons, it predominantly localizes to mitochondria, which aligns with the physiological process of heme synthesis within mitochondria (PMID: 23187133). Furthermore, in the brains of Parkinson’s disease patients, the localization of hemoglobin in neuronal mitochondria is altered compared to normal conditions (PMID: 27181046). Therefore, the interaction between hemoglobin and mitochondria plays a crucial role in neuronal function.

      Although existing research indicates the role of hemoglobin in neuronal mitochondria, studies in this area remain limited. We plan to further investigate how hemoglobin binds to mitochondria and its specific effects on mitochondrial function in our future work. We believe that a deeper understanding of this mechanism will provide essential theoretical insights into the effects of hypoxia on neurons and offer new potential strategies for neuroprotective therapies.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Vision is a highly active process. Humans move their eyes 3-4 times per second to sample information with high visual acuity from our environment, and where eye movements are directed is critical to our understanding of active vision. Here, the authors propose that the cost of making a saccade contributes critically to saccade selection (i.e., whether and where to move the eyes). The authors build on their own recent work that the effort (as measured by pupil size) that comes with planning and generating an eye movement varies with saccade direction. To do this, the authors first measured pupil size for different saccade directions for each participant. They then correlated the variations in pupil size obtained in the mapping task with the saccade decision in a free-choice task. The authors observed a striking correlation: pupil size in the mapping task predicted the decision of where to move the eyes in the free choice task. In this study, the authors provide a number of additional insightful analyses (e.g., based on saccade curvature, and saccade latency) and experiments that further support their claim that the decision to move the eyes is influenced by the effort to move the eyes in a particular direction. One experiment showed that the same influence of assumed saccade costs on saccade selection is observed during visual search in natural scenes. Moreover, increasing the cognitive load by adding an auditory counting task reduced the number of saccades, and in particular reduced the costly saccades. In sum, these experiments form a nice package that convincingly establishes the association between pupil size and saccade selection.

      We thank the reviewer for highlighting the novelty and cogency of our findings.

      In my opinion, the causal structure underlying the observed results is not so clear. While the relationship between pupil size and saccade selection is compelling, it is not clear that saccade-related effort (i.e., the cost of a saccade) really drives saccade selection. Given the correlational nature of this relationship, there are other alternatives that could explain the finding. For example, saccade latency and the variance in landing positions also vary across saccade directions. This can be interpreted for instance that there are variations in oculomotor noise across saccade directions, and maybe the oculomotor system seeks to minimize that noise in a free-choice task. In fact, given such a correlational result, many other alternative mechanisms are possible. While I think the authors' approach of systematically exploring what we can learn about saccade selection using pupil size is interesting, it would be important to know what exactly pupil size can add that was not previously known by simply analyzing saccade latency. For example, saccade latency anisotropies across saccade directions are well known, and the authors also show here that saccade costs are related to saccade latency. An important question would be to compare how pupil size and saccade latency uniquely contribute to saccade selection. That is, the authors could apply the exact same logic to their analysis by first determining how saccade latencies (or variations in saccade landing positions; see Greenwood et al., 2017 PNAS) vary across saccade directions and how this saccade latency map explains saccade selection in subsequent tasks. Is it more advantageous to use one or the other saccade metric, and how well does a saccade latency map correlate with a pupil size map?

      We thank the reviewer for the detailed comment. 1) The reviewer first points out the correlational nature of many of our results. 2) The reviewer then asks whether saccade latencies and landing precision also predict saccade selection, and whether these potential predictors could be considered alternative explanations to the idea of effort driving saccade selection. Moreover, what can pupil size add to what can be learned from saccade latency?

      In brief, although we report a combination of correlational and causal findings, we do not know of a more parsimonious explanation for our findings than “effort drives saccade selection”. Moreover, we demonstrate that oculomotor noise cannot be construed as an alternative explanation for our findings.

      (1) Correlational nature of many findings.

      We acknowledge that many of our findings are predominantly correlational in nature. In our first tasks, we correlated pupil size during saccade planning with saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified directed hypothesis. Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut the costly saccades most, which constitutes causal evidence for our hypothesis. A minimal oculomotor noise account would not directly predict a reduction in saccade rate under higher cognitive demand. To summarize, we have a combination of correlational and causal findings, although mediators cannot be fully ruled out for the latter. That said, we do not know of a more fitting and parsimonious explanation for our findings than effort predicting saccade selection (see the following points for saccade latencies). We now address causality in the discussion for transparency and point more explicitly to the second visual search experiment for causal evidence.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predicts saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment as saccade frequencies - and especially costly saccades were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      (2) Do anisotropies in saccade latencies constitute an alternative explanation?

      First of all, we would like to stress that differences in saccade latencies are indeed thought to reflect oculomotor effort (Shadmehr et al., 2019; TINS). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies predicted saccade selection, this would not contradict the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main novel conclusion: effort drives saccade selection. There are several reasons why pupil size can be used as a more general marker of effort (see responses to R2), but ultimately, our conclusions do not hinge on the employed measure of effort per se. As stressed above in 1), we see no equally parsimonious explanation besides the cost account. Moreover, we predicted this relationship in our previous publication before running the currently reported experiments and analyses (Koevoet et al., 2023). That said, we are open to discussing further alternative options and look forward to testing these accounts against each other in future work – we welcome the reviewers’ (but also the reader’s) suggestions.

      We now discuss this in the manuscript as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost.

      Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      Second, we followed the reviewer’s recommendation in testing whether other oculomotor metrics would predict saccade selection. To this end, we conducted a linear regression across directions. We calculated pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based backward model selection to determine which factors best predict saccade selection. The best model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043; see Author response image 1) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency also predicts saccade preferences, pupil size remains a robust predictor of saccade selection. These findings demonstrate that minimizing oculomotor noise cannot fully explain the pattern of results.
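
      For readers unfamiliar with the procedure, AIC-based backward selection can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the data are synthetic, the effect sizes and noise level are invented, the function names (`ols_rss`, `aic`, `backward_select`) are hypothetical, and the AIC is the Gaussian form up to an additive constant, n·ln(RSS/n) + 2k. Only the four candidate predictors and the 36 directions are taken from the text.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def aic(data, y, names):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    X = [[1.0] + [data[name][r] for name in names] for r in range(n)]
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(names) + 1)

def backward_select(data, y):
    """Start from the full model; drop one predictor at a time while AIC decreases."""
    current = list(data)
    best = aic(data, y, current)
    while current:
        scores = {name: aic(data, y, [m for m in current if m != name])
                  for name in current}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:
            break  # no single removal improves the model
        current.remove(drop)
        best = scores[drop]
    return current, best

# Synthetic example: one value per direction (36 directions), made-up effects.
rng = random.Random(0)
n_directions = 36
data = {name: [rng.gauss(0, 1) for _ in range(n_directions)]
        for name in ("pupil_size", "latency", "landing_precision", "peak_velocity")}
preference = [-3.0 * p - 2.0 * l + rng.gauss(0, 0.5)
              for p, l in zip(data["pupil_size"], data["latency"])]

selected, final_aic = backward_select(data, preference)
print(selected, round(final_aic, 2))
```

      In this toy setup the two predictors that actually generate the outcome are retained; whether pure-noise predictors are dropped depends on the sample, which is the usual behavior of AIC-based selection.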

      Author response image 1.

      The relationship between saccade latency (from the saccade planning task) and saccade preferences averaged across participants. Individual points reflect directions and shading represents bootstrapped 95% confidence intervals.

      We have added this argument into the manuscript, and discuss the analysis in the discussion. Details of the analysis have been added to the Supporting Information for transparency and further detail.

      “A control analysis ruled out that the correlation between pupil size and saccade preferences was driven by other oculomotor metrics such as saccade latency and landing precision (see Supporting Information).”

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”

      In addition to eye-movement-related anisotropies across the visual field, there are of course many studies reporting visual field anisotropies (see Himmelberg, Winawer & Carrasco, 2023, Trends in Neuroscience for a review). It would be interesting to understand how the authors think about visual field anisotropies in the context of their own study. Do they think that their results are (in)dependent on such visual field variations (see Greenwood et al., 2017, PNAS; Ohl, Kroell, & Rolfs, 2024, JEP:Gen for a similar discussion)?

      We agree that established visual field anisotropies are worth discussing in the context of our own results. At the reviewer’s suggestion, we have now expanded this discussion.

      The observed anisotropies in terms of saccade costs are likely related to established anisotropies in perception and early visual cortex. However, the exact way that these anisotropies may be linked remains elusive (i.e. what is cause, what is effect, are links causal?), and more research is necessary to understand how these are related.

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      We also added that the finding that more precise saccades are coupled with worse performance in a crowding task might be attributed to the increased effort associated with more precise saccades (Greenwood et al., 2017).

      “Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93].”

      Finally, the authors conclude that their results "suggests that the eye-movement system and other cognitive operations consume similar resources that are flexibly allocated among each other as cognitive demand changes." The authors should speculate on what these similar resources could be. What are the specific operations of the auditory task that overlap in terms of resources with the eye movement system?

      We agree that the nature of joint resources is an interesting question. Our previous discussion was likely too simplistic here (see also responses to R3). We here specifically refer to the cognitive resources that one can flexibly distribute between tasks.

      Our data do not directly speak to the question of what the shared resources between the auditory and oculomotor tasks are. Nevertheless, both tasks tax working memory, as saccade targets are mandatorily encoded into working memory prior to saccade onset (Van der Stigchel & Hollingworth, 2018), and the counting task clearly engages working memory. This may indicate some domain-generality between visual and auditory working memory during natural viewing (see Nozari & Martin, 2024 for a recent review), but this remains speculative. Another possibility is that it is not the working memory encoding associated with saccades per se, but the execution of overt motor actions itself, that requires cognitive processing, as suggested by Beatty (1982): “the organization of an overt motor act places additional demands on information-processing resources that are reflected in the task-evoked pupillary response”.

      We have expanded on this in more detail in the results and discussion sections.

      “Besides the costs of increased neural activity when exerting more effort, effort should be considered costly for a second reason: Cognitive resources are limited. Therefore, any unnecessary resource expenditure reduces cognitive and behavioral flexibility [22, 31, 36, 116]. As a result, the brain needs to distribute resources between cognitive operations and the oculomotor system. We found evidence for the idea that such resource distribution is adaptive to the general level of cognitive demand and available resources: Increasing cognitive demand through an additional primary auditory dual task led to a lower saccade frequency, and especially costly saccades were cut. In this case, it is important to consider that the auditory task was the primary task, which should cause participants to distribute resources from the oculomotor system to the counting task. In other situations, more resources could be distributed to the oculomotor system instead, for example to discover new sources of reward [22, 136]. Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93]. Furthermore, it has been proposed that saccade costs are weighed against other cognitive operations such as using working memory [33, 143–146]. How would the resources between the oculomotor system and cognitive tasks (like the auditory counting task) be related? One possibility is that both consume from limited working memory resources [147, 148]. Saccades are thought to encode target objects in a mandatory fashion into (visual) working memory [79], and the counting task requires participants to keep track of the auditory stream and maintain count of the instructed digit in working memory. However, the exact nature of which resources overlap between tasks remain open for future investigation [also see 149]. Together, we propose that cognitive resources are flexibly (dis)allocated to and from the oculomotor system based on the current demands to establish an optimal balance between performance and cost minimization.”

      Reviewer #2 (Public Review):

      The authors attempt to establish presaccadic pupil size as an index of 'saccade effort' and propose this index as one new predictor of saccade target selection. They only partially achieved their aim: When choosing between two saccade directions, the less costly direction, according to preceding pupil size, is preferred. However, the claim that with increased cognitive demand participants would especially cut costly directions is not supported by the data. I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      Strengths:

      The paper is well-written, easy to understand, and nicely illustrated.

      The sample size seems appropriate, and the data were collected and analyzed using solid and validated methodology.

      Overall, I find the topic of investigating factors that drive saccade choices highly interesting and relevant.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      The authors obtain pupil size and saccade preference measures in two separate tasks. Relating these two measures is problematic because the computations that underlie saccade preparation differ. In Experiment 1, the saccade is cued centrally, and has to be delayed until a "go-signal" is presented; in Experiment 2, an immediate saccade is executed to an exogenously cued peripheral target. The 'costs' in Experiment 1 (computing the saccade target location from a central cue; withholding the saccade) do not relate to Experiment 2. It is unfortunate that measuring presaccadic pupil size directly in the comparatively more 'natural' Experiment 2 (where saccades did not have to be artificially withheld) does not seem to be possible. This questions the practical application of pupil size as an index of saccade effort.

      This is an important point raised by the reviewer and we agree that a discussion on these points improves the manuscript. We reply in two parts: 1) Although the underlying computations during saccade preparation might differ, and are therefore unlikely to be fully similar (we agree), we can still predict saccade selection between (Saccade planning to Saccade preference) and within tasks (Visual search). 2) Pupil size is a sluggish physiological signal, but this is outweighed by the advantages of using pupil size as a general marker of effort, also in the context of visual selection compared with saccade latencies.

      (1) Are delayed saccades (cost task) and the much faster saccades (preference task) linked?

      As the reviewer notes, the underlying ‘type’ of oculomotor program may differ between voluntarily delayed saccades and those in the saccade preference task. There are, however, also considerable overlaps between the oculomotor programs, as the directions and amplitudes are identical. Moreover, the different types of saccades have considerable overlap in their underlying neural circuitry. Nevertheless, the underlying oculomotor programs likely still differ in some regard. Despite these differences, we were able to measure differences across directions in both tasks, and costs and preferences were negatively and highly correlated between tasks. The finding itself therefore indicates that the costs of saccades measured during the saccade planning task generalize to those in the saccade preference task. Note also that we predicted this finding in a previous publication before starting the present study (Koevoet et al., 2023).

      We now address this interesting point in the discussion as follows:

      “We observed that affordable saccades were preferred over costly ones. This is especially remarkable given that the delayed saccades in the planning task likely differ in their oculomotor program from the immediate saccades in the preference task in some regard.”

      (2) Is pupil size a sensible measure of saccade effort?

      As the reviewer points out, the pupillary signal is indeed relatively sluggish, and therefore relatively slow and more artificial tasks are preferred to quantify saccade costs. This does not preclude pupil size from being applied in more natural settings, as we demonstrate in the search experiments – but a lot of care has to be taken to control for many possible confounding factors, and many trials will be needed.

      That said, as saccade latencies may also capture differences in oculomotor effort (Shadmehr et al., 2019) they are a possible alternative option to assess effort in some oculomotor tasks (see below on why saccade latencies do not provide evidence for an alternative to effort driving saccade selection, but converging evidence). Whilst we do maintain that pupil size is an established and versatile physiological marker of effort, saccade latencies provide converging evidence for our conclusion that effort drives saccade selection.

      As for the saccade preference task, we are not able to analyze the data in a similar manner as in the visual search task for two reasons. First, the number of saccades is much lower than in the natural search experiments. Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors such as deciding between two possible saccade targets. Even simple binary decisions go hand in hand with reliable pupil dilations as they require effort (e.g. de Gee et al., 2014).

      There are three major reasons why pupil size is a more versatile marker of saccade costs than saccade latencies (although as mentioned, latencies may constitute another valuable tool to study oculomotor effort). First, pupil size is able to quantify the cost of attentional shifts more generally, including covert attention as well as other effector systems such as head and hand movements. This circumvents the issue of different latencies of different effector systems and also allows the study of attentional processes that are not associated with overt motor movements. Second, saccade latencies are difficult to interpret in natural viewing data, as fixation duration and saccade latencies are inherently confounded by one another. This makes it very difficult to separate oculomotor processes and the extraction of perceptual information from a fixated target. Thus, pupil size is a versatile marker of attentional costs in a variety of settings, and can measure costs that saccade latencies cannot (i.e. covert attention). Lastly, pupil size is highly established as a marker of effort, which has been demonstrated across a wide range of cognitive tasks and is therefore not bound to eye movements alone (Bumke, 1911; Koevoet et al., 2024; Laeng et al., 2012; Loewenfeld, 1958; Mathôt, 2018; Robison & Unsworth, 2019; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018).

      We now discuss this as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      The authors claim that the observed direction-specific 'saccade costs' obtained in Experiment 1 "were not mediated by differences in saccade properties, such as duration, amplitude, peak velocity, and landing precision (Figure 1e,f)". Saccade latency, however, was not taken into account here but is discussed for Experiment 2.

      The final model that was used to test for the observed anisotropies in pupil size across directions indeed did not include saccade latencies as a predictor. However, we did originally consider saccade latency as a potential predictor. Because we performed AIC-based backward model selection, this predictor was removed due to its marginal predictive contribution beyond the other predictors explaining pupil size.

      For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (b = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (b = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (b = 3.344, t = 3.334, p = .003).

      The apparent similarity of saccade latencies and pupil size, however, is striking. Previous work shows shorter latencies for cardinal than oblique saccades, and shorter latencies for horizontal and upward saccades than downward saccades - directly reflecting the pupil sizes obtained in Experiment 1 as well as in the authors' previous study (Koevoet et al., 2023, PsychScience).

      As the reviewer notes, there are substantial asymmetries across the visual field in saccade latencies. These asymmetries in saccade latency could also predict saccade preferences. We will reply to this in three points: 1) even if saccade latency is a predictor of saccade preferences, this would not constitute an alternative explanation to the conclusion of effort driving saccade selection, 2) saccade latencies show an up-down asymmetry but oblique-cardinal effects in latency may not be generalizable across saccade tasks, 3) pupil size remains a robust predictor of saccade preferences even when saccade latencies are considered as a predictor of saccade preferences.

      (1) We want to first stress that saccade latencies are thought to reflect oculomotor effort (Shadmehr et al., 2019). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies predict saccade selection, this would not contrast the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main conclusion – effort predicting saccade selection (rather than pupil size predicting saccade selection per se).

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      (2) We first tested anisotropies in saccade latency in the saccade planning task (Wilkinson notation: latency ~ obliqueness + updownness + leftrightness + saccade duration + saccade amplitude + saccade velocity + landing error + (1 + obliqueness + updownness | participant)). We found upward latencies to be shorter than downward saccade latencies (b = -.535, t = 3.421, p = .003). In addition, oblique saccades showed shorter latencies than cardinal saccades (b = -1.083, t = 3.096, p = .002) – the opposite of what previous work has demonstrated.

      We then also tested these latency anisotropies in another dataset wherein participants (n = 20) saccaded toward a single peripheral target as fast as possible (Koevoet et al., submitted; same amplitude and eccentricity as in the present manuscript). There we did not find a difference in saccade latency between cardinal and oblique targets, but we did observe shorter latencies for up- compared with downward saccades. We are therefore not sure in which situations oblique saccades do, or do not, differ from cardinal saccades in terms of latency, or even in which direction the effect occurs.

      In contrast, we have now demonstrated a larger pupil size prior to oblique compared with cardinal saccades in two experiments. This indicates that pupil size may be a more reliable and generalizable marker of saccade costs than saccade latency. However, this remains to be investigated further.

      (3) To gain further insight into which oculomotor metrics would predict saccade selection, we conducted a linear regression across directions. We created pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based model selection to determine which factors best predict saccade selection. The selected model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency predicts saccade preferences, pupil size remains a robust predictor of saccade selection.

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”

      The authors state that "from a costs-perspective, it should be efficient to not only adjust the number of saccades (non-specific), but also by cutting especially expensive directions the most (specific)". However, saccade targets should be selected based on the maximum expected information gain. If cognitive load increases (due to an additional task) an effective strategy seems to be to perform fewer - but still meaningful - saccades. How would it help natural orienting to selectively cut saccades in certain (effortful) directions? Choosing saccade targets based on comfort, over information gain, would result in overall more saccades to be made - which is non-optimal, also from a cost perspective.

      We thank the reviewer for this comment. Although we do not fully agree, the logic is quite close to our rationale and it is worth adding a point of discussion here. A vital part of the current interpretation is the instruction given to participants. In our second natural visual search task, participants were performing a dual task, where the auditory task was the primary task, whilst the search task was secondary. Participants are therefore likely to adjust their resources to optimize performance on the primary task – at the expense of the secondary task. As a result, fewer resources are available for searching in the dual task than in the single task, because these resources are needed for the auditory task. Cutting expensive directions does not help search in terms of search performance, but it does reduce the cost of search, so that more resources are available for the prioritized auditory task. Also note that the search task was rather difficult – participants did it, but it was tough (see the original description of the dataset for more details), which provides another reason to go all in on the auditory task at the expense of the visual task. This, however, opens up a nice point of discussion: If one were to emphasize the importance of search (maybe with punishment or reward), we would indeed expect participants to perform whichever eye movements get them to their goal fastest – thus reducing the relative influence of costs on saccade behavior. This remains to be tested, however - we are working on this and are looking forward to discussing such findings in the future.

      Together, we propose that there is a trade-off between distributing resources either towards cognitive tasks or the oculomotor system (also see Ballard et al., 1995; Van der Stigchel, 2020). How these resources are distributed depends highly on the current task demands (also see Sahakian et al., 2023). This allows for adaptive behavior in a wide range of contexts.

      We now added these considerations to the manuscript as follows (also see our previous replies):

      “Do cognitive operations and eye movements consume from a similar pool of resources [44]? If so, increasing cognitive demand for non-oculomotor processes should result in decreasing available resources for the oculomotor system. In line with this idea, previous work indeed shows altered eye-movement behavior under effort as induced by dual tasks, for example by making fewer saccades under increased cognitive demand [62–64]. We therefore investigated whether fewer saccades were made as soon as participants had to count the occurrence of a specific digit in the auditory number stream in comparison to ignoring the stream (in Exp. 2; Figure 4a). Participants were instructed to prioritize the auditory digit-counting task over finding the visual search target. Therefore, resources should be shifted from the oculomotor system to the primary auditory counting task. The additional cognitive demand of the dual task indeed led to a decreased saccade frequency (t(24) = 7.224, p < .001, Cohen’s d = 1.445; Figure 4h).”
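
      As a reading aid, the reported effect size is consistent with the paired-samples convention d = t / √n: with t(24) = 7.224 and n = 25 participants, d = 7.224 / 5 ≈ 1.445. The sketch below makes that relationship explicit. It assumes that Cohen’s d was computed as the mean of the paired differences divided by their standard deviation (the text does not state this), and the function name and toy numbers are hypothetical.

```python
import math
import statistics

def paired_t_and_d(condition_a, condition_b):
    """Paired t statistic, Cohen's d (mean difference / SD of differences), and df."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    d = statistics.mean(diffs) / statistics.stdev(diffs)
    t = d * math.sqrt(n)  # paired t is d scaled by sqrt(n)
    return t, d, n - 1

# Consistency check against the values reported in the text.
t_reported, df_reported = 7.224, 24
d_implied = t_reported / math.sqrt(df_reported + 1)
print(round(d_implied, 3))  # 1.445

# Toy data (invented): saccades per second in single vs. dual task.
single = [3.0, 4.0, 5.0, 6.0]
dual = [1.0, 1.0, 3.0, 3.0]
t_toy, d_toy, df_toy = paired_t_and_d(single, dual)
```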

      I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      The reviewer’s point is taken from the initial comment, which we will address here. First, we’d like to point out that it is not established that saccade costs in different directions are always the same. Instead, it is possible that saccade costs could be different in natural viewing compared with our delayed-saccade task. Therefore, we used pupil size during natural viewing for the search experiments. Second, the reviewer correctly notes that oblique saccades are hardly cut when under additional cognitive demand. However, participants already hardly execute oblique saccades when not confronted with the additional auditory task (Figure 4b, d), making it difficult to reduce those further (i.e. floor effect). Participants chose to cut vertical saccades, possibly because these are more costly than horizontal saccades.

      We incorporated these points in our manuscript as follows:

      “To test this, we analyzed data from two existing datasets [63] wherein participants (total n = 41) searched for small targets (’Z’ or ’H’) in natural scenes (Figure 4a; [64]). Again, we tested whether pupil size prior to saccades negatively linked with saccade preferences across directions. Because saccade costs and preferences across directions could differ for different situations (i.e. natural viewing vs. saccade preference task), but should always be negatively linked, we established both cost and preferences independently in each dataset.”

      “We calculated a saccade-adjustment map (Figure 4g) by subtracting the saccade preference map in the single task (Figure 4f) from the dual task map (Figure 4d). Participants seemingly cut vertical saccades in particular, and made more saccades to the top right direction. This pattern may have emerged as vertical saccades are more costly than horizontal saccades (also see Figure 1d). Oblique saccades may not have been cut because there were very little oblique saccades in the single condition to begin with (Figure 4d), making it difficult to observe a further reduction of such saccades under additional cognitive demand (i.e. a floor effect).”
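
      The subtraction described in the quoted passage can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the four direction bins, the function names, and the saccade lists are hypothetical, and the preference map is assumed to be the proportion of saccades made in each direction.

```python
from collections import Counter

def preference_map(directions, bins):
    """Proportion of saccades falling in each direction bin."""
    counts = Counter(directions)
    total = len(directions)
    return {b: counts.get(b, 0) / total for b in bins}

def adjustment_map(single, dual):
    """Dual-task preference minus single-task preference, per direction."""
    return {b: dual[b] - single[b] for b in single}

BINS = ["right", "up", "left", "down"]  # hypothetical, coarse direction bins

# Made-up saccade directions for illustration only (not data from the study)
single_task = ["right"] * 4 + ["up"] * 3 + ["left"] * 2 + ["down"] * 1
dual_task = ["right"] * 5 + ["up"] * 1 + ["left"] * 3 + ["down"] * 1

adjustment = adjustment_map(preference_map(single_task, BINS),
                            preference_map(dual_task, BINS))
for direction in BINS:
    print(direction, round(adjustment[direction], 2))
```

      Negative values mark directions that were cut under the dual task. In this toy example only upward saccades are reduced.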

      Overall, I am not sure what practical relevance the relation between pupil size (measured in a separate experiment) and saccade decisions has for eye movement research/vision science. Pupil size does not seem to be a straightforward measure of saccade effort. Saccade latency, instead, can be easily extracted in any eye movement experiment (no need to conduct a separate, delayed saccade task to measure pupil dilation), and seems to be an equally good index.

      There are two points here.

      (1) What is the practical relevance of a link between effort and saccade selection for eye-movement research and vision science?

      We see plenty: think of changing eye movement patterns under effort (be it smooth pursuits, saccade rates, distributions of gaze positions to images etc.), which have substantial implications for human factors research, but also neuropsychology. With a cost account, one may predict (rather than just observe) how eye movements change as soon as resources are reduced or non-visual demand increases. With a cost account, we can parsimoniously explain effects (e.g. lower saccade rates under effort, cardinal bias, perhaps also central bias) that cannot be explained by what is so far referred to as the three core drivers of eye movement behavior (saliency, selection history, goals, e.g., Awh et al., 2012). Conversely, one must wonder why eye-movement research/vision science simply accepts/dismisses these phenomena as such, without seeking overarching explanations.

      (2) What is the usefulness of using pupil size to measure effort?

      We hope that our replies to the comments above illustrate why pupil size is a sensible, robust and versatile marker of attentional costs. We briefly summarize our most important points here.

      - Pupil size is an established measure of effort irrespective of context, as demonstrated by hundreds of original works (e.g. working memory load, multiple object tracking, individual differences in cognitive ability). This allows pupil size to be a versatile marker of the effort, and therefore costs, of non-saccadic attentional shifts such as covert attention or those realized by other effector systems (i.e. head or hand movements).

      - Our new analysis indicates that pupil size remains a strong and robust predictor of saccade preference, even when considering saccade latency.

      - Pupil size allows saccade costs to be studied in natural viewing. In contrast, saccade latencies are difficult to assess in natural viewing as fixation durations and saccade latencies are intrinsically linked and very difficult to disentangle.

      - Note, however, that we think it is interesting and useful to study effects of effort/cost on eye movement behavior. Whichever index is used to do so, we see plenty of potential in this line of research, and this paper is a starting point.

      Reviewer #3 (Public Review):

      This manuscript extends previous research by this group by relating variation in pupil size to the endpoints of saccades produced by human participants under various conditions including trial-based choices between pairs of spots and search for small items in natural scenes. Based on the premise that pupil size is a reliable proxy of "effort", the authors conclude that less costly saccade targets are preferred. Finding that this preference was influenced by the performance of a non-visual, attention-demanding task, the authors conclude that a common source of effort animates gaze behavior and other cognitive tasks.

      Strengths:

      Strengths of the manuscript include the novelty of the approach, the clarity of the findings, and the community interest in the problem.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      Enthusiasm for this manuscript is reduced by the following weaknesses:

      (1) A relationship between pupil size and saccade production seems clear based on the authors' previous and current work. What is at issue is the interpretation. The authors test one, preferred hypothesis, and the narrative of the manuscript treats the hypothesis that pupil size is a proxy of effort as beyond dispute or question. The stated elements of their argument seem to go like this:

      PROPOSITION 1: Pupil size varies systematically across task conditions, being larger when tasks are more demanding.

      PROPOSITION 2: Pupil size is related to the locus coeruleus.

      PROPOSITION 3: The locus coeruleus NE system modulates neural activity and interactions.

      CONCLUSION: Therefore, pupil size indexes the resource demand or "effort" associated with task conditions.

      How the conclusion follows from the propositions is not self-evident. Proposition 3, in particular, fails to establish the link that is supposed to lead to the conclusion.

      We inadvertently laid out this rationale as described above, and we thank the reviewer for pointing out this initial suboptimal structure of argumentation. The notion that the link between pupil size and effort is established in the literature because of its neural underpinnings is inaccurate. Instead, the tight link between effort and pupil size is established based on covariations of pupil diameter and cognition across a wide variety of tasks and domains. In line with this, we now introduce this tight link predominantly based on the relationships between pupil size and cognition instead of focusing on putative neural correlates of this relationship.

      As reviewed previously (Beatty, 1982; Bumke, 1911; Kahneman, 1973; Kahneman & Beatty, 1966; Koevoet et al., 2024; Laeng et al., 2012; Mathôt, 2018; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018), any increase in effort is consistently associated with an increase in pupil size. For instance, the pupil dilates when increasing load in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials (Alnæs et al., 2014; Koevoet et al., 2024; Robison & Brewer, 2020; Robison & Unsworth, 2019; Unsworth & Miller, 2021). This extends to the planning of movements as pupil dilations are observed prior to the execution of (eye) movements (Koevoet et al., 2023; Richer & Beatty, 1985). The link between pupil size and effort has thus been firmly established for a long time, irrespective of the neural correlates of these effort-linked pupil size changes.

      We again thank the reviewer for spotting this logical mistake, and now revised the paragraph where we introduce pupil size as an established marker of effort as follows:

      “We recently demonstrated that the effort of saccade planning can be measured with pupil size, which allows for a physiological quantification of saccade costs as long as low-level visual factors are controlled for [33]. Pupil size is an established marker of effort [36–44]. For instance, loading more in working memory or tracking more objects results in stronger pupil dilation [44–52]. Pupil size not only reflects cognitive (or mental) effort but also the effort of planning and executing movements [37, 53, 54]. We leveraged this to demonstrate that saccade costs can be captured with pupil size, and are higher for oblique compared with cardinal directions [33]. Here, we addressed whether saccade costs predict where to saccade.”

      We now mention the neural correlates of pupil size only in the discussion, where we took care to also mention roles for other neurotransmitter systems:

      “Throughout this paper, we have used cost in the limited context of saccades. However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117]. Effort-linked pupil dilations are thought to be, at least in part, driven by activity in the brainstem locus coeruleus (LC) [40, 118–120] [but other neurotransmitters also affect pupil size, e.g. 121, 122]. Activity in LC with its widespread connections throughout the brain [120, 123–127] is considered to be crucial for the communication within and between neural populations and modulates global neural gain [128–132]. Neural firing is costly [22, 133], and therefore LC activity and pupil size are (neuro)physiologically plausible markers of cost [40]. Tentative evidence even suggests that continued exertion of effort (accompanied by altered pupil dilation) is linked to the accumulation of glutamate in the lateral prefrontal cortex [134], which may be a metabolic marker of cost [also see 116, 134, 135].”

      (2) The authors test one, preferred hypothesis and do not consider plausible alternatives. Is "cost" the only conceivable hypothesis? The hypothesis is framed in very narrow terms. For example, the cholinergic and dopamine systems that have been featured in other researchers' consideration of pupil size modulation are missing here. Thus, because the authors do not rule out plausible alternative hypotheses, the logical structure of this manuscript can be criticized as committing the fallacy of affirming the consequent.

      As we have noted in the response to the reviewer’s first point, we did not motivate our use of pupil size as an index of effort clearly enough. For the current purpose, the neural correlates of pupil size are less relevant than the cognitive correlates (see previous point). We reiterate that the neuromodulatory underpinnings of the observed pupil size effects (which indeed possibly include effects of the cholinergic, dopaminergic and serotonergic systems), while interesting for the discussion on the neural origin of effects, are not crucial to our conclusion. We hope the new rationale (without focusing too much on the (irrelevant) exact neural underpinnings) convinces the reviewer and reader.

      Our changes to the manuscript are shown in our reply to the previous comment.

      The reviewer notes that other plausible alternative hypotheses could explain the currently reported results. However, we did not find a more parsimonious explanation for our data than ‘Effort Drives Saccade Selection’. Effort explains why participants prefer saccading toward specific directions in (1) highly controlled and (2) more natural settings. Note that we also predicted this effect previously (Koevoet et al., 2023). Moreover, this account explains (3) why participants make fewer saccades under additional cognitive demand, and (4) why especially costly saccades are reduced under additional cognitive demand. We are very open to the reviewer presenting other possible interpretations of our data so that these can be discussed and put to the test in future work.

      (3) The authors cite particular publications in support of the claim that saccade selection is influenced by an assessment of effort. Given the extensive work by others on this general topic, the skeptic could regard the theoretical perspective of this manuscript as too impoverished. Their work may be enhanced by consideration of other work on this general topic, e.g, (i) Shenhav A, Botvinick MM, Cohen JD. (2013) The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 2013 Jul 24;79(2):217-40. (ii) Müller T, Husain M, Apps MAJ. (2022) Preferences for seeking effort or reward information bias the willingness to work. Sci Rep. 2022 Nov 14;12(1):19486. (iii) Bustamante LA, Oshinowo T, Lee JR, Tong E, Burton AR, Shenhav A, Cohen JD, Daw ND. (2023) Effort Foraging Task reveals a positive correlation between individual differences in the cost of cognitive and physical effort in humans. Proc Natl Acad Sci U S A. 2023 Dec 12;120(50):e2221510120.

      We thank the reviewer for pointing us toward this literature. These papers are indeed relevant for our manuscript, and we have now incorporated them. Specifically, we now discuss how the costs of effort are weighed in relation to possible rewards during decision-making. We have also incorporated work that has investigated how the biomechanical costs of arm movements contribute to action selection.

      “Our findings are in line with established effort-based models that assume costs to be weighed against rewards during decision-making [102–107]. In such studies, reward and cognitive/physical effort are often parametrically manipulated to assess how much effort participants are willing to exert to acquire a given (monetary) reward [e.g. 108, 109]. Whereas this line of work manipulated the extrinsic costs and/or rewards of decision options (e.g. perceptual consequences of saccades [110, 111] or consequences associated with decision options), we here focus on the intrinsic costs of the movement itself (in terms of cognitive and physical effort). Relatedly, the intrinsic costs of arm movements are also considered during decision-making: biomechanically affordable movements are generally preferred over more costly ones [26–28]. We here extend these findings in two important ways. First, until now, the intrinsic costs of saccades and other movements have been inferred from gaze behavior itself or by using computational modelling [23, 25–28, 34, 35, 112]. In contrast, we directly measured cost physiologically using pupil size. Secondly, we show that physiologically measured saccade costs predict where saccades are directed in a controlled binary preference task, and even during natural viewing. Our findings could unite state-of-the-art computational models [e.g. 23, 25, 34, 35, 113] with physiological data, to directly test the role of saccade costs and ultimately further our understanding of saccade selection.”

      (4) What is the source of cost in saccade production? What is the currency of that cost? The authors state (page 13), "... oblique saccades require more complex oculomotor programs than horizontal eye movements because more neuronal populations in the superior colliculus (SC) and frontal eye fields (FEF) [76-79], and more muscles are necessary to plan and execute the saccade [76, 80, 81]." This statement raises questions and concerns. First, the basis of the claim that more neurons in FEF and SC are needed for oblique versus cardinal saccades is not established in any of the publications cited. Second, the authors may be referring to the fact that oblique saccades require coordination between pontine and midbrain circuits. This must be clarified. Second, the cost is unlikely to originate in extraocular muscle fatigue because the muscle fibers are so different from skeletal muscles, being fundamentally less fatigable. Third, if net muscle contraction is the cost, then why are upward saccades, which require the eyelid, not more expensive than downward? Thus, just how some saccades are more effortful than others is not clear.

      Unfortunately, our current data do not allow for the specification of what the source is of differences in saccade production, nor what the currency is. We want to explicitly state that while pupil size is a sensitive measure of saccade costs, pupil size cannot directly inform what underlying mechanisms are causing differences in saccade costs across conditions (e.g. directions). Nevertheless, we do speculate about these issues because they are important to consider. We thank the reviewer for pointing out the shortcomings in our initial speculations.

      Broadly, we agree with the reviewer that a neural source of differences in costs between different types of saccades is more likely than a purely muscular account (also see Koevoet et al., 2023). Furthermore, we think that the observed differences in saccade costs for oblique vs. cardinal and up vs. down could be due to different underlying mechanisms. While we caution against overinterpreting single directions, tentative evidence for this may also be drawn from the different time courses of the effects for up/down versus cardinal/oblique (Figure 1c).

      Below we speculate about why some specific saccade directions may be more costly than others:

      Why would oblique saccades be more costly than cardinal saccades? We thank the reviewer for pointing out that oblique saccades additionally require coordination between pontine and midbrain circuits (Curthoys et al., 1984; King & Fuchs, 1979; Sparks, 2002). This point warrants a more thorough discussion than in our initial version. We have incorporated this as follows:

      “The complexity of an oculomotor program is arguably shaped by its neural underpinnings. For example, oblique but not cardinal saccades require communication between pontine and midbrain circuits [73–75]. Such differences in neural complexity may underlie the additional costs of oblique compared with cardinal saccades. Besides saccade direction, other properties of the ensuing saccade such as its speed, distance, curvature, and accuracy may contribute to a saccade’s total cost [22, 33, 53, 76, 77] but this remains to be investigated directly.”

      Why would downward saccades be more costly than upward saccades? As the reviewer points out: from a net muscular contraction account of cost, one would expect the opposite pattern due to the movement of the eyelid. Instead, we speculate that our findings may be associated with the well-established anisotropy in early visual cortex along the vertical meridian. Specifically, the upper vertical meridian is represented in substantially less detail than the lower vertical meridian (Himmelberg et al., 2023; Silva et al., 2018). Prior to a saccade, attention is deployed towards the intended saccadic endpoint (Deubel & Schneider, 1996; Kowler et al., 1995). Attention tunes neurons to preferentially process the attended location over non-attended locations. Because the lower visual field is represented in higher detail than the upper visual field, attention may tune neuronal responses differently when preparing up- compared with downward saccades (Hanning et al., 2024; Himmelberg et al., 2023). Thus, it may be more costly to prepare down- compared with upward saccades. This proposition, however, does not account for the lower costs associated with horizontal compared with up- and downward saccades, as the horizontal meridian is represented at a higher acuity than the vertical meridian. This makes it unlikely that this explains the pattern of results completely. Again, at this point we can only speculate why costs differ, yet we demonstrate that these differences in cost are decisive for oculomotor behavior. We now explicitly state the speculative nature of these ideas, all of which would need to be tested directly.

      We have updated our discussion of this issue as follows:

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      (5) The authors do not consider observations about variation in pupil size that seem to be incompatible with the preferred hypothesis. For example, at least two studies have described systematically larger pupil dilation associated with faster relative to accurate performance in manual and saccade tasks (e.g., Naber M, Murphy P. Pupillometric investigation into the speed-accuracy trade-off in a visuo-motor aiming task. Psychophysiology. 2020 Mar;57(3):e13499; Reppert TR, Heitz RP, Schall JD. Neural mechanisms for executive control of speed-accuracy trade-off. Cell Rep. 2023 Nov 28;42(11):113422). Is the fast relative to the accurate option necessarily more costly?

      We thank the reviewer for this interesting point that we will answer in two ways. First, we discuss the main point: the link between pupil size, effort, and cost. Second, we discuss the findings described specifically in these two papers and how we interpret these from a pupillometric account.

      First, one may generally ask 1) whether any effort results in pupil dilation, 2) whether any effort is costly, and 3) whether this means that pupil dilation always reflects effort and cost, respectively. Indeed, it has been argued repeatedly, prominently, and independently (e.g., Bumke, 1911; Mathôt, 2018) that any change in effort (no matter the specific origin) is associated with an evoked pupil dilation. Effort, in turn, is consistently and widely experienced as aversive, both across tasks and cultures (David et al., 2024). Effort minimization may therefore be seen as a universal law of human cognition and behavior, with effort as a to-be-minimized cost (Shadmehr et al., 2019; Hull, 1943; Tsai, 1932). However, this does not imply that any pupil dilation necessarily reflects effort or that, as a consequence thereof, any pupil dilation is always signaling cost. For instance, the pupil dark response, the pupil far response and changes in baseline pupil size are not associated with effort. Baseline and task-evoked pupil dilation responses have to be interpreted differently (see below). Moreover, the pupil also changes (and dilates) due to other factors (see Strauch et al., 2022; Mathôt, 2018; Bumke, 1911; Loewenfeld, 1999 for reviews).

      Second, as for Naber & Murphy (2020) and Reppert et al. (2023) specifically: both Reppert et al. (2023) and Naber & Murphy (2020) indeed demonstrate a larger baseline pupil size when participants made faster, less accurate responses. However, baseline pupil size is not an index of effort per se, but task-evoked pupil dilation responses are (as studied in the present manuscript) (Strauch et al., 2022). For work on differences between baseline pupil diameter and task-evoked pupil responses, and their respective links with exploration and exploitation, please see Jepma & Nieuwenhuis (2011). Indeed, the link between effort and larger pupil size holds for task-evoked responses, but not baseline pupil size per se (also see Koevoet et al., 2023).

      Still, Naber (third author of the current paper) & Murphy (2020) also demonstrated larger task-evoked pupil dilation responses when participants were instructed to make faster, less accurate responses compared with making accurate and relatively slow responses. However, this difference in task-evoked response gains significance only after the onset of the movement itself, and peaks substantially later than response offset. Whilst pupil dilation may be sluggish, it isn’t extremely sluggish either. As feedback to the performance of the participant was displayed 1.25s after performing the movement and clicking (taking about 630ms), we deem it possible that this effect may in part result from appraising the feedback to the participant rather than the speed of the response itself (in fact, Naber and Murphy also discuss this option). In addition to not measuring saccades but mouse movements, it is therefore possible that the observed evoked pupil effects in Naber & Murphy (2020) are not purely linked to motor preparation and execution per se. Therefore, future work that aims to investigate the costs of movements should isolate the effects of feedback and other potential factors that may drive changes in pupil size. This will help clarify whether fast or more accurate movements could be linked to the underlying costs of the movements.

      Relatedly, we do not find evidence that pupil size during saccade planning predicts the onset latency of the ensuing saccade (please refer to our second response to Reviewer 2 for a detailed discussion).

      Together, we therefore do not see the results from Reppert et al. (2023) and Naber & Murphy (2020) to be at odds with our interpretation of evoked pupil size reflecting effort and cost in the context of planning saccades.

      We think that these are considerations important to the reader, which is why we now added them to the discussion as follows:

      “Throughout this paper, we have used cost in the limited context of saccades. However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117].”

      (6) The authors draw conclusions based on trends across participants, but they should be more transparent about variation that contradicts these trends. In Figures 3 and 4 we see many participants producing behavior unlike most others. Who are they? Why do they look so different? Is it just noise, or do different participants adopt different policies?

      We disagree with the transparency point of the reviewer. Note that we deviated from the norm here by being more transparent than common: we added individual data points and relationships rather than showing pooled effects across participants with error bars alone (see Figures 2c, 3b,c, 4c,e,f).

      Moreover, our effects are consistent and stable across participants and are highly significant. To illustrate, for the classification analysis based on cost (Figure 2E) 16/20 participants showed an effect. As for the natural viewing experiments (total > 250,000 fixations), we also find that a majority of participants show the observed effects: Experiment 1: 15/16 participants; Experiment 2: 16/25 participants; Experiment 2 – adjustment: 22/25 participants.

      We fully agree that it’s interesting to understand where interindividual variation may originate from. We currently have too little data to allow robust analyses across individuals and zooming in on individual differences in cost maps, preference maps, or potential personalized strategies of saccade selection. That said, future work could study this further. For such work, we would recommend reducing the number of directions to gain more pupil size data per direction and therefore cleaner signals that may be more informative on the individual level. With such stronger signals, studying (differences in) links on an individual level may be feasible and would be interesting to consider, and this will be a future direction in our own work too. Nonetheless, we again stress that the reported effects are robust and consistent across participants, and that interindividual differences are therefore not extensive. Moreover, our results from four experiments consistently support our conclusion that effort drives saccade selection.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      - Based on the public review, I would recommend that the authors carefully review and correct the manuscript with regard to the causal conclusions. The study is largely correlational (i.e. the pupil was only observed, not manipulated) and therefore does not allow causal conclusions to be drawn about the relationship between pupil size and saccade selection. These causal conclusions become even more confusing when pupil size is equated with effort and saccade cost. As a consequence, an actual correlation between pupil size and saccade selection has led to the title that effort drives saccade selection. It would also be helpful for the reader to summarize in an additional section of the discussion what they consider to be a causal or correlational link based on their results.

      We agree with the reviewer, and we now state explicitly and in detail which findings are correlational and which are causal. As outlined before, we do not see a more parsimonious explanation for our findings than our title, but we fully agree that the paper benefits from making the correlational/causal nature of the evidence for this idea explicit.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predicts saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment as saccade frequencies - and especially costly saccades were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      - Can the authors please elaborate in more detail on how they transformed the predictors of their linear mixed model for the visualization in Figure 1f? It is difficult to see how the coefficients in the table and the figure match.

      We used the ‘effectsize’ package to provide effect sizes for each predictor of the linear mixed-effects model (https://cran.r-project.org/web/packages/effectsize/index.html). We report absolute effect sizes to make it visually easier to compare different predictors. These details have now been included in the Methods section to be more transparent about how these effect sizes were computed.

      “Absolute effect sizes (i.e. r) and their corresponding 95% confidence intervals for the linear mixed-effects models were calculated using t and df values with the ’effectsize’ package (v0.8.8) in R.”
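
      As an illustration of the conversion quoted above, the following minimal Python sketch computes an absolute effect size r from a t value and its degrees of freedom using the standard relation r = |t| / sqrt(t² + df). That this mirrors the computation performed by the ‘effectsize’ package for these models is an assumption; the function name `t_to_abs_r` and the example numbers are hypothetical, and the confidence-interval step is omitted.

```python
import math


def t_to_abs_r(t: float, df: float) -> float:
    """Absolute correlation-type effect size r from a t statistic and its error df.

    Hypothetical helper for illustration; not the authors' code.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return abs(t) / math.sqrt(t * t + df)


# Hypothetical values, not taken from the manuscript.
print(round(t_to_abs_r(3.0, 16.0), 3))
```

      Taking the absolute value discards the sign of each coefficient, which is what allows predictors with opposite directions to be compared on a single axis.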

      - Could the authors please explain in more detail why they think that a trial-by-trial analysis in the free choice task adds something new to their conclusions? In fact, a trial-by-trial analysis somehow suggests that the pupil size data would enter the analysis at a single trial level. If I understand correctly, the pupil size data come from their initial mapping task. So there is only one mean pupil size for a given participant and direction that goes into their analysis to predict free choice in a single trial. If this is the case, I don't see the point of doing this additional analysis given the results shown in Figure 2c.

      The reviewer understands correctly that pupil size data is taken from the initial mapping task. We then used these mean values to predict which saccade target would be selected on a trial-by-trial basis. While showing the same conceptual result as the correlation analysis, we opted to include this analysis to show the robustness of the results across individuals. Therefore we have chosen to keep the analysis in the manuscript but now write more clearly that this shows the same conceptual finding as the correlation analysis.

      “As another test of the robustness of the effect, we analyzed whether saccade costs predicted saccade selection on a trial-by-trial basis. To this end, we first determined the more affordable option for each trial using the established saccade cost map (Figure 1d). We predicted that participants would select the more affordable option. Complementing the above analyses, the more affordable option was chosen above chance level across participants (M = 56.64%, 95%-CI = [52.75%-60.52%], one-sample t-test against 50%: t(19) = 3.26, p = .004, Cohen’s d = .729; Figure 2e). Together, these analyses established that saccade costs robustly predict saccade preferences.”
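
      The test quoted above can be sketched in a few lines of Python. This is a minimal illustration, assuming one percentage per participant (the share of trials on which the more affordable option was chosen); the function name `one_sample_t_and_d` and the example data are hypothetical and are not the study data.

```python
import math
from statistics import mean, stdev


def one_sample_t_and_d(values, mu0=50.0):
    """Return (t, df, Cohen's d) for a one-sample t-test of `values` against mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(values)  # sample SD (n - 1 denominator)
    d = (mean(values) - mu0) / sd  # standardized distance from chance level
    t = d * math.sqrt(n)  # equivalent to (mean - mu0) / (sd / sqrt(n))
    return t, n - 1, d


# Hypothetical per-participant percentages, not the study data.
t, df, d = one_sample_t_and_d([50.0, 60.0, 70.0])
print(round(t, 3), df, round(d, 3))

# Consistency check on the reported summary statistics: d = t / sqrt(n).
print(round(3.26 / math.sqrt(20), 3))
```

      Because d = t / sqrt(n) for a one-sample test, the reported t(19) = 3.26 with 20 participants implies d ≈ .729, which agrees with the value given in the quoted passage.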

      Reviewer #2 (Recommendations For The Authors):

      The authors report that "Whenever the difference in pupil size between the two options was larger, saccades curved away more from the non-selected option (β = .004, SE = .001, t = 4.448, p < .001; Figure 3b), and their latencies slowed (β = .050, SE = .013, t = 4.323, p < .001; Figure 3c)". I suspect this effect might not be driven by the difference but by a correlation between pupil size and latency.

      The authors correlate differences in pupil size (Exp1) with saccade latencies (Exp2); I recommend correlating pupil size with the latency directly, in either task. This would show if it is actually the difference between choices or simply the pupil size of the respective individual option that is linked to latency/effort. Same for curvature.

      The reviewer raises a good point. Please see the previous analyses concerning the possible correlations between pupil size and saccade latency, and how they jointly predict saccade selection.

      Our data show that saccade curvature and latencies are linked with the difference in pupil size between the selected and non-selected options. Are these effects driven by a difference in pupil size or by the pupil size associated with the chosen option?

      To assess this, we fitted two linear mixed-effects models. We predicted saccade curvature and latency using pupil size (from the planning task) of the selected and non-selected options while controlling for the chosen direction (Wilkinson notation: saccade curvature/latency ~ selected pupil size + non-selected pupil size + obliqueness + vertical + horizontal + (1 + selected pupil size + non-selected pupil size | participant)). We found that saccades curved away more from costlier non-selected targets (β = 1.534, t = 8.151, p < .001), and saccades curved away from the non-selected target less when the selected target was cheaper (β = -2.571, t = -6.602, p < .001). As the costs of the selected and non-selected options show opposite effects on saccade curvature, this indicates that the difference between the two options drives oculomotor conflict.

      As for saccade latencies, we found saccade onsets to slow when the cost of the selected target was higher (b = .068, t = 2.844, p = .004). In contrast, saccade latencies were not significantly affected by the cost of the non-selected target (β = -.018, t = 1.457, p = .145), although numerically the effect was in the opposite direction. This shows that latencies were primarily driven by the cost of the selected target, but a difference account cannot be fully ruled out.

      Together, these analyses demonstrate that the difference in costs between two alternatives reliably affects oculomotor conflict, as indicated by the curvature analysis. However, saccade latencies are predominantly affected by the cost of the selected target, even when controlling for the obliqueness, updownness and leftrightness of the ensuing saccade. We have added these analyses here for completeness, but because the findings seem inconclusive for saccade latency, we have not included them in the current paper, for the sake of conciseness and to keep the paper focused. We are open to including these analyses in the supplementary materials if the reviewer and/or editor would like us to.

      I was wondering why the authors haven't analyzed the pupil size in Experiment 2. If the pupil size can be assessed during a free viewing task (Experiment 3), shouldn't it be possible to also evaluate it in the saccade choice task?

      We did not analyze the pupil size data from the saccade preference task for two reasons. First, the number of saccades is much lower than in the natural search experiments (~14,000 vs. ~250,000). Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors, such as deciding between two possible saccade targets (de Gee et al., 2014) and the possibility that two oculomotor programs are realized instead of only a single one (Van der Stigchel, 2010).

      Discussion: "due to stronger presaccadic benefits for upward compared with downward saccades [93,94]". I think this should be the other way around.

      We thank the reviewer for pointing this out. We have corrected our mistake in the revised manuscript.

      Saccade latencies differ around the visual field; to account for that, results / pupil size should be (additionally) evaluated relative to saccade onset (rather than cue offset). It is interesting that latencies were not accounted for here (Exp1), since they are considered for Exp2 (where they correlate with a pupil size difference). I suspect that latencies not only correlate with the difference in pupil size, but directly with pupil size itself.

      We agree with the reviewer that locking the pupil size signal to saccade onset instead of cue offset may be informative. We included an analysis in the supporting information that investigates this (see Figure S1). The results of the analysis were conceptually identical.

      The reviewer writes that latencies were not accounted for in Experiment 1. Although saccade latency was not included in the final model reported in the paper, it was considered during AIC-based backward model selection. As saccade latency did not predict meaningful variance in pupil size, it was ultimately not included in the analysis as a predictor. For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (β = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (β = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (β = 3.344, t = 3.334, p = .003).

      In addition, we have included a new analysis in the supporting information that directly addresses this issue. We will reiterate the main results here:

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”

      We have also added this point in our discussion:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is a highly established marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows capturing not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      References

      Alnæs, D., Sneve, M. H., Espeseth, T., Endestad, T., van de Pavert, S. H. P., & Laeng, B. (2014). Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. Journal of Vision, 14(4), 1. https://doi.org/10.1167/14.4.1

      Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16(8), 437–443. https://doi.org/10.1016/j.tics.2012.06.010

      Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66

      Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2), 276–292. https://doi.org/10.1037/0033-2909.91.2.276

      Bumke, O. (1911). Die Pupillenstörungen bei Geistes-und Nervenkrankheiten (2nd ed.). Fischer.

      Curthoys, I. S., Markham, C. H., & Furuya, N. (1984). Direct projection of pause neurons to nystagmus-related excitatory burst neurons in the cat pontine reticular formation. Experimental Neurology, 83(2), 414–422. https://doi.org/10.1016/S0014-4886(84)90109-2

      David, L., Vassena, E., & Bijleveld, E. (2024). The unpleasantness of thinking: A meta-analytic review of the association between mental effort and negative affect. Psychological Bulletin, 150(9), 1070–1093. https://doi.org/10.1037/bul0000443

      de Gee, J. W., Knapen, T., & Donner, T. H. (2014). Decision-related pupil dilation reflects upcoming choice and individual bias. Proceedings of the National Academy of Sciences, 111(5), E618–E625. https://doi.org/10.1073/pnas.1317557111

      Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36(12), 1827–1837. https://doi.org/10.1016/0042-6989(95)00294-4

      Greenwood, J. A., Szinte, M., Sayim, B., & Cavanagh, P. (2017). Variations in crowding, saccadic precision, and spatial localization reveal the shared topology of spatial vision. Proceedings of the National Academy of Sciences, 114(17), E3573–E3582. https://doi.org/10.1073/pnas.1615504114

      Hanning, N. M., Himmelberg, M. M., & Carrasco, M. (2024). Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. Journal of Neuroscience, 44(12). https://doi.org/10.1523/JNEUROSCI.1023-23.2023

      Himmelberg, M. M., Winawer, J., & Carrasco, M. (2023). Polar angle asymmetries in visual perception and neural architecture. Trends in Neurosciences, 46(6), 445–458. https://doi.org/10.1016/j.tins.2023.03.006

      Jepma, M., & Nieuwenhuis, S. (2011). Pupil Diameter Predicts Changes in the Exploration–Exploitation Trade-off: Evidence for the Adaptive Gain Theory. Journal of Cognitive Neuroscience, 23(7), 1587–1596. https://doi.org/10.1162/jocn.2010.21548

      Kahneman, D. (1973). Attention and Effort. Prentice-Hall.

      Kahneman, D., & Beatty, J. (1966). Pupil diameter and load on memory. Science (New York, N.Y.), 154(3756), 1583–1585. https://doi.org/10.1126/science.154.3756.1583

      King, W. M., & Fuchs, A. F. (1979). Reticular control of vertical saccadic eye movements by mesencephalic burst neurons. Journal of Neurophysiology, 42(3), 861–876. https://doi.org/10.1152/jn.1979.42.3.861

      Koevoet, D., Strauch, C., Naber, M., & Van der Stigchel, S. (2023). The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychological Science, 34(8), 887–898. https://doi.org/10.1177/09567976231179378

      Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S., & Naber, M. (2024). Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cognitive Science, e1668. https://doi.org/10.1002/wcs.1668

      Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35(13), 1897–1916. https://doi.org/10.1016/0042-6989(94)00279-U

      Laeng, B., Sirois, S., & Gredebäck, G. (2012). Pupillometry: A Window to the Preconscious? Perspectives on Psychological Science, 7(1), 18–27. https://doi.org/10.1177/1745691611427305

      Loewenfeld, I. E. (1958). Mechanisms of reflex dilatation of the pupil. Documenta Ophthalmologica, 12(1), 185–448. https://doi.org/10.1007/BF00913471

      Mathôt, S. (2018). Pupillometry: Psychology, Physiology, and Function. Journal of Cognition, 1(1), 16. https://doi.org/10.5334/joc.18

      Naber, M., & Murphy, P. (2020). Pupillometric investigation into the speed-accuracy trade-off in a visuomotor aiming task. Psychophysiology, 57(3), e13499. https://doi.org/10.1111/psyp.13499

      Nozari, N., & Martin, R. C. (2024). Is working memory domain-general or domain-specific? Trends in Cognitive Sciences, 0(0). https://doi.org/10.1016/j.tics.2024.06.006

      Reppert, T. R., Heitz, R. P., & Schall, J. D. (2023). Neural mechanisms for executive control of speed-accuracy trade-off. Cell Reports, 42(11). https://doi.org/10.1016/j.celrep.2023.113422

      Richer, F., & Beatty, J. (1985). Pupillary Dilations in Movement Preparation and Execution. Psychophysiology, 22(2), 204–207. https://doi.org/10.1111/j.1469-8986.1985.tb01587.x

      Robison, M. K., & Brewer, G. A. (2020). Individual differences in working memory capacity and the regulation of arousal. Attention, Perception, & Psychophysics, 82(7), 3273–3290. https://doi.org/10.3758/s13414-020-02077-0

      Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81(2), 407–419. https://doi.org/10.3758/s13414-018-1618-4

      Sahakian, A., Gayet, S., Paffen, C. L. E., & Van der Stigchel, S. (2023). Mountains of memory in a sea of uncertainty: Sampling the external world despite useful information in visual working memory. Cognition, 234, 105381. https://doi.org/10.1016/j.cognition.2023.105381

      Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T., & Ahmed, A. A. (2019). Movement Vigor as a Reflection of Subjective Economic Utility. Trends in Neurosciences, 42(5), 323–336. https://doi.org/10.1016/j.tins.2019.02.003

      Silva, M. F., Brascamp, J. W., Ferreira, S., Castelo-Branco, M., Dumoulin, S. O., & Harvey, B. M. (2018). Radial asymmetries in population receptive field size and cortical magnification factor in early visual cortex. NeuroImage, 167, 41–52. https://doi.org/10.1016/j.neuroimage.2017.11.021

      Sirois, S., & Brisson, J. (2014). Pupillometry. WIREs Cognitive Science, 5(6), 679–692. https://doi.org/10.1002/wcs.1323

      Sparks, D. L. (2002). The brainstem control of saccadic eye movements. Nature Reviews Neuroscience, 3(12), Article 12. https://doi.org/10.1038/nrn986

      Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S., & Naber, M. (2022). Pupillometry as an integrated readout of distinct attentional networks. Trends in Neurosciences, 45(8), 635–647. https://doi.org/10.1016/j.tins.2022.05.003

      Unsworth, N., & Miller, A. L. (2021). Individual Differences in the Intensity and Consistency of Attention. Current Directions in Psychological Science, 30(5), 391–400. https://doi.org/10.1177/09637214211030266

      Van der Stigchel, S. (2010). Recent advances in the study of saccade trajectory deviations. Vision Research, 50(17), 1619–1627. https://doi.org/10.1016/j.visres.2010.05.028

      Van der Stigchel, S. (2020). An embodied account of visual working memory. Visual Cognition, 28(5–8), 414–419. https://doi.org/10.1080/13506285.2020.1742827

      Van der Stigchel, S., & Hollingworth, A. (2018). Visuospatial Working Memory as a Fundamental Component of the Eye Movement System. Current Directions in Psychological Science, 27(2), 136–143. https://doi.org/10.1177/0963721417741710

      van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. In particular, mitochondrial and cytoplasmic proteins in iPSCs were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as using cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths: 

      The focus on the number of copies of proteins is exciting and appreciated if the estimated calculation result is correct and biologically reproducible. 

      Weaknesses: 

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data. 

      We agree with the reviewer that it would be useful to have data from more independent stem cell clones, and that an equal gender balance of the donors would be preferable. As usual, practical cost-benefit considerations and the time available affect the scope of work that can be performed. We note that the impact of biological donor sex on proteome expression in iPSC lines has already been addressed in previous studies [13]. We will, however, revise the manuscript to include specific mention of these limitations and propose a larger-scale follow-up when resources are available.

      Regarding the estimation of protein copy numbers in our study, we would like to highlight that the proteome ruler approach we have used has been employed extensively in the field previously, with direct validation of differences in copy numbers provided using orthogonal methods to MS, e.g., FACS [2-4, 7, 10]. Furthermore, the original manuscript [14] directly compared the copy numbers estimated using the “proteomic ruler” to spike-in protein epitope signature tags and found remarkable concordance. This original study was performed with an older generation mass spectrometer and reduced peptide coverage, compared with the instrumentation used in our present study. Further, we note that these authors predicted that higher peptide coverage, such as we report in our study, would further increase quantitative performance.
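
      For readers unfamiliar with the proteomic ruler, the idea can be sketched as follows: the summed MS signal of the histones serves as a proxy for the DNA mass per cell, which converts any other protein's MS signal into a mass and then into a copy number. The Python sketch below illustrates that arithmetic under stated assumptions and is not the pipeline used in the study: the function name `ruler_copies_per_cell`, the 6.5 pg DNA mass for a diploid human cell, and the example intensities are hypothetical placeholders.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ruler_copies_per_cell(protein_signal, histone_signal_total,
                          protein_mw_da, dna_mass_pg=6.5):
    """Estimate protein copies per cell, proteomic-ruler style.

    Assumes total histone mass per cell is approximately equal to the DNA
    mass per cell, so protein mass = (signal / histone signal) * DNA mass.
    """
    if histone_signal_total <= 0 or protein_mw_da <= 0:
        raise ValueError("histone signal and molecular weight must be positive")
    # Protein mass per cell in grams (1 pg = 1e-12 g).
    protein_mass_g = protein_signal / histone_signal_total * dna_mass_pg * 1e-12
    # Mass -> moles (Da is numerically g/mol) -> molecules.
    return protein_mass_g / protein_mw_da * AVOGADRO


# Hypothetical inputs: a 50 kDa protein carrying 0.1% of the histone signal.
copies = ruler_copies_per_cell(1.0e6, 1.0e9, 50_000.0)
print(f"{copies:.3g} copies per cell")
```

      Because the estimate scales with the ratio of protein signal to histone signal, it needs no spike-in standard, but it inherits any error in the assumed DNA content per cell.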

      Reviewer #2 (Public Review):

      Summary: 

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine. A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs. This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs. 

      We thank the reviewer for the positive assessment.

      Strengths: 

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets. 

      Weaknesses: 

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions. 

      We agree with the Reviewer that our study does not extend to also providing a detailed mechanistic explanation for the quantitative differences observed between the two stem cell types and did not claim to have done so. We have now included an expanded section in the discussion where we discuss potential causes. However, in our view fully understanding the reasons for this difference is likely to involve extensive future in-depth analysis in additional studies and is not something that can be determined just by one or two additional supplemental experiments.

      We also agree studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest. Again, while we agree it is a useful way forward, in practice this will require a very substantial additional commitment of time and resources. We have now included a section discussing this opportunity within the discussion to encourage further research into the area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) aizi1 and ueah1 clones, which were analyzed in Figure 1A, were excluded from the proteome analysis. In particular, the GAPDH expression level of the aizi1 clone is similar to that of ESCs and different from other iPSC clones. An explanation of how the clones were selected for proteome analysis is needed. Previously, the comparative analysis of iPSCs and ESCs reported in many studies from 2009-2017 (Ref#1-7) has already shown that the number of clones used in the comparative analysis is small, claiming differences (Ref#1-3) and that the differences become indistinguishable when the number of clones is increased (Ref#4-7). Certainly, few studies have been done at the proteome level, so it is important to examine what differences exist in the proteome. Also, it is interesting to focus on the amount of protein per cell. However, if the authors want to describe biological differences, it would be better to get the proteome data in biological duplicate and state the reason for selecting the clones used.

      (1) M. Chin, Cell Stem Cell, 2009, PMID: 19570518

      (2) K. Kim, Nat Biotechnol., 2011, PMID: 22119740

      (3) R. Lister, Nature, 2011, PMID: 21289626

      (4) A.M. Newman, Cell Stem Cell, 2010, PMID: 20682451

      (5) M.G. Guenther, Cell Stem Cell, 2010, PMID: 20682450

      (6) C. Bock, Cell, 2010, PMID: 21295703

      (7) S. Yamanaka, Cell Stem Cell, PMID: 22704507

      We agree with the reviewer that analysing more clones would be beneficial. We have included a section on this topic in the discussion. In our study, we only had access to the 4 hESC lines included; therefore, in the original proteomic study we also analysed 4 hiPSC lines, which were routinely grown within our stem cell facility. Although the stem cell facility expanded the culture of additional hiPSC lines as the study progressed, unfortunately we could not access additional hESC lines.

      We agree that ideally combining each biological replicate with additional technical replicates would provide extra robustness. As usual, cost and practical considerations at the time the experiments were performed affected the experimental design chosen. For the experimental design, each experiment was contained within 1 batch to avoid the strong batch effects present in TMT (Brenes et al., 2019).

      (2) iPSC samples used in the proteome analysis are two types of female and two types of male, while ESC samples are three types of female and one type of female. The number of sexes of the cells in the comparative analysis should be matched because sex differences may bias the results.

      While we agree with the reviewer in principle, we have previously performed detailed comparisons of proteome expression in many independent iPSC lines from both biological male and female donors (see Brenes et al., Cell Reports 2021), and it seems unlikely that biological sex differences alone could account for the proteome differences between iPSC and ESC lines uncovered in this study. However, as this is a relevant point, we have revised the manuscript to explicitly mention this caveat within the discussion section.

      (3) In Figure 1h, I suspect that the variation of PCA plots is very similar between ESCs and iPSCs. In particular, the authors wrote "copy numbers for all 8 replicates" in the legend, but if Figure 1b was done 8 times, there should be 8 types of cells x 8 measurements = 64 points. Even if iPSCs and ESCs are grouped together, there should be 8 points for each cell type. Is it possible that there is only one TMT measurement for this analysis? If so, at least technical duplicates or biological duplicates would be necessary. I also think each cell should be plotted in the PCA analysis instead of combining the four types of ESCs and iPSCs into one.

      We thank the reviewer for bringing this error to our attention. The legend has been corrected to state, “for all 8 stem cell lines”. Each dot represents the proteome of each of the 4 hESCs and 4 hiPSCs that were analysed using proteomics.

      (4) It is necessary to show what functions are enriched in the 4408 proteins whose protein copies per cell were increased in the iPSCs obtained in Figure 2B.

      The enrichment analysis requested has been performed and is now included as a new supplemental figure 2. We find it very interesting that despite the large number of proteins involved here (4,408), the enrichment analysis still shows clear enrichment for specific cellular processes. The summary plot using affinity propagation within webgestalt is included here:

      Author response image 1.

      (5) The Proteomic Ruler method used in this study is a semi-quantitative method to calculate protein copy numbers and is a concentration estimation method. Therefore, if the authors want to have a biological discussion based on the results, they need to show that the estimated concentrations are correct. For example, there are Western Blotting (WB) results for genes with no change in protein levels in hESC and hiPSC in Fig. 6ij, but the WB results for the group of genes that are claimed to have changed are not shown throughout the paper. Also, there is no difference in the total protein level between iPSCs and ESCs from the ponceau staining in Fig.6ij. WB results for at least a few genes are needed to show whether the concentration estimates obtained from the proteome analysis are plausible. If the protein per cell is increased in these iPSC clones, performing WB analysis using an equal number of cells would be better.

      Regarding the ‘proteome ruler’ approach, we would like to highlight that this method has previously been used extensively in the field, with detailed validation, as already explained above. It is also not ‘semi-quantitative’ and can estimate absolute abundance, as well as concentrations. Our work does not use their concentration formulas, but rather the estimation of protein copy numbers, which was shown to closely match the observed copy numbers determined using spike-ins [14].

      In providing additional validation by Western Blotting (WB), we prioritised the proteins related to pluripotency markers, which are vital for determining the pluripotency state of the hESCs and hiPSCs, as well as histone markers. We have included a section in the discussion concerning additional validation data and agree in general that further validation is always useful.
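      As a rough illustration of the copy-number estimation discussed in this response, the sketch below follows the principle of the proteomic ruler14: the summed MS signal of histones is taken as proportional to the DNA mass per cell, so each protein's mass per cell can be scaled from its share of signal relative to histones. This is a minimal sketch, not the authors' pipeline. The function name, the argument names and the value of 6.5 pg for a diploid human genome are assumptions introduced here for illustration.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
DNA_MASS_PER_CELL_PG = 6.5      # ASSUMPTION: approximate diploid human genome mass (pg)


def ruler_copy_numbers(intensities, molar_masses_da, histone_ids,
                       dna_mass_pg=DNA_MASS_PER_CELL_PG):
    """Estimate protein copies per cell from MS intensities (hypothetical helper).

    intensities      -- dict: protein id -> MS signal
    molar_masses_da  -- dict: protein id -> molar mass in Da (numerically g/mol)
    histone_ids      -- ids of the histone proteins used as the 'ruler'
    """
    histone_signal = sum(intensities[p] for p in histone_ids)
    if histone_signal <= 0:
        raise ValueError("histone signal must be positive")

    copies = {}
    for protein, intensity in intensities.items():
        # Protein mass per cell, scaled by the histone signal (taken as ~ DNA mass).
        mass_pg = dna_mass_pg * intensity / histone_signal
        mass_g = mass_pg * 1e-12
        # Moles per cell times Avogadro's number gives copies per cell.
        copies[protein] = mass_g / molar_masses_da[protein] * AVOGADRO
    return copies
```

      Because the histone signal serves as an internal standard for cell number, the estimate does not depend on how many cells were loaded, which is the property the response relies on.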

      (6) Regarding the experiment shown in Figure 4l, the gender of iPSC used (wibj2) is female and WA01 (H1; WA01) is male. Certainly, there is a difference in the P/E control ratio, but isn't this just a gender difference? The sexes of the cells need to be matched.

      We accept that the sexes of donors should ideally have been matched and have mentioned this within the discussion. Nonetheless, as previously mentioned, our previous detailed proteomic analyses of multiple hiPSC lines13 derived from both biological male and female donors provide relevant evidence that the results shown in this study are not simply a reflection of the sex of the donors for the respective iPSC and ESC lines. When comparing eroded and non-eroded female hiPSCs to male hiPSCs we found no significant differences in any electron transport chain proteins, nor in TCA proteins, between males and females.

      Minor comments:

      (1) Method: Information on the hiPSCs and hESCs used in this study should be described. In particular, the type of differentiated cells, gender, and protocols that were used in the reprogramming are needed.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium, as described in the flagship HipSci paper15. We cite the flagship paper, which specifies in great detail the reprogramming protocols and quality control measures, including analysis of copy number variations15. However, we agree that this information may not be easily accessible for readers. We agree it is relevant to explicitly include this information in our present manuscript, instead of expecting readers to look at the flagship paper. These details have therefore been added to the revised version.

      (2) Method: In Figure1a, Figure 6i, j, the antibody information of Nanog, Oct4, Sox2, and Gapdh is not written in the method and needs to be shown.

      The data relating to these has now been included within the methods section.

      (3) Method: In Figure 1b and other figures, the authors should indicate which iPSC corresponds to which TMT label; the data in the Supplemental Table also needs to indicate which data is which clone.

      We have now added this to the methods section.

      (4) Method: The method of the FACS experiment used in Figure 2 should be described.

      The methods related to the FACS analysis have now been included within the manuscript.

      (5) Method: The cell name used in the mitochondria experiment shown in Figure 4 is listed as WA01, which is thought to be H1. Variations in notation should be corrected.

      This has now been corrected.

      (6) Method: The name of the cell clone shown in Figure 3l,m should be mentioned.

      We have now added these details on the corresponding figure and legend.

      Reviewer #2 (Recommendations For The Authors):

      This study utilized quantitative mass spectrometry to compare protein expression in 4 independently derived hiPSC and 4 hESC cell lines. The investigation quantified approximately 7,900 proteins, and employing the "Proteome ruler" approach, estimated protein copy numbers per cell. Principal component analyses, based on protein copy number per cell, clearly separated hiPSC and hESC, while different hiPSCs and hESCs grouped together. The study revealed a global increase in the expression of cytoplasmic, mitochondrial, membrane transporters, and secreted proteins in hiPSCs compared to hESCs. Interestingly, standard median-based normalization approaches failed to capture these differences, and the disparities became apparent only when protein copy numbers were adjusted for cell numbers. Increased protein abundance in hiPSC was associated with augmented ribosome biogenesis. Total protein content was >50% higher in hiPSCs compared to hESCs, an observation independently verified by total protein content measurement via the EZQ assay and further supported by the larger cell size of hiPSCs in flow cytometry. However, the cell cycle distribution of hiPSC and hESC was similar, indicating that the difference in protein content was not due to variations in the cell cycle. At the phenotypic level, differences in protein expression also correlated with increased glutamine uptake, enhanced mitochondrial potential, and lipid droplet formation in hiPSCs. hiPSCs also expressed higher levels of extracellular matrix components and growth factors.

      Overall, the presented conclusions are adequately supported by the data. Although the mechanistic basis of proteome differences in hiPSC and hESC is not investigated, the work presents interesting findings that are worthy of publication. Below, I have listed my specific questions and comments for the authors.

      (1) Figure 1a displays immunoblots from 6 iPSC and 4 ESC cell lines, with 8 cell lines (4 hESC, 4 hiPSC) utilized in proteomic analyses (Fig. 1b). The figure legend should specify the 8 cell lines included in the proteomic analyses. The manuscript text describing these results should explicitly mention the number and names of cell lines used in these assays.

      We agree with the reviewer and have now marked in figure 1 all the lines that were used for proteomics and have added a section in the methods specifying which cell lines were analysed in each TMT channel.

      (2) In most figures, the quantitative differences in protein expression between hiPSC and hESC are evident, and protein expression is highly consistent among different hiPSCs and hESCs. However, the glutamine uptake capacity of different hiPSC cell lines, and to some extent hESC cell lines, appears highly variable (Figure 3e). While proteome changes were measured in 4 hiPSCs and 4 hESCs, the glutamine uptake assays were performed on a larger number of cell lines. The authors should clarify the number of cell lines used in the glutamine uptake assay, clearly indicating the cell lines used in the proteome measurements. Given the large variation in glutamine uptake among different cell lines, it would be useful to plot the correlation between the expression of glutamine transporters and glutamine uptake in individual cell lines. This may help understand whether differences in glutamine uptake are related to variations in the expression of glutamine transporters.

      The “proteomic ruler” has the capacity to estimate the protein copy numbers per cell; as such, changes in the absolute number of cells that were analysed do not cause major complications in quantification. Furthermore, TMT-based proteomics is the most precise proteomics method available, where the same peptides are detected in all samples across the same data points and peaks, as long as the analysis is done within a single batch, as is the case here.

      The glutamine uptake assay is much more sensitive to variation in the number of cells. The number of cells was estimated by plating approximately 5e4 cells two days before the assay, which creates variability. Furthermore, hESCs and hiPSCs are more adhesive than the cells used in the original protocol, hence the quench data was noisier for these lines, making the data from the assay more variable.

      (3) In Figure 4j, it would be helpful to indicate whether the observed differences in the respiration parameters are statistically significant.

      We have now modified the plot to show which proteins were significantly different.

      (4) The iPSCs used here are generated from human primary skin fibroblasts. Different cells vary in size; for instance, fibroblast cells are generally larger than blood lymphocytes. This raises the question of whether the parent cell origin impacts differences in hiPSCs and hESC proteomes. For example, do the authors anticipate that hiPSCs derived from small somatic cells would also display higher expression of cytoplasmic, mitochondrial, and membrane transporters compared to ESC? The authors may consider discussing this point.

      This is a very interesting point. We have now added an extension to the discussion focussed on this subject.

      (5) One wonders if the "Proteome ruler" approach could be applied retrospectively to previously published hiPSC and hESC proteome data, confirming higher expression of cytoplasmic and mitochondrial proteins in hiPSCs, which may have been masked in previous analyses due to median-based normalization.

      We agree with the reviewer and think this is a very good suggestion. Unfortunately, in the main proteomic papers comparing hESC and hiPSCs16,17 the authors did not upload their raw files to a public repository (as it was not mandatory at that time), and they also used the International Protein Index (IPI), which is a discontinued database. The raw files therefore cannot be reprocessed, and the database does not match the modern SwissProt entries. Reprocessing the previous data was thus impractical.
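      The masking effect of median-based normalization raised in this point can be shown with a toy example. The sketch below uses invented intensity values and hypothetical function names. It only illustrates the arithmetic: if every protein in one sample is higher by the same factor, dividing each sample by its own median removes that global difference.

```python
from statistics import median


def median_normalise(sample):
    """Divide every protein value by the sample's own median."""
    m = median(sample.values())
    return {protein: value / m for protein, value in sample.items()}


def fold_changes(numerator, denominator):
    """Per-protein ratio between two samples sharing the same protein ids."""
    return {p: numerator[p] / denominator[p] for p in numerator}


# Invented values: the second sample has 1.5x more of every protein per cell.
hesc = {"protA": 100.0, "protB": 200.0, "protC": 400.0}
hipsc = {protein: value * 1.5 for protein, value in hesc.items()}

raw_fc = fold_changes(hipsc, hesc)
normalised_fc = fold_changes(median_normalise(hipsc), median_normalise(hesc))

print("raw fold changes:", raw_fc)
print("after median normalisation:", normalised_fc)
```

      In this toy case the raw ratios keep the uniform 1.5-fold difference, while the median-normalised ratios collapse to 1, which is why a per-cell scaling such as the proteomic ruler is needed to see a global change.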

      (6) The work raises a fundamental question: what is the mechanistic basis for the higher expression of cytoplasmic and mitochondrial proteins in hiPSCs? Conceivably, this could be due to two reasons: (a) Genes encoding cytoplasmic and mitochondrial proteins are expressed at a higher level in hiPSCs compared to hESC. (b) mRNAs encoding cytoplasmic and mitochondrial proteins are translated at a higher level in hiPSCs compared to hESC. The authors may check published transcriptome data from the same cell lines to shed light on this point.

      This is a very interesting point. We believe that the reprogrammed cells contained mature mitochondria, which are not fully regressed upon reprogramming and that this can establish a growth advantage in the normoxic environments in which the cells are grown. Unfortunately, the available transcriptomic data lacked spike-ins, and thus only enables comparison of concentration, not of copy numbers13. Therefore, we could not determine with the available data if there was an increase in the copies of specific mRNAs. However, with a future study where there was a transcriptomic dataset with spike-ins included, this would be very interesting to analyse.

      Reviewer #3 (Recommendations For The Authors):

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs show significant differences in protein mass and cell size, with the MS data validated by the EZQ assay and FACS, while having no significant differences in their cell cycle profiles. Thus, increased size and protein content would require higher growth rates to sustain the increased mass, which is what we observe.

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included so it is difficult to appreciate if this assessment was appropriately carried out and controlled for.

      We apologise for this omission; the details have been included in the revised version of the manuscript.

      Details and characterisation of iPSC and ESC lines used in this study are overall lacking. The lines used are merely listed in methods, but no references are included for published lines, how lines were obtained, what passage they were used at, their karyotype status etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this and refer to the reply above concerning this issue.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what graphs are representing (e.g. average/mean; how many replicates (biological or technical), which lines)? Some data is included in Methods (e.g. glutamine uptake), but not for all of the data (e.g. TEM).

      We agree with the reviewer. These have been corrected in the revised version of the manuscript, with additional details included.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      The validation experiments were performed at different time points, and the selection of lines reflected the availability of hiPSC and hESC lines within our stem cell facility at a given point in time.

      We chose to use a range of different lines for comparison, rather than always comparing only one set of lines, to try to avoid a possible bias in our conclusions and thus to make the results more general.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and have added a sentence in the discussion making this point explicitly.

      Differences in H1 histones abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, neither our study of the literature nor discussions with chromatin and histone experts, both within our institute and externally, have shed light on what the differences could imply. We therefore think that this is a striking and interesting result that merits further study, but we have not yet been able to formulate a clear hypothesis on the consequences.

      (1) Howden, A. J. M. et al. Quantitative analysis of T cell proteomes and environmental sensors during T cell differentiation. Nat Immunol, doi:10.1038/s41590-019-0495-x (2019).

      (2) Marchingo, J. M., Sinclair, L. V., Howden, A. J. & Cantrell, D. A. Quantitative analysis of how Myc controls T cell proteomes and metabolic pathways during T cell activation. Elife 9, doi:10.7554/eLife.53725 (2020).

      (3) Damasio, M. P. et al. Extracellular signal-regulated kinase (ERK) pathway control of CD8+ T cell differentiation. Biochem J 478, 79-98, doi:10.1042/BCJ20200661 (2021).

      (4) Salerno, F. et al. An integrated proteome and transcriptome of B cell maturation defines poised activation states of transitional and mature B cells. Nat Commun 14, 5116, doi:10.1038/s41467-023-40621-2 (2023).

      (5) Antico, O., Nirujogi, R. S. & Muqit, M. M. K. Whole proteome copy number dataset in primary mouse cortical neurons. Data Brief 49, 109336, doi:10.1016/j.dib.2023.109336 (2023).

      (6) Edwards, W. et al. Quantitative proteomic profiling identifies global protein network dynamics in murine embryonic heart development. Dev Cell 58, 1087-1105 e1084, doi:10.1016/j.devcel.2023.04.011 (2023).

      (7) Barton, P. R. et al. Super-killer CTLs are generated by single gene deletion of Bach2. Eur J Immunol 52, 1776-1788, doi:10.1002/eji.202249797 (2022).

      (8) Phair, I. R., Sumoreeah, M. C., Scott, N., Spinelli, L. & Arthur, J. S. C. IL-33 induces granzyme C expression in murine mast cells via an MSK1/2-CREB-dependent pathway. Biosci Rep 42, doi:10.1042/BSR20221165 (2022).

      (9) Niu, L. et al. Dynamic human liver proteome atlas reveals functional insights into disease pathways. Mol Syst Biol 18, e10947, doi:10.15252/msb.202210947 (2022).

      (10) Murugesan, G., Davidson, L., Jannetti, L., Crocker, P. R. & Weigle, B. Quantitative Proteomics of Polarised Macrophages Derived from Induced Pluripotent Stem Cells. Biomedicines 10, doi:10.3390/biomedicines10020239 (2022).

      (11) Ryan, D. G. et al. Nrf2 activation reprograms macrophage intermediary metabolism and suppresses the type I interferon response. iScience 25, 103827, doi:10.1016/j.isci.2022.103827 (2022).

      (12) Nicolas, P. et al. Systems-level conservation of the proximal TCR signaling network of mice and humans. J Exp Med 219, doi:10.1084/jem.20211295 (2022).

      (13) Brenes, A. J. et al. Erosion of human X chromosome inactivation causes major remodeling of the iPSC proteome. Cell Rep 35, 109032, doi:10.1016/j.celrep.2021.109032 (2021).

      (14) Wisniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. A "proteomic ruler" for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics 13, 3497-3506, doi:10.1074/mcp.M113.037309 (2014).

      (15) Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370-375, doi:10.1038/nature22403 (2017).

      (16) Phanstiel, D. H. et al. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat Methods 8, 821-827, doi:10.1038/nmeth.1699 (2011).

      (17) Munoz, J. et al. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol Syst Biol 7, 550, doi:10.1038/msb.2011.84 (2011).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khan et al. investigated the functional redundancy of the non-canonical L-cysteine synthases of M. tuberculosis, CysM and CysK2, focussing on their role in mitigating the effects of host-derived stress. They found that while deletion mutants of the two synthases (Rv∆cysM, Rv∆cysK2) have similar transcriptomes under standard conditions, their transcriptional response to oxidative stress is distinct. The impact of deleting the synthases also differentially affected the pools of L-cysteine-derived metabolites. They show that the mutants (Rv∆cysM, Rv∆cysK2) have impaired survival in peritoneal macrophages and in a mouse model of infection. Importantly, they show that the survival of the mutants increases when the host is defective in producing reactive oxygen and nitrogen species, linking the phenotype to a defect in combating host-derived stress. Finally, they show that compounds inhibiting L-cysteine synthases reduce the intracellular survival of M. tuberculosis.

      Strengths:

      (1) The distinct transcriptome of the Rv∆cysM and Rv∆cysK2 mutants in the presence of oxidative stress provides solid evidence that these mutants are distinct in their response to oxidative stress, and suggests that they are not functionally redundant.

      (2) The use of macrophages from phox-/- and INF-/- mice and an iNOS inhibitor for the intracellular survival assays provides solid evidence that the survival defect seen for the Rv∆cysM and Rv∆cysK2 mutants is related to their reduced ability to combat host-derived oxidative and nitrosative stress. This is further supported by the infection studies in phox-/- and INF-/- mice.

      Weaknesses:

      (1) There are several previous studies looking at the transcriptional response of M. tuberculosis to host-derived stress, however, the authors do not discuss initial RNA-seq data in the context of these studies. Furthermore, while several of the genes in sulfur assimilation and L-cysteine biosynthetic pathway genes are upregulated by more than one stress condition, the data does not support the statement that it is the "most commonly upregulated pathway in Mtb exposed to multiple host-like stresses".

      We have made changes in the manuscript in line with the reviewer’s suggestion.

      “Thus RNA-Seq data suggest that genes involved in sulfur assimilation and the L-cysteine biosynthetic pathway are upregulated during various host-like stresses in Mtb (Figure S2). Given the importance of sulphur metabolism genes in in vivo survival of Mtb [1, 2], it is not surprising that these genes are dynamically regulated by diverse environmental cues. Microarray studies have shown upregulation of genes encoding the sulphate transporter upon exposure to hydrogen peroxide and nutrient starvation [3-7]. Similarly, ATP sulfurylase and APS kinase are induced during macrophage infection and by nutrient depletion. Induction of these genes, which coordinate the first few steps of the sulphur assimilation pathway, indicates a probable increase in the biosynthesis of sulphate-containing metabolites that may be crucial against host-inflicted stresses. Furthermore, genes involved in the synthesis of reduced sulphur moieties (cysH, sirA and cysM) are also induced by hydrogen peroxide and nutrient starvation. Sulfur metabolism has been postulated to be important in the transition to latency. This hypothesis is based on transcriptional upregulation of cysD, cysNC, cysK2, and cysM upon exposure to hypoxia. Multiple transcriptional profiling studies have reported upregulation of moeZ, mec, cysO and cysM genes when cells were subjected to oxidative and hypoxic stress [1, 6-11], further suggesting an increase in the biosynthesis of reduced metabolites such as cysteine and methionine and sulfur-containing cell wall glycolipids upon exposure to oxidative stress [12].” We have modified the sentence to “significantly upregulated pathway in Mtb exposed to multiple host-like stresses”.

      (2) For the quantification of the metabolites, it isn't clear how the abundance was calculated (e.g., were standards for each metabolite used? How was abundance normalised between samples?), and this information should be included to strengthen the data.

      Thank you for picking this up. We have extended our description of the metabolomics methods. It now reads: “Due to the tendency of M. tuberculosis to form clumps, which significantly skews any cell number estimation, we normalized samples to protein/peptide concentration using the BCA assay kit (Thermo). Therefore, our LC-MS data are expressed as ion counts/mg protein or as ratios of that for the same metabolite. This is a standard way to express ion abundance data, as was done previously [13, 14].”
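      The normalisation described in the revised methods text (ion counts per mg protein, or ratios of these for the same metabolite) amounts to the following arithmetic. This is a minimal sketch with hypothetical function names and invented numbers. It is not the authors' processing code, and the protein amount is assumed to come from a separate measurement such as the BCA assay.

```python
def normalise_ion_counts(ion_counts, protein_mg):
    """Express raw ion counts as ion counts per mg protein.

    ion_counts -- dict: metabolite name -> raw ion count for one sample
    protein_mg -- protein amount of that sample in mg (e.g. from a BCA assay)
    """
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return {metabolite: count / protein_mg
            for metabolite, count in ion_counts.items()}


def abundance_ratio(sample, reference, metabolite):
    """Ratio of normalised abundance of one metabolite between two samples."""
    return sample[metabolite] / reference[metabolite]


# Invented example values for a mutant and a wild-type sample.
mutant = normalise_ion_counts({"mycothiol": 2.0e6}, protein_mg=0.5)
wild_type = normalise_ion_counts({"mycothiol": 2.0e6}, protein_mg=1.0)
print(abundance_ratio(mutant, wild_type, "mycothiol"))
```

      Normalising to protein rather than to cell counts avoids relying on a cell number estimate, which the response notes is unreliable for clumping cultures.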

      Furthermore, labelling with L-methionine was performed to determine the rate of synthesis of the L-cysteine-derived metabolites. L-cysteine is produced from L-methionine via the transsulfuration pathway, which is independent of CysM and CysK2. It is therefore difficult to interpret this experiment, as the impact of deleting CysM and CysK2 on the transsulfuration pathway is likely indirect.

      The reviewer may have misunderstood the experiment and the results presented. Labelling was not performed with L-methionine. We use 34S derived from SO42- to monitor the reductive assimilation of sulfur and its transit from S2- to L-methionine, passing through cysteine. We specified in the materials and methods that we used sodium sulfate-34S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration, but instead the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the results section (page 9) that we employed 34S-labelled sodium sulfate (Na2 34SO4) for labelling, to make sure other readers will not think we are measuring transsulfuration.

      (3) The ability of L-cysteine to rescue the survival defect of the Rv∆cysM and Rv∆cysK2 mutants in macrophages is interpreted as exogenous L-cysteine being able to compensate for reduced intracellular levels. However, there is no evidence that L-cysteine is being taken up by the mutants and an alternate explanation is that L-cysteine functions as an antioxidant within cells i.e., it reduces intracellular ROS.

      The concentration of L-cysteine used for the peritoneal macrophage survival rescue experiments was titrated so that it conferred no survival advantage on wild-type Rv. Thus, at the given concentration, we believe that the contribution of cysteine to reducing intracellular ROS is not major, since there is no significant difference in the survival of the wild-type Rv strain. Had cysteine reduced intracellular ROS, we would expect increased survival of Rv due to diminished oxidative stress.

      Furthermore, L-cysteine addition also mitigates the CHP-induced survival defect in vitro [15] and nullifies the observed effect of cysteine inhibitors in vitro [16], suggesting that cysteine or cystine can be transported into Mtb. This has also been shown previously for the AosR mutant strain [15] and CysH [2], and by the over 70% uptake of exogenously added [35S] cysteine by a growing culture of Mtb [17].

      The authors sought to investigate the functional redundancy of the non-canonical L-cysteine synthases CysM and CysK2. While their distinct transcriptional response to oxidative stress suggests distinct physiological roles, the study did not explore these differences and therefore provides only preliminary insight into the underlying reasons for this observation. In the context of drug development, this work suggests that while L-cysteine synthase inhibitors do not have high potency for killing intracellular M. tuberculosis, they have the potential to decrease the pathogen's survival in the presence of host-derived stress.

      Reviewer #2 (Public Review):

      Summary:

      The paper examines the role L-cysteine metabolism plays in the biology of Mycobacterium tuberculosis. The authors have preliminary data showing that Mycobacterium tuberculosis has two unique pathways to synthesize cysteine. The data showing new compounds that act synergistically with INH is very interesting.

      Strengths:

      RNAseq data is interesting and important.

      Weaknesses:

      The paper would be strengthened if the authors were to add further detail to their genetic manipulations.

      The authors provide evidence that they have successfully made a cysK2 mutant by recombineering. This data looks promising, but I do not see evidence for the cysM deletion. It is also important to state what sort of complementation was done (multicopy plasmid, integration proficient vector, or repair of the deletion). Since these mutants are the basis for most of the additional studies, these details are essential. It is important to include complementation in mouse studies as unexpected loss of PDIM could have occurred.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section.  

      Reviewer #3 (Public Review):

      In this work, the authors conduct transcriptional profiling experiments with Mtb under various different stress conditions (oxidative, nitrosative, low pH, starvation, and SDS). The Mtb transcriptional responses to these stress conditions are not particularly new, having been reported extensively in the literature over the past ~20 years in various forms. A common theme from the current work is that L-cysteine synthesis genes are seemingly up-regulated by many stresses. Thus, the authors focused on deleting two of the three L-cysteine synthesis genes (cysM and cysK2) in Mtb to better understand the roles of these genes in Mtb physiology.

      The cysM and cysK2 mutants display fitness defects in various media (Sautons media, starvation, oxidative and nitrosative stress) noted by CFU reductions. Transcriptional profiling studies with the cysM and cysK2 mutants revealed that divergent gene signatures are generated in each of these strains under oxidative stress, suggesting that cysM and cysK2 have non-redundant roles in Mtb's oxidative stress response which likely reflects the different substrates used by these enzymes, CysO-L-cysteine and O-phospho-L-serine, respectively. Note that these studies lack genetic complementation and are thus not rigorously controlled for the engineered deletion mutations.

      The authors quantify the levels of sulfur-containing metabolites (methionine, ergothioneine, mycothiol, mycothionine) produced by the mutants following exposure to oxidative stress. Both the cysM and cysK2 mutants produce more methionine, ergothioneine, and mycothionine relative to WT under oxidative stress. Both mutants produce less mycothiol relative to WT under the same condition. These studies lack genetic complementation and thus do not rigorously control for the engineered mutations.

      Next, the mutants were evaluated in infection models to reveal fitness defects associated with oxidative and nitrosative stress in the cysM or cysK2 mutants. In LPS/IFNg activated peritoneal macrophages, the cysM or cysK2 mutants display marked fitness defects which can be rescued with exogenous cysteine added to the cell culture media. Peritoneal macrophages lacking the NADPH oxidase (Phox) or IFNg fail to produce fitness phenotypes in the cysM or cysK2 mutants suggesting that oxidative stress is responsible for the phenotypes. Similarly, chemical inhibition of iNOS partly abrogated the fitness defect of the cysM or cysK2 mutants. Similar studies were conducted in mice lacking IFNg and Phox establishing that cysM or cysK2 mutants have fitness defects in vivo that are dependent on oxidative and nitrosative stress.

      Lastly, the authors use small molecule compounds to inhibit cysteine synthases. It is demonstrated that the compounds display inhibition of Mtb growth in 7H9 ADC media. No evidence is provided to demonstrate that these compounds are specifically inhibiting the cysteine synthases via "on-target inhibition" in the whole Mtb cells. Additionally, it is wrongly stated in the discussion that "combinations of L-cys synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial load inside the host". This statement suggests that the INH + cysteine synthase inhibitor combinations reduce Mtb loads within a host in an infection assay. No data is presented to support this statement.

      We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in the whole Mtb cells. However, the inhibitors used in this study have been previously profiled in vitro (https://www.sciencedirect.com/science/article/abs/pii/S0960894X17308405?via%3Dihub). We have modified the sentence to “a combination of L-cysteine synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial survival in vitro”.

      References

      (1) Hatzios, S.K. and C.R. Bertozzi, The regulation of sulfur metabolism in Mycobacterium tuberculosis. PLoS Pathog, 2011. 7(7): p. e1002036.

      (2) Senaratne, R.H., et al., 5'-Adenosinephosphosulphate reductase (CysH) protects Mycobacterium tuberculosis against free radicals during chronic infection phase in mice. Mol Microbiol, 2006. 59(6): p. 1744-53.

      (3) Betts, J.C., et al., Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol, 2002. 43(3): p. 717-31.

      (4) Hampshire, T., et al., Stationary phase gene expression of Mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms? Tuberculosis (Edinb), 2004. 84(3-4): p. 228-38.

      (5) Schnappinger, D., et al., Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment. J Exp Med, 2003. 198(5): p. 693-704.

      (6) Voskuil, M.I., et al., The response of Mycobacterium tuberculosis to reactive oxygen and nitrogen species. Front Microbiol, 2011. 2: p. 105.

      (7) Voskuil, M.I., K.C. Visconti, and G.K. Schoolnik, Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb), 2004. 84(3-4): p. 218-27.

      (8) Brunner, K., et al., Profiling of in vitro activities of urea-based inhibitors against cysteine synthases from Mycobacterium tuberculosis. Bioorg Med Chem Lett, 2017. 27(19): p. 4582-4587.

      (9) Manganelli, R., et al., Role of the extracytoplasmic-function sigma factor sigma(H) in Mycobacterium tuberculosis global gene expression. Mol Microbiol, 2002. 45(2): p. 365-74.

      (10) Burns, K.E., et al., Reconstitution of a new cysteine biosynthetic pathway in Mycobacterium tuberculosis. J Am Chem Soc, 2005. 127(33): p. 11602-3.

      (11) Manganelli, R., et al., The Mycobacterium tuberculosis ECF sigma factor sigmaE: role in global gene expression and survival in macrophages. Mol Microbiol, 2001. 41(2): p. 423-37.

      (12) Tyagi, P., et al., Mycobacterium tuberculosis has diminished capacity to counteract redox stress induced by elevated levels of endogenous superoxide. Free Radic Biol Med, 2015. 84: p. 344-354.

      (13) de Carvalho, L.P., et al., Metabolomics of Mycobacterium tuberculosis reveals compartmentalized co-catabolism of carbon substrates. Chem Biol, 2010. 17(10): p. 1122-31.

      (14) Agapova, A., et al., Flexible nitrogen utilisation by the metabolic generalist pathogen Mycobacterium tuberculosis. Elife, 2019. 8.

      (15) Khan, M.Z., et al., Redox homeostasis in Mycobacterium tuberculosis is modulated by a novel actinomycete-specific transcription factor. EMBO J, 2021. 40(14): p. e106111.

      (16) Brunner, K., et al., Inhibitors of the Cysteine Synthase CysM with Antibacterial Potency against Dormant Mycobacterium tuberculosis. J Med Chem, 2016. 59(14): p. 6848-59.

      (17) Wheeler, P.R., et al., Functional demonstration of reverse transsulfuration in the Mycobacterium tuberculosis complex reveals that methionine is the preferred sulfur source for pathogenic Mycobacteria. J Biol Chem, 2005. 280(9): p. 8069-78.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure S1 it would be useful to include the reverse transsulfuration pathway given that it contributes to the L-cysteine pool, and that L-methionine was used for metabolite labelling experiments.

      We agree with the reviewer’s suggestion and have included reverse transsulfuration in Fig S1. Please note that labelling was not performed with L-methionine. We used 34S derived from SO42- to monitor the reductive assimilation of sulfur and its transit from S2- to L-methionine, passing through cysteine. We specified in the Materials and Methods that we used sodium sulfate-34S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but instead the direct synthesis of L-methionine via cysteine, and consequently, we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the results section (page 9) that we employed sodium sulfate-34S for labelling, so that readers do not think we are measuring transsulfuration.

      Author response image 1.

      (2) In Figure S2 it is unclear why the control is included in this figure given that the stress conditions were compared to the control. What is the control being compared to here?

      The heat maps of controls have been included to demonstrate relative gene expression in each of the independent replicates. The normalized counts for the differentially expressed genes are plotted. To better understand the RNA-seq results, we plotted the fold change of differentially expressed genes due to different stress conditions (new figure and table: Figure S3 & Table S2). This allowed us to understand the expression profile of genes in all the stress conditions simultaneously, regardless of whether they were identified as differentially expressed. The data revealed that specific clusters of genes are up- and downregulated in oxidative, SDS, and starvation conditions. In comparison, the differences observed in the pH 5.5 and nitrosative conditions were limited (Figure S3 & Table S2).

      (3) In Figure S3 it would be more informative to show fold-enrichment than gene counts in (b) to (f).

      In our opinion, gene counts are more informative when plotting GO enrichments, as the number of genes in each GO category can vary drastically. The significance values are already calculated based on the fold enrichment of a category compared to the background, and hence the p-adj values plotted on the x-axis can serve as a proxy for fold enrichment. Instead of plotting two related variables, plotting the total number of genes that belong to a category helps the reader understand the “scale” at which a category is affected.
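      The relationship between gene counts, fold enrichment, and significance described above can be illustrated with a minimal sketch. It is not the authors' analysis: the background size, the number of differentially expressed genes, and the two categories are hypothetical values chosen for illustration, and a plain one-sided hypergeometric test (without multiple-testing correction) is assumed as the enrichment test.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Fraction of category genes among the hits divided by the background fraction."""
    return (k / n) / (K / N)

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the category."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical numbers: 4000 background genes, 200 differentially expressed genes.
N, n = 4000, 200
categories = {
    "small category": {"k": 4, "K": 20},     # 4 hits out of 20 annotated genes
    "large category": {"k": 40, "K": 200},   # 40 hits out of 200 annotated genes
}

for name, c in categories.items():
    fe = fold_enrichment(c["k"], n, c["K"], N)
    p = hypergeom_pvalue(c["k"], n, c["K"], N)
    print(f"{name}: genes={c['k']}, fold enrichment={fe:.1f}, p={p:.2e}")
```

      In this toy example both categories have the same fold enrichment, yet the larger category contains ten times as many affected genes and receives a much smaller p-value, which is the kind of "scale" information that a gene-count axis conveys.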

      (4) Figure 1c standard Sautons is a defined media, and is not nutrient-limiting - the authors should clarify the composition of the media that they used here.

      The composition of Sautons media used in the study is 0.5 g/L MgSO4.7H2O, 2 g/L citric acid, 1 g/L L-asparagine, 0.3 g/L KCl.H2O, 0.2% glycerol, 0.64 g/L FeCl3, 100 μM NH4Cl and 0.7 g/L K2HPO4.3H2O. We have modified the sentence in line with reviewer’s suggestion.

      (5) The authors claim that the distinct transcriptomes for the two mutants indicate that "CysM and CysK2 distinctly modulate 324 and 1104 genes". The effect is likely due to distinct downstream consequences of the deletions, rather than direct regulation by the synthases. This section should be reworded for clarity.

      We have modified the sentence in line with reviewer’s suggestion.

      (6) In Figure 3 it would be useful to express mycothione levels as a percentage of the total mycothiol pool to give an indication of the extent to which the thiol is being oxidised.

      While we appreciate the reviewer’s suggestion, we cannot calculate ratios of ion counts (IC) for two different compounds, as they ionize differently: 100 ion counts of one compound do not equal 100 ion counts of the other.

      (7) Figure 6 is difficult to interpret as the concentrations used in the INH + inhibitor wells are not clear. It would be useful to indicate the concentrations of each compound added next to the wells in the figure.

      We have modified the figure and legends in line with reviewer’s suggestion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Document the cysM deletion.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section. 

      (2) The oxidative stress CHP is not defined in the figure legend.

      We have modified the legend in line with the reviewer’s suggestion.

      (3) Can we see the structures of the compounds?

      Kindly refer to Fig 6a for the structures of the compounds.

      (4) Fix the genetics and the paper is very interesting.

      I might be missing something. The authors do provide promising complementation data for several of the stresses. Provide evidence for the cysM deletion and complementation and the data will be very compelling. The focus of the paper is important for our understanding of the biology of Mycobacterium tuberculosis.

      Thank you for appreciating our study. The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). CysK2 mutant and complementation strain details are included in the present manuscript (Figure 1b & Methods).

      Reviewer #3 (Recommendations For The Authors):

      The transcriptional profiling studies do not rigorously control for the engineered mutations using genetic complementation.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments showcase that the phenotypes associated with knockouts are gene-specific. We chose not to include complementation strains in RNA sequencing experiments due to the large number of samples to be handled and the associated costs.

      Figure 3. These data are not rigorously controlled without genetic complementation, explain why some data in Figure 3 was generated at 24 hr and other data was generated at 48 hr, remove subbars in 3g. Please provide more clarification on Fig 3e-g because the normalization in these panels makes it appear as if there is little- or no-difference in the levels of 34S incorporation into the thiol metabolites.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments showcase that the phenotypes associated with knockouts are gene-specific. We chose not to include complementation strains in Figure 3 experiments due to the large number of samples to be handled and the associated costs.

      The time points in the given experiment were chosen based on an initial pilot experiment. It is apparent that a longer duration is required to see the phenotypes associated with labelling compared to pool size. The differences observed are statistically significant. 

      Surfactant and SDS stress are used interchangeably in the text, legends, and figures. Please be consistent here.

      We have modified the text in line with reviewer’s suggestion.

      Consider re-wording the 1st paragraph on page 5 to better clarify how Trp, Lys, and His interact with the host immune cells.

      We have modified the text in line with reviewer’s suggestion.

      Cite the literature associated with the sulfur import system in Mtb on page 3 in the 2nd paragraph.

      We have modified the text in line with reviewer’s suggestion.

      The manuscript nicely describes the construction of a cysK2 mutant. It is unclear how the cysM mutant was generated. Please clarify, cite, or add the cysM mutant construction to this manuscript.

      The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). We have included the citation in the methods section of the current manuscript.

      Provide evidence that the small molecules used in Fig 6 are on target and inhibit the cysteine biosynthetic enzymes in whole bacteria. It is unclear how a MIC can be determined with these compounds in 7H9 ADC when deletion mutants grow just fine in this media. Is this because the compounds inhibit multiple cysteine synthesis enzymes and/or enzymatic targets in other pathways? To me, the data suggests that the compounds are hitting multiple enzymes in whole Mtb cells. Does cysteine supplementation reverse the inhibitory profiles with the compounds in Figure 6?

      As mentioned in the text, all the compounds were ineffective in killing Mtb, likely because L-cysteine synthases are not essential during regular growth conditions. Hence, the MICs for the cysteine synthase inhibitors were very high: C1 (0.6 mg/ml), C2 (0.6 mg/ml), and C3 (0.15 mg/ml), as opposed to the standard drug isoniazid, with an MIC of 0.06 µg/ml. We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in Mtb cells. The inhibitors used in this study have been previously profiled in vitro [8]. However, one cannot rule out the hypothesis that these compounds might also have some off-target effects.
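      To put the MIC values quoted above on a common scale, the short sketch below converts them to a single unit and expresses each inhibitor's MIC as a fold difference relative to isoniazid. Only the four MIC values come from the response; the dictionary and function names are illustrative.

```python
# MIC values quoted in the response, expressed in ng/ml so that all
# arithmetic is done on whole numbers (1 mg/ml = 1,000,000 ng/ml; 1 ug/ml = 1,000 ng/ml).
MIC_NG_PER_ML = {
    "C1": 600_000,   # 0.6 mg/ml
    "C2": 600_000,   # 0.6 mg/ml
    "C3": 150_000,   # 0.15 mg/ml
    "INH": 60,       # 0.06 ug/ml
}

def fold_over_inh(compound):
    """How many times higher the compound's MIC is than the isoniazid MIC."""
    return MIC_NG_PER_ML[compound] / MIC_NG_PER_ML["INH"]

for name in ("C1", "C2", "C3"):
    print(f"{name}: MIC is {fold_over_inh(name):,.0f}-fold higher than INH")
```

      On these numbers the inhibitor MICs are between 2,500-fold and 10,000-fold higher than that of isoniazid, which is the gap the response refers to when it calls the inhibitor MICs very high.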

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both reviewers for taking the time to review our manuscript and data in great detail. We thank you for the fair assessment of our work, the helpful feedback, and for recognizing the value of our work. We have done our best to address your concerns below:

      eLife assessment

      This work reports a valuable finding on glucocorticoid signaling in male and female germ cells in mice, pointing out sexual dimorphism in transcriptomic responsiveness. While the evidence supporting the claims is generally solid, additional assessments would be required to fully confirm an inert GR signaling despite the presence of GR in the female germline and GR-mediated alternative splicing in response to dexamethasone treatment in the male germline. The work may interest basic researchers and physician-scientists working on reproduction and

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cincotta et al. set out to investigate the presence of glucocorticoid receptors in the male and female embryonic germline. They further investigate the impact of tissue-specific genetically induced receptor absence and/or systemic receptor activation on fertility and RNA regulation. They are motivated by several lines of research that report inter- and transgenerational effects of stress and/or glucocorticoid receptor activation, and suggest that their findings provide a mechanism to explain the offspring phenotypes induced by parental stress hormone exposure.

      Strengths:

      A chronological immunofluorescent assessment of GR in fetal and early life oocyte and sperm development.

      RNA seq data that reveal novel cell type specific isoforms validated by q-RT PCR E15.5 in the oocyte.

      2 alternative approaches to knock out GR to study transcriptional outcomes. Oocytes: systemic GR KO (E17.5) with low input 3-tag seq and germline-specific GR KO (E15.5) on fetal oocyte expression via 10X single cell seq and 3-cap sequencing on sorted KO versus WT oocytes both indicating little impact on polyadenylated RNAs

      2 alternative approaches to assess the effect of GR activation in vivo (systemic) and ex vivo (ovary culture): here the RNA seq did show again some changes in germ cells and many in the soma.

      They exclude oocyte-specific GR signaling inhibition via beta isoforms.

      Perinatal male germline shows differential splicing regulation in response to systemic Dex administration, results were backed up with q-PCR analysis of splicing factors.

      Weaknesses:

      COMMENT #1: The presence of a protein cannot be entirely excluded based on IF data

      We agree that very low levels of GR could escape the detection by IF and confocal imaging. We feel that our IF data do match transcript data in our validation studies of the GR KO using (1) qRT-PCR on fetal ovary in Fig 2E and (2) scRNA-seq in germ cells and ovarian soma in Fig S2B.

      COMMENT #2: (staining of spermatids is referred to but not shown).

      You are correct that this statement was based on a morphological identification of spermatids using DAPI morphology. We have performed a co-stain for GR with the spermatocyte marker SYCP3, and the spermatid/spermatozoa marker PNA (Peanut Agglutinin; from Arachis hypogaea) in adult testis tissue. We have updated Figure 4D to reflect this change, as well as the corresponding text in the Results section.

      COMMENT #3: The authors do not consider post-transcriptional level a) modifications also triggered by GR activation b) non-coding RNAs (not assessed by seq).

      We thank the reviewer for raising this very important point about potential post-transcriptional (non-genomic) effects of GR in the fetal oocyte. We agree that while our RNA-seq results show only a minimal transcriptional response, we cannot rule out a non-canonical signaling function of GR, such as the regulation of cellular kinases (as reviewed elsewhere [1]), or the regulation of non-coding RNAs at the post-transcriptional level, and we have amended the discussion to include a sentence on this point. However, while we fully acknowledge the possibility of GR regulating non-genomic level cellular signaling, we chose not to explore this option further based on the lack of any overall functional effect on meiotic progression when GR signaling was perturbed, either by KO (Figure 2D) or dex-mediated activation (Figure S3C).

      COMMENT #4: Sequencing techniques used are not total RNA but either are focused on all polyA transcripts (10x) or only assess the 3' prime end and hence are not ideal to study splicing

      We thank the reviewer for raising this concern; however, this statement is not correct, and we have clarified this point in the Results section to explain how the sequencing libraries of the male germ cell RNA-seq were prepared. We agree that certain sequencing techniques (such as 3’ Tag-Seq) that generate sequencing libraries from a limited portion of an entire transcript molecule are not appropriate for analysis of differential splicing. This was not the case, however, for the RNA-seq libraries prepared on our male germ cells treated with dexamethasone. These libraries were constructed using full length transcripts that were reverse transcribed using random hexamer priming, thus accounting for sequencing coverage across the full transcript length. As a result, this type of library prep technique should be sufficient for capturing differential splicing events along the length of the transcript. We do, however, point out that these libraries were constructed on polyA-enriched transcripts. Thus, while we obtained full length transcript coverage for these polyA transcripts, any differential splicing taking place in non-polyadenylated RNA species was not captured. While we are excited about the possibility of exploring GR-mediated splicing regulation of other RNA species in the future, we chose to focus the scope of our current study on polyA mRNA molecules specifically.

      COMMENT #5: The number of replicates in the low input seq is very low and hence this might be underpowered

      While the number of replicates (n=3-4 per condition) is sufficient for performing statistical analysis of a standard RNA-seq experiment, we do acknowledge and agree with the reviewer that low numbers of FACS-sorted germ cells from individual embryos combined with the low input 3’ Tag-Seq technique could have led to higher sample variability than desired. Given that we validated our bulk RNA-seq analysis of GR knockout ovaries using an orthogonal single-cell RNA-seq approach, we feel that our conclusions regarding a lack of transcriptional changes upon GR deletion remain valid.

      COMMENT #6: Since Dex treatment showed some (modest) changes in oocyte RNA - effects of GR depletion might only become apparent upon Dex treatment as an interaction.

      We may be missing the nuance of this point, but our interpretation of an effect that is seen only when the KO is treated with Dex would be that the mechanism would not be autonomous in germ cells but indirect or off-target.

      COMMENT #7: Effects in oocytes following systemic Dex might be indirect due to GR activation in the soma.

      As both the oocytes and ovarian soma express GR during the window of dex administration, we agree that it is possible that the few modest changes seen in the oocyte transcriptome are the result of indirect effects following robust GR signaling in the somatic compartment. However, given that these modest oocyte transcript changes in response to dex treatment did not significantly alter the ability of oocytes to progress through meiosis, we chose not to explore this mechanism further.

      COMMENT #8: Even though ex vivo culture of ovaries shows GR translocation to the nucleus it is not sure whether the in vivo systemic administration does the same.

      AND

      The conclusion that fetal oocytes are resistant to GR manipulation is very strong, given that "only" poly A sequencing and few replicates of 3-prime sequencing have been analyzed and information is lacking on whether GR is activated in germ cells in the systemically dex-injected animals.

      If we understand correctly, the first part refers to a technical limitation and the second part takes issue with our interpretation of the data. For the former, we appreciate this astute insight on the conundrum of detecting a response to systemic dex in fetal oocytes, which is generally monitored by nuclear translocation of GR. As shown in Figure 1A and 1B, GR localization is overwhelmingly nuclear in fetal oocytes of WT animals at E13.5 without addition of any dex. We could not, therefore, use GR translocation as a proxy for activation in response to dex treatment. We instead used ex vivo organ culture to monitor localization changes, as we were able to maintain fetal ovaries ex vivo in hormone-depleted and ligand negative conditions. As shown in Fig. 3, these defined culture conditions elicited a shift of GR to the cytoplasm of fetal oocytes. This led us to conclude that GR is capable of translocating between nucleus and cytoplasm in fetal oocytes, and we were able to counteract this loss in nuclear localization by providing dex ligand in the media.

      We feel that our conclusion that oocytes are resistant to manipulation of glucocorticoid signaling despite their possession of the receptor and capacity for nuclear translocation is substantiated by multiple results: meiotic phenotyping, bulk RNA-seq and scRNA-seq analysis of both GR KO and dex dosed mice. Our basis for testing the timing and fidelity of meiotic prophase I was the coincident onset of GR expression in female germ cells at E13, and the disappearance of GR in neonatal oocytes as they enter meiotic arrest. The lack of transcriptional changes observed in oocytes in response to dex has made it even more challenging to demonstrate a bona fide “activation” of GR. Observation of a dose-dependent induction of the canonical GR response gene Fkbp5 in the somatic cells of the fetal ovary (Figure S3A and 3A) affirmed that dex traverses the placenta. We agree with the reviewer that it remains possible that dex or GR KO could lead to changes in epigenetic marks or small RNAs in oocytes, and have mentioned these possibilities in the discussion, but we note that even epigenetic perturbations during oocyte development such as the loss of Tet1 or Dnmt1 result in measurable changes in the transcriptome and the timing of meiotic prophase [2–4].

      COMMENT #9: This work is a good reference point for researchers interested in glucocorticoid hormone signaling, fertility, and RNA splicing. It might spark further studies on germline-specific GR functions and the impact of GR activation on alternative splicing. While the study provides a characterization of GR and some aspects of GR perturbation, and the negative findings in this study do help to rule out a range of specific roles of GR in the germline, there is still a range of other potential unexplored options. The introduction of the study alludes to implications for intergenerational effects via epigenetic modifications in the germline, however, it does not mention that the indirect effects of reproductive tissue GR signaling on the germline have indeed already been described in the context of intergenerational effects of stress.

      The reviewer raises an excellent point that we have not made sufficient distinction in our manuscript between prior studies of gestational stress and preconception stress and the light that our work may shed on those findings. We have revised the introduction to clarify this difference, and added reference to an outstanding study that identifies glucocorticoid-induced changes to microRNA cargo of extracellular vesicles shed by epididymal epithelial cells that, when transferred to mature sperm, can induce changes in the HPA axis and brain of offspring [5]. Interestingly, this GR-mediated effect in the epididymal epithelial cells concurs with our observation in the adult testis that GR can be detected only in cKit+ spermatogonia but not in subsequent stages of spermatids.

      COMMENT #10: Also, the study does not assess epigenetic modifications.

      We agree with the reviewer that exploring the role of GR in regulating epigenetic modifications within the germline is an area of extreme interest given the potential links between stress and transgenerational epigenetic inheritance. As this is a broader topic that requires a more thorough and comprehensive set of experiments, we have intentionally chosen to keep this work separate from the current study, and hope to expand upon this topic in the future.

      COMMENT #11: The conclusion that the persistence of a phenotype for up to three generations suggests that stress can induce lasting epigenetic changes in the germline is misleading. For the reader who is unfamiliar with the field, it is important to define much more precisely what is referred to as "a phenotype". Furthermore, this statement evokes the impression that the very same epigenetic changes in the germline have been observed across multiple generations.

      We see how this may be misleading, and we have amended the text of the introduction and discussion accordingly to avoid the use of the term “phenotype”.

      COMMENT #12: The evidence of the presence of GR in the germline is also somewhat limited - since other studies using sequencing have detected GR in the mature oocyte and sperm.

      As described above in response to Comment #2, we have included immunostaining of adult testis in a revised Figure 4D and shown that we detect GR in PLZF+ and cKIT+ spermatogonia. We also show low/minimal expression in some (SYCP3+) early meiotic spermatocytes, but not in (Lectin+) spermatids. We are not aware of any studies that have shown expression of GR protein in the mature oocyte.

      COMMENT #13: The discussion ends again on the implications of sex-specific differences of GR signaling in the context of stress-induced epigenetic inheritance. It states that the observed differences might relate to the fact that there is more evidence for paternal lineage findings, without considering that maternal lineage studies in epigenetic inheritance are generally less prevalent due to some practical factors - such as more laborious study design making use of cross-fostering or embryo transfer.

      We thank the reviewer for this valid point, and we have amended the discussion section.

      Reviewer #2 (Public Review):

      Summary:

      There is increasing evidence in the literature that rodent models of stress can produce phenotypes that persist through multiple generations. Nevertheless, the mechanism(s) by which stress exposure produces phenotypes are unknown in the directly affected individual as well as in subsequent offspring that did not directly experience stress. Moreover, it has also been shown that glucocorticoid stress hormones can recapitulate the effects of programmed stress. In this manuscript, the authors test the compelling hypothesis that glucocorticoid receptor (GR)-signaling is responsible for the transmission of phenotypes across generations. As a first step, the investigators test for a role of GR in the male and female germline. Using knockouts and GR agonists, they show that although germ cells in male and female mice have GR that appears to localize to the nucleus when stimulated, oocytes are resistant to changes in GR levels. In contrast, the male germline exhibits changes in splicing but no overt changes in fertility.

      Strengths:

      Although many of the results in this manuscript are negative, this is a careful and timely study that informs additional work to address mechanisms of transmission of stress phenotypes across generations and suggests a sexually dimorphic response to glucocorticoids in the germline. The work presented here is well-done and rigorous and the discussion of the data is thoughtful. Overall, this is an important contribution to the literature.

      Reviewer #1 (Recommendations For The Authors):

      RECOMMENDATION #1: To assess whether in females the systemic Dex administration directly activates GR in oocytes it would be great to assess GR activation following Dex administration, and ideally to see the effects abolished when Dex is administered to germline-specific KO animals.

      In regard to the recommendation to assess GR activation in response to systemic dex administration, we refer the reviewer back to our response in Comment #8 highlighting the difficulties defining and measuring GR activation in the germline.

      This therefore has made it difficult to assess whether any of the modest effects seen in response to dex are abolished in our germline-specific KO animals. While repeating our RNA-seq experiment in dex-dosed germline KO animals would address whether the ~60 genes induced in oocytes are the result of oocyte-intrinsic GR activity, we have decided not to explore this mechanism further due to the overall lack of a functional effect on meiotic progression in response to dex (Figure S3C).

      RECOMMENDATION #2: To further strengthen the link between GR and alternative splicing it would be great to see the dex administration experiment repeated in germline specific GR KO's.

      While we understand the reviewer’s suggestion to explore whether deletion of GR in the spermatogonia is sufficient to abrogate the dex-mediated decreases in splice factor expression, we chose not to explore the details of this mechanism given that deletion of GR in the male germline does not impair fertility (Figure 6).

      RECOMMENDATION #3: I am wondering how much a given reduction in one of the splicing factors indeed affects splicing events. Can the authors relate this to literature, or maybe an in vitro experiment can be done to see whether the level of differential splicing events detected is in a range that can be expected in the case of the magnitude of splicing factor reduction?

      It has been shown in many instances in the literature that a full genetic deletion of a single splice factor leads to impairments in spermatogenesis, and ultimately infertility [6–16]. We suspect that dex treatment leads to fewer differential splicing events than a full splice factor deletion, given that dex treatment causes a broader decrease in splice factor expression without entirely abolishing any single splice factor. We have amended the discussion section to include this point. While we share the reviewer’s curiosity to compare the effects of dex vs genetic deletion of splicing machinery on the overall magnitude of differential splicing events, we unfortunately do not have access to mice with a floxed splice factor at this time. While we have considered knocking out one or more splice factors in an ex vivo cultured testis to compare alongside dex treatment, our efforts to date have proven unsuccessful due to high cell death upon culture of the postnatal testis for more than 24 hours.

      RECOMMENDATION #4: It is unclear from the methods whether in germline-specific KO's also the controls received tamoxifen.

      We thank the reviewer for catching this missing piece of information. All control embryos that were assessed received an equivalent dose of tamoxifen to the germline-specific KO embryos. The only difference between cKOs and controls was the presence of the Cre transgene. We have updated the Materials and Methods 3’ Tag-Seq sample preparation section to include the sentence: “Both GRcKO/cKO and control GRflox/flox embryos were collected from tamoxifen-injected dams, and thus were equally exposed to tamoxifen in utero”.

      Reviewer #2 (Recommendations For The Authors):

      I just have only a few comments/questions.

      RECOMMENDATION #5: It is somewhat surprising that GR is expressed in female germ cells, yet there doesn't seem to be a requirement. Is there any indication of what it does? Is the long-term stability of the germline compromised?

      We thank the reviewer for these questions, and we agree that it was quite surprising to find a lack of GR function in the female germline despite its robust expression. The question of whether loss of GR affects the long-term stability of the female germline is interesting, given that similar work in GR KO zebrafish has shown impairments to female reproductive capacity, yet only upon aging 17–19.

      While we have shared interest in this question, technical limitations thus far have prevented us from properly assessing the effect of GR loss in aged females. Homozygous deletion of GR results in embryonic lethality at approximately E17.5. Conditional deletion of GR using Oct4-CreERT2 with a single dose of tamoxifen (2.5 mg / 20g mouse) at E9.5 results in complete deletion of GR by E10.5, although dams consistently suffer from dystocia and are no longer able to deliver viable pups. While using the more active tamoxifen metabolite (4OHT) at 0.1 mg / 20g has allowed for successful delivery, the resulting deletion rate is very poor (see qPCR results in panel below, left). While using half the dose of standard tamoxifen (1.25 mg / 20g mouse) at E9.5 has on rare occasions led to a successful delivery, the resulting recombination efficiency is insufficient (Author response image 1 right panel).

      Author response image 1.

      While a Blimp1-Cre conditional KO model was used to assess male fertility on GR deletion, we believe this model may not be ideal for studying fertility in the context of aging. While Blimp1-Cre is highly specific to the germ cells within the gonad, there are many cell types outside of the gonad that express Blimp1, including the skin and certain cells of the immune system. It is unclear, particularly over the course of aging, whether any effects on fertility seen would be due to an oocyte-intrinsic effect, or the result of GR loss elsewhere in the body. While we hope to explore the role of GR in the aging oocyte further using alternative Cre models in the future, this is currently outside the scope of this work.

      RECOMMENDATION #6: Figure 5b: what is the left part of that panel? Is it the same volcano plot for germ cells as shown in part a but with splicing factors?

      We apologize if this panel was unclear. Yes, the left panel of Figure 5B is in fact the same volcano plot in 5A, labeled with splicing factors instead of top genes. We have edited Figure 5B and corresponding figure legend to clarify this.

      References:

      1. Oakley, R.H., and Cidlowski, J.A. (2013). The biology of the glucocorticoid receptor: New signaling mechanisms in health and disease. J. Allergy Clin. Immunol. 132, 1033–1044. 10.1016/j.jaci.2013.09.007.

      2. Hargan-Calvopina, J., Taylor, S., Cook, H., Hu, Z., Lee, S.A., Yen, M.-R., Chiang, Y.-S., Chen, P.-Y., and Clark, A.T. (2016). Stage-Specific Demethylation in Primordial Germ Cells Safeguards against Precocious Differentiation. Dev. Cell 39, 75–86. 10.1016/j.devcel.2016.07.019.

      3. Hill, P.W.S., Leitch, H.G., Requena, C.E., Sun, Z., Amouroux, R., Roman-Trufero, M., Borkowska, M., Terragni, J., Vaisvila, R., Linnett, S., et al. (2018). Epigenetic reprogramming enables the transition from primordial germ cell to gonocyte. Nature 555, 392–396. 10.1038/nature25964.

      4. Eymery, A., Liu, Z., Ozonov, E.A., Stadler, M.B., and Peters, A.H.F.M. (2016). The methyltransferase Setdb1 is essential for meiosis and mitosis in mouse oocytes and early embryos. Development 143, 2767–2779. 10.1242/dev.132746.

      5. Chan, J.C., Morgan, C.P., Leu, N.A., Shetty, A., Cisse, Y.M., Nugent, B.M., Morrison, K.E., Jašarević, E., Huang, W., Kanyuch, N., et al. (2020). Reproductive tract extracellular vesicles are sufficient to transmit intergenerational stress and program neurodevelopment. Nat Commun 11, 1499. 10.1038/s41467-020-15305-w.

      6. Kuroda, M., Sok, J., Webb, L., Baechtold, H., Urano, F., Yin, Y., Chung, P., Rooij, D.G. de, Akhmedov, A., Ashley, T., et al. (2000). Male sterility and enhanced radiation sensitivity in TLS−/− mice. Embo J 19, 453–462. 10.1093/emboj/19.3.453.

      7. Liu, W., Wang, F., Xu, Q., Shi, J., Zhang, X., Lu, X., Zhao, Z.-A., Gao, Z., Ma, H., Duan, E., et al. (2017). BCAS2 is involved in alternative mRNA splicing in spermatogonia and the transition to meiosis. Nat Commun 8, 14182. 10.1038/ncomms14182.

      8. Li, H., Watford, W., Li, C., Parmelee, A., Bryant, M.A., Deng, C., O’Shea, J., and Lee, S.B. (2007). Ewing sarcoma gene EWS is essential for meiosis and B lymphocyte development. J Clin Invest 117, 1314–1323. 10.1172/jci31222.

      9. O’Bryan, M.K., Clark, B.J., McLaughlin, E.A., D’Sylva, R.J., O’Donnell, L., Wilce, J.A., Sutherland, J., O’Connor, A.E., Whittle, B., Goodnow, C.C., et al. (2013). RBM5 Is a Male Germ Cell Splicing Factor and Is Required for Spermatid Differentiation and Male Fertility. Plos Genet 9, e1003628. 10.1371/journal.pgen.1003628.

      10. Zagore, L.L., Grabinski, S.E., Sweet, T.J., Hannigan, M.M., Sramkoski, R.M., Li, Q., and Licatalosi, D.D. (2015). RNA Binding Protein Ptbp2 Is Essential for Male Germ Cell Development. Mol Cell Biol 35, 4030–4042. 10.1128/mcb.00676-15.

      11. Xu, K., Yang, Y., Feng, G.-H., Sun, B.-F., Chen, J.-Q., Li, Y.-F., Chen, Y.-S., Zhang, X.-X., Wang, C.-X., Jiang, L.-Y., et al. (2017). Mettl3-mediated m6A regulates spermatogonial differentiation and meiosis initiation. Cell Res 27, 1100–1114. 10.1038/cr.2017.100.

      12. Horiuchi, K., Perez-Cerezales, S., Papasaikas, P., Ramos-Ibeas, P., López-Cardona, A.P., Laguna-Barraza, R., Balvís, N.F., Pericuesta, E., Fernández-González, R., Planells, B., et al. (2018). Impaired Spermatogenesis, Muscle, and Erythrocyte Function in U12 Intron Splicing-Defective Zrsr1 Mutant Mice. Cell Reports 23, 143–155. 10.1016/j.celrep.2018.03.028.

      13. Ehrmann, I., Crichton, J.H., Gazzara, M.R., James, K., Liu, Y., Grellscheid, S.N., Curk, T., Rooij, D. de, Steyn, J.S., Cockell, S., et al. (2019). An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning. Elife 8, e39304. 10.7554/elife.39304.

      14. Legrand, J.M.D., Chan, A.-L., La, H.M., Rossello, F.J., Änkö, M.-L., Fuller-Pace, F.V., and Hobbs, R.M. (2019). DDX5 plays essential transcriptional and post-transcriptional roles in the maintenance and function of spermatogonia. Nat Commun 10, 2278. 10.1038/s41467-019-09972-7.

      15. Yuan, S., Feng, S., Li, J., Wen, H., Liu, K., Gui, Y., Wen, Y., and Wang, X. (2021). hnRNPH1 recruits PTBP2 and SRSF3 to cooperatively modulate alternative pre-mRNA splicing in germ cells and is essential for spermatogenesis and oogenesis. 10.21203/rs.3.rs-1060705/v1.

      16. Wu, R., Zhan, J., Zheng, B., Chen, Z., Li, J., Li, C., Liu, R., Zhang, X., Huang, X., and Luo, M. (2021). SYMPK Is Required for Meiosis and Involved in Alternative Splicing in Male Germ Cells. Frontiers Cell Dev Biology 9, 715733. 10.3389/fcell.2021.715733.

      17. Maradonna, F., Gioacchini, G., Notarstefano, V., Fontana, C.M., Citton, F., Valle, L.D., Giorgini, E., and Carnevali, O. (2020). Knockout of the Glucocorticoid Receptor Impairs Reproduction in Female Zebrafish. Int J Mol Sci 21, 9073. 10.3390/ijms21239073.

      18. Facchinello, N., Skobo, T., Meneghetti, G., Colletti, E., Dinarello, A., Tiso, N., Costa, R., Gioacchini, G., Carnevali, O., Argenton, F., et al. (2017). nr3c1 null mutant zebrafish are viable and reveal DNA-binding-independent activities of the glucocorticoid receptor. Sci Rep-uk 7, 4371. 10.1038/s41598-017-04535-6.

      19. Faught, E., Santos, H.B., and Vijayan, M.M. (2020). Loss of the glucocorticoid receptor causes accelerated ovarian ageing in zebrafish. Proc Royal Soc B 287, 20202190. 10.1098/rspb.2020.2190.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses molecular dynamics simulations to understand how forces felt by the intracellular domain are coupled to the opening of the mechanosensitive ion channel NOMPC. The concept is interesting - as the only clearly defined example of an ion channel that opens due to forces on a tethered domain, the mechanism by which this occurs is yet to be fully elucidated. The main finding is that twisting of the transmembrane portion of the protein - specifically via the TRP domain that is conserved within the broad family of channels- is required to open the pore. That this could be a common mechanism utilised by a wide range of channels in the family, not just mechanically gated ones, makes the result significant. It is intriguing to consider how different activating stimuli can produce a similar activating motion within this family. However, the support for the finding can be strengthened as the authors cannot yet exclude that other forces could open the channel if given longer or at different magnitudes. In addition, they do not see the full opening of the channel, only an initial dilation. Even if we accept that twist is essential for this, it may be that it is not sufficient for full opening, and other stimuli are required.

      Strengths:

      Demonstrating that rotation of the TRP domain is the essential requirement for channel opening would have significant implications for other members of this channel family.

      Thank you for your positive summary and comments.

      Weaknesses:

      The manuscript centres around 3 main computational experiments. In the first, a compression force is applied on a truncated intracellular domain and it is shown that this creates both a membrane normal (compression) and membrane parallel (twisting) force on the TRP domain. This is a point that was demonstrated in the authors’ prior eLife paper - so the point here is to quantify these forces for the second experiment.

      The second experiment is the most important in the manuscript. In this, forces are applied directly to two residues on the TRP domain with either a membrane normal (compression) or membrane parallel (twisting) direction, with the magnitude and directions chosen to match that found in the first experiment. Only the twisting force is seen to widen the pore in the triplicate simulations, suggesting that twisting, but not compression can open the pore. This result is intriguing and there appears to be a significant difference between the dilation of pore with the two force directions.

      However, there are two caveats to this conclusion. Firstly, is the magnitude of the forces - the twist force is larger than the applied normal force to match the result of experiment 1. However, it is possible that compression could also open the pore at the same magnitude or if given longer. It may be that twist acts faster or more easily, but I feel it is not yet possible to say it is the key and exclude the possibility that compression could do something similar.

      Thank you for your insightful comment. As you pointed out, the membrane-normal pushing forces exerted at residues E1571 and R1581 are approximately one-third and two-thirds, respectively, of the membrane-parallel twisting forces. These magnitudes were derived from a previous simulation (Wang et al., 2021), in which we decomposed the resultant force into its membrane-parallel and membrane-normal components upon applying a compressive force to the intracellular AR end. Our results indicated that, upon reaching the TRP helix, the induced twisting force is indeed greater, which partially reflects actual physiological conditions. Therefore, considering the magnitudes of the resultant forces alone, the twisting force is predominantly greater than the pushing force when the AR domain is subjected to compression.
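      The decomposition described above can be illustrated with a minimal sketch. It assumes the membrane normal lies along the z axis and uses a made-up force vector; neither the function names nor the numbers come from the manuscript, and this is not the authors' analysis code.

```python
import math

def decompose_force(force, normal=(0.0, 0.0, 1.0)):
    """Split a 3D force into membrane-normal and membrane-parallel parts.

    `normal` is the membrane normal (assumed here to be the z axis).
    """
    norm = math.sqrt(sum(n * n for n in normal))
    unit = tuple(n / norm for n in normal)
    # Scalar projection of the force onto the membrane normal.
    f_n = sum(f * u for f, u in zip(force, unit))
    normal_part = tuple(f_n * u for u in unit)
    # The remainder lies in the membrane plane.
    parallel_part = tuple(f - p for f, p in zip(force, normal_part))
    return normal_part, parallel_part

def magnitude(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vec))

# Hypothetical force in arbitrary units, chosen only for illustration.
example_force = (3.0, 0.0, 1.0)
normal_part, parallel_part = decompose_force(example_force)
ratio = magnitude(normal_part) / magnitude(parallel_part)
print(f"normal / parallel magnitude ratio: {ratio:.3f}")
```

      With this illustrative vector the membrane-normal component is one-third of the membrane-parallel component, which mirrors the kind of ratio quoted for E1571.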

      Then the question became, if forces of the same magnitude are applied in either the membrane-normal or membrane-parallel directions, what would the outcome be? To address this, we conducted additional simulations. Considering the situations discussed above, we applied a smaller membrane-parallel force instead of a larger membrane-normal force that may disrupt the integrity of the protein and membrane structure. As shown in the new Figure S6, we adjusted the applied membrane-parallel force to either half or one-third of the original value. When we applied half of the force used in the original setup, the channel opened in two out of three trajectories. When applying one-third of the force, the channel opened in one out of three trajectories. Together with our previous results, these findings suggest that if forces of equal magnitude are applied in the membrane-normal and membrane-parallel directions, the membrane-parallel force has a higher probability of inducing channel opening.

      Still, one cannot completely exclude the possibility that the pushing force on the TRP helix can open the channel if given a very long time. This becomes unfeasible to examine with MD simulations, so we investigated the likely conformational changes of multiple TRP family proteins upon opening, and found that the TRP rotation is a universal conformational change, while the TRP tilt is much less consistent (Figure 6). These findings give us more confidence that the twist force plays a more crucial role in channel gating than the pushing force. We have added a new table (Table 1) and a new figure (Figure 6) to present this analysis.

      In addition, we did not intend to imply that compression is incapable of contributing to channel opening. In fact, our aim was to highlight that compression can generate both a twisting force and a pushing force, with the twisting force appearing to be the more critical component for facilitating channel opening. We concur that we cannot completely dismiss the possibility that the pushing component may also assist in channel opening. Consequently, we have revised our discussion on pages 4,6 to enhance clarity.

      I also note that when force was applied to the AR domain in experiment 1, the pore widened more quickly than with the twisting force alone, suggesting that compression is doing something to assist with opening.

      You are correct that the trajectory corresponding to Experiment 1 (Figure S1(b)) indicates pore opening around 300-400 ns, while the trajectory for Experiment 2 (800 ns) shows pore opening around 600 ns. This observation may suggest that the pore opens more rapidly in Experiment 1, assuming that the simulation conditions were identical for both experiments. However, it is important to note that in Experiment 1, an external force was applied to AR29. In contrast, in Experiment 2, the force was applied exclusively to two selected residues on the TRP domain, while other TRP residues also experienced mechanical forces, albeit to a lesser extent. The differing methods of force application in the two experiments complicate the comparison of pore opening speeds under these conditions.

      We acknowledge that the compression of the AR spring can facilitate pore opening. This compression generates both a twisting component and a pushing component on the TRP domain. Our simulations and structural analyses of multiple TRP channels suggest that the twisting component plays a predominant role in gating. However, we cannot entirely rule out the possibility that the pushing component may also contribute to this process. We have carefully revised our Results (page 6), Discussion (pages 10–12) and Methods (pages 14–17) sections to enhance clarity.

      Given that the forces are likely to be smaller in physiological conditions it could still be critical to have both twist and compression present. As this is the central aspect of the study, I believe that examining how the channel responds to different force magnitudes could strengthen the conclusions and recommend additional simulations be done to examine this.

      Thank you for your valuable comments. We agree that the force applied in Experiment 2 may be larger than the forces present under physiological conditions. Therefore, we performed additional simulations to investigate the possibility of opening the pore using smaller torsional forces.

      As shown in the new Figure S6, we applied half and one-third of the original force and performed three replicate simulations for each condition. With half the force, the pore opened in two out of the three simulations. With one-third of the force, the pore opened in one out of the three replicate simulations. The probability of pore opening within the same simulation time decreased as the applied force was reduced, consistent with our expectations. These new results are provided as a supplementary figure (Figure S6) in the revised manuscript.

      We anticipate that further reductions in the forces will result in additional delays in the opening process; however, this would lead to prohibitive computational costs. Consequently, we have decided to conclude our analysis at this stage and have discussed this matter on page 6 of the revised manuscript.

      The second important consideration is that the study never sees a full pore opening, but rather a widening that is less than that seen in open state structures of other TRP channels and insufficient for rapid ion currents. This is something the authors acknowledge in their prior manuscript in eLife 2021. Although this may simply be due to the limited timescale of the simulations, it needs to be clearly stated as a caveat to the conclusions. Twist may be the key to getting this dilation, but we do not know if it is the key to full pore opening. To demonstrate that the observed dilation is a first step in the opening of pores, a structural comparison to open-state TRP channels would be beneficial in providing evidence that this motion is along the expected pathway of channel gating.

      We are grateful for this insightful comment. We acknowledge that our simulations do not capture a fully open state, but rather a dilation that is smaller than the open-state structures of other TRP channels. In our simulations, a pore radius exceeding 2 Å is considered a partially open state, as this is generally sufficient for the permeation of water molecules or even small cations such as K<sup>+</sup> and Na<sup>+</sup>. However, the passage of larger molecules and ions, such as Ca<sup>2+</sup> and clusters of hydrated ions, remains challenging. As you noted, this partial opening may be attributed to the limited timescale of the simulations.

      Furthermore, in accordance with your suggestion, we analyzed numerous TRP proteins for which multiple open or intermediate states have been resolved, and we have included a new figure (Figure 6). A clockwise rotation of the TRP domain is observed in the majority of these proteins upon gating. For instance, in the case of RnTRPV1, our analysis revealed that during TRPV1 activation, when different ligands are bound (RTX, DkTX), the pore undergoes gradual dilation, which involves a progressive clockwise rotation of the TRP domain. This analysis provides evidence that the observed motion aligns with expected gating transitions, supporting the notion that twist-induced TRP rotation and pore dilation may represent an initial step in the pore opening process.

      Nonetheless, we concur that further studies, including extended simulations, which are currently unfeasible, or experimental validation, will be necessary to ascertain whether our proposed mechanism is adequate for the complete opening of the pore. We have carefully discussed this on pages 10–12.

      Experiment three considers the intracellular domain and determines the link between compression and twisting of the intracellular AR domain. In this case, the end of the domain is twisted and it is shown that the domain compresses, the converse to the similar study previously done by the authors in which compression of the domain was shown to generate torque. While some additional analysis is provided on the inter-residue links that help generate this, this is less significant than the critical second experiment.

      Although experiment three is less significant in revealing the underlying gating mechanism, it provides quantitative measurements of the mechanical properties of the intriguing AR spring structure, which are currently challenging to obtain experimentally. These provide computational predictions for future experiments to validate.

      Reviewer #2 (Public review):

      This study uses all-atom MD simulation to explore the mechanics of channel opening for the NOMPC mechanosensitive channel. Previously the authors used MD to show that external forces directed along the long axis of the protein (normal to the membrane) result in AR domain compression and channel opening. This force causes two changes to the key TRP domains adjacent to the channel gate: 1) a compressive force pushes the TRP domain along the membrane normal, while 2) a twisting torque induces a clock-wise rotation on the TRP domain helix when viewing the bottom of the channel from the cytoplasm. Here, the authors wanted to understand which of those two changes is responsible for increasing the inner pore radius, and they show that it is the torque. The simulations in Figure 2 probe this question with different forces, and we can see the pore open with parallel forces in the membrane, but not with the membrane-normal forces. I believe this result as it is reproducible, the timescales are reaching 1 microsecond, and the gate is clearly increasing diameter to about 4 Å. This seems to be the most important finding in the paper, but the impact is limited since the authors already show how forces lead to channel opening, and this is further teasing apart the forces and motions that are actually the ones that cause the opening.

      Thank you for your insightful comments. We appreciate your recognition of our key finding that torque is responsible for increasing the inner pore radius. Indeed, our simulations illustrated in Figure 2 systematically explore the effects of different forces on pore opening. These results demonstrate that membrane-parallel forces are effective, while membrane-normal forces are not within the simulation time. We acknowledge that this study builds upon previous findings regarding force-induced channel opening. However, we believe that further decomposition of the specific forces and motions responsible for this process provides valuable mechanistic insights. By distinguishing the role of torque from the membrane-normal forces of the TRP helix, which is highly conserved across the TRP channel family, our work contributes to a more precise understanding of TRP channel gating. Moreover, in the revised manuscript, we conducted a systematic analysis of the structures of TRP family proteins and discovered that the clockwise rotation of the TRP domain is likely a universal gating mechanism among the TRP family, which significantly enhances and strengthens our original findings (Figure 6).

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Duan and Song interrogates the gating mechanisms and specifically force transmission in mechanosensitive NOMPC channels using steered molecular dynamics simulations. They propose that the ankyrin spring can transmit force to the gate through torsional forces adding molecular detail to the force transduction pathways in this channel.

      Strengths:

      Detailed, rigorous simulations coupled with a novel model for force transduction.

      Thank you for your positive comments.

      Weaknesses:

      Experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would greatly enhance the manuscript. I have some additional questions documented below:

      We attempted to measure the mechanical properties of the AR domain and conduct mutagenesis experiments in collaboration with Prof. Jie Yan’s laboratory at the Mechanobiology Institute, National University of Singapore; however, this proved to be a significant challenge at this time. Given the urgency of the publication, we have decided to first publish the computational results and reserve further experimental studies for future investigations.

      (1) The membrane-parallel torsion force can open NOMPC

      How does the TRP domain interact with the S4-S5 linker? In the original structural studies, the coordination of lipids in this region seems important for gating. In this manner does the TRP domain and S4-S5 linker combined act like an amphipathic helix as suggested first for MscL (Bavi et al., 2016 Nature Communications) and later identified in many MS channels (Kefauver et al., 2020 Nature).

      In our analysis of the compression trajectories (trajectory: CI-1, Figure S4), we identified stable interactions between the TRP domain and the S4-S5 linker. These interactions primarily involve the residues S1421 and F1422 of the S4-S5 linker, as indicated by the large pink data points in Figure S4. Therefore, we agree that the TRP helix and the S4–S5 linker can be considered an amphipathic helical unit, analogous to the amphipathic helix observed in MscL and other mechanosensitive channels. Moreover, the pocket adjacent to the S4-S5 linker has been recognized as a binding site for small molecules in other ligand-activated TRP channels, such as the vanilloid-binding TRPV1. We hypothesize that this unit is likely to play a critical role in the polymodal gating of the TRP channel family, including ligand-induced activation. In the revised manuscript, we have included an analysis of the interaction between the TRP domain and the transmembrane (TM) domain on page 4 (Figure S4), and we have briefly discussed its implications on pages 10 and 12.

      (2) Torsional forces on shorter ankyrin repeats of mammalian TRP channels

      Is it possible torsional forces applied to the shorter ankyrin repeats of mammalian TRPs may also convey force in a similar manner?

      This is an intriguing question.

      To answer your question, we studied the full-length squirrel TRPV1 (PDB: 7LQY, Nadezhdin et al. (2021)) using all-atom steered MD simulations. We applied pushing or torsional forces to the intracellular AR1-2 region of TRPV1, separately (Figure S10(a)). Similar to NOMPC, rotation of the TRP domain was observed under both types of mechanical stimulation (Figure S10(b-e)). The conformational change induced by the torsional force on the TRP domain resembles the change observed in NOMPC. This suggests that a torsional force applied to the shorter ankyrin repeats of mammalian TRPs may yield similar effects on channel gating. However, given that these ankyrin repeats do not act like tether elements, the implications of these results in the context of biological functions remain unclear. Additionally, in NOMPC, the AR domain is connected to the TRP domain through a linker helix (LH) domain, composed of multiple stacked helices that form a relatively compact structure (Figure 1(a)). In contrast, TRPV1 does not possess a similarly compact LH domain connecting the AR domain to the TRP domain (Figure S10(a)). These structural differences render our conclusions regarding NOMPC not directly applicable to TRPV1. We have included an additional discussion about this on page 12 (Figure S10).

      (3) Constant velocity or constant force

      For the SMD the authors write "and a constant velocity or constant force". It’s unclear from this reviewer’s perspective which is used to generate the simulation data.

      Thank you for pointing out this ambiguity. In our simulations, we first applied constant-velocity pulling to achieve specific force magnitudes, followed by constant-force pulling. This protocol allowed us to initiate the motion of the protein in a controlled manner and observe the response of the system under sustained forces. We have now clarified this in the revised Methods section.

      Reviewer #1 (Recommendations for the authors):

      The language in the paper requires some editing - particularly in the introduction. For example, what is meant by ion channels ’coalescing to form mechanical receptors’? Are the authors implying it requires multiple channels to form a receptor? It is stated that mechanically gated ion channels are only found in nerve endings when in fact they are found in almost every cell type. Another example is the statement ’In the meantime’ the TRP domain was observed to rotate when this observation came prior to the others mentioned before. While these sound like minor edits, they significantly change the meaning of the introduction. I recommend careful editing of the manuscript to avoid accidental inaccuracies like this.

      Thank you for your feedback on the clarity and accuracy of the introduction. We have carefully revised the manuscript, particularly the abstract and introduction sections, to address these concerns:

      (1) We have reworded the original sentence ’These mechanosensitive ion channels, coalescing to form mechanical receptors, are strategically positioned within the sensory neuron terminals intricately nestled within the epidermal layer.’ into ’In both vertebrates and invertebrates, mechanosensitive ion channels are widely expressed in peripheral sensory neurons located near or within the surface tissues responsible for detecting mechanical stimuli.’

      (2) We have replaced the phrase "In the meantime" with "Interestingly" to introduce the conformational change of the TRP domain that we believe is crucial.

      (3) We have carefully reviewed the entire manuscript and used a language editing tool, Writefull integrated within Overleaf, to proof-check the language problems.

      Reviewer #2 (Recommendations for the authors):

      How do the energy values in Figure 3b, compare with the continuum energy values reported by Argudo et al. JGP (2019)? I wonder what value the authors would get with a new replicate run slower - say 200 ns total aggregate simulation? This would probe the convergence of this energy value. It seems important to determine whether the loading velocity of the experiments performed here with the steered MD is slow enough to allow the protein to relax and adopt lower energy configurations during the transition. The true loading is likely to occur on the millisecond timescale, not the nanosecond to low microsecond timescale. That said, I don’t mean to detract from the result in Figure 2, as this is likely quite solid in my opinion given the nearly 1 microsecond simulations and the replicates showing the same results.

      Thank you for your valuable suggestions. It is important to note that we calculated different physical quantities compared to those reported in Argudo’s study. In Figure 3b, we calculated the torque (instead of the energy, although they share the same dimensional units) of the long AR bundle (AR9-29 of the four filaments combined) and subsequently determined its torsion coefficient. Argudo’s study calculated the torsional spring constant (𝑘<sub>ɵ</sub>) of three 6-AR-unit stretches of one filament, which were designated as ANK1 (AR 12-17), ANK2 (AR 17-22) and ANK3 (AR 22–27). As the four filaments are coupled within the bundled structure and the torsional axes differ between an individual filament and the four-filament bundle, a direct comparison of the torsional spring constants reported in the two studies is not meaningful.

      We agree that extending the simulation time may provide deeper insights into the convergence of energy values. In accordance with your suggestion, we conducted additional simulations to further investigate convergence and compare the results with our existing data, thereby ensuring robustness and consistency. Specifically, we slowed down the original operation of twisting from 10 degrees over 100 ns to 10 degrees over 200 ns, and extended the holding time for selected frames (sampled every 2.5 degrees) from 100 ns to 200 ns. We have updated Figure 3 and relevant main text accordingly (page 7). The results of the new simulations are similar to those of the previous ones, with the fitted torsion coefficient revised from (2.31 ± 0.44) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup> to (2.30 ± 0.31) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup>. This close agreement indicates that our simulations are well-converged. Additionally, we updated the compression–twist coupling coefficient from (1.67 ± 0.14) nm rad<sup>−1</sup> to (1.32 ± 0.11) nm rad<sup>−1</sup>.
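      As a rough illustration of how a torsion coefficient can be extracted from torque measured at several twist angles, the sketch below fits a straight line through the origin (torque = κ·θ) by least squares. The linear, zero-intercept model and the synthetic data points are assumptions made for this example; the response does not describe the authors' actual fitting procedure.

```python
import math

def fit_torsion_coefficient(angles_rad, torques):
    """Least-squares slope of torque vs. twist angle for a line through the origin."""
    numerator = sum(a * t for a, t in zip(angles_rad, torques))
    denominator = sum(a * a for a in angles_rad)
    return numerator / denominator

# Twist angles sampled every 2.5 degrees up to 10 degrees, as in the response.
angles_deg = [2.5, 5.0, 7.5, 10.0]
angles_rad = [math.radians(a) for a in angles_deg]

# Synthetic torques generated from an assumed coefficient (kJ mol^-1 rad^-1).
# These are NOT simulation data; they only exercise the fitting function.
assumed_kappa = 2.30e3
torques = [assumed_kappa * a for a in angles_rad]

kappa = fit_torsion_coefficient(angles_rad, torques)
print(f"fitted torsion coefficient: {kappa:.1f} kJ mol^-1 rad^-1")
```

      Because the synthetic torques are exactly linear in the angle, the fit simply recovers the assumed coefficient; with real trajectory data the scatter around the line would determine the reported uncertainty.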

      As you suggested, we conducted an additional analysis to determine whether the loading velocity/force with the steered MD is sufficiently slow to facilitate the relaxation of the protein and its adoption of lower-energy configurations during the transition. For simulations involving the application of membrane-normal or membrane-parallel force on the TRP domain, we utilized DSSP (Define Secondary Structure of Proteins) analysis to assess the stability of the secondary structure of the TRP domain. The results indicated that, during the application of external forces, the secondary structure of the TRP domain maintained good stability, as illustrated in Figure S11. For simulations involving the rotation of the AR domain, we also analyzed the DSSP of the AR9 to AR11 units, which are positioned directly above the AR8 domain where the twisting force is applied. The secondary structure of the AR domain also exhibited good stability (Figure S12). These are briefly discussed in the Methods section of the revised manuscript (page 17).

      It is unclear to me that the force transmission analysis in Figure 4 provides much insight into the mechanics of opening. Perhaps the argument was made, but I did not appreciate it. Related to this the authors state that the transfer velocity is 1.8 nm/ps based on their previous study. Is this value profound or is it simply the velocity of sound in the protein?

      The analysis of force transmission presented in Figure 4 offers detailed insights into the transfer of force along the AR domain. While this may appear straightforward, the information elucidates how a pushing force can induce a twisting force during its transmission through the AR spring structure, as well as the primary contributions that stabilize this transmission pathway. To enhance clarity, we have included an additional discussion on page 9.

      The force transfer velocity is expected to align with the velocity of sound within the protein. The value of 1.8 nm/ps, however, is specific to the unique structure of the AR spring, which is quite interesting to report in our opinion. Additionally, this rapid transfer speed suggests that the simulation timescale is sufficient for enabling the transfer of compression force from the bottom of the AR domain to the TRP domain in our simulations, given that the simulation timescale is considerably longer than the force propagation timescale within the protein.
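The timescale argument above can be checked with simple arithmetic. The sketch below divides a path length by the reported transfer velocity of 1.8 nm/ps and compares the result with a simulation length. The 15 nm path length is a hypothetical placeholder (the response does not give the length of the AR domain), and 100 ns is used only as an example simulation length.

```python
def propagation_time_ps(path_length_nm, velocity_nm_per_ps=1.8):
    """Time in picoseconds for a force to travel a given path length."""
    if velocity_nm_per_ps <= 0:
        raise ValueError("velocity must be positive")
    return path_length_nm / velocity_nm_per_ps


path_length_nm = 15.0          # hypothetical length of the transmission path
simulation_length_ps = 100e3   # 100 ns, used here only as an example

transfer_time_ps = propagation_time_ps(path_length_nm)
timescale_ratio = simulation_length_ps / transfer_time_ps
```

Under these assumptions the transfer takes under 10 ps, so the simulation is longer than the propagation time by roughly four orders of magnitude.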

      The methods description is largely complete, but is missing some details on the MD simulations (barostat, thermostat, piston constants, etc.).

      Thank you for pointing out the missing details; we have added the additional information in the revised Methods section.

      References

      Nadezhdin, K. D., A. Neuberger, Y. A. Nikolaev, L. A. Murphy, E. O. Gracheva, S. N. Bagriantsev, and A. I. Sobolevsky (2021). Extracellular cap domain is an essential component of the TRPV1 gating mechanism. Nature Communications 12(1), 2154.

      Wang, Y., Y. Guo, G. Li, C. Liu, L. Wang, A. Zhang, Z. Yan, and C. Song (2021). The push-to-open mechanism of the tethered mechanosensitive ion channel NompC. eLife 10, e58388.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This paper presents a comprehensive study of how neural tracking of speech is affected by background noise. Using five EEG experiments and Temporal response function (TRF), it investigates how minimal background noise can enhance speech tracking even when speech intelligibility remains very high. The results suggest that this enhancement is not attention-driven but could be explained by stochastic resonance. These findings generalize across different background noise types and listening conditions, offering insights into speech processing in real-world environments. I find this paper well-written, the experiments and results are clearly described. However, I have a few comments that may be useful to address.

      I thank the reviewer for their positive feedback.

      (1) The behavioral accuracy and EEG results for clear speech in Experiment 4 differ from those of Experiments 1-3. Could the author provide insights into the potential reasons for this discrepancy? Might it be due to linguistic/acoustic differences between the passages used in experiments? If so, what was the rationale behind using different passages across different experiments?

      The slight differences in behavior and EEG magnitudes may be due to several factors. Different participants took part in the different experiments (with some overlap). Stories and questions were generated using ChatGPT using the same approach, but different research assistants have supported story and question generation, and ChatGPT advanced throughout the course of the study, such that different versions were used over time (better version control was only recently introduced by OpenAI). The same Google voice was used for all experiments, so this cannot be a factor. Most critically, within each experiment, assignment of speech-clarity conditions to different stories was randomized, such that statistical comparisons are unaffected by these minor differences between experiments. The noise-related enhancement generalizes across all experiments, showing that minor differences in experimental materials do not impact it.

      (2) Regarding peak amplitude extraction, why were the exact peak amplitudes and latencies of the TRFs for each subject not extracted, and instead, an amplitude average within a 20 ms time window based on the group-averaged TRFs used? Did the latencies significantly differ across different SNR conditions?

      Estimation of peak latency can be challenging if a deflection is not very pronounced in a participant. Especially the N1 was small for some conditions. Using the mean amplitude in a specific time window is very common practice in EEG research that mitigates this issue. Another, albeit less common, approach is to use a Jackknifing procedure to estimate each participant’s latencies (Smulders 2010 Psychophysiology; although this may sometimes not work well). For the revision, I used the Jackknifing approach to estimate peak latencies for each participant and condition, and extracted the mean amplitude around the peak latency. As expected, this approach provides very similar effects as reported in the main article, here exemplified for Experiments 1 and 2. The results are thus not affected by this data analysis choice. The estimated latencies differed across SNRs, e.g., the N1 latency increased with decreasing SNR (this is less surprising/novel and was thus not added to the manuscript to avoid increasing the amount of information).

      Author response image 1.

      P1-minus-N1 amplitude for Experiments 1 and 2, using amplitudes centered on individually estimated peak latencies. The asterisk indicates a significant difference from the clear speech condition (FDR-thresholded).
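The amplitude extraction described in this response can be sketched as follows: given a TRF and a peak latency for one participant and condition, average the TRF over a short window centered on that latency, then subtract the N1 value from the P1 value. This is a minimal illustration, not the author's code. It assumes the TRF is a list of weights starting at lag 0 s, a 20 ms window (the width named in the reviewer's question), and latencies that have already been estimated by whatever method (the jackknife step is not reproduced here). All function names are hypothetical.

```python
def mean_amplitude(trf, sfreq_hz, center_s, width_s=0.020):
    """Mean TRF weight in a window of `width_s` seconds centered on `center_s`.

    `trf` is a sequence of weights sampled at `sfreq_hz`, starting at lag 0 s.
    """
    center = round(center_s * sfreq_hz)
    half = round(width_s * sfreq_hz / 2)
    start = max(center - half, 0)
    stop = min(center + half + 1, len(trf))
    if start >= stop:
        raise ValueError("window lies outside the TRF")
    window = trf[start:stop]
    return sum(window) / len(window)


def p1_minus_n1(trf, sfreq_hz, p1_latency_s, n1_latency_s):
    """Difference between the mean amplitudes around the P1 and N1 latencies."""
    p1 = mean_amplitude(trf, sfreq_hz, p1_latency_s)
    n1 = mean_amplitude(trf, sfreq_hz, n1_latency_s)
    return p1 - n1
```

Averaging over a window rather than taking the single extreme sample makes the measure less sensitive to noise when a deflection is small, which is the concern raised for the N1.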

      (3) How is neural tracking quantified in the current study? Does improved neural tracking correlate with EEG prediction accuracy or individual peak amplitudes? Given the differing trends between N1 and P2 peaks in babble and speech-matched noise in experiment 3, how is it that babble results in greater envelope tracking compared to speech-matched noise?

      Neural tracking is generally used for responses resulting from TRF analyses, cross-correlations, or coherence, where the speech envelope is regressed against the brain signals (see review of Brodbeck & Simon 2020 Current Opinion in Physiology). Correlations between EEG prediction accuracy and individual peak amplitudes were not calculated because the data used for the analyses are not independent. The EEG prediction accuracy essentially integrates information over a longer time interval (here 0–0.4 s), whereas TRF amplitudes are more temporally resolved. If one were to shorten the time interval (e.g., 0.08–0.12 s), then EEG prediction accuracy would look more similar to the TRF results (because the TRF is convolved with the amplitude-onset envelope of the speech [predicted EEG] before calculating the EEG prediction accuracy). Regarding the enhancement difference between speech-matched noise and babble, I have discussed a possible interpretation in the discussion section. The result is indeed surprising, but it replicates across two experiments (Experiments 3 and 4), and is consistent with previous work using speech-matched noise that did not find the enhancement. I reproduce the relevant part of the discussion here.

      “Other work, using a noise masker that spectrally matches the target speech, have not reported tracking enhancements (Ding and Simon, 2013; Zou et al., 2019; Synigal et al., 2023). However, in these works, SNRs have been lower (<10 dB) to investigate neural tracking under challenging listening conditions. At low SNRs, neural speech tracking decreases (Ding and Simon, 2013; Zou et al., 2019; Yasmin et al., 2023; Figures 1 and 2), thus resulting in an inverted u-shape in relation to SNR for attentive and passive listening (Experiments 1 and 2).”

      “The noise-related enhancement in the neural tracking of the speech envelope was greatest for 12-talker babble, but it was also present for speech-matched noise, pink noise, and, to some extent, white noise. The latter three noises bare no perceptional relation to speech, but resemble stationary, background buzzing from industrial noise, heavy rain, waterfalls, wind, or ventilation. Twelve-talker babble – which is also a stationary masker – is clearly recognizable as overlapping speech, but words or phonemes cannot be identified (Bilger, 1984; Bilger et al., 1984; Wilson, 2003; Wilson et al., 2012b). There may thus be something about the naturalistic, speech nature of the background babble that facilitates neural speech tracking.”

      “Twelve-talker babble was associated with the greatest noise-related enhancement in neural tracking, possibly because the 12-talker babble facilitated neuronal activity in speech-relevant auditory regions, where the other, non-speech noises were less effective.”

      (4) The paper discusses how speech envelope-onset tracking varies with different background noises. Does the author expect similar trends for speech envelope tracking as well? Additionally, could you explain why envelope onsets were prioritized over envelope tracking in this analysis?

      The amplitude-onset envelope was selected because several previous works have used the amplitude-onset envelope, our previous work that first observed the enhancement also used the amplitude-onset envelope, and the amplitude-onset envelope has been suggested to work better for speech tracking. This was added to the manuscript. For the manuscript revision, analyses were calculated for the amplitude envelope, largely replicating the results for the amplitude-onset envelope. The results for the amplitude envelope are now presented in the Supplementary Materials and referred to in the main text.

      “The amplitude-onset envelope was selected because a) several previous works have used it (Hertrich et al., 2012; Fiedler et al., 2017; Brodbeck et al., 2018a; Daube et al., 2019; Fiedler et al., 2019), b) our previous work first observing the enhancement also used the amplitude-onset envelope (Yasmin et al., 2023; Panela et al., 2024), and c) the amplitude-onset envelope has been suggested to elicit a strong speech tracking response (Hertrich et al., 2012). Results for analyses using the amplitude envelope instead of the amplitude-onset envelope show similar effects and are provided in the Supplementary Materials (Figure 1-figure supplement 1).”

      Recommendations for the authors:

      (1) Include all relevant parameters related to data analysis where applicable. For example, provide the filter parameters (Line 154, Line 177, Line 172), and the default parameters of the speech synthesizer (Line 131).

      Additional filter information and parameter values are provided in the revised manuscript.

      (2) Please share the data and codes or include a justification as to why the data cannot be shared.

      Data and code are provided on OSF (https://osf.io/zs9u5/). A materials availability statement has been added to the manuscript.

      Reviewer #2 (Public review):

      The author investigates the role of background noise on EEG-assessed speech tracking in a series of five experiments. In the first experiment, the influence of different degrees of background noise is investigated and enhanced speech tracking for minimal noise levels is found. The following four experiments explore different potential influences on this effect, such as attentional allocation, different noise types, and presentation mode. The step-wise exploration of potential contributors to the effect of enhanced speech tracking for minimal background noise is compelling. The motivation and reasoning for the different studies are clear and logical and therefore easy to follow. The results are discussed in a concise and clear way. While I specifically like the conciseness, one inevitable consequence is that not all results are equally discussed in depth. Based on the results of the five experiments, the author concludes that the enhancement of speech tracking for minimal background noise is likely due to stochastic resonance. Given broad conceptualizations of stochastic resonance as a noise benefit this is a reasonable conclusion. This study will likely impact the field as it provides compelling support questioning the relationship between speech tracking and speech processing.

      I thank the reviewer for the positive review and thoughtful feedback.

      Recommendations for the authors:

      As mentioned in the public review, I like the conciseness. However, some points might benefit from addressing them.

      (1) The absence of comprehension effects is on the one hand surprising, as the decreased intelligibility should (theoretically) be visible in this data. On the other hand, from my own experience, the generation of "good" comprehension questions is quite difficult. While it is mentioned in the methods section, that comprehension accuracy and gist rating go hand in hand, this is not the case here. I am wondering if the data here should be rather understood as "there is no difference in intelligibility" or that comprehension assessment via comprehension questions is potentially not a valid measure.

      I assume that the reviewer refers to Experiment 1, where SNRs approximately below 15 dB led to reduced gist ratings (used as a proxy for speech intelligibility; Davis and Johnsrude, 2003, J Neurosci; Ritz et al., 2022, J Neurosci). That story comprehension accuracy does not decrease could be due to the comprehension questions themselves (as indicated by the reviewer, “good” questions can be hard to generate, potentially having low sensitivity). On the other hand, speech for the most difficult SNR was still ‘reasonably’ intelligible (gist ratings suggest ~85% of words could be understood), and participants may still have been able to follow the thread of the story. I do not further discuss this point in the manuscript, since it is not directly related to the noise-related enhancement in the neural tracking response, because the enhancement was present for high SNRs for which gist ratings did not show a difference relative to clear speech (i.e., 20 dB and above).

      (2) However, if I understood correctly, the "lower" manipulation (same RMS for the whole sound stimulus) of experiment 3 was, what was also used in experiment 1. In experiment 3, unlike 1, there are comprehension effects. I wondered if there are ideas about why that is.

      Yes indeed, the ‘lower’ manipulation in Experiment 3 was also used in Experiments 1, 2, 4, and 5. The generation of the stimulus materials was similar across experiments. However, a new set of stories and comprehension questions was used for each experiment and the participants differed as well (with some overlap). These aspects may have contributed to the difference.

      (3) Concerning the prediction accuracy, for a naive reader, some surrounding information would be helpful: What is the purpose/expectation of this measure? Is it to show that all models are above chance?

      EEG prediction accuracy was included here, mainly because it is commonly used in studies using TRFs. A reader may wonder about EEG prediction accuracy if it were not reported. The hypotheses of the current study are related to the TRF weights/amplitude. This was added to the manuscript.

      “EEG prediction accuracy was calculated because many previous studies report it (e.g., Decruy et al., 2019; Broderick et al., 2021; Gillis et al., 2021; Weineck et al., 2022; Karunathilake et al., 2023), but the main focus of the current study is on the TRF weights/amplitude.”
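As noted in the response to Reviewer #1, the predicted EEG is obtained by convolving the TRF with the amplitude-onset envelope before the prediction accuracy is calculated. The sketch below illustrates that idea for a single channel. It assumes that prediction accuracy is the Pearson correlation between predicted and recorded EEG, which is the usual convention but is not stated in the response. The function names and the toy signals are hypothetical.

```python
import math


def predict_eeg(envelope, trf):
    """Causal convolution: predicted[t] = sum over lags k of trf[k] * envelope[t - k]."""
    predicted = []
    for t in range(len(envelope)):
        value = 0.0
        for k, weight in enumerate(trf):
            if t - k >= 0:
                value += weight * envelope[t - k]
        predicted.append(value)
    return predicted


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy example: two onsets in the envelope and a three-lag TRF.
envelope = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
trf = [0.5, 1.0, -0.5]
predicted = predict_eeg(envelope, trf)
```

Because the prediction sums contributions from every lag in the TRF, restricting the TRF to a narrow lag range (for example around one deflection) makes the accuracy measure depend mainly on that deflection, which is the point made in the response.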

      (4) Regarding the length of training and test data I got confused: It says per story 50 25-s snippets. As the maximum length of a story was 2:30 min, those snippets were mostly overlapping, right? It seems that depending on the length of the story and the "location within the time series" of the snippets, the number of remaining non-overlapping snippets is variable. Also, within training, the snippets were overlapping, correct? Otherwise, the data for training would be too short. Again, as a naive reader, is this common, or can overlapping training data lead to overestimations?

      The short stories made non-overlapping windows infeasible, but the overlap is unlikely to affect the current results. Using cross-correlation (Hertrich et al 2012 Psychophysiology; which is completely independent for different snippets) instead of TRFs shows the same results (now provided in the supplementary materials). In one of our previous studies where the enhancement was first observed (Yasmin et al. 2023 Neuropsychologia), non-overlapping data were used because the stories were longer. This makes any meaningful impact of the overlap very unlikely. Critically, speech-clarity levels were randomized and all analyses were conducted in the same way for all conditions, thus not confounding any of the results/conclusions. The methods section was extended to further explain the choice of overlapping data snippets.

      “Speech-clarity levels were randomized across stories and all analyses were conducted similarly for all conditions. Hence, no impact of overlapping training data on the results is expected (consistent with noise-related enhancements observed previously when longer stories and non-overlapping data were used; Yasmin et al., 2023). Analyses using cross-correlation, for which data snippets are treated independently, show similar results compared to those reported here using TRFs (Figure 1-figure supplement 2).”
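To see why 50 snippets of 25 s must overlap in a story of about two minutes, the arithmetic can be sketched as below. The sketch assumes evenly spaced snippet onsets, which is an assumption made here for illustration (the response does not say how onsets were chosen), and a hypothetical 120 s story. The function names are hypothetical as well.

```python
def snippet_starts(story_dur_s, n_snippets=50, snippet_dur_s=25.0):
    """Evenly spaced snippet onset times (seconds) that fit inside the story."""
    last_start = story_dur_s - snippet_dur_s
    if last_start < 0:
        raise ValueError("story is shorter than one snippet")
    if n_snippets == 1:
        return [0.0]
    step = last_start / (n_snippets - 1)
    return [i * step for i in range(n_snippets)]


def max_nonoverlapping(story_dur_s, snippet_dur_s=25.0):
    """Largest number of snippets that fit without any overlap."""
    return int(story_dur_s // snippet_dur_s)


starts = snippet_starts(120.0)                     # hypothetical 2-minute story
overlap_s = 25.0 - (starts[1] - starts[0])         # overlap of neighboring snippets
n_independent = max_nonoverlapping(120.0)
```

For a 120 s story only four non-overlapping 25 s snippets fit, so obtaining 50 snippets requires neighboring snippets to share most of their samples (about 23 s under the even-spacing assumption).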

      (5) For experiment 1, three stories were clear, while the other 21 conditions were represented by one story each. Presumably, the ratio of 3:1 can affect TRFs?

      TRFs were calculated for each story individually and then averaged across three stories: either three clear stories, or three stories in babble for neighboring SNRs. Hence, the same number of TRFs were averaged for clear and noise conditions, avoiding exactly this issue. This was described in the methods section and is reproduced here:

      “Behavioral data (comprehension accuracy, gist ratings), EEG prediction accuracy, and TRFs for the three clear stories were averaged. For the stories in babble, a sliding average across SNR levels was calculated for behavioral data, EEG prediction accuracy, and TRFs, such that data for three neighboring SNR levels were averaged. Averaging across three stories was calculated to reduce noise in the data and match the averaging of three stories for the clear condition.”
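The sliding average across neighboring SNR levels described in the quoted methods text can be sketched as follows. This is an illustration under stated assumptions, not the author's code: values are ordered by SNR, each output is the mean of three consecutive levels, and no padding is applied at the edges (how the edges were handled is not stated in the response). The function name and the example values are hypothetical.

```python
def sliding_average(values, width=3):
    """Mean of each run of `width` neighboring entries, without edge padding."""
    if width < 1 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


# Hypothetical per-SNR values (for example, one accuracy value per story).
per_snr = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = sliding_average(per_snr)
```

Each smoothed value then pools three stories, matching the three stories averaged for the clear condition, so the number of stories per data point is the same for clear and noise conditions.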

      (6) Was there an overlap in participants?

      Some participants took part in several of the experiments in separate sessions on separate days. This was added to the manuscript.

      “Several participants took part in more than one of the experiments, in separate sessions on separate days: 7, 7, 9, 9, and 14 (for Experiments 1-5, respectively) participated only in one experiment; 3 individuals participated in all 5 experiments; 68 unique participants took part across the 5 experiments.”

      (7) Can stochastic resonance also explain inverted U-shape results with vocoded speech?

      This is an interesting question. Distortions to the neural responses to noise-vocoding may reflect internal noise, but this would require additional research. For example, the Hauswald study (2022 EJN), showing enhancements due to noise-vocoding, used vocoding channels that also reduced speech intelligibility. The study would ideally be repeated with a greater number of vocoding channels to make sure the effects are not driven by increased attention due to reduced speech intelligibility. I did not further discuss this in detail in the manuscript as it would go too far away from the experiments of the current study.

      (8) Typo in the abstract: box sexes is probably meant to say both sexes?

      This text was removed, because more detailed gender identification is reported in the methods, and the abstract needed shortening to meet the eLife guidelines.

      Reviewing Editor Comments:

      Interesting series of experiments to assess the influence of noise on cortical tracking in different conditions, interpreting the results with the mechanism of stochastic resonance.

      I thank the editor for their encouraging feedback.

      For experiment 2, the author wishes to exclude the role of attention, by making participants perform a visual task. Data from low performers on the visual task was excluded, to avoid that participants attended the spoken speech. However, from the high performers on the visual task, how can you be sure that they did not pay attention to the auditory stimuli as well (as auditory attention is quite automatic, and these participants might be good at dividing their attention)? I understand that you can not ask participants about the auditory task during the experiment, but did you ask AFTER the experiment whether they were able to understand the stimuli? I think this is crucial for your interpretation.

      Participants were not asked whether they were able to understand the stimuli. Participants would be unlikely to invest effort/attention in understanding the stories in babble without a speech-related task. Nevertheless, for follow-up analyses, I removed participants who performed above 0.9 in the visual task (i.e., the high performers), and the difference between clear speech and speech in babble replicates. In the plots, data from all babble conditions above 15 dB SNR (highly intelligible) were averaged, but the results look almost identical if all SNRs are averaged. Moreover, the correlation between visual task performance and the babble-related enhancement was not significant. These analyses were added to the Supplementary Materials (Figure 2-figure supplement 1).

      Statistics: inconsistencies across experiments with a lot of simple tests (FDR corrected) and in addition sometimes rmANOVA added - if interactions in rmANOVA are not significant then all the simple tests might not be warranted. So a bit of double dipping and over-testing here, but on the whole the conclusions do not seem to be overstated.

      The designs of the different experiments differed, thus requiring different statistical approaches. Moreover, the different tests assess different comparisons. For all experiments, contrasting the clear condition to all noise conditions was the main purpose of the experiments. To correct for multiple comparisons, the False Discovery Rate correction was used. Repeated-measures ANOVAs were conducted in addition to this – excluding the clear condition because it would not fit into a factorial structure (e.g., Experiment 3) or to avoid analyzing it twice (e.g., Experiment 5) – to investigate differences between different noise conditions. There was thus no over-testing in the presented study.
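The response does not state which False Discovery Rate procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, a common choice for thresholding a family of comparisons such as clear speech versus each noise condition. The function name, the level q = 0.05, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True where the test is significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) whose sorted p-value is <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for four clear-versus-noise comparisons.
flags = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
```

Because the procedure is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value meets a later threshold, which is why the cutoff is taken as the largest qualifying rank.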

      Small points:

      Question on methods: For each story, 50 25-s data snippets were extracted (Page 7, line 190). As you have stories with a duration of 1.5 to 2 minutes, does that mean there is a lot of overlap across data snippets? How does that influence the TRF/prediction accuracy?

      The short stories made non-overlapping windows infeasible, but the overlap is unlikely to affect the current results. Using cross-correlation (Hertrich et al 2012 Psychophysiology; which is completely independent for different snippets) instead of TRFs shows the same results (newly added Figure 1-figure supplement 2). In one of our previous studies where the enhancement was first observed (Yasmin et al. 2023 Neuropsychologia), non-overlapping data were used because the stories were longer. This makes any meaningful impact of the overlap very unlikely. Critically, speech-clarity levels were randomized and all analyses were conducted in the same way for all conditions, thus not confounding any of the results/conclusions. The methods section was extended to further explain the choice of overlapping data snippets.

      “Overlapping snippets in the training data were used to increase the amount of data in the training given the short duration of the stories. Speech-clarity levels were randomized across stories and all analyses were conducted similarly for all conditions. Hence, no impact of overlapping training data on the results is expected (consistent with noise-related enhancements observed previously when longer stories and non-overlapping data were used; Yasmin et al., 2023). Analyses using cross-correlation, for which data snippets are treated independently, show similar results compared to those reported here using TRFs (Figure 1-figure supplement 2).”

      Results Experiment 3: page 17, line 417: no differences were found between clear speech and masked speech - is this a power issue (as it does look different in the figure, Figure 4b)?

      I thank the editor for pointing this out. Indeed, I made a minor mistake. Two comparisons were significant after FDR-thresholding. This is now included in the revised Figure 4. I also checked the other analyses and confirmed that the mistake was not present there.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      1. Evidence for a disulfide bridge contained in membrane-associated FGF2 dimers

      This aspect was brought up in detail by both Reviewer #1 and Reviewer #3. It has been addressed in the revised manuscript by (i) new experimental and computational analyses, (ii) a more detailed discussion of previous work from our lab in which the experiments requested by the reviewers had already been done and (iii) a more general discussion of known examples of disulfide formation in protein complexes with a particular focus on membrane surfaces facing the cytoplasm, the inner plasma membrane leaflet being a prominent example. Please find our detailed comments in our direct response to Reviewers #1 and #3, see below.

      2. Affinity towards PI(4,5)P2 comparing FGF2 dimers versus monomers

      This is an aspect that has been raised by Reviewer 3 along with additional comments on the interaction of FGF2 with PI(4,5)P2. Please find our detailed response below. With regard to PI(4,5)P2 affinity aspects of FGF2 dimers versus FGF2 monomers, we think that the increased avidity of FGF2 dimers with two high affinity binding pockets for PI(4,5)P2 is a good explanation for the different values of free energies of binding that were calculated from the atomistic molecular dynamics simulations shown in Fig. 9. This phenomenon is well known for many biomolecular interactions and is also consistent with the cryoEM data contained in our manuscript, showing an FGF2 dimer with two PI(4,5)P2 binding sites facing the membrane surface.

      3. C95-C95 FGF2 dimers as signaling units

      We have put forward this hypothesis since in structural studies analyzing the FGF ternary signaling complex consisting of FGF2, FGF receptor and heparin, FGF2 mutants were used that lack C95. Nevertheless, two FGF2 molecules are contained in FGF signaling complexes. In addition to the papers on the structure of the FGF signaling complex, we have cited work that showed that C95-C95 crosslinked FGF2 dimers are efficient FGF signaling modules (Decker et al, 2016; Nawrocka et al, 2020). Therefore, being based on an assembly/disassembly mechanism with the transient formation of pore-forming FGF2 oligomers, we think it is an interesting idea that the FGF2 secretion pathway produces C95-C95 disulfide-linked FGF2 dimers at the outer plasma membrane leaflet that can engage in FGF2 ternary signaling complexes. While this is a possibility we put forward to stimulate the field, it of course remains a hypothesis which has been clearly indicated as such in the revised manuscript.

      Reviewer #1:

      1. Evidence for disulfide-bridged FGF2 dimers and higher oligomers on non-reducing versus reducing SDS gels

      The experiment suggested by Reviewer #1 is an important one that has been published by our group in previous work. In these studies, we found FGF2 oligomers analyzed on non-reducing SDS gels to be sensitive to DTT, turning the vast majority of oligomeric FGF2 species into monomers [(Müller et al, 2015); Fig. 3, compare panel D with panel H]. This phenomenon could be observed most clearly after short periods of incubations (0.5 hours) of FGF2 with PI(4,5)P2-containing liposomes. These findings constituted the original evidence for PI(4,5)P2-induced FGF2 oligomerization to depend on the formation of intermolecular disulfide bridges.

      In the current manuscript, we established the structural principles underlying this process and identified C95 to be the only cysteine residue involved in disulfide formation. Based on biochemical cross-linking experiments in cells, cryo-electron tomography, predictions from AlphaFold-2 Multimer and molecular dynamics simulations, we demonstrated a strong FGF2 dimerization interface in which C95 residues are brought into close proximity when FGF2 is bound to membranes in a PI(4,5)P2-dependent manner. These findings provide the structural basis by which disulfide bridges can be formed from the thiols contained in the side chains of two C95 residues directly facing each other in the dimerization interface. In the revised manuscript, we included additional data that further strengthen this analysis. In the experiments shown in the new Fig. 10, we combined chemical cross-linking with mass spectrometry, further validating the reported FGF2 dimerization interface. In addition, illustrated in the new Fig. 8, we employed a new computational analysis combining 360 individual atomistic molecular dynamics simulations, each spanning 0.5 microseconds, with advanced machine learning techniques. This new data set corroborates our findings, demonstrating that the C95-C95 interface self-assembles independently of C95-C95 disulfide formation, based on electrostatic interactions. Intriguingly, it is consistent with our experimental findings based on cross-linking mass spectrometry (new Fig. 10) where cross-linked peptides could also be observed with the C77/95A variant form of FGF2, suggesting a protein-protein interface whose formation does not depend on disulfide formation. Therefore, we propose that disulfide formation occurs in a subsequent step, representing the committed step of FGF2 membrane translocation with the formation of disulfide-bridged FGF2 dimers being the building blocks for pore-forming FGF2 oligomers.

      As a more general remark on the mechanistic principles of disulfide formation in different cellular environments, we would like to emphasize that it is a common misconception that the reducing environment of the cytoplasm generally makes the formation of disulfide bridges unlikely or even impossible. From a biochemical point of view, the formation of disulfide bridges is not limited by a reducing cellular environment but is rather controlled by kinetic parameters when two thiols are brought into proximity. Indeed, it has become well established that disulfide bridges can also be formed in compartments other than the lumen of the ER/Golgi system, including the cytoplasm. For example, viruses maturing in the cytoplasm can form stable structural disulfide bonds in their coat proteins (Locker & Griffiths, 1999; Hakim & Fass, 2010). Moreover, many cytosolic proteins, including phosphatases, kinases and transcription factors, are now recognized to be regulated by thiol oxidation and disulfide bond formation, formed as a post-translational modification (Lennicke & Cocheme, 2021). In numerous cases with direct relevance for our studies on FGF2, disulfide bond formation and other forms of thiol oxidation occur in association with membrane surfaces. In fact, many of these processes are linked to the inner plasma membrane leaflet (Nordzieke & Medrano-Fernandez, 2018). Growth factors, hormones and antigen receptors are observed to activate transmembrane NADPH oxidases generating O2·-/H2O2 (Brown & Griendling, 2009). For example, the local and transient oxidative inactivation of membrane-associated phosphatases (e.g., PTEN) serves to enhance receptor-associated kinase signaling (Netto & Machado, 2022). It is therefore conceivable that similar processes introduce disulfide bridges into FGF2 while assembling into oligomers at the inner plasma membrane leaflet. In the revised version of our manuscript, we have discussed the above-mentioned aspects in more detail, with the known role of NADPH oxidases in disulfide formation at the inner plasma membrane leaflet being highlighted.

      Reviewer #2:

      1. Potential effects of a C95A substitution on protein folding and comparison with a C95S substitution with regard to phenotypes observed in FGF2 secretion

      A valid point that we indeed addressed at the beginning of this project. Most importantly, we tested whether both FGF2 C95A and FGF2 C95S are characterized by severe phenotypes in FGF2 secretion efficiency. As shown in the revised Fig. 1, cysteine substitutions by serine showed very similar FGF2 secretion phenotypes compared to cysteine to alanine substitutions (Fig. 1C and 1D). In addition, in the pilot phase of this project, we also compared recombinant forms of FGF2 C95A and FGF2 C95S in various in vitro assays. For example, we tested the full set of FGF2 variants in membrane integrity assays as the ones contained in Fig. 4. As shown in Author response image 1, FGF2 variant forms carrying a serine in position 95 behaved in a very similar manner as compared to FGF2 C95A variant forms. Relative to FGF2 wild-type, membrane pore formation was strongly reduced for both types of C95 substitutions. By contrast, both FGF2 C77S and C77A did show activities that were similar to FGF2 wild-type.

      Author response image 1.

      From these experiments, we conclude that changes in protein structure are not the basis for the phenotypes we report on the C95A substitution in FGF2.

      2. Effects of a C77A substitution on FGF2 membrane recruitment in cells

      The effect of a C77A substitution on FGF2 recruitment to the inner plasma membrane leaflet is indeed a moderate one. This is likely to be the case because C77 is only one residue of a more complex surface that contacts the α1 subunit of the Na,K-ATPase. Stronger effects can be observed when K54 and K60 are changed, residues that are positioned in close proximity to C77 (Legrand et al, 2020). Nevertheless, as shown in the revised Fig. 1, we consistently observed a reduction in membrane recruitment when comparing FGF2 C77A with FGF2 wild-type. When the raw data were analyzed without GFP background subtraction, a significant reduction in membrane recruitment of FGF2 C77A was observed compared to FGF2 wild-type (Fig. 1A and 1B). We therefore conclude that C77 not only plays a role in FGF2/α1 interactions in biochemical assays using purified components (Fig. 7) but also contributes to FGF2/α1 interactions in a cellular context (Fig. 1A and 1B).

      3. Identity of the protein band in Fig. 3 labeled with an empty diamond

      This is a misunderstanding as we did not assign this band to a FGF2-GFP dimer. When we produced the corresponding cell lines, we used constructs that link FGF2 with GFP via a ‘self-cleaving’ P2A sequence. During translation, even though arranged on one mRNA, this causes the production of FGF2 and GFP as separate proteins in stoichiometric amounts, the latter being used to monitor transfection efficiency. However, a small fraction is always expressed as a complete FGF2-P2A-GFP fusion protein (a monomer). This band can be detected with the FGF2 antibodies used and was labeled in Fig. 3 by an empty diamond.

      4. Labeling of subpanels in Fig. 5A

      We have revised Fig. 5 according to the suggestion of Reviewer #2.

      5. FGF2 membrane binding efficiencies shown in Fig. 5C

      It is true that FGF2 variant forms defective in PI(4,5)P2-dependent oligomerization (C95A and C77/95A) bind to membranes with somewhat reduced efficiencies. This is also evident from the intensity profiles shown in Fig. 5A and was observed in biochemical in vitro experiments as well. A plausible explanation for this phenomenon would be the increased avidity when FGF2 oligomerizes, stabilizing membrane interactions (see also Fig. 9B).

      6. Residual activities of FGF2 C95A and C77/95A in membrane pore formation?

      We do not interpret the phenomenon in Fig. 5 that Reviewer #2 refers to as controlled activity of FGF2 C95A and C77/95A in membrane pore formation. Rather, GUVs containing PI(4,5)P2 are relatively labile structures, and some loss of integrity upon protein binding and extended incubation is conceivable. This is essentially a technical limitation of the assay, in which GUVs are incubated with proteins for 2 hours. Even after substitution of PI(4,5)P2 with a Ni-NTA membrane lipid, background levels of loss of membrane integrity can be observed (Fig. 6). The critical point here is therefore that, compared to FGF2 C95A and C77/95A, FGF2 wt and FGF2 C77A display significantly higher levels of loss of membrane integrity in PI(4,5)P2-containing GUVs, a phenomenon that we interpret as controlled membrane pore formation. By contrast, all variant forms of FGF2 show only background levels of loss of membrane integrity in GUVs containing the Ni-NTA lipid.

      7. Why does PI(4,5)P2 induce FGF2 dimerization?

      This has been studied extensively in previous work (Steringer et al, 2017). As also discussed in the current manuscript, the interaction of FGF2 with membranes through its high affinity PI(4,5)P2 binding pocket orients FGF2 molecules on a 2D surface, which increases the likelihood of the formation of the C95-containing FGF2 dimerization interface. Moreover, in the presence of cholesterol at levels typical for plasma membranes, PI(4,5)P2 forms clusters containing up to 4 PI(4,5)P2 molecules (Lolicato et al, 2022), a process that may further facilitate FGF2 dimerization.

      8. Is it possible to pinpoint the number of FGF2 subunits in oligomers observed in cryo-electron tomography?

      We indeed took advantage of the Halo tags that appear as dark globular structures in cryo-electron tomography. For most FGF2 oligomers with FGF2 subunits on both sides of the membrane, we could observe 4 to 6 Halo tags which is consistent with the functional subunit number that has been analyzed for membrane pore formation (Steringer et al., 2017; Sachl et al, 2020; Singh et al, 2023). However, since the number of higher FGF2 oligomers we observed in cryo-electron tomography was relatively small and the nature of these oligomers appears to be highly dynamic, caution should be taken to avoid overinterpretation of the available data.

      Reviewer #3:

      1. Conclusive demonstration of disulfide-linked FGF2 dimers

      A similar point was raised by Reviewer #1, so that we would like to refer to our response on page 2, see above.

      2. Identity of FGF2-P2A-GFP observed in Fig. 3

      Again, a similar point has been made, in this case by Reviewer #2 (Point 3). The observed band is not a FGF2-P2A-GFP dimer but rather the complete FGF2-P2A-GFP fusion protein (a monomer) that corresponds to a small population produced during mRNA translation where the P2A sequence did not cause the production of FGF2 and GFP as separate proteins in stoichiometric amounts.

      3. Quantification of GFP signals in Fig. 6

      Fig. 6 has been revised according to the suggestion of Reviewer #3. A comprehensive comparison of PI(4,5)P2 and the Ni-NTA membrane lipid in FGF2 membrane translocation assays is also contained in previous work that introduced the GUV-based FGF2 membrane translocation assay (Steringer et al., 2017).

      4. Experimental evidence for various aspects of FGF2 interactions with PI(4,5)P2

      Most of the points raised by Reviewer #3 have been addressed in previous work. For example, FGF2 has been demonstrated to dimerize only on membrane surfaces containing PI(4,5)P2 (Müller et al., 2015). In solution, FGF2 remained a monomer even after hours of incubation as analyzed by native gel electrophoresis and reducing vs. non-reducing SDS gels (see Fig. 3 in Müller et al, 2015). In the same paper, the first evidence for a potential role of C95 in FGF2 oligomerization was reported; however, at the time, our studies were limited to FGF2 C77/95A. In the current manuscript, the in vitro experiments shown in Figs. 2 to 6 establish the unique role of C95 in PI(4,5)P2-dependent FGF2 oligomerization. As discussed above, FGF2 oligomers have been shown to contain disulfide bridges based on analyses on non-reducing gels in the absence and presence of DTT (Müller et al., 2015).

      References

      Brown DI, Griendling KK (2009) Nox proteins in signal transduction. Free Radic Biol Med 47: 1239-1253

      Decker CG, Wang Y, Paluck SJ, Shen L, Loo JA, Levine AJ, Miller LS, Maynard HD (2016) Fibroblast growth factor 2 dimer with superagonist in vitro activity improves granulation tissue formation during wound healing. Biomaterials 81: 157-168

      Hakim M, Fass D (2010) Cytosolic disulfide bond formation in cells infected with large nucleocytoplasmic DNA viruses. Antioxid Redox Signal 13: 1261-1271

      Legrand C, Saleppico R, Sticht J, Lolicato F, Muller HM, Wegehingel S, Dimou E, Steringer JP, Ewers H, Vattulainen I et al (2020) The Na,K-ATPase acts upstream of phosphoinositide PI(4,5)P2 facilitating unconventional secretion of Fibroblast Growth Factor 2. Commun Biol 3: 141

      Lennicke C, Cocheme HM (2021) Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81: 3691-3707

      Locker JK, Griffiths G (1999) An unconventional role for cytoplasmic disulfide bonds in vaccinia virus proteins. J Cell Biol 144: 267-279

      Lolicato F, Saleppico R, Griffo A, Meyer A, Scollo F, Pokrandt B, Muller HM, Ewers H, Hahl H, Fleury JB et al (2022) Cholesterol promotes clustering of PI(4,5)P2 driving unconventional secretion of FGF2. J Cell Biol 221

      Müller HM, Steringer JP, Wegehingel S, Bleicken S, Munster M, Dimou E, Unger S, Weidmann G, Andreas H, GarciaSaez AJ et al (2015) Formation of Disulfide Bridges Drives Oligomerization, Membrane Pore Formation and Translocation of Fibroblast Growth Factor 2 to Cell Surfaces. J Biol Chem 290: 8925-8937

      Nawrocka D, Krzyscik MA, Opalinski L, Zakrzewska M, Otlewski J (2020) Stable Fibroblast Growth Factor 2 Dimers with High Pro-Survival and Mitogenic Potential. Int J Mol Sci 21

      Netto LES, Machado L (2022) Preferential redox regulation of cysteine-based protein tyrosine phosphatases: structural and biochemical diversity. FEBS J 289: 5480-5504

      Nordzieke DE, Medrano-Fernandez I (2018) The Plasma Membrane: A Platform for Intra- and Intercellular Redox Signaling. Antioxidants (Basel) 7

      Sachl R, Cujova S, Singh V, Riegerova P, Kapusta P, Muller HM, Steringer JP, Hof M, Nickel W (2020) Functional Assay to Correlate Protein Oligomerization States with Membrane Pore Formation. Anal Chem 92: 14861-14866

      Singh V, Macharova S, Riegerova P, Steringer JP, Muller HM, Lolicato F, Nickel W, Hof M, Sachl R (2023) Determining the Functional Oligomeric State of Membrane-Associated Protein Oligomers Forming Membrane Pores on Giant Lipid Vesicles. Anal Chem 95: 8807-8815

      Steringer JP, Lange S, Cujova S, Sachl R, Poojari C, Lolicato F, Beutel O, Muller HM, Unger S, Coskun U et al (2017) Key steps in unconventional secretion of fibroblast growth factor 2 reconstituted with purified components. eLife 6: e28985

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of these were reduced or absent in PARG KO cells, whereas others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1. This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      1. The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell lines, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      1. Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the Anti-PARG antibody (66564S) can only recognize isoform 1, this antibody could recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data below.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of these were reduced or absent in PARG KO cells, whereas others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      1. FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results in some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      1. All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      1. Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays.
Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.  

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 1c. Why does the viability of PARG KO cells improve at higher doses of PARGi? How do the authors explain this paradox?

      This phenomenon was observed in 293A PARG KO cells and occurred in the CellTiter-Glo assay, especially with the top three PARGi concentrations (100 µM, 33.33 µM and 11.11 µM). This may be due to the low solubility of this PARGi in the medium, since we sometimes observed precipitation at high concentrations when PARGi stock was diluted in medium.

      1. Figure 2d. The authors show that PARGi reduced NAD+ level by 20%. This reduction in NAD+ probably does not explain the cell death phenotype observed by parthanatos cell death. What pathway is activated by PARGi to induce cell death?

      Since PARGi treatment of PARG KO cells led to uncontrolled pADPr accumulation, it is possible that some of these cells may die due to parthanatos. However, we did not observe a dramatic reduction in NAD+ level. A previous study showed that Parg(-/-) mouse ES cells predominantly underwent caspase-dependent apoptosis (Shirai et al., 2013). Indeed, PARP1 cleavage was detected in PARG KO cells with prolonged PARGi treatment, indicating that at least some of these cells die due to apoptosis (Fig. 2A). Cytotoxicity of PARGi in PARG KO cells may be due to several mechanisms including apoptosis, parthanatos and NAD+ reduction.

      1. The authors refer to FK866 in the text without explaining what this agent is. FK866 is a noncompetitive inhibitor of nicotinamide phosphoribosyltransferase (NAMPT), a key enzyme in the regulation of NAD+ biosynthesis from the natural precursor nicotinamide. The authors should explain experimental tools in the text as they use them for clarity to the reader.

      Thanks for the suggestion! We will include additional citations and discuss how FK866 works in our revised manuscript.

      1. In addition to these issues, there are significant formatting and textual problems, such that there are multiple gaps in the body of the text that make coherent reading of the manuscript impossible. Examples are: Page 3 line 10. Page 6 line 5 and line 15, Page 7 line 2, 3, and line 8. Page 8, line 1, and line 3 from bottom. Page 9 line 1, line 7 from bottom and line 9 from the bottom, Page 18 of the results in several places, etc. etc. etc. These formatting errors convey the impression that the submitting authors did not adequately review the manuscript for technical problems prior to submission. The authors need to correct these errors.

      Sorry, we will edit the text and remove these gaps as suggested.

      Reviewer #3 (Recommendations For The Authors):

      1. The major problem with this paper is conceptual - namely, how could PARG knockout cells be hypersensitive to a selective PARG small molecular inhibitor. The evidence in Figure 7 that there is measurable residual PARG activity in the so-called PARG KO 293A and HeLa cells provides a partial explanation for why PARG inhibitor treatment might be deleterious to the PARG KO cells, i.e., because PARGi blocks this residual PARG activity. However, although the authors characterized the PARG alleles in the 293A PARG KO cells by sequencing, the molecular origin of the significant level of residual PARG activity remains unclear (see points 7-9).

      Yes, in our study we showed that PARGi treatment inhibited the residual PARG activity in PARG KO cells, which mimics complete loss of PARG as PARG is an essential gene. These data agree with a previous study using Parg(-/-) mouse cells (Koh et al., 2004). We attempted to define the molecular origin of the residual PARG activity; unfortunately, this was challenging (please see below for additional discussions). Nevertheless, we showed that residual PARG activity could be detected in PARG KO cells and more importantly cells with reduced PARG expression or activity are sensitive to PARGi. These results indicate that PARG expression and/or activity may be used as a biomarker for PARGi-based therapy.

      1. Although the most obvious explanation for the PARGi sensitivity data presented in Figures 1-4 is that the PARG KO cells have residual PARG activity, the authors wait until the discussion on page 26 to raise the possibility that the PARG KO cells might have residual PARG activity that renders them sensitive to PARGi. It would be more logical to move the PARG activity data in Figure 7 earlier in the paper as a supplementary figure, so that the reader is not left wondering how a PARG KO cell remains sensitive to a PARG inhibitor. For this reason, it is recommended that the whole paper be reorganized and rewritten to provide a more logical flow that allows the reader to understand what was done, and why it is hard to generate complete PARG KO cells because the accumulation of pADPR adducts is toxic to the cell.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with a PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell lines, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make it clear.

      1. Exactly how PARG activity would be coordinated with PARP1/2 activity during normal S phase to ensure that PARylation can serve its required function, whatever that may be, and is then removed by PARG is unclear - how would this be orchestrated at the level of a replication fork?

      PARG is known to be recruited to sites of DNA damage through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Our current hypothesis is that PARP1 is one of the major PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression. Precisely how PARG regulates S phase progression warrants further investigation.

      1. Figure 2B: What gRNAs were used to generate the 293A and HeLa PARG knock clones, i.e., where are they located in the PARG gene? If they are not in the catalytic domain it might be possible to generate PARG proteins with N-terminal deletions that are still active (see points 8-10 below).

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends to show the localization of gRNAs.

      We agree with this reviewer that truncated but active forms of PARG exist in these KO cells. We attempted to identify these truncated forms of PARG by using two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, others were not. Thus, we could not draw a clear conclusion as to which functional isoform/truncated form was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. Based on these results, we stated that the residual PARG activity was detected in our KO cells, but we were not able to specify the truncated variants of PARG in these cells.

      Author response image 4.

      1. Figure 3B/page 19: The authors state that "emetine, which diminishes Okazaki fragments, greatly inhibited S phase pADPr signaling in PARG KO cells", and from this deduced that Okazaki fragments on the lagging strand activate PARylation. However, emetine is not a specific lagging strand synthesis inhibitor, as implied here, but rather a protein synthesis inhibitor, which inhibits Okazaki fragment formation indirectly (see PMID: 36260751). The authors need to rewrite this section to explain how emetine works in this context.

      As suggested, we will cite this reference and discuss how emetine inhibits Okazaki fragment maturation in our revised manuscript. Additionally, we used three different POLA1 inhibitors to diminish Okazaki fragments. As shown in Fig. S3B, all three POLA1 inhibitors significantly abolished S-phase pADPr induced by PARGi in PARG KO cells. Furthermore, POLA1 inhibitors, adarotene and CD437, were able to rescue cell lethality caused by PARGi in PARG KO cells (Fig. 3E).

      1. Figure 7: It is not clear why these cells are called PARG complete/conditional KO cells (cKO). Generally, "conditional knockout" refers to a cell or animal in which a gene can be conditionally knocked out by inducible expression of Cre. Here, it appears that "conditional" refers to the fact that the PARG KO cells only grow in the presence of olaparib - is this the case?

      Yes, we used the name to separate these cells from our initial PARG KO cells. Moreover, we were only able to obtain and maintain these PARG cKO clones with complete loss of PARG activity in the presence of PARP inhibitor. Therefore, we called them PARG complete/conditional KO (cKO) cells.

      1. Figure 7B and D: The level of full-length PARG protein was much lower in the 293A and HeLa cKO cells compared to WT cells consistent with cKO cells representing a more complete PARG KO. The level of PARG protein in the 293A PARG cKO cells was apparently also lower than in the original PARG KO cells, but the KO and cKO samples should be run side by side to demonstrate this conclusively, and the bands need to be quantified. In panel B, it is not clear from the legend what cKO_3 and cKO_4 are, but presumably, they are different clones, and this should be stated.

      Full-length PARG was not detected in either PARG KO or PARG cKO cells by WB. The apparent lower level of endogenous PARG in Fig. 7D was due to the fact that reconstituted cells had high exogenous PARG expression and therefore we had to reduce exposure time for WB.

      As for cKO_3 and cKO_4 in Fig. 7, they are different clones created by different sgRNAs. As suggested, we will include additional information in figure legends to clearly state which sgRNA was used to generate the respective KO and cKO clones.

      1. Figure S8: There is not enough information here or in the text to allow the reader to interpret these PARG allele sequences obtained from the PARG KO cells. From the Methods section, it appears that the PARG KO cells were clonal, with sequence data from one clone of each of the 293A and HeLa cell PARG KO cells being shown. If this is right, then in both cell types one out of four PARG alleles is wild type, and therefore one would expect the PARG protein signal to be ~25% of that in WT cells. However, based on the 293A PARG KO cells PARG immunoblot in Figure 2B the PARG protein signal is clearly much lower than 25% (these bands need to be quantified), and this discrepancy needs to be explained. What is the level of PARG protein in the PARG KO HeLa cells? If different PARG KO cell clones are analyzed by sequencing, do they all have an apparently intact PARG allele? Four different gRNA target sites in the PARG gene are shown in panel A in Figure 7, but the description in the text regarding how the four gRNAs were used is totally inadequate - were all four used simultaneously or only the two in the catalytic domain? Were pairs of gRNAs used in an attempt to generate a large intervening deletion - some Southern blots of the PARG gene region in the PARG cKO cells are needed to figure this out. The gRNAs are given numbers in Figure 7A, but it is unclear from the sequences shown in Figures S8 and S9 which gRNA sites are shown. All of this has to be clarified, so that the reader can understand the nature of the KO/cKO cells knockout alleles, and what PARG-related products, if any, they can express.

      Yes, all KO and cKO cells used in this study are single clones. As suggested, we will revise figure legends in Fig. 7, S8 and S9 to include detailed information. To avoid any further misunderstanding, we will relabel the allele “WT” as “WT (reference)” in Fig. S8 and S9. We did not detect intact/wild-type PARG sequence in any single KO/cKO clone by DNA sequencing. Sequencing of single KO/cKO clones was performed using the TOPO TA Cloning kit. Briefly, genomic DNA was extracted from each single KO/cKO clone. Approximately 300 bp surrounding the sgRNA targeting sequence was amplified by PCR. The PCR product was cloned into the vector and approximately 10-15 bacterial clones were extracted and sent for sequencing. If any intact/wild-type PARG sequence was detected in these 10-15 bacterial clones, this KO/cKO clone was considered a heterozygous clone and discarded.

      HEK293A and HeLa cells are not diploid cells and have complex karyotypes. The PARG gene is located on chromosome 10. Karyotyping by M-FISH shows that HeLa cells have 3 copies of chromosome 10 (Landry et al., 2013). HEK293 cells predominantly have 3 copies of chromosome 10 and sometimes 4 copies can be detected by G-banding (Binz et al., 2019). Therefore, it is anticipated that 1 to 4 mutant alleles would be detected in each KO/cKO clone by sequencing.
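      To give a feel for how stringent the 10-15 colony screen described above is, here is a minimal sketch of the chance that a remaining wild-type allele goes undetected. It assumes, hypothetically, that every allele is amplified and cloned with equal efficiency, so that each sequenced colony is an independent, uniform draw from the 3 or 4 copies of chromosome 10; the function name and the numbers of colonies tried are illustrative and not taken from the Methods.

```python
def prob_wild_type_missed(n_alleles: int, n_wild_type: int, n_colonies: int) -> float:
    """Chance that no sequenced colony carries a wild-type allele.

    Assumption (not from the source): each colony samples one allele
    uniformly and independently, i.e. no PCR or cloning bias.
    """
    if n_alleles <= 0 or not 0 <= n_wild_type <= n_alleles or n_colonies < 0:
        raise ValueError("invalid allele or colony counts")
    mutant_fraction = (n_alleles - n_wild_type) / n_alleles
    return mutant_fraction ** n_colonies


if __name__ == "__main__":
    # One wild-type copy left among 3 or 4 alleles, 10 or 15 colonies sequenced.
    for n_alleles in (3, 4):
        for n_colonies in (10, 15):
            p = prob_wild_type_missed(n_alleles, 1, n_colonies)
            print(f"{n_alleles} alleles, {n_colonies} colonies: P(miss) = {p:.4f}")
```

      Under these assumptions, a single intact allele among four copies would be missed less than 6% of the time with 10 colonies and about 1.3% of the time with 15. Unequal amplification of alleles would change these figures.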

      Only one sgRNA was transfected into cells for the selection of single clones. We did not use paired or multiple sgRNAs in any of these experiments. As shown in Fig. S1D and Fig. 7A, HEK293A derived and HeLa derived PARG KO single clones were generated with the use of different sgRNAs. In addition, the two PARG cKO single clones from HEK293A and HeLa cells were also generated by the use of two different sgRNAs, as shown in Fig. 7A-B. We will include all the information above in the revised manuscript, i.e. in Methods section as well as in figure legends.

      1. Figure S9A: The sequences of the 293A PARG alleles in the cKO cells suggest that these cells also have one intact PARG allele, which again does not fit with the very low level of intact PARG protein shown in Figure 7B. How do the authors explain this?

      Sorry, this is a misunderstanding. The allele “WT” in Fig. S8 and S9 is the reference sequence. We will change it to “Reference sequence” to avoid further confusion. As mentioned above, we did not detect any intact/wild-type PARG sequence in any of our single KO/cKO clones by sequencing.

      1. Figure S9B: These critical lysate activity data show that the PARG KO cells have ~50% of the PARG activity detected in WT cells. However, this is not consistent with the PARG protein level detected in the PARG immunoblot in Figure 1B, which appears to be less than 5% of the PARG protein level in WT cells (with one intact PARG allele in these cells one would theoretically expect ~25%, although this depends on whether all four alleles are expressed equally). One possibility is that active PARG fragments are generated from one or more of the PARG KO alleles in the PARG KO cells. Targeted sequencing of PARG mRNAs might reveal whether there are shorter RNAs that could encode a protein containing the C-terminal catalytic domain (aa 570-910). In addition, the authors need to show the entire immunoblot to determine if there are smaller proteins recognized by the anti-PARG antibodies that might represent shorter PARG gene products (for this we need to know where the epitope against which the PARG antibodies are directed is located within the PARG protein - ideally the authors need to use an antibody directed against an epitope near the C-terminus).

      As stated in the Methods section, we incubated cell lysates with substrates overnight to evaluate the maximum level of pADPr hydrolysis, i.e. PARG activity, we were able to detect in this assay. It is very likely that the PARG activity in PARG KO cells was much lower than 50%, due to saturation of signals for lysates isolated from wild-type cells. Thus, the data presented in our manuscript probably underestimate the reduction of PARG activity in PARG KO cells. Nevertheless, these data indicate that residual PARG activity was detected in PARG KO cells; however, this activity was absent in PARG cKO cells.
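      The saturation argument can be illustrated with a small worked example. The sketch below assumes simple first-order hydrolysis of the substrate, which is a simplifying assumption and not the kinetic model of the actual assay; the rate constant and the 16-hour incubation are hypothetical values chosen only to show the effect of reading an endpoint after the wild-type reaction has run to completion.

```python
import math


def fraction_hydrolysed(rate_per_hour: float, hours: float) -> float:
    """Fraction of substrate hydrolysed after `hours`, assuming first-order decay."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def apparent_relative_signal(true_relative_activity: float,
                             wt_rate_per_hour: float,
                             hours: float) -> float:
    """Endpoint signal of a KO lysate relative to the wild-type endpoint.

    `true_relative_activity` is the KO rate divided by the wild-type rate.
    All parameter values used with this function are hypothetical.
    """
    wt = fraction_hydrolysed(wt_rate_per_hour, hours)
    ko = fraction_hydrolysed(true_relative_activity * wt_rate_per_hour, hours)
    return ko / wt


if __name__ == "__main__":
    hours = 16.0    # hypothetical overnight incubation
    wt_rate = 1.0   # hypothetical wild-type rate constant (per hour)
    for true_activity in (0.02, 0.05, 0.10, 0.25):
        signal = apparent_relative_signal(true_activity, wt_rate, hours)
        print(f"true activity {true_activity:.0%} -> endpoint signal {signal:.0%} of WT")
```

      With these hypothetical values, a lysate retaining only 5% of the wild-type hydrolysis rate would still reach roughly 55% of the wild-type endpoint signal, which is the sense in which an overnight endpoint can understate the loss of activity.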

      As aforementioned, we used two independent antibodies that recognize the C-terminus of PARG for WB. Unfortunately, we could not draw a clear conclusion which functional isoforms or truncated proteins were expressed in our PARG KO cells. The dePARylation assay used here may be the best way to test the residual PARG activity in our KO and cKO cells.

      1. Figure 7D: In this experiment, the level of re-expressed WT PARG protein was much higher than that of the endogenous PARG protein (quantification is needed) - how might this affect the interpretation of these experiments (N.B., WT and catalytically-dead PARG were also re-expressed for the experiments shown in Figure 1, but there are no PARG immunoblots to demonstrate how much the exogenous proteins were overexpressed, or activity measurements). If regulated pADPr signaling is important for a normal S phase, then one would have thought that expressing a very high level of active PARG would create problems.

      In Fig. S1E, we blotted endogenous PARG level in control cells and exogenous PARG level in reconstituted cells. The reviewer is correct that exogenous PARG expression was much higher (~10-fold) than that of endogenous PARG in WT control cells. Nevertheless, we did not observe any obvious phenotypes in PARG KO/cKO cells reconstituted with a high level of exogenous PARG, which may reflect excess PARG level/activity in wild-type control cells.

      References:

      Binz, R. L., Tian, E., Sadhukhan, R., Zhou, D., Hauer-Jensen, M., and Pathak, R. (2019). Identification of novel breakpoints for locus- and region-specific translocations in 293 cells by molecular cytogenetics before and after irradiation. Sci Rep 9, 10554.

      Hanzlikova, H., Kalasova, I., Demin, A. A., Pennicott, L. E., Cihlarova, Z., and Caldecott, K. W. (2018). The Importance of Poly(ADP-Ribose) Polymerase as a Sensor of Unligated Okazaki Fragments during DNA Replication. Mol Cell 71, 319-331 e313.

      Koh, D. W., Lawler, A. M., Poitras, M. F., Sasaki, M., Wattler, S., Nehls, M. C., Stoger, T., Poirier, G. G., Dawson, V. L., and Dawson, T. M. (2004). Failure to degrade poly(ADP-ribose) causes increased sensitivity to cytotoxicity and early embryonic lethality. Proc Natl Acad Sci U S A 101, 17699-17704.

      Kumamoto, S., Nishiyama, A., Chiba, Y., Miyashita, R., Konishi, C., Azuma, Y., and Nakanishi, M. (2021). HPF1-dependent PARP activation promotes LIG3-XRCC1-mediated backup pathway of Okazaki fragment ligation. Nucleic Acids Res 49, 5003-5016.

      Landry, J. J., Pyl, P. T., Rausch, T., Zichner, T., Tekkedil, M. M., Stutz, A. M., Jauch, A., Aiyar, R. S., Pau, G., Delhomme, N., et al. (2013). The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213-1224.

      Mortusewicz, O., Fouquerel, E., Ame, J. C., Leonhardt, H., and Schreiber, V. (2011). PARG is recruited to DNA damage sites through poly(ADP-ribose)- and PCNA-dependent mechanisms. Nucleic Acids Res 39, 5045-5056.

      Shirai, H., Fujimori, H., Gunji, A., Maeda, D., Hirai, T., Poetsch, A. R., Harada, H., Yoshida, T., Sasai, K., Okayasu, R., and Masutani, M. (2013). Parg deficiency confers radio-sensitization through enhanced cell death in mouse ES cells exposed to various forms of ionizing radiation. Biochem Biophys Res Commun 435, 100-106.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in night-time sleep loss, consistent with the well supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of AB40 and AB42 in zebrafish, also shows a slight increase in activity at night and slight decreases in night-time sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are night-time sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
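      As a rough illustration of the enrichment test outlined above, the sketch below computes a one-sided exact binomial p-value for observing a night-time sleep phenotype in all four late-onset risk genes, given a background rate that would have to be estimated from the random gene set. The background rates tried here are hypothetical placeholders, since that experiment has not been done.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact p-value."""
    if not 0 <= k <= n or not 0.0 <= p <= 1.0:
        raise ValueError("need 0 <= k <= n and 0 <= p <= 1")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_risk_genes = 4       # late-onset risk genes tested in the study
    n_with_phenotype = 4   # all four showed a night-time sleep phenotype
    # Hypothetical background rates among random brain-expressed genes.
    for background_rate in (0.1, 0.3, 0.5):
        p_value = binomial_upper_tail(n_with_phenotype, n_risk_genes, background_rate)
        print(f"background rate {background_rate:.1f}: p = {p_value:.4f}")
```

      This treats the background rate as known. If it were estimated from a modest number of random genes, a test that accounts for that uncertainty, such as Fisher's exact test on the two gene sets, would be more appropriate.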

      For those mutants that cause night-time sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–supplement 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts (Author response image 1):

      – One indication is shared by 4/7 knockouts: “opioid dependence” (significant for appa/appb, psen1, apoea/apoeb, cd2ap).

      – Four targets are shared by 4/7 knockouts: “strychnine-binding glycine receptor” (psen1, apoea/apoeb, clu, sorl1); “neuronal acetylcholine receptor beta-2” (psen1, apoea/apoeb, cd2ap, clu); thyroid peroxidase (psen1, apoea/apoeb, cd2ap, clu); carbonic anhydrase IV (appa/appb, psen1, psen2, cd2ap).

      – Three KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1); tyrosine metabolism (psen2, apoea/apoeb, cd2ap, clu, sorl1); and “nitrogen metabolism” (appa/appb, psen1, psen2, apoea/apoeb, cd2ap).
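      The tally in the list above can be reproduced with a short script. The annotation names and knockout lists are copied from this response; the data structure and function names are illustrative only and are not part of ZOLTAR.

```python
from collections import Counter

# Annotation -> knockouts for which it is significant (as listed in this response).
SHARED_ANNOTATIONS = {
    ("indication", "opioid dependence"): ["appa/appb", "psen1", "apoea/apoeb", "cd2ap"],
    ("target", "strychnine-binding glycine receptor"): ["psen1", "apoea/apoeb", "clu", "sorl1"],
    ("target", "neuronal acetylcholine receptor beta-2"): ["psen1", "apoea/apoeb", "cd2ap", "clu"],
    ("target", "thyroid peroxidase"): ["psen1", "apoea/apoeb", "cd2ap", "clu"],
    ("target", "carbonic anhydrase IV"): ["appa/appb", "psen1", "psen2", "cd2ap"],
    ("KEGG pathway", "cholinergic synapse"): ["psen1", "apoea/apoeb", "cd2ap", "clu", "sorl1"],
    ("KEGG pathway", "tyrosine metabolism"): ["psen2", "apoea/apoeb", "cd2ap", "clu", "sorl1"],
    ("KEGG pathway", "nitrogen metabolism"): ["appa/appb", "psen1", "psen2", "apoea/apoeb", "cd2ap"],
}


def rank_by_sharing(annotations):
    """Return (kind, name, n_knockouts) rows, most widely shared first."""
    rows = [(kind, name, len(set(kos))) for (kind, name), kos in annotations.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


def knockout_tally(annotations):
    """Count how many of the shared annotations each knockout contributes to."""
    return Counter(ko for kos in annotations.values() for ko in set(kos))


if __name__ == "__main__":
    for kind, name, n in rank_by_sharing(SHARED_ANNOTATIONS):
        print(f"{n}/7  {kind}: {name}")
    print(knockout_tally(SHARED_ANNOTATIONS).most_common())
```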

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. Indication “depression” is only significant for sorl1 knockouts; target “serotonin transporter” is also significant for appa/appb and psen2 knockouts; and KEGG pathway “serotonergic synapse” is also significant for psen2 knockouts. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on AD. For example, the first four drugs ever approved by the FDA to treat AD were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in AD.

      Author response image 1.

      Common predictions from ZOLTAR for the seven Alzheimer’s risk genes tested. Predictions from ZOLTAR which are shared by multiple knockout behavioural fingerprints presented in the study. Only indications, targets, and KEGG pathways which are significant for at least three of the seven knockouts tested are shown, ranked from the annotations which are significant for the largest number of knockouts.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not measure other stimuli than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seems particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR may seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      While working on the Author Response, we made some changes to the analysis run by ZOLTAR to calculate enrichments (see Methods and github.com/francoiskroll/ZOLTAR, notes on v2). With the new version, 5-HT receptor type 2 is not a significantly enriched target for the sorl1 knockout fingerprint but type 4 is. 5-HT receptor type 4 was also shown to interact with sorting nexin 27, a subunit of retromer, so is a promising candidate (Joubert et al., 2004). Antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested. In our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Note, all the results presented in the “Version of Record” are from ZOLTAR v2.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that night-time sleep is disrupted in mutants for multiple late onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, every compound which normalises the behavioural fingerprint will not normalise the underlying mechanism, but the opposite seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The logic is to discover compounds that normalise the behavioural phenotype first, only subsequently test whether they also normalise the molecular mechanism, akin to testing first whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, assuming the secondary effects are tolerable, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that testing first ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature (Ashlin et al., 2018; Hoffman et al., 2016). The example from Hoffman et al. (2016) is especially convincing. Drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint were enriched with NMDA and GABA receptor antagonists. In experiments analogous to our citalopram and fluvoxamine treatments (Fig. 5c,d and Fig. 5–supplement 1c,d), cntnap2a/cntnap2b knockout larvae were overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae were found to have fewer GABAergic neurons in the forebrain. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      On your last point, we hope our experiment testing fluvoxamine, another selective serotonin reuptake inhibitor (SSRI), makes the connection between Sorl1 and serotonin signalling more convincing.

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 KO larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose, regardless of being driven by a subset of the animals. The result indeed does not survive removing the top 5 (p = 0.87) or top 3 (p = 0.18) sorl1 KO + 10 µM larvae, but this amounts to excluding 20 (3/14) or 35 (5/14) % of the datapoints as potential outliers, which is unreasonable. In fact, excluding the top 5 sorl1 KO + 10 µM is equivalent to calling any datapoint with z-score > 0.2 an outlier (z-scores of the top 5 datapoints are 0.2–1.8). Applying consistently the same criterion to the scrambled + 10 µM group would remove the top 6 datapoints (z-scores = 0.5–3.9). Comparing the resulting two distributions again gives the sorl1 KO + 10 µM distribution as significantly higher (p = 0.0015). We would also mention that Euclidean distance, as a summary metric for distance between behavioural fingerprints, has limitations. For example, the measure will be more sensitive to changes in some parameters but not others, depending on how much room there is for a given parameter to change. We included this metric to lend support to the observation one can draw from the fingerprint plot (Fig. 5c) that sorl1 mutants respond in an exaggerated way to citalopram across many parameters, while being agnostic to which parameter might matter most.
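
      The outlier argument above can be made concrete with a short sketch. This is an illustration only, not the analysis code used for the response: it assumes that z-scores are computed within each group from that group's own mean and standard deviation, the function names are hypothetical, and the numbers are invented rather than taken from Fig. 5.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z-score of each value relative to its own group's mean and sample SD (assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def drop_above(values, z_cutoff):
    """Keep only datapoints whose within-group z-score is at or below the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if z <= z_cutoff]

# Invented Euclidean distances for two groups (NOT the study's data).
knockout = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.1, 1.2, 2.6, 2.9, 3.1, 3.5, 3.9]
scrambled = [0.9, 1.0, 1.1, 0.8, 1.2, 1.0, 0.9, 1.1, 1.3, 1.4, 1.5, 1.6, 2.0, 3.0]

# Applying one cutoff to both groups trims the control group as well,
# which is the consistency point made in the paragraph above.
cutoff = 0.2
kept_ko = drop_above(knockout, cutoff)
kept_scr = drop_above(scrambled, cutoff)
print(len(knockout) - len(kept_ko), "knockout datapoints removed")
print(len(scrambled) - len(kept_scr), "scrambled datapoints removed")
```

      The point of the sketch is only that an exclusion rule expressed as a z-score cutoff has to be applied to every group being compared, not to the knockout group alone.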

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relied on this result being robust. As you and Reviewer #3 suggested, we repeated this experiment with a different SSRI, fluvoxamine (Fig. 5–supplement 1). We cannot readily explain why the result was opposite to what we found with citalopram, but in both cases sorl1 knockout larvae reacted differently than their control siblings, which adds an argument to our claim that ZOLTAR correctly predicted serotonin signalling as a disrupted pathway from the behavioural fingerprint. Accordingly, we mostly kept the Discussion on Sorl1 the same, although we concede that we may not have identified the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we adapted the wording to make it clear this is one possible alternative hypothesis which could be tested in the future. The small differences found by HCR are actually more in line with the new results from the fluvoxamine experiment, so it may also be that both hypotheses (pre-synaptic neurons releasing less serotonin when reuptake is blocked; or post-synaptic neurons being less sensitive) contribute. The fluvoxamine experiment was performed in a different lab (ICM, Paris; all other experiments were done in UCL, London) in a different wild-type strain (TL in ICM, AB x Tup LF in UCL), which complicates how one interprets this discrepancy.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-cross of heterozygous knockout animals as they increase the sensitivity of the assay without causing substantial variability. We plan to report in more detail on this analysis in a separate paper as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”; we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitely that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick in dismissing APP knockout as potentially relevant to understanding of AD.

      Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs in both AD and in our appa/appb knockout larvae.

      We added a sentence in Results (section psen2 knockouts […]) to briefly justify our appa/appb knockout approach. To be clear, we do not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed. A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell something intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–supplement 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we did not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We added a few sentences in the legend of Fig. 2–supplement 1.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the availability of importing a psen2 knockout line from abroad, but the process of shipping live animals is becoming more and more cost and time prohibitive. However, we observed the same pigmentation phenotype for psen2 knockouts as reported by Jiang et al., 2018, which is at least a partial confirmation of phenocopying a loss of function stable mutant.  

      b. psen2_KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–supplement 3a). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing it is a reliable marker of Psen2 function even when it is mutated. This mechanism is not a concern here as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exon 2 and 3, but we targeted exon 3, exon 4, and exon 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer #2, we are planning to assemble a paper that expressly compares behavioural phenotypes measured in F0 vs. stable mutants to allay some of these concerns. Our current manuscript, which combines CRISPR-Cas9 rapid F0 screening with in silico pharmacological predictions, inevitably represents a first step in characterizing the functions of these genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is substantial clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e. from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g. pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design causes false positive findings (Joo et al., 2021), as the clutch-to-clutch variability we and others observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but likely decreases the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four experiments (five to six clutches) as sorl1 knockout larvae were also tracked in the citalopram and fluvoxamine experiments (Fig. 5 and Fig. 5–supplement 1), and psen2 knockout larvae were also tracked in the small molecule rescue experiment (Fig. 6 and Fig. 6–supplement 1).

      The psen2 behavioural phenotype replicated well across the six clutches tested (pairwise cosine similarities: 0.62 ± 0.15; Author response image 2a). 5/6 clutches were less active and initiating more sleep bouts during the day, as we claimed in Fig. 3.

      In the citalopram experiment, the H<sub>2</sub>O-treated sorl1 knockout fingerprint replicated fairly well the baseline recordings in Fig. 4, despite the smaller sample size (cos = 0.30 and 0.78; Author response image 2b, see “KO Fig. 5”). 5/6 of the significant parameters presented in Fig. 4–supplement 4 moved in the same direction, and knockout larvae were also hypoactive during the day but hyperactive at night. Note that two clutches were tracked on the same 96-well plate in this experiment. We calculated each larva’s z-score using the average of its control siblings, then we averaged all the z-scores to generate the fingerprint. The H<sub>2</sub>O treated sorl1 knockout clutch from the fluvoxamine experiment did not replicate well the baseline recordings (cos = 0.08 and 0.11; Author response image 2b, see “KO Fig. 5–suppl. 1”). Knockout larvae were hypoactive during the day as expected, but behaviour at night was not as robustly affected. As mentioned above, knockouts were made in a different genetic background (TL, instead of AB x Tup LF used for all other experiments), which could explain the discrepancy.
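
      As a rough illustration of the two computations used in this answer (per-larva z-scores against control siblings, averaged into a fingerprint, and cosine similarity between fingerprints), a minimal sketch follows. It is not the FramebyFrame or ZOLTAR implementation: the function names and the list-of-lists data layout are hypothetical, and dividing by the standard deviation of the control siblings is an assumption, since the text only states that the average of the control siblings was used.

```python
from math import sqrt
from statistics import mean, stdev

def fingerprint(ko_larvae, control_larvae):
    """Mean z-score per behavioural parameter.

    Each argument is a list of larvae from the SAME clutch; each larva is a
    list of parameter values (hypothetical layout). Every knockout larva is
    z-scored against its control siblings, then z-scores are averaged.
    """
    n_params = len(control_larvae[0])
    result = []
    for p in range(n_params):
        ctrl = [larva[p] for larva in control_larvae]
        m, s = mean(ctrl), stdev(ctrl)  # SD of controls: an assumption
        result.append(mean([(larva[p] - m) / s for larva in ko_larvae]))
    return result

def cosine(a, b):
    """Cosine similarity between two fingerprints, in the range -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Invented fingerprints from two clutches (NOT the study's data).
clutch1 = [-0.8, 0.5, 0.3, -0.1]
clutch2 = [-0.6, 0.7, 0.1, 0.2]
print(round(cosine(clutch1, clutch2), 2))
```

      Because each clutch is normalised to its own control siblings before fingerprints are compared, clutch-to-clutch differences in baseline behaviour do not enter the cosine similarity directly.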

      We also took the opportunity to check whether our SSRI treatments replicated well the data from Rihel et al., 2010. For both citalopram (n = 3 fingerprints in the database) and fluvoxamine (n = 4 fingerprints in the database), replication was excellent (cos ≥ 0.67 for all comparisons of a fingerprint from this study vs. a fingerprint from Rihel et al. 2010; Author response image 2c,d). Note that the scrambled + 10 µM citalopram and + 10 µM fluvoxamine fingerprints correlate extremely well (cos = 0.92; can be seen in Author response image 2c,d), which was predicted by the small molecule screen dataset.

      Author response image 2.

      Replication of psen2 and sorl1 F0 knockout fingerprints and SSRI treatments from Rihel et al., 2010. a, (left) Every psen2 F0 knockout behavioural fingerprint generated in this study. Each dot represents the mean deviation from the same-clutch scrambled-injected mean for that parameter (z-score, mean ± SEM). From the experiments in Fig. 6, presented are the psen2 F0 knockout + H<sub>2</sub>O fingerprints. The fingerprints in grey (“not shown”) are from a preliminary drug treatment experiment we did not include in the final study. These fingerprints are from psen2 F0 knockout larvae treated with 0.2% DMSO, normalised to scrambled-injected siblings also treated with 0.2% DMSO. (right) Pairwise cosine similarities (−1.0–1.0) for the fingerprints presented. b, Every sorl1 F0 knockout behavioural fingerprint, as in a). c, The scrambled-injected + citalopram (10 µM) fingerprints (grey) in comparison to the citalopram (10–15 µM) fingerprints from the Rihel et al., 2010 database (green). d, The scrambled-injected + fluvoxamine (10 µM) fingerprint (grey) in comparison to the fluvoxamine fingerprints from the Rihel et al., 2010 database (pink). In c) and d), the scrambled-injected fingerprints are from the experiments in Fig. 5 and Fig. 5–suppl. 1, but were converted here into the behavioural parameters used by Rihel et al., 2010 for comparison. Parameters: 1, average activity (sec active/min); 2, average waking activity (sec active/min, excluding inactive minutes); 3, total sleep (hr); 4, number of sleep bouts; 5, sleep bout length (min); 6, sleep latency (min until first sleep bout).

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim (“early-life dysfunction is related to future AD”) from expression alone. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,” which does not mention future risk of AD. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seemed reasonable. Nevertheless, we adapted the wording to address this point and Reviewer #2’s complaint below. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day old zebrafish larvae and from previous work showing brain structural differences in children at high genetic risk of AD (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      Please also see our answer to a similar point raised by Reviewer #2 below (cf. Author response image 7).

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2 as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (the ZOLTAR predictions or the small molecule rescue do not change whether the parameter is called sleep or inactivity), but our choice of labelling can fairly be challenged.

      To address “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we looked at the larvae’s behaviour immediately after lights abruptly switched on in the mornings. Almost every larva, regardless of genotype, responded strongly to every lights-off transition during the experiment. We therefore chose the lights-on transition for this analysis because it is a weaker startling stimulus for the larvae than the lights-off transition (Fig. 3–supplement 3), potentially exposing differences between genotypes or behavioural states (quiescent or awake). We defined a larva as having reacted to the lights switching on if it made a swimming bout during the second (25 frames) after the lights-on transition. Across two clutches and two lights-on transitions, an average of 65% (range 52–73%) of all larvae reacted to the stimulus. psen2 knockout larvae were as likely as controls, if not more likely, to respond (on average 69% responded, range 60–76%; controls: 60% average, range 44–75%). When the lights switched on, about half of the larvae (39–51%) would have been classified as asleep according to the one-minute inactivity definition (i.e. the larva did not move in the minute preceding the lights transition). This allowed us to also compare behavioural states, as suggested by the reviewer. For three of the four light transitions, larvae which were awake when lights switched on were more likely to react than asleep larvae, but this difference was not striking (overall, awake larvae were only 1.1× more likely to react; Author response image 3). Awake psen2 knockout larvae were 1.1× (range 1.04–1.11×) more likely to react than awake control larvae, so, yes, psen2 knockout larvae respond normally when awake. Asleep psen2 knockout larvae were 1.4× (range 0.63–2.19×) more likely to react than asleep control larvae, so psen2 knockouts are also at least as likely as control larvae to react when asleep. In summary, the overall health of psen2 knockouts did not seem to be a significant confound in the experiment. As the reviewer suggested, if psen2 knockout larvae were seriously unhealthy, they would not be as responsive as control larvae to a startling stimulus.

      Author response image 3.

      psen2 F0 knockouts react normally to lights switching on, indicating they are largely healthy. At each lights-on transition (9 AM), each larva was categorised as awake if it had moved in the preceding one minute or asleep if it had been inactive for at least one minute. Darker tiles represent larvae which performed a swimming bout during the second following lights-on; lighter tiles represent larvae which did not move during that second. The total count of each waffle plot was normalised to 25 so plots can be compared to each other. The real count is indicated in the corner of each plot. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.
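
      The two definitions used in this analysis (asleep = no movement in the preceding minute; reacted = movement during the 25 frames after lights-on) can be sketched as follows. This is an illustrative Python sketch, not code from the FramebyFrame package; the function name, the per-frame activity list, and the "any non-zero frame counts as movement" rule are assumptions made for the example.

```python
FPS = 25  # frames per second, as stated in the response


def classify_larva(activity, lights_on_frame, fps=FPS):
    """Classify one larva at a lights-on transition.

    activity: per-frame activity values (hypothetical format; 0 = no movement).
    Returns (state, reacted), where state is "awake" or "asleep".
    """
    if lights_on_frame < 60 * fps:
        raise ValueError("need at least one minute of data before lights-on")
    # Minute preceding the transition: any movement means the larva was awake.
    before = activity[lights_on_frame - 60 * fps:lights_on_frame]
    # Second following the transition: any movement counts as a reaction.
    after = activity[lights_on_frame:lights_on_frame + fps]
    state = "awake" if any(v > 0 for v in before) else "asleep"
    reacted = any(v > 0 for v in after)
    return state, reacted


# Toy trace: inactive for one minute, then a movement 5 frames after lights-on.
toy_trace = [0] * (60 * FPS) + [0] * 5 + [3] + [0] * 19
toy_result = classify_larva(toy_trace, lights_on_frame=60 * FPS)
```

      Counting the (state, reacted) pairs per genotype would then give the proportions summarised in the waffle plots.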

      Next, we compared inactive period durations during the day between psen2 and control larvae. If psen2 knockout larvae indeed sleep more during the day compared to controls, we may predict inactive periods longer than one minute to increase disproportionately compared to the increase in shorter inactive periods. This broadly appeared to be the case, especially for one of the two clutches (Author response image 4). In clutch 1, inactive periods lasting 1–60 sec were equally frequent in both psen2 and control larvae (fold change 1.0× during both days), while inactive periods lasting 1–2 min were 1.5× (day 1) and 2.5× (day 2) more frequent in psen2 larvae compared to control larvae. In clutch 2, 1–60 sec inactive periods were also equally frequent in both psen2 and control larvae, while inactive periods lasting 1–2 min were 3.4× (day 1) and 1.5× (day 2) more frequent in psen2 larvae compared to control larvae. Therefore, psen2 knockouts disproportionately increased the frequency of inactive periods longer than one minute, suggesting they genuinely slept more during the day.

      Author response image 4.

      psen2 F0 knockouts preferentially increased the frequency of longer inactive bouts. For each day and clutch, we calculated the mean distribution of inactive bout lengths across larvae of the same genotype (psen2 F0 knockout or scrambled-injected), then compared the frequency of inactive bouts of different lengths between the two genotypes. For example, in clutch 1 during day 2, 0.01% of the average scrambled-injected larva’s inactive bouts lasted 111–120 seconds (X axis 120 sec) while 0.05% of the average psen2 F0 knockout larva’s inactive bouts lasted this long, so the fold change was 5×. Inactive bouts lasting < 1 sec were excluded from the analysis. In clutch 2, day 1 plot, two datapoints fall outside the Y axis limit: 140 sec, Y = 32×; 170 sec, Y = 16×. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.
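
      The fold-change calculation described in this legend can be sketched as below. This is a minimal Python illustration under stated assumptions: the 10-second bin width is inferred from the "111–120 seconds" example, bins are labelled by their upper edge, and all function names are hypothetical rather than taken from the authors' code.

```python
import math
from collections import Counter

BIN_SEC = 10  # assumed bin width, inferred from the "111-120 seconds" example


def bout_distribution(bout_lengths_sec, bin_sec=BIN_SEC):
    """Fraction of one larva's inactive bouts per bin, keyed by the bin's upper edge."""
    kept = [b for b in bout_lengths_sec if b >= 1]  # bouts < 1 sec are excluded
    counts = Counter(math.ceil(b / bin_sec) * bin_sec for b in kept)
    return {edge: n / len(kept) for edge, n in counts.items()}


def mean_distribution(per_larva_bouts):
    """Average the per-larva distributions across larvae of one genotype."""
    dists = [bout_distribution(bouts) for bouts in per_larva_bouts]
    edges = set().union(*dists)
    return {e: sum(d.get(e, 0.0) for d in dists) / len(dists) for e in edges}


def fold_change(knockout_freq, control_freq):
    """Ratio of knockout to control frequency for one bin."""
    return knockout_freq / control_freq


# Worked example from the legend: 0.05% vs. 0.01% of bouts in the 120-sec bin.
example_fold_change = fold_change(0.05, 0.01)
```

      In the legend's worked example this ratio is the reported 5×.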

      Ultimately, this criticism seems challenging to definitively address experimentally. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus that is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Nevertheless, we believe the two analyses presented here are consistent with psen2 knockout larvae genuinely sleeping more during the day, so we decided to keep this label. We agree with the reviewer that the one-minute inactivity definition has limitations, especially for day-time inactivity.

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just one drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think not, assuming we understand the question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug(s) than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–supplement 4). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts and that serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, then treatment should cause the same increase in sleep bout length in both knockouts and controls as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.

      Furthermore, we agree with the reviewer and Reviewer #2 that the conclusions were overly confident. As suggested, we decided to repeat this experiment with another SSRI, fluvoxamine. Please find the results of this experiment in Fig. 5–supplement 1. The suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well; however, we do not plan to pursue them at this stage.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments:

      - Data are presented in a variety of different ways, occasionally making comparisons across figures difficult. Perhaps at a minimum, behavioral fingerprints as in Figure 3 - Supplementary Figure 1 should be presented for all mutants in the main figures.

      We like this suggestion! Thank you. We brought the behavioural fingerprints figure (previously Fig. 4–supplement 5) as main Fig. 4, and put the figure focused on the sorl1 knockout behavioural phenotype in supplementary, with the other gene-by-gene figures.

      - It is not clear why some data were selected for supplemental rather than main figures. In many cases, detailed phenotypic data is provided for one example mutant in the main figures, and then additional mutants are described in detail in the supplement. Again, to facilitate comparisons between mutants, fingerprints could be provided for all mutants in a main figure, with detailed analyses moved to the supplements.

      The logic was to dedicate one main figure to psen2 (Fig. 3) as an example of an early-onset Alzheimer’s risk gene, and one to sorl1 (previously Fig. 4) as an example of a late-onset Alzheimer’s risk gene. We focused on them in main figures as they are both tested again later (Fig. 5 and Fig. 6). Having said that, we agree that the fingerprints may be a better use of main figure space than the parameters plots. In addition to the above (fingerprints of late-onset Alzheimer’s risk genes in main figure), we rearranged the figures in the early-onset AD section to have the psen2 F0 knockout fingerprint in main.

      - The explication of the utility of behavioral fingerprinting on page 35 is somewhat confusing. The authors describe drugs used to treat depression as enriched among small molecules anti-correlating with the sorl1 fingerprint. However, in Figure 5 - Supplementary Figure 1, drugs used to treat depression are biased toward positive cosines, which are indicated as having a more similar fingerprint to sorl1. These drugs should be described as more present among compounds positively correlating with the sorl1 fingerprint.

      Sorry, the confusion is about “(anti-)correlating”. Precisely, we meant “correlating and/or anti-correlating”, not just anti-correlating. We changed to that wording. In short, the analysis is by design agnostic to whether compounds with a given annotation are found more on the positive cosines side (left side in Fig. 5–supplement 1a) or the negative cosines side (right side). This is because the dataset often includes both agonists and antagonists to a given pathway but these are difficult to annotate. For example, say 10 compounds in the dataset target the dopamine D4 receptor, but these are an unknown mix of agonists and antagonists. In this case, we want ZOLTAR to generate a low p-value when all 10 compounds are found at extreme ends of the list, regardless of which end(s) that is (e.g. top 8 and bottom 2 should give an extremely low p-value). Initially, we were splitting the list, for each annotation, into positive-cosine fingerprints and negative-cosine fingerprints and testing enrichment on both separately, but we think the current approach is better as it better reflects the cases we want to detect and considers all available examples for a given annotation in one test. In sum, yes, in this case drugs used to treat depression were mostly on the positive-cosine side, but the other drugs on the negative-cosine side also contributed to the p-value, so it reflects the analysis better to say “correlating and/or anti-correlating”. You can read more about our logic for the analysis in Methods (section Behavioural pharmacology from sorl1 F0 knockout’s fingerprint).

      - The authors conclude the above-described section by stating: "sorl1 knockout larvae behaved similarly to larvae treated with small molecules targeting serotonin signaling, suggesting that the loss of Sorl1 disrupted serotonin signaling." Directionality here may be important. Are all of the drugs targeting the serotonin transporter SSRIs or similar? If so, then a correct statement would be that loss of Sorl1 causes similar phenotypes to drugs enhancing serotonin signaling. Finally, based on the correlation between serotonin transporter inhibitor trazodone and the sorl1 crispant phenotype, it is potentially surprising that the SSRI citalopram caused the opposite phenotype from sorl1, that is, increased sleep during the day and night. It is potentially interesting that this result was enhanced in mutants, and suggests dysfunction of serotonin signaling, but the statement that "our behavioral pharmacology approach correctly predicted from behaviour alone that serotonin signaling was disrupted" is too strong a conclusion.

      We understand “disrupt” as potentially going either way, but this may not be the common usage. We changed to “altered”.

      The point regarding directionality is excellent, however. We tested the proportion of serotonin transporter agonists and antagonists (SSRIs) on each side of the ranked list of small molecule fingerprints. We used the STITCH database for this analysis as it has more drug–target interactions than the Therapeutic Target Database, although it is likely less curated (Szklarczyk et al., 2016). As with the Therapeutic Target Database, most fingerprints of compounds interacting with the serotonin transporter SLC6A4 were found on the side of positive cosines (p ~ 0.005 using the custom permutation test), which replicates Fig. 5a with a different source for the drug–target annotations (Author response image 5). On the side of positive cosines (small molecules which generate behavioural fingerprints correlating with the sorl1 fingerprint), there were 2 agonists and 26 antagonists. On the side of negative cosines (small molecules which generate behavioural fingerprints anti-correlating with the sorl1 fingerprint), there were 3 agonists and 2 antagonists. Using a Chi-squared test, this suggests a significant (p = 0.002) over-representation of antagonists (SSRIs) on the positive side (expected count = 24, vs. 26 observed) and agonists on the negative side (expected count = 1, vs. 3 observed). If SLC6A4 antagonists, i.e. SSRIs, indeed tend to cause a behavioural phenotype similar to that of sorl1 knockout, this would point in the direction of our original interpretation of the citalopram experiment, which was that excessive serotonin signalling is what causes the sorl1 behavioural phenotype.
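
      The expected counts and p-value quoted above can be checked with a short calculation. The sketch below is a plain-Python Pearson Chi-squared test on the 2×2 table of counts given in the text; it assumes no continuity correction was applied (the response does not say), and the function name is hypothetical.

```python
import math


def chi_squared_2x2(table):
    """Pearson Chi-squared test (no continuity correction) on a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total x column total / grand total.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Survival function of the Chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Rows: positive-cosine side, negative-cosine side.
# Columns: agonists, antagonists (counts from the text).
observed = [[2, 26], [3, 2]]
expected, chi2, p_value = chi_squared_2x2(observed)
```

      Under that assumption, the expected number of antagonists on the positive side is 28 × 28 / 33 ≈ 23.8 and the expected number of agonists on the negative side is 5 × 5 / 33 ≈ 0.8, in line with the rounded values of 24 and 1 given above.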

      Author response image 5.

      Using the STITCH database as source of annotations also predicts SLC6A4 as an enriched target for the sorl1 behavioural fingerprint. Same figures as Fig. 5a,b but using the STITCH database (Szklarczyk et al., 2016) as source for the drug targets. a, Compounds annotated by STITCH as interacting with the serotonin transporter SLC6A4 tend to generate behavioural phenotypes similar to the sorl1 F0 knockout fingerprint. 40,522 compound–target protein pairs (vertical bars; 1,592 unique compounds) are ranked from the fingerprint with the most positive cosine to the fingerprint with the most negative cosine in comparison with the mean sorl1 F0 knockout fingerprint. Fingerprints of drugs that interact with SLC6A4 are coloured in yellow. Simulated p-value = 0.005 for enrichment of drugs interacting with SLC6A4 at the top (positive cosine) and/or bottom (negative cosine) of the ranked list by a custom permutation test. b, Result of the permutation test for top and/or bottom enrichment of drugs interacting with SLC6A4 in the ranked list. The absolute cosines of the fingerprints of drugs interacting with SLC6A4 (n = 52, one fingerprint per compound) were summed, giving sum of cosines = 15.9. To simulate a null distribution, 52 fingerprints were randomly drawn 100,000 times, generating a distribution of 100,000 random sums of cosines. Here, only 499 random draws gave a larger sum of cosines, so the simulated p-value was p = 499/100,000 = 0.005 **.
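
      The permutation test described in panel b can be sketched as follows. This is a hedged Python illustration of the general technique (sum of absolute cosines compared against random draws of the same size), not the ZOLTAR implementation; the function name, the fixed seed, and the choice to draw without replacement within each draw are assumptions.

```python
import random


def permutation_p_value(cosines, is_annotated, n_draws=100_000, seed=0):
    """Two-sided enrichment test on a list of cosine similarities.

    cosines: one cosine per compound fingerprint.
    is_annotated: parallel list of booleans marking compounds with the annotation.
    Returns (observed sum of absolute cosines, simulated p-value).
    """
    rng = random.Random(seed)
    abs_cos = [abs(c) for c in cosines]
    k = sum(is_annotated)
    # Taking absolute values makes the statistic agnostic to which end of the
    # ranked list (positive or negative cosines) the compounds sit at.
    observed = sum(a for a, flag in zip(abs_cos, is_annotated) if flag)
    # Null distribution: draw k fingerprints at random, n_draws times.
    n_larger = sum(sum(rng.sample(abs_cos, k)) > observed for _ in range(n_draws))
    return observed, n_larger / n_draws


# Toy data (hypothetical): two annotated compounds sit at opposite extremes.
toy_cosines = [0.9, -0.8, 0.1, -0.1, 0.05, 0.0, 0.02, -0.03]
toy_flags = [True, True, False, False, False, False, False, False]
toy_observed, toy_p = permutation_p_value(toy_cosines, toy_flags, n_draws=1000)
```

      With the numbers from the legend, 499 of 100,000 random draws exceeding the observed sum of 15.9 gives 499/100,000 ≈ 0.005.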

      If this were true, we would expect, as the reviewer suggested, SSRI treatment (citalopram or fluvoxamine) on control larvae to give a behavioural phenotype similar to that of sorl1 knockout. However, this generally did not appear to be the case (sorl1 knockout fingerprint vs. SSRI-treated control fingerprint, cosine = 0.08 ± 0.35; Author response image 6).

      Author response image 6.

      sorl1 F0 knockouts in comparison to controls treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the citalopram experiment) in comparison with the scrambled-injected + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H₂O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the fluvoxamine experiment) in comparison with the scrambled-injected + fluvoxamine (10 µM) fingerprint.

      The comparison with trazodone is an interesting observation, but it is only a weak serotonin reuptake inhibitor (Ki for SLC6A4 = 690 nM, vs. 8.9 nM for citalopram; Owens et al., 1997) and it has many other targets, as agonist or antagonist, including serotonin, adrenergic, and histamine receptors (Mijur, 2011). In any case, the average trazodone fingerprint does not correlate particularly well to the sorl1 knockout fingerprint (cos = 0.3). Finally, the sorl1 knockout behavioural phenotype could be primarily caused by altered serotonin signalling in the hypothalamus, where we found both the biggest difference in tph1a/1b/2 HCR signal intensity (Fig. 5f) and the highest expression of sorl1 across scRNA-seq clusters (Fig. 1–supplement 2). In this case, it would be correct to expect sorl1 knockouts to react differently to SSRIs than controls, but it would be incorrect to expect SSRI treatment to cause the same behavioural phenotype, as it concurrently affects every other serotonergic neuron in the brain.

      Finally, we agree the quoted conclusion was too strong given the current evidence. We since tested another SSRI, fluvoxamine, on sorl1 knockouts.

      - Also in reference to Figure 5: in panel c, data are presented as deviation from vehicle treated. Because of this data presentation choice, it's no longer possible to determine whether, in this experiment, sorl1 crispants sleep less at night relative to their siblings. Does citalopram rescue / reverse sleep deficits in sorl1 mutants?

      On your first point, please see our response to Reviewer #3 (2)c and Author Response 2b above.

      On “does citalopram rescue/reverse sleep deficits in sorl1 mutants”: citalopram (and fluvoxamine) tends to reverse the key aspects of the sorl1 knockout behavioural phenotype by reducing night-time activity (% time active and total Δ pixels), increasing night-time sleep, and shortening sleep latency (Author response image 7). Extrapolating from the hypothesis presented in Discussion, this may be interpreted as a hint that sorl1 knockouts have reduced levels of 5-HT receptors, as increasing serotonin signalling using an SSRI tends to rescue the phenotype. However, we do not think that focusing on the significant behavioural parameters necessarily makes sense here. Rather, one should take all parameters into account to conclude whether knockouts react differently to the drug than wild types (also see answer to Reviewer #3, (7) on this). For example, citalopram increased the night-time sleep bout length of sorl1 knockouts more than that of controls (Fig. 5), but this parameter was not modified by knockout of sorl1 (Fig. 4). To explain the rationale more informally, citalopram is only used as a tool here to probe serotonin signalling in sorl1 knockouts. Whether it worsens or rescues the behavioural phenotype is somewhat secondary; the key question is whether knockouts react differently than controls.

      Author response image 7.

      Comparing untreated sorl1 F0 knockouts vs. treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the citalopram experiment) in comparison with the sorl1 knockout + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H₂O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the fluvoxamine experiment) in comparison with the sorl1 + fluvoxamine (10 µM) fingerprint.

      - Possible molecular pathways targeted by tinidazole, fenoprofen, and betamethasone are not described.

      Tinidazole is an antibiotic, fenoprofen is a non-steroidal anti-inflammatory drug (NSAID), and betamethasone is a steroidal anti-inflammatory drug. Interestingly, long-term use of NSAIDs reduces the risk of AD (in ’t Veld Bas A. et al., 2001). Several mechanisms are possible (Weggen et al., 2007), including reduction of Aβ42 production by interacting with γ-secretase (Eriksen et al., 2003). However, we did not explore the mechanism of action of these drugs on psen2 knockouts so do not feel comfortable speculating. We do not know, for example, whether these findings apply to betamethasone.

      Minor Comments:

      - On page 25, panel "g" should be labeled as "f".

      Thank you!

      - On page 35, a reference should be provided for the statement "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes.".

      Thank you, this is now corrected. These were the same studies as mentioned in the Introduction.

      - On page 43, the word "and" should be added - "in wild-type rats and mice, overexpressing mutated human APP and PSEN1, AND restricting sleep for 21 days...".

      Right, this sentence could be misread, we edited it. “overexpressing […]” only applied to the mice, not the rats (as they are wild-type); and both are sleep-deprived.

      - On page 45, a reference should be provided for the statement "SSRIs can generally be used continuously with no adverse effects" and this statement should potentially be softened.

      The reference is at the end of that sentence (Cirrito et al., 2011). You are correct though; we reformulated this statement to: “SSRIs can generally be used safely for many years”. SSRIs indeed have side effects.

      - On page 54, a 60-minute rolling average is described as 45k rows, but this seems to be a 30-minute rolling average.

      Thank you! We corrected. It should have been 90k rows, as in: 25 frames-per-second × 60 seconds × 60 minutes.
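
      The arithmetic behind this correction can be written out explicitly. The short Python sketch below only restates the frame rate given in the response (25 frames per second); the function name is hypothetical.

```python
FPS = 25  # frames per second, as stated in the response


def rolling_window_rows(minutes, fps=FPS):
    """Number of rows (frames) spanned by a rolling window of the given length."""
    return fps * 60 * minutes


rows_60_min = rolling_window_rows(60)  # 25 x 60 x 60
rows_30_min = rolling_window_rows(30)  # 25 x 60 x 30
```

      This matches the reviewer's observation: 45k rows correspond to a 30-minute window, whereas a 60-minute window spans 90k rows.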

      Reviewer #2 (Recommendations For The Authors):

      "As we observed in the scRNA-seq data, most genes tested (appa, appb, psen1, psen2, apoea, cd2ap, sorl1) were broadly expressed throughout the 6-dpf brain (Fig. 1d and Fig. 1supplement 3 and 4)."

      - apoea and appb are actually not expressed highly in the scRNA-seq data, and the apoea in situ looks odd, as if it has no expression. The appb gene mysteriously does not look as though it has high expression in the Raj data, but it is clearly expressed based on the in situ. I had previously noticed the same discrepancy, and I attribute it to the transcriptome used to map the Raj data, as the new DanioCell data uses a new transcriptome and indicates high appb expression in the brain. Please point out the discrepancy and possible explanation, perhaps in the figure legend.

      All excellent points, thank you. We included them directly in Results text.

      "most of these were expressed in the brain of 5-6-dpf zebrafish larvae, suggesting they play a role in early brain development or function."

      - Evidence of expression does not suggest function, particularly not a function in brain development. As one example, almost half of the genome is expressed prior to the maternal-zygotic transition but does not have a function in those earliest stages of development. There are numerous other instances where expression does not equal function. Please change the sentence even as simply as "it is possible that they".

      We mostly agree and edited to “[…], so they could play a role […]”.

      Out of curiosity, we plotted, for each zebrafish developmental stage, the proportion of Alzheimer’s risk gene orthologues expressed in comparison to the proportion of all genes expressed (Author response image 8). We defined “all genes” as every gene that is expressed in at least one of the developmental stages (n = 24,856), not the complete transcriptome, to avoid including genes that are never expressed in the brain or whose expression is always below detection limit. We counted a gene as “expressed” if at least three cells had detectable transcripts. Using these definitions, 82 ± 7% of genes are expressed during development. For every developmental stage except 5 dpf (so 11/12), a larger proportion of Alzheimer’s risk genes than all genes are expressed (+5 ± 4%).

      Author response image 8.

      Proportion of Alzheimer’s risk genes orthologues expressed throughout zebrafish development. Proportion of Alzheimer’s risk genes orthologues (n = 42) and all genes (n = 24,856) expressed in the zebrafish brain at each developmental stage, from 12 hours post-fertilisation (hpf) to 15 days post-fertilisation (dpf). “All genes” corresponds to every gene expressed in the brain at any of the developmental stages, not the complete transcriptome. A gene is considered “expressed” (green) if at least three cells had detectable transcripts. Single-cell RNA-seq dataset from Raj et al., 2020.
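
      The “expressed” criterion used here (at least three cells with detectable transcripts) can be sketched as a simple thresholding step. The Python example below is illustrative only: the input format (a mapping from gene to number of cells with detectable transcripts at one developmental stage), the function name, and the toy counts are assumptions, not the authors' analysis code or data.

```python
MIN_CELLS = 3  # a gene is "expressed" if at least three cells have detectable transcripts


def proportion_expressed(cells_detected, genes, min_cells=MIN_CELLS):
    """Proportion of `genes` counted as expressed at one developmental stage.

    cells_detected: dict mapping gene name to the number of cells with
    detectable transcripts at that stage (hypothetical input format).
    """
    n_expressed = sum(1 for g in genes if cells_detected.get(g, 0) >= min_cells)
    return n_expressed / len(genes)


# Toy counts (hypothetical values, for illustration only).
toy_stage = {"psen2": 12, "sorl1": 3, "appb": 2}
toy_genes = ["psen2", "sorl1", "appb", "apoea"]
toy_proportion = proportion_expressed(toy_stage, toy_genes)
```

      Applying the same function to the risk gene orthologues and to the “all genes” set at each stage would give the two proportions compared in the figure.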

      "This frame-by-frame analysis has several advantages over previous methods that analysed activity data at the one-minute resolution."

      - Which methods are these? There are no citations. There are certainly existing methods in the zebrafish field that can produce similar data to the method developed for this project. This new package is useful, as most existing software is not written in R, so it would help scientists who prefer this programming language. However, I would be careful not to oversell its novelty, since many methods do exist that produce similar results.

      We added the references. They were referenced above after “we combined previous sleep/wake analysis methods”, but should have been referenced again here.

      We are not convinced by this criticism. We would obviously not claim that the FramebyFrame package is as sophisticated and versatile as video-tracking tools like SLEAP or DeepLabCut, but we do think it answers a genuine need that was not addressed by other methods. Specifically, we know of many labs recording pixel count data across multiple days using the Zebrabox or DanioVision (we added support for DanioVision data after submission), but there were no packages to extract behavioural parameters from these data. Other methods involved standalone scripts with no documentation or version tracking. We would concede the FramebyFrame package is mostly targeted at these labs, but we already know of six labs routinely using it and were recently contacted by a researcher tracking Daphnia in the Zebrabox.

      "F0 knockouts of both cutches" - "clutches"

      Thank you!

      Reviewer #3 (Recommendations For The Authors):

      I would suggest totally revamping the Introduction section, and being sure to provide readers with the context and background they need for the data that comes thereafter. Key areas to touch on, in no particular order, include:

      • Far more detail on the behavioral pharm screen upon which this paper builds, as a brief overview of that approach and the data generated are needed.

      Thank you for the suggestion, we added a sentence hinting at this work in the last Introduction paragraph.

      • Limitations of current zebrafish sleep/arousal assays that motivated the authors to develop a new, temporally high-resolution system.

      We think this is better explained in Results, as it is currently. For example, we need to point to Fig. 2–supplement 2a,b,c to explain that one-minute methods were missing sleep bouts and how FramebyFrame resolves this issue.

      • A paragraph about sleep and AD, that does a better job of citing work in humans, mammalian, and invertebrate models that motivate the interest in the connection pursued here.

      Sorry, we think this would place too much focus on sleep and AD. We want the main topic of the paper to be the behavioural pharmacology approach, not AD or sleep per se. As the Introduction states, we see Alzheimer’s risk genes as a case study for the behavioural pharmacology approach, rather than the reason why the approach was developed. Additionally, presenting sleep and AD in Introduction risks sounding like ZOLTAR is specifically designed for this context, while we conceived of it as much more generalisable and explicitly encourage its use to study genes associated to other diseases. Note that the paragraph you suggest is, we think, mostly present in Discussion (section Disrupted sleep and serotonin signalling […]).

      • I modestly suggest eliminating making such a strong case for a gene-first approach being the best way to understand disease. It is not a zero-sum game, and there is plenty to learn from proteomics, metabolomics, etc. I suspect nobody will argue with the authors saying they leveraged the strength of their system and focused on key AD genes of interest.

      From your point below, we understand the following quote is the source of the issue: “For finding causal processes, studying the genome, rather than the transcriptome or epigenome, is advantageous because the chronology from genomic variant to disease is unambiguous […]”. We did not want to suggest it is a zero-sum game, but we now understand how it can be read this way. We slightly adapted the wording. What we want to do is highlight the causality argument as the advantage of the genomics approach. We feel we do not read this argument often enough, while it remains a ‘magic power’ of genomics. One essentially does not have to worry about causality when studying a pathogenic germline variant, while it is a constant concern when studying the transcriptome or epigenome (i.e. did the change in this transcript’s level cause disease, or vice-versa?). To take an example in the context of AD, arguments based on genomics (e.g. Down syndrome or APP duplication) are often the definitive arbiters when debating the amyloid hypothesis, exactly because their causality cannot be doubted.

      Minor comments

      (1) The opening of the introduction is perhaps overly broad, spending an entire paragraph on genome vs transcriptome, etc and making the claim that a gene-first approach is the best path. It isn't zero-sum, and the authors could just get right into AD and study genes of interest. Similar issues occur throughout the manuscript, with sentences/paragraphs that are not necessarily needed.

      Please see our answer to your previous point. On the introduction being overly broad, we perfectly agree it is broad, but related to your point about presenting sleep and AD in the Introduction, we wish to talk about finding causal processes from genomics findings using behavioural pharmacology. We purposefully present research on AD as one instance of this broader goal, not the primary topic of the paper.

      Another example are these sentences, which could be totally removed as the following paragraph starts off making the same point much more succinctly. "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes. Presumably, the same processes are disrupted in zebrafish sorl1 knockouts, and some caused the behavioural alterations we observed. Can we now follow the thread backwards and predict some of the biological processes in which Sorl1 is involved based on the behavioural profile of sorl1 knockouts?"

      Thanks for the suggestion, but we think these sentences are useful to place this Results section back in the context of the Introduction. Think of the paper as mainly about the behavioural pharmacology approach, not about Alzheimer’s risk genes. The function of the paragraph here is not simply to explain the method by which we decided to study sorl1; it is to reiterate the rationale behind the behavioural pharmacology approach so that the reader understands where this Results section fits in the overall structure.

      (2) Related to the above, the authors use lecanemab as an example to support their approach, but there has been a great deal of controversy regarding this drug. I don't think such extensive justification is needed. This study uses AD risk genes as a case study in a newly developed behavioral pharm pipeline. A great deal of the rest of the intro seems to just fill space and could be more focused on the study at hand. Interestingly, after gene selection, the next step in their pipeline is sleep/wake analysis yet nothing is covered about AD and sleep in the intro. Some justification of that approach (why focus on sleep/wake as a starting point for behavioral pharm rather than learning and memory?) would be a better use of intro space.

      There has indeed been controversy about lecanemab, but even the harshest critiques of the amyloid hypothesis concede that it slows down cognitive decline (Espay et al., 2023). That is all that is needed to support our argument, which is that research on AD started primarily from genomics and thereby yielded a disease-modifying drug. The controversy seems mostly focused on whether this effect size is clinically significant, and we think we correctly represent this uncertainty (e.g. “antibodies against Aβ such as lecanemab show promise in slowing down disease progression” and “the beneficial effects from targeting Aβ aggregation currently remain modest”).

      Your next point is entirely fair. We mostly answered it above. To explain further, the primary reason why we measured sleep/wake behaviour is to match the behavioural dataset from Rihel et al., 2010 so we can use it to make predictions, not to study sleep in the context of AD per se. Sure, perhaps learning and memory would have been interesting, but we do not know of any study testing thousands of small molecules on zebrafish larvae during a memory task. We understand it can be slightly confusing though, as we then spend a paragraph of Discussion on sleep as a causal process in AD, but we obviously need to discuss this topic given the findings. However, to reiterate, we purposefully designed FramebyFrame and ZOLTAR to be useful beyond studying sleep/wake behaviour. For example, FramebyFrame would not calculate 17 behavioural parameters if the only goal was to measure sleep. We now mention the Rihel et al., 2010 study in the Introduction as you suggested above (“Far more detail on the behavioral pharm screen […]”), as that is the real reason why sleep/wake behaviour was measured in the first place.

      (3) Also related to the above, another more relevant point that could be talked about in the intro is the need for more refined approaches to analyze sleep in zebrafish, given the effort that went into the new analysis system described here. Again, I think the context for why the authors developed this system would be more meaningful than the current content.

      Thank you, we think we answered this point above (especially below Limitations of current zebrafish sleep/arousal assays […]).

      (4) GWAS can stand for Genome-wide association studies (plural) so I do not think the extra "s" is needed (GWASs).

      Indeed, that seems to be the common usage. Thank you.

      (5) AD candidate risk genes were determined from loci using "mainly statistic colocalization". Can the authors add a few more details about what was done and what the "mainly" caveat refers to?

      “Mainly” simply refers to the fact that other methods were used by Schwartzentruber et al. (2021) to annotate the GWAS loci with likely causal genes, but that most calls were ultimately made from statistic colocalisation. Readers can refer to this work to learn more about the methods used.

      (6) The authors write "The loss of psen1 only had mild effects on behaviour" but I think they mean "sleep behaviors" as there could be many other behaviors that are disrupted but were not assessed. The same issue a few sentences later with "Behaviour during the day was not affected" and at the end of the following paragraph.

      Yes, that would be more precise, thank you.

      (7) For the Sorl1 pharmacology data, it is very hard to understand what is being measured behaviorally. Are the authors measuring sleep +/- citalopram, or something else, and why the change to Euclidean distance rather than all the measures we were just introduced to earlier in the manuscript?

      We understand these plots (Fig. 5c,d) are less intuitive, but it is important that we show the difference in behaviour compared to H<sub>2</sub>O-treated larvae of the same genotype. The claim is that citalopram has a larger effect on knockouts than on controls, so the reader needs to focus on the effect of the drug on each genotype, not on the effect of sorl1 knockout. We added the standard fingerprints (i.e. setting controls to z-score = 0) here in Author response figures.

      Euclidean distance takes as input all the measures we introduced. The point is precisely not to select a single measure. For example, say we were only plotting active bout number during the day, we would conclude that 10 µM citalopram has the same effect on knockouts and controls. Conversely, if we had taken sleep bout length at night, we would conclude 10 µM has a stronger effect on knockouts. What is the correct parameter to select? Using Euclidean distance resolves this by taking all parameters into account, rather than arbitrarily choosing one.
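
      To illustrate why a multi-parameter distance avoids the choice of a single measure, here is a minimal Python sketch. The parameter names and z-scores are hypothetical (they are not data from the study), and it assumes each fingerprint is expressed as z-scores relative to H<sub>2</sub>O-treated larvae of the same genotype, so the untreated baseline sits at zero.

```python
import math

# Hypothetical behavioural parameters, for illustration only.
PARAMETERS = ["active_bout_number_day", "sleep_bout_length_night", "sleep_total_night"]

# Hypothetical z-scores of drug-treated larvae relative to H2O-treated larvae
# of the same genotype (baseline = 0 for every parameter). Not study data.
fingerprints = {
    "control":  [1.0, 0.5, 0.5],
    "knockout": [1.0, 2.0, 1.5],
}

def euclidean_distance_from_baseline(z_scores):
    """Distance between a fingerprint and the untreated baseline (all zeros)."""
    return math.sqrt(sum(z * z for z in z_scores))

# One distance per genotype, using every parameter at once.
distances = {
    genotype: euclidean_distance_from_baseline(z_scores)
    for genotype, z_scores in fingerprints.items()
}

for genotype, distance in distances.items():
    print(f"{genotype}: {distance:.2f}")
```

      In this made-up example the first parameter is identical in both genotypes, so plotting it alone would suggest no difference, whereas the distance over all parameters is larger for knockouts.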

      And what exactly is a "given spike in serotonin"? and how is this hypothesis the conclusion based on the lack of evidence for the second hypothesis? As the authors say, there could be other ways sorl1 knockouts are more sensitive to citalopram, so the absence of evidence for one hypothesis certainly does not support the other hypothesis.

      We mean a given release of serotonin in the synaptic cleft. We have fixed this wording. 

      We tend to disagree on the second point. We can think of two ways that sorl1 knockouts are more sensitive to citalopram: 1) they produce more serotonin, so blocking reuptake causes a larger spike in knockouts; or 2) blocking reuptake causes the same increase in both knockouts and wild-types but knockouts react more strongly to serotonin. We cannot in fact think of another way to explain the citalopram results. Not finding overwhelming evidence for 1) surely supports 2) somewhat, even if we do not have direct evidence for it. As an analogy, if two diagnoses are possible for a patient, testing negative for the first one supports the other one, even before it is directly tested.

      (8) Again some language is used without enough care. Fish are referred to as "drowsier" under some drug conditions. How do the authors know the animal is drowsy? The phenotype is more specific - more sleep, less activity.

      Thank you, we switched to “Furthermore, fenoprofen worsened the day-time hypoactivity of psen2 knockout larvae […]”.

      (9) This sentence is misleading as it gives the impression that results in this manuscript suggest the conclusion: "Our observation that disruption of genes associated with AD diagnosis after 65 years reduces sleep in 7-day zebrafish larvae suggest that disrupted sleep may be a common mechanism through which these genes exert an effect on risk." That idea is widely held in the field, and numerous other previous manuscripts/reviews should be cited for clarity of where this hypothesis came from.

      This idea is not widely held in the field. You likely read this point as “disrupted sleep is a risk factor for AD”, which, yes, is widely discussed in the field, but is not precisely what we are saying. We hypothesise that mutations in some of the Alzheimer’s risk genes cause disrupted sleep, possibly from a very early age, which then causes AD decades later. Studies and reviews on sleep and AD rarely make this hypothesis, at least not explicitly. The closest we know of are a few recent human genetics studies, typically using Mendelian Randomisation, finding that higher genetic risk of AD correlates with some sleep phenotypes, such as sleep duration (Chen et al., 2022; Leng et al., 2021). The work of Muto et al. (2021) is particularly interesting as it found correlations between higher genetic risk of AD and some sleep phenotypes in men in their early twenties, which seems unlikely to be a consequence of early pathology. Note, however, that even these studies do not mention sleep possibly being disrupted early in development, which is what our findings in zebrafish larvae support. As we mention, we think a team should test whether sleep is different in infants at higher genetic risk of AD, essentially performing an experiment analogous to, but obviously much more difficult than, the one we did in zebrafish larvae. We do not know of any study testing this or even raising this idea, so evidently it is not widely held. Having said that, the studies we mention here were not referenced in the Discussion paragraph. We have now corrected this.

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease: A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Anti-inflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PLOS ONE 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Mittur A. 2011. Trazodone: properties and utility in multiple disorders. Expert Rev Clin Pharmacol 4:181–196. doi:10.1586/ecp.10.138

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Owens MJ, Morgan WN, Plott SJ, Nemeroff CB. 1997. Neurotransmitter receptor and transporter binding profile of antidepressants and their metabolites. J Pharmacol Exp Ther 283:1305–1322.

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a reexamination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. 2016. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res 44:D380–D384. doi:10.1093/nar/gkv1277

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study has uncovered some important initial findings about cellular responses to aneuploidy through analysis of gene expression in a set of donated human embryos. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The authors should try to get much more insight with their data highlighting the novel findings.

      We thank the editor for considering our manuscript for publication at eLife, and for the helpful and thorough reviews of our work. Based on the suggestions of the reviewers, we have carried out additional experiments, expanded the sample size and reanalyzed the data. This has resulted in a thoroughly revised manuscript and much improved work, which we are convinced meets the requirements to be published as a version of record. Of note, the experiments for the revision required the support of two additional researchers from our lab, who are now coauthors.

      These are the main changes made to the initial manuscript:

      (1) The RNA-seq data (Figures 1+2) have been reanalyzed and are now FDR-corrected. This has not affected the initial observations on the activation of p53 and apoptosis in aneuploid human embryos, nor the observation that the transcriptomic changes are driven by gene dosage effects.

      (2) We have included the transcriptome analysis of reversine-treated embryos in the supplementary data.

      (3) For validation of novel findings such as the presence of DNA damage and the expression of DRAM1 in aneuploid embryos, we now include the stainings of 30 human blastocysts (Figure 3o-t). We found no DNA damage in aneuploid embryos and that DRAM1 is increased in the TE but not the ICM of aneuploid embryos.

      (4) We re-analyzed the co-expression of CASP8/HSP70 in reversine-treated embryos as suggested by reviewer 1 and found that both proteins tend to be co-expressed.

      (5) We have added a new analysis of NANOG expression (Figure 4a,b) of the embryos used in Figure 3o-t and have found retention of NANOG protein in both the TE and ICM.

      (6) We have added 6 euploid and 4 aneuploid embryos to Figure 4l-s, which support the conclusions on the absence of autophagy activation in the ICM and failure of PrE formation in aneuploid embryos.

      (7) We have significantly changed the layout of the figures, revised the supplementary tables, added source data files and rewritten the discussion.

      Regarding the sample size of the study, it is important to emphasize that human embryos are ethically sensitive material and that those with the specific genetic content we used in this study are rare, limiting our ability to expand the sample size. For the revision, we have added 40 human blastocysts to our initial 85 embryos. Compared to similar and high-quality studies using human embryos, our study shows a relatively large sample size (n=125): Victor et al. 2021: 30 human blastocysts for immunostainings<sup>1</sup>; Martin et al. 2023: 14 human blastocysts<sup>2</sup>; Martin et al. 2024: 64 human blastocysts<sup>3</sup>; Domingo-Muelas et al. 2023: 23 human blastocysts<sup>4</sup>.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study investigated an important question in human reproduction: why most fully aneuploid embryos are incompatible with normal fetal development. Specifically, the authors investigated the cellular responses to aneuploidy through analysis of gene expression in a set of donated human blastocysts. The samples included uniform aneuploid embryos of meiotic origin and mosaic aneuploid embryos from the SAC inhibitor reversine treatment. The authors relied mainly on low-input RNA sequencing and immunofluorescence staining. Pathway analysis with RNA-seq data of trophectoderm cells suggested activation of p53 and possibly apoptosis, and this cellular signature appeared to be stronger in TE cells with a higher degree of aneuploidy. Immunostaining also found some evidence of apoptosis, increased expression of HSP70 and autophagy in some aneuploid cells. With combinational OCT4 and GATA4 as lineage markers, it appeared that aneuploidy could alter the second lineage segregation and primitive endoderm formation in particular.

      Although this study is largely descriptive, it generated valuable RNA-seq data from a set of aneuploid TE cells with known karyotypes. Immunostaining results in general were consistent with findings in mouse embryos and human gastruloids.

      We thank the reviewer for the thorough evaluation of our manuscript. We have implemented most of the suggestions, which have further strengthened the original findings.

      While there is a scarcity of human embryo materials for research, the lack of single cell level data limits further extension of the presented data on the consequences of mosaic embryos.  

      We did not include single cell RNA-seq data of mosaic human embryos in our study because we focused on embryos diagnosed with complex meiotic abnormalities. Our hypothesis was that the cellular consequences of aneuploidy would be strongest and easiest to identify in this type of aneuploidy, which would allow us to provide a basis for the mechanisms of elimination of aneuploid cells in human embryos. In the manuscript (lines 596-626) we acknowledge the limitations of extrapolating our results to mosaic embryos.

      A major concern is that the gene list used for pathway analysis is not FDR controlled. It is also unclear how the many plots generated with the "supervised approach" were actually performed. 

      We agree with the concern that our list of differentially expressed genes was ranked by p-value rather than FDR-controlled. We followed the suggestion of the reviewer, revised the RNA-seq analysis and focused primarily on pathway analysis. We have also added the comparison between aneuploid and reversine-treated embryos to the supplementary data and expanded the analysis of high dosage and low dosage embryos. Importantly, the new analysis has not changed the original finding that aneuploid embryos show hallmarks of p53 activation and apoptosis, and that these effects are gene dosage dependent. The manuscript now includes completely revised Figures 1 and 2.

      Since we discarded the data generated from our previous approach, we no longer use the term “supervised approach”.

      The authors also appear to have ignored the possibility that high-dosage group could have a higher mitotic defect.

      This is indeed a possibility. In the discussion (lines 504-508) we have now incorporated the notion that the high dosage embryos could have higher mitotic defects, although our data cannot provide any evidence for this. Of note, the gene expression data shows that all aneuploid embryos (including low dosage and reversine embryos) equally show an enrichment for mitotic spindle pathway genes.

      Assuming a fully aneuploid embryo, why do only some cells display p53 and autophagy marker? 

      This is a very good question, on which we can only speculate, but the answer likely lies in the diversity across cells of the same embryo.

      Even in genetically homogenous tissues and cell cultures, individual cells can exhibit different levels of stress responses, such as p53 activation and apoptosis. This variation may be influenced by the local cellular environment, stochastic gene expression, or differences in cell cycle stages. Other studies on fully aneuploid human embryos also could not detect apoptotic responses in every cell<sup>1,3</sup>.

      For instance, p53 activation differs even between cells that have a similar number of DNA breaks, and this activation is influenced by both cell-intrinsic factors and previous exposure to DNA damage<sup>5</sup>.

      The cell cycle tightly regulates the response of cells to different stressors. For instance, cells in G1 or S-phase might be more sensitive to apoptosis signals<sup>6</sup>, while those in G2/M might escape this response temporarily<sup>7</sup>. Autophagy is more induced in G1 and S phases, with reduced activity in G2 and M phases<sup>8</sup>.

      Individual cells may also have different levels of success in the activation of the compensatory pathways, including the unfolded protein response, autophagy, or changes in metabolism, resulting in some cells adapting better than others.

      The expression of p53 and the sensitivity to apoptosis could also be influenced by epigenetic differences between cells, which may alter their transcriptional response to aneuploidy. Even in a genetically identical population, cells can have different epigenetic landscapes, leading to heterogeneous gene expression patterns.

      The conclusion about proteotoxic stress was largely based on staining of HSP70. It appears from Figure 3 d,h that the same cells exhibited increased HSP70 and CASP8 staining. Since HSP70 is known to have anti-apoptotic effect, could the increased expression of Hsp70 be an anti-apoptotic response?

      Our conclusion about proteotoxic stress was not solely based on HSP70 expression. We also stained for LC3B and p62, which are markers for autophagy and when highly expressed indirectly point towards underlying proteotoxic stress in the cells. 

      We reanalyzed the imaging of the stainings in the reversine-treated embryos, and found that most positive cells were positive for both HSP70 and CASP8, while a minority were single positive (shown now in Figure 3k,l).

      Indeed, HSP70 not only unfolds misfolded and aggregated proteins but also has a function in cell survival and apoptosis<sup>9</sup>. For instance, HSP70 has been found to inhibit the cleavage of Bid by active CASP8 within the extrinsic apoptosis pathway<sup>10</sup>. It is thus possible that it temporarily plays this role, and we have acknowledged this in the discussion (lines 623-626). On the other hand, the evidence points to active apoptosis in the TE, with concomitant cell loss, so if HSP70 is indeed having an anti-apoptotic effect, it is having a limited impact.

      Reviewer #2 (Public Review): 

      A high fraction of cells in early embryos carry aneuploid karyotypes, yet even chromosomally mosaic human blastocysts can implant and lead to healthy newborns with diploid karyotypes. Previous studies in other models have shown that genotoxic and proteotoxic stresses arising from aneuploidy lead to the activation of the p53 pathway and autophagy, which helps eliminate cells with aberrant karyotypes. These observations have been here evaluated and confirmed in human blastocysts. The study also demonstrates that the second lineage and formation of primitive endoderm are particularly impaired by aneuploidy.

      This is a timely and potentially important study. Aneuploidy is common in early embryos and has a negative impact on their development, but the reasons behind this are poorly understood. Furthermore, how mosaic aneuploid embryos with a fraction of euploidy greater than 50 % can undergo healthy development remains a mystery. Most of our current information comes from studies on murine embryos, making a substantial study on human embryos of great importance. However, there are only very few new findings or insights provided by this study. Some of the previous findings were reproduced, but it is difficult to say whether this is a real finding, or whether it is a consequence of a low sample number. The authors could get much more insight with their data.

      We thank the reviewer for the thorough evaluation of our manuscript and the valuable suggestions made in the private recommendations. We have expanded the sample size and have carried out additional experiments that have significantly improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Instead of using a cut-off to generate a list, the authors could just rank the entire detected transcriptome for GSEA. This method fits better the authors' intentions of "primarily focused on pathway analysis." The cut-off value "-log10(p-value)<0.05" is not correct. As we can see from the PCA plot, one would not expect many cut-off defined DEGs at all. The most obvious transcriptome change is dosage dependent, as the authors clearly showed with InferCNV.

      We thank the reviewer for this suggestion and agree that this was an important concern of the study. We have entirely revised the RNA-seq analysis based on the proposed approach (Figure 1 and 2, Supplementary Figure 1). Also, we have included the analysis of aneuploid versus reversine treated embryos, which has allowed us to determine the differences between naturally occurring chromosomal abnormalities and those that are induced using reversine (Supplementary Figure 1). 

      We first performed differential gene expression analysis using DESeq2 with a cut-off value for significantly differentially expressed genes of |log2FC| > 1 and an FDR < 0.05. Based on the PCAs and the low number of differentially expressed genes for all comparisons, besides high dosage versus euploid embryos, we focussed primarily on pathway analysis.
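
      The two thresholds above can be read as a simple filter. The following Python sketch is illustrative only: the gene names and values are hypothetical, and in practice the log2 fold changes and adjusted p-values would come from the DESeq2 results table rather than being typed in by hand.

```python
# Hypothetical results rows: (gene, log2 fold change, FDR-adjusted p-value).
results = [
    ("GENE_A",  2.3, 0.001),  # large effect, significant
    ("GENE_B", -1.8, 0.030),  # large negative effect, significant
    ("GENE_C",  0.6, 0.002),  # significant, but effect size too small
    ("GENE_D",  3.1, 0.200),  # large effect, but not significant
]

LOG2FC_CUTOFF = 1.0  # |log2FC| > 1
FDR_CUTOFF = 0.05    # FDR < 0.05

def is_differentially_expressed(log2fc, fdr):
    """Apply both thresholds; a gene must pass each one."""
    return abs(log2fc) > LOG2FC_CUTOFF and fdr < FDR_CUTOFF

degs = [gene for gene, log2fc, fdr in results
        if is_differentially_expressed(log2fc, fdr)]

print(degs)
```

      Using the absolute value keeps both up- and downregulated genes, and requiring both conditions excludes genes that are significant but barely changed as well as genes with large but unreliable changes.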

      For that, based on the reviewer’s suggestion, we generated a ranked gene list using the GSEA software (version 4.2.2, MSigDB) from the normalized count matrix of the whole transcriptome that was detected after differential gene expression. The ranked gene list was then passed to the Run GSEA function, and we searched the Hallmark and C2 libraries for significantly enriched pathways. Thus, we could generate normalized enrichment scores, allowing us to predict whether a pathway is activated or suppressed. The details of the new analysis are described in the Material and Methods section (lines 220-232). Significance was determined using a cut-off value of 25% FDR. This cut-off is proposed in the user guide of the GSEA (https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm), especially for incoherent gene expression datasets, which our PCAs suggest ours is, and it allows for hypothesis-driven validation of the dataset.

      Indeed, we found that the most important transcriptome changes are aneuploidy dosage dependent. High dosage embryos show signatures of cellular unfitness, while low-dosage embryos still seem to activate survival pathways (lines 349-364). 

      This new analysis not only increased the robustness of our results but also yielded novel findings, which pave the way for future studies.

      The validity of our findings is supported by recent work by the Zernicka-Goetz lab. We found that hypoxia is upregulated in low dosage human aneuploid TE cells. In line with our data, the Zernicka-Goetz lab found in a mouse model of low degree chromosomal abnormalities that hypoxia inducible factor 1A (HIF1A) promotes survival of extraembryonic aneuploid cells by reducing levels of DNA damage<sup>11</sup>.

      (2) It would be very helpful if the authors could perform co-staining of multiple stress markers to better understand the origins of apoptosis and autophagy cells. In Fig 3d and 3h, it seems that the same reversine treated embryo was stained with CASP8, LC3B and HSP70. Is there any correlation between CASP8 and HSP70 at the single cell level? Is there any correlation between p53 and LC3B as the authors suggested, possibly through DRAM1?

      We decided to use the complex aneuploid embryos that were left at our facility for the validation of novel findings such as upregulation of DRAM1 and presence and consequences of DNA damage in aneuploid embryos. As suggested by the editor and the other reviewer we also added embryos to existing datasets to increase the sample size where necessary. Therefore, we did not include other co-stainings of multiple stress markers.

      Following the reviewer’s suggestion, we reanalyzed the existing stainings and evaluated whether there is a correlation between CASP8 and HSP70 at the single cell level. The reversine-treated embryos were the only embryo group that was co-stained for both CASP8 and HSP70. We quantified the percentage of cells that were single or double positive for CASP8 and HSP70 and found a higher proportion of double positive cells than of single positives. Therefore, we concluded that there is indeed a correlation between both proteins at the single cell level in reversine-treated embryos and included this data in Figure 3k,l.
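
      The kind of single versus double positive tally described above can be sketched in a few lines of Python. The per-cell calls below are hypothetical, and the choice of denominator (cells positive for at least one marker) is an assumption made for illustration, not necessarily the one used for Figure 3k,l.

```python
from collections import Counter

# Hypothetical per-cell calls: (CASP8 positive?, HSP70 positive?). Not study data.
cells = [
    (True, True), (True, True), (True, True),
    (True, False),
    (False, True),
    (False, False),
]

def classify(casp8_positive, hsp70_positive):
    """Assign each cell to one staining category."""
    if casp8_positive and hsp70_positive:
        return "double_positive"
    if casp8_positive:
        return "CASP8_only"
    if hsp70_positive:
        return "HSP70_only"
    return "negative"

counts = Counter(classify(casp8, hsp70) for casp8, hsp70 in cells)

# Assumed denominator: cells positive for at least one of the two markers.
positive_total = sum(n for label, n in counts.items() if label != "negative")
percent = {
    label: 100 * n / positive_total
    for label, n in counts.items()
    if label != "negative"
}

print(percent)
```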

      During the experiments for the revision, we found that the DRAM1 protein was upregulated in the cytoplasm of TE cells but not in the ICM of aneuploid embryos (Figure 3s,t), which validates the findings of the gene expression analysis. This data also supports our findings that autophagy is active in aneuploid TE cells while not significantly increased in aneuploid pluripotent ICM cells. Unfortunately, we could not stain LC3B and DRAM1 in the same embryo because the antibodies were raised in the same species.

      (3) While " the possibilities for functional studies and lineage tracing experiments in human embryos are very limited," the authors can leverage in silico modelling (ie, PMID: 28700688) to address the roles of aneuploidy in blastocyst formation and development. Is there any self-regulating mechanism underlying the ratios of PrE and EPI? Is apoptosis of ICM cells a natural process during PrE formation (PMID: 18725515)?

      It is a very interesting proposal to use in silico modelling to address the roles of aneuploidy during human blastocyst formation and lineage segregation. Although this type of analysis would yield very important insights, we are not able to address this point in the revision because our group lacks expertise in this type of analysis, and it would require setting up a collaboration with experts in this field. In the discussion we proposed that future studies can leverage our data to carry out in silico modelling and cited the proposed article (lines 608-610).

      On the second part of the question, we would like to discuss the differences between mouse and human embryo studies. Parts of this were included in the discussion on the possible mechanisms of PrE elimination. 

      Is there a self-regulating mechanism for EPI/PrE formation?

      To extrapolate the knowledge on mouse development to human it is important to bear in mind that (1) human embryos are outbred, as compared to inbred super-fertile laboratory mouse strains and (2) the embryos are donated to research by subfertile couples, which could compromise the EPI/PrE ratios. For instance, Chousal and colleagues found that poor quality blastocysts have a reduced number of PrE cells12. In human embryos the proportion of EPI and PrE cells is indeed highly variable (20%-60%) and while the number of EPI cells does not increase between dpf6 and 7, the number of PrE cells does grow13. We found a similarly variable number of EPI and PrE cells in our study on the lineage segregation mechanisms in good quality human embryos, with an absolute number of 12.1±6.5 EPI cells and 8.4±3.44 PrE cells14.

      By comparison, in late mouse blastocysts, the ratio of EPI to PrE cells is consistent (2/3)15. Overall, self-regulating mechanisms in the human embryo have not yet been studied in detail because functional testing is hardly possible.

      Is apoptosis a natural process during PrE formation?

      Yes, in mice apoptosis is a natural process during PrE formation to eliminate misallocated cells of the inner cell mass through cell competition16,17. Yet, in the human embryo there is no evidence of such mechanisms. Although apoptosis is present even in human blastocysts of good quality18, the origin of such apoptotic cells has not yet been shown, although suboptimal culture conditions are known to increase cellular fragmentation19. Conversely, our data and those of others1,2 support the notion that the pluripotent inner cell mass in human embryos is more resistant to apoptosis than the trophectoderm, even in karyotypically aberrant cells.

      (4) The "count tables generated from the raw data files" could not be found in the source data files.

      This slipped our attention; we have now added the count tables to the source data files. Our apologies.

      (5) Citations on aneuploidy literature were not done in a fully scholarly manner. It appears that authors selectively cite previous papers that are in support of their hypothesis but left out those with alternative conclusions.

      We apologize if we missed any literature that contradicts our findings; this was not intentional. We would be grateful if the reviewer could provide such references.

      In the manuscript we describe the alignment and differences of key findings with several studies (listed below) and the limitations of our study are extensively described in lines 596-626.

      Our findings align with other work on these aspects:

      - RNA-sequencing data2,20–26

      - Gene dosage effects drive the transcriptome of the aneuploid human embryo27,28

      - Aneuploid cells are cleared by sustained proteotoxic stress followed by p53 activation, autophagy and eventually apoptosis29–37.

      - p53 is active in constitutional aneuploid cells38

      - The ICM is less sensitive to apoptosis1,2

      Our findings differ from other work on these points:

      - p53 activation is independent from DNA-damage39

      - p53 is active in constitutional aneuploid cells40,41

      - Apoptosis is only present in the aneuploid TE of aneuploid cells in the embryo29,30,42    

      Reviewer #2 (Recommendations For The Authors):

      Comments:

      (1) The main problem is that there is no substantial novelty. The authors look at previously identified factors affected by chromosome gains and losses, but none of the new ones from their analysis. Anything that could be potentially novel is not carefully analyzed (e.g. the difference between reversine-treated and aneuploid samples, or new potential candidates) or explained. This is really a pity.

      In the revision, we have further elaborated on the DNA damage aspect by staining for DNA double-stranded breaks and have validated DRAM1 as an activated downstream effector of p53. We have also added the analyses of the gene-expression of the reversine-treated embryos.

      (2) Some of the general statements on aneuploidy are confusing and often borderline generalized. E.g. introduction line 106: "If this (proteotoxic stress) remains unresolved by the activation of autophagy..." I am not aware of any publication suggesting that autophagy resolves proteotoxic stress in aneuploid cells. Citations that replication stress causes DNA damage in aneuploid cells are wrong. This link was first shown by Passerini et al. in 2016. etc.

      We have clarified these statements in the introduction and added the proposed citations on replication stress that causes DNA damage in aneuploid cells (lines 95-108).

      (3) In the figures the authors show a representative image of aneuploid and diploid embryos. Given the aneuploid embryos have widely different karyotypes, it would be important to clarify which of the embryos has been actually shown. Similarly, in the heat maps it is not clear which line is which embryo. This would be very useful.

      We added the karyotypes of the aneuploid embryos to the images in Figures 3 and 4. Since the heatmaps were removed from the figures, we added the karyotypes to the PCAs in all figures.

      (4) The authors constantly state that aneuploid embryos accumulate more DNA damage, which is supported by some of their observations, e.g. the DNA damage response is upregulated. It would be great if they validated these statements by testing some markers for DNA damage.

      We agree with the reviewer that this was an important point and addressing it has revealed that our initial assumption was incorrect and has provided new interesting findings. From the revised RNA-seq analysis, we found only one pathway (DNA damage response TP53) to be activated in all aneuploid embryos (Fig. 1e). The ATM pathway was also activated specifically in high-dosage embryos. Following this, we set out to test whether DNA damage was indeed increased in aneuploid embryos by staining for DNA double-strand breaks with gH2AX.

      First, we investigated gH2AX expression in 5dpf embryos in which we induced DNA damage with Bleomycin. We compared 6 untreated versus 6 Bleomycin-treated human embryos (Fig. 3m) and found that gH2AX foci were rarely present in the untreated embryos and that all cells of the treated embryos showed pan-nuclear gH2AX staining.

      Second, we compared the presence of gH2AX foci in the TE (NANOG negative cells), ICM (NANOG positive cells) and the whole embryo of 7 euploid versus 11 aneuploid embryos. Interestingly, we found no differences in the number of gH2AX foci or pan-nuclear gH2AX nuclei between euploid and aneuploid embryos (Fig. 3o). When dividing our aneuploid embryos into high- and low-dosage embryos, we also found no differences. Our data now suggest that complex aneuploid human embryonic cells of meiotic origin do not contain more DNA double-strand breaks, arguing against DNA damage as the source of p53 activation. Last, in our previous experiment we found that phosphorylated S15p53 is increased in aneuploid embryos, supporting an active p53 pathway as suggested by our transcriptomic data. Since we could not find DNA damage in aneuploid human embryos, we speculate that p53 is phosphorylated on Serine15 through metabolic stress as suggested by Jones and colleagues43. We also argue that proteotoxic stress might induce p53 expression as proposed by Singla and colleagues29.

      (5) The source of embryos is only partially described in a figure legend. This should be expanded and described in the Materials and Methods section. The embryos are named, but this is nowhere explained. One can only assume that T is for trisomy and M is for monosomy.

      We have divided the embryos into different experimental series (Experiment 1-4). This is now described in the Materials and Methods section (lines 157-175). Also, we have added the experiment number of each embryo to the supplementary tables and to the source data. The abbreviations T = Trisomy and M = Monosomy were initially introduced in the last paragraph of the legend of Figure 4. We have now added them to every panel.

      (6) Recent works from non-embryonic cells suggest that the cellular response to monosomy is different than the response to trisomy. Did the authors try to test this possible difference? For example, one could compare embryos M174/21, M2/19 and M17 with T2/10, T10/22 and T1/15/18/22.

      We thank the reviewer for pointing this out. Our RNA-seq dataset consisted of three embryos that contained trisomies only and four embryos that contained monosomies only. When reanalyzing our data we found different transcriptomic responses between monosomy-only and trisomy-only cells. Compared to euploid cells, monosomy-only cells mainly activate the p53 pathway and protein secretion, while translation, DNA replication, cell cycle G1/S, DNA synthesis and processing of DNA double-strand breaks were inhibited. Trisomy-only cells show activated oxidative phosphorylation, ribosome and translation, while protein secretion, apoptosis and cell cycle are inhibited. These differences were confirmed by testing transcriptomic differences between trisomic versus monosomic cells. Our results are similar to studies on human embryos20,26 and other monosomic and trisomic cell lines44,45. However, the interpretation of these results is very limited by the small sample size and the comparison of monosomies and trisomies of different chromosomes. Thus, we decided to keep this analysis out of the manuscript.

      Author response image 1.

      On the protein level, next to the small sample size, our results were also limited by the fact that not all embryos were stained with the same combinations of antibodies. LC3B was the only protein for which all embryos were immunostained. Thus, other protein data could not be re-analyzed due to even lower sample sizes. 

      Below we have separated the LC3B puncta per cell counts into euploid, trisomies only, monosomies only and all other aneuploid embryos. We performed a Kruskal-Wallis test with multiple comparisons. It is worth noting that the difference between euploid and monosomies only (and those that contained both) was statistically significant, while the differences between euploid vs trisomies only and trisomies only vs monosomies only were not statistically significant. These results contradict the studies on monosomic cell lines, which found that proteotoxic stress and autophagy are absent in monosomic cells and specific to trisomic cell lines. Here we also decided to keep this specific protein expression analysis out of the manuscript due to the above-mentioned limitations.

      Author response image 2.
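
      For readers who want to see what the omnibus step of such a comparison involves, the following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It is an illustration only, not the authors' analysis: the function names and the puncta values are hypothetical placeholders, the tie correction is omitted for brevity, and the post hoc multiple-comparison step is not shown.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a dict of samples."""
    pooled = list(chain.from_iterable(groups.values()))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for sample in groups.values():
        rank_sum = sum(ranks[start:start + len(sample)])
        total += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical LC3B puncta-per-cell values, invented for illustration only.
    puncta = {
        "euploid": [3.1, 4.0, 2.8, 3.6],
        "trisomies_only": [4.2, 5.1, 3.9],
        "monosomies_only": [6.3, 7.0, 5.8, 6.6],
        "other_aneuploid": [5.5, 6.1, 7.2, 6.9],
    }
    h = kruskal_wallis_h(puncta)
    # With four groups, H is compared against a chi-square distribution
    # with 3 degrees of freedom (critical value 7.815 at alpha = 0.05).
    print(f"H = {h:.3f}; exceeds 7.815: {h > 7.815}")
```

      A significant H only indicates that at least one group differs; identifying which pairs differ requires a post hoc procedure with correction for multiple comparisons.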

      (7) Line 329: "a trisomy 12 meiotic chromosomal abnormality in one reversine-treated embryo." What does it mean? Why meiotic chromosomal abnormality when the reversine treatment was administered 4 days after fertilization? In the discussion, the authors state "presumed meiotic," but this should be discussed and described more clearly.

      Since reversine induces mitotic abnormalities of different types leading to chromosomally mosaic embryos, we could not identify these induced abnormalities using inferCNV on the RNA-seq of TE biopsies of said embryos. However, we were not aware of the karyotype of the embryos that were used for these experiments, as they were thawed after they had been cryopreserved at day 3 of development and had not been subjected to genetic testing. This makes it possible that some of the embryos we used for the reversine experiments in fact carried endogenously acquired meiotic and mitotic chromosomal abnormalities. Since inferCNV only allows us to detect aneuploidies that are homogeneously present in the majority of the cells of the sequenced biopsy, we only picked up this trisomy 12. It is possible that this was not a meiotic abnormality but a mitotic one originating at the first cleavage and present in a high percentage of cells in the blastocyst. At any rate, the exact origin of this aneuploidy has no further implications for the results of the study. We clarified this in the manuscript (lines 310-315).

      (8) Line 422: "The gene expression profiles suggest that the accumulation of autophagic proteins in aneuploid embryos is caused by increased autophagic flux due to differential expression of the p53 target gene DNA Damage Regulated Autophagy Modulator-1 (DRAM1), rather than by inhibition of autophagy (Supplementary Table 2)." This is highly speculative, as the authors do not have any evidence to support this statement.

      To validate this finding we have now stained 7 euploid and 11 aneuploid embryos with a DRAM1 antibody. We found DRAM1 protein to be significantly enriched in the cytoplasm of TE cells but not in the ICM of aneuploid embryos when compared with euploid embryos (Fig. 3s,t). These data are consistent with the finding that autophagy is increased in the TE and not the ICM of aneuploid human embryos (Fig. 4l-o). Potential implications of DRAM1 expression have been mentioned in the discussion.

      (9) The figure legends are confusing. They are mixed up with the methods and some key information is missing.

      We revised all figure legends accordingly and removed the experimental set-up figures from the manuscript to reduce any confusion. The methods section was revised and expanded.

      (10) In Figure 1, what is the difference between "activated" and "deregulated"?

      Since we analyzed our RNA-seq dataset with the method proposed by reviewer 1 we now generated normalized enrichment scores. The terms activated and deregulated are thus not present anymore.  

      (11) The p62 images are not really clear. There might be more puncta (not obvious, though), but the staining intensity seems lower in the representative images.  

      We do not agree with the reviewer that there might be more p62 puncta (purple); however, we agree that this was not clearly visible from the pictures. Below we show an example of the counting mask (in green) of the aneuploid embryo from Figure 3i, where one can clearly appreciate that all the puncta are captured by the counting mask. In this case, the software counted 1704 puncta. To further clarify, we have now added a zoom of a randomly chosen ROI of the p62 stainings to Figure 3i.

      Author response image 3.

      (12) The authors claim that there are differences between lineages in response to aneuploidy, such as autophagy not being activated in the OCT4+ lineage, etc. However, the differences are very small and based on a small number of embryos. It is difficult to draw far-reaching conclusions based on a small number of experiments (Fig. 4n-r). The authors also claim in the Abstract that they demonstrated "clear differences with previous findings in the mouse", which are however difficult to identify in the text.

      We agree with the reviewer that our conclusions on Figures 4l-o were based on a small number of embryos. We have increased the sample size as much as possible. This is challenging due to the constraints on accessing human embryos, and especially the limited number of embryos with meiotic complex aneuploidy. We have performed immunostainings for LC3B, OCT4 and GATA4 of six additional euploid and four additional aneuploid human embryos. This did not change our overall finding that aneuploid embryos upregulate autophagy in the TE rather than the ICM (Figure 4l-o). After the inclusion of additional embryos, we removed from the manuscript our speculation that autophagy is present in ICM cells that have already differentiated towards EPI/PrE.

      We have rephrased the abstract to state that we highlight a few differences with previous findings in the mouse. Here we focused especially on the different transcriptomic response of reversine treated embryos, that aneuploid mouse embryos do not seem to suffer from lineage segregation errors and that the ICM of aneuploid human embryos lacks apoptosis while aneuploid mouse embryos show elimination from the EPI. Likewise, we highlighted the similar stress responses and that we could give novel insights into p53 mediated autophagy and apoptosis activation through DRAM1 in aneuploid TE cells but not the ICM.  

      (13) The text needs thorough editing - long sentences, typos, and grammar errors are frequent. Punctuation is largely missing.

      We have revised the text.

      References

      (1) Victor, A. R. et al. One hundred mosaic embryos transferred prospectively in a single clinic: exploring when and why they result in healthy pregnancies. Fertil Steril 111, 280–293 (2019).

      (2) Martin, A. et al. Mosaic results after preimplantation genetic testing for aneuploidy may be accompanied by changes in global gene expression. Front Mol Biosci 10, 264 (2023).

      (3) Martín, Á. et al. Trophectoderm cells of human mosaic embryos display increased apoptotic levels and impaired differentiation capacity: a molecular clue regarding their reproductive fate? Human Reproduction 39, 709–723 (2024).

      (4) Domingo-Muelas, A. et al. Human embryo live imaging reveals nuclear DNA shedding during blastocyst expansion and biopsy. Cell 186, 3166-3181.e18 (2023).

      (5) Loewer, A., Karanam, K., Mock, C. & Lahav, G. The p53 response in single cells is linearly correlated to the number of DNA breaks without a distinct threshold. BMC Biol 11, 1–13 (2013).

      (6) Kim, H., Watanabe, S., Kitamatsu, M., Watanabe, K. & Ohtsuki, T. Cell cycle dependence of apoptosis photo-triggered using peptide-photosensitizer conjugate. Sci Rep 10, 1–8 (2020).

      (7) Pollak, N. et al. Cell cycle progression and transmitotic apoptosis resistance promote escape from extrinsic apoptosis. J Cell Sci 134, (2021).

      (8) Neufeld, T. P. Autophagy and cell growth--the yin and yang of nutrient responses. J Cell Sci 125, 2359–2368 (2012).

      (9) Lanneau, D. et al. Heat shock proteins: essential proteins for apoptosis regulation. J Cell Mol Med 12, 743 (2008).

      (10) Gabai, V. L., Mabuchi, K., Mosser, D. D. & Sherman, M. Y. Hsp72 and Stress Kinase c-jun N-Terminal Kinase Regulate the Bid-Dependent Pathway in Tumor Necrosis Factor-Induced Apoptosis. Mol Cell Biol 22, 3415 (2002).

      (11) Sanchez-Vasquez, E., Bronner, M. E. & Zernicka-Goetz, M. HIF1A contributes to the survival of aneuploid and mosaic pre-implantation embryos. bioRxiv 2023.09.04.556218 (2023) doi:10.1101/2023.09.04.556218.

      (12) Chousal, J. N. et al. Molecular profiling of human blastocysts reveals primitive endoderm defects among embryos of decreased implantation potential. Cell Rep 43, (2024).

      (13) Corujo-Simon, E., Radley, A. H. & Nichols, J. Evidence implicating sequential commitment of the founder lineages in the human blastocyst by order of hypoblast gene activation. Development (Cambridge) 150, (2023).

      (14) Regin, M. et al. Lineage segregation in human pre-implantation embryos is specified by YAP1 and TEAD1. Human Reproduction 38, 1484–1498 (2023).

      (15) Saiz, N., Williams, K. M., Seshan, V. E. & Hadjantonakis, A. K. Asynchronous fate decisions by single cells collectively ensure consistent lineage composition in the mouse blastocyst. Nat Commun 7, 1–14 (2016).

      (16) Plusa, B., Piliszek, A., Frankenberg, S., Artus, J. & Hadjantonakis, A. K. Distinct sequential cell behaviours direct primitive endoderm formation in the mouse blastocyst. Development 135, 3081–3091 (2008).

      (17) Hashimoto, M. & Sasaki, H. Epiblast Formation by TEAD-YAP-Dependent Expression of Pluripotency Factors and Competitive Elimination of Unspecified Cells. Dev Cell 50, 139-154.e5 (2019).

      (18) Hardy, K. Apoptosis in the human embryo. Rev Reprod 4, 125–134 (1999).

      (19) Ramos-Ibeas, P. et al. Embryo responses to stress induced by assisted reproductive technologies. Mol Reprod Dev 86, 1292–1306 (2019).

      (20) Licciardi, F. et al. Human blastocysts of normal and abnormal karyotypes display distinct transcriptome profiles. Sci Rep 8, 1–9 (2018).

      (21) Maxwell, S. M. et al. Investigation of Global Gene Expression of Human Blastocysts Diagnosed as Mosaic using Next-generation Sequencing. Reproductive Sciences 1–11 (2022) doi:10.1007/s43032-022-00899-x.

      (22) Groff, A. F. et al. RNA-seq as a tool for evaluating human embryo competence. Genome Res 29, 1705–1718 (2019).

      (23) Starostik, M. R., Sosin, O. A. & McCoy, R. C. Single-cell analysis of human embryos reveals diverse patterns of aneuploidy and mosaicism. Genome Res 30, 814–826 (2020).

      (24) Vera-Rodriguez, M., Chavez, S. L., Rubio, C., Pera, R. A. R. & Simon, C. Prediction model for aneuploidy in early human embryo development revealed by single-cell analysis. Nat Commun 6, 7601 (2015).

      (25) Sanchez-Ribas, I. et al. Transcriptomic behavior of genes associated with chromosome 21 aneuploidies in early embryo development. Fertil Steril 111, 991-1001.e2 (2019).

      (26) Fuchs Weizman, N. et al. Towards Improving Embryo Prioritization: Parallel Next Generation Sequencing of DNA and RNA from a Single Trophectoderm Biopsy. Sci Rep 9, 1–11 (2019).

      (27) Fernandez Gallardo, E. et al. A multi-omics genome-and-transcriptome single-cell atlas of human preimplantation embryogenesis reveals the cellular and molecular impact of chromosome instability. bioRxiv 2023.03.08.530586 (2023) doi:10.1101/2023.03.08.530586.

      (28) Dürrbaum, M. & Storchová, Z. Effects of aneuploidy on gene expression: implications for cancer. FEBS J 283, 791–802 (2016).

      (29) Singla, S., Iwamoto-Stohl, L. K., Zhu, M. & Zernicka-Goetz, M. Autophagy-mediated apoptosis eliminates aneuploid cells in a mouse model of chromosome mosaicism. Nat Commun 11, 1–15 (2020).

      (30) Bolton, H. et al. Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential. Nat Commun 7, 1–12 (2016).

      (31) Ohashi, A. et al. Aneuploidy generates proteotoxic stress and DNA damage concurrently with p53-mediated post-mitotic apoptosis in SAC-impaired cells. Nat Commun 6, 1–16 (2015).

      (32) Santaguida, S. & Amon, A. Short- and long-term effects of chromosome missegregation and aneuploidy. Nat Rev Mol Cell Biol 16, 473–485 (2015) doi:10.1038/nrm4025.

      (33) Santaguida, S., Vasile, E., White, E. & Amon, A. Aneuploidy-induced cellular stresses limit autophagic degradation. Genes Dev 29, 2010–2021 (2015).

      (34) Chunduri, N. K. & Storchová, Z. The diverse consequences of aneuploidy. Nat Cell Biol 21, 54–62 (2019).

      (35) Dürrbaum, M. et al. Unique features of the transcriptional response to model aneuploidy in human cells. BMC Genomics 15, 139 (2014).

      (36) Pan, J.-A., Ullman, E., Dou, Z. & Zong, W.-X. Inhibition of protein degradation induces apoptosis through a microtubule-associated protein 1 light chain 3-mediated activation of caspase-8 at intracellular membranes. Mol Cell Biol 31, 3158–70 (2011).

      (37) Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol Syst Biol 8, 608 (2012).

      (38) Tang, Y.-C., Williams, B. R., Siegel, J. J. & Amon, A. Identification of aneuploidy-selective antiproliferation compounds. Cell 144, 499–512 (2011).

      (39) Janssen, A., Van Der Burg, M., Szuhai, K., Kops, G. J. P. L. & Medema, R. H. Chromosome segregation errors as a cause of DNA damage and structural chromosome aberrations. Science 333, 1895–1898 (2011).

      (40) Li, M. et al. The ATM-p53 pathway suppresses aneuploidy-induced tumorigenesis. Proc Natl Acad Sci U S A 107, 14188–14193 (2010).

      (41) Thompson, S. L. & Compton, D. A. Proliferation of aneuploid human cells is limited by a p53-dependent mechanism. J Cell Biol 188, 369–381 (2010).

      (42) Yang, M. et al. Depletion of aneuploid cells in human embryos and gastruloids. Nat Cell Biol 23, 314–321 (2021).

      (43) Jones, R. G. et al. AMP-activated protein kinase induces a p53-dependent metabolic checkpoint. Mol Cell 18, 283–293 (2005).

      (44) Chunduri, N. K., Barthel, K. & Storchova, Z. Consequences of Chromosome Loss: Why Do Cells Need Each Chromosome Twice? Cells 11, 1530 (2022).

      (45) Krivega, M., Stiefel, C. M. & Storchova, Z. Consequences of chromosome gain: A new view on trisomy syndromes. Am J Hum Genet 109, 2126–2140 (2022) doi:10.1016/j.ajhg.2022.10.014.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Although the reviewers found our work interesting, they raised several important concerns about our study. To address these concerns, we mostly performed new experiments. The most important changes are highlighted in the summary paragraphs.

      First, in response to Reviewer 1’s suggestions, we have conducted the SFN experiments systematically. For example, we further confirmed the mechanism of SFN-activated TFEB in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in HeLa NPC1 cells (New Fig. S3) and the effect of SFN on NPC1 expression (New Fig. S5). In particular, we examined the colocalization of DiO (a PM marker) staining and surface LAMP1 staining in HeLa NPC1 cells under SFN treatment to confirm PM exocytosis. In the main text and figure legends, we thoroughly checked the accuracy of the wording. Hence, we have significantly improved the presentation and clarity of the revision.

      Second, in response to Reviewer 2’s suggestions, we have performed additional experiments to demonstrate the role of TFEB in SFN-evoked lysosomal exocytosis using TFEB-KO cells (New Fig. S7B). In TFEB KO cells, the increase in surface LAMP1 signal upon SFN treatment was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner. We also investigated the effect of U18666A on CF555-dextran endocytosis. By examining the localization of CF-dex and Lamp1, we found that CF555 is present in the lysosome with U18666A treatment (Fig for reviewers only A,B), suggesting that NPC1 deficiency/U18666A treatment has no effect on CF-dex endocytosis.

      Third, in response to Reviewer 3’s suggestions, and in addition to the experiments performed in response to the other reviewers, we have examined the cytotoxicity of the SFN concentrations used in this study in various cell lines (New Fig. S10).

      In addition, according to the reviewers’ suggestions, we made clarifications and corrections wherever appropriate in the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors are trying to determine if SFN treatment results in dephosphorylation of TFEB, subsequent activation of autophagy-related genes, exocytosis of lysosomes, and reduction in lysosomal cholesterol levels in models of NPC disease.

      Strengths:

      (1) Clear evidence that SFN results in translocation of TFEB to the nucleus.

      (2) In vivo data demonstrating that SFN can rescue Purkinje neuron number and weight in NPC1<sup>-/-</sup> animals.

      Thank you for the support!

      Weaknesses:

      (1) Lack of molecular details regarding how SFN results in dephosphorylation of TFEB leading to activation of the aforementioned pathways. Currently, datasets represent correlations.

      Thank you for raising this critical point! The reviewer is right that in this manuscript we did not say much about the molecular mechanism of SFN-evoked TFEB activation, because we explored this mechanism in our previous study (Li, Shao et al. 2021). There we showed that SFN activates TFEB via a ROS-Ca<sup>2+</sup>-calcineurin-dependent but MTOR-independent pathway (Li, Shao et al. 2021). In the current manuscript, we cited this paper but did not describe the details of the mechanism, which understandably confused the reviewers. Therefore, in the revised manuscript we added more details of the molecular mechanism of SFN-activated TFEB. We also further confirmed this mechanism in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in NPC cells (New Fig. S3).

      (2) Based on the manuscript narrative, discussion, and data it is unclear exactly how steady-state cholesterol would change in models of NPC disease following SFN treatment. Yes, there is good evidence that lysosomal flux to (and presumably across) the plasma membrane increases with SFN. However, lysosomal biogenesis genes also seem to be increasing. Given that NPC inhibition, NPC1 knockout, or NPC1 disease mutations are constitutively present and the cell models of NPC disease contain lysosomes (even with SFN) how could a simple increase in lysosomal flux decrease cholesterol levels? It would seem important to quantify the number of lysosomes per cell in each condition to begin to disentangle differences in steady state number of lysosomes, number of new lysosomes, and number of lysosomes being exocytosed.

      Thank you for this constructive comment. According to our data, SFN reduced cholesterol levels in NPC1 cells by inducing lysosomal exocytosis and increasing lysosomal biogenesis. We understand the reviewer’s point that it would be really helpful to distinguish the three populations: the original number of lysosomes, the number of new lysosomes, and the number of lysosomes being exocytosed. Unfortunately, due to technical limitations, there is so far no appropriate method that can clearly determine which population a given lysosome comes from. We hope that future techniques will allow us to explore this mechanism.

      (3) Lack of evidence supporting the authors' premise that "SFN could be a good therapeutic candidate for neuropathology in NPC disease".

      Suggestion was taken! We removed this sentence. Thanks!

      Reviewer #2 (Public review):

      (4) The in vivo experiments demonstrate the therapeutic potential of SFN for NPC. A clear dose response analysis would further strengthen the proposed therapeutic mechanism of SFN.

      Thank you for this constructive suggestion. We examined the effect of two doses of SFN, 30 and 50 mg/kg, on NPC mice. As shown in Fig. 6, SFN at 50 mg/kg, but not at 30 mg/kg, prevents a degree of Purkinje cell loss in lobule IV/V of the cerebellum, suggesting a dose-correlated preventive effect of SFN. In future studies, we will continue optimizing the dosage form and amount of SFN and perform a dose-response analysis.

      (5) Additional data supporting the activation of TFEB by SFN for cholesterol clearance in vivo would strengthen the overall impact of the study.

      We thank the reviewer for this constructive comment. We detected a significant decrease of pS211-TFEB protein in brain tissues of NPC mice upon SFN treatment compared to vehicle, suggesting, for the first time, that SFN activates TFEB in brain tissue. It would be worthwhile to further examine lysosomal cholesterol levels in brain tissues to show the direct effect of SFN. However, in our hands and in the literature, filipin does not seem suitable for detecting lysosomal cholesterol accumulation in brain tissue. So far there is no good method to directly measure lysosomal cholesterol in tissue.

      (6) In Figure 4, the authors demonstrate increased lysosomal exocytosis and biogenesis by SFN in NPC cells. Including a TFEB-KO/KD in this assay would provide additional validation of whether these effects are TFEB-dependent.

      Great suggestion! We investigated the role of TFEB in SFN-evoked lysosomal exocytosis by using TFEB-KO cells. As shown in New Suppl. Fig. 7B, in TFEB KO cells the increase in surface LAMP1 signal induced by SFN (15 μM, 12 h) treatment was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner.

      (7) For lysosomal pH measurement, the combination of pHrodo-dex and CF-dex enables ratiometric pH measurement. However, the pKa of pHrodo red-dex (according to Invitrogen) is ~6.8, while lysosomal pH is typically around 4.7. This discrepancy may account for the lack of observed lysosomal pH changes between WT and U18666A-treated cells. Notably, previous studies (PMID: 28742019) have reported an increase in lysosomal pH in U18666A-treated cells.

      We understand the reviewer’s point. However, as stated in the methods and main text, we used pHrodo™ Green-Dextran (P35368, Invitrogen), rather than pHrodo Red-dextran. According to the product information from Invitrogen, pHrodo Green-dex conjugates are non-fluorescent at neutral pH, but fluoresce bright green at acidic pH around 4, such as that in endosomes and lysosomes. Therefore, pHrodo Green-dex is suitable for monitoring the acidity of lysosomes (Hu, Li et al. 2022). We also used LysoTracker Red DND-99 (Thermo Scientific, L7528) to measure lysosomal pH (Fig. 4G, H), and the results are consistent with those from the pHrodo Green/CF measurement.

      The reviewer mentioned that previous studies have reported an increase in lysosomal pH in U18666A-treated cells. We understand this concern. However, in our hands, using two lysosomal pH sensors, we did not detect a lysosomal pH change in U18666A-treated NPC1 cell models.

      (8) The authors are also encouraged to perform colocalization studies between CF-dex and a lysosomal marker, as some researchers may be concerned that NPC1 deficiency could reduce or block the trafficking of dextran along endocytosis.

      Thank you for raising this important point; the suggestion was taken! We investigated the effect of NPC1 deficiency on CF555-dextran trafficking into lysosomes by examining the localization of CF-dex and LAMP1. To clearly define whether CF555-dex is present in the lysosome, we first used apilimod to enlarge lysosomes and then examined the relative position of CF555-dex and LAMP1. As shown in Author response image 1A, B, in HeLa cells treated with U18666A, CF555 signals (red) are clearly present inside lysosomes (LAMP1-labelled lysosomal membrane, green signal), suggesting that CF555-dex endocytosis is not affected by NPC1 deficiency (U18666A treatment).

      Author response image 1.

      The effect of NPC1 deficiency on CF555 endocytosis. HeLa cells were transiently transfected with LAMP1-GFP plasmid for 24 h. Cells were then treated with apilimod (100 nM) for 2 h to enlarge the lysosomes, followed by co-treatment with U18666A (2.5 μM, 24 h) and CF555 (12 h). (A) Each panel shows fluorescence images taken by confocal microscopy. (B) Each panel shows the fluorescence intensity of a line scan (white line) through the double-labeled object indicated by the white arrow. Scale bar, 20 μm or 2 μm (for zoom-in images).

      (9) In vivo data supporting the activation of TFEB by SFN for cholesterol clearance would significantly enhance the impact of the study. For example, measuring whole-animal or brain cholesterol levels would provide stronger evidence of SFN's therapeutic potential.

      We really appreciate the reviewer’s comments. Please see response to point #5.

      Reviewer #3 (Public review):

      (10) The manuscript is extremely hard to read due to the writing; it needs careful editing for grammar and English.

      We apologize for the defects in the writing and grammar. We have thoroughly checked the grammar and polished the English to improve the manuscript.

      (11) There are a number of important technical issues that need to be addressed.

      We will address the technical issues mentioned in the following questions.

      (12) The TFEB influence on filipin staining in Figure 1A is somewhat subtle. In the mCherry alone panels there is a transfected cell with no filipin staining and the mCherry-TFEBS211A cells still show some filipin staining.

      Thank you for raising this point. The reviewer is right that not all mCherry-alone cells show the same level of filipin signal and that not all mCherry-TFEBS211A-transfected cells completely lack filipin signal. The statistical results were from randomly selected cells from 3 independent experiments. To avoid confusion, we have included more cells in the statistical analysis to cover all the conditions, as shown in the new Fig. 1B. We hope this clarifies the issue.

      (13) Figure 1C is impressive for the upregulation of filipin with U18666A treatment. However, SFN is used at 15 microM. This must be hitting multiple pathways. Vauzour et al (PMID: 20166144) use SFN at 10 nM to 1 microM. Other manuscripts use it in the low microM range. The authors should repeat at least some key experiments using SFN at a range of concentrations from perhaps 100 nM to 5 microM. The use of 15 microM throughout is an overall concern.

      Our choice of SFN concentration is based on our previous study (Li, Shao et al. 2021), in which we showed that SFN (10–15 μM, 2–9 h) induces robust TFEB nuclear translocation in a dose- and time-dependent manner in HeLa cells as well as in other human cell lines without cytotoxicity. In addition, tissue concentrations of SFN can reach 3–30 μM upon broccoli consumption (Hu, Khor et al. 2006), so we used a low micromolar concentration of SFN (15 μM) in our study. Moreover, we confirmed that SFN (15 μM) induces TFEB nuclear translocation in HeLa NPC1 cells (Fig. 1F, G, Fig. 2B, G) and that this concentration of SFN is not cytotoxic (New Fig. S10).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The following comments are designed to improve and focus the authors' work.

      (14) Related to data in Figure 1. The mechanism through which TFEB can reduce Filipin in U18 conditions is unclear. Inhibition of NPC1 results in hyperactivation of mTOR through cholesterol transport at ER-Lysosome contacts (see Zoncu group publications). If mTORC is hyperactive in NPC disease models, TFEB would be expected to remain cytoplasmic and not enter the nucleus as the representative image in Figure 1A demonstrates.

      In our previous study (Li, Shao et al. 2021), we showed that SFN induces TFEB nuclear translocation in an mTOR-independent manner. Consistent with this result, in this study we confirmed that SFN-induced TFEB nuclear translocation is mTOR-independent in NPC1 cells (now Fig. S4A, B). Thus, SFN induced TFEB nuclear translocation in various NPC cells (Fig. 1F, G, Fig. 2B, G). Please also see the discussion of the mechanism of SFN in response to point #1.

      (15) Therefore, how does overexpression of TFEB, which remains in the cytoplasm, result in a decreased filipin signal? Similar questions relate to Figure 1C-H.

      Medina et al. (Medina, Fraldi et al. 2011) showed that TFEB overexpression (not activation, so the overexpressed TFEB is in the cytoplasm) increases the pool of lysosomes in the proximity of the plasma membrane and promotes their fusion with the PM by raising intracellular Ca<sup>2+</sup> levels through the lysosomal Ca<sup>2+</sup> channel MCOLN1, leading to increased lysosomal exocytosis. Hence, TFEB overexpression alone (without TFEB activation) could reduce the filipin signal by increasing lysosomal exocytosis, and treatment with a TFEB agonist such as SFN could further boost this increase.

      (16) It would seem appropriate to measure the NPC1 and NPC2 proteins using western blot to ensure that SFN-dependent clearance of cholesterol is not due to enhanced expression of the native protein in U18-treated cells or enhanced folding of the protein in patient fibroblasts.

      Thank you for this constructive comment! NPC1 mutations account for about 95% of NPC cases and NPC2 mutations for about 5%, and this study focuses on NPC1 deficiency. We therefore measured the effect of SFN on the expression of NPC1 in human NPC1-patient fibroblasts. Western blot analysis showed that SFN (15 μM, 24 h) treatment did not affect NPC1 expression in human NPC1-patient fibroblasts (new Fig. S5).

      (17) Related to data in Figures 1C-E. Controls are missing related to the effect SFN has on steady-state cholesterol levels. This may be insightful in providing information on the mode of action of this compound.

      Suggestion was taken! We have added the SFN-only control in new Fig. 1C-E.

      (18) The mechanism that links SFN to TFEB-dependent translocation is suggested to involve calcineurin-dependent dephosphorylation of TFEB. However, no data is provided. It would seem important to identify the mechanism(s) through which SFN positively regulates TFEB location. This would shift the manuscript and its model from correlations to causation. Experiments involving calcineurin inhibitors, or agonists of TRPML1 that have been reported as being a key source of Ca<sup>2+</sup> for calcineurin activation, may provide molecular insight.

      Please see the paragraph in response to point #1.

      (19) Related to Figure 4. Using a plasma membrane counterstain to quantify plasma membrane LAMP1 would increase the rigor of the analysis.

      Great idea! We examined the colocalization of DiO (a PM marker) staining and LAMP1 staining in HeLa NPC1 cells under SFN treatment. As shown in new Fig. 4A, the surface LAMP1 signal (red) colocalized with DiO (green).

      (20) Related to Figure 5. How do the authors explain the kinetic disparity between SFN treatment for 24 vs 72 hrs? If TFEB is activated and promoting lysosomal biogenesis and increased lysosomal flux across the PM, why does cholesterol accumulation lag? Perhaps related to this point: are other cholesterol metabolizing enzymes that may have altered activity in NPC sensitive to SFN? A similar comment applies to the Sterol regulatory element binding protein pathway, which has been shown to be activated in models of NPC disease.

      We understand the reviewer’s point. As shown in Fig. 5C, D, in NPC1<sup>-/-</sup> MEF cells, SFN treatment for 24 h produced relatively weaker cholesterol clearance than in human cells (Fig. 1C, D, Fig. 2E, I). We therefore tested a longer SFN treatment of 72 h (fresh SFN in medium was added every 24 h), and the 72 h treatment produced a substantial cholesterol reduction (Fig. 5C, D). This difference could be attributed to the continuous action of SFN, which could prolong exocytosis and lead to more effective cholesterol clearance. As shown in the DMSO-treated MEF cells, cholesterol levels are similar at 24 and 72 h; thus, 24 h of U18666A treatment has already reached the upper limit of accumulated cholesterol, and a longer treatment time would not change the cholesterol levels. Thus, cholesterol accumulation has no lag.

      We did not investigate whether SFN regulates other cholesterol metabolizing enzymes or sterol regulatory element binding proteins, although we cannot rule out this possibility. In this study we mainly focus on the cholesterol clearance effect of SFN via TFEB-mediated pathways. In our data, TFEB KO significantly diminished SFN-evoked cholesterol clearance. Hence, the contribution of other cholesterol metabolizing enzymes or sterol regulatory element binding proteins may not be as important as that of TFEB, and it is outside the scope of this study. In the future, we may explore the involvement of other possible pathways in SFN’s effects.

      (21) Related to Figure 7. The western blots for pS211-TFEB are poor. It's suggested that whole blots are shown to increase rigor.

      Thank you for the comments. We now present the blots with more surrounding area to increase the rigor.

      (22) Data demonstrating the ability of SFN to improve Purkinje cell survival are exciting and pair well with the weight analysis, however, to address the overall goal of determining if "SFN could be a good therapeutic candidate for neuropathology in NPC disease" survival analysis should be tested as well.

      Please see the paragraph in response to point #3.

      Minor

      (23) Throughout the manuscript many different Fonts and font sizes are used. This is very jarring to readers. It is suggested that a more uniform approach is taken to presenting these nice datasets.

      We apologize for these oversights. We have checked the entire manuscript to make sure that fonts and font sizes are consistent.

      (24) Related to data presentation. In general, there is a lack of alignment and organization of the figures.

      So sorry about this. We have reorganized the figures to get them better aligned.

      (25) Line 149, SFN is missing.

      Corrected!

      Reviewer #3 (Recommendations for the authors):

      (26) In Figure 3 the authors should use multiple single siRNAs or perform a functional rescue to determine specificity.

      We understand the reviewer’s point. We designed several siRNAs and validated their efficiency. We then used the siRNA with the best knockdown efficiency in this study, and the specificity of this siTFEB was validated by Western blot, as shown in Fig. 3A. Furthermore, we used TFEB knockout cells constructed by CRISPR/Cas9 to further examine the role of TFEB in SFN-induced cholesterol clearance (Fig. 3D). Consistent with the results in the siTFEB-transfected HeLa NPC1 cells (Fig. 3B, C), SFN failed to diminish cholesterol in HeLa TFEB KO cells. The result from TFEB KO cells is even more convincing than the siRNA experiment. We also performed a functional rescue by re-expressing TFEB in TFEB KO cells, in which SFN-induced cholesterol clearance was restored (Fig. 3E, F). Collectively, these data indicate that TFEB is required for lysosomal cholesterol reduction upon SFN treatment. We therefore did not repeat this rescue experiment in the siTFEB-transfected HeLa NPC1 cells.

      (27) The label for 3D is missing.

      Corrected! Thanks!

      (28) Figure 4, although the authors use an antibody against the luminal domain of LAMP1 there could still be some permeabilization. A marker of the plasma membrane would be helpful.

      Please see the response to point #19.

      (29) Figure 4, cholesterol in the media because of lysosome exocytosis. This is where the high concentration of SFN is of concern. Is there any cell death that could explain the result? The authors should test for cell death with the SFN treatment.

      Thank you for raising this important point! We have measured the cytotoxicity of SFN at the concentrations used in this study in various cell lines (New Fig. S10). Please also see the paragraph in response to point #13.

      (30) The blot in Figure 6A is unclear. It is very hard to see any change in pS211-TFEB levels, and, given the blurry signal, the detection of phospho-TFEB is uncertain.

      Please see the summary paragraph in response to point #21.

      References:

      Hu, M. Q., P. Li, C. Wang, X. H. Feng, Q. Geng, W. Chen, M. Marthi, W. L. Zhang, C. L. Gao, W. Reid, J. Swanson, W. L. Du, R. Hume and H. X. Xu (2022). "Parkinson's disease-risk protein TMEM175 is a proton-activated proton channel in lysosomes." Cell 185(13): 2292-+.

      Hu, R., T. O. Khor, G. Shen, W. S. Jeong, V. Hebbar, C. Chen, C. Xu, B. Reddy, K. Chada and A. N. Kong (2006). "Cancer chemoprevention of intestinal polyposis in ApcMin/+ mice by sulforaphane, a natural product derived from cruciferous vegetable." Carcinogenesis 27(10): 2038-2046.

      Li, D., R. Shao, N. Wang, N. Zhou, K. Du, J. Shi, Y. Wang, Z. Zhao, X. Ye, X. Zhang and H. Xu (2021). "Sulforaphane activates a lysosome-dependent transcriptional program to mitigate oxidative stress." Autophagy 17(4): 872-887.

      Medina, D. L., A. Fraldi, V. Bouche, F. Annunziata, G. Mansueto, C. Spampanato, C. Puri, A. Pignata, J. A. Martina, M. Sardiello, M. Palmieri, R. Polishchuk, R. Puertollano and A. Ballabio (2011). "Transcriptional activation of lysosomal exocytosis promotes cellular clearance." Dev Cell 21(3): 421-430.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study uses state-of-the-art methods to label endogenous dopamine receptors in a subset of Drosophila mushroom body neuronal types. The authors report that DopR1 and Dop2R receptors, which have opposing effects on intracellular cAMP, are present in axon termini of Kenyon cells, as well as those of two classes of dopaminergic neurons that innervate the mushroom body, indicative of autocrine modulation by dopaminergic neurons. Additional experiments showing opposing effects of starvation on DopR1 and DopR2 levels in mushroom body neurons are consistent with a role for dopamine receptor levels increasing the efficiency of learned food-odour associations in starved flies. Supported by solid data, this is a valuable contribution to the field.

      We thank the editors for the assessment, but request that “DopR2” be changed to “Dop2R”. The dopamine receptors in Drosophila have confusing names, but the receptors we characterized in this study are called Dop1R1 (according to FlyBase; aka DopR1, dDA1, Dumb) and Dop2R (ibid; aka Dd2R). DopR2 is the name of a different dopamine receptor.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      Strengths:

      The split-GFP approach allows visualization of subcellular enrichment of dopamine receptors in the plasma membrane of GAL4-expressing neurons allowing for a high level of specificity.

      The authors resolve the presynaptic localization of DopR1 and Dop2R, in "giant" Drosophila neurons differentiated from cytokinesis-arrested neuroblasts in culture as it is not clear in the lobes and calyx.

      Starvation-induced opposite responses of dopamine receptor expression in the PPL1 and PAM DANs provide key insights into models of appetitive learning.

      Starvation-induced increase in D2R allows for increased negative feedback that the authors test in D2R knockout flies where appetitive memory is diminished.

      This dual autoreceptor system is an attractive model for how amplitude and kinetics of dopamine release can be fine-tuned and controlled depending on the cellular function and this paper presents a good methodology to do it and a good system where the dynamics of dopamine release can be tested at the level of behavior.

      Weaknesses:

      LI measurements of Kenyon cells and lobes indicate that Dop2R was approximately twice as enriched in the lobe as the average density across the whole neuron, while the lobe enrichment of Dop1R1 was about 1.5 times the average. Are these levels consistent across different times of day and states of the animal? How were these conditions controlled, and how sensitive is receptor expression to the time of day of dissection, staining, etc.?

      To answer this question, we repeated the experiment in two replicates at different times of day and confirmed that the receptor localization was consistent (Figure 3 – figure supplement 1); LI measurements showed that Dop2R is enriched more in the lobe and less in the calyx compared to Dop1R1 (Figure 3D). The states of animals that could affect LI (e.g. feeding state and anesthesia for sorting, see methods) were kept constant. 

      The authors assume without discussion as to why and how presynaptic enrichment of these receptors is similar in giant neurons and MB.

      In the revision, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." In addition, we found punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). Therefore, the giant neuron serves as an excellent model to study the presynaptic localization of dopamine receptors in isolated large cells.

      Figures 1-3 show the extensive expression of receptors in alpha and beta lobes while Figure 5 focusses on PAM and localization in γ and β' projections of PAM leading to the conclusion that presynaptic dopamine neurons express these and have feedback regulation. Consistency between lobes or discussion of these differences is important to consider.

      In the revised manuscript, we show data in the γ KCs (Figure 4C, Figure 5 - figure supplement 1) in addition to α/β KCs, and demonstrate the consistent synaptic localization of Dop1R1 and Dop2R as in α/β KCs (Figure 4B and 5A). 

      Receptor expression in any learning-related MBONs is not discussed, and it would be intriguing as how receptors are organized in those cells. Given that these PAMs input to both KCs and MBONs these will have to work in some coordination.

      The subcellular localization of dopamine receptors in MBONs indeed provides important insights into the site of dopaminergic signaling in these neurons (Takemura et al., 2017; Pavlowsky et al., 2018; Pribbenow et al., 2022). Therefore, we added new data for Dop1R1 and Dop2R in MBON-γ1pedc>αβ (Figure 6). Interestingly, these receptors are localized to the dendritic projection in the γ1 compartment as well as to presynaptic boutons (Figure 6).

      Although authors use the D2R enhancement post starvation to show that knocking down receptors eliminated appetitive memory, the knocking out is affecting multiple neurons within this circuit including PAMs and KCs. How does that account for the observed effect? Are those not important for appetitive learning? 

      In the appetitive memory experiment (Figure 9C), we knocked down Dop2R only in the select neurons of the PPL1 cluster, and this manipulation does not directly affect Dop2R expression in PAMs and KCs.

      Starvation-induced enhancement of Dop2R expression in the PPL1 neurons (Figure 8F) would attenuate their outputs and therefore disinhibit expression of appetitive memory in starved flies (Krashes et al., 2009). Consistently, Dop2R knock-down in PPL1 impaired appetitive memory in starved flies (Figure 9C). We revised the corresponding text to make this point clearer (Lines #224-227).

      The evidence for fine-tuning is completely based on receptor expression and one behavioral outcome which could result from many possibilities. It is not clear if this fine-tuning and presynaptic feedback regulation-based dopamine release is a clear possibility. Alternate hypotheses and outcomes could be considered in the model as it is not completely substantiated by data at least as presented.

      The reviewer’s concern is valid, and the presynaptic dopamine tuning by autoreceptors may need more experimental support. We therefore additionally discussed another possibility (Lines #289-291): “Alternatively, these presynaptic receptors could potentially receive extrasynaptic dopamine released from other DANs. Therefore, the autoreceptor functions need to be experimentally clarified by manipulating the receptor expression in DANs.”

      Reviewer #2 (Public Review):

      Summary:

      Hiramatsu et al. investigated how cognate neurotransmitter receptors with antagonizing downstream effects localize within neurons when co-expressed. They focus on mapping the localization of the dopaminergic Dop1R1 and Dop2R receptors, which correspond to the mammalian D1- and D2-like dopamine receptors, which have opposing effects on intracellular cAMP levels, in neurons of the Drosophila mushroom body (MB). To visualize specific receptors in single neuron types within the crowded MB neuropil, the authors use existing dopamine receptor alleles tagged with 7 copies of split GFP to target reconstitution of GFP tags only in the neurons of interest as a read-out of receptor localization. The authors show that both Dop1R1 and Dop2R, with differing degrees, are enriched in axonal compartments of both the Kenyon Cells cholinergic presynaptic inputs and in different dopamine neurons (DANs), which project axons to the MB. Co-localization studies of dopamine receptors with the presynaptic marker Brp suggest that Dop1R1 and, to a larger extent Dop2R, localize in the proximity of release sites. This localization pattern in DANs suggests that Dop1R1 and Dop2R work in dual-feedback regulation as autoreceptors. Finally, they provide evidence that the balance of Dop1R1 and Dop2R in the axons of two different DAN populations is differentially modulated by starvation and that this regulation plays a role in regulating appetitive behaviors.

      Strengths:

      The authors use reconstitution of GFP fluorescence of split GFP tags knocked into the endogenous locus at the C-terminus of the dopamine receptors as a readout of dopamine receptor localization. This elegant approach preserves the endogenous transcriptional and post-transcriptional regulation of the receptor, which is essential for studies of protein localization.

      The study focuses on mapping the localization of dopamine receptors in neurons of the mushroom body. This is an excellent choice of system to address the question posed in this study, as the neurons are well-studied, and their connections are carefully reconstructed in the mushroom body connectome. Furthermore, the role of this circuit in different behaviors and associative memory permits the linking of patterns of receptor localization to circuit function and resulting behavior. Because of these features, the authors can provide evidence that two antagonizing dopamine receptors can act as autoreceptors within the axonal compartment of MB innervating DANs. The differential regulation of the balance of the two receptors under starvation in two distinct DAN innervations provides evidence of the role that regulation of this balance can play in circuit function and behavioral output.

      Weaknesses:

      The approach of using endogenously tagged alleles to study localization is a strength of this study, but the authors do not provide sufficient evidence that the insertion of 7 copies of split GFP to the C terminus of the dopamine receptors does not interfere with the endogenous localization pattern or function. Both sets of tagged alleles (1X Venus and 7X split GFP tagged) were previously reported (Kondo et al., 2020), but only the 1X Venus tagged alleles were further functionally validated in assays of olfactory appetitive memory. Despite the smaller size of the 7X split-GFP array tag knocked into the same location as the 1X venus tag, the reconstitution of 7 copies of GFP at the C terminus of the dopamine receptor, might substantially increase the molecular bulk at this site, potentially impeding the function of the receptor more significantly than the smaller, single Venus tag. The data presented by Kondo et al. 2020, is insufficient to conclude that the two alleles are equivalent.

      In the revision, we validated the function of these engineered receptors with a new set of olfactory learning experiments. Both of these receptors in KCs were shown to be required for aversive memory (Kim et al., 2007, Scholz-Kornehl et al., 2016). As in the anatomical experiments, we induced GFP<sub>1-10</sub> expression in KCs of flies homozygous for 7xGFP<sub>11</sub>-tagged receptors using MB-Switch and 3 days of RU486 feeding. We confirmed that the STM performance of these flies was not significantly different from that of the control (Figure 2 – figure supplement 1). Thus, these fusion receptors are functional.

      The authors' conclusion that the receptors localize to presynaptic sites is weak. The analysis of the colocalization of the active zone marker Brp whole-brain staining with dopamine receptors labeled in specific neurons is insufficient to conclude that the receptors are localized at presynaptic sites. Given the highly crowded neuropil environment, the data cannot differentiate between the receptor localization postsynaptic to a dopamine release site or at a presynaptic site within the same neuron. The known distribution of presynaptic sites within the neurons analyzed in the study provides evidence that the receptors are enriched in axonal compartments, but co-labeling of presynaptic sites and receptors in the same neuron or super-resolution methods are needed to provide evidence of receptor localization at active zones.  The data presented in Figures 5K-5L provides compelling evidence that the receptors localize to neuronal varicosities in DANs where the receptors could play a role as autoreceptors.

      Given the highly crowded environment of the mushroom body neuropil, the analysis of dopamine receptor localization in Kenyon cells is not conclusive. The data is sufficient to conclude that the receptors are preferentially localizing to the axonal compartment of Kenyon cells, but co-localization with brain-wide Brp active zone immunostaining is not sufficient to determine if the receptor localizes juxtaposed to dopaminergic release sites, in proximity of release sites in Kenyon cells, or both.

      To better resolve the microcircuits of KCs, we triple-labeled the plasma membrane and DAR::rGFP in KCs, together with Brp, and examined their localization by high-resolution imaging with Airyscan. This strategy revealed receptor clusters associated with Brp accumulation within KCs (Figure 4). To further verify the association of DARs and active zones within KCs, we co-expressed Brp<sup>short</sup>::mStraw and GFP<sub>1-10</sub> and confirmed their colocalization (Figure 5A), suggesting presynaptic localization of DARs in KCs. With these additional characterizations, we now discuss the significance of receptors at the presynaptic sites of KCs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      For Figure 1, the authors show PAM, PPL1 neurons, and the ellipsoid body as a validation of their tools (Dop1R1-T2A-GAL4 and Dop2R-T2A-GAL4) and the idea that these receptors are colocalized. However, it appears that the technique was applied to the whole brain so it would be great to see the whole brain to understand how much labelling is specific and how stochastic. Methods could include how dissection conditions were controlled and how sensitive are receptor expression to the time of day of dissection, staining, etc.

      The expression patterns of the receptor T2A-GAL4 lines (Figure 1A and 1B) are consistent in the multiple whole brains (Kondo et al., 2020, Author response image 1).

      Author response image 1.

      The significance of the expression of these two receptors in an active zone is not clearly discussed and presynaptic localization is not elaborated on. Would something like expansion microscopy be useful in resolving this? It would be important to discuss that as giant neurons in culture don't replicate many aspects of the MB system.

      In the revised manuscript, we expanded the discussion of the function of the two antagonizing receptors at the AZ (Lines #226-275).

      Does MB-GeneSwitch > GFP1-1 reliably express in gamma lobes? Most of the figures show alpha/beta lobes.

      Yes. MB-GeneSwitch is also expressed in γ KCs, but weakly. 12 hours of RU486 feeding, which we did in the previous experiments, was insufficient to induce GFP reconstitution in the γ KCs. By extending the time of transgene induction, we visualized expression of Dop1R1 and Dop2R more clearly in γ KCs. Their localization is similar to that in the α/β KCs (Figure 4C, Figure 5 - figure supplement 1).

      Figure 6, y-axis says protein level. At first, I thought it was related to starvation so maybe authors can be more specific as the protein level doesn't indicate any aspect of starvation.

      We appreciate this comment. The y-axis labels have now been changed to “rGFP levels” (Figure 8C and 8F, Figure 8 - figure supplement 1B, 1D and 1F).

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The title of the manuscript focuses on the tagging of the receptors and their synaptic enrichment.

      Given that the alleles used in the study were generated in a previously published study (Kondo et al, 2020), which describes the receptor tagging and that the data currently provided is insufficient to conclude that the receptors are localizing to synapses, the title should be changed to reflect the focus on localizing antagonistic cognate neurotransmitter receptors in the same neuron and their putative role as autoreceptors in DANs.

      Following this advice, we removed the methodology from the title and revised it to “Synaptic enrichment and dynamic regulation of the two opposing dopamine receptors within the same neurons”.

      Minor issues with text and figures:

      Figure 1

      A conclusion from Figure 1 is that the two receptors are co-expressed in Kenyon cells. Please provide panels equivalent to the ones shown in D-G, with Kenyon cells cell bodies, or mark these cells in the existing panels, if present. Line 111 refers to panel 1D as the Kenyon cells panel, which is currently a PAM panel.

      We added images for coexpression of these receptors in the cell bodies of KCs (Figure 1 - figure supplement 1) and revised the text accordingly (Lines #89-90).

      Given that most of the study centers on visualizing receptor localization, it would benefit the reader to include labels in Figure 1 that help understand that these panels reflect expression patterns rather than receptor localization. For instance, rCD2::GFP could be indicated in the Dop1R1-LexA panels.

      As suggested, labels were added to indicate the UAS and lexAop markers (Figure 1D, 1E, 1G-1I and Figure 1 – figure supplement 1).

      Given that panels D-E focus on the cell bodies of the neurons, it could be beneficial for the reader to present the ellipsoid body neurons using a similar view that only shows the cell bodies. Similarly, one could just show the glial cell bodies.

      We now show the cell bodies of ring neurons (Figure 1G) and ensheathing glia (Figure 1I).

      For panel 1E, please indicate the subset of PPL1 neurons that both expressed Dop1R1 and Dop2R, as indicated in the text, as it is currently unclear from the image.

      Dop1R1-T2A-LexA was barely detected in all PPL1 (Figure 1E). We corrected the confusing text (Lines #95-96).

      Figure 2

      The cartoon of the cell-type-specific labeling should show that the tag is 7XFP-11 and the UAS component FP-10, as the current cartoon leads the reader to conclude that the receptors are tagged with a single copy of split GFP. The detail that the receptors are tagged with 7 copies of split GFP is only provided through the genotype of the allele in the resource table. This design aspect should be made clear in the figure and the text when describing the allele and approach used to tag receptors in specific neuron types.

      We have now added the construct design to the scheme (Figure 2A) and revised the corresponding text (Lines #101-103).

      Panel A. The arrow representing the endogenous promoter in the yellow gene representation should be placed at the beginning of the coding sequence. Currently, the different colors of what I assume are coding (yellow) and non-coding (white) transcript regions are not described in the legend.  I would omit these or represent them in the same color as thinner boxes if the authors want to emphasize that the tag is inserted at the C terminus within the endogenous locus.

      The color scheme was revised to be more consistent and intuitive (Figure 2A).

      Figure 3

      Labels of the calyx and MB lobes would benefit readers not as familiar with the system used in the study. In addition, it would be beneficial to the reader to indicate in panel A the location of the compartments analyzed in panel H (e.g., peduncle, α3).

      Figure 3A was amended to clearly indicate the analyzed MB compartments.

      Adding frontal and sagittal to panels B-E, as in Figure 2, would help the reader interpret the data. 

      In Figure 3B, “Frontal” and “Sagittal” were indicated.

      Panel F-G. A scale bar should be provided for the data shown in the insets. Could the author comment on the localization of Dop1R1 in KCs? The data in the current panel suggests that only a subset of KCs express high levels of receptors in their axons, as a portion of the membrane is devoid of receptor signals. This would be in line with differential dopamine receptor expression in subsets of Kenyon cells, as shown in Kondo et al., 2020, which is currently not commented on in the paper. 

      We confirmed that the majority of the KCs express both Dop1R1 and Dop2R genes (Figure 1 - figure supplement 1). LIs should be compared within the same cells rather than the differences of protein levels between cell types as they also reflect the GAL4 expression levels. 

      Panel H. Some P values are shown as n.s. (p> 0.05). Other non-significant p values in this panel and in other figures throughout the paper are instead reported (e.g. peduncle P=0.164). For consistency, please report the values as n.s. as indicated in the methods for all non-significant tests in this panel and throughout the manuscript.

      We now present the new dataset, and the graph represents the appropriate statistical results (Figure 3D; see the methods section for details).

      The methods of labeling the receptors through the expression of the GeneSwitch-controlled GFP1-10 in Kenyon cells induced by RU486 are not provided in the methods. Please provide a description of this as referenced in the figure legend and the genotypes used in the analysis shown in the panels.

      The method of RU486 feeding has been added. We apologize for the missing method.

      Figure 4

      Please provide scale bars for the inset in panels A-B.

      Scale bars were added to all confocal images.

      The current analysis cannot distinguish between postsynaptic and presynaptic dopamine receptors in KCs, and the figure title should reflect this.

      We now present new data on dopamine receptors in KCs that clearly distinguish Brp clusters of the KCs from those of other cell types (Figure 4, Figure 5).

      The reader could benefit from additional details of using the giant neuron model, as it is not commonly used, and it is not clear how to relate this to interpret the localization of dopaminergic receptors within Kenyon cells. The use of the venus-tagged receptor variant should be introduced in the text, as using a different allele currently lacks context. Figures 4F-4J show that the receptor is localizing throughout the neuron. Quantifying the fraction of receptor signal colocalizing with Brp could aid in interpreting the data.  However, it would still not be clear how to interpret this data in the context of understanding the localization of the receptors in neurons within fly brain circuits. In the absence of additional data, the data provided in Figure 4 is inconclusive and could be omitted, keeping the focus of the study on the analysis of the two receptors in DANs. Co-expressing a presynaptic marker in Kenyon cells (e.g., by expressing Brp::SNAP)  in conjunction with rGFP labeled receptor would provide additional evidence of the relationship of release sites in Kenyon cells and tagged dopamine receptors in these same cells and could add evidence in support to the current conclusion.

      Following the advice, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." Therefore, the giant neuron serves as an excellent model to study the presynaptic localization in large cells in isolation.

      To clarify that Brp clusters and dopamine receptors show polarized localization rather than "localizing throughout the neuron", we now show lower-magnification data (Figure 5C). These data clearly demonstrate punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). This is the same membrane segment where Dop1R1 and Dop2R are localized (Figure 5C). Therefore, the association of Brp clusters and the dopamine receptors in the isolated giant neurons suggests that the subcellular localization in the brain neurons is independent of the circuit context.

      Because the giant neurons do not form intermingled circuits, venus-tagged receptors are sufficient for this experiment and genetically simpler.

      Following the suggestion to clarify the AZ association of the receptors in KCs, we coexpressed Brpshort-mStraw and GFP1-10 in KCs and confirmed their colocalization (Figure 5A).

      Figure 6

      The data and analysis show that starvation induces changes in the α3 compartment in PPL1 neurons only, while the data provided shows no significant change for PPL1 neurons innervating other MB compartments. This should be clearly stated in lines 174-175, as it is implied that there is a difference in the analysis for compartments other than α3. Panel L of Figure 6 - supplement 1 shows no significant change for all three compartments analyzed and should be indicated as n.s. in all instances, as stated in the methods. 

      We revised the text to clarify that the starvation-induced differences of Dop2R expression were not significant (Lines #217-219). The reason to highlight the α3 compartment is that both Dop1R1 and Dop2R are coexpressed in this PPL1 neuron (Figure 8D).

      Additional minor comments:

      There are a few typos and errors throughout the manuscript. The text should be carefully proofread to correct these. Here are the ones that came to my attention:

      Please reference all figure panels in the text. For instance, Figure 3A is not mentioned and should be revised in line 112 as Figure 3A-E.

      Lines 103-104. The sentence "LI was visualized as the color of the membrane signals" is unclear and should be revised. 

      Figure 4 legend - dendritic claws should likely be B and C and not B and E.

      Lines 147 - Incorrect figure panels, should be 5C-L or 5D-E.

      Line 241 - DNAs should be DANs.

      Methods - please define what the abbreviation CS stands for.

      We greatly appreciate the reviewer's careful reading. All of these have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement performance would strengthen the manuscript.

      We appreciate the Editorial assessment on our paper’s strengths and novelty. We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning. Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      We have previously shown that neural replay of MEG activity representing the practiced skill was prominent during rest intervals of early learning, and that the replay density correlated with micro-offline gains (Buch et al., 2021). These findings are consistent with recent reports (from two different research groups) that hippocampal ripple density increases during these inter-practice rest periods and predicts offline learning gains (Chen et al., 2024; Sjøgård et al., 2024). However, decoder performance in our earlier work (Buch et al., 2021) left room for improvement. Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses:

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions.

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while online monitoring of head position was not performed for this study, it was assessed at the beginning and at the end of each recording. The head was restrained with an inflatable air bladder, and head movement between the beginning and end of each scan did not exceed 5 mm for all participants included in the study.
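
      As a rough illustration of the start-versus-end head position check described above, the following Python sketch compares fiducial positions from the beginning and end of a recording against the 5 mm limit. The fiducial names, the coordinates, and the choice of per-fiducial Euclidean displacement are assumptions made for illustration; the response does not specify how displacement was computed.

```python
import math

LIMIT_MM = 5.0  # limit stated in the response

# Hypothetical fiducial coordinates (mm) at the start and end of one recording.
start = {
    "nasion": (0.0, 90.0, 0.0),
    "left_ear": (-70.0, 0.0, 0.0),
    "right_ear": (70.0, 0.0, 0.0),
}
end = {
    "nasion": (1.0, 92.0, 2.0),
    "left_ear": (-70.0, 1.0, 0.0),
    "right_ear": (70.0, 0.0, 2.0),
}

# Euclidean displacement of each fiducial between the two measurements.
shifts = {name: math.dist(start[name], end[name]) for name in start}

# Assumption: the largest single-fiducial shift is the quantity compared to the limit.
max_shift = max(shifts.values())
within_limit = max_shift <= LIMIT_MM

print(shifts, max_shift, within_limit)
```

      A check of this kind only bounds the net displacement between two time points; it says nothing about transient movements during the recording, which is why the response also relies on ICA denoising and visual inspection.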

      The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. We agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. However, such correlations between small head movements and finger movements could only meaningfully contribute to decoding performance if: (A) they were consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) they systematically varied between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is unlikely. Alternatively, for this task design a much more likely confound could be the contribution of eye movement artefacts to the decoder performance (an issue raised by Reviewer #3 in the comments below).

      Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may generate eye movements that are systematically related to the task. Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (triggered by a KeyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts) (end of figure legend).
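
      The control analysis above can be sketched as follows in Python. The function shows one way to extract the three eye-movement features for a single keypress (gaze position at the keypress, gaze position 150 ms later, and peak velocity in between) at the stated 600 Hz sampling rate. The function name, the toy gaze trace, and the definition of velocity as sample-to-sample displacement times the sampling rate are assumptions for illustration, not the authors' code. The last lines average the five fold accuracies reported in the legend and compare the result with the 1-in-5 chance level.

```python
import math
from statistics import mean

FS_HZ = 600      # eye-tracker sampling rate stated in the response
DELAY_S = 0.150  # second gaze sample is taken 150 ms after the keypress


def eye_features(gaze_x, gaze_y, onset, fs=FS_HZ, delay_s=DELAY_S):
    """Return (x0, y0, x1, y1, peak_velocity) for one keypress.

    onset is the sample index of the KeyDown event. Velocity is
    approximated as sample-to-sample displacement times fs (assumption).
    """
    end = onset + round(delay_s * fs)  # 90 samples at 600 Hz
    if end >= len(gaze_x):
        raise ValueError("gaze trace too short for the requested window")
    speeds = [
        math.hypot(gaze_x[i + 1] - gaze_x[i], gaze_y[i + 1] - gaze_y[i]) * fs
        for i in range(onset, end)
    ]
    return (gaze_x[onset], gaze_y[onset], gaze_x[end], gaze_y[end], max(speeds))


# Toy trace: gaze drifts one unit per sample along x, constant y.
toy_x = [float(i) for i in range(100)]
toy_y = [0.0] * 100
features = eye_features(toy_x, toy_y, onset=5)

# Fold accuracies as reported in the figure legend.
fold_accuracy = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]
overall_accuracy = mean(fold_accuracy)
chance_level = 1 / 5  # five ordinal asterisk positions

print(features)
print(round(overall_accuracy, 5), chance_level)
```

      The mean of the five folds agrees with the overall cross-validated accuracy quoted in the text, and it sits close to the 0.2 expected from guessing among five positions.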

      Remember that the task display does not provide explicit feedback related to performance, only information about the present position in the sequence. Thus, it is possible that participants did not actively attend to the feedback. In fact, inspection of the eye position data revealed that on the majority of trials, participants displayed random-walk-like gaze patterns around a central fixation point located near the center of the screen. Thus, participants did not attend to the asterisk position on the display, but instead intrinsically generated the action sequence. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks) as provided in the study task – feedback which is typically ignored by the user.

      The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals (Buch et al., 2021; Classen et al., 1998; Karni et al., 1995; Kleim et al., 1998). Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported (Doyon et al., 2002; Grafton et al., 1992; Hardwick et al., 2013; Kennerley et al., 2004; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001), and appears to be even more prominent during early fine motor skill learning in the non-dominant hand (Lee et al., 2019; Sawamura et al., 2019). The frontal regions identified in these studies are known to play crucial roles in executive control (Battaglia-Mayer & Caminiti, 2019), motor planning (Toni, Thoenissen, et al., 2001), and working memory (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998) processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998), in addition to working memory (Grover et al., 2022). Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular for the following reasons. First, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications (Srinivas et al., 2016). One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (θ/γ PAC), which assess interactions between two or more narrow-band spectral features derived from the same time-series data (Lisman & Jensen, 2013).
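
      The structure of such a hybrid feature vector can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the function name, the parcel labels, and the values are hypothetical, and the real decoder uses 148 parcel averages plus roughly 1150 voxel features selected per subject.

```python
from statistics import mean


def hybrid_features(voxels_by_parcel, top_parcels):
    """Concatenate parcel-averaged features with voxel features.

    voxels_by_parcel maps a parcel label to the activity of its voxels
    at one time point. top_parcels lists the parcels whose individual
    voxels are kept in addition to the whole-brain parcel averages.
    """
    parcel_names = sorted(voxels_by_parcel)
    # Lower spatial resolution: one averaged feature per parcel, whole brain.
    inter_parcel = [mean(voxels_by_parcel[p]) for p in parcel_names]
    # Higher spatial resolution: every voxel of the selected parcels.
    intra_parcel = [v for p in top_parcels for v in voxels_by_parcel[p]]
    return inter_parcel + intra_parcel


# Hypothetical activity values for three parcels.
toy = {
    "parcel_a": [1.0, 3.0],
    "parcel_b": [2.0, 2.0, 5.0],
    "parcel_c": [4.0],
}
features = hybrid_features(toy, top_parcels=["parcel_b"])
print(features)
```

      In this toy case the average of `parcel_b` (3.0) and its three voxel values both appear in the vector. The average is a single summary of the parcel, while the voxel values preserve the pattern within it, which is the distinction the response draws between inter-parcel and intra-parcel features.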

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (Hybrid<sub>Alt</sub>) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (Hybrid<sub>Orig</sub>). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± 7.03% SD for Hybrid<sub>Orig</sub> vs. 75.49% ± 7.17% for Hybrid<sub>Alt</sub>; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04; Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. Hybrid<sub>Alt</sub>: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. Hybrid<sub>Orig</sub>: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note, that Hybrid<sub>Orig</sub> (the approach used in our manuscript) significantly outperforms the Hybrid<sub>Alt</sub> approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns (end of figure legend).
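
      The two proportions quoted in this comparison can be checked with simple arithmetic. The short Python sketch below uses only numbers stated in the response; the intra-parcel voxel count is the approximate across-subject average, so the resulting share is approximate as well.

```python
n_inter_parcel = 148    # whole-brain parcel-averaged features
n_intra_parcel = 1150   # approximate average number of intra-parcel voxel features
n_removed = 8           # overlapping parcel features excluded in the alternative decoder

total_features = n_inter_parcel + n_intra_parcel
removed_share = n_removed / total_features

acc_orig = 78.15  # mean accuracy (%), original hybrid decoder
acc_alt = 75.49   # mean accuracy (%), alternative hybrid decoder
accuracy_drop = acc_orig - acc_alt

print(f"removed share of features: {removed_share:.2%}")
print(f"accuracy drop: {accuracy_drop:.2f} percentage points")
```

      Under these numbers the eight removed features are well under 1% of the input space, while the mean accuracy falls by more than 2 percentage points, consistent with the statement in the response.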

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen.

      We agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated, an important confound in connectivity analyses (Colclough et al., 2015; Colclough et al., 2016), not performed in our investigation.

      In our study, correlations between adjacent voxels effectively reduce the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. – the rank is greater than 1), the intra-parcel spatial patterns could meaningfully contribute to the decoder performance, as shown by the following results:

      First, we obtained higher decoding accuracy with voxel-space features (74.51% ± 7.34% SD) compared to parcel space features (68.77% ± 7.6%; Figure 3B), indicating individual voxels carry more information in decoding the keypresses than the averaged voxel-space features or parcel space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding shows that correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside within.

      Author response image 3.:

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel-space features in decoding keypresses, and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contributions to decoding (end of figure legend).

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment.

      We have already addressed the issue of movement related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics (Bansal et al., 2011; Mollazadeh et al., 2011), muscle activation patterns (Flint et al., 2012) and temporal sequencing (Churchland et al., 2012) during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies) (Heusser et al., 2016). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans assessed changes in functional connectivity patterns while participants performed a similar sequence learning task to our present study (Bassett et al., 2011). Using a dynamic network analysis approach, Bassett et al. showed that flexibility in the composition of individual network modules (i.e. – changes in functional brain region membership of orthogonal brain networks) is up-regulated in novel learning environments and explains differences in learning rates across individuals. Thus, consistent with our findings, it is likely that functional brain networks rapidly reconfigure during early learning of novel sequential motor skills.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning (Albouy et al., 2013; Albouy et al., 2012). For example, reactivation events in the posterior parietal (Qin et al., 1997) and medial prefrontal (Euston et al., 2007; Molle & Born, 2009) cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains (Frankland & Bontempi, 2005), including motor sequence learning (Albouy et al., 2015; Buch et al., 2021; F. Jacobacci et al., 2020). Further, synchronized interactions between MPFC and hippocampus are more prominent during early as opposed to later learning stages (Albouy et al., 2013; Gais et al., 2007; Sterpenich et al., 2009), perhaps reflecting “redistribution of hippocampal memories to MPFC” (Albouy et al., 2013). MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning (Euston et al., 2012). Consistently, coupling between hippocampus and MPFC has been shown during initial memory encoding and during subsequent rest (van Kesteren et al., 2010; van Kesteren et al., 2012). Importantly, MPFC activity during initial memory encoding predicts subsequent recall (Wagner et al., 1998). Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” (Albouy et al., 2012), also engaged in the development of an abstract representation of the sequence (Ashe et al., 2006). In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning (Doyon et al., 2009; Hikosaka et al., 2002; Penhune & Steele, 2012).
The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice (Schendan et al., 2003), all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding (Morris, 2006; Tse et al., 2007). Thus, several prefrontal and frontoparietal regions contributing to long-term learning (Berlot et al., 2020) are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning. We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here.

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power (Bonstrup et al., 2019) and neural replay density (Buch et al., 2021) during inter-practice rest periods) to observed micro-offline gains.

      Reviewer #2 (Public review):

      Summary

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea.

      Weaknesses

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in almost more than 60% of training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured this future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation.

      We now include a new control analysis that addresses this issue as well as additional re-examination of previously reported results with respect to this issue – all of which are inconsistent with this alternative explanation that “contextualization” reflects a change in mixing of keypress related MEG features as opposed to a change in the underlying representations themselves. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the “4-4” transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization related changes to the underlying neural representations.

      We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also affirmed that the possible alternative explanation that contextualization effects are simple reflections of increased mixing is not supported by the data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis in the revised manuscript.
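
      As a rough illustration of the two ingredients named above (within-subject z-scoring and the adjusted R<sup>2</sup> statistic), the following sketch may help. It is not the authors' analysis code: the function names, the toy transition times, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscore_within_subject(values_by_subject):
    """Z-score each subject's values using that subject's own mean and SD.

    Assumption: population SD (pstdev); the response does not say which
    estimator was used. A subject with constant values would divide by zero.
    """
    out = {}
    for subject, values in values_by_subject.items():
        mu, sd = mean(values), pstdev(values)
        out[subject] = [(v - mu) / sd for v in values]
    return out

def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjustment of R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical keypress transition times (seconds) for two subjects.
times = {"s01": [0.30, 0.25, 0.20], "s02": [0.50, 0.40, 0.30]}
z = zscore_within_subject(times)
print(z["s01"])

# Hypothetical fit: R^2 = 0.10 from 104 observations and 3 predictors
# (three predictors would correspond to the 4-1, 2-4 and 4-4 transitions).
print(adjusted_r2(r2=0.10, n_obs=104, n_predictors=3))
```

      An adjusted R<sup>2</sup> close to zero, as reported above, means the transition-time predictors account for almost none of the variance in the distance score once the number of predictors is taken into account.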

      We also re-examined our previously reported classification results with respect to this issue. We reasoned that if mixing effects reflecting the ordinal sequence structure are an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization.

      Based upon the increased overlap between adjacent index finger keypresses (i.e. – “4-4” transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position, should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features.

      In summary, both the re-examination of previously reported data and the new control analyses converge on the idea that the proximity between keypresses does not explain contextualization.

      We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study.

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3 — figure supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36 percentage points) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67 percentage points. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impact overall decoder performance and generalization. 
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself.

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the KeyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses. We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100ms with 10ms increments relative to the KeyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximizes statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200ms epoch aligned to the KeyDown event (t<sub>0</sub> = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Future work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
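
      The window grid described above can be sketched as follows. This is an illustrative enumeration only, not the authors' code; the function names and the example accuracies are hypothetical. Counting both endpoints of each range gives 14 widths × 11 offsets = 154 combinations, whereas excluding one offset endpoint gives 14 × 10 = 140. Which convention was used is not stated, so the half-open offset range below is an assumption.

```python
def enumerate_windows(widths_ms, offsets_ms):
    """Return (start, end) times in ms relative to KeyDown for every width/offset pair."""
    return [(offset, offset + width) for width in widths_ms for offset in offsets_ms]

def best_window(accuracy_by_window):
    """Pick the window whose decoder reached the highest accuracy."""
    return max(accuracy_by_window, key=accuracy_by_window.get)

widths_ms = list(range(25, 351, 25))          # 25, 50, ..., 350 ms -> 14 widths
offsets_closed = list(range(0, 101, 10))      # 0, 10, ..., 100 ms  -> 11 offsets
offsets_half_open = list(range(0, 100, 10))   # 0, 10, ..., 90 ms   -> 10 offsets (assumption)

print(len(enumerate_windows(widths_ms, offsets_closed)))     # 154
print(len(enumerate_windows(widths_ms, offsets_half_open)))  # 140

# Hypothetical decoder accuracies for three candidate windows.
scores = {(0, 200): 0.90, (0, 100): 0.85, (50, 250): 0.88}
print(best_window(scores))  # (0, 200)
```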

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well.

      The Reviewer suggests that the current data is not enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last Index<sub>OP5</sub> and first Index<sub>OP1</sub> from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Figure 5 – figure supplement 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest periods.
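
      The two-stage logic of this analysis (a correlation per subject, then a comparison of the resulting coefficient distributions) can be sketched as below. The response does not state which t-test was used, so the paired form here is an assumption, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def paired_t(a, b):
    """Paired t statistic for per-subject values a versus b (assumed test form)."""
    diffs = [p - q for p, q in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Stage 1: one correlation per subject (hypothetical per-trial values).
offline_change = [0.1, 0.3, 0.2, 0.5]
offline_gain = [0.2, 0.5, 0.3, 0.9]
print(pearson_r(offline_change, offline_gain))

# Stage 2: compare coefficient distributions across subjects (hypothetical r values).
r_offline = [0.6, 0.5, 0.7, 0.4]
r_online = [0.1, 0.2, 0.0, 0.3]
print(paired_t(r_offline, r_online))
```

      Comparing distributions of within-subject coefficients avoids pooling trials across participants, which is one way to respond to the concern that two monotonically increasing group-level curves will correlate trivially.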

      With respect to the second concern, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the original manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first Index<sub>OP1</sub> to the last Index<sub>OP5</sub> keypress in the same trial, we observed no learning-related trend (Figure 5 – figure supplement 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach and neither predicted online learning (Figure 5 – figure supplement 6).

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals.

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multiscale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter).

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?).

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.
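
      For readers who want to see the logic of the shuffled-label control, a minimal sketch follows. It is not the analysis code used in the study: the nearest-centroid classifier, the synthetic data, and the function names (`nearest_centroid_accuracy`, `shuffled_label_chance`) are all assumptions chosen for illustration. Training labels are permuted many times, the decoder is refit each time, and the resulting accuracies give an empirical estimate of chance to set beside the theoretical 1/4 for a 4-class problem.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class on the training set and score the test set."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test sample to every class centroid
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predictions == y_test))


def shuffled_label_chance(X_train, y_train, X_test, y_test, n_shuffles=200, seed=0):
    """Empirical chance level: refit the decoder on permuted training labels."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_shuffles)
    for i in range(n_shuffles):
        y_perm = rng.permutation(y_train)  # breaks the feature/label link
        scores[i] = nearest_centroid_accuracy(X_train, y_perm, X_test, y_test)
    return float(scores.mean()), float(scores.std())


# Synthetic stand-in for keypress features (hypothetical data, 4 classes).
rng = np.random.default_rng(42)
n_per_class, n_features, n_classes = 60, 20, 4
class_means = rng.normal(scale=2.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = class_means[y] + rng.normal(size=(y.size, n_features))

order = rng.permutation(y.size)
X, y = X[order], y[order]
split = y.size // 2
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

true_acc = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)
chance_mean, chance_sd = shuffled_label_chance(X_train, y_train, X_test, y_test)
print(f"true-label accuracy: {true_acc:.3f}")
print(f"shuffled-label accuracy: {chance_mean:.3f} (SD {chance_sd:.3f})")
```

      In this sketch only the training labels are permuted, so the test set keeps its true labels; permuting within each cross-validation fold would be an equally reasonable design.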

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes – 1; e.g. – 3 dimensions, for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space. We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above. We agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—figure supplement 3). We observed a reduction in overall accuracy to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. This difference of 7.67 percentage points in classification performance on the Day 2 data could reflect a bias of the classifier toward the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
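
      The structure of such a control regression can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the data are synthetic, the helper names (`zscore_within_subject`, `ols_adjusted_r2`) are hypothetical, and ordinary least squares with an intercept is assumed as the estimator. Three transition-time predictors and the distance score are z-scored within subject, then the adjusted R² and F statistic of the fit are computed.

```python
import numpy as np


def zscore_within_subject(values, subject_ids):
    """Z-score each column of `values` separately for every subject."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for s in np.unique(subject_ids):
        mask = subject_ids == s
        block = values[mask]
        out[mask] = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    return out


def ols_adjusted_r2(X, y):
    """Ordinary least squares with intercept; returns (beta, adjusted R^2, F)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return beta, adj_r2, f_stat


# Hypothetical data: 10 subjects x 100 correct sequences, three transition
# times (4-1, 2-4, 4-4) and a distance score generated independently of them.
rng = np.random.default_rng(0)
subject_ids = np.repeat(np.arange(10), 100)
transition_times = rng.gamma(shape=2.0, scale=0.1, size=(subject_ids.size, 3))
distance_score = rng.normal(size=subject_ids.size)

X = zscore_within_subject(transition_times, subject_ids)
y = zscore_within_subject(distance_score, subject_ids)
beta, adj_r2, f_stat = ols_adjusted_r2(X, y)
print(f"adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.2f}")
```

      Because the synthetic distance score is generated independently of the transition times, the adjusted R² here sits near zero, which is the pattern a negative control of this kind looks for.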

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test would evaluate the representation in an unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would miss most learning effects on a task in which speed is the main learning metric.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      The Reviewer argues that the last finger movement of a trial and the first finger movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial is pre-planned before the first keypress is performed. This occurs in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is concerned that the temporal mixing effect raised above differs between the first and last keypresses performed in a trial. Please note that since neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence (Kornysheva et al., 2019), mixing effects are most likely also present for the first keypress in a trial.

      Separately, the Reviewer suggests that contextualization during early learning may reflect preplanning or online planning. This is an interesting proposal. Given the decoding time-window used in this investigation, we cannot dissect separate contributions of planning, memory and sensory feedback to contextualization. Taking advantage of the superior temporal resolution of MEG relative to fMRI tools, work under way in our lab is investigating decoding time-windows more appropriate to address each of these questions.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice). It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable.

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.
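
      The matched definitions discussed above can be made concrete with a short sketch. It rests on assumptions that are not taken from the manuscript: Euclidean distance stands in for the neural representation distance, and the `Trial` container and function names are hypothetical. Online differentiation compares the first index finger press of the first correct sequence with the last index finger press of the last correct sequence in the same trial; offline differentiation compares that last press with the first press of the next trial.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Trial:
    # Feature vector of the index finger press at ordinal position 1
    # in the first correct sequence of the trial (hypothetical field).
    first_press: np.ndarray
    # Feature vector of the index finger press at ordinal position 5
    # in the last correct sequence of the trial (hypothetical field).
    last_press: np.ndarray


def distance(a, b):
    """Euclidean distance, assumed here as the representation distance."""
    return float(np.linalg.norm(a - b))


def online_differentiation(trials):
    """Within-trial change: first press vs. last press of the same trial."""
    return [distance(t.first_press, t.last_press) for t in trials]


def offline_differentiation(trials):
    """Across-rest change: last press of trial n vs. first press of trial n+1."""
    return [
        distance(prev.last_press, nxt.first_press)
        for prev, nxt in zip(trials, trials[1:])
    ]


example_trials = [
    Trial(np.array([0.0, 0.0]), np.array([3.0, 4.0])),
    Trial(np.array([3.0, 4.0]), np.array([6.0, 8.0])),
]
online = online_differentiation(example_trials)
offline = offline_differentiation(example_trials)
print(online, offline)
```

      With both measures anchored to the same pair of keypress types, any mismatch in neighbouring keypresses applies equally to the online and offline comparisons.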

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement-related artefacts on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. On the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings so were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that most participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.

      The minimal participant engagement with the visual display in this explicit sequence learning motor task (which is highly generative in nature) contrasts markedly with behavior observed when reactive responses to stimulus cues are needed in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies using the two sequence learning tasks.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"?

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Figure 5 – figure supplements 4, 5 and 6). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      We disagree with this statement. The original (Bonstrup et al., 2019) paper clearly states that micro-offline gains do not necessarily reflect offline learning in some cases and must be carefully interpreted based upon the behavioral context within which they are observed. Further, the paper lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of (Pan & Rickard, 2015), which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study (Bonstrup et al., 2019), as well as in all our subsequent work. Pan & Rickard state:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks (Brawn et al., 2010; Rickard et al., 2008). Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard make several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They state:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead (Pan & Rickard, 2015). One promising possibility is to switch to 10 s performance durations for each performance-break cycle (Pan & Rickard, 2015). That design appears sufficient to eliminate at least the majority of the reactive inhibition effect (Brawn et al., 2010; Rickard et al., 2008).”

      We mindfully incorporated recommendations from (Pan & Rickard, 2015) into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.
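
      To make the quantities in this discussion explicit, a minimal sketch follows. It assumes the usual definitions (micro-online gain: speed at the end of a practice trial minus speed at its start; micro-offline gain: speed at the start of the next trial minus speed at the end of the current one). The function name `micro_gains`, its inputs, and the default cut-off of 11 early-learning trials (borrowed from the plateau point mentioned in the Author response image 4 legend) are illustrative assumptions rather than the authors' implementation.

```python
def micro_gains(start_speed, end_speed, n_early_trials=11):
    """Return (micro_online, micro_offline) gains for early-learning trials.

    start_speed[i] / end_speed[i]: tapping speed at the beginning / end of
    practice trial i (hypothetical inputs, e.g. keypresses per second).
    """
    if len(start_speed) != len(end_speed):
        raise ValueError("start_speed and end_speed must have the same length")
    n = min(n_early_trials, len(start_speed))
    # Change within each practice trial
    online = [end_speed[i] - start_speed[i] for i in range(n)]
    # Change across the rest period that follows trial i (needs trial i + 1)
    n_rests = min(n, len(start_speed) - 1)
    offline = [start_speed[i + 1] - end_speed[i] for i in range(n_rests)]
    return online, offline


start = [1.0, 2.0, 2.5]
end = [1.5, 2.0, 3.0]
online_gains, offline_gains = micro_gains(start, end)
print(online_gains, offline_gains)
```

      Summed over the included trials, the two lists account for the total change in speed from the start of the first trial to the end of the last one, which is why restricting the window to early learning matters for interpreting either component.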

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.” The initial (Bonstrup et al., 2019) report was followed up by a large online crowd-sourcing study (Bonstrup et al., 2020). This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 4 below for further details on these conditions).

      Author response image 4.

      This Figure shows that micro-offline gains observed in learning and nonlearning contexts are attributed to different underlying causes. Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from (Bonstrup et al., 2019). During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also (Bonstrup et al., 2020)). After early learning, about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature (Brooks et al., 2024; Gupta & Rickard, 2022; Florencia Jacobacci et al., 2020), argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds (end of Fig legend).

      Evidence documented in that paper (Bonstrup et al., 2020) showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118); 3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (n=71) (Bonstrup et al., 2020). Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve that Pan & Rickard (2015) refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects (Buch et al., 2021). Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study (Buch et al., 2021)) linked to micro-offline gains during early skill learning. These functional changes were coupled with rapid alterations in brain microstructure in the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice (Deleglise et al., 2023). Crucial to this point, Chen et al. (2024) and Sjøgård et al (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple density during rest periods (which are known markers for neural replay (Buzsaki, 2015)) in the human hippocampus (80-120 Hz) to micro-offline gains during early skill learning.

      Thus, there is now substantial converging evidence in humans across different indirect noninvasive and direct invasive recording techniques linking hippocampal activity, neural replay dynamics and offline performance gains in skill learning.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks; however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024).

      The recent work of Gupta & Rickard (2022, 2024) does not present any data that directly oppose our finding that early skill learning (Bonstrup et al., 2019) is expressed as micro-offline gains during rest breaks. These studies are an extension of the Rickard et al. (2008) paper, which employed a massed (30 s practice followed by 30 s breaks) vs. spaced (10 s practice followed by 10 s breaks) experimental design to assess whether recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two groups (30 s practice/10 s break and 10 s practice/10 s break, as used in the work from our group). The primary aim of the study was to assess whether changes in performance at a retest 5 minutes after the end of skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) more likely reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design, with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data.

      To reiterate, neither study included any analysis of micro-online or micro-offline gains, nor any comparison focused on skill gains during early learning trials (only at retest 5 min later). Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods than early learning. In fact, we reported the same findings for trials following the early learning period in our original 2019 paper (Bonstrup et al., 2019) (Author response image 4). Please note that we also reported that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later (Bonstrup et al., 2019) (see the Results section and further elaboration in the Discussion). We interpreted these findings as indicating that the mechanisms underlying offline gains over the micro-scale of seconds during early skill learning very likely differ from those operating over minutes or hours.

      In the recent preprint from Das et al. (2024), the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice between-subjects design, inspired by the reactive inhibition work from Rickard and others, to test this hypothesis.

      Crucially, their design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024). A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 5):

      Author response image 5.

      (A) Comparison of the Das et al. Spaced and Massed group training session designs with the training session design from the original Bönstrup et al. (2019) paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule; see gaps in the red shaded area) and (2) the overall amount of practice is much smaller than in the design from the original Bönstrup et al. (2019) report (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) are used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      Participants in the original Bönstrup et al. (2019) study experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 5). Thus, the overall amounts of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.

      In addition, the training interventions (i.e. – the practice schedule differences between the Spaced and Massed groups) were designed in a manner that minimized any chance of effectively testing their hypothesis. First, the interventions were applied over an extremely short period relative to the length of the total training session (5% and 12% of the total training session for Massed and Spaced groups, respectively; see gaps in the red shaded area in Author response image 5). Second, the intervention was applied during a period in which only half of the known total learning occurs. Specifically, we know from Bönstrup et al. (2019) that only 46.57% of the total performance gains occur in the practice interval covered by the Das et al. Training 1 intervention. Thus, early skill learning as evaluated by multiple groups (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024) is truncated to about half in the Das et al. experiment.

      Furthermore, a substantial amount of learning takes place during the Das et al. Test 1 and Test 2 periods (32.49% of total gains combined). The fact that substantial learning is known to occur over both the Test 1 (18.06%) and Test 2 (14.43%) intervals presents a fundamental problem described by Pan and Rickard (2015). They reported that averaging over intervals where substantial performance gains occur (i.e. – performance is not stable) injects crucial artefacts into analyses of skill learning:

      “A large amount of averaging has the advantage of yielding more precise estimates of each subject’s pretest and posttest scores and hence more statistical power to detect a performance gain. However, calculation of gain scores using that strategy runs the risk that learning that occurs during the pretest and (or) posttest periods (i.e., online learning) is incorporated into the gain score (Rickard et al., 2008; Robertson et al., 2004).”

      The above statement indicates that the Test 1 and Test 2 performance scores from Das et al. (2024) are substantially contaminated by the learning rate within these intervals. This is particularly problematic if the intervention design results in different Test 2 learning rates between the two groups. This is, in fact, apparent in their data (Figure 1C,E of the Das et al., 2024 preprint), as the Test 2 learning rate for the Spaced group is negative (indicating a unique interference effect observable only for this group). Specifically, the Massed group continues to show an increase in performance during Test 2 and 4 relative to the last 10 seconds of practice during Training 1 and 2, respectively, while the Spaced group displays a marked decrease. This post-training performance decrease for the Spaced group is in stark contrast to the monotonic performance increases observed for both groups at all other time-points. One possible cause could be related to the structure of the Test intervals, which include 20 seconds of uninterrupted practice. For the Spaced group, this effectively is a switch to a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), which interferes with the greater Training 1 interval gains observed for the Spaced group. Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (Figure 1E), the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      In summary, the experimental design and analyses used by Das et al. do not contradict the view that early skill learning is expressed as micro-offline gains during rest breaks. The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings, rather than contradictory. Still, they do highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized (Bonstrup et al., 2019; Pan & Rickard, 2015). Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Nonrepeating groups from Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. – memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found Figure 2B too small to be useful, as the actual elements of the cells are very hard to read.

      We have removed the grid colormap panel (top-right) from Figure 2B. All of this colormap data is actually a subset of data presented in Figure 2 – figure supplement 1, so can still be found there.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to the first point in my concerns, I would suggest the authors compare decoding accuracy between correct presses followed by correct vs. incorrect presses. This would clarify if the decoder is actually taking the MEG signal for subsequent press into account. I would also suggest the authors use pre-movement MEG features and post-movement features with shorter windows and compare each result with the results for the original post-movement MEG feature with a longer window.

      The present study does not contain enough errors to perform the analysis proposed by the Reviewer. As noted above, we did re-examine our data and now report a new control regression analysis, which indicates that the proximity between keypresses does not explain contextualization effects.

      (2) I was several times confused by the author's use of "neural representation of an action" or "sequence action representations" in understanding whether these terms refer to representation on the level of whole-brain, region (as defined by the specific parcellation used), or voxels. In fact, what is submitted to the decoder is some complicated whole-brain MEG feature (i.e., the "neural representation"), which is a hybrid of voxel and parcel features that is further dimension-reduced and not immediately interpretable. Clarifying this point early in the text and possibly using some more sensible terms, such as adding "brain-wise" before the "sequence action representation", would be the most helpful for the readers.

      We have now clarified this terminology in the revised manuscript.

      (3) Although comparing many different ways in feature selection/reduction, time window selection, and decoder types is undoubtedly a meticulous work, the current version of the manuscript seems still lacking some explanation about the details of these methodological choices, like which decoding method was actually used to report the accuracy, whether or not different decoding methods were chosen for individual participants' data, how training data was selected (is it all of the correct presses in Day 1 data?), whether the frequency power or signal amplitude was used, and so on. I would highly appreciate these additional details in the Methods section.

      The reported accuracies were based on a linear discriminant analysis (LDA) classifier. A comparison of different decoders (Figure 3 – figure supplement 4) shows that LDA was the optimal choice.

      Whether or not different decoding methods were chosen for individual participants' data

      We used the same decoder (LDA) for all participants when reporting the final accuracy.

      How training data was selected (is it all of the correct presses in Day 1 data?),

      Decoder training was conducted as a randomized split of the data (all correct keypresses of Day 1) into training (90%) and test (10%) samples for 8 iterations.
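
      As an illustration only, the repeated randomized 90%/10% split described above could be sketched as follows. The function name, the seed, and the toy sample count are assumptions made for this example; whether the split was stratified by keypress class is not stated in the response, so no stratification is applied here.

```python
import random


def repeated_random_splits(n_samples, test_fraction=0.10, n_iterations=8, seed=0):
    """Yield (train_indices, test_indices) for repeated random hold-out splits.

    Hypothetical helper: each iteration reshuffles all sample indices and
    holds out `test_fraction` of them for testing.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_iterations):
        order = list(range(n_samples))
        rng.shuffle(order)
        # First n_test shuffled indices form the test set, the rest train.
        yield sorted(order[n_test:]), sorted(order[:n_test])


# Toy usage: 200 correct keypresses (an arbitrary number for illustration).
splits = list(repeated_random_splits(n_samples=200))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

      In such a scheme, a decoder would be fit on each training subset and scored on the matching test subset, with the reported accuracy averaged over the 8 iterations.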

      Whether the frequency power or signal amplitude was used

      Signal amplitude was used for feature calculation.

      (4) In terms of the Methods, please consider adding some references about the 'F1 score', the 'feature importance score,' and the 'MRMR-based feature ranking,' as the main readers of the current paper would not be from the machine learning community. Also, why did the LDA dimensionality reduction reduce accuracy specifically for the voxel feature?

      We have now added the following statements to the Methods section that provide more detailed descriptions and references for these metrics:

      “The F1 score, defined as the harmonic mean of the precision (percentage of true predictions that are actually true positive) and recall (percentage of true positives that were correctly predicted as true) scores, was used as a comprehensive metric for all one-versus-all keypress state decoders to assess class-wise performance that accounts for both false-positive and false-negative prediction tendencies [REF]. A weighted mean F1 score was then computed across all classes to assess the overall prediction performance of the multi-class model.”

      and

      “Feature Importance Scores

      The relative contribution of source-space voxels and parcels to decoding performance (i.e. – feature importance score) was calculated using minimum redundant maximum relevance (MRMR) and highlighted in topography plots. MRMR, an approach that combines both relevance and redundancy metrics, ranked individual features based upon their significance to the target variable (i.e. – keypress state identity) prediction accuracy and their non-redundancy with other features.”
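
      To make the F1 definition quoted above concrete, here is a minimal sketch that computes one-versus-all F1 scores and a weighted mean for a toy 4-class problem. The function names and the toy labels are assumptions for illustration, and weighting each class by its number of true samples (its support) is assumed, since the quoted text does not specify the weights.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: (f1, support)} using one-versus-all counts."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Precision: fraction of predictions of this class that are correct.
        precision = tp / (tp + fp) if tp + fp else 0.0
        # Recall: fraction of true instances of this class that were found.
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, tp + fn)
    return scores


def weighted_mean_f1(scores):
    """Average the per-class F1 scores, weighted by class support."""
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Toy data: keypress labels 1-4, with label 4 occurring twice as often.
y_true = [4, 1, 3, 2, 4, 4, 1, 3, 2, 4]
y_pred = [4, 1, 3, 2, 4, 1, 1, 3, 4, 4]
scores = per_class_f1(y_true, y_pred, labels=[1, 2, 3, 4])
overall = weighted_mean_f1(scores)
print(scores, overall)
```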

      As stated in the Reviewer responses above, the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much lower-dimensional space (number of classes-1; e.g. – 3 dimensions for 4-class keypress decoding). It is likely that the reduction in accuracy observed only for the voxel-space feature was due to the loss of relevant information during this mapping process. This reduction in accuracy for voxel-space decoding was specific to LDA. Figure 3 – figure supplement 3 shows that voxel-space decoder performance actually improved when utilizing alternative dimensionality reduction techniques.
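
      The dimensionality limit mentioned above follows from the rank of the between-class scatter matrix. As a brief sketch (the symbols $S_B$, $\mu_c$, $\mu$, $n_c$, $C$ and $p$ are notation introduced here for illustration and are not taken from the manuscript):

```latex
S_B = \sum_{c=1}^{C} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
\sum_{c=1}^{C} n_c\,(\mu_c - \mu) = 0
\;\Longrightarrow\;
\operatorname{rank}(S_B) \le C - 1 .
```

      With $C = 4$ keypress classes and $p = 15684$ voxel features, the LDA projection therefore has at most $\min(C-1,\,p) = 3$ dimensions.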

      (5) Paragraph 9, lines #139-142: "Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed at different learning state or sequence context locations."

      This does not seem to be a fair comparison, as the index finger appears twice as often as the other fingers do in the sequence. To claim this, proper statistical analysis needs to be done taking this difference into account.

      We thank the Reviewer for bringing this issue to our attention. We have now corrected this comparison to evaluate relative false negative and false positive rates between individual keypress state decoders, and have revised this statement in the manuscript as follows:

      “Notably, decoding of index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest false negative (0.116 per keypress) and false positive (0.043 per keypress) misclassification rates compared with all other digits (false negative rate range = [0.067 0.114]; false positive rate range = [0.020 0.037]; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed within different contexts (i.e. - different learning states or sequence locations).”
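
      As a rough sketch of why rates are preferable to raw error counts here, the example below derives per-class false negative and false positive rates from a confusion matrix in which one class occurs twice as often as the others. The matrix values are invented for illustration, and the normalization (false negatives divided by the number of true instances of the class, false positives divided by the number of instances of all other classes) is an assumption; the exact normalization behind the "per keypress" values in the revised manuscript text is not stated in this response.

```python
def misclassification_rates(confusion, labels):
    """Return {label: (false_negative_rate, false_positive_rate)}.

    confusion[i][j] is the number of keypresses whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, label in enumerate(labels):
        n_true = sum(confusion[k])                    # true instances of class k
        tp = confusion[k][k]
        fn = n_true - tp                              # class k missed
        fp = sum(row[k] for row in confusion) - tp    # others labelled as k
        rates[label] = (fn / n_true, fp / (total - n_true))
    return rates


# Invented toy matrix: class 4 occurs twice as often as classes 1-3.
labels = [1, 2, 3, 4]
confusion = [
    [45, 2, 1, 2],
    [1, 46, 2, 1],
    [2, 1, 45, 2],
    [4, 3, 5, 88],
]
rates = misclassification_rates(confusion, labels)
for label, (fn_rate, fp_rate) in rates.items():
    print(label, round(fn_rate, 3), round(fp_rate, 3))
```

      In this toy matrix, class 4 produces the most raw errors (12 misses) simply because it is sampled more often; dividing by the number of opportunities makes the classes directly comparable.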

      (6) Finally, the authors could consider acknowledging in the Discussion that the contribution of micro-offline learning to genuine skill learning is still under debate (e.g., Gupta and Rickard, 2023; 2024; Das et al., bioRxiv, 2024).

      We have added a paragraph in the Discussion that addresses this point.

      Reviewer #3 (Recommendations for the authors):

      In addition to the additional analyses suggested in the public review, I have the following suggestions/questions:

      (1) Given that the authors introduce a new decoding approach, it would be very helpful for readers to see a distribution of window sizes and window onsets eventually used across individuals, at least for the optimized decoder.

      We have now included a new supplemental figure (Figure 4 – figure supplement 2) that provides this information.

      (2) Please explain in detail how you arrived at the (interpolated?) group-level plot shown in Figure 1B, starting from the discrete single-trial keypress transition times. Also, please specify what the shading shows.

      Instantaneous correct sequence speed (skill measure) was quantified as the inverse of time (in seconds) required to complete a single iteration of a correctly generated full 5-item sequence. Individual keypress responses were labeled as members of correct sequences if they occurred within a 5-item response pattern matching any possible circular shifts of the 5-item sequence displayed on the monitor (41324). This approach allowed us to quantify a measure of skill within each practice trial at the resolution of individual keypresses. The dark line indicates the group mean performance dynamics for each trial. The shaded region indicates the 95% confidence limit of the mean (see Methods).
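
      A minimal sketch of the labeling and speed computation described above is given below. It is not the analysis code used in the study: the function names and toy timestamps are assumptions, and the duration of one iteration is taken here as the time between the first and fifth keypress of each matching 5-item window, which is only one possible reading of the description.

```python
SEQUENCE = "41324"


def circular_shifts(sequence):
    """All rotations of the target sequence, e.g. 41324 -> 13244, 32441, ..."""
    return {sequence[i:] + sequence[:i] for i in range(len(sequence))}


def correct_sequence_speed(keys, times, sequence=SEQUENCE):
    """Return (time, speed) samples, one per correct 5-item window.

    keys  : string of pressed keys, e.g. "4132441324"
    times : keypress timestamps in seconds, same length as keys
    Speed is the inverse of the window duration (sequences per second).
    """
    valid = circular_shifts(sequence)
    n = len(sequence)
    samples = []
    for start in range(len(keys) - n + 1):
        window = keys[start:start + n]
        if window in valid:
            duration = times[start + n - 1] - times[start]
            samples.append((times[start + n - 1], 1.0 / duration))
    return samples


# Toy trial: two error-free iterations, the second one typed faster.
keys = "4132441324"
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
samples = correct_sequence_speed(keys, times)

# Toy trial with an error (2 pressed instead of 3 in the second iteration).
samples_with_error = correct_sequence_speed("4132441224", times)
print(len(samples), len(samples_with_error))
```

      Because every keypress that completes a valid window yields a new sample, the measure is updated at the resolution of individual keypresses rather than once per completed sequence.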

      (3) Similarly, please explain how you arrived at the group-level plot shown in Figure 1C. What are the different colored lines (rows) within each trial? How exactly did the authors reach the conclusion that KTT variability stabilizes by trial 6?

      Figure 1C provides additional information to the correct sequence speed measure above, as it also tracks individual transition speed composition over learning. Figure 1C, thus, represents both changes in overall correct sequence speed dynamics (indicated by the overall narrowing of the horizontal speed lines moving from top to bottom) and the underlying composition of the individual transition patterns within and across trials. The coloring of the lines is a shading convention used to discriminate between different keypress transitions. These curves were sampled with 1ms resolution, as in Figure 1B. Addressing the underlying keypress transition patterns requires within-subject normalization before averaging across subjects. The distribution of KTTs was normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning.

      (4) Maybe I missed it, but it was not clear to me which of the tested classifiers was eventually used. Or was that individualized as well? More generally, a comparison of the different classifiers would be helpful, similar to the comparison of dimension reduction techniques.

      We have now included a new supplemental figure that provides this information.

      (5) Please add df and effect sizes to all statistics.

      Done.

      (6) Please explain in more detail your power calculation.

      A power analysis was performed to determine the minimum sample size needed to detect a significant change in skill performance following training using a one-sample t-test (two-sided; alpha = 0.05; 95% statistical power; Cohen’s d effect size = 0.8115, calculated from previously acquired data in our lab). The calculated minimum sample size was 22. The included study sample size (n = 27) exceeded this minimum.

      This information is now included in the revised manuscript.
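
      For readers who wish to check the order of magnitude of this calculation, the sketch below uses a normal approximation with a standard small-sample correction term for a two-sided one-sample t-test. This is an approximation chosen for illustration and not necessarily the procedure used by the authors; the function name is hypothetical, and an exact calculation would use the noncentral t distribution.

```python
import math
from statistics import NormalDist


def approx_n_one_sample_t(effect_size, alpha=0.05, power=0.95):
    """Approximate minimum n for a two-sided one-sample t-test.

    n ~ ((z_{1-alpha/2} + z_{power}) / d)^2 + z_{1-alpha/2}^2 / 2
    The second term is a small-sample correction to the pure normal
    approximation.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = ((z_alpha + z_power) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


n_min = approx_n_one_sample_t(effect_size=0.8115)
print(n_min)
```

      With the parameters listed above, this approximation gives a minimum sample size consistent with the reported value of 22.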

      (7) The cut-off for the high-pass filter is unusually high and seems risky in terms of potential signal distortions (de Cheveigne, Neuron 2019). Why did the authors choose such a high cut-off?

      The 1 Hz high-pass cut-off frequency for the 1-150 Hz band-pass filter applied to the continuous raw MEG data during preprocessing has been used in multiple previous MEG publications (Barratt et al., 2018; Brookes et al., 2012; Higgins et al., 2021; Seedat et al., 2020; Vidaurre et al., 2018).

      (8) "Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not", lines 336/337 - where is that analysis?

      Additional details pertaining to this analysis are now provided in the Results section (Figure 5 – figure supplement 4).

      (9) How were feature importance scores computed?

      We have now added a new subheading in the Methods section with a more detailed description of how feature importance scores were computed.

      (10) Please add x and y ticks plus tick labels to Figure 5 - Figure Supplement 3, panel A

      Done.

      (11) Line 369, what does "comparable" mean in this context?

      The sentence in the “Study Participants” part of the Methods section referred to here has now been revised for clarity.

      (12) In lines 496/497, please specify what t=0 means (KeyDown event, I guess?).

      Yes, the KeyDown event occurs at t = 0. This has now been clarified in the revised manuscript.

      (13) Please specify consistent boundaries between alpha- and beta-bands (they are currently not consistent in the Results vs. Methods (14/15 Hz or 15/16 Hz)).

      We thank the Reviewer for alerting us to this discrepancy caused by a typographic error in the Methods. We have now corrected this so that the alpha (8-14 Hz) and beta-band (15-24 Hz) frequency limits are described consistently throughout the revised manuscript.

      References

      Albouy, G., Fogel, S., King, B. R., Laventure, S., Benali, H., Karni, A., Carrier, J., Robertson, E. M., & Doyon, J. (2015). Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage, 108, 423-434. https://doi.org/10.1016/j.neuroimage.2014.12.049

      Albouy, G., King, B. R., Maquet, P., & Doyon, J. (2013). Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus, 23(11), 985-1004. https://doi.org/10.1002/hipo.22183

      Albouy, G., Sterpenich, V., Vandewalle, G., Darsaud, A., Gais, S., Rauchs, G., Desseilles, M., Boly, M., Dang-Vu, T., Balteau, E., Degueldre, C., Phillips, C., Luxen, A., & Maquet, P. (2012). Neural correlates of performance variability during motor sequence acquisition. NeuroImage, 60(1), 324-331. https://doi.org/10.1016/j.neuroimage.2011.12.049

      Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annu Rev Neurosci, 25, 189-220. https://doi.org/10.1146/annurev.neuro.25.112701.142922

      Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences. Curr Opin Neurobiol, 16(2), 213-221. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16563734

      Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W., & Donoghue, J. P. (2011). Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol, 105(4), 1603-1619. https://doi.org/10.1152/jn.00532.2010

      Barratt, E. L., Francis, S. T., Morris, P. G., & Brookes, M. J. (2018). Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage, 181, 831-844. https://doi.org/10.1016/j.neuroimage.2018.06.041

      Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T. (2011). Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A, 108(18), 7641-7646. https://doi.org/10.1073/pnas.1018985108

      Battaglia-Mayer, A., & Caminiti, R. (2019). Corticocortical Systems Underlying High-Order Motor Control. J Neurosci, 39(23), 4404-4421. https://doi.org/10.1523/JNEUROSCI.2094-18.2019

      Berlot, E., Popp, N. J., & Diedrichsen, J. (2020). A critical re-evaluation of fMRI signatures of motor sequence learning. Elife, 9. https://doi.org/10.7554/eLife.55241

      Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N., & Cohen, L. G. (2020). Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn, 5, 7. https://doi.org/10.1038/s41539-020-0066-9

      Bonstrup, M., Iturrate, I., Thompson, R., Cruciani, G., Censor, N., & Cohen, L. G. (2019). A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 29(8), 1346-1351 e1344. https://doi.org/10.1016/j.cub.2019.02.049

      Brawn, T. P., Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2010). Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci, 30(42), 13977-13982. https://doi.org/10.1523/JNEUROSCI.3295-10.2010

      Brookes, M. J., Woolrich, M. W., & Barnes, G. R. (2012). Measuring functional connectivity in MEG: a multivariate approach insensitive to linear source leakage. NeuroImage, 63(2), 910-920. https://doi.org/10.1016/j.neuroimage.2012.03.048

      Brooks, E., Wallis, S., Hendrikse, J., & Coxon, J. (2024). Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn, 9(1), 23. https://doi.org/10.1038/s41539-024-00238-6

      Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M., & Cohen, L. G. (2021). Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 35(10), 109193. https://doi.org/10.1016/j.celrep.2021.109193

      Buneo, C. A., & Andersen, R. A. (2006). The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44(13), 2594-2606. https://doi.org/10.1016/j.neuropsychologia.2005.10.011

      Buzsaki, G. (2015). Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. https://doi.org/10.1002/hipo.22488

      Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H., & Staresina, B. P. (2024). Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680. https://doi.org/10.1101/2024.10.06.614680

      Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., & Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, 487(7405), 51-56. https://doi.org/10.1038/nature11129

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol, 79(2), 1117-1123. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=9463469

      Colclough, G. L., Brookes, M. J., Smith, S. M., & Woolrich, M. W. (2015). A symmetric multivariate leakage correction for MEG connectomes. NeuroImage, 117, 439-448. https://doi.org/10.1016/j.neuroimage.2015.03.071

      Colclough, G. L., Woolrich, M. W., Tewarie, P. K., Brookes, M. J., Quinn, A. J., & Smith, S. M. (2016). How reliable are MEG resting-state connectivity metrics? NeuroImage, 138, 284-293. https://doi.org/10.1016/j.neuroimage.2016.05.070

      Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P., & Azanon, E. (2024). “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795. https://doi.org/10.1101/2024.07.11.602795

      Deleglise, A., Donnelly-Kehoe, P. A., Yeffal, A., Jacobacci, F., Jovicich, J., Amaro, E., Jr., Armony, J. L., Doyon, J., & Della-Maggiore, V. (2023). Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex, 33(10), 6120-6131. https://doi.org/10.1093/cercor/bhac489

      Doyon, J., Bellec, P., Amsel, R., Penhune, V., Monchi, O., Carrier, J., Lehéricy, S., & Benali, H. (2009). Contributions of the basal ganglia and functionally related brain structures to motor learning. [Review]. Behavioural brain research, 199(1), 61-75. https://doi.org/10.1016/j.bbr.2008.11.012

      Doyon, J., Song, A. W., Karni, A., Lalonde, F., Adams, M. M., & Ungerleider, L. G. (2002). Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A, 99(2), 1017-1022. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11805340

      Euston, D. R., Gruber, A. J., & McNaughton, B. L. (2012). The role of medial prefrontal cortex in memory and decision making. Neuron, 76(6), 1057-1070. https://doi.org/10.1016/j.neuron.2012.12.002

      Euston, D. R., Tatsuno, M., & McNaughton, B. L. (2007). Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science, 318(5853), 1147-1150. https://doi.org/10.1126/science.1148979

      Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E., & Slutzky, M. W. (2012). Local field potentials allow accurate decoding of muscle activity. J Neurophysiol, 108(1), 18-24. https://doi.org/10.1152/jn.00832.2011

      Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nat Rev Neurosci, 6(2), 119-130. https://doi.org/10.1038/nrn1607

      Gais, S., Albouy, G., Boly, M., Dang-Vu, T. T., Darsaud, A., Desseilles, M., Rauchs, G., Schabus, M., Sterpenich, V., Vandewalle, G., Maquet, P., & Peigneux, P. (2007). Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A, 104(47), 18778-18783. https://doi.org/10.1073/pnas.0705454104

      Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E. (1992). Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci, 12(7), 2542-2548.

      Grover, S., Wen, W., Viswanathan, V., Gill, C. T., & Reinhart, R. M. G. (2022). Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci, 25(9), 1237-1246. https://doi.org/10.1038/s41593-022-01132-3

      Gupta, M. W., & Rickard, T. C. (2022). Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn, 7(1), 25. https://doi.org/10.1038/s41539-022-00140-z

      Gupta, M. W., & Rickard, T. C. (2024). Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 14(1), 4661. https://doi.org/10.1038/s41598-024-52726-9

      Hardwick, R. M., Rottschy, C., Miall, R. C., & Eickhoff, S. B. (2013). A quantitative meta-analysis and review of motor learning in the human brain. NeuroImage, 67, 283-297. https://doi.org/10.1016/j.neuroimage.2012.11.020

      Heusser, A. C., Poeppel, D., Ezzyat, Y., & Davachi, L. (2016). Episodic sequence memory is supported by a theta-gamma phase code. Nat Neurosci, 19(10), 1374-1380. https://doi.org/10.1038/nn.4374

      Higgins, C., Liu, Y., Vidaurre, D., Kurth-Nelson, Z., Dolan, R., Behrens, T., & Woolrich, M. (2021). Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron, 109(5), 882-893 e887. https://doi.org/10.1016/j.neuron.2020.12.007

      Hikosaka, O., Nakamura, K., Sakai, K., & Nakahara, H. (2002). Central mechanisms of motor skill learning. Curr Opin Neurobiol, 12(2), 217-222. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12015240

      Jacobacci, F., Armony, J. L., Yeffal, A., Lerner, G., Amaro, E., Jr., Jovicich, J., Doyon, J., & Della-Maggiore, V. (2020). Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A, 117(38), 23898-23903. https://doi.org/10.1073/pnas.2009576117

      Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995). Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature, 377(6545), 155-158. https://doi.org/10.1038/377155a0

      Kennerley, S. W., Sakai, K., & Rushworth, M. F. (2004). Organization of action sequences and the role of the pre-SMA. J Neurophysiol, 91(2), 978-993. https://doi.org/10.1152/jn.00651.2003

      Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol, 80, 3321-3325.

      Kornysheva, K., Bush, D., Meyer, S. S., Sadnicka, A., Barnes, G., & Burgess, N. (2019). Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron, 101(6), 1166-1180 e1163. https://doi.org/10.1016/j.neuron.2019.01.018

      Lee, S. H., Jin, S. H., & An, J. (2019). The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 9(1), 14066. https://doi.org/10.1038/s41598-019-50644-9

      Lisman, J. E., & Jensen, O. (2013). The theta-gamma neural code. Neuron, 77(6), 1002-1016. https://doi.org/10.1016/j.neuron.2013.03.007

      Mollazadeh, M., Aggarwal, V., Davidson, A. G., Law, A. J., Thakor, N. V., & Schieber, M. H. (2011). Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci, 31(43), 15531-15543. https://doi.org/10.1523/JNEUROSCI.2999-11.2011

      Molle, M., & Born, J. (2009). Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron, 61(4), 496-498. https://doi.org/10.1016/j.neuron.2009.02.002

      Morris, R. G. M. (2006). Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. [Review]. The European journal of neuroscience, 23(11), 2829-2846. https://doi.org/10.1111/j.1460-9568.2006.04888.x

      Mylonas, D., Schapiro, A. C., Verfaellie, M., Baxter, B., Vangel, M., Stickgold, R., & Manoach, D. S. (2024). Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci, 44(14). https://doi.org/10.1523/JNEUROSCI.1839-23.2024

      Pan, S. C., & Rickard, T. C. (2015). Sleep and motor learning: Is there room for consolidation? Psychol Bull, 141(4), 812-834. https://doi.org/10.1037/bul0000009

      Penhune, V. B., & Steele, C. J. (2012). Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res., 226(2), 579-591. https://doi.org/10.1016/j.bbr.2011.09.044

      Qin, Y. L., McNaughton, B. L., Skaggs, W. E., & Barnes, C. A. (1997). Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci, 352(1360), 1525-1533. https://doi.org/10.1098/rstb.1997.0139

      Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn, 34(4), 834-842. https://doi.org/10.1037/0278-7393.34.4.834

      Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedural consolidation. Nat Rev Neurosci, 5(7), 576-582. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15208699

      Sawamura, D., Sakuraba, S., Suzuki, Y., Asano, M., Yoshida, S., Honke, T., Kimura, M., Iwase, Y., Horimoto, Y., Yoshida, K., & Sakai, S. (2019). Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep, 9(1), 20397. https://doi.org/10.1038/s41598-019-56956-0

      Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37(6), 1013-1025. https://doi.org/10.1016/s0896-6273(03)00123-5

      Seedat, Z. A., Quinn, A. J., Vidaurre, D., Liuzzi, L., Gascoyne, L. E., Hunt, B. A. E., O'Neill, G. C., Pakenham, D. O., Mullinger, K. J., Morris, P. G., Woolrich, M. W., & Brookes, M. J. (2020). The role of transient spectral 'bursts' in functional connectivity: A magnetoencephalography study. NeuroImage, 209, 116537. https://doi.org/10.1016/j.neuroimage.2020.116537

      Shadmehr, R., & Holcomb, H. H. (1997). Neural correlates of motor memory consolidation. Science, 277, 821-824.

      Sjøgård, M., Baxter, B., Mylonas, D., Driscoll, B., Kwok, K., Tolosa, A., Thompson, M., Stickgold, R., Vangel, M., Chu, C., & Manoach, D. S. (2024). Hippocampal ripples mediate motor learning during brief rest breaks in humans. bioRxiv. https://doi.org/10.1101/2024.05.02.592200

      Srinivas, S., Sarvadevabhatla, R. K., Mopuri, K. R., Prabhu, N., Kruthiventi, S. S. S., & Babu, R. V. (2016). A Taxonomy of Deep Convolutional Neural Nets for Computer Vision [Technology Report]. Frontiers in Robotics and AI, 2. https://doi.org/10.3389/frobt.2015.00036

      Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Desseilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P. (2009). Sleep promotes the neural reorganization of remote emotional memory. J Neurosci, 29(16), 5143-5152. https://doi.org/10.1523/JNEUROSCI.0561-09.2009

      Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage, 14(5), 1048-1057. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11697936

      Toni, I., Thoenissen, D., & Zilles, K. (2001). Movement preparation and motor intention. NeuroImage, 14(1 Pt 2), S110-117. https://doi.org/10.1006/nimg.2001.0841

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      van Kesteren, M. T., Fernandez, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A, 107(16), 7550-7555. https://doi.org/10.1073/pnas.0914892107

      van Kesteren, M. T., Ruiter, D. J., Fernandez, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends Neurosci, 35(4), 211-219. https://doi.org/10.1016/j.tins.2012.02.001

      Vidaurre, D., Hunt, L. T., Quinn, A. J., Hunt, B. A. E., Brookes, M. J., Nobre, A. C., & Woolrich, M. W. (2018). Spontaneous cortical activity transiently organises into frequency specific phase-coupling networks. Nat Commun, 9(1), 2987. https://doi.org/10.1038/s41467-018-05316-z

      Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. [Comment]. Science (New York, N.Y.), 281(5380), 1188-1191. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=9712582&retmode=ref&cmd=prlinks

      Wolpert, D. M., Goodbody, S. J., & Husain, M. (1998). Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci, 1(6), 529-533. https://doi.org/10.1038/2245

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors performed experimental evolution of MreB mutants that have a slow-growing round phenotype and studied the subsequent evolutionary trajectory using analysis tools from molecular biology. It was remarkable and interesting that they found that the original phenotype was not restored (most common in these studies) but that the round phenotype was maintained. 

      Strengths: 

      The finding that the round phenotype was maintained during evolution rather than that the original phenotype, rod-shaped cells, was recovered is interesting. The paper extensively investigates what happens during adaptation with various different techniques. Also, the extensive discussion of the findings at the end of the paper is well thought through and insightful. 

      Weaknesses: 

      I find there are three general weaknesses: 

      (1) Although the paper states in the abstract that it emphasizes "new knowledge to be gained" it remains unclear what this concretely is. On page 4 they state three research questions; these could be more extensively discussed in the abstract. Also, these questions read more like genetics questions while the paper is a lot about cell biological findings. 

      Thank you for drawing attention to the unnecessary and gratuitous nature of the last sentence of the Abstract. We are in agreement. It has been modified, and we have taken  advantage of additional word space to draw attention to the importance of the two competing (testable) hypotheses laid out in the Discussion. 

      As to new knowledge, please see the Results and particularly the Discussion. But beyond this, and as recognised by others, there is real value for cell biology in seeing how (and whether) selection can compensate for effects that are deleterious to fitness. The results will very often depart from those delivered from, for example, suppressor analyses, or bottom-up engineering. 

      In the work recounted in our paper, we chose to focus – by way of proof-of-principle – on the most commonly observed mutations, namely, those within pbp1A. But beyond this gene, we detected mutations in other components of the cell shape / division machinery whose connections are not yet understood and which are the focus of on-going investigation.

      As to the three questions posed at the end of the Introduction, the first concerns whether selection can compensate for deleterious effects of deleting mreB (a question that pertains to evolutionary aspects); the second seeks understanding of genetic factors; the third aims to shed light on the genotype-to-phenotype map (which is where the cell biology comes into play). Given space restrictions, we cannot see how we could usefully expand, let alone discuss, these three questions in the restrictive space available in the Abstract.

      (2) It is not clear to me from the text what we already know about the restoration of MreB loss from suppressor studies (in the literature). Are there suppressor screens in the literature and which part of the findings is consistent with suppressor screens and which parts are new knowledge?

      As stated in the Introduction, in a previous study with B. subtilis (which harbours three MreB isoforms and where the isoform named “MreB” is essential for growth under normal conditions), suppressors of MreB lethality were found to occur in ponA, which encodes a class A penicillin binding protein (Kawai et al., 2009). This led to recognition that MreB plays a role in recruiting Pbp1A to the lateral cell wall. On the other hand, Patel et al. (2020) have shown that deletion of class A PBPs leads to an up-regulation of rod complex activity. Although there is a connection between rod complex and class A PBPs, a further study has shown that the two systems work semi-autonomously (Cho et al., 2016). 

      Our work confirms a connection between MreB and Pbp1A, and sheds new light on how this interaction is established by natural selection, which targets the integrity of the cell wall. Indeed, the Rod complex and class A PBPs have complementary activities in building the cell wall, with each of the two systems able to compensate for the other in order to maintain cell wall integrity. Please see the major part of the Discussion. In terms of specifics, the connection between mreB and pbp1A shown by Kawai et al. (2009) is indirect because it is based on extragenic transposon insertions. In our study, the genetic connection is mechanistically demonstrated. In addition, we show that the evolutionary dynamics are rapid, and we have enriched understanding of the genotype-to-phenotype map.

      (3) The clarity of the figures, captions, and data quantification need to be improved.  

      Modifications have been implemented. Please see responses to specific queries listed below.

      Reviewer #2 (Public Review): 

      Yulo et al. show that deletion of MreB causes reduced fitness in P. fluorescens SBW25 and that this reduction in fitness may be primarily caused by alterations in cell volume. To understand the effect of cell volume on proliferation, they performed an evolution experiment through which they predominantly obtained mutations in pbp1A that decreased cell volume and increased viability. Furthermore, they provide evidence to propose that the pbp1A mutants may have decreased PG cross-linking which might have helped in restoring the fitness by rectifying the disorganised PG synthesis caused by the absence of MreB. Overall this is an interesting study. 

      Queries: 

      Do the small cells of mreB null background indeed have no DNA? It is not apparent from the DAPI images presented in Supplementary Figure 17. A more detailed analysis will help to support this claim. 

      It is entirely possible that small cells have no DNA, because if cell division is aberrant then division can occur prior to DNA segregation, resulting in cells with no DNA. It is clear from microscopic observation that both small and large cells do not divide. It is, however, true that we are unable to state – given our measures of DNA content – that small cells have no DNA. We have made this clear on page 13, paragraph 2.

      What happens to viability and cell morphology when pbp1A is removed in the mreB null background? If it is actually a decrease in pbp1A activity that leads to the rescue, then pbp1A- mreB- cells should have better viability, reduced cell volume and organised PG synthesis. Especially as the PG cross-linking is almost at the same level as the T362 or D484 mutant.  

      Please see fitness data in Supp. Fig. 13. Fitness of ∆mreBpbp1A is no different to that caused by a point mutation. Cells remain round.  

      What is the status of PG cross-linking in ΔmreB Δpflu4921-4925 (Line 7)? 

      This was not analysed as the focus of this experiment was PBPs. A priori, there is no obvious reason to suspect that ∆4921-25 (which lacks oprD) would be affected in PBP activity.

      What is the morphology of the cells in Line 2 and Line 5? It may be interesting to see if PG cross-linking and cell wall synthesis is also altered in the cells from these lines. 

      The focus of investigation was restricted to L1, L4 and L7. Indeed, it would be interesting to look at the mutants harbouring mutations in ftsZ, but this is beyond the scope of the present investigation (but is on-going). The morphology of L2 and L5 is shown in Supp. Fig. 9.

      The data presented in 4B should be quantified with appropriate input controls. 

      Band intensity has now been quantified (see new Supp. Fig. 20). The controls are SBW25, SBW25∆pbp1A, SBW25 ∆mreB and SBW25 ∆mreBpbp1A as explained in the paper.

      What are the statistical analyses used in 4A and what is the significance value? 

      Our oversight. These were reported in Supp. Fig. 19, but should also have been presented in Fig. 4A. Data are means of three biological replicates. The statistical tests are comparisons between each mutant and SBW25, and assessed by paired t-tests.  

      A more rigorous statistical analysis indicating the number of replicates should be done throughout. 

      We have checked and made additions where necessary and where previously lacking. In particular, details are provided in Fig. 1E, Fig. 4A and Fig. 4B. For Fig. 4C we have produced quantitative measures of heterogeneity in new cell wall insertion. These are reported in Supp. Fig. 21 (and referred to in the text and figure caption) and show that patterns of cell wall insertion in ∆mreB are highly heterogeneous.

      Reviewer #3 (Public Review): 

      This paper addresses an understudied problem in microbiology: the evolution of bacterial cell shape. Bacterial cells can take a range of forms, among the most common being rods and spheres. The consensus view is that rods are the ancestral form and spheres the derived form. The molecular machinery governing these different shapes is fairly well understood but the evolutionary drivers responsible for the transition between rods and spheres are not. Enter Yulo et al.'s work. The authors start by noting that deletion of a highly conserved gene called MreB in the Gram-negative bacterium Pseudomonas fluorescens reduces fitness but does not kill the cell (as happens in other species like E. coli and B. subtilis) and causes cells to become spherical rather than their normal rod shape. They then ask whether evolution for 1000 generations restores the rod shape of these cells when propagated in a rich, benign medium. 

      The answer is no. The evolved lineages recovered fitness by the end of the experiment, growing just as well as the unevolved rod-shaped ancestor, but remained spherical. The authors provide an impressively detailed investigation of the genetic and molecular changes that evolved. Their leading results are: 

      (1) The loss of fitness associated with MreB deletion causes high variation in cell volume among sibling cells after cell division. 

      (2) Fitness recovery is largely driven by a single, loss-of-function point mutation that evolves within the first ~250 generations that reduces the variability in cell volume among siblings. 

      (3) The main route to restoring fitness and reducing variability involves loss of function mutations causing a reduction of TPase and peptidoglycan cross-linking, leading to a disorganized cell wall architecture characteristic of spherical cells. 

      The inferences made in this paper are on the whole well supported by the data. The authors provide a uniquely comprehensive account of how a key genetic change leads to gains in fitness and the spectrum of phenotypes that are impacted and provide insight into the molecular mechanisms underlying models of cell shape. 

      Suggested improvements and clarifications include: 

      (1) A schematic of the molecular interactions governing cell wall formation could be useful in the introduction to help orient readers less familiar with the current state of knowledge and key molecular players. 

      We understand that this would be desirable, but there are numerous recent reviews with detailed schematics that we think the interested reader would be better consulting. These are referenced in the text.

      (2) More detail on the bioinformatics approaches to assembling genomes and identifying the key compensatory mutations are needed, particularly in the methods section. This whole subject remains something of an art, with many different tools used. Specifying these tools, and the parameter settings used, will improve transparency and reproducibility, should it be needed. 

      We overlooked providing this detail, which has now been corrected by provision of more information in the Materials and Methods. In short, we used breseq, the clonal option, with default parameters. Additional analyses were conducted using Geneious. The breseq output files (which include all read data) are provided at https://doi.org/10.17617/3.CU5SX1.

      (3) Corrections for multiple comparisons should be used and reported whenever more than one construct or strain is compared to the common ancestor, as in Supplementary Figure 19A (relative PG density of different constructs versus the SBW25 ancestor). 

      The data presented in Supp Fig 19A (and Fig 4A) do not involve multiple comparisons. In each instance the comparison is between SBW25 and each of the different mutants. A paired t-test is thus appropriate.

      (4) The authors refrain from making strong claims about the nature of selection on cell shape, perhaps because their main interest is the molecular mechanisms responsible. However, I think more can be said on the evolutionary side, along two lines. First, they have good evidence that cell volume is a trait under strong stabilizing selection, with cells of intermediate volume having the highest fitness. This is notable because there are rather few examples of stabilizing selection where the underlying mechanisms responsible are so well characterized. Second, this paper succeeds in providing an explanation for how spherical cells can readily evolve from a rod-shaped ancestor but leaves open how rods evolved in the first place. Can the authors speculate as to how the complex, coordinated system leading to rods first evolved? Or why not all cells have lost rod shape and become spherical, if it is so easy to achieve? These are important evolutionary questions that remain unaddressed. The manuscript could be improved by at least flagging these as unanswered questions deserving of further attention. 

      These are interesting points, but our capacity to comment is entirely speculative. Nonetheless, we have added an additional paragraph to the Discussion that expresses an opinion that has yet to receive attention:

      “Given the complexity of the cell wall synthesis machinery that defines rod-shape in bacteria, it is hard to imagine how rods could have evolved prior to cocci. However, the cylindrical shape offers a number of advantages. For a given biomass (or cell volume), shape determines surface area of the cell envelope, which is the smallest surface area associated with the spherical shape. As shape sets the surface/volume ratio, it also determines the ratio between supply (proportional to the surface) and demand (proportional to cell volume). From this point of view, it is more efficient to be cylindrical (Young 2006). This also holds for surface attachment and biofilm formation (Young 2006). But above all, for growing cells, the ratio between supply and demand is constant in rod shaped bacteria, whereas it decreases for cocci. This requires that spherical cells evolve complex regulatory networks capable of maintaining the correct concentration of cellular proteins despite changes in surface/volume ratio. From this point of view, rod-shaped bacteria offer opportunities to develop unsophisticated regulatory networks.”

      why not all cells have lost rod shape and become spherical.

      Please see Kevin Young’s 2006 review on the adaptive significance of cell shape.

      The value of this paper stems both from the insight it provides on the underlying molecular model for cell shape and from what it reveals about some key features of the evolutionary process. The paper, as it currently stands, provides more on which to chew for the molecular side than the evolutionary side. It provides valuable insights into the molecular architecture of how cells grow and what governs their shape. The evolutionary phenomena emphasized by the authors - the importance of loss-of-function mutations in driving rapid compensatory fitness gains and that multiple genetic and molecular routes to high fitness are often available, even in the relatively short time frame of a few hundred generations - are well-understood phenomena and so arguably of less broad interest. The more compelling evolutionary questions concern the nature and cause of stabilizing selection (in this case cell volume) and the evolution of complexity. The paper misses an opportunity to highlight the former and, while claiming to shed light on the latter, provides rather little useful insight. 

      Thank you for these thoughts and comments. However, we disagree that the experimental results are an overlooked opportunity to discuss stabilising selection. Stabilising selection occurs when selection favours a particular phenotype causing a reduction in underpinning population-level genetic diversity. This is not happening when selection acts on SBW25 ∆mreB leading to a restoration of fitness. Driving the response are biophysical factors, primarily the critical need to balance elongation rate with rate of septation. This occurs without any change in underlying genetic diversity.  

      Recommendations for the authors:  

      Reviewer 1 (Recommendations for the Authors): 

      Hereby my suggestion for improvement of the quantification of the data, the figures, and the text. 

      -  p 14, what is the unit of elongation rate?  

      At first mention we have made clear that the unit is given in minutes^-1

      -  p 14, please give an error bar for both p=0.85 and f=0.77, to be able to conclude they are different 

      The error on the probability p is estimated as the half-width of the 95% confidence interval, given by the formula 1.96 × √(p(1 − p)/N), where N is the total number of cells. This has been added to the paragraph on the probability p in the Image Analysis section of the Materials and Methods. 

      We also added errors on p measurement in the main text.
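
      To make the error estimate concrete, here is a minimal sketch of a 95% confidence interval on a proportion built from the factor 1.96 and the total cell count N. It assumes the usual normal-approximation (Wald) form, 1.96 × sqrt(p(1 − p)/N); the function name and the cell counts are hypothetical and are not taken from the paper. Only the two values 0.85 and 0.77 come from the reviewer's query.

```python
import math


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    Assumption: the interval has the form p +/- z * sqrt(p * (1 - p) / n),
    with z = 1.96 for 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return z * math.sqrt(p * (1.0 - p) / n)


# N = 500 is a hypothetical cell count, chosen only for illustration.
for p, n in [(0.85, 500), (0.77, 500)]:
    hw = wald_half_width(p, n)
    print(f"p = {p:.2f} +/- {hw:.3f} (N = {n})")
```

      Two estimates can then be compared by checking whether their intervals overlap. For small N or proportions close to 0 or 1, the Wald form is known to be inaccurate, and a Wilson interval would be a safer choice.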

      -  p 14, all the % differences need an errorbar 

      The error bars and means are given in Fig 3C and 3D.

      -  Figure 1B adds units to compactness, and what does it represent? Is the cell size the estimated volume (that is mentioned in the caption)? Shouldn't the datapoints have error bars? 

      Compactness is defined in the “Image Analysis” section of the Materials and Methods. It is a dimensionless parameter. The distribution of individual cell shapes / sizes is depicted in Fig 1B. Error does arise from segmentation, but the degree of variance (a few pixels) is much smaller than the representations of individual cells shown.

      -  Figure 1C caption, are the 50.000 cells? 

      Correct. Figure caption has been altered.

      -  Figure 1D, first the elongation rate is described as a volume per minute, but now, looking at the units it is a rate, how is it normalized? 

      Elongation rate is explained in the Materials and Methods (see the image analysis section) and is not volume per minute. It is dV/dt = r*V (the unit of r is min^-1). Page 9 includes specific mention of the unit of r.
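
      The relation dV/dt = r*V implies exponential growth, V(t) = V0 * exp(r*t), so r has units of min^-1 and can be read off as the slope of ln(V) against time. The sketch below illustrates this on a synthetic trace. The log-linear least-squares fit and the function name are assumptions made for illustration; the fitting procedure actually used is the one described in the Materials and Methods of the paper.

```python
import math


def elongation_rate(times_min, volumes):
    """Return r (min^-1) as the least-squares slope of ln(V) versus t."""
    if len(times_min) != len(volumes) or len(times_min) < 2:
        raise ValueError("need at least two paired (time, volume) points")
    logs = [math.log(v) for v in volumes]
    n = len(times_min)
    t_mean = sum(times_min) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (lv - log_mean) for t, lv in zip(times_min, logs))
    return sxy / sxx


# Synthetic trace: V0 is the wild-type volume quoted elsewhere in this
# response (3.27 µm^3); the rate of 0.02 min^-1 is a made-up value.
V0, R_TRUE = 3.27, 0.02
times = [0, 5, 10, 15, 20, 25, 30]
volumes = [V0 * math.exp(R_TRUE * t) for t in times]

r = elongation_rate(times, volumes)
print(f"r = {r:.4f} min^-1, doubling time = {math.log(2) / r:.1f} min")
```

      Because r is a per-capita rate, it is already normalised by cell volume, which is why it can be compared between cells of different sizes.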

      -  Figure 1E, how many cells (n) per replicate? 

      Our apologies. We have corrected the figure caption that now reads:

      “Proportion of live cells in ancestral SBW25 (black bar) and ΔmreB (grey bar) based on LIVE/DEAD BacLight Bacterial Viability Kit protocol. Cells were pelleted at 2,000 x g for 2 minutes to preserve ΔmreB cell integrity. Error bars are means and standard deviation of three biological replicates (n>100).”

      -  Figure 1G, how does this compare to the wildtype 

      The volume for wild type SBW25 is 3.27µm^3 (within the “white zone”). This is mentioned in the text.

      -  Figure 2B, is this really volume, not size? And can you add microscopy images? 

      The x-axis is volume (see Materials and Methods, subsection image analysis). Images are available in Supp. Fig. 9.

      -  Figure 3A what does L1, L4 and L7 refer to? Is it correct that these same lines are picked for WT and delta_mreB 

      Thank you for pointing this out. This was an earlier nomenclature. It was shorthand for the mutants that are specified everywhere else by genotype and has now been corrected. 

      -  Figure 3c: either way write out p, so which probability, or you need a simple cartoon that is plotted. 

      The value p is the probability to proceed to the next generation and is explained in the Materials and Methods (subsection image analysis). We feel this is intuitive and does not require a cartoon. We nonetheless added a sentence to the Materials and Methods to aid clarity.

      -  Figure 4B can you add a ladder to the gel? 

      No ladder was included, but the controls provide all the necessary information. The band corresponding to PBP1A is defined by presence in SBW25, but absence in SBW25 ∆pbp1A.

      -  Figure 4c, can you improve the quantification of these images? How were these selected and how well do they represent the community? 

      We apologise for the lack of quantitative description for data presented in Fig 4C. This has now been corrected. In brief, we measured the intensity of fluorescent signal from between 10 and 14 cells and computed the mean and standard deviation of pixel intensity for each cell. To rule out possible artifacts associated with variation of the mean intensity, we calculated the ratio of the standard deviation divided by the square root of the mean. These data reveal heterogeneity in cell wall synthesis and provide strong statistical support for the claim that cell wall synthesis in ∆mreB is significantly more heterogeneous than the control. The data are provided in new Supp. Fig. 21. 
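
      The per-cell measure described above (standard deviation of pixel intensity divided by the square root of the mean) can be sketched as follows. The pixel values are invented for illustration, and the use of the population standard deviation is an assumption, since the response does not say which estimator was used. One plausible motivation for dividing by the square root of the mean, offered here as an interpretation rather than the authors' stated reasoning, is that photon-counting noise scales in that way, so the ratio is less sensitive to overall brightness.

```python
import math
import statistics


def heterogeneity_index(pixels):
    """Standard deviation of pixel intensity divided by sqrt(mean intensity).

    Assumption: population standard deviation (pstdev) over the pixels
    of a single segmented cell.
    """
    mean = statistics.fmean(pixels)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    return statistics.pstdev(pixels) / math.sqrt(mean)


# Hypothetical pixel intensities for two cells with the same mean signal.
uniform_cell = [100, 100, 100, 100]   # evenly labelled cell wall
patchy_cell = [50, 150, 50, 150]      # patchy labelling

for name, px in [("uniform", uniform_cell), ("patchy", patchy_cell)]:
    print(f"{name}: index = {heterogeneity_index(px):.2f}")
```

      Computing one index per cell and then comparing the distributions of indices between strains gives a brightness-adjusted comparison of how evenly new cell wall material is inserted.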

      Minor comments: 

      -  It would be interesting if the findings of this experimental evolution study could be related to comparative studies (if these have ever been executed).  

      Little is possible, but Hendrickson and Yulo published a portion of the originally posted preprint separately. We include a citation to that paper. 

      -  p 13, halfway through the page, the second paragraph lacks a conclusion, why do we care about DNA content? 

      It is a minor observation that was included by way of providing a complete description of cell phenotype.  

      -  p 17, "suggesting that ... loss-of-function", I do not understand what this is based upon. 

      We show that the fitness of a pbp1A deletion is indistinguishable from the fitness of one of the pbp1A point mutants. This fact establishes that the point mutation had the same effects as a gene deletion thus supporting the claim that the point mutations identified during the course of the selection experiment decrease (or destroy) PBP1A function.

      -  p 25, at the top of the page: do you have a reference for the statement that a disorganized cell wall architecture is suited to the topology of spherical cells? 

      The statement is a conclusion that comes from our reasoning. It stems from the fact that it is impossible to entirely map the surface of a sphere with parallel strands.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Recommendation 1: The authors reasoned upon the presence of a differential basal hydraulic stress in waves' valleys vs hills at first from the observation of "domes" formation upon 48h cultivation. I suggest performing a quantification to support the statement as a good scientific practice. Furthermore, it would strengthen the concept when the formation of domes was compared between the waves' dimensions as a different grade of cell extrusion was quantified. i.e., 50, 100, and 200 µm.

      Response 1: Upon seeing the phenomenon (Author response image 1 A), we counted domes on the 100 µm waves and saw a significant effect. We had refrained from including the results because they are the subject of ongoing research in our lab. In response to the reviewer’s suggestion, we have included a graph (Author response image 1 B) showing the increasing number of domes over 48 hours from three 100 µm wave samples.

      We have updated Figure 2A and B in the manuscript to include the new graph.

      Author response image 1.

      (A) shows domes (white arrows) over a 100 µm wave substrate. (B) shows the number of accumulated domes in valley and hill regions, for 3 independent samples, over 48 hours.

      Recommendation 2: Using RICM microscopy to quantify the cell basal separation with the substrate and hydraulic stress is very clever. Nevertheless, I am in doubt if the different intensity reported for the hills vs valley (Fig. 2G and H) is a result of the signal reduction at deeper Z levels. Since there is no difference in extrusion and forces between valleys and hills in the 200 µm waves but only in 50µm and 100µm, I would add this to the quantification. I would expect no intensity difference from RICM for the 200 µm sample if this is not an artefact of imaging.

      Response 2: We performed additional experiments on blank wave substrates (both 100 and 200 µm) to ascertain the extent of reflection intensity drop (Author response image 2A). As correctly pointed out by Reviewer #1, there was a drop in intensity even without cells. On the 100 µm waves, hill reflections are on average ~27 % dimmer than valley reflections, whereas on the 200 µm waves they are on average ~39 % dimmer.

      Using this information, we performed a calibration on the RICM results obtained from both the 100 and 200 µm waves (Author response image 3B). The calibrated 100 µm data showed residual signatures of difference, whereas the calibrated 200 µm distributions appeared very similar. We noticed large cross-sample variations in the registered intensities, which would negatively impact effect size if not accounted for. To account for them, we subsequently normalized both hill and valley intensities against planar region intensities for each sample. As shown by the final output (Author response image 3C), we were able to remove the skewness in the distributions. Moreover, 1-way ANOVA followed by a post hoc analysis with BH correction revealed a significant reduction in the 100 µm hill/flat intensity ratio compared to the 100 µm valley/flat intensity ratio (Δ~-23 %). Conversely, no significance was observed for the same comparison on the 200 µm waves.
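
      The two numerical steps named above, per-sample normalization against the flat region and Benjamini-Hochberg (BH) correction of the post hoc p-values, can be sketched in Python as follows. This is a minimal sketch and not the authors' pipeline: the function names, the intensities, and the p-values are all hypothetical.

```python
def normalize_to_flat(region_means, flat_means):
    """Divide each sample's hill or valley intensity by that sample's flat intensity."""
    if len(region_means) != len(flat_means):
        raise ValueError("one flat-region value is needed per sample")
    return [region / flat for region, flat in zip(region_means, flat_means)]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-sample mean intensities (arbitrary units), not study data.
flat = [200.0, 150.0, 250.0]
hill = [150.0, 120.0, 175.0]
valley = [190.0, 150.0, 240.0]

print(normalize_to_flat(hill, flat))
print(normalize_to_flat(valley, flat))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Dividing by the flat-region value within each sample removes multiplicative sample-to-sample brightness differences, which is why the ratios can be pooled across samples before testing.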

      Author response image 2.

      (A). RICM from blank wave samples reveals a reduction in reflection intensity in hill regions compared to flat and valley regions.

      Author response image 3.

      (B) shows the RICM intensities after adjusting for the inherent reflection intensity drop shown in (A). (C) shows the RICM intensities after normalization against planar region signals; this removes cross-sample variations and improves the effect size of differences.

      We have updated the manuscript Figure 2I and text accordingly. The blank wave results are included in Figure 2-figure supplement 1 along with updated text and summary data table in Supplementary File 4.

      Recommendation 3: To measure 3D forces on top of the hills and valleys, the use of PAA gels is necessary. Since in Fig 3B, the authors show a difference in cell extrusion number between substrates and stiffnesses, I think it is necessary to confirm the presence of more extrusion in valleys vs hills on PAA gels. This would ensure the conclusion between normal forces and extrusion.

      Response 3: We do have time-lapse data with monolayers on the PAA waves. However, we felt results from the flat regions were sufficient in supporting the point being made in the text. Specifically, our original intention with PAA gels was to show that the extrusion reductions seen in osmotic perturbations were by virtue of removing basal stress and not some cryptic osmotic response. Hydrogels were chosen because they can effectively dilute basal solute concentration and thereby reduce the osmotically induced water transport. Moreover, as fluid could freely move within the gel, the fluid stress can quickly equilibrate across the basal surface. In contrast, poorly water/solute permeable substrates could lead to localized spikes in solute concentration and transient basal regions with high fluid stress.

      To get a sense of the potential difference in basal solute concentration between the two materials, we can do a quick hand-waving estimation. For monolayers on non-water/solute permeable PDMS of 20x20 mm and using the laser wavelength (640 nm) for RICM as an extreme estimate of basal separation, we should expect ~0.25 µl of total basal water content. On the other hand, we typically produce our PAM gel slabs using ~150 µl of precursor solutions. This means that, given similar amounts of solute, PAM gels will lead to monolayer basal osmolarity that is around 3 orders of magnitude lower than monolayers on PDMS, producing significantly lower osmotic potential. This implies from the outset that we should expect high survivability of cells on these substrates irrespective of curvature domains. Indeed, later immunoblotting experiments showed MDCKs exhibiting hyper activated FAK and Akt on PAM gels.
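
      The arithmetic behind this estimate can be checked with a few lines of Python. The inputs are the values quoted above (a 20x20 mm monolayer, a 640 nm basal gap, and ~150 µl of gel precursor); treating the whole gel volume as available for dilution is the same simplifying assumption the estimate itself makes.

```python
import math

side_mm = 20.0          # monolayer footprint, 20 x 20 mm
gap_nm = 640.0          # RICM laser wavelength used as an upper bound on basal separation
gel_volume_ul = 150.0   # typical precursor volume for one gel slab

gap_mm = gap_nm * 1e-6                        # 1 nm = 1e-6 mm
basal_volume_ul = side_mm * side_mm * gap_mm  # 1 mm^3 equals 1 µl
dilution = gel_volume_ul / basal_volume_ul
orders = math.log10(dilution)

print(f"basal water volume: {basal_volume_ul:.3f} µl")
print(f"dilution factor: {dilution:.0f}")
print(f"orders of magnitude: {orders:.2f}")
```

      This gives a basal volume of about 0.256 µl and a dilution factor of roughly 590, that is, just under three orders of magnitude, in line with the figures quoted in the response.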

      In response to Reviewer #1’s suggestion then, we have added another supporting time-lapse (Video 19) showing typical response of MDCK monolayers on 100 µm PAA waves (Author response image 4). As is evident from the time-lapses, cell extrusions were very rare, as in the planar regions. This supports the idea that on PAM gels the effects of basal hydraulic stress and asymmetric forces are marginal against the strong survival signals. The response is similar to that under hyper-osmotic perturbations, where we did not see a significant difference between valley and hill extrusions.

      Author response image 4.

      Time-lapse snapshot showing negligible MDCK extrusions 24 hours after confluency over PAM gel wave substrates.

      Recommendation 4: Before proceeding with the FAK inhibitor experiment, the authors should better justify why the 4.1 wt % sucrose vs DMSO or NaCl is the most inert treatment. This can be done by citing relevant papers or showing time-lapses (as it is done for the higher FAKI14 dose).

      Response 4: Although some cells have recently been shown to be able to transport and utilize sucrose, mammalian cells generally cannot directly take up disaccharides for metabolism and this is frequently mentioned in literature: see (Ref. R1) for example. Without special enzymes to break sucrose down into monosaccharides, such as sucrase found in the gut, the sugars should remain spectators in the culture medium, contributing only to osmotic effects.

      DMSO on the other hand, besides changing osmolarity, can also be integrated into cell membrane and pass through cells over time. It has been reported to chronically affect cell membrane properties and gene expressions (Ref. R2).

      Finally, it is well known that both sodium and chloride ions are readily taken up and transported by cells (Ref R3). They help to regulate the transmembrane potential, which in turn can affect membrane bound proteins and biochemical reactions within a cell.

      Hence, comparing the 3 hyper-osmotic perturbations, adding sucrose should have the least off-target effects on both the inhibitor study and the subsequent immunoblotting. And, in response to the reviewer’s recommendation, we have updated the text accordingly and included new references to support our statement.

      Ref R1. H. Meyer, O. Vitavska, H. Wieczorek; Identification of an animal sucrose transporter. Journal of Cell Science 124, 1984–1991 (2011). Doi: 10.1242/jcs.082024

      Ref R2. B. Gironi, Z. Kahveci, B. McGill, B.-D. Lechner, S. Pagliara, J. Metz, A. Morresi, F. Palombo, P. Sassi, P. G. Petrov; Effect of DMSO on the Mechanical and Structural Properties of Model and Biological Membranes. Biophysical Journal 119, 274-286 (2020). Doi: 10.1016/j.bpj.2020.05.037

      Ref R3. X. Zhang, H. Li; Interplay between the electrostatic membrane potential and conformational changes in membrane proteins. Protein Science 28, 502-512 (2019). Doi: 10.1002/pro.3563

      Recommendation 5: The data showing a FAK-dependent phosphorylation of AKT responsible for a higher cell survival rate in the hills is not yet completely convincing. Please show a reduced AKT phosphorylation level after FAK inhibition in high osmolarity levels. Furthermore, the levels of AKT activation seem to increase slightly upon substrate softening independently of FAK activation or osmotic pressure (i.e., Fig. 4E, Soft PDMS). The authors should comment on this in connection with the results shown for PAA gels.

      Response 5: For the additional immunoblotting experiments, work is currently underway. We could not, however, complete these experiments in time for this revision, as both Cheng-Kuang and Xianbin will shortly be taking on new jobs elsewhere. David will continue with the immunoblotting studies and should be able to include the results in an update in the coming months. As for the apparent elevated levels of AKT seen on soft silicones, we speculate that it is because we cannot immunoblot cells that have died and were inevitably washed out at the start of the procedure. Inferring from the higher extrusion rates on these soft substrates, we could be missing a significant portion of the statistics. Specifically, we are missing all the cells that would have had lowered AKT activation but died, and had we been able to collect those statistics, perhaps both FAK and AKT would have shown lower levels. We risk introducing survivorship bias into the results if we read too much into the data as is.

      Alternatively, another explanation could be that, by virtue of survival of the fittest, we might have effectively selected a subpopulation of cells that were able to survive on lower FAK signals, or completely irrespectively of it.

      At any rate, to prove our foregoing hypothesis would require us to perform comprehensive immunoblotting and total transcriptome analysis over different duration conditions. Unfortunately, we do not have the time to do that for the current article, but it could be developed into a stand-alone molecular biology investigation in future. We have included similar discussion in the main text.

      Recommendation 6: In the discussion, the authors suggest the reported findings be especially relevant for epithelia that significantly separate compartments and regulate water and soluble transport. These are for example kidney epithelia (i.e., MDCK is the best experimental choice), retinal epithelium or intestinal epithelium. I would suggest that some proof-of-concept experiments could be done to support this concept. For example, I would expect keratinocytes (i.e., HaCaT) not to show a strong difference in extrusion rate between valleys and hills since the monolayer is not so sealed as kidney epithelium. In general, this kind of experiment would significantly strengthen the finding of this work.

      Response 6: As recommended, we tracked the behavior of retina pigment epithelial cells (hTERT RPE-1 from ATCC) which do not form tight monolayers like MDCKs (Ref. R4). We did not detect extrusion events occurring from monolayers of these cells (Author response image 5). This is true even for portions of monolayers over waved regions.

      Author response image 5.

      Time-lapse snapshot showing no cell extrusions from RPE monolayers confluent for over 21 hours.

      We have updated these findings in the main text discussions and included a new supporting time-lapse (Video 15) in our article.

      Ref R4. F. Liu, T. Xu, S. Peng, R. A. Adelman, L. I. Rizzolo; Claudins regulate gene and protein expression of the retinal pigment epithelium independent of their association with tight junctions. Experimental Eye Research 198, 108157 (2020). Doi: 10.1016/j.exer.2020.108157

      Recommendation 7 (minor point): Figure S1 needs to have clear notes indicating in each step what is what. i.e., where is glass, PDMS, NOA73, etc? A more detailed caption will help the figure's comprehension. Also "Cy52" should be changed to "soft silicone" to be consistent with the text (or Cy52 should be mentioned in the text).

      Response 7 (minor point): Changes were made to Figure 1-figure supplement 1 to improve comprehension accordingly. CY52 was added to the main-text, next to the first appearance of the word soft silicone, to be consistent with the figures.

      Recommendation 8 (minor point): The authors often mentioned that epithelial monolayers are denser on PAA gels. Please add a reference(s) to this statement.

      Response 8 (minor point): The statement is an inference from visually comparing monolayers on PAM gels and PDMS. The difference is quite evident (Author response image 6). The density difference is in spite of the fact that the substrates share similar starting cell numbers.

      To address the reviewer’s comment, we have combined time-lapses of monolayers on silicones and PAM gels side-by-side in Video 17 to facilitate convenient comparisons.

      Author response image 6.

      Time-lapse snapshot at 24 hours after confluence, showing conspicuously higher density of MDCK monolayers on PAM gel compared to those on silicone elastomer.

      Reviewer #2

      Recommendation 1: The sinusoidal wavy substrate that the authors use in their investigation is interesting and relevant, but it is important to realize that this is a single-curved surface (also known as a developable surface). This means that the Gaussian curvature is zero and that monolayers need to undergo (almost) no stretching to conform to the curvature. The authors should at least discuss other curved surfaces as an option for future research, and highlight how the observations might change. Convex and concave hemispherical surfaces, for example, might induce stronger differences than observed on the sinusoidal substrates, due to potentially higher vertical resultant forces that the monolayer would experience. The authors could discuss this geometry aspect more in their manuscript and potentially link it to some other papers exploring cell-curvature interactions in more complex environments (e.g. non-zero Gaussian curvature).

      Response 1: In response to reviewer #2’s recommendation we have highlighted in the discussion of our text that our waves constitute a developable surface and that cells will experience little stretching for the most part. Based on our knowledge of how curvature can modulate forces and thus osmotic effects, we included some rudimentary analysis of what one would expect on hemispherical surfaces of two types: one that is periodic and contiguous (Ref. R5), and another with delineating flat regions (Ref. R6).

      For epithelial monolayers in the first scenario, and on poorly solute/water permeable substrates, we should also expect to see a relatively higher likelihood of extrusions from concave regions compared to convex ones. Moreover, as the surfaces are now curved in both principal directions (producing larger out-of-plane forces), we should see the onset of differential extrusions seen in this study, but at larger length scales. For example, the effects seen on 100 µm hemicylindrical waves might now happen at larger feature size for hemispherical waves. Furthermore, as this kind of surface would invariably contain hyperbolic regions (saddle points), we might expect an intermediate response from these locations. If the forces in both principal directions offset each other, the extrusion response may parallel planar regions. On the other hand, if one dominates over the other, we may see extrusion responses tending to the dominating curvature (concave or convex).

      On the other hand, on curved landscapes with discrete convex or concave regions, we should expect, within the curved surface, extrusion behaviors paralleling findings in this study. What would be interesting would be to see what happens at the rims (or skirt regions) of the features. At these locations we effectively have hyperbolically curved surfaces, and like before, we should expect some sort of competing effect between the forces generated from the principal directions. So, for dome skirts, we should see fewer extrusions when the domes are small, and vice versa, when they are larger. Meanwhile, for pit rims, we should see a reversed behavior. It should also be noted that the transitioning curvature between convex/concave and planar regions would also modulate the effect.

      These effects might have interesting developmental implications. For instance, in developing pillar-like tissue structures (e.g., villi), the strong curvatures of nascent lumps would favor accumulation of cell numbers. However, once the size of the lumps reaches some critical value, epithelial cell extrusions might begin to appear at the roots of the developing structures, offsetting cell division, and eventually halting growth.

      Ref R5. L. Pieuchot, J. Marteau, A. Guignandon, T. Dos Santos, I. Brigaud, P. Chauvy, T. Cloatre, A. Ponche, T. Petithory, P. Rougerie, M. Vassaux, J. Milan, N. T. Wakhloo, A. Spangenberg, M. Bigerelle, K. Anselme, Curvotaxis directs cell migration through cell-scale curvature landscapes. Nature Communications 9, 3995 (2018). Doi: 10.1038/s41467-018-06494-6

      Ref R6. M. Werner, S. B.G. Blanquer, S. P. Haimi, G. Korus, J. W. C. Dunlop, G. N. Duda, D. W. Grijpma, A. Petersen, Surface curvature differentially regulates stem cell migration and differentiation via altered attachment morphology and nuclear deformation. Advanced Science 4, 1–11 (2017). Doi: 10.1002/advs.201600347

      Recommendation 2: The discussion of the experiments on PAM gels is rather limited. The authors describe that cells on the PAM gels experience fewer extrusions than on the PDMS substrates, but this is not discussed in sufficient detail (e.g. why is this the case). Additionally, the description of the 3D traction force microscopy and its validation is quite limited and should be extended to provide more convincing evidence that the measured force differences are not an artefact of the undulations of the surface.

      Response 2: We first saw a significant reduction in cell extrusions when we performed hyper-osmotic perturbations, and to eliminate possible off-target effects of the compounds used to increase osmolarity, we used three different compounds to be sure. In spite of this, we felt it would further support our argument, that basal accumulation of fluid stress was responsible for the extrusions, if we had some other independent means of removing fluid stress without directly tuning osmolarity through addition of extraneous solutes. We hence thought of culturing MDCK monolayers on hydrogels.

      Hydrogels were chosen because they can effectively dilute basal solute concentration (for reference ions (Na+) are continuously pumped out basally by the monolayer) and thereby reduce the associated osmotically induced water transport. Moreover, as fluid could freely move within the gel, the fluid stress can quickly equilibrate across the basal surface. In contrast, poorly water/solute permeable substrates will lead to localized spikes in solute concentration and transient basal regions with high fluid stress.

      To get a sense of the extent of difference in basal solute concentration between the two materials, we can do a quick hand-waving estimation. For monolayers on non-water-permeable PDMS of 20x20 mm, and using the laser wavelength (640 nm) for RICM as an extreme estimate of basal separation, we should expect ~0.25 µl of total basal water content. On the other hand, we typically produce our PAM gel slabs using ~150 µl of precursor solutions. This means that, given similar amounts of solute, PAM gels will lead to monolayer basal osmolarity that is around 3 orders of magnitude lower than monolayers on PDMS, producing significantly lower osmotic potential. This implies from the outset that we should expect high survivability of cells on these substrates. Indeed, later immunoblotting experiments showed MDCKs exhibiting hyper activated FAK and Akt on PAM gels.

      As for the 3D TFM used in this study, it is actually implemented from a well-established finite element method to solve inverse problems in engineering and has been repeatedly validated in larger scale engineering contexts (Ref. R7). The novelty and contribution of our article is in its adaptation to reconstruct cellular forces at microscopic scales.

      In brief, soft materials, such as the hydrogels used in our case, are doped with fluorescent particles, coated with ECM, and then seeded with cells. The cells exert forces that deform the soft substrate, thereby displacing the fluorescent particles from their equilibrium positions. This particle displacement can be extracted by producing an image pair with microscopy: first one with the cells, and then one of the relaxed gel after removal of the cells with acutely cytotoxic reagents, such as SDS. There are several ways in which the displacement field can be extracted from the image pair. These include particle tracking velocimetry, particle image velocimetry, digital volume correlation, and optical flow.

      We employed 3D Farneback optical flow in our study for its superior computational performance. The method was validated using synthetically generated images from Sample 14 of the Society for Experimental Mechanics DIC challenge. The accuracy of the calculated displacements using the 3D Farneback optical flow was then compared to the provided ground truth displacements. For the highest frequency displacement image pairs, an x-component root-mean-square-error (RMSE) value of 0.0113 was observed. This was lower than the 0.0141 RMSE value for the Augmented Lagrangian Digital Volume Correlation method. This suggested that the 3D Farneback optical flow is capable of accurately calculating the displacement between two bead images.
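
      For readers unfamiliar with the error measure quoted above, the root-mean-square error between estimated and ground-truth displacement components can be computed as in the short Python sketch below. The function name and the numbers are hypothetical and are not taken from the DIC challenge data.

```python
import math


def rmse(estimated, truth):
    """Root-mean-square error between two equally long sequences."""
    if len(estimated) != len(truth):
        raise ValueError("sequences must have the same length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimated, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical x-displacements at a few voxels, not study data.
truth_x = [0.10, 0.20, 0.30, 0.40]
estimated_x = [0.11, 0.19, 0.32, 0.39]
print(rmse(estimated_x, truth_x))
```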

      The displacement fields are then fed into a finite element suite (ANSYS in our case) along with the model and mesh of the underlying substrate structure to obtain node specific displacements. This is required because mesh nodes do not typically align with voxel positions of displacements. With these node specific displacements, we subsequently solve the inverse problem for the forces using Tikhonov regularization (Ref. R8). The outcome is a vector of node specific forces.
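
      To make the regularization step concrete, the sketch below solves a tiny inverse problem in plain Python by minimising ||A f - u||^2 + lam^2 ||f||^2 through the normal equations (A^T A + lam^2 I) f = A^T u. It is a toy illustration of standard Tikhonov regularization, not the authors' finite-element implementation: the matrix A, the displacements u, and the helper names are hypothetical, and a real pipeline would use a numerical library and a much larger system.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def tikhonov_solve(A, u, lam):
    """Return f minimising ||A f - u||^2 + lam^2 ||f||^2."""
    rows, cols = len(A), len(A[0])
    ata = [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    for i in range(cols):
        ata[i][i] += lam ** 2  # the regularization term lam^2 * I
    atu = [sum(A[k][i] * u[k] for k in range(rows)) for i in range(cols)]
    return solve_linear(ata, atu)


# Hypothetical, nearly singular system with slightly noisy displacements.
A = [[1.0, 1.0], [1.0, 1.001]]
u = [2.0, 2.003]
print(tikhonov_solve(A, u, 0.0))   # unregularized solution, sensitive to the noise
print(tikhonov_solve(A, u, 0.1))   # regularized solution, damped toward smaller forces
```

      Increasing lam trades fidelity to the measured displacements for stability against noise, which is the reason such a term is used when the forward model is ill-conditioned.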

      In light of the above, to physically validate the method in our context would require the generation of a known ground truth force on the scale of pico- to nano-newtons and subsequently image the particle displacements from this force using confocal microscopy. The force must then be released in situ in order for the relaxed gel to be imaged again. This is not a straightforward feat at this scale, and a method that immediately springs to mind is magnetic tweezers. Unfortunately, this is a tool that we cannot develop within reasonable timeframes, as the method will have to be seamlessly integrated with our spinning-disk confocal. However, as a compromise, we have included an in-silico validation with our revised manuscript.

      Specifically, given a finite element model with a predefined curvature, a known force was applied to the surface of the model (Author response image 7A). The resulting displacements were then calculated from the finite element solution, and 10% random noise was added to them. The traction force recovery (Author response image 7B) was then performed using the noisy in-silico displacements. To evaluate the accuracy of the recovery, the cosine similarity along with the mean norm of the force vectors were calculated. A value closer to 1 for both evaluation metrics indicates a more accurate reconstruction of the simulated traction force. The cosine similarity of the recovered traction forces to the original applied force was 0.977±0.056 while the norm of the recovered traction forces as a proportion of the original applied force was 1.016±0.165. As both values are close to 1 (i.e., identical), this suggested that the traction forces could be satisfactorily recovered using the finite-element based method.
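
      The two evaluation metrics can be written out explicitly. The Python sketch below computes, for one node, the cosine similarity between a recovered and an applied force vector and the ratio of their norms. The function names and vectors are hypothetical, and how the per-node values were averaged in the study is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two force vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def norm_ratio(recovered, applied):
    """Magnitude of the recovered force as a proportion of the applied force."""
    return math.hypot(*recovered) / math.hypot(*applied)


# Hypothetical force vectors at one node (arbitrary units), not study data.
applied = (0.0, 0.0, -1.0)
recovered = (0.05, -0.02, -1.03)
print(cosine_similarity(recovered, applied))
print(norm_ratio(recovered, applied))
```

      The first metric checks direction only and the second checks magnitude only, so reporting both separates orientation errors from scaling errors in the reconstruction.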

      In response to the reviewer’s recommendations then, additional content has been included in the main text to explain the use of PAM gels and the workings of our 3D TFM pipeline.

      Ref R7. James F. Doyle, Modern Experimental Stress Analysis: Completing the Solution of Partially Specified Problems (John Wiley & Sons, Chichester, 2004).

      Ref R8. Per Christian Hansen, Discrete Inverse Problems: Insight and Algorithms (SIAM, Philadelphia, 2010).

      Author response image 7.

      (A) shows simulated force field to generate simulated displacements. (B) shows force field reconstructed from simulated displacements with noise.

      Recommendation 3: The authors show nuclear deformation on the hills and use this as evidence for a resultant downward-pointing force vector. This has, indeed, also been observed in other works referenced by the authors (e.g. Werner et al.), and could be interesting evidence to support the current observations, provided the authors also show a nuclear shape on the concave and flat regions. The authors could potentially also characterize this shape change better using higher-resolution data.

      Response 3: We characterized nucleus deformation using Hoechst-stained samples as per recommendation. The deformation is estimated by dividing segmented nuclei volumes by the best-fit ellipsoid volumes of the same objects. In this way, objects exhibiting minimal bending will lead to values close to 1.0. The obtained graph is shown in Author response image 8B (and manuscript Figure 3D).
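
      As a rough illustration of this ratio, the Python sketch below divides a segmented nuclear volume by the volume of an ellipsoid with given semi-axes. It assumes the ellipsoid fit has already been done elsewhere; the function name, the volume, and the semi-axes are hypothetical and do not come from the study.

```python
import math


def deformation_measure(segmented_volume, semi_axes):
    """Segmented volume divided by the volume of the best-fit ellipsoid."""
    a, b, c = semi_axes
    ellipsoid_volume = 4.0 / 3.0 * math.pi * a * b * c
    return segmented_volume / ellipsoid_volume


# Hypothetical nucleus: 490 µm^3 segmented, fitted semi-axes in µm.
print(deformation_measure(490.0, (6.0, 5.0, 4.0)))
```

      A nucleus that fills its fitted ellipsoid exactly returns 1.0, and a bent nucleus that leaves part of the ellipsoid empty returns a smaller value.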

      Author response image 8.

      (A) An example of deformed nuclei on a 50 µm wave hill region. (B) A violin plot of calculated nuclear deformations across dimensions and features using segmented volume normalized against best-fit ellipsoid volume.

      Our quantifications show a statistically significant difference in nuclei deformation measure medians between hill and valley cells on the 50 µm (0.973 vs 0.982) and 100 µm (0.971 vs 0.979) waves; this indicates that cells on the hills tend to have more deformed nuclei compared to cells in the valleys. Meanwhile, no significant difference was found for a similar comparison on 200 µm (0.978 vs 0.978) samples. For reference, the median found for cells pooled from planar regions was 0.975.

      In response to the reviewer’s suggestions Figure 3 of our manuscript has been updated to include the new results on nuclei deformation. The text has also been updated to account for the new information to support our claims. The statistics are included in a new summary data table in Supplementary File 6.

      Recommendation 4: The U-net for extrusion detection is a central tool used within this study, though the explanation and particularly validation of the tool are somewhat lacking. More clarity in the explanation and more examples of good (or bad) detections would help establish this tool as a more robust component of the data collection (on all geometries).

      Response 4: The architecture of the neural network used in this study is outlined in supplementary figure S5a. To validate the performance of the model, a test dataset consisting of 200 positive examples and 100 negative examples was fed into the network and the resulting predictions were recorded. The confusion matrix of the model is shown in supplementary figure S5c. The weighted precision and recall of the model are 0.958 and 0.953, respectively.
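
      For reference, support-weighted precision and recall can be derived from a confusion matrix as in the Python sketch below. The counts are invented for illustration (they only share the 200 positive / 100 negative split mentioned above) and are not the values in supplementary figure S5c; the function name is likewise hypothetical.

```python
def weighted_precision_recall(confusion):
    """confusion[i][j] = number of class-i examples predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = 0.0
    recall = 0.0
    for c in range(n):
        support = sum(confusion[c])                       # true members of class c
        predicted = sum(confusion[r][c] for r in range(n))  # items predicted as c
        true_positive = confusion[c][c]
        class_precision = true_positive / predicted if predicted else 0.0
        class_recall = true_positive / support if support else 0.0
        # Weight each class by its share of the test set.
        precision += support / total * class_precision
        recall += support / total * class_recall
    return precision, recall


# Invented counts: rows are true classes (extrusion, no extrusion).
confusion = [[180, 20],
             [10, 90]]
print(weighted_precision_recall(confusion))
```

      With support weighting, the weighted recall equals the overall fraction of correctly classified examples, so it can be read as an accuracy on the imbalanced test set.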

      Additionally, we have included examples of false positive and false negative detections in Figure 1-figure supplement 5 (Author response image 9). False positive detections were typically extrusions that had been labelled as occurring in the frame prior to the frame of interest (Author response image 9 bottom sequence). However, as the extrusion process is incomplete in the prior frame, there are still changes in the extruded cell body and the network falsely predicts this as a detection.

      Author response image 9.

      Examples of false negative and false positive extrusion registrations.

      Recommendation 5: The authors study the involvement of FAK in the observed curvature-dependent and hydraulic stress-dependent spatial regulation of cell extrusion. In one of the experiments, the authors supplement the cell medium with FAK inhibitors, though only in a hyper-osmotic medium. They show that FAK inhibition counteracts the extrusion-suppressing effect of a hyper-osmotic medium. However, no data is shown on the effect of FAK inhibitors within the control medium. Would the extrusion rates be even higher then?

      Response 5: We proceeded, as suggested by the reviewer, to explore the effects of the FAK inhibitor on MDCK monolayers in our control medium. The results revealed that, at the 3 µM FAK inhibitor concentration, where cells in sucrose media showed an elevated extrusion rate, monolayers in control medium quickly suffered massive cell death (Author response image 10), similar to what was seen when 6 µM FAK inhibitor was introduced to sucrose medium.

      This finding suggests that osmolarity protects against FAK inhibitors in a dose-dependent manner. Moreover, as cell extrusions require an intact monolayer, extrusion rates cannot increase indefinitely: a point will be reached where an intact monolayer can no longer be maintained.

      We have updated the main text of our article to mention this observation, and also included a new time-lapse (Video 22) to demonstrate the effect.

      Author response image 10.

      Timelapse snapshot of MDCK monolayers over waves 4 hours after inclusion of focal adhesion kinase inhibitor.

      Recommendation 6: The supplementary videos show two fields of view next to each other, which is not immediately clear to the viewer. I strongly advise the authors to add a clear border between the two panels, so that it is clear that the cells from one panel are not migrating into the next panel.

      Response 6: A distinctive border has been added to the movies to separate panels showing different focal planes of the same stack.

      Recommendation 7: The general quality and layout of the figures could be improved. Some figures would benefit from higher-resolution or larger cell images (e.g. Figure 2A, C, D), and the organisation of subpanels could be improved (e.g. especially in Figure 2). The box plots and bar graphs are also not consistent throughout the manuscript in terms of colouring and style, which should be improved.

      Response 7: We have enlarged the figures in question accordingly, at the cost of reducing some information. However, the full scope of the sub-figures remains accessible in the supplementary movies. We have also tried to change the placement of the panels to improve readability. We have also adjusted the valley, hill, and flat coloring scheme for the extrusion boxplots in Figures 1 and 2 to make them consistent.

      Recommendation 8: The graphs in Figures 3E and F are confusing and difficult to interpret. The x-axis states "Position along curve in radians" but it is unclear how to relate this to the position on the wavy substrate. The graphs also have a second vertical axis on the right ("valley-interface-hill"), which adds to the confusion. I would recommend the authors provide more explanation and consider a different approach of plotting this.

      Response 8: We have removed the confusing plot of cross-sectional profile from the force graphs. To indicate positions on the waves, we have augmented radian values with Hill, Interface, and Valley accordingly.

      Recommendation 9: Specify which silicone was used for the low-stiffness silicone substrates in the methods and in the main text.

      Response 9: CY52 has been added to the main text, next to the first appearance of the term “soft silicone”, to be consistent with the figures.

      Recommendation 10: The flow lines that are plotted over the RICM data make it difficult to see the underlying RICM images. I would advise to also show the RICM images without the flow lines.

      Response 10: The original movie S15 (now Video 16) showing the RICM overlapped with optical flow paths has now been replaced by a movie showing the same, but with the flow paths and RICM in separate panels.

      Recommendation 11: In the first paragraph of the discussion, the authors write: "And this difference was both dependent on the sense (positive or negative)...". This is superfluous since the authors already mentioned earlier in the paragraph that the convex and concave regions (i.e. different signs of curvature) show differences in extrusion rates.

      Response 11: The sentence has been changed to “And this difference was also dependent on the degree of curvature.”

      Recommendation 12: In the second paragraph of the discussion, the authors mention that "basal fluid spaces under monolayers in hill regions were found consistently smaller than those in valley regions". Is this data shown in the figures of the manuscript? If so, a reference should be made because it was unclear to me.

      Response 12: This statement is an inference from the comparison of the hill and valley RICM grey values. Specifically, RICM intensities are direct surrogates for basal separations (i.e., fluid spaces, as there cannot be a vacuum) by virtue of the physics underlying the effect. To be more precise, “inferred from RICM intensity differences (Figure 2I)” has been added to support the statement.

      Recommendation 13: On page 7 of the discussion, the authors talk about positively and negatively curved surfaces. This type of description should be avoided, as this depends on the definition of the surface normal (i.e. is positive convex or concave?). Rather use convex and concave in this context.

      Response 13: The wording has been changed accordingly.

      Recommendation 14: The label of Table 8 reads "Table 2".

      Response 14: The error has been corrected.

      Reviewer #3

      Recommendation 1: The central finding seems to be opposite to an earlier report (J Cell Sci (2019) 132, jcs222372), where MDCK cells in curved alginate tubes exhibit increased extrusion on a convex surface. I suggest that you comment on possible explanations for the different behaviors.

      Response 1: The article in question primarily reported the phenomenon of MDCK and J3B1A monolayers detaching from the concave alginate tube walls coated with Matrigel. The authors attributed this to the curvature induced out-of-plane forces towards the center of the tubes. Up to this point, the findings and interpretation are consistent with our current study where we also find a similar force trend in concave regions.

      To further lend support to the importance of curvature in inducing detachment, the authors cleverly bent the tubes to introduce asymmetry in curvature between outer and inner surfaces. Specifically, the outside bend is concave in both principal directions, whereas the inside bend is convex in one of its principal directions. As expected, the authors found that detachment rates from the outer surface were much larger compared to the inner one. Again, the observations and interpretations are consistent with our own findings; the convex direction will generate out-of-plane forces pointing into the surface, serving to stabilize the monolayer against the substrate. It should be noted, however, that since the inner side of the tube is characterized by both convex and concave curvatures in its two principal directions, the resulting behavior of overlying monolayers will depend on which of the two resulting forces becomes dominant. So, for gradual bends, one should expect the monolayers to still be able to detach from the inner tube surface. This is what was reported in their findings.

      For their extrusion observations, I am surprised. Because their whole material (hydrogels) is presumably both solute and water permeable, I would be more inclined to expect very few extrusions irrespective of curvature. This is indeed the case with our study of MDCKs on PAM hydrogels, where the hydrogel substrate effectively buffers against the quick build-up of solute concentration and basal hydraulic stress. Without the latter, concave monolayer forces alone are unlikely to be able to disrupt cell focal adhesions. Indeed, the detachments seen in their study are more likely caused by exfoliation of the Matrigel than by cells being pulled off the Matrigel matrix entirely.

      My guess is that the extrusions seen in their study are solely due to the canonical crowding effect. If this were the case, then the detached monolayer on the outside bend could buffer against crowding pressure by buckling. Meanwhile, the monolayer on the inside bend, being attached to the surface, can only regulate crowding pressure by removing cells through extrusions. This phenomenon should be particular to soft matrices such as Matrigel. Using stiffer and covalently bonded ECM should be sufficient to prevent monolayers from detaching, leading to similar extrusion behaviors. In response to the reviewer’s recommendation then, we have included a short paragraph to state the points discussed in this response.

      Recommendation 2: Fig 3E, F: The quantities displayed on the panels are not forces, but have units of pressure (or stress).

      Response 2: We have changed “force” to “stress” according to the reviewer’s suggestion. We used “force” in the original text because we were reconstructing forces. Due to discretization, the resulting forces are inevitably assigned to element nodes; on the faces between the nodes, there is no information. So, in order to have some form of continuity to plot, the face forces are obtained by averaging the forces at the 4 nodes around the element face. Unfortunately, element faces are not typically of the same size, so the averaged forces need to be further normalized against the face area, leading to a quantity that has units of stress.
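      The node-to-face conversion described above can be sketched as follows. This is only a minimal illustration, not the authors' actual code: the function names, the array layout, and the assumption of planar quadrilateral faces (split into two triangles to compute the area) are all hypothetical choices made for the example.

```python
import numpy as np

def quad_face_areas(nodes, faces):
    """Area of each quadrilateral face, computed as the sum of two triangles.

    nodes: (N, 3) array of node coordinates (hypothetical layout).
    faces: (F, 4) integer array of node indices, ordered around each face.
    """
    p0, p1, p2, p3 = (nodes[faces[:, i]] for i in range(4))
    tri_a = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0), axis=1)
    tri_b = 0.5 * np.linalg.norm(np.cross(p2 - p0, p3 - p0), axis=1)
    return tri_a + tri_b

def face_stress(nodes, faces, nodal_forces):
    """Average the 4 nodal forces of each face, then divide by the face area.

    nodal_forces: (N, 3) array of reconstructed forces at the nodes.
    Returns an (F, 3) array with units of force per area (stress).
    """
    mean_force = nodal_forces[faces].mean(axis=1)  # (F, 4, 3) -> (F, 3)
    areas = quad_face_areas(nodes, faces)          # (F,)
    return mean_force / areas[:, None]

# Illustrative data: one square face of side 2 with a unit force in z at each node.
nodes = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
faces = np.array([[0, 1, 2, 3]])
nodal_forces = np.tile([0.0, 0.0, 1.0], (4, 1))
stress = face_stress(nodes, faces, nodal_forces)  # mean force 1 over area 4
```

      Dividing by the area is what makes values from differently sized faces comparable, which is why the plotted quantity ends up with units of stress rather than force.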

      Recommendation 3: Fig 2D: Asterisks are hard to see.

      Response 3: The color of the asterisks has been changed to green for better clarity against a B&W background.

      Recommendation 4: p 19, l 7: Word missing in "the of molding"

      Response 4: The typo has been amended to “the molding of”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud, and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of allantois and the endoderm as well.

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (key effector of Gdf11 signaling for controlling differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois. 

      Strengths: 

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition. 

      Weaknesses: 

      The authors rely on the expression of a small number of key regulatory genes to interpret the developmental defects. The alternative possibilities remain to be ruled out thoroughly. The manuscript is also quite descriptive and would benefit from more focused highlighting of the novelty regarding the absence of Tgfbr1 in the mouse embryo. They should also strengthen some of their conclusions with more details in the results.

      Although we used a limited number of key regulatory genes to interpret the phenotype, these genes were carefully chosen to focus on specific processes involving the lateral mesoderm, its derivatives, and the endoderm. In addition to these markers, we included references to other relevant markers that were previously analyzed and initially led us to examine the lateral plate mesoderm and tail gut in Tgfbr1 mutants. To strengthen our analysis, we have now incorporated additional data to clarify specific phenotypes. For instance, in situ hybridization (ISH) for Shh further confirms abnormalities at the caudal end of the endoderm in mutant embryos, while no endodermal defects are observed in the trunk region. We also included an analysis of the intermediate mesoderm, which shows abnormalities at the same level as those found in the lateral plate mesoderm and endoderm of Tgfbr1 mutants.

      It’s important to note that using additional markers to assess the epiblast/primitive streak of Tgfbr1 mutants at E7.5–E8.5, as suggested by a reviewer, is unlikely to yield new insights. At these early stages, Tgfbr1 mutant embryos do not display observable phenotypes in the main body axis. Data in this manuscript already demonstrate the absence of abnormalities at this stage, as shown in Figure 3 and Supplementary Figure 6. Additionally, genes whose expression becomes abnormal when the embryo would enter tail development remain unaffected in the trunk, indicating that trunk extension is not significantly impacted by Tgfbr1 deficiency. While transcriptomic analysis of these Tgfbr1 mutants could provide interesting insights, it would be more appropriate to focus on later developmental stages, which would be beyond the scope of the current study.

      The second major critique was that the manuscript is primarily descriptive. We disagree with this assessment. Several hypotheses were rigorously tested using genetic approaches, including Isl1 knockout experiments, cell tracing from the primitive streak with a newly generated Cre driver to activate a reporter from the ROSA26 locus, and assessment of extraembryonic endoderm fate in Tgfbr1 mutants by introducing the Afp-GFP transgene into the Tgfbr1 mutant background. Additionally, we conducted tracing analyses of tail bud cell contributions to the tail gut via DiI injection and embryo incubation. To address potential concerns regarding this experiment, we have included data showing the DiI position immediately after injection to confirm that it does not contact the tail gut. We also considered and accounted for potential DiI leakage into neuromesodermal progenitors to clarify the endodermal results.

      Our genetic and DiI experiments were specifically designed to differentiate between alternative hypotheses and to confirm hypotheses generated from other analyses. Additionally, improvements in some of the imaging data have helped address remaining concerns.

      Reviewer #1 (Recommendations For The Authors): 

      I have listed my suggestions as queries. The authors may perform experiments or clarify by editing the text to address them. 

      The authors state on Page 11 and elsewhere that the ventral lateral mesoderm is absent in the Tgfbr1 mutant. What is the basis for this conclusion? Are there specific markers for PCM or GT primordium? 

      The specific marker of PCM and GT primordium is Isl1. The absence of this marker in the Tgfbr1 mutants is shown in (Dias et al, 2020). The reference is introduced in the manuscript.

      A schematic illustrating the VLM and the expression patterns of Tgfbr1, Gdf11, etc., would be helpful. 

      Characterization of Gdf11 expression has been previously reported (e.g. McPherron et al 1999, cited in our manuscript). It is expressed in the region containing axial progenitors before the trunk to tail transition and not expressed in the VLM. As for Tgfbr1, its expression is hard to detect, likely because it is ubiquitously expressed at a low level. We include in this document some pictures of an ISH, including a control using the Tgfbr1 mutants to illustrate that the staining resembling background actually represents Tgfbr1 expression. If the reviewers find it important, we can also incorporate these data into the manuscript. Under these circumstances, we feel that a schematic might not be very informative.

      Author response image 1.

      Image showing an example of an ISH procedure with a probe against Tgfbr1, showing widespread and low expression. The lower picture shows a ventral view of a stained wild type E10.5 embryo.

      Foxf1+ cells in the 'extended LPM' of Tgfbr1 mutants suggest fate transformation, or does it indicate the misexpression of marker gene otherwise suppressed by Tgfbr1 activity? The authors suggest that Foxf1+ cells are VLM progenitors from posterior PS trapped in the extended LPM. Do they continue to express PS markers? 

      The observation that both in wild type and Tgfbr1 mutant embryos Foxf1 expression in the trunk is restricted to the splanchnic LPM indicates that the absence of this marker in the somatic LPM is not the result of a suppression of its expression by Tgfbr1. In wild type embryos Foxf1 is also expressed in the posterior PS, regulated independently of its expression in the LPM (i.e. Shh-independent) and later in the pericloacal mesoderm (our supplementary figure 2). As Foxf1 expression in the posterior PS was not suppressed in the Tgfbr1 mutants, together with the absence of pericloacal mesoderm, we interpret that the Foxf1-positive cells in the two layers around the extended celomic cavity in the posterior end of the mutant embryos derived from the posterior PS, resulting from the absence of its normal progression through the embryonic tissues.

      We did not find expression of PS markers associated with paraxial mesoderm formation, like Tbxt, further suggesting that those cells could derive from the restricted set of cells within the posterior PS that contribute to the pericloacal mesoderm.

      For example, the misexpression of Apela is interpreted as mis-localized endoderm cells. They show scattered Keratin 8 misexpression to support the interpretation. It would be more convincing if the authors tested the expression of other endoderm markers. 

      As indicated in the manuscript, we suggest that these cells are endoderm progenitors (p. 13), like those present at the posterior end of the gut tube at E9.5 and E10.5, that are unable to incorporate into the gut tube. Apela is not a general endodermal marker: it is expressed in the foregut pocket and the nascent cells of the hindgut/tail gut, becoming downregulated as cells take on typical endodermal signatures. The presence of ectopic Apela expression in the extended LPM of the mutant embryos might indeed indicate the presence of progenitors that failed to downregulate Apela because they did not undergo differentiation. This would also imply the absence of definitive endodermal markers.

      The Nodal signaling pathway in the anterior PS drives endoderm development. It acts through Alk7. Does Tgfbr1 (Alk5) mutation impact endoderm development, in general? It isn't easy to assess this from the Foxa2 in situ RNA hybridization shown in Figures 6A and B. It would be helpful for the readers if the authors clarified this point. 

      In the pictures shown in Figure 7D-D’ it is already shown that the endoderm is mostly preserved until the region of the trunk to tail transition. The presence of a rather normal endoderm in the embryonic trunk can also be seen with Shh, a figure added as Supplementary Fig.5.

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention two interesting novel points which they should develop in the discussion, and probably also in the results. 

      (1) The authors speculate about the possible involvement of the posterior PS as a mediator of Gdf11/Tgfbr1 signaling activity. However, as mentioned in the manuscript, their experiments do not allow regional sublocalization within the PS... Here it would be important to assess/discuss in more detail which progenitors respond to this signaling activity and when they do it. At the very least, the authors should provide high-resolution spatiotemporal data of the expression of Tgfbr1 in the PS. 

      Tgfbr1 expression at this embryonic stage does not give clear differential patterns. The data reported for this expression in Andersson et al 2006 are of very low quality and we have not been able to reproduce the reported pattern. On the contrary, all our efforts over the years provided a very general staining that could even be interpreted as background. When we now included Tgfbr1 mutants as controls, it became clear that the ubiquitous and low-level signal observed in wild type embryos indeed represents the Tgfbr1 expression pattern: low level and ubiquitous. We are attaching a figure to this document illustrating these observations. If required, this can also be included in the manuscript as a supplementary figure.

      Also, the work of Wymeersch et al., 2019 regarding the lateral plate mesoderm progenitors (LPMPs) should be referred to and discussed here. 

      This was now added in the results (page 11) and in discussion (page 16). 

      For instance, are the LPMP transcriptomic differences detected between E7.5 and E8.5 caused by Tgfbr1 signaling activity? This question could be easily answered through a comparative bulk RNAseq analysis of the posterior-most region of the PS of mutant and WT embryos. The possible colocalization of Tgfb1 (Wymeersch et al., 2019) and Tgfbr1 in the LPMPs should also be addressed. 

      We agree with the suggestion that RNA-seq in the posterior PS of WT and mutant embryos might be informative. However, it is very likely that within the proposed timeframe (E7.5 to E8.5) there are no significant differences between the wild type and the Tgfbr1 mutant embryos, because there is no apparent axial phenotype in Tgfbr1 mutant embryos before the trunk to tail transition. Therefore, at this stage, we think that this experiment is out of the scope of the present manuscript.

      (2) The activity of Tgfbr1 during the trunk-to-tail transition is critical for the development of tail endodermal tissues. Here the authors suggest again the involvement of the posterior PS/allantois region, but a similar phenotype can also be observed for instance in the absence of Snai1 in the caudal epiblast (Dias et al., 2020)... It would be important to assess/discuss the origin of those morphogenetic problems in the gut. Is it due to the reallocation of NMC cells into the CNH? The tailbud-EMT process? LPMPs specification?... Regional mutations or gain of functions of Snai1 or Tgfbr1 in the caudal epiblast would help answer the question.  

      The endodermal phenotype in the Snai1 mutants is different from that observed in the Tgfbr1 mutants. As can be observed in Figures 3, 4 and 5 of Dias et al., the tailbud is absent and replaced by a structure that extends the epiblast. As a consequence, the endoderm finishes at the base of that structure, even expanding to make a structure resembling the cloaca, which is different from what is seen in the Tgfbr1 mutants. In this case, the lack of tail gut is likely to result either from the lack of formation of the progenitors of the gut endoderm or from the dissociation of what would be the tail bud from the LPM. Actually, hindlimb/pericloacal mesoderm markers, like Tbx4, are preserved in the Snai1 mutant. As for the Snai1 gain-of-function experiment, also already reported in Dias et al 2020, the destiny of these cells is not clear. The ISH for Foxa2 showed extra signals but, as it is not an exclusive marker for endoderm, it is not possible to know whether any of these signals correspond to endodermal tissues.

      Regarding the development of tail endodermal tissues, the authors suggest that it occurs from a structure derived from the PS that is located posteriorly, in the tailbud, after the tip of the growing gut. This is an important and novel point as it suggests that the primordium of the endoderm is not wholly specified during gastrulation. So the observation should be well supported. How can Anastasiia et al. distinguish such "structure" from the actual developing gut? Does it have a distinct molecular signature or any morphological landmark that enables its separation from the actual gut? The data suggests that the region highlighted in Supplementary Figure 4Ab contains part of the actual gut tube (the same is suggested in Figure 5B). If the authors think otherwise, they must characterize that region of the tailbud by doing a thorough morphological and gene/protein expression analysis and assess its potency, via transplantation experiments. Also, the authors' claim mostly relies on the DiI experiments and those have three problems: #1 Anastasiia et al. assess "tail" endodermal growth at E9.5 when the correct stage to do it is after E10.5 (after tailbud formation). #2 Incongruencies, low number (only three embryos), and diversity in the results shown in Figure 8 and Supplementary Figure 4. For instance, despite similar staining at 0h, the extension and amount of DiI present in the gut tube after 20h varies significantly amongst the differently labeled embryos. A possible explanation lies in the abnormal leakiness of the DiI labelings and that is confirmed by the observations shown in Supplementary Figure 4M-O; the same for Supplementary Figure 4G, which shows a substantial amount of DiI in the neural tube. #3 The authors must provide high-quality data showing which tissues/regions were labelled at time 0h, including transversal and sagittal sections as they did for the 20h time-point.
      Additionally, it is important to re-orient the sagittal optical sections to a position that also shows the neural tube (like a mid-sagittal section) and include information concerning the AP/DV axis, as well as the location of the transversal optical sections in the sagittal image.

      As described in the reply to reviewer 1, Apela is expressed in the nascent tail gut endoderm but not in more anterior areas except for a foregut pocket, and becomes downregulated as the tube acquires endodermal signatures. Therefore, the structure to which the reviewer refers might indeed represent a group of progenitors that extend the tail gut. The observation that this property is seen only in the tail gut as it grows already separates this region of the gut, which in the end does not contribute to mature organs, from more anterior areas of the endoderm (essentially anterior to the cloaca) that will become a relevant tissue of the intestinal organs. Our DiI labelling experiment was aimed at testing whether this pool of cells contributes to the gut, but it does not allow us to determine the nature of those cells, a question that will require further research (discussed on p. 17) and that we think is beyond the scope of the present manuscript.

      Regarding the suggestion to label at E10.5, we agree that at E9.5 the tail bud is not completely formed in terms of NMCs; for example, the neuropore is not yet closed. However, we are more interested in regression of the epiblast, which is complete by E9.5. Injecting at E9.5 also has technical advantages for us: first, because in our hands earlier embryos grow better in culture, and second, because it is easier to inject in the tailbud at E9.5 because it is a little bit bigger than at E10.5. Therefore, injecting at E9.5 is less prone to technical artifacts due to injection inaccuracy and compromised growth in culture.

      We agree that the injected DiI could also leak into NMPs, which might be located in the same area. However, while this could result in labeling of the neural tube, it would not affect the interpretation of the finding of labeled cells in the tail gut. Indeed, the presence of this label in the gut epithelium indicates the presence of progenitors in the injected region of the tail gut. We added some considerations of this possible leakage to the results section of the manuscript (p. 15). We thank the reviewer for drawing our attention to this issue.

      We also now provide high quality data showing labelled tissue at 0h in Supplementary figure 8A-c’, higher magnification images in Fig. 8, and reoriented optical sections in Fig.6 and in Supplementary Fig. 7, including axis and location of the sections as suggested by the reviewer.

      Minor concerns/comments: 

      (1) The abstract is quite long, though this might be fine for this journal. 

      (2) In relation to the comment on the abstract, the manuscript needs an initial Figure describing the events that are described in the introduction. Otherwise, the manuscript will only be accessible to mouse embryologists.

      We have a figure summarizing the results at the end of the manuscript, and we think that including a similar figure at the beginning might be redundant. What we could do, if required, is to include this type of schematic as a graphical abstract.

      (3) The authors need to clarify what they mean when they use the following expressions "PS fate" and "fate of the posterior PS".

      I do not think that we have used such expressions. Indeed, they did not come up when we ran a “find” in the Word document. However, they would mean the tissue that would derive from them at later developmental stages.

      (4) The assessment of Isl1 expression in Tgfbr1 mutant and transgenic mouse embryos would be better indicative of their molecular relationship than a comparative phenotypic analysis. 

      These data have been reported in Dias et al 2020 and Jurberg et al 2013, both cited in the manuscript.  

      (5) The authors should explain or discuss what the upregulation of Foxa2 in the posterior end of Tgfbr1 mutants means.

      While an upregulation is apparent in the figure, looking at other pictures we cannot be sure of this being a significantly quantifiable up-regulation. We therefore removed the statement from the text.

      (6) What happens to the intermediate mesoderm during the trunk-to-tail transition? Is Tgfbr1 involved in the regulation of its development?

      We have tested this using Pax2, added the relevant data in Supplementary Fig. 1, and described them in the results.

      (7) The term "potential" should not be used during the description of DiI labeling experiments as this technique only assesses cell fate.

      Corrected

      (8) Some figures lack AP/DV axis information (e.g. Figures 6C and D).

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Millard and colleagues investigated if the analgesic effect of nicotine on pain sensitivity, assessed with two pain models, is mediated by Peak Alpha Frequency (PAF) recorded with resting state EEG. The authors found indeed that nicotine (4 mg, gum) reduced pain ratings during phasic heat pain but not cuff pressor algometry compared to placebo conditions. Nicotine also increased PAF (globally). However, mediation analysis revealed that the reduction in pain ratings elicited by the phasic heat pain after taking nicotine was not mediated by the changes in PAF. Also, the authors only partially replicated the correlation between PAF and pain sensitivity at baseline (before nicotine treatment). At the group-level no correlation was found, but an exploratory analysis showed that the negative correlation (lower PAF, higher pain sensitivity) was present in males but not in females. The authors discuss the lack of correlation.

      In general, the study is rigorous, methodology is sound and the paper is well-written. Results are compelling and sufficiently discussed.

      Strengths:

      Strengths of this study are the pre-registration, proper sample size calculation, and data analysis. But also the presence of the analgesic effect of nicotine and the change in PAF.

      Weaknesses:

      It would even be more convincing if they had manipulated PAF directly.

      We thank Reviewer #1 for their positive and constructive comments regarding our study. We appreciate the view that the study was rigorous and methodologically sound, that the paper was well-written, and that the strengths included our pre-registration, sample size calculation, and data analysis.

      In response to the reviewer's comment about more directly manipulating Peak Alpha Frequency (PAF), we agree that such an approach could provide a more direct investigation of the role of PAF in pain processing. We chose nicotine to modulate PAF as the literature suggested it was associated with a reliable increase in PAF. As mentioned in our Discussion, there are several alternative methods to manipulate PAF, such as non-invasive brain stimulation (NIBS) techniques like transcranial alternating current stimulation (tACS), or neurofeedback training. These approaches could help clarify whether a causal relationship exists between PAF and pain sensitivity. However, methods such as NIBS still require further investigation, as there is little evidence that these approaches change PAF (Millard et al., 2024).

      Reviewer #2 (Public Review):

      Summary:

      The study by Millard et al. investigates the effect of nicotine on alpha peak frequency and pain in a very elaborate experimental design. According to the statistical analysis, the authors found a factor-corrected significant effect for prolonged heat pain but not for alpha peak frequency in response to the nicotine treatment.

      Strengths:

      I very much like the study design and that the authors followed their research line by aiming to provide a complete picture of the pain-related cortical impact of alpha peak frequency. This is very important work, even in the absence of any statistical significance. I also appreciate the preregistration of the study and the well-written and balanced introduction. However, it is important to give access to the preregistration beforehand.

      Weaknesses:

      The weakness of the study revolves around three aspects:

      (1) I am not entirely convinced that the authors' analysis strategy provides a sufficient signal-to-noise ratio to estimate the peak alpha frequency in each participant reliably. A source separation (ICA or similar) would have been better suited than electrode ROIs to extract the alpha signal. By using a source separation approach, different sources of alpha (mu, occipital alpha, laterality) could be disentangled.

      (2) Also, there's a hint in the literature (reference 49 in the manuscript) that the nicotine treatment may not work as intended. Instead, the authors' decision to use nicotine to modulate the peak alpha frequency and pain relied on other, not suitable work on chronic pain and permanent smokers. In the present study, the authors use nicotine treatment and transient painful stimulation on nonsmokers.

      (3) In my view, the discussion could be more critical for some aspects and the authors speculate towards directions their findings can not provide any evidence. Speculations are indeed very important to generate new ideas but should be restricted to the context of the study (experimental pain, acute interventions). The unfortunate decision to use nicotine severely hampered the authors' aim of the study.

      Impact:

      The impact of the study could be to show what has not worked to answer the research questions of the authors. The authors claim that their approach could be used to define a biomarker of pain. This is highly desirable but requires refined methods and, in order to make the tool really applicable, more accurate approaches at subject level.

      We thank reviewer #2 for their recognition of the study’s design, the importance of this research area, and the pre-registration of our study. In response to the weaknesses highlighted:

      (1) We appreciate the reviewer’s suggestion to improve the signal-to-noise ratio by applying source separation techniques, such as ICA, which have now been performed and incorporated into the manuscript. Our original decision to use sensor-level ROIs followed the precedent set in previous studies, our rationale being to improve reproducibility and avoid biases from picking individual electrodes or manually picking sources. We have added analyses using an automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing sensorimotor sites. Here again we found no significant differences in the mediation results that used a sensor space sensorimotor ROI, further supporting the robustness of the chosen approach. ICA could still potentially disentangle different sources of alpha, such as occipital alpha and mu rhythm, and provide new insights into the PAF-pain relationship. We have now added a discussion in the manuscript about the potential advantages of source separation techniques and suggest that the possible contributions of separate alpha sources be investigated and compared to sensor space PAF as a direction for future research.

      (2) We recognise the reviewer's concern regarding our choice of nicotine as a modulator of pain and alpha peak frequency (PAF). The meta-analysis by Ditre et al. (2016) indeed points to small effect sizes for nicotine's impact on experimental pain and highlights the potential for publication bias. However, our decision to use nicotine in this study was not primarily based on its direct analgesic effects, but rather on its well-documented ability to modulate PAF, in smoking and non-smoker populations, as outlined in our study aims.

      In this regard, the intentional use of nicotine was to assess whether changes in PAF could mediate alterations in pain. This approach aligns with the broader concept that a direct effect of an intervention is not necessary to observe indirect effects (Fairchild & McDaniel, 2017). We have, however, revised our introduction to further clarify this rationale, highlighting that nicotine was used as a tool for PAF modulation, not solely for its potential analgesic properties.

      (3) We agree with the reviewer’s observation that certain aspects of the Discussion could be more cautious, particularly regarding speculations about nicotine’s effects and PAF as a biomarker of pain. We have revised the Discussion to ensure that our interpretations are better grounded in the data from this study, clearly stating the limitations and avoiding overgeneralization. This revision focuses on a more critical evaluation of the potential relationships between PAF, nicotine, and pain sensitivity based solely on our experimental context.

      Finally, we also apologize for not providing access to the preregistration earlier. This was an oversight on our end, and we will ensure that future preregistrations are made available upfront.

      Reviewer #3 (Public Review):

      In this manuscript, Millard et al. investigate the effects of nicotine on pain sensitivity and peak alpha frequency (PAF) in resting state EEG. To this end, they ran a pre-registered, randomized, double-blind, placebo-controlled experiment involving 62 healthy adults who received either 4 mg nicotine gum (n=29) or placebo (n=33). Prolonged heat and pressure were used as pain models. Resting state EEG and pain intensity (assessed with a visual analog scale) were measured before and after the intervention. Additionally, several covariates (sex at birth, depression and anxiety symptoms, stress, sleep quality, among others) were recorded. Data was analyzed using ANCOVA-equivalent two-wave latent change score models, as well as repeated measures analysis of variance. Results do not show *experimentally relevant* changes of PAF or pain intensity scores for either of the prolonged pain models due to nicotine intake.

      The main strengths of the manuscript are its solid conceptual framework and the thorough experimental design. The researchers make a good case in the introduction and discussion for the need to further investigate the association of PAF and pain sensitivity. Furthermore, they proceed to carefully describe every aspect of the experiment in great detail, which is excellent for reproducibility purposes. Finally, they analyse the data from almost every possible angle and provide an extensive report of their results.

      The main weakness of the manuscript is the interpretation of these results. Even though some of the differences are statistically significant (e.g., global PAF, pain intensity ratings during heat pain), these differences are far from being experimentally or clinically relevant. The effect sizes observed are not sufficiently large to consider that pain sensitivity was modulated by the nicotine intake, which puts into question all the answers to the research questions posed in the study.

      We would like to express our gratitude to Reviewer #3 for their thoughtful and constructive review, including the positive feedback on the strengths of our study's conceptual framework, experimental design, and thorough methodological descriptions.

      We acknowledge the concern regarding the experimental and clinical relevance of some statistically significant results (e.g., global PAF and pain intensity during heat pain) and agree that small effect sizes may limit their practical implications. However, our primary goal was to assess whether nicotine-induced changes in PAF mediate pain changes, rather than to demonstrate large direct effects on pain sensitivity. Nicotine was chosen for its known ability to modulate PAF, and our focus was on the mechanistic role of PAF in pain perception. To clarify this, we have revised the discussion to better differentiate between statistical significance, experimental relevance, and clinical applicability. We emphasize that this study represents a preliminary step towards understanding PAF’s mechanistic role in pain, rather than a direct clinical application.

      We appreciate the suggestion to refine our interpretation. We have adjusted our language to ensure it aligns with the effect sizes observed and made recommendations for future research, such as testing different nicotine doses, to potentially uncover stronger or more clinically relevant effects.

      Although modest, we believe these findings offer valuable insights into the potential mechanisms by which nicotine affects alpha oscillations and pain. We have also discussed how these small effects could become more pronounced in different populations (e.g., chronic pain patients) and over time, offering guidance for future research on PAF modulation and pain sensitivity.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have a number of points that the authors may want to consider for this or future work.

      (1) By reviewing the literature provided by the authors in the introduction I think that using nicotine as a means to modulate pain and alpha peak frequency was a mistake. The only work that may give a hint on whether nicotine can modulate experimental pain is the meta-analysis by Ditre and colleagues (2016). They suggest that their small effect may contain a publication bias. I think the other "large body of evidence" is testing something else than analgesia.

      Thank you for your consideration of our choice of nicotine in the study. The meta-analysis by Ditre and colleagues (2016) suggests small effect sizes for nicotine's impact on experimental pain, compared to the moderate effects claimed in some papers, especially when accounting for the potential publication bias you mentioned. However, our selection of nicotine was primarily driven by its documented ability to modulate PAF rather than its direct analgesic effects, as clearly stated in our aims. Therefore, we do not view our decision to use nicotine as a mistake; instead, it was aligned with our goal of assessing whether changes in PAF mediate alterations in pain and thus served as a valuable tool. This perspective aligns with the broader concept that a direct effect is not a prerequisite for observing indirect effects of an intervention on an outcome (Fairchild & McDaniel, 2017). To further enhance clarity, we've revised the introduction to emphasize the role of nicotine in manipulating PAF in relation to our study's aims.

      Previously we wrote: “A large body of evidence suggests that nicotine is an ideal choice for manipulating PAF, as both nicotine and smoking increase PAF speed [37,40–47] as well as pain thresholds and tolerance [48–52].” This has been changed to read: “Because evidence suggests that nicotine can modulate PAF, where both nicotine and smoking increase PAF speed [37,40–47], we chose nicotine to assess our aim of whether changes in PAF mediate changes in pain in a ‘mediation by design’ approach [48]. In addition, given evidence that nicotine may increase experimental pain thresholds and tolerance [49–53], nicotine could also influence pain ratings during tonic pain.”

      (2) As mentioned above, the OSF page is not accessible.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      (3) I generally struggle with the authors' approach to investigating alpha. With the approach the authors used to detect peak alpha frequency it might be that the alpha signal may just show such a low amplitude that it is impossible to reliably detect it at electrode level. In my view, the approach is not accurate enough, which can be seen by the "jagged" shape of the individual alpha peak frequency. In my view, a source separation technique would have been more useful. I wonder which of the known cortical alphas contributes to the effects the authors have reported previously: occipital, mu rhythms projections or something else? A source separation approach disentangles the different alphas and will increase the SNR. My suggestion would be to work on ICA components or similar approaches. The advantage is that the components are almost completely free of any artefacts. ICAs could be run on the entire data or separately for each individual. In the latter case, it might be that some participants do not exhibit any alpha component.

      We appreciate your thoughtful consideration of our approach to investigating alpha. The calculation of PAF involves various methods and analysis steps across the literature (Corcoran et al., 2018; Gil Avila et al., 2023; McLain et al., 2022). Your query about which known cortical alphas contribute to reported effects is important. Our work initially focused on a sensorimotor component from an ICA (Furman et al., 2018), but subsequent work from our labs suggested a broader relationship between PAF and pain across the scalp (Furman et al., 2019; Furman et al., 2020; Millard et al., 2022) and reflected a desire to conduct analyses at the sensor level in order to improve the reproducibility of the methods (Furman et al., 2020). However, based on your comment we have made several additions to the manuscript: we explain why we did not use manual ICA methods, suggest this for future research, and have added an exploratory analysis using a recently developed automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing activity from occipital or motor sites.

      While we acknowledge that ICA components can offer a better signal-to-noise ratio (SNR) and possibly smoother spectral plots, we opted for our chosen method to avoid potential bias inherent in deciding on a component following source separation. The desire for a quick, automated, replicable, and unbiased pipeline, crucial for potential clinical applications of PAF as a biomarker, influenced this decision. At the time of analysis registration, automated methods for deciding which alpha components to extract following ICA were not apparent. We have now added this reasoning to Methods.

      “Contrary to some previous studies that used ICA to isolate sensory region alpha sources (Furman et al., 2018; De Martino et al., 2021; Valentini et al., 2022), we used pre-determined sensor level ROIs to improve reproducibility and reduce the potential for bias when individually selecting ICA components. Using sensor level ROIs may decrease the signal-to-noise ratio of the data; however, this approach has still been effective for observing the relationship between PAF and experimental pain (Furman et al., 2019; Furman et al., 2020).”

      We have also added use of ICA and development of methods as a suggestion for future research in the discussion:

      “Additionally, the use of global PAF may have introduced mediation measurement error into our mediation analysis. The spatial precision used in the current study was based on previous literature on PAF as a biomarker of pain sensitivity, which have used global and/or sensorimotor ROIs (Furman et al., 2018; Furman et al., 2020). Identification and use of the exploratory electrode clusters found in this study could build upon the current work (e.g., Furman et al., 2021). However, exploratory analysis of the clusters found in the present analysis demonstrated no influence on mediation analysis results (Supplementary Materials 3.8-3.10). Alternatively, independent component analysis (ICA) could be used to identify separate sources of alpha oscillations (Choi et al., 2005), as used in other experimental PAF-pain studies (Furman et al., 2018; Valentini et al., 2022), which could aid to disentangle the potential relevance of different alpha sources in the PAF-pain relationship. Although this comes with the need to develop more reproducible and automated methods for identifying such components.”

      The specific location or source of PAF that relates to pain remains unclear. Because of this, we did employ an exploratory cluster-based permutation analysis to assess the potential for variations in the presence of PAF changes across the scalp at sensor level, and emphasise that location of PAF change could be explored in future. However, we have now conducted the mediation analysis (difference score 2W-LCS model) using averages from the data-driven parietal cluster, frontal cluster, and both clusters together. For these we see a stronger effect of gum on PAF change, which was expected given the data driven approach of picking electrodes. There was still a total and direct effect of nicotine on pain during the PHP model, but still no indirect effect via change in PAF. For the CPA models, there were still no significant total, direct, or indirect effects of nicotine on CPA ratings. Therefore, using these data-driven clusters did not alter results compared to the model using the global PAF variable.

      The reader has been directed to this supplementary material so:

      “The potential mediating effect of this change in PAF on change in PHP and CPA was explored (not pre-registered) by averaging within each cluster (central-parietal: CP1, CP2, CPz, P1, P2, P3, P4, Pz, POz; right-frontal: F8, FT8, FT10) and across both clusters. This averaging across electrodes produced three new variables, each assessed in relation to mediating effects on PHP and CPA ratings. The resulting six exploratory mediation analysis (difference score 2W-LCS) models demonstrated minimal differences from the main analysis of global PAF (8-12 Hz), except for the expected stronger effect of nicotine on change in PAF (bs = 0.11-0.14, ps < .003; Supplementary Materials 3.8-3.10).”

      Moreover, our team has been working on an automated method for selecting ICA components, so in response to your comment we assessed whether using this method altered the results of the current analysis. The in-depth methodology behind this new automatic pipeline will be published with a validation from some co-authors in the current collaboration in due course. At present, in summary, this automatic pipeline conducts independent component analysis (ICA) 10 times for each resting state, and selects the component with the highest topographical correlation to a template created of a sensorimotor alpha component from Furman et al., (2018). 
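
      The template-matching step described above can be illustrated with a minimal sketch. This is not the authors' pipeline, which is unpublished at the time of the response; it only shows the general idea of scoring each component's scalp map against a template and keeping the best match across repeated ICA runs. The function names, the array layout (channels by components), and the use of absolute Pearson correlation are all assumptions made for illustration, and the alpha-peak criterion mentioned elsewhere in the response is not modelled here.

```python
import numpy as np

def pick_component_by_template(mixing, template):
    """Return (index, |r|) of the component topography best matching `template`.

    mixing   : (n_channels, n_components) array; each column is one scalp map.
    template : (n_channels,) reference topography (hypothetical input).
    """
    mixing = np.asarray(mixing, dtype=float)
    template = np.asarray(template, dtype=float)
    # Centre across channels so the normalised dot product is a Pearson correlation.
    maps = mixing - mixing.mean(axis=0, keepdims=True)
    ref = template - template.mean()
    r = (maps.T @ ref) / (np.linalg.norm(maps, axis=0) * np.linalg.norm(ref))
    # The sign of an ICA component is arbitrary, so compare absolute correlations.
    best = int(np.argmax(np.abs(r)))
    return best, float(abs(r[best]))

def pick_across_runs(mixings, template):
    """Among several ICA runs, return (run, component, |r|) of the best match."""
    scores = [pick_component_by_template(m, template) for m in mixings]
    run = int(np.argmax([score for _, score in scores]))
    return run, scores[run][0], scores[run][1]
```

      Under these assumptions, `pick_across_runs` would be called with one mixing matrix per ICA repetition (ten per resting state in the response), and PAF would then be computed from the time course of the selected component.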

      The results of the PHP or CPA mediation models were not substantially different using the PAF calculated from independent components than that using the global PAF. For the PHP model, the total effect (b = -0.648, p = .033) and direct effects (b = -0.666, p = .035) were still significant, and there was still no significant indirect effect (b = 0.018, p = .726). The general fit was reduced, as although the CFI was above 0.90, akin to the original model, the RMSEA and SRMR were not below 0.08, unlike the original models (Little, 2013). For the CPA model, there were still no significant total (b = -0.371, p = .357), direct (b = -0.364, p = .386), or indirect effects (b = -0.007, p = .906), and the model fit also decreased, with CFI below 0.90 and RMSEA and SRMR above 0.08. See supplementary material (3.11). Note that still no correlations were seen between this IC sensorimotor PAF and pain (PHP: r = 0.11, p = .4; CPA: r = -0.064, p = .63).
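
      As a reading aid, the reported coefficients can be checked against the usual decomposition for a linear single-mediator model, in which the total effect equals the direct effect plus the indirect effect. The symbols below (c for total, c' for direct, ab for the indirect effect formed from the a-path and b-path) are conventional notation assumed here rather than taken from the response:

```latex
\begin{align*}
c &= c' + ab \\
\text{PHP:}\quad -0.666 + 0.018 &= -0.648 \\
\text{CPA:}\quad -0.364 + (-0.007) &= -0.371
\end{align*}
```

      Both sums reproduce the reported total effects, so the small indirect terms account for the entire gap between the total and direct estimates.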

      Interestingly, in both models, there was now no longer a significant a-path (PHP: b = 0.08, p = 0.292; CPA: b = 0.039, p = 0.575), unlike previously observed (PHP: b = 0.085, p = 0.018; CPA: b = 0.089, p = 0.011). We interpret this as supporting the previously highlighted difference between finding an effect on PAF globally but not in a sensorimotor ROI (and now a sensorimotor IC), justifying the exploratory CBPA and the suggestion in the discussion to explore methodology.

      We understand that this analysis does not fully answer the reviewer’s question about which of the known cortical alphas contributes to the effects reported in our previous work. However, we consider this exploration to be beyond the scope of the current paper, as it would be more appropriately addressed with larger datasets or combinations of datasets, potentially incorporating MEG to better disentangle oscillatory sources. The highlighted differences seen between global PAF, sensorimotor ROI PAF, sensorimotor IC PAF, as well as the CBPA of PAF changes provide ample directions for future research to build upon: 1) which alpha signals (sensor or source space) are related to pain, 2) how these alpha signals can be represented robustly in a replicable way, and 3) which alpha signals (sensor or source space) are manipulable through interventions. These are all excellent questions for future studies to investigate.

      The below text has been added to the Discussion:

      In-house code was developed to compare a sensorimotor component to the results presented in this manuscript (Supplementary Material 3.11), showing similar results to the sensorimotor ROI mediation analysis presented here. However, examination of which alpha - be it sensor or source space - are related to pain, how they can be robustly represented, and how they can be manipulated are ripe avenues for future study.

      (4) I have my doubts that you can get a reliable close to bell-shaped amplitude distribution for every participant. The argument that the peak detection procedure is hampered by the high-amplitude lower frequency can be easily solved by subtracting the "slope" before determining the peak. My issue is that the entire analysis is resting on the assumption that each participant has a reliable alpha effect at electrode level. This is not the case. Non-alpha participants can severely distort the statistics. ICA-based analyses would be more sensitive but not every participant will show alpha. You may want to argue with robust group effects but in my view, every single participant counts, particularly for this type of data analysis, where in the case of a low SNR the "peak" can easily shift to the extremes. In case there is an alpha effect for a specific subject, we should see a smooth bump in the frequency spectrum between 8 and 12 Hz. Anything beyond that is hard to believe. The long stimulation period allows a broad FFT analysis window with a good frequency resolution in order to detect the alpha frequency bump.

      The reviewer is correct that non-alpha participants can distort the statistics. We did visually assess the EEG of each individual’s spectra at baseline to establish the presence of global peaks, as we believe this is good practice to aid understanding of the data. Please see Author response image 1 for individual spectra seen at baseline. Although not all participants had a ‘smooth bump in the frequency spectrum between 8 and 12 Hz’, we prefer to not apply/necessitate this assumption to our data. Chiang et al., (2011) suggest that ~3% of individuals do not have a discernible alpha peak, and in our data we observed only one participant without a very obvious spectral peak (px-39). But, this participant does have enough activity within the alpha range to identify PAF by the CoG method (i.e. not just flat spectra and activity on top of 1/f characteristics). Without a pre-registered and standardised decision process to remove such a participant in place, we opted to not remove any participants to avoid curation of our data.
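
      For readers unfamiliar with the centre of gravity (CoG) method mentioned above, the following is a minimal sketch of the calculation: PAF is taken as the power-weighted mean frequency within the alpha band. The 8-12 Hz band comes from the exchange above; the plain periodogram, the synthetic signal, and all function names are illustrative assumptions and do not reproduce the authors' preprocessing or spectral estimation.

```python
import numpy as np

def cog_paf(freqs, power, band=(8.0, 12.0)):
    """Power-weighted mean frequency (centre of gravity) within `band`, in Hz."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(freqs[mask] * power[mask]) / np.sum(power[mask]))

def power_spectrum(signal, sfreq):
    """One-sided periodogram of a 1-D signal (no windowing or averaging)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    return freqs, spectrum

# Synthetic example (assumed values): a 10 Hz rhythm in white noise,
# 60 s of data sampled at 250 Hz.
rng = np.random.default_rng(1)
sfreq = 250.0
t = np.arange(int(60 * sfreq)) / sfreq
eeg = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.normal(size=t.size)
freqs, power = power_spectrum(eeg, sfreq)
paf = cog_paf(freqs, power)  # close to 10 Hz for this synthetic signal
```

      Note that the CoG estimate is defined for any spectrum with non-zero power in the band: a perfectly flat spectrum returns the band midpoint. This is why the visual check for a discernible peak described in the response remains a useful complement. The sketch does not subtract the aperiodic slope that the reviewer suggests removing before peak detection.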

      Author response image 1.

      (5) I find reports on frequent channel rejections reflect badly on the data quality. Bad channels can be avoided with proper EEG preparation. EEG should be continuously monitored during recording in order to obtain best data quality. Have any of the ROI channels been rejected?

      We appreciate your attention to the channel rejection. We believe that the average channels removed (0.94, 0.98, 0.74, and 0.87 [range: 0-4] for each of the four resting states out of 64 channels) does not suggest overly frequent rejection, as it was less than one electrode on average and the numbers are below the accepted number of bad channels to remove/interpolate (i.e. 10%) in EEG pipelines (Debnath et al., 2020; Kayhan et al., 2022). To maintain data quality, consistently poor channels were identified and replaced over time. We hope you will accept our transparency on this issue and note that by stating how channel removal decisions were made (i.e. 8 or more deviations) and reporting the number of channels removed, we adhere to the COBIDAS guidelines (Pernet et al., 2018; 2020).
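
      To put the reported rejection counts on the same scale as the 10% guideline, the short calculation below converts them to percentages of the 64-channel montage. The numbers are those given in the response; the variable names are illustrative only.

```python
N_CHANNELS = 64
mean_removed = [0.94, 0.98, 0.74, 0.87]  # mean channels removed per resting state
max_removed = 4                           # upper end of the reported range (0-4)
threshold_pct = 10.0                      # guideline limit cited in the response

# Express the counts as a percentage of the full montage.
mean_pct = [100.0 * m / N_CHANNELS for m in mean_removed]
max_pct = 100.0 * max_removed / N_CHANNELS

for state, pct in enumerate(mean_pct, start=1):
    print(f"resting state {state}: {pct:.2f}% of channels removed on average")
print(f"worst case: {max_pct:.2f}% (guideline: {threshold_pct:.0f}%)")
```

      On these figures, the average rejection is roughly 1.2-1.5% of channels per resting state, and even the worst case of four channels (6.25%) stays below the 10% guideline.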

      During analysis, cases of sensorimotor ROI channels being rejected were noted and are now specified in our manuscript. “Out of 248 resting states recorded, 14 resting states had 4 ROI channels instead of 5. Importantly, no resting state had fewer than 4 channels for the sensorimotor ROI.”

      Note, we also realised that we had not specified that we did interpolate channels for the cluster based permutation analysis. This has been corrected with the following sentence:

      “Removed channels were not interpolated for the pre-registered global and sensorimotor ROI averaged analyses, but were interpolated for an exploratory cluster based permutation analysis using the nearest neighbour average method in `Fieldtrip`.”

      (6) I have some issues buying the authors' claims that there is an effect of nicotine on prolonged pain. By looking at the mean results for the nicotine and placebo condition, this can not be right. What was the point in including the variables in the equation? In my view, in this within-subject design the effect of nicotine should be universal, no matter what gender, age, or depression. The unconditional effect of nicotine is close to zero. I can not get my head around how any of the variables can turn the effects into significance. There must be higher or lower variable scores that might be related to a higher or lower effect on nicotine. The question is not to consider these variables as a nuisance but to show how they modulate the pain-related effect of nicotine treatment. Still, the overall nicotine effect of the entire group is basically zero.

      Another point is that for within-subject analyses even tiny effects can become statistically significant if they are systematically in one direction. This might be the case here. There might be a significant effect of nicotine on pain but the actual effect size (5.73 vs. 5.78) is actually not interpretable. I think it would be interesting for the reader how (in terms of pain rating difference) each of the variables can change the effect of nicotine.

      Thank you for your comments. We recognize the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the reader's interpretation of results.

      In light of this, we have also altered the PAF Table 3 to reflect both the pre-post values used for the CPA mediation and baseline correlations with CPA and PHP pain (i.e. N=62), and the pre-post values used for the PHP mediation (i.e. n=60).

      It is inherently difficult to visualise the findings of a mediation analysis with confounding variables that also used latent change scores (LCS) and random-effect intercepts for participants. LCS was specifically used because of issues of regression to the mean that occur if you calculate a straightforward ‘difference-score’; therefore, calculating the difference in order to demonstrate the results of the statistical model in a figure, for example, does not provide a full description of the data assessed (Valente & McKinnon, 2017). Nevertheless, if we look at the data descriptively with this in mind, then calculating the change in PHP ratings does indicate that, for the nicotine group, the mean change in PHP ratings was -0.047 (SD = 1.05, range: -4.13, 1.45). Meanwhile, for the placebo group the mean change in PHP ratings was 0.33 (SD = 0.75, range: -1.37, 1.66). This suggests a slight decrease in pain ratings on average for the nicotine group compared to a slight increase on average for the placebo group. With control for pre-determined confounders, we found that the latent change score was 0.63 lower for the nicotine group compared to the control group (i.e. the direct effect of nicotine on change in pain).
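
      The descriptive comparison above amounts to a simple difference of the two group means of change. The notation is introduced here for illustration only:

```latex
\begin{align*}
\overline{\Delta}_{\text{nicotine}} - \overline{\Delta}_{\text{placebo}} &= -0.047 - 0.33 = -0.377
\end{align*}
```

      This raw difference of about -0.38 rating points is smaller in magnitude than the model-based estimate of -0.63. The two figures are not expected to coincide, since the latter comes from a latent change score model that also adjusts for the pre-determined confounders.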

      If the reviewer is only discussing the effect of nicotine on pain, we do not believe that this effect ‘should be universal’. There is clear evidence that effects of nicotine on other measures can vary greatly across individuals (Ettinger et al., 2009; Falco & Bevins, 2015; Pomerleau et al., 1995). Our intention would not be to propose a universal effect but to understand how these variables may influence nicotine's impact on pain for individuals. Here we focus on the effects of nicotine on PAF and pain sensitivity, but attempted to control for the potential influence of these other confounding factors. Therefore, our statistical approach goes beyond mean values, incorporating variables like sex at birth, age, and depression to control for and explore potential modulating factors. Control for confounding factors is an important aspect of mediation analysis (Lederer et al., 2019; VanderWeele, 2019).

      Regarding the seemingly small effect size, we understand your concern. Indeed ‘tiny effects can become statistically significant if they are systematically in one direction’, which may be what we see in this analysis. We do not agree that the effect is ‘not interpretable’, rather that it should be interpreted in light of its small effect size (effect size being the beta coefficient in our analysis, rather than the mean group difference). We agree on the importance of considering practical significance alongside statistical significance and hope to conduct additional experiments and analyses in future to elucidate the contribution of each variable to the subtle and therefore not entirely conclusive overall effect you mention.

      Your feedback on this is valuable, and we have ensured a more detailed discussion in the revised manuscript on how these factors should be interpreted alongside some additional post-hoc analyses of confounding factors that were significant in our mediation, with the note that investigation of these interactions is exploratory. We had already discussed the potential contribution of sex on the effect of nicotine on PAF, with exploratory post-hoc analysis on this included in supplementary materials. In addition, we have now added an exploratory post-hoc analysis on the potential contribution of stress on the effect of nicotine on pain. This then shows the stratified effects by the covariates that our model suggests are influencing change in PAF and pain.

      Results edits:

      “There was also a significant effect of perceived stress at baseline on change in PHP ratings when controlling for group allocation and other confounding variables (b = -0.096, p = .048, bootstrapped 95% CI: [-0.19, -0.000047]), where higher perceived stress resulted in larger decreases in PHP ratings (see Supplementary Material 3.3 for post-hoc analysis of stress).”

      Supplementary material addition:

      “3.3 Exploratory analysis of the influence of perceived stress on the effects of nicotine on change in PHP ratings”

      “Due to the significant estimated effects of perceived stress on change in PHP ratings in the 2WLCS mediation model, we also explored post-hoc effects of stress on change in PHP ratings. We found that there is strong evidence for a negative correlation between stress and change in PHP rating within the nicotine group (n = 28, r = −0.39, BF10 = 13.65; Figure 3) that is not present in the placebo group, with equivocal evidence (n = 32, r = −0.14, BF10 = 0.46). This suggests that those with higher baseline stress who had nicotine gum experienced greater decreases in PHP ratings. Note that there was less, but still sufficient evidence for this relationship within the nicotine group when the participant who was a potential outlier for change in PHP rating was removed (n = 27, r = −0.32, BF10 = 1.45).”
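
      A minimal sketch of the stratified correlation described above, assuming rank-based (Spearman) correlations computed separately within each gum group. The function names and the data are hypothetical and purely illustrative; they are not the study data, and the Bayes factors reported above are not computed here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical baseline stress scores and pre-to-post changes in pain rating
# for one group; the same call would be repeated for the other group.
stress = [10, 14, 18, 22, 25]
pain_change = [0.5, 0.1, -0.2, -0.1, -0.9]
rho = spearman(stress, pain_change)
```

      Calling `spearman` once per group, rather than once on the pooled sample, is what makes the comparison stratified: a negative `rho` in one group and a value near zero in the other would correspond to the pattern reported above.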

      Author response image 2.

      Spearman correlations of baseline perceived stress with the change in phasic heat pain (PHP) ratings suggest strong evidence for a negative relationship for the nicotine gum group in orange (n=28; BF<sub>10</sub>=13.65) but not for the placebo group in grey (n=32; BF<sub>10</sub>=0.46). Regression lines and 95% confidence intervals are shown.

      Discussion edits:

      “For example, in addition to the effect of nicotine on prolonged heat pain ratings, our results suggest an effect of stress on changes in heat pain ratings, with those self-reporting higher stress at baseline having greater reductions in pain. Our post-hoc analysis suggested that this relationship between higher stress and larger decrease in PHP ratings was only present for the nicotine group (Supplementary Material 3.3). As stress is linked to nicotine use [69,70] and pain [71–73], these interactions should be explored in future.”

      (7) Is the differential effect of nicotine vs. placebo based on the pre vs. post treatment effect of the placebo condition or on the pre vs. post effect of the nicotine treatment? Can the mediation model be adapted and run for each condition separately? The placebo condition seems to have a stronger effect and may have driven the result.

      Thank you for your comments. In our mediation analysis, the differential effect of nicotine vs. placebo is assessed as a comparison between the pre-post difference within each condition. A latent change score (i.e. pre-post) is calculated for each condition (nicotine and placebo), and then the effect of being in the nicotine group (dummy coded as 1) is compared to being in the placebo group (dummy coded as 0). The comparison between conditions is needed for this model (Valente & MacKinnon, 2017), as we are assessing the change in PAF and pain in the nicotine group compared to the change in the placebo group.
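
      As a simplified illustration of the dummy-coded group comparison described above, the sketch below uses observed (not latent) change scores, so it is an analogue of the two-wave latent change score model rather than a reproduction of it. All names and values are hypothetical and are not the study data.

```python
from statistics import mean


def change_scores(pre, post):
    """Post minus pre for each participant."""
    return [b - a for a, b in zip(pre, post)]


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical pain ratings (0-10 scale) before and after chewing gum.
placebo_pre = [5.0, 6.0, 4.0, 7.0]
placebo_post = [5.5, 6.2, 4.3, 7.2]
nicotine_pre = [5.0, 6.5, 4.5, 6.0]
nicotine_post = [5.1, 6.4, 4.7, 6.2]

change = change_scores(placebo_pre, placebo_post) + change_scores(nicotine_pre, nicotine_post)
group = [0] * len(placebo_pre) + [1] * len(nicotine_pre)  # placebo = 0, nicotine = 1

slope, intercept = ols_slope_intercept(group, change)
# intercept: mean change in the placebo (reference) group
# slope: mean change in the nicotine group minus mean change in the placebo group
```

      With a 0/1 group code, the intercept is the mean change in the reference (placebo) group and the slope is the difference in mean change between groups, which is why the group effect is always read relative to placebo.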

      However, to address your comment, it is possible to simplify and assess the relationship between the change in peak alpha frequency (PAF) and change in pain within each gum group (nicotine and placebo) independently, without including the intervention as a factor. To do this, the mediation model can be simplified to regression analysis with latent change scores that focus purely on these relationships. The results of this can help to understand whether change in PAF influences change in pain within each group separately. As with the main analysis, we see no significant influence of change in PAF on change in pain while controlling for the same confounding variables within the nicotine group (Beta = -0.146 +/- 1.105, p = 0.895, 95% CI: -2.243, 2.429) or the placebo group (Beta = 0.730 +/- 2.061, p = 0.723, 95% CI: -4.177, 3.625).

      When suggesting that “the placebo condition seems to have a stronger effect and may have driven the result”, we believe you are referring to the increase in mean PHP ratings within the placebo group from pre (5.51 +/- 2.53) to post-placebo gum (5.84 +/- 2.67). Indeed there was a significant increase in pain ratings pre to post chewing placebo gum (t(31) = -2.53, p = 0.0165, 95% CI: -0.603, -0.0653), that was not seen after chewing nicotine gum (t(27) = 0.237, p = 0.81, 95% CI: -0.358, 0.452). In lieu of a control where no gum was chewed (i.e. simply a second pain assessment ~30 minutes after the first), we assume the gum without nicotine is a good reference that controls for the effect of time plus expectation of chewing nicotine gum. With this in mind, as we describe in our results, the change in PHP ratings is reduced in the nicotine group compared to the placebo group. Note that this phrasing keeps the effect of placebo on pain as our reference from which to view the effect of nicotine on pain. However, you are correct that we need to ensure we emphasise that the change in pain in the nicotine group is reduced in comparison to the change seen after placebo.

      We have not included these extra statistics in our revised manuscript, but hope that they aid your understanding and interpretation of the included analyses, and we have highlighted these nuances in the discussion.

      “However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice.”

      (8) I would not dare to state that nicotine can function as an acute analgesic. Acute analgesics need to work for everyone. The average effect here is close to zero.

      In light of your feedback, we have refined our language to avoid a sweeping assertion of universal analgesic effects and emphasize individual variability. Nicotine's role as a coping strategy for pain is acknowledged in the literature (Robinson et al., 2022), with the meta-analysis by Ditre et al. (2016) discussing its potential as an acute analgesic in humans, along with some evidence from animal research (Zhang et al., 2020). Our revised discussion underscores the need for further exploration into factors influencing nicotine's potential impact on pain. We have also specified the short-term nature of nicotine use in this context to distinguish acute effects from potential opposing effects after long-term use (Zhang et al., 2020).

      “Short-term nicotine use is thought to have acute analgesic properties in experimental settings, with a review reporting that nicotine increased pain thresholds and pain tolerance [49]. In addition, research in a rat model suggests analgesic effects on mechanical thresholds after short-term nicotine use (Zhang et al., 2020). However, previous research has not assessed the acute effects of nicotine on prolonged experimental pain models. The present study found that 4 mg of nicotine reduced heat pain ratings during prolonged heat pain compared to placebo for our human participants, but that prolonged pressure pain decreased irrespective of which gum was chewed. Our findings are thus partly consistent with the idea that nicotine may have acute analgesic properties [49], although further research is required to explore factors that may influence nicotine’s potential impact on a variety of prolonged pain models. We further advance the literature by reporting this effect in a model of prolonged heat pain, which better approximates the experience of clinical pain than short lasting models used to assess thresholds and tolerance [50]. However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice. Future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (9) Figures 2E and 2F are not particularly intuitive. Usually, the colour green in "jet" colour coding is being used for "zero" values. I would suggest to cut off the blue and use only the range between red green and red.

      We have chosen to retain the current colour scale for several reasons. In our analysis, green represents the middle of the frequency range (approx 10 Hz in this case), and if we were to use green as zero, it would effectively remove both blue and green from the plot, resulting in only red shades. Additionally, we have provided a clear colour scale for reference next to the plot, which allows readers to interpret the data accurately. Our intention is to maintain clarity and precision in representing the data, rather than conforming strictly to conventional practices in color coding.

      We believe that the current representation effectively conveys the results of our study while allowing readers to interpret the data within the context provided. Thank you again for your suggestion, and we hope you understand our reasoning in this matter.

      (10) Did the authors do their analysis on the parietal ROI or on the pre-registered ROI?

      The analysis was conducted on the pre-registered sensorimotor ROI and on the global values. We have now also conducted the analysis with the regions suggested with the cluster based permutation analysis as requested by reviewer 2, comment 3.

      (11) Point 3.2 in the discussion. I would be very cautious to discuss smoking and chronic pain in the context of the manuscript. The authors can not provide any additional knowledge with their design targeting non-smokers, acute nicotine and experimental pain. The information might be interesting in the introduction in order to provide the reader with some context but is probably misleading in the discussion.

      We appreciate your perspective and agree with your caution regarding the discussion of smoking and chronic pain. While our study specifically targets non-smokers and focuses on acute nicotine effects in experimental pain, we understand the importance of contextual clarity. We have removed these points from the discussion to not mislead the reader.

      Previously we wrote, and have removed: “For those with chronic pain, smoking and nicotine use is reported as a coping strategy for pain [52]; abstinence can increase pain sensitivity [48,50], and pain is thus seen as a barrier to smoking cessation due to fear of worsening pain [51,52]. Therefore, continued understanding of the acute effects of nicotine on models of prolonged pain could improve understanding of the role of nicotine and smoking use in chronic pain [49,51,52].”

      (12) I very much appreciate section 3.3 of the discussion. I would not give up on PAF as a target to modulate pain. A modulation might not be possible in such a short period of experimental intervention. PAF might need longer and different interventions to gradually shift in order to attenuate the intensity of pain. As discussed by the authors themselves, I would also consider other targets for alpha analysis (as mentioned above not other electrodes or ROIs but separated sources.)

      Thank you for your comments on section 3.3. We appreciate your recognition of the potential significance of PAF as a target for pain modulation. Your insights align with our considerations that the experimental intervention duration or type might be a limiting factor in observing substantial shifts in PAF to attenuate pain intensity. We had mentioned the use of the exploratory electrode clusters in future work, but have now also mentioned that the use of ICA to identify separate ICA sources may provide an alternative approach. See responses to your previous ICA comment regarding separate sources.

      REFERENCES for responses to reviewer 2

      Chiang, A. K. I., Rennie, C. J., Robinson, P. A., Van Albada, S. J., & Kerr, C. C. (2011). Age trends and sex differences of alpha rhythms including split alpha peaks. Clinical Neurophysiology, 122(8), 1505-1517.

      Debnath, R., Buzzell, G. A., Morales, S., Bowers, M. E., Leach, S. C., & Fox, N. A. (2020). The Maryland analysis of developmental EEG (MADE) pipeline. Psychophysiology, 57(6), e13580.

      Ettinger, U., Williams, S. C., Patel, D., Michel, T. M., Nwaigwe, A., Caceres, A., ... & Kumari, V. (2009). Effects of acute nicotine on brain function in healthy smokers and non-smokers: estimation of inter-individual response heterogeneity. Neuroimage, 45(2), 549-561.

      Falco, A. M., & Bevins, R. A. (2015). Individual differences in the behavioral effects of nicotine: a review of the preclinical animal literature. Pharmacology Biochemistry and Behavior, 138, 80-90.

      Kayhan, E., Matthes, D., Haresign, I. M., Bánki, A., Michel, C., Langeloh, M., ... & Hoehl, S. (2022). DEEP: A dual EEG pipeline for developmental hyperscanning studies. Developmental cognitive neuroscience, 54, 101104.

      Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., ... & Vincent, J. L. (2019). Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1), 22-28.

      Little, T. D. (2013). Longitudinal structural equation modeling. Guilford Press.

      Pernet, C., Garrido, M., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2018). Best practices in data analysis and sharing in neuroimaging using MEEG.

      Pernet, C., Garrido, M. I., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2020). Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research. Nature neuroscience, 23(12), 1473-1483.

      Pomerleau, O. F. (1995). Individual differences in sensitivity to nicotine: implications for genetic research on nicotine dependence. Behavior genetics, 25(2), 161-177.

      Robinson, C. L., Kim, R. S., Li, M., Ruan, Q. Z., Surapaneni, S., Jones, M., ... & Southerland, W. (2022). The Impact of Smoking on the Development and Severity of Chronic Pain. Current Pain and Headache Reports, 26(8), 575-581.

      Xia, J., Mazaheri, A., Segaert, K., Salmon, D. P., Harvey, D., Shapiro, K., ... & Olichney, J. M. (2020). Event-related potential and EEG oscillatory predictors of verbal memory in mild cognitive impairment. Brain communications, 2(2), fcaa213.

      VanderWeele, T. J. (2019). Principles of confounder selection. European journal of epidemiology, 34, 211-219.

      Valente, M. J., & MacKinnon, D. P. (2017). Comparing models of change to estimate the mediated effect in the pretest–posttest control group design. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 428-450.

      Vimolratana, O., Aneksan, B., Siripornpanich, V., Hiengkaew, V., Prathum, T., Jeungprasopsuk, W., ... & Klomjai, W. (2024). Effects of anodal tDCS on resting state eeg power and motor function in acute stroke: a randomized controlled trial. Journal of NeuroEngineering and Rehabilitation, 21(1), 1-15.

      Zhang, Y., Yang, J., Sevilla, A., Weller, R., Wu, J., Su, C., ... & Candiotti, K. A. (2020). The mechanism of chronic nicotine exposure and nicotine withdrawal on pain perception in an animal model. Neuroscience letters, 715, 134627.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      (1) Rationale and link to chronic pain. I am not sure I agree with the statement "The ability to identify those at greater risk of developing chronic pain is limited". I believe there is an abundance of literature associating risk factors with the different instances of chronic pain (e.g., Mills et al., 2019). The fact that the authors cite studies involving potential neuroimaging biomarkers leads me to believe that they perhaps did not intend to make such a broad statement, or that they wanted to focus on individual prediction instead of population risk.

      We thank the reviewer for the thought put into this comment. We did indeed wish to refer to individual prediction, but also realise that the focus on predicting pain might not be the most appropriate opening for this manuscript. Therefore, we have adjusted the below sentence to refer to the need to identify modifiable factors rather than the need to predict pain.

      “Identifying modifiable factors that influence pain sensitivity could be a key step in reducing the presence and burden of chronic pain (van der Miesen et al., 2019; Davis et al., 2020; Tracey et al., 2021).”

      (2) The statement "Individual peak alpha frequency (PAF) is an electro-physiological brain measure that shows promise as a biomarker of pain sensitivity, and thus may prove useful for predicting chronic pain development" is a non sequitur. PAF may very well be a biomarker of pain sensitivity, but the best measures of pain sensitivity we have (self-reported pain intensity ratings) in general are not in themselves predictive of the development of chronic pain. Conversely, features that are not related to pain sensitivity could be useful for predicting chronic pain (e.g., Tanguay-Sabourin et al., 2023).

      We agree that it is essential to acknowledge that self-reported pain intensity ratings alone are not definitive predictors of chronic pain development. To align with this, we have revised the sentence, removing the second clause to avoid overstatement. The adjusted sentence now reads, "Individual peak alpha frequency (PAF) is an electrophysiological brain measure that shows promise as a biomarker of pain sensitivity."

      (3) Finally, some of the statements in the discussion comparing a tonic heat pain model with chronic neuropathic pain might be an overstatement. Whereas it is true that some of the descriptors are similar, the time courses and mechanisms are vastly different.

      We appreciate this comment, and agree that it is difficult to compare the heat pain model used to clinical neuropathic pain. This was an oversight and with further understanding we have removed this comment from the introduction and the discussion:

      “In parallel, we saw no indication of a relationship between PAF and pain ratings during CPA. The introduction of the CPA model, specifically calibrated to a moderate pain threshold, provides further support for the notion that the relationship between PAF and pain is specific to certain pain types [17,28]. Prolonged heat pain was pre-dominantly described as moderate/severe shooting, sharp, and hot pain, whereas prolonged pressure pain was predominantly described as mild/moderate throbbing, cramping, and aching in the present study. It is possible that the PAF–pain relationship is specific to particular pain models and protocols [12,17].”

      Methodology

      (4) or the benefit of good science. However, I am compelled to highlight that I could not access the preregistered files, even though I waited for almost two weeks after requesting permission to do so. This was a problem on two levels: the main one is that I could not check the hypothesized effect sizes of the sample size estimation, which are not only central to my review, and in general negate all the benefits that should go with preregistration (i.e., avoiding p-hacking, publication bias, data dredging, HARKing, etc.). The second one is that I had to provide an email address to request access. This allows the authors to potentially identify the reviewers. Whereas I have no issues with this and I support transparent peer review practices (https://elifesciences.org/inside-elife/e3e90410/increasing-transparency-in-elife-s-review-process), I also note that this might condition other reviewers.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      Interpretation of results

      (5) To be perfectly clear, I trust the results of this study more than some of the cited studies regarding nicotine and pain because it was preregistered, the sample size is considerably larger, and it seems carefully controlled. I just do not agree with the interpretation of the results, stated in the first paragraph of the Discussion. Quoting J. Cohen, "The primary product of a research inquiry is one or more measures of effect size, not P values" (Cohen, 1990). As I am sure the authors are aware of, even tiny differences between conditions, treatments or groups will eventually be statistically significant given arbitrarily large sample sizes. What really matters then is the magnitude of these differences. In general, the authors hypothesize on why there were no differences on the pressure pain model, and why decreases in heat pain were not mediated by PAF, but do not seem to consider the possibility that the intervention just did not cause the intended effect on the nociceptive system, which would be a much more straightforward explanation for all observations.

      While acknowledging and agreeing with the concern that 'even tiny differences between conditions, treatments, or groups will eventually be statistically significant given arbitrarily large sample sizes,' it's crucial to clarify that our sample size of N=62 does not fall into the category of arbitrarily large. We carefully considered the observed outcomes in the pressure pain model and the lack of PAF mediation in heat pain, as dictated by our statistical approach and the obtained results.

      The suggestion of a straightforward explanation aligning with the intervention not causing the intended effect on the nociceptive system is a valid consideration. We did contemplate the possibility of a false positive, emphasising this in the limitations of our findings and the need for replication to draw stronger conclusions to follow up this initial study.

      (6) In this regard, I do not believe that an average *increase* of 0.05 / 10 (Nicotine post - pre) can be considered a "reduction of pain ratings", regardless of the contrast with placebo (average increase of 0.24 / 10). This tiny effect size is more relevant in the context of the considerable inter-individual variation, in which subjects scored the same heat pain model anywhere from 1 to 10, and the same pressure pain model anywhere from 1 to 8.5. In this regard, the minimum clinically or experimentally important differences (MID) in pain ratings varies from study to study and across painful conditions but is rarely below 1 / 10 in a VAS or NRS scale, see f. ex. (Olsen et al., 2017). It is not my intention to question whether nicotine can function as an acute analgesic in general (as stated in the Discussion), but instead, if it worked as such under these very specific experimental conditions. I also acknowledge that the authors note this issue in two lines in the Discussion, but I believe that this is not weighed properly.

      We appreciate your perspective on the interpretation of the effect size, and we understand the importance of considering it in the context of individual variation.

      As also discussed in response to comment 6 from reviewer 2, we recognize the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the readers' interpretation of results.

      Moreover, we have made sure to refer to the comparison with the placebo group when discussing the reduction or decrease in pain seen in the nicotine group, for example:

      “2) nicotine reduced prolonged heat pain intensity but not prolonged pressure pain intensity compared to placebo gum;”

      “The nicotine group had a decrease in heat pain ratings compared to the placebo group and increased PAF speed across the scalp from pre to post-gum, driven by changes at central-parietal and right-frontal regions.”

      We have kept our original comment of whether this effect on pain is meaningful in practice to refer to the minimum clinically or experimentally important differences in pain ratings as highlighted by Olsen et al., 2017.

      “While acknowledging the modest effect size, it’s essential to consider the broader context of our study’s focus. Assessing the clinical relevance of pain reduction is pertinent in applications involving the use of any intervention for pain management [69]. However, from a mechanistic standpoint, particularly in understanding the implications of and relation to PAF, the specific magnitude of the pain effect becomes less pivotal. Nevertheless, future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (7) In line with the topic of effect sizes, average effect sizes for PAF in the study cited in the manuscript range from around 1 Hz (Boord et al., 2008; Wydenkeller et al., 2009; Lim et al., 2016), to 2 Hz (Foulds et al., 1994), compared with changes of 0.06 Hz (Nicotine post - pre) or -0.01 Hz (Placebo post - pre). MIDs are not so clearly established for peak frequencies in EEG bands, but they should be certainly larger than some fractions of a Hertz (which is considerably below the reliability of the measurement).

      We appreciate your attention to these nuances. We acknowledge the differences in effect sizes between our study and those referenced in the manuscript. Given the current state of the literature, it's noteworthy that ‘MIDs’ for peak frequencies in EEG bands, particularly PAF changes, are not clearly established, other than a recent publication suggesting that even small changes in PAF are reliable and meaningful (Furman et al., 2021). In light of this, we have addressed the uncertainty around the existence and determination of MIDs in our revision, highlighting the need for further research in this area.

      In addition, our study employed a greater frequency resolution (0.2 Hz) compared to some of the referenced studies, with approximately 0.5 Hz resolution (Boord et al., 2008; Wydenkeller et al., 2009; Foulds et al., 1994). This improved resolution allows for a more precise measurement of changes in PAF. Considering this, it is plausible that studies with lower resolution might have conflated increases in PAF, and our higher resolution contributes to a more accurate representation of the observed changes.

      We have also incorporated this insight into the manuscript, emphasising the methodological advancements in our study and their potential impact on the interpretation of PAF changes. Thank you for your thoughtful feedback.

      “The ability to detect changes in PAF can be considerably impacted by the frequency resolution used during Fourier Transformations, an element that is overlooked in recent methodological studies on PAF calculation [16,95]. Changes in PAF within individuals might be obscured or conflated by lower frequency resolutions, which should be considered further in future research.”
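
      As a rough illustration of the frequency-resolution point, the sketch below relates the spacing of discrete Fourier transform bins to the analysis window length, and shows how a simple peak-picking estimate is snapped to the nearest bin. The sampling rate, the peak value, and the function names are assumptions chosen for illustration and are not taken from the study; estimators that weight power across the alpha band (for example a centre-of-gravity approach) are not limited to bin centres in the same way.

```python
def bin_spacing_hz(sampling_rate_hz, n_samples):
    """Spacing between DFT frequency bins: fs / N, i.e. 1 / window duration."""
    return sampling_rate_hz / n_samples


def window_seconds(resolution_hz):
    """Window duration needed for a given bin spacing (without zero-padding)."""
    return 1.0 / resolution_hz


def nearest_bin_hz(true_peak_hz, resolution_hz):
    """Frequency of the bin centre closest to a given peak frequency."""
    return round(true_peak_hz / resolution_hz) * resolution_hz


FS = 500.0  # hypothetical sampling rate in Hz

fine = bin_spacing_hz(FS, 2500)    # 5 s window -> 0.2 Hz bins
coarse = bin_spacing_hz(FS, 1000)  # 2 s window -> 0.5 Hz bins

peak = 10.15  # hypothetical underlying alpha peak in Hz
fine_estimate = nearest_bin_hz(peak, fine)      # close to 10.2 Hz
coarse_estimate = nearest_bin_hz(peak, coarse)  # 10.0 Hz
```

      Under peak-picking, the largest possible rounding error is half the bin spacing (0.1 Hz at 0.2 Hz resolution versus 0.25 Hz at 0.5 Hz resolution), which is one way in which small within-individual shifts could be obscured at coarser resolutions.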

      (8) The authors also ran alternative statistical models to analyze the data and did not find consistent results in terms of PHP ratings (PAF modulation was still statistically significantly different). The authors attribute this to the necessity of controlling for covariates. Now, considering the effects sizes, aren't these statistically significant differences just artifacts stemming from the inclusion of too many covariates (Simmons et al., 2011)? How much influence should be attributable to depression and anxiety symptoms, stress, sleep quality and past pain, considering that these are healthy volunteers? Should these contrasting differences call the authors to question the robustness of the findings (i.e., whether the same data subjected to different analysis provides the same results), particularly when the results do not align with the preregistered hypothesis (PAF modulation should occur on sensorimotor ROIs)?

      Thank you for your comments on our alternative statistical models. By including these covariates, we aim to provide a more nuanced understanding of the complexities within our data by considering their potential impact on the effects of interest. The decision to include covariates was preregistered (apologies again that this was not available) and made with consideration of balancing model complexity and avoiding potential confounding. Moreover, we hope that the insights gained from these analyses will offer valuable information about the behaviour of our data and aid future research in terms of power calculations, expected variance, and study design.

      (9) Beyond that, I believe in some cases that the authors overreach in an attempt to provide explanations for their results. While I agree that sex might be a relevant covariate, I cannot say whether the authors are confirming a pre-registered hypothesis regarding the gender-specific correlation of PAF and pain, or if this is just a post hoc subgroup analysis. Given the large number of analyses performed (considering the main document and the supplementary files), caution should be exercised on the selective interpretation of those that align with the researchers' hypotheses.

      We chose to explore the influence of sex on the correlation between PAF and pain, because this has also been investigated in previous publications of the relationship (Furman et al., 2020). We state that the assessment by sex is exploratory in our results on p.17: “in an exploratory analysis of separate correlations in males and females (Figure 5, plot C)”. For clarity regarding whether this was a pre-registered exploration or not, we have adjusted this to be: “in an exploratory analysis (not pre-registered) of separate correlations in males and females (Figure 5, plot C), akin to those conducted in previous research on this topic (Furman et al., 2020),”

      We have made sure to state this in the discussion also. Therefore, when we previously said on p.22:

      “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7–11,15] was only observed here for male participants during the PHP model for global PAF.” We have now changed this to: “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7–11,15] was only observed here for male participants during the PHP model for global PAF in an exploratory analysis.”

      Please also note that we altered the colour and shape of points on the correlation plot (Figure 5 in initial submission), the male brown was changed to a dark brown as we realised that the light brown colour was difficult to read. The shape was then changed for male points so that the two groups can be distinguished in grey-scale.

      Overall, your thoughtful feedback is instrumental in refining the interpretation of our findings, and we look forward to presenting a more comprehensive and nuanced discussion. Thank you for your comments.

      REFERENCES for responses to reviewer 3

      Arendt-Nielsen, L., & Yarnitsky, D. (2009). Experimental and clinical applications of quantitative sensory testing applied to skin, muscles and viscera. The Journal of Pain, 10(6), 556-572.

      Chowdhury, N. S., Skippen, P., Si, E., Chiang, A. K., Millard, S. K., Furman, A. J., ... & Seminowicz, D. A. (2023). The reliability of two prospective cortical biomarkers for pain: EEG peak alpha frequency and TMS corticomotor excitability. Journal of Neuroscience Methods, 385, 109766.

      Fishbain, D. A., Lewis, J. E., & Gao, J. (2013). Is There Significant Correlation between Self-Reported Low Back Pain Visual Analogue Scores and Low Back Pain Scores Determined by Pressure Pain Induction Matching?. Pain Practice, 13(5), 358-363.

      Furman, A. J., Prokhorenko, M., Keaser, M. L., Zhang, J., Chen, S., Mazaheri, A., & Seminowicz, D. A. (2021). Prolonged pain reliably slows peak alpha frequency by reducing fast alpha power. bioRxiv, 2021-07.

      Heitmann, H., Ávila, C. G., Nickel, M. M., Dinh, S. T., May, E. S., Tiemann, L., ... & Ploner, M. (2022). Longitudinal resting-state electroencephalography in patients with chronic pain undergoing interdisciplinary multimodal pain therapy. Pain, 163(9), e997.

      McLain, N. J., Yani, M. S., & Kutch, J. J. (2022). Analytic consistency and neural correlates of peak alpha frequency in the study of pain. Journal of Neuroscience Methods, 368, 109460.

      Ngernyam, N., Jensen, M. P., Arayawichanon, P., Auvichayapat, N., Tiamkao, S., Janjarasjitt, S., ... & Auvichayapat, P. (2015). The effects of transcranial direct current stimulation in patients with neuropathic pain from spinal cord injury. Clinical Neurophysiology, 126(2), 382-390.

      Parker, T., Huang, Y., Raghu, A. L., FitzGerald, J., Aziz, T. Z., & Green, A. L. (2021). Supraspinal effects of dorsal root ganglion stimulation in chronic pain patients. Neuromodulation: Technology at the Neural Interface, 24(4), 646-654.

      Petersen-Felix, S., & Arendt-Nielsen, L. (2002). From pain research to pain treatment: the role of human experimental pain models. Best Practice & Research Clinical Anaesthesiology, 16(4), 667-680.

      Sarnthein, J., Stern, J., Aufenberg, C., Rousson, V., & Jeanmonod, D. (2006). Increased EEG power and slowed dominant frequency in patients with neurogenic pain. Brain, 129(1), 55-64.

      Sato, G., Osumi, M., & Morioka, S. (2017). Effects of wheelchair propulsion on neuropathic pain and resting electroencephalography after spinal cord injury. Journal of Rehabilitation Medicine, 49(2), 136-143.

      Sufianov, A. A., Shapkin, A. G., Sufianova, G. Z., Elishev, V. G., Barashin, D. A., Berdichevskii, V. B., & Churkin, S. V. (2014). Functional and metabolic changes in the brain in neuropathic pain syndrome against the background of chronic epidural electrostimulation of the spinal cord. Bulletin of Experimental Biology and Medicine, 157(4), 462-465.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Amaral et al. present a study investigating the mesoscale modelling and dynamics of bolalipids.

      Strengths:

      The figures in this paper are exceptional, both in how they outline and introduce the lipid types and in the quality and resolution of the plots. The data held within also appear to be outstanding and of significant (hopefully) general interest.

      We thank the reviewer for their kind words and the appreciation of our work.

      Weaknesses:

      In the introduction, I would like to have read more specifics on the biological role of bolalipids. Archaea are mentioned, but this kingdom is huge - there must be specific species that can be discussed where bolalipids are integral to archaeal life. The authors should go beyond ‘extremophiles’. In short, they should unpack why the general audience should be interested in these lipids, within a subset of organisms that are often forgotten about.

      Following the reviewer’s advice, we have revised the introduction of the manuscript, in which we now discuss specific species (Sulfolobus acidocaldarius and Thermococcus kodakarensis) and how bolalipids are integral to archaeal life in these species. We explain that the ratio between bilayer lipids and bolalipids, and the number of cyclopentane rings contained within bolalipids, can change to adapt to the environment. The revised parts of the introduction read (p.1):

      “Like for bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to understand the biophysical properties of archaeal membranes made of bolalipids. Bacterial and eukaryotic membranes are made of lipids that self-assemble into bilayers. Archaea, instead, use bolalipids, lipids that have two headgroups and can span the entire bilayer. The authors wanted to determine if the unique characteristics of archaea, which are often extremophiles, are in part due to the fact that their membranes contain bolalipids.

      The authors develop a minimal computational model to compare the biophysics of bilayers made of lipids, bolalipids, and mixtures of the two. Their model enables them to determine essential parameters such as bilayer phase diagrams, mechanical moduli, and the bilayer behaviour upon cargo inclusion and remodelling.

      The authors demonstrate that bolalipid bilayers behave as binary mixtures, containing bolalipids organized either in a straight conformation, spanning the entire bilayer, or in a U-shaped one, confined to a single leaflet. This dynamic mixture allows bolalipid bilayers to be very sturdy but also provides remodelling. However, remodelling is energetically more expensive than with standard lipids. The authors speculate that this might be why lipids were more abundant in the evolutionary process.

      Strengths:

      This is a wonderful paper, a very fine piece of scholarship. It is interesting from the point of view of biology, biophysics, and material science. The authors mastered the modelling and analysis of these complex systems. The evidence for their findings is really strong and complete. The paper is written superbly, the language is precise and the reading experience is very pleasant. The plots are very well-thought-out.

      Weaknesses:

      I would not talk about weaknesses, because this is really a nice paper. If I really had to find one, I would have liked to see some clear predictions of the model expressed in such a way that experimentalists could design validation experiments.

      We thank the reviewer for their very kind assessment. We incorporated their recommendations regarding experimental validation in the discussion section, as follows (p.14):

      “Our model makes a number of predictions that could be tested by experiment either in cells or in vitro. First, it predicts that a small increase in the fraction of archaeal bilayer lipids should be sufficient to soften a bolalipid-rich membrane. While this could be tested in the future, so far only very few studies have reported experimental analysis of archaeal membrane mixtures [18, 50]. Second, we observed that membranes with moderate bolalipid molecular rigidity k<sub>bola</sub> exhibit curvature-dependent bending rigidity. To experimentally verify this, one could extrude membrane tethers from cells while controlling for membrane tension. Finally, to get to the core mechanism underlying our findings, it will be important to develop experimental methods that will allow the fraction of U-shaped bolalipid conformers per leaflet to be imaged and measured.”

      Reviewer #3 (Public review):

      Summary:

      The authors have studied the mechanics of bolalipid and archaeal mixed-lipid membranes via comprehensive molecular dynamics simulations. The Cooke-Deserno 3-bead-per-lipid model is extended to bolalipids with 6 beads. Phase diagrams, bending rigidity, mechanical stability of curved membranes, and cargo uptake are studied. Effects such as the formation of U-shaped bolalipids, pore formation in highly curved regions, and changes in membrane rigidity are studied and discussed. The main aim has been to show how the mixture of bolalipids and regular bilayer lipids in archaeal membrane models enhances the fluidity and stability of these membranes.

      Strengths:

      The authors have presented a wide range of simulation results for different membrane conditions and conformations. For the most part, the analyses and their results are presented clearly and concisely. Figures, supplementary information, and movies very well present what has been studied. The manuscript is well-written and is easy to follow.

      We thank the reviewer for the detailed assessment of our work and their constructive feedback.

      Major issues

      R3.Q1: The Cooke-Deserno model, while very powerful for biophysical analysis of membranes at the mesoscale, is very much void of chemical information. It is parametrized such that it is good in producing fluid membranes and predicting values for bending rigidity, compressibility, and even thermal expansion coefficient falling in the accepted range of values for bilayer membranes. But it still represents a generic membrane. Now, the authors have suggested a similar model for the archaeal bolalipids, which have chemically different lipids (the presence of cyclopentane rings for one), and there is no good justification for using the same pairwise interactions between their representative beads in the coarse-grained model. This does not necessarily diminish the worth of all the authors’ analyses. What is at risk here is the confusion between “what we observe this model of bolalipid or mixed membranes do” and “how real bolalipid-containing archaeal membranes behave at these mechanical and thermal conditions.”

      As the reviewer correctly notes, Cooke and Deserno used a minimal model, devoid of chemical detail, to represent fluid lipid membranes composed of bilayer lipids. Indeed, archaeal lipids are chemically different from non-archaeal lipids, but just like non-archaeal lipids, they can be very different from one another. Given the chemical diversity among bolalipids, instead of representing their complexity in a complicated model with many experimentally unconstrained parameters, we here defined a minimal model for bolalipids. The power of this minimal model is to represent the key physical/geometrical characteristics of archaeal membranes, namely the fact that lipid heads on two sides of the membrane are often connected, that bolalipids can exhibit a conformational change, and that bolalipids mix with some percentage of bilayer molecules. We then ask a general question: how do these unique geometrical characteristics of archaeal membranes influence their mechanics and reshaping? The reviewer is however right in pointing out that a model, regardless of its level of detail (atomistic, coarse-grained, minimal), is still a model.

      Our approach of extending an established coarse-grained model for bilayer lipids to bolalipids is further supported by experimental observations, which report that archaeal bilayer lipids can form membranes of comparable bending rigidity to those of non-archaeal bilayer membranes [53]. Hence, different lipid linkages (archaeal vs. non-archaeal) give rise to fluid, deformable membranes of not too dissimilar rigidities, suggesting that both archaeal and non-archaeal bilayer lipids can be represented by a similar minimal coarse-grained model for the purpose of mesoscopic biophysical investigations. Since archaeal bolalipids have the same core chemical structure as two archaeal bilayer lipids joined by their tail ends, similarly we model a bolalipid by joining two bilayer lipids. Such an approach also efficiently enables us to compare bolalipid with bilayer membranes, and connect to the large body of knowledge on the physics of bilayer membranes.

      To conclude, our coarse-grained model is indeed intended to capture the main physical properties of bolalipid membranes, and not their chemical diversity.

      R3.Q2: Another more specific, major issue has to do with using the Hamm-Kozlov model for fitting the power spectrum of thermal undulations. The 1/q<sup>2</sup> term can very well be attributed to membrane tension. While a barostat is indeed used, have the authors made absolutely sure that the deviation from 1/q<sup>4</sup> behaviour does not correspond to lateral tension?

      To the casual observer, any 1/q<sup>2</sup> trend might point at membrane tension. However, the precise functional form is relevant, as it determines whether the 1/q<sup>2</sup> trend dominates the 1/q<sup>4</sup> trend for small or for large values of the wave number q in the fitted power spectrum.

      The first model (including lipid tilt) exhibits the functional form 1/(κq<sup>4</sup>) + 1/(κ<sub>θ</sub>q<sup>2</sup>). In contrast, the second model (including membrane tension) exhibits the functional form 1/(κq<sup>4</sup> + Σq<sup>2</sup>). Here κ and κ<sub>θ</sub> are the bending and tilt moduli, which are assumed positive, and Σ is the membrane tension, which can be either positive or negative. Importantly, the two models have different asymptotic behaviour. For the first model (with tilt), the amplitude is proportional to q<sup>-4</sup> for small q and to q<sup>-2</sup> for large q. In contrast, for the second model (with positive tension), the amplitude is proportional to q<sup>-2</sup> for small q and to q<sup>-4</sup> for large q. If membrane tension were to be negative in the second model, the slope would cross from negative infinity for small q to -4 for large q. The functional dependencies are summarized in Author response image 1A.
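
      The two limiting behaviours described above can be checked numerically. The following is a minimal sketch, not the analysis code used for the manuscript: it evaluates the local log-log slope of both functional forms, with prefactors dropped and with purely illustrative parameter values (the function names and the numbers are assumptions made for this example).

```python
import math

def tilt_spectrum(q, kappa, kappa_theta):
    """Tilt model: 1/(kappa q^4) + 1/(kappa_theta q^2), prefactor dropped."""
    return 1.0 / (kappa * q**4) + 1.0 / (kappa_theta * q**2)

def tension_spectrum(q, kappa, sigma):
    """Tension model: 1/(kappa q^4 + sigma q^2), prefactor dropped."""
    return 1.0 / (kappa * q**4 + sigma * q**2)

def loglog_slope(f, q, rel_step=1e-4):
    """Central-difference estimate of d ln f / d ln q at wave number q."""
    q_hi, q_lo = q * (1 + rel_step), q * (1 - rel_step)
    return (math.log(f(q_hi)) - math.log(f(q_lo))) / (math.log(q_hi) - math.log(q_lo))

# Illustrative values only (not taken from the simulations).
kappa, kappa_theta, sigma = 10.0, 10.0, 10.0

for q in (1e-3, 1.0, 1e3):
    s_tilt = loglog_slope(lambda x: tilt_spectrum(x, kappa, kappa_theta), q)
    s_tension = loglog_slope(lambda x: tension_spectrum(x, kappa, sigma), q)
    print(f"q = {q:g}: tilt slope {s_tilt:.2f}, tension slope {s_tension:.2f}")
```

      With positive moduli, the tilt form goes from a slope near -4 at small q to near -2 at large q, while the positive-tension form does the opposite, which is why the direction of the crossover discriminates between the two models.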

      For rigid bolalipid membranes, it is clearly visible that the slope of the power spectrum plotted against the wave number q decreases with increasing q (Author response image 1B). While the slope initially assumes a value close to 4, it gradually approaches 2 for larger values of q. We conclude that only the model including lipid tilt can fit the power spectrum of membrane fluctuations appropriately (solid-dashed line), whereas the model with tension fails to fit the data (dashed line). We note that the combined model containing both lipid tilt and membrane tension does not give a better fit (dotted line).

      To demonstrate that the tension model cannot fit the data, we included the best fits for both models for rigid bolalipid membranes in the new SI section 16 (p. S22) and show that only the tilt model leads to acceptable fits. We also measured the projected membrane tension Σ = −L<sub>z</sub>(P<sub>x</sub> + P<sub>y</sub>)/2, where P<sub>x</sub> and P<sub>y</sub> are the pressures in the x and y directions, respectively, and L<sub>z</sub> is the dimension of the simulation box along the z axis. We found the projected membrane tension to be negligible, similar to the value that we indirectly measured by fitting a combined model with both tension and tilt, further confirming our conjecture.

      Author response image 1.

      (A) Schematic showing the decay of the power spectrum as a function of the wave number q in the tilt model (top), in the tension model with positive membrane tension (middle), and in the tension model with negative membrane tension (bottom). (B) Fitted power spectrum as a function of q for rigid bolalipid membranes (k<sub>bola</sub>=5k<sub>B</sub>T). The fit shows that while the model with tension (dashed line) cannot fit the data, the model with tilt nicely fits the spectrum (solid-dashed line). The combined model including both tension and tilt does not fit the spectrum any better (dotted line).

      R3.Q3: I got more worried when I noticed in the SI that the simulations had been done with combined “fix langevin” and “fix nph” LAMMPS commands. This combination does not result in a proper isothermal-isobaric ensemble. The importance of tilt terms for bolalipids is indeed very interesting, but I believe more care is needed to establish that.

      In what follows, we show that there is no reason to worry. First of all, we want to clarify that the physical setup we simulate is that of a membrane contained in a heat bath under negligible tension with correct diffusional dynamics. To achieve this physical setup, we use a Langevin thermostat combined with pressure control via an overdamped barostat, which we implement in LAMMPS by combining “fix langevin” and “fix nph”.

      In more detail: we simulated particles in an implicit solvent, for which we use a Langevin thermostat to get the right diffusional dynamics. To apply the theory of fitting fluctuation spectrums, the simulation box length needs to be (near) constant. However, simulating membranes at a fixed box size results in an average non-zero membrane tension, making it hard to measure bending rigidity. The reason is that the effect of membrane tension is most influential on the largest wavelength modes, which are also most decisive when determining mechanical membrane properties like membrane rigidity. To minimize the effect of tension, we perform our simulation with an overdamped barostat (𝜏<sub>baro</sub> = 10 𝜏<sub>langevin</sub>), which keeps the membrane near tensionless, as also done before [32]. In the revised manuscript, we have clarified the statement on the physical ensemble used (p.S2):

      “For simulating flat membrane patches of bolalipids, we combined the previously used Langevin thermostat with relaxation time of 1𝜏 with a Nosé–Hoover barostat with relaxation time of 10𝜏. In LAMMPS this amounts to combining the commands ’fix langevin’ with ’fix nph’. We configured the barostat to set lateral pressure P<sub>xy</sub> to zero by re-scaling the simulation box in the x-y plane. We compare this setup to a fixed box length setup, and an NPT ensemble setup, in SI section 17.”

      To connect our results with statistical mechanics ensemble theory, we tested alternative setups. These setups, including the formal isothermal-isobaric ensemble that the reviewer refers to (in which N, P, and T are kept constant using Nosé–Hoover-style equations for thermostatting and barostatting with modern corrections [34]), result in very similar fluctuation spectrums. Consequently, our measurements of bending and tilt modulus hold true regardless of the integration scheme. However, the formal isothermal-isobaric setup does not correctly capture implicit solvent and diffusional dynamics.

      In even more detail: we tested our setup (implemented via “fix langevin” + “fix nph”) against an isothermal-isobaric ensemble (implemented via “fix npt”). We measured the mean and standard deviation of the volume, and found them to match for a reference LJ gas.

      To be completely sure, and to address the reviewer’s concern, we have performed additional verifications in the new SI section 17, which we summarize in the following. We simulated three representative membranes with different integration schemes: “fix npt”, “fix langevin” + “fix nph”, and “fix langevin” (Langevin dynamics with projected area fixed at the average value obtained from a “langevin+nph” run). We checked that the “fix nph” barostat is merely equilibrating the membrane to a tensionless configuration, after which the projected membrane area (A<sub>p</sub> = L<sub>x</sub>L<sub>y</sub>) is practically constant. Consequently, the different schemes resulted in minor changes in the longest wavelength modes that we tracked down to small changes in the negligible tension. The resulting measurements of bending modulus change by less than 10%, and our main text conclusions do not change. Author response image 2 compares the fluctuation spectrums for the different integration schemes.

      Author response image 2.

      Height fluctuation spectrum for a bilayer membrane at T<sub>eff</sub> = 1.1, simulated with Langevin dynamics (pink, ‘langevin’), our setup (purple, ‘nph+langevin’), and under an isothermal-isobaric ensemble (blue, ‘npt’); fits are shown as dotted lines.

      R3.Q4: This issue is reinforced when considering Figure 3B. These results suggest that increasing the fraction of regular lipids increases the tilt modulus, with the maximum value achieved for a normal Cooke-Deserno bilayer void of bolalipids. But this is contradictory. For these bilayers, we don’t need the tilt modulus in the first place.

      We understand the concern why this might be counter-intuitive, and we thank the reviewer for pointing it out. We first want to stress that the tilt modulus can also be measured for bilayer membranes even if it is not needed to fit the fluctuation spectrum. If we measure the tilt modulus for a bilayer membrane, we obtain a value similar to the previously measured one [36]. Importantly, here we also report measurements for the tilt modulus for bolalipid membranes.

      To understand the seemingly contradictory behaviour of the tilt modulus, it is insightful to rewrite the expression for the fluctuation spectrum as done in Eq. (1), given here up to a constant prefactor:

      1/(𝜅q<sup>4</sup>) + 1/(𝜅<sub>𝜃</sub>q<sup>2</sup>) = (1 + l<sub>𝜃</sub><sup>2</sup>q<sup>2</sup>)/(𝜅q<sup>4</sup>),

      where l<sub>𝜃</sub> = √(𝜅/𝜅<sub>𝜃</sub>) is a characteristic length scale related to tilt, which we call the tilt persistence length. From the last equation it is easy to see that the tilt modulus 𝜅<sub>𝜃</sub> becomes relevant for the fluctuation spectrum if the tilt persistence length l<sub>𝜃</sub> is not negligible. In other words, we have to consider the tilt modulus 𝜅<sub>𝜃</sub> as relevant if it is sufficiently small compared to the bending rigidity 𝜅.
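
      As a small worked illustration of this argument, the sketch below assumes that the tilt persistence length is defined as l<sub>𝜃</sub> = √(𝜅/𝜅<sub>𝜃</sub>), which is the length obtained by factoring the tilt-model spectrum 1/(𝜅q<sup>4</sup>) + 1/(𝜅<sub>𝜃</sub>q<sup>2</sup>). The function names and the numerical values are hypothetical and chosen only to show the trend.

```python
import math

def tilt_persistence_length(kappa, kappa_theta):
    """Assumed definition: l_theta = sqrt(kappa / kappa_theta)."""
    return math.sqrt(kappa / kappa_theta)

def tilt_to_bending_ratio(q, kappa, kappa_theta):
    """Ratio of the tilt term 1/(kappa_theta q^2) to the bending term 1/(kappa q^4).

    Algebraically this equals (l_theta * q)^2, so tilt matters once q exceeds 1/l_theta.
    """
    return (tilt_persistence_length(kappa, kappa_theta) * q) ** 2

# Hypothetical moduli: same bending rigidity, stiff versus soft tilt modulus.
q = 0.5
for kappa, kappa_theta in ((10.0, 100.0), (10.0, 1.0)):
    l_theta = tilt_persistence_length(kappa, kappa_theta)
    ratio = tilt_to_bending_ratio(q, kappa, kappa_theta)
    print(f"kappa_theta = {kappa_theta:g}: l_theta = {l_theta:.3f}, tilt/bending at q = {q:g}: {ratio:.3f}")
```

      A larger tilt modulus at fixed bending rigidity gives a shorter tilt persistence length and therefore a smaller tilt contribution at any given wave number, which is the sense in which a large tilt modulus makes tilt irrelevant for the fit.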

      However, this is not only counter-intuitive, but also difficult to communicate graphically. Per the reviewer’s excellent suggestion, to make the interpretation more accessible, we converted the tilt modulus in the main text and its figures to the more directly interpretable tilt persistence length l<sub>𝜃</sub>, as this is small when tilt is irrelevant (for bilayer lipids and flexible bolalipids) and large otherwise (for rigid bolalipids). This includes changes to the main text on p.6 and p.8, and to the insets in Figs. 2C and 3B. We note that for completeness we also report the tilt modulus 𝜅<sub>𝜃</sub> in the SI.

      R3.Q5: Also, from the SI, I gathered that the authors have neglected the longest wavelength mode because it is not equilibrated. If this is indeed the case, it is a dangerous thing to do, because with a small membrane patch, this mode can very well change the general trend of the power spectrum. As a lot of other analyses in the manuscript rely on these measurements, I believe more elaboration is in order.

      We thank the reviewer for the careful examination of our supplementary material. For each fluctuation spectrum measurement, we ran multiple replicas. We observed that the largest wavelength modes were not fully equilibrated. In the simulations the first mode of the fluctuation spectrum is probed at different amplitudes and phases. We thus expected the potential systematic error would show up clearly when comparing spectrums of the different replicas. As we saw no correlation in these systematic offsets between replicas, we concluded that the simulations are sufficiently equilibrated and we could safely exclude the first mode of the fluctuation spectrum from our analysis.

      To show without doubt that this procedure does not bias our results, we also ran simulations for three representative membranes until all modes were equilibrated. On the modes previously equilibrated, the resulting spectrums agree with our previous shorter simulations. On the largest wavelength modes that were previously not fully equilibrated, we noticed a small deviation from theory, specifically for flexible membranes (small bending modulus). These small deviations can be explained by including a negligible negative tension. Importantly, however, the resulting bending modulus κ stays nearly the same. We note that the small negative tension disappears when we halve the timestep (see Author response image 3). This verification is shown in SI section 17.
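
      To illustrate why dropping an unequilibrated first mode can leave the fitted moduli intact, here is a minimal sketch on synthetic data. It is not the fitting procedure used in the manuscript: it exploits the fact that the tilt form, multiplied by q<sup>4</sup>, is a straight line in q<sup>2</sup>, and fits that line by ordinary least squares. The box length, the moduli, and the factor by which the first mode is corrupted are all hypothetical.

```python
import math

def tilt_spectrum(q, kappa, kappa_theta):
    """Tilt model 1/(kappa q^4) + 1/(kappa_theta q^2), prefactor dropped."""
    return 1.0 / (kappa * q**4) + 1.0 / (kappa_theta * q**2)

def fit_tilt_model(q_values, spectrum, skip_modes=1):
    """Least-squares fit of spectrum * q^4 = 1/kappa + q^2 / kappa_theta.

    The first `skip_modes` entries (longest wavelengths) are excluded from the fit.
    Returns the tuple (kappa, kappa_theta).
    """
    qs = q_values[skip_modes:]
    ys = spectrum[skip_modes:]
    x = [q**2 for q in qs]
    z = [y * q**4 for q, y in zip(qs, ys)]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    inv_kappa_theta = sxz / sxx                      # slope of the line
    inv_kappa = mean_z - inv_kappa_theta * mean_x    # intercept of the line
    return 1.0 / inv_kappa, 1.0 / inv_kappa_theta

# Synthetic spectrum with hypothetical parameters.
box_length = 50.0
q_values = [2.0 * math.pi * n / box_length for n in range(1, 21)]
spectrum = [tilt_spectrum(q, 12.0, 6.0) for q in q_values]
spectrum[0] *= 3.0  # mimic a first mode that is not equilibrated

kappa_skip, kappa_theta_skip = fit_tilt_model(q_values, spectrum, skip_modes=1)
kappa_all, kappa_theta_all = fit_tilt_model(q_values, spectrum, skip_modes=0)
print(f"first mode excluded: kappa = {kappa_skip:.3f}, kappa_theta = {kappa_theta_skip:.3f}")
print(f"first mode included: kappa = {kappa_all:.3f}, kappa_theta = {kappa_theta_all:.3f}")
```

      In this toy setting the fit that excludes the corrupted mode recovers the input moduli, whereas including it shifts the fitted bending modulus. Real spectra carry noise on every mode, so this only illustrates the logic of the exclusion, not its quantitative effect.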

      R3.Q6: The authors have found that “there is a strong dependency of the bending rigidity on the membrane mean curvature of stiffer bolalipids.” The effect is negative, with the membrane becoming less stiff at higher mean curvatures. Why is that? I would assume that with more flexible bolalipids, the possibility of reorganization into U-shaped chains should affect the bending rigidity more (as Figure 2E suggests). While for a stiff bolalipid, not much would change if you increase the mean curvature. This should be either a tilt effect, or have to do with asymmetry between the leaflets. But on the other hand, the tilt modulus is shown to decrease with increasing bolalipid rigidity. The authors get back to this issue only on page 10, when they consider U-shaped lipids in the inner and outer leaflets and write, “this suggested that an additional membrane-curving mechanism must be involved.” But then again, in the Discussion, the authors write, “It is striking that membranes made from stiffer bolalipids showed a curvature-dependent bending modulus, which is a clear signature that bolalipid membranes exhibit plastic behaviour during membrane reshaping,” adding to the confusion.

      Author response image 3.

      Height fluctuation spectrum for a bilayer membrane at T<sub>eff</sub> = 1.1, as simulated in the main text (grey, for 60×10<sup>3</sup>τ), for a longer duration (pink, 1.44×10<sup>6</sup>τ), and with the longer duration and a halved timestep of 0.005τ (purple); fits are shown as dotted lines (tension and tilt) or dash-dot lines (tilt only).

      We thank the reviewer for asking this important question. Membrane bending rigidity in bolalipid membranes decreases dramatically once a small fraction of U-shapes is allowed to form, but then plateaus once this U-shape fraction reaches 20%. In a curved bolalipid membrane, U-shapes must accumulate in the outer leaflet to accommodate the area difference. Together, the non-linear dependence of the bending rigidity on U-shape fraction and the promotion of U-shapes by curvature explain why, in a membrane made of moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T), which contains very few U-shapes in the flat state, the bending rigidity of the membrane decreases as curvature increases. In contrast, in a membrane made of flexible bolalipid molecules (k<sub>bola</sub> = 0), where many U-shapes are present in the flat membrane, the bending rigidity does not change with curvature.

      Bending rigidity 𝜅 in flat membranes composed of bolalipids decreases dramatically once a small fraction of U-shapes is allowed to form, but plateaus once more than 20% of U-shaped bolalipids are present. In detail, our data show that with increasing bolalipid molecular rigidity k<sub>bola</sub>, the number of U-shaped bolalipids decreases (Fig. 2B) and the membrane rigidity 𝜅 increases (Fig. 2C). Thus, the correlation suggests that U-shaped bolalipids soften the membrane in a non-linear way, where most of the change in membrane bending rigidity happens for a U-shaped bolalipid fraction < 20% (Figure S11).

      Separately, membrane curvature affects the area difference between curved membrane leaflets and thus drives U-shape accumulation. To be specific, a cylindrical membrane with area A, mean curvature H and thickness h has an outer leaflet with area A(1 + Hh) and an inner leaflet with smaller area A(1 − Hh). This difference can be large, in our simulations up to an area change of Hh = 25%. For pure bolalipid membranes, straight bolalipids occupy the same space in each leaflet. Area difference can then be achieved only by having a different amount of U-shaped bolalipids in each leaflet, which can result in a different U-shape fraction between leaflets and thus ‘asymmetry between leaflets’. Figure S10 confirms U-shape head fraction asymmetry that increases with curvature, for both flexible (k<sub>bola</sub> = 0) and moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T).
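
      The leaflet-area argument amounts to simple arithmetic, sketched below under the convention stated in the text (outer leaflet area A(1 + Hh), inner leaflet area A(1 − Hh)). The function name and the example values of H and h are hypothetical and chosen so that Hh = 0.25, the largest value quoted above.

```python
def leaflet_areas(area, mean_curvature, thickness):
    """Return (outer, inner) leaflet areas for a curved membrane.

    Uses the convention from the text: outer = A (1 + H h), inner = A (1 - H h).
    """
    hh = mean_curvature * thickness
    return area * (1.0 + hh), area * (1.0 - hh)

# Hypothetical numbers giving H * h = 0.25.
outer, inner = leaflet_areas(area=1.0, mean_curvature=0.05, thickness=5.0)
print(f"outer leaflet: {outer:.2f} A, inner leaflet: {inner:.2f} A")
print(f"area mismatch between leaflets: {outer - inner:.2f} A")
```

      Because straight bolalipids contribute equally to both leaflets, a mismatch of this size has to be absorbed by the U-shaped conformers, which is the origin of the leaflet asymmetry discussed here.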

      Together, these two effects result in membrane softening under curvature for the moderately stiff bolalipids, but constant rigidity for flexible bolalipids (Fig. 2F). In detail: for membranes composed of moderately stiff bolalipid molecules (k<sub>bola</sub> = 1k<sub>B</sub>T), the U-shape bolalipid head fraction only increases in the outer leaflet, going from 10 to 20% (Figure S10). This is in the high sensitivity region where the bending rigidity is expected to change the most (Figure S11). We hypothesize that the molecular rigidity of a U-shaped bolalipid creates compression on the outer leaflet that stabilizes the membrane curvature and thus causes membrane softening. We suspect that for membranes composed of rigid bolalipids (k<sub>bola</sub> > 1k<sub>B</sub>T), the effect is likely not present due to the absence of U-shape formation even under strong bending.

      By contrast, for membranes composed of flexible bolalipids (k<sub>bola</sub> = 0), the U-shaped bolalipid head fraction changes relatively little from its value for flat membranes (from 50% to 60% and 40% for the outer and inner leaflet, respectively; Figure S10). This is in the region where the membrane bending rigidity is expected to respond weakly to U-shape fraction (Figure S11). Additionally, the change is symmetric, so presumably the outer leaflet becomes softer as the inner leaflet becomes stiffer, thus creating opposing effects and only weakly affecting the membrane bending rigidity as a whole. We note that the distinction between the U-shape head fraction that we plot (Figure S10) and U-shape fraction (Figure S11) matters little for this analysis.

      We have added this deduction and its plots to SI section 8, and revised the corresponding statement in the main text accordingly (p.7):

      “Changing membrane curvature alters the area differently in the two membrane leaflets. To adapt to the area difference, we thus expect the fraction of U-shaped bolalipids to change as the membrane curvature changes. Moreover, the results of Fig. 2B and Fig. 2C showed that the U-shaped bolalipid fraction and the membrane bending rigidity are correlated. As a result, we predict that the fraction of straight versus U-shaped bolalipids in a membrane will change in response to membrane bending, in a way that makes the bending rigidity of a bolalipid membrane curvature dependent.”

      R3.Q7: This issue is repeated when the authors study nanoparticle uptake. They write: ”to reconcile these seemingly conflicting observations we reason that the bending rigidity, similar to Figure 2F, is not constant but softens upon increasing membrane curvature, due to dynamic change in the ratio between bolalipids in straight and U-shaped conformation. Hence, bolalipid membranes show stroking plastic behaviour as they soften during reshaping.” But the softening effect that they refer to, as shown in Figure 4B, occurs for very stiff bolalipids, for which not much switching to U-shaped conformation should occur.

      We thank the reviewer for locating a particularly dense sentence. We changed the text to explicitly refer to the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T, for which there is a significant change in U-shape fraction (p.8):

      “To reconcile these seemingly conflicting observations we reason that the bending rigidity κ, similar to Fig. 2F, is not constant but softens in the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T, upon increasing membrane curvature. This is due to the dynamic change in the ratio between bolalipids in straight and U-shaped conformation.”

      As for Fig. 4B, for k<sub>bola</sub> > 2k<sub>B</sub>T, pores form, which explains the plateau in adsorption energy.

      R3.Q8: Another major issue is with what the authors refer to as the ”effective temperature”. While plotting phase diagrams for kT/eps value is absolutely valid, I’m not a fan of calling this effective temperature. It is a dimensionless quantity that scales linearly with temperature, but is not a temperature. It is usually called a ”reduced temperature”. Then the authors refer to their findings as studying the stability of archaeal membranes at high temperatures. I have to disagree because eps is not the only potential parameter in the simulations (there are at least space exclusion and angle-bending stiffnesses) so one cannot identify changing eps with changing the global simulation temperature. This only works when you have one potential parameter, like an LJ gas.

      We indeed thought about this before and found that it makes little difference in our set-up. To thoroughly show that the distinction matters very little, per reviewer’s question, we computed our phase diagrams by scaling temperature T explicitly (and not lipid tail interactions T<sub>eff</sub> = k<sub>B</sub>T /ϵ<sub>p</sub>). We added these results to the SI section 14 and found no significant difference when comparing scaling tail interactions (Figure S15A) with scaling temperature explicitly (Figure S15B).

      We also computed Fig. 2A-C for scaling interactions (Figure S17A) and scaling temperature explicitly (Figure S17B). We found a slightly increased U-shaped bolalipid fraction for low k<sub>bola</sub> when comparing scaling interactions (Figure S17A) with temperature scaling (Figure S17B). The reason is that the U-shaped fraction depends on temperature: at higher temperature, bolalipids can more easily transition into the U-shape. Most importantly, however, we found no qualitative changes in the liquid region or the mechanical membrane properties when we compared the different scaling variants.

      The reason why both scaling variants match so well can be understood easily. All pair potentials, including volume exclusion interactions between head beads and other membrane beads, were also scaled in the same manner as tail-to-tail interactions, as described in the SI. In contrast, the energy scales for maintaining the lipid bonds, the bilayer lipid angles and the bolalipid angles are relatively large compared to the energy scales involved in tail-to-tail interactions. This separation of energy scales guarantees that there will be little effect when increasing global temperature. Regarding nomenclature, we take the reviewer’s advice and have added ’reduced temperature’ as an alias for T<sub>eff</sub> in the main text.

      In the revised version of the manuscript, we mention these observations in the SI section 14 and point towards these results in the main text (p.4 ):

      “This interaction strength governs the membrane phase behaviour and can be interpreted as the effective temperature or reduced temperature T<sub>eff</sub> = k<sub>B</sub>T /ϵ<sub>p</sub>. As the distinction between scaling interactions (T<sub>eff</sub>) or temperature (T) is not important for our analysis (see Supplemental Information (SI) section 14), for simplicity we refer to T<sub>eff</sub> as temperature in the following.”
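
      The scaling argument in this reply can be illustrated with a toy calculation. The sketch below is not taken from the authors' code: the energy function, the number of contacts, the angle deviation, and the stiffness value are all hypothetical. It only shows that a Boltzmann exponent −E/k<sub>B</sub>T built from pair interactions alone is identical whether ϵ<sub>p</sub> or T is scaled, and that the two routes differ only through terms that are not scaled with ϵ<sub>p</sub>, such as an angle potential.

```python
import math

def log_weight(eps_pair, k_angle, kT, n_contacts=3, angle_dev=0.1):
    """Return the Boltzmann exponent -E/kT of a toy configuration.

    E = -eps_pair * n_contacts + 0.5 * k_angle * angle_dev**2
    n_contacts and angle_dev (radians) are arbitrary illustrative values.
    """
    energy = -eps_pair * n_contacts + 0.5 * k_angle * angle_dev ** 2
    return -energy / kT

t_eff = 1.2  # target reduced temperature kT / eps_pair

# Pair interactions only: scaling eps_pair or scaling T gives the same exponent.
pair_scaled_eps = log_weight(eps_pair=1.0 / t_eff, k_angle=0.0, kT=1.0)
pair_scaled_T = log_weight(eps_pair=1.0, k_angle=0.0, kT=t_eff)

# With an unscaled angle term, the two routes differ, but only through
# the angle contribution 0.5 * k_angle * angle_dev**2 / kT.
angle_scaled_eps = log_weight(eps_pair=1.0 / t_eff, k_angle=100.0, kT=1.0)
angle_scaled_T = log_weight(eps_pair=1.0, k_angle=100.0, kT=t_eff)

print(pair_scaled_eps, pair_scaled_T)
print(angle_scaled_eps, angle_scaled_T)
```

      In this toy picture the residual difference comes entirely from the angle term, which is the part the reply argues matters little because the bonded energy scales are large and keep deviations from the rest geometry small.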

      Minor issues

      R3.Q9: As the authors have noted, the fact that the membrane curvature can change the ratio of U-shaped to straight bolalipids would render the curvature elasticity non-linear (though the term ”plastic” should not be used, as this is still structurally reversible when the stress is removed. Technically, it is hypoelastic behaviour, possibly with hysteresis.) With this in mind, when the authors use essentially linear elastic models for fluctuation analysis, they should make a comparison of maximum curvatures occurring in simulations with a range that causes significant changes in bolalipid conformational ratios.

      We thank the reviewer for their suggestion on calling the non-linear behaviour of the curvature elasticity hypoelastic. We have edited the main text accordingly (p.8 ):

      “In an elastic material, the strain modulus holds constant and deformation is reversible. For bolalipid membranes at k<sub>bola</sub> = 1k<sub>B</sub>T, however, the bending modulus decreases when deformation increases, rendering bolalipid membranes hypoelastic.”

      Moreover, regarding the maximum curvatures occurring in the fluctuation simulations: we first note that the ensemble average of the mean curvature H from the fluctuation measurements is indicated as a vertical line in Fig. 2F. As the average value is nearly zero, the membrane can be considered flat to a good approximation. To investigate the question in more detail, we extended the SI with a careful analysis of the maximum membrane curvature and of the validity of the Monge gauge approximation (SI section 15).

      In short, we found that the involved membrane curvatures are small and therefore are unlikely to trigger any significant changes of the bending modulus. Moreover, since we are dealing with two bolalipid conformations, we also tested the homogeneity of the membrane. In our simulations of flat membrane patches we did not observe clustering or phase separation between the two bolalipid conformations beyond the [2,3]σ range. Furthermore, we get good agreement between our fluctuation measurement and the cylinder simulations in Fig. 2F. We now mention this verification in the revised version of the manuscript (p.8 ):

      “Fortunately, this dependency on curvature does not invalidate our fluctuation results, where the curvature is small enough that its effect on the bending modulus is negligible (SI section 15).”

      Last but not least, simulating bending/unbending cycles of an arc-shaped membrane (frozen endpoints) shows agreement with cylinder membrane simulations, and no hysteresis at the rates of deformation employed (cf. M. Amaral’s thesis [54], soon to be out of the embargo period).

      R3.Q10: The Introduction section of the manuscript is written with a biochemical approach, with very minor attention to the simulation works on this system. Some molecular dynamics works are only cited as existing previous work, without mentioning what has already been studied in archaeal membranes. While some information, like the binding of ESCRT proteins to archaeal membranes, though interesting, helps little to place the study within the discipline. The Introduction should be revised to show what has already been studied with simulations (as the authors mention in the Discussion) and how the presented research complements it.

      The present research for the first time covers archaeal membranes with a single coarse-grained model capable of assuming both bolalipid in-membrane conformations and sweeps through temperature, membrane composition, and molecular rigidity. The work shows the first curvature dependent bending modulus for pure bolalipid membranes. It also investigates systematically bending modulus and Gaussian modulus, and tests the model in an all-encompassing budding simulation that incorporates topology changes. Existing atomistic or coarse-grained MD simulations (MARTINI or similar force fields) are limited to small patches of membrane, with no study of large-scale deformations or topology changes; plus, they rely on force fields that were parametrized for bilayer membranes.

      To give a comprehensive overview of the field, we revised the introduction section of the manuscript, in which we now discuss previous computational work investigating membrane diffusivity, U-shaped lipid fraction, and bending rigidity (p.3 ):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      Following the reviewer’s advice and to keep the introduction concise and focused on bolalipid membranes, we have removed the paragraph on ESCRT-III proteins in the revised manuscript.

      R3.Q11: The authors have been a bit loose with using the term ”stability”. I’d like to see the distinction in each case, as in ”chemical/thermal/mechanical/conformational stability”.

      We have clarified the type of stability, where applicable, throughout the manuscript. In all other instances, if not clear from context, we mean simply that the membrane persists as a membrane. At our coarse-grained level, this means the membrane does not disassemble into a gas phase.

      R3.Q12: In the original Cooke-Deserno model, a so-called ”poorman’s angle-bending term” is used, which is essentially a bond-stretching term between the first and third particle. However, I notice the authors using the full harmonic angle-bending potential. This should be mentioned.

      This is made clear in the SI (Eq. (S3)). Cooke and Deserno mention the harmonic angle potential as a valid alternative in their original publication. We now also added this detail to the main text (p.3 ):

      “The angle formed by the chain of three beads is kept near 180° via an angular potential with strength k<sub>0</sub>, instead of the approximation by a bond between end beads of the original model [32].”
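
      For readers unfamiliar with the two options, the following sketch contrasts a harmonic angle potential with the end-bead bond approximation for a three-bead chain. It is only an illustration: the functional forms with a prefactor of 1/2, the function names, and the parameter values are assumptions, and the exact expression used by the authors is the one given in their SI (Eq. (S3)).

```python
import math

def angle_between(r1, r2, r3):
    """Angle (radians) at the middle bead r2 of the chain r1-r2-r3."""
    a = [p - q for p, q in zip(r1, r2)]
    b = [p - q for p, q in zip(r3, r2)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))

def harmonic_angle_energy(r1, r2, r3, k_angle, theta0=math.pi):
    """Assumed form 0.5 * k_angle * (theta - theta0)**2, minimal for a straight chain."""
    theta = angle_between(r1, r2, r3)
    return 0.5 * k_angle * (theta - theta0) ** 2

def end_bead_bond_energy(r1, r3, k_bond, rest_length):
    """Assumed form 0.5 * k_bond * (d13 - rest_length)**2 between the two end beads."""
    d13 = math.dist(r1, r3)
    return 0.5 * k_bond * (d13 - rest_length) ** 2

straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

for name, (r1, r2, r3) in (("straight", straight), ("bent", bent)):
    e_angle = harmonic_angle_energy(r1, r2, r3, k_angle=10.0)
    e_bond = end_bead_bond_energy(r1, r3, k_bond=10.0, rest_length=2.0)
    print(name, e_angle, e_bond)
```

      Both terms vanish for the straight chain and penalise bending. The end-bead bond also responds to stretching of the two inner bonds, whereas the angle term depends on the angle only.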

      R3.Q13: The analysis of energy of U-shaped lipids with the linear model E = c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub> is indeed very interesting. I am curious, can this also be corroborated with mean energy measurements? The minor issue is calling the source of the favorability of U-shaped lipids ”entropic”, while clearly an energetic contribution is found. The two conformations, for example, might differ in the interactions with the neighbouring lipids.

      We were also curious and thank the reviewer for the suggestion of mean energy measurements. We concluded that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids. We have now included these measurements in SI section 6 (p.S5 ):

      “By splitting the average potential energy between an internal contribution (bonds, angles and pair interactions between particles in the same molecule) and an external contribution (pair interactions between a molecule and its neighbours), we determined the transition energy from straight to U-shaped bolalipids in detail. We found that this transition lowers the internal potential energy of the bolalipid while increasing its interaction energy. In total, we obtained an energy barrier for the transition of ΔE<sub>s→u</sub> = 0.79±0.01k<sub>B</sub>T. Since the fit indicates, however, that the U-shaped bolalipid conformation is preferred over the straight conformation, we conclude that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids.”

      We refer to these measurements in the main text (p.6 ):

      “For the fit it appears that c<sub>0</sub> < 0, which implies that bolalipids in U-shape conformation are slightly favoured over straight bolalipids at k<sub>bola</sub> = 0 (explored in SI section 6).”
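
      To make the meaning of the sign of c<sub>0</sub> concrete, one can think of a simple two-state picture. The sketch below assumes, purely for illustration, that the U-shaped fraction follows a Boltzmann two-state form with an effective energy difference c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub>. The function name and the numerical values of c<sub>0</sub> and c<sub>1</sub> are hypothetical and are not the fitted values from the manuscript.

```python
import math

def u_shape_fraction(k_bola, c0, c1, kT=1.0):
    """Two-state estimate of the U-shaped fraction.

    Assumes an effective energy difference (U-shaped minus straight) of
    c0 + c1 * k_bola, in the same units as kT.
    """
    delta_e = c0 + c1 * k_bola
    return 1.0 / (1.0 + math.exp(delta_e / kT))

# Hypothetical coefficients: a small negative offset and a positive slope.
c0, c1 = -0.1, 1.0
for k_bola in (0.0, 1.0, 2.0, 5.0):
    print(k_bola, round(u_shape_fraction(k_bola, c0, c1), 3))
```

      In this picture a negative c<sub>0</sub> gives a U-shaped fraction slightly above one half at k<sub>bola</sub> = 0, and a positive c<sub>1</sub> makes the fraction fall as the molecule stiffens.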

      R3.Q14: The authors write in the Discussion, ”In any case, our results indicate that membrane remodelling, such as membrane fission during membrane traffic, is much more difficult in bolalipid membranes [34].” Firstly, I’m not sure if studying the dependence of budding behaviour on adhesion energy with nanoparticles is enough to make claims about membrane fission. Secondly, why is the 2015 paper by Markus Deserno cited here?

      We thank the reviewer for giving us the opportunity to clarify. We make an energetic argument on membrane fission based on the observed difference in the ratio |κ̄|/κ of the Gaussian bending modulus κ̄ to the bending modulus κ.

      Splitting a spherical membrane vesicle into two spherical vesicles (fission) increases the bending energy by 8πκ and decreases the energy related to the Gaussian bending modulus by 4π|κ̄|. The second part of the argument is given for example in the review by Markus Deserno (p.23, right column), which is why we cite the paper here. Together, this gives an energy barrier required for membrane fission in the considered geometry of ∆E<sub>fission</sub> = 8πκ − 4π|κ̄|. We found that |κ̄|/κ is around 0.5 for bolalipid membranes and around 1 for bilayer membranes. Since κ was typically larger in bolalipid membranes we thus expect the energy barrier for fission ∆E<sub>fission</sub> to be larger for bolalipid membranes. We therefore predict that membrane remodelling, such as membrane fission during membrane trafficking, is harder in bolalipid membranes. We explain our reasoning in the discussion of the revised manuscript (p.13):

      “Membrane remodelling, such as the fission of one spherical vesicle into two, increases the bending energy by 8πκ but decreases the energy related to the Gaussian modulus by 4π|κ̄| [39], giving rise to a fission energy barrier of ∆E<sub>fission</sub> = 8πκ − 4π|κ̄|. Our results indicated that while in bolalipid membranes κ is larger, |κ̄|/κ is smaller compared to bilayer membranes. Our results thus predict a larger energy barrier for membrane fission ∆E<sub>fission</sub> in bolalipid membranes compared to bilayer membranes.”
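
      The energetic argument can be written out explicitly. The short derivation below is a sketch under stated assumptions: a Helfrich-type energy without spontaneous curvature, perfectly spherical vesicles before and after fission, and a negative Gaussian modulus κ̄. The notation (H for mean curvature, K for Gaussian curvature) is chosen here for illustration.

```latex
\begin{align}
  % Helfrich energy of a closed membrane without spontaneous curvature
  E &= \oint \left( 2\kappa H^{2} + \bar{\kappa} K \right) \mathrm{d}A \\
  % Sphere of radius R: H = 1/R, K = 1/R^{2}, area 4\pi R^{2}
  E_{\text{sphere}} &= 8\pi\kappa + 4\pi\bar{\kappa} \\
  % One sphere splits into two spheres
  \Delta E_{\text{fission}} &= 2E_{\text{sphere}} - E_{\text{sphere}}
     = 8\pi\kappa + 4\pi\bar{\kappa} \\
  % Using \bar{\kappa} < 0
  &= 8\pi\kappa - 4\pi|\bar{\kappa}|
     = 4\pi\kappa \left( 2 - \frac{|\bar{\kappa}|}{\kappa} \right)
\end{align}
```

      The energy of a sphere does not depend on its radius, so the result holds for any vesicle sizes. With a ratio |κ̄|/κ of around 1 the barrier is about 4πκ, and with a ratio of around 0.5 it is about 6πκ, so a smaller ratio and a larger κ both raise the barrier.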

      R3.Q15: In the SI, where the measurement of the diffusion coefficient is discussed, the expression for D is missing the power 2 of displacement.

      We thank the reviewer for spotting this oversight. We corrected it in the revised version of the SI (p.S5 ).
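
      For reference, the corrected relation for lateral diffusion in a membrane is the standard two-dimensional Einstein relation, in which the mean squared displacement grows as 4Dt. The sketch below estimates D from a trajectory in that way. It is not the authors' analysis script: the function name, the data layout (a list of frames, each a list of unwrapped in-plane (x, y) positions), and the single time origin are assumptions made for illustration.

```python
import math

def lateral_diffusion_coefficient(trajectory, dt):
    """Estimate D from MSD(t) = 4 * D * t for in-plane (2D) motion.

    trajectory: list of frames; each frame is a list of (x, y) positions,
                unwrapped across periodic boundaries.
    dt:         time between consecutive frames.
    """
    reference = trajectory[0]
    times, msd = [], []
    for i, frame in enumerate(trajectory[1:], start=1):
        # Squared displacement of every particle relative to the first frame.
        sq = [(x - x0) ** 2 + (y - y0) ** 2
              for (x, y), (x0, y0) in zip(frame, reference)]
        times.append(i * dt)
        msd.append(sum(sq) / len(sq))
    # Least-squares slope of a line through the origin: MSD = slope * t.
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 4.0

# Synthetic check: one particle whose squared displacement is exactly 4 * D * t.
d_true, dt = 0.25, 0.1
trajectory = [[(math.sqrt(4.0 * d_true * i * dt), 0.0)] for i in range(50)]
print(lateral_diffusion_coefficient(trajectory, dt))
```

      Using the displacement without the square would give a quantity with the wrong units and, for unbiased diffusion, an average close to zero, which is why the exponent matters.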

      R3.Q16: Where cargo uptake is discussed, the term ”adsorption energy” is used. I think the more appropriate term would be ”adhesion energy”.

      For the sake of simplicity, we changed the term to adhesion energy (caption of Fig. 4, and p.10). We do not have a strong opinion on this, but we believe that adsorption energy would be equally correct as we describe the adsorption of many lipid head beads to a nanoparticle.

      R3.Q17: Typos:

      Page 1, paragraph 2: Adaption → Adaptation. Page 10, paragraph 1: Stroking → Striking.

      We thank the reviewer for spotting these typos which we have corrected in the revised version of the manuscript.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      A few thoughts (likely out of the scope of this paper but possibly to consider upon revision):

      R1.Q1: Do bolalipids always have the same headgroup? I don’t recall reading this in the introduction/discussion. R1 and R2 are in Figure 1, but I don’t know whether there are standard types. Could this be expanded upon? Is the model able to take these differences into account?

      We thank the reviewer for raising this important question. Similar to bacteria and eukaryotes, in archaea there is a huge variety in terms of the different head groups that lipids can contain and thus also lipid variety. Most archaeal lipids have head groups that contain either phosphate groups or sugar residues. Typically, archaeal bolalipids are asymmetric and contain a phosphatidyl and a sugar moiety at the two ends of the lipid molecule. Within the membrane the lipid is oriented such that the phosphatidyl moiety points towards the interior of the cell whereas the sugar moiety points towards the outside of the cell as it occupies more space [5].

      In our computational model, however, we consider symmetric bolalipids for the sake of simplicity and to decouple the role of ”connected geometry” from other effects. In principle, we could investigate the effect of lipid asymmetry by increasing the size of one of the lipid head beads. However, this investigation exceeds the scope of the present study and therefore requires future work.

      In the revised version of the manuscript, we now clarify that bolalipids can have different headgroups (p.1 and the caption of Fig. 1):

      “The hydrophilic heads can be composed of different functional groups with phosphatidyl and sugar being the most relevant moieties. For bolalipids the two head groups at either end of the molecule are typically distinct (Fig. 1A right) [5].”

      “The hydrophilic head of a bolalipid can be composed of different functional groups represented by R1 and R2 (right).”

      We also explicitly state that we neglect lipid head group asymmetry for the sake of simplicity (p.4 ):

      “To decouple the effect of the connected geometry of the bolalipids from that of lipid asymmetry, we assume both head beads of a bolalipid to share the same properties.”

      R1.Q2: Is it possible to compare the mesoscale models to either Coarse-grained or even all-atom lipid models? Have simulations previously been performed for bolalipids at those levels of description?

      A few studies have previously investigated bolalipid membranes in simulations. These studies used either all-atom or coarse-grained simulations. However, none of these studies investigated how bolalipids respond to membrane deformations. Therefore, it is currently not possible to directly compare our results to studies in the literature. However, to recapitulate our predictions experimentally is certainly something that could and should be done in the future. As a reply to this reviewer and reviewer 3, we discuss the current state of modelling bolalipid membranes in simulations in the revised version of the manuscript (p.3):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      We want to mention, however, that we do compare membrane diffusivity, U-shaped lipid fraction, and bending rigidity to the behaviour and values that have been previously measured in simulations in the discussion section. In general, we find good agreement between our results and previously reported behaviour/values (p.13 ):

      “While flexible bolalipid membranes are liquid under the same conditions as bilayer membranes, we found that stiff bolalipids form membranes that operate in the liquid regime at higher temperatures. These results agree well with previous molecular dynamics simulations that suggested that bolalipid membranes are more ordered and have a reduced diffusivity compared to bilayer membranes [24, 29]. In our simulations, this is due to the fact that completely flexible bolalipid molecules adopt both the straight (transmembrane) and the U-shaped (loop) conformation with approximately the same frequency. In contrast, stiff bolalipids typically only take on the straight conformation when assembled in a membrane. These results agree with the previous coarse-grained molecular dynamics simulations using the MARTINI force field which showed that the ratio of straight to U-shaped bolalipids increased upon stiffening the linker between the lipid tails [29].

      [...]

      When we determined the bending rigidity of bolalipid membranes by measuring their response to thermal fluctuations, we found that membranes made from flexible bolalipids are only slightly more rigid than bilayer membranes. This result is consistent with previous atomistic simulations, which showed that the membrane rigidity was similar for membranes composed of bilayer lipids and flexible synthetic bolalipids [45].”

      R1.Q3: How would membrane proteins alter the behaviour of bolalipids? Either those integral to the membrane or those binding peripherally?

      The reviewer asks an important question. However, the question is difficult to answer due to its scope and the gaps in the current literature. Important examples of integral or peripheral membrane proteins that alter the behaviour of bolalipids and archaeal bolalipid membranes are involved in cell homeostasis, cell division, membrane trafficking, and lipid synthesis.

      The cells of many archaeal species are enclosed in a paracrystalline protein layer called the S-layer, which is attached to the lipid membrane [4, 55]. The main function of the S-layer is to keep the cell’s shape and to protect it against osmotic stress. Due to the embedding of the S-layer in the membrane at specific locations, it is to be expected that the membrane properties are influenced by the S-layer. Furthermore, archaea execute cell division by locally reshaping the membrane using FtsZ and ESCRT-III proteins [56]. While Asgard archaeal genomes encode proteins with homology to those regulating aspects of eukaryotic membrane remodelling and trafficking [57], they have yet to be observed undergoing a process like endocytosis [58]. In addition, it has been speculated that the proteins that drive the synthesis of two diether lipids into a tetraether lipid are either membrane associated or integral membrane proteins [59].

      However, to the best of our knowledge it is not known how membrane proteins specifically alter the behaviour of bolalipids. Future work will need to be executed to answer this question. Following the advice of reviewer 3 and to keep the introduction concise and focused on bolalipid membranes, we do not mention these observations in the revised manuscript.

      R1.Q4: Is there a mechanism in cells to convert or switch bolalipids from a straight to a u-shaped description? Does this happen spontaneously or are there enzymes responsible for this?

      We thank the reviewer for bringing up this important point. Despite the relevance of the question, little is currently known about the mechanisms that make bolalipids transition between a straight and a U-shaped configuration, mainly because there is to date no established experimental method.

      Besides our own results, most of what we know comes from coarse-grained molecular dynamics simulations, which showed that bolalipids can spontaneously transition between the straight and U-shaped configuration [29]. In addition, by using comparative genomic analysis, it has been predicted that many archaeal species contain flippases, i.e., membrane proteins that are able, upon the consumption of energy, to transfer (flip-flop) bilayer lipids between the two membrane leaflets [43]. Moreover, it has been shown that Halobacterium salinarum (an archaeon with a bilayer lipid membrane) [44] contains scramblases, which are membrane proteins that passively transfer bilayer lipids from one membrane leaflet to the other. It is therefore tempting to speculate that similar proteins might exist for bolalipids which could facilitate the straight to U-shaped transition.

      In addition, it has been reported that vesicles composed of bolalipid membranes can undergo fusion with enveloped influenza viruses [17]. In this context, it has been suggested that the influenza fusion protein hemagglutinin may locally induce U-shaped bolalipids to facilitate membrane fusion. However, all these hints are far from proof of a mechanism that can drive the straight to U-shaped bolalipid transition, and further work needs to be done to investigate this question in detail.

      In the revised version of the manuscript, we now discuss what is known about potential mechanisms to facilitate the straight to U-shaped transition in the discussion section (p.13 ):

      “While previous coarse-grained simulations predicted that bolalipids spontaneously transition between the straight and U-shaped conformations [29], how this happens in archaeal membranes and whether membrane proteins are involved in this conformational transition needs to be clarified in the future. Experimental studies suggest that archaeal membranes contain flippases and scramblases for the transitioning of bilayer lipids between membrane leaflets [43, 44], raising the possibility that similar proteins could also facilitate conformational transitions in bolalipids. In addition, it has been suggested that the viral fusion protein hemagglutinin could cause a transition from straight to U-shaped bolalipid conformation during the fusion of bolalipid vesicles with influenza viruses [17]. However, future investigation is required.”

      R1.Q5: Ideally, coordinates and any parameter files required to run the molecular simulations should be included for reproducibility.

      We absolutely share the reviewer’s concern with reproducibility and as such have included in the original submission as part of our data availability section a link to a code repository (available at: https://doi.org/10.5281/zenodo.13934991 [51]) that allows initializing and simulating flat membrane patches, with user control of the parameters explored in this paper (𝜔,T<sub>eff</sub>,k<sub>bola</sub>,f<sup>bi</sup>).

      Reviewer #2 (Recommendations for the authors):

      This is a great paper and I congratulate the authors for writing such a fine piece of scholarship. The only nitty-gritty feedback that I have is summarized in the following three points:

      R2.Q1: In the introduction the authors talk about archaea adapting their membrane to retain membrane fluidity. However, homeoviscous adaptation is also fundamental in bacteria and eukaryotes.

      The reviewer is correct: like those of archaea, the membranes of bacteria and eukaryotes must balance flexibility and stability. Moreover, the cell membranes in all 3 domains of life need to maintain membrane fluidity and provide mobility to the embedded lipids and membrane proteins (homeoviscous adaptation). The general idea is that these organisms change the ratio of different lipids to change membrane properties and thereby optimally adapt to their environments [10]. Importantly, however, there are differences in how homeoviscous adaptation is maintained across the different domains of life. As a reply to this reviewer and reviewer 3, we now discuss the underlying mechanisms in the revised parts of the introduction (p.1):

      “Like bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      R2.Q2: Uncertainties in Gaussian rigidity modulus estimates are not properly reported.

      The large uncertainties in the Gaussian rigidity modulus were due to how they were calculated. In short, κ̄ is determined in cap folding simulations [41] (SI section 9), by using the measured values of the dimensionless parameter ξ, related to the folding probability, the bending modulus κ, the membrane line tension, and the cap radius R. In our case, the main source of uncertainty for determining κ̄ comes from the uncertainty in the measurement of the bending rigidity κ. To obtain κ, previously, we fitted fluctuation spectra for different seeds and only then averaged the obtained values. In the revised version of the manuscript, we now first pool the fluctuation spectra of the different simulation seeds before we fit all spectra at the same time. This new approach results in smaller uncertainties for the bending rigidity κ and also the Gaussian rigidity modulus κ̄.
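
      The difference between the two fitting procedures can be sketched in a few lines. The example below is an illustration under assumptions, not the authors' fitting code: it takes the tensionless form S(q) = k<sub>B</sub>T/(κq<sup>4</sup>) for the height fluctuation spectrum with all area prefactors absorbed into S, uses an unweighted least-squares fit, and all function names and data are hypothetical.

```python
def fit_kappa(q_values, spectrum, kT=1.0):
    """Unweighted least-squares fit of S(q) = kT / (kappa * q**4); returns kappa."""
    numerator = sum(s / q ** 4 for q, s in zip(q_values, spectrum))
    denominator = sum(1.0 / q ** 8 for q in q_values)
    amplitude = numerator / denominator  # best-fit value of kT / kappa
    return kT / amplitude

def kappa_per_seed_then_average(seeds, kT=1.0):
    """Previous procedure: fit each seed separately, then average the kappas."""
    kappas = [fit_kappa(q, s, kT) for q, s in seeds]
    return sum(kappas) / len(kappas)

def kappa_pooled(seeds, kT=1.0):
    """Revised procedure: pool all spectra first, then fit once."""
    q_all = [q for q_values, _ in seeds for q in q_values]
    s_all = [s for _, spectrum in seeds for s in spectrum]
    return fit_kappa(q_all, s_all, kT)

# Two synthetic seeds whose spectra correspond exactly to kappa = 8 and kappa = 12.
q_values = [0.1, 0.2, 0.3, 0.4]
seeds = [(q_values, [1.0 / (kappa * q ** 4) for q in q_values])
         for kappa in (8.0, 12.0)]

print(kappa_per_seed_then_average(seeds))  # averages the fitted kappas
print(kappa_pooled(seeds))                 # averages the amplitudes kT / kappa
```

      Because κ enters the spectrum as an inverse, averaging fitted κ values and fitting the pooled spectra are different estimators. In this noise-free toy case they return 10 and 9.6, respectively. The sketch says nothing about which has the smaller uncertainty in practice; that comparison is the one reported by the authors.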

      As a consistency check, in addition to the simulations that we previously performed at T<sub>eff</sub> = 1.3, we have repeated the cap folding and line tension simulations at T<sub>eff</sub> = 1.2, resulting in similar values for κ̄. In the revised version of the manuscript, we report the newly calculated values and uncertainties for κ̄ at T<sub>eff</sub> = 1.2 in the main text (p.8):

      “At T<sub>eff</sub> = 1.2, we obtained = 4.30 ± 0.22 k<sub>B</sub>T and thus a ratio of = 0.89 ± 0.04 for bilayer membranes, similar to what has been reported previously [41]. For flexible bolalipid membranes, we got a slightly smaller value for = 5.04 ± 0.37 k<sub>B</sub>T. Due to the larger bending modulus, however, flexible bolalipid membranes show a significantly smaller ratio = 0.64 ± 0.04 (k<sub>bola</sub> = 0). At larger temperature (T<sub>eff</sub> = 1.3), the ratio can be even smaller = 0.45 ± 0.07 (see SI section 9).”

      In addition, we report the values at T<sub>eff</sub> = 1.3 and T<sub>eff</sub> = 1.2 in the SI (p.S15, Table S4).

      We have also adapted the discussion of the Gaussian bending modulus accordingly (p.13 ):

      “Another marked difference between bilayer and flexible bolalipid membranes is the ratio of the Gaussian rigidity to the bending modulus. Instead of being around 1 as for bilayer membranes [41], it is around 1/2 and therefore only half of that of bilayer lipids.”

      Reviewer #3 (Recommendations for the authors):

      While I think the bulk of the work presented is useful, some of the issues that I raised in my review are indeed major. Without properly addressing them, it is hard to accept the conclusions of the manuscript. I hope the authors can address them by revising their analysis.

      We thank the reviewer for their constructive feedback, which helped us to improve the manuscript. We have addressed all points raised by the reviewer in our detailed point-by-point response to the reviewer (see above). We hope the reviewer will now find it easier to accept our conclusions.

      (1) R. Phillips, J. Kondev, J. Theriot, and H. Garcia, Physical biology of the cell (Garland Science, New York, 2012).

      (2) H. T. McMahon and J. L. Gallop, Membrane curvature and mechanisms of dynamic cell membrane remodelling, Nature 438, 590 (2005).

      (3) S. B. Gould, Membranes and evolution, Curr. Biol. 28, R381 (2018).

      (4) S.-V. Albers and B. H. Meyer, The archaeal cell envelope, Nat. Rev. Microbiol. 9, 414 (2011).

      (5) P. M. Oger and A. Cario, Adaptation of the membrane in Archaea, Biophys. Chem. 183, 42 (2013).

      (6) K. Rastädter, D. J. Wurm, O. Spadiut, and J. Quehenberger, The Cell Membrane of Sulfolobus spp.—Homeoviscous Adaption and Biotechnological Applications, International Journal of Molecular Sciences 21, 3935 (2020).

      (7) P. L.-G. Chong, Archaebacterial bipolar tetraether lipids: Physico-chemical and membrane properties, Chem. Phys. Lipids 163, 253 (2010).

      (8) M. Tourte, P. Schaeffer, V. Grossi, and P. M. Oger, Functionalized Membrane Domains: An Ancestral Feature of Archaea?, Front. Microbiol. 11, 526 (2020).

      (9) Y. H. Kim, G. Leriche, K. Diraviyam, T. Koyanagi, K. Gao, D. Onofrei, J. Patterson, A. Guha, N. Gianneschi, G. P. Holland, M. K. Gilson, M. Mayer, D. Sept, and J. Yang, Entropic effects enable life at extreme temperatures, Sci. Adv. 5, eaaw4783 (2019).

      (10) M. F. Siliakus, J. van der Oost, and S. W. M. Kengen, Adaptations of archaeal and bacterial membranes to variations in temperature, pH and pressure, Extremophiles 21, 651 (2017).

      (11) D. W. Grogan, Phenotypic characterization of the archaebacterial genus sulfolobus: comparison of five wild-type strains, J. Bacteriol. 171, 6710 (1989).

      (12) D. X. Sahonero-Canavesi, M. F. Siliakus, A. Abdala Asbun, M. Koenen, F. von Meijenfeldt, S. Boeren, N. J. Bale, J. C. Engelman, K. Fiege, L. Strack van Schijndel, J. S. Sinninghe Damsté, and L. Villanueva, Disentangling the lipid divide: Identification of key enzymes for the biosynthesis of membrane-spanning and ether lipids in Bacteria, Sci. Adv. 8, eabq8652 (2022).

      (13) M. van Wolferen, A. A. Pulschen, B. Baum, S. Gribaldo, and S.-V. Albers, The cell biology of archaea, Nat. Microbiol. 10.1038/s41564-022-01215-8 (2022).

      (14) U. Bakowsky, U. Rothe, E. Antonopoulos, T. Martini, L. Henkel, and H.-J. Freisleben, Monomolecular organization of the main tetraether lipid from Thermoplasma acidophilum at the water–air interface, Chem. Phys. Lipids 105, 31 (2000).

      (15) C. Jeworrek, F. Evers, M. Erlkamp, S. Grobelny, M. Tolan, P. L.-G. Chong, and R. Winter, Structure and Phase Behavior of Archaeal Lipid Monolayers, Langmuir 27, 13113 (2011).

      (16) D. P. Brownholland, G. S. Longo, A. V. Struts, M. J. Justice, I. Szleifer, H. I. Petrache, M. F. Brown, and D. H. Thompson, Phase Separation in Binary Mixtures of Bipolar and Monopolar Lipid Dispersions Revealed by 2H NMR Spectroscopy, Small Angle X-Ray Scattering, and Molecular Theory, Biophysical Journal 97, 2700 (2009).

      (17) A. Bhattacharya, I. D. Falk, F. R. Moss, T. M. Weiss, K. N. Tran, N. Z. Burns, and S. G. Boxer, Structure–function relationships in pure archaeal bipolar tetraether lipids, Chem. Sci. 15, 14273 (2024).

      (18) V. Vitkova, D. Mitkova, V. Yordanova, P. Pohl, U. Bakowsky, G. Staneva, and O. Batishchev, Elasticity and phase behaviour of biomimetic membrane systems containing tetraether archaeal lipids, Colloids Surf. A Physicochem. Eng. Asp. 601, 124974 (2020).

      (19) E. Chang, Unusual thermal stability of liposomes made from bipolar tetraether lipids, Biochem. Biophys. Res. Commun. 202, 673 (1994).

      (20) O. V. Batishchev, A. S. Alekseeva, D. S. Tretiakova, T. R. Galimzyanov, A. Y. Chernyadyev, N. R. Onishchenko, P. E. Volynsky, and I. A. Boldyrev, Cyclopentane rings in hydrophobic chains of a phospholipid enhance the bilayer stability to electric breakdown, Soft Matter 16, 3216 (2020).

      (21) U. Seifert, Configurations of fluid membranes and vesicles, Adv. Phys. 46, 13 (1997).

      (22) H. Noguchi, Membrane Simulation Models from Nanometer to Micrometer Scale, J. Phys. Soc. Jpn. 78, 041007 (2009).

      (23) F. Frey and T. Idema, More than just a barrier: using physical models to couple membrane shape to cell function, Soft Matter 17, 3533 (2021).

      (24) C. Huguet, S. Fietz, A. Rosell-Melé, X. Daura, and L. Costenaro, Molecular dynamics simulation study of the effect of glycerol dialkyl glycerol tetraether hydroxylation on membrane thermostability, Biochimica et Biophysica Acta (BBA) - Biomembranes 1859, 966 (2017).

      (25) T. R. Galimzyanov, P. I. Kuzmin, P. Pohl, and S. A. Akimov, Elastic deformations of bolalipid membranes, Soft Matter 12, 2357 (2016).

      (26) T. R. Galimzyanov, P. E. Volynsky, and O. V. Batishchev, Continuum elasticity and molecular dynamics of a pore in archaeal bolalipid membranes, Soft Matter 21, 687 (2025).

      (27) A. O. Chugunov, P. E. Volynsky, N. A. Krylov, I. A. Boldyrev, and R. G. Efremov, Liquid but Durable: Molecular Dynamics Simulations Explain the Unique Properties of Archaeal-Like Membranes, Sci. Rep. 4, 7462 (2015).

      (28) L. F. Pineda De Castro, M. Dopson, and R. Friedman, Biological Membranes in Extreme Conditions: Simulations of Anionic Archaeal, PLoS One 11, e0155287 (2016).

      (29) M. Bulacu, X. Périole, and S. J. Marrink, In Silico Design of Robust Bolalipid Membranes, Biomacromolecules 13, 196 (2012).

      (30) C. H. Davis, H. Nie, and N. V. Dokholyan, Insights into thermophilic archaebacterial membrane stability from simplified models of lipid membranes, Phys. Rev. E 75, 051921 (2007).

      (31) S. Dey and J. Saha, Minimal Coarse-Grained Modeling toward Implicit Solvent Simulation of Generic Bolaamphiphiles, J. Phys. Chem. B 124, 2938 (2020).

      (32) I. R. Cooke and M. Deserno, Solvent-free model for self-assembling fluid bilayer membranes: Stabilization of the fluid phase based on broad attractive tail potentials, J. Chem. Phys. 123, 224710 (2005).

      (33) P. L.-G. Chong, U. Ayesa, V. Prakash Daswani, and E. C. Hur, On Physical Properties of Tetraether Lipid Membranes: Effects of Cyclopentane Rings, Archaea 2012, 1 (2012).

      (34) A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in ’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comput. Phys. Commun. 271, 108171 (2022).

      (35) A. Stukowski, Visualization and analysis of atomistic simulation data with ovito–the open visualization tool, Modelling and Simulation in Materials Science and Engineering 18, 015012 (2009).

      (36) E. R. May, A. Narang, and D. I. Kopelevich, Role of molecular tilt in thermal fluctuations of lipid membranes, Physical Review E 76, 021913 (2007).

      (37) W. Helfrich, Elastic Properties of Lipid Bilayers: Theory and Possible Experiments, Z. Naturforsch. C 28, 693 (1973).

      (38) M. Hamm and M. Kozlov, Elastic energy of tilt and bending of fluid membranes, Eur. Phys. J. E 3, 323 (2000).

      (39) M. Deserno, Fluid lipid membranes: From differential geometry to curvature stresses, Chemistry and Physics of Lipids 185, 11 (2015).

      (40) V. A. Harmandaris and M. Deserno, A novel method for measuring the bending rigidity of model lipid membranes by simulating tethers, The Journal of Chemical Physics 125, 204905 (2006).

      (41) M. Hu, J. J. Briguglio, and M. Deserno, Determining the Gaussian Curvature Modulus of Lipid Membranes in Simulations, Biophys. J. 102, 1403 (2012).

      (42) M. Deserno, Elastic deformation of a fluid membrane upon colloid binding, Phys. Rev. E 69, 031903 (2004), arXiv: cond-mat/0303656.

      (43) K. S. Makarova, M. Y. Galperin, and E. V. Koonin, Comparative genomic analysis of evolutionarily conserved but functionally uncharacterized membrane proteins in archaea: Prediction of novel components of secretion, membrane remodeling and glycosylation systems, Biochimie 118, 302 (2015).

      (44) A. Verchère, W.-L. Ou, B. Ploier, T. Morizumi, M. A. Goren, P. Bütikofer, O. P. Ernst, G. Khelashvili, and A. K. Menon, Light-independent phospholipid scramblase activity of bacteriorhodopsin from Halobacterium salinarum, Sci. Rep. 7, 9522 (2017).

      (45) T. B. H. Schroeder, G. Leriche, T. Koyanagi, M. A. Johnson, K. N. Haengel, O. M. Eggenberger, C. L. Wang, Y. H. Kim, K. Diraviyam, D. Sept, J. Yang, and M. Mayer, Effects of lipid tethering in extremophile-inspired membranes on H(+)/OH(-) flux at room temperature, Biophys. J. 110, 2430 (2016).

      (46) R. Xu, A. Dehghan, A.-C. Shi, and J. Zhou, Elastic property of membranes self-assembled from diblock and triblock copolymers, Chem. Phys. Lipids 221, 83 (2019).

      (47) Z. Dogic and S. Fraden, Ordered phases of filamentous viruses, Curr. Opin. Colloid Interface Sci. 11, 47 (2006).

      (48) E. Barry and Z. Dogic, Entropy driven self-assembly of nonamphiphilic colloidal membranes, Proc. Natl. Acad. Sci. U.S.A. 107, 10348 (2010).

      (49) A. J. Balchunas, R. A. Cabanas, M. J. Zakhary, T. Gibaud, S. Fraden, P. Sharma, M. F. Hagan, and Z. Dogic, Equation of state of colloidal membranes, Soft Matter 15, 6791 (2019).

      (50) M. Saracco, P. Schaeffer, M. Tourte, S.-V. Albers, Y. Louis, J. Peters, B. Demé, S. Fontanay, and P. M. Oger, Bilayer-Forming Lipids Enhance Archaeal Monolayer Membrane Stability, Int. J. Mol. Sci. 26, 3045 (2025).

      (51) M. Amaral, archaeal_membranes: code and examples (2024), available at https://doi.org/10.5281/zenodo.13934991.

      (52) M. F. Ergüder and M. Deserno, Identifying systematic errors in a power spectral analysis of simulated lipid membranes, The Journal of Chemical Physics 154, 214103 (2021).

      (53) J. Genova, N. Ulrih, V. Kralj-Iglič, A. Iglič, and I. Bivas, Bending Elasticity Modulus of Giant Vesicles Composed of Aeropyrum Pernix K1 Archaeal Lipid, Life 5, 1101 (2015).

      (54) M. Amaral, Archaeal Membranes: In Silico Modelling and Design, Ph.D. thesis, Institute of Science and Technology Austria (2024).

      (55) M. Pohlschroder, F. Pfeiffer, S. Schulze, and M. F. A. Halim, Archaeal cell surface biogenesis, FEMS Microbiol. Rev. 42, 694 (2018).

      (56) K. S. Makarova, N. Yutin, S. D. Bell, and E. V. Koonin, Evolution of diverse cell division and vesicle formation systems in Archaea, Nat. Rev. Microbiol. 8, 731 (2010).

      (57) C. W. Stairs and T. J. Ettema, The Archaeal Roots of the Eukaryotic Dynamic Actin Cytoskeleton, Curr. Biol. 30, R521 (2020).

      (58) B. Baum and D. A. Baum, The merger that made us, BMC Biol. 18, 72 (2020).

      (59) Z. Zeng, H. Chen, H. Yang, Y. Chen, W. Yang, X. Feng, H. Pei, and P. V. Welander, Identification of a protein responsible for the synthesis of archaeal membrane-spanning GDGT lipids, Nat. Commun. 13, 1545 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In Ryu et al., the authors use a cortical mouse astrocyte culture system to address the functional contribution of astrocytes to circadian rhythms in the brain. The authors' starting point is transcriptional output from serum-shocked culture, comparative informatics with existing tools and existing datasets. After fairly routine pathway analyses, they focus on the calcium homeostasis machinery and one gene, Herp, in particular. They argue that Herp is rhythmic at both mRNA and protein levels in astrocytes. They then use a calcium reporter targeted to the ER, mitochondria, or cytosol and show that Herp modulates calcium signaling as a function of circadian time. They argue that this occurs through the regulation of inositol receptors. They claim that the signaling pathway is clock-controlled by a limited examination of Bmal1 knockout astrocytes. Finally, they switch to calcium-mediated phosphorylation of the gap junction protein Connexin 43 but do not directly connect HERP-mediated circadian signaling to these observations. While these experiments address very important questions related to the critical role of astrocytes in regulating circadian signaling, the mechanistic arguments for HERP function, its role in circadian signaling through inositol receptors, the connection to gap junctions, and ultimately, the functional relevance of these findings is only partially substantiated by experimental evidence. 

      Strengths: 

      - The paper provides useful datasets of astrocyte gene expression in circadian time. 

      - Identifies HERP as a rhythmic output of the circadian clock. 

      - Demonstrates the circadian-specific sensitivity of ATP -> calcium signaling. 

      - Identifies possible rhythms in both Connexin 43 phosphorylation and rhythmic movement of calcium between cells. 

      Weaknesses: 

      - It is not immediately clear why the authors chose to focus on Ca2+ homeostasis or Herp from their initial screens as neither were the "most rhythmic" pathways in their primary analyses. 

      We appreciate the reviewer’s comment. We chose to focus on Ca2+ homeostasis because intracellular Ca2+ signaling plays a crucial role in numerous astrocyte functions and is notably associated with the sleep/wake status of animals, which is our primary interest (Bojarskaite et al., 2020; Ingiosi et al., 2020; Blum et al., 2021; Szabó et al., 2017). Among the genes involved in calcium ion homeostasis, Herp exhibited the most robust rhythmicity (Supplementary Table 1). The rationale for our focus on Ca2+ homeostasis and Herp is explained in the Results section (lines 143-150). We hope this provides a clear justification for our focus.

      - It would have been interesting (and potentially important) to know whether various methods of cellular synchronization would also render HERP rhythmic (e.g., temperature, forskolin, etc). If Herp is indeed relatively astrocyte-specific and rhythmic, it should be easy to assess its rhythmicity in vivo. 

      We thank the reviewer for this insightful comment. In response, we examined HERP expression in cultured astrocytes synchronized by either dexamethasone or forskolin treatment. We found that Herp exhibited rhythmic expression at both the mRNA and protein levels under these conditions. These results have been added to Figure S3 and are explained in the manuscript (lines 173-175).

      Additionally, we measured HERP levels in the prefrontal cortex of mice at CT58 and CT70 and found no rhythmicity, as shown in Author response image 1. Given that Herp is expressed in various brain cell types, including microglia, endothelial cells, neurons, oligodendrocytes, and astrocytes, with the highest expression in microglia (Cahoy et al., 2008), we reason that any rhythmic expression of HERP in astrocytes might be masked by its continuous expression in other cell types. To assess HERP rhythmicity specifically in astrocytes in vivo, we attempted immunostaining using several anti-HERP antibodies, but none were successful. Consequently, we were unable to determine whether HERP exhibits rhythmic expression in astrocytes in vivo.

      Author response image 1.

      HERP levels were constant at CT58 and CT70. (A, B) Mice were entrained under a 12h:12h LD cycle and maintained in constant darkness. Prefrontal cortices were harvested at the indicated times and processed for Western blot analysis. The representative image shows three independent samples. (B) Quantification of HERP levels normalized to VINCULIN. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; t-test).

      - The authors show that Herp suppression reduces ATP-mediated suppression of calcium whereas it initially increases Ca2+ in the cytosol and mitochondria and then suppresses it. The dynamics of the mitochondrial and cytosolic responses are not discussed in any detail and it is unclear what their direct relationship is to Herp-mediated ER signaling. What is the explanation for Herp (which is thought to be ER-specific) to calcium signaling in other organelles? 

      Our examination of cytosolic and mitochondrial Ca2+ responses was aimed at corroborating HERP’s effect on the ER Ca2+ response. Upon ATP stimulation, Ca2+ is released from the ER via IP3 receptors (IP3Rs) and subsequently transmitted to other organelles, including mitochondria (Carreras-Sureda et al., 2018; Giorgi et al., 2018). Ca2+ is directly transferred to the cytosol by IP3Rs located on the ER membrane, and to the mitochondria through a complex formed by IP3R and the voltage-dependent anion channel (VDAC) on the mitochondria (Giorgi et al., 2018). Consistent with previous reports, we observed an increase in cytosolic and mitochondrial Ca2+ levels accompanied by a decrease in ER Ca2+ levels following ATP treatment (see Fig. 3B, E, H, control siRNA). The ATP-stimulated ER Ca2+ release was enhanced by Herp knockdown. We reasoned that if Ca2+ release was enhanced, then cytosolic and mitochondrial Ca2+ uptake would also be enhanced. The results were consistent with this hypothesis (see Fig. 3B, E, H, Herp siRNA). These observations are described in the Results section (lines 202-208) and in the Discussion (lines 333-348). We hope this explanation clarifies the relationship between the Herp-mediated ER Ca2+ response and the Ca2+ response in other organelles. Thank you for your consideration.

      - What is the functional significance of promoting ATP-mediated suppression of calcium in ER? 

      In astrocytes, intracellular Ca2+ plays a crucial role in regulating several processes. In this study, among the various downstream effects of intracellular Ca2+, we examined gap junction channel (GJC) conductance, which affects astrocytic communication. As discussed in the manuscript (lines 357-381), circadian variation in HERP results in rhythmic Cx43 (S368) phosphorylation, which is linked to GJC conductance. We propose that during the subjective night phase, heightened ATP-induced ER Ca2+ release reduces GJC conductance, uncoupling astrocytes from the syncytium and making them better equipped for localized responses. During the subjective day phase, on the other hand, increased GJC conductance may allow astrocytes to control a larger area for synchronous neuronal activity, which is a key feature of sleep.

      - The authors then nicely show that the effect of ATP is dependent on intrinsic circadian timing but do not explain why these effects are antiphase in cytosol or mitochondria.

      Moreover, the ∆F/F for calcium in mitochondria and cytosol both rise, cross the abscissa, and then diminish - strongly suggesting a biphasic signaling event. Therefore, one wonders whether measuring the area under the curve is the most functionally relevant measurement of the change. 

      We appreciate the reviewer’s insightful comments. As explained in our previous response, Ca2+ released from the ER is transferred to the cytosol and mitochondria. This transfer explains why the fluorescence intensities of the cytosolic and mitochondrial Ca2+ indicators respond in antiphase to those of the ER.

      We agree that cytosolic and mitochondrial Ca2+ responses may be biphasic. The decrease below the abscissa in mitochondria and cytosol likely reflects Ca2+ extrusion from these compartments. However, our primary focus was on the initial uptake of Ca2+ following ER Ca2+ release. Thus, when calculating the area under the curve (AUC), we measured the area between the ∆F/F trace and the line y = 0 (the x-axis) for both mitochondria and cytosol. We reason that measuring the area under the curve above the abscissa fits our objective.

      While addressing your concerns, we noticed errors in the Y-axis labels of Fig. 3C, 4D, and 5C. For the ER Ca2+ dynamics, we measured the area above the curve. These mistakes have now been corrected.
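      The area measurement described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name, the linear interpolation at zero crossings, and the example trace are all assumptions made for illustration. The idea is to integrate only the part of a ∆F/F trace that lies above y = 0 (cytosol, mitochondria); flipping the sign of the trace gives the area between a decreasing trace and y = 0, which is one way to read the "area above the curve" used for the ER.

```python
def positive_area(times, values):
    """Area between a trace and y = 0, counting only portions above zero.

    Hypothetical helper: uses the trapezoid rule and splits segments that
    cross zero at the linearly interpolated crossing point.
    """
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        v0, v1 = values[i], values[i + 1]
        if v0 >= 0 and v1 >= 0:
            # Whole segment is on or above the abscissa: full trapezoid.
            area += 0.5 * (v0 + v1) * dt
        elif v0 > 0 > v1:
            # Trace falls through zero: keep the triangle before the crossing.
            area += 0.5 * v0 * (v0 / (v0 - v1)) * dt
        elif v0 < 0 < v1:
            # Trace rises through zero: keep the triangle after the crossing.
            area += 0.5 * v1 * (v1 / (v1 - v0)) * dt
    return area


# Illustrative biphasic trace (made-up numbers, arbitrary units).
t = [0, 1, 2, 3, 4]
trace = [0, 2, 2, -2, -2]

uptake_area = positive_area(t, trace)                 # part above y = 0
below_area = positive_area(t, [-v for v in trace])    # part below y = 0
```

      With a biphasic trace such as this one, the two areas separate the initial rise from the later undershoot, so the undershoot does not cancel the uptake signal.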

      - Why are mitochondrial and cytosolic calcium not also demonstrated for Bmal1 KO astrocytes? 

      In two sets of experiments (Fig. 3 and Fig. 4), we demonstrated that the increase in cytosolic and mitochondrial Ca2+ aligns with ER Ca2+ release. Since there were no circadian time differences in ER Ca2+ release in the Bmal1 KO cultures, we concluded that it was unnecessary to measure Ca2+ levels in the mitochondria and cytosol. Additionally, our primary focus is on the ER Ca2+ response rather than the Ca2+ dynamics in subcellular organelles. We hope this clarifies our rationale and maintains the focus of our study.

      - The authors claim that Herp acts by regulating the degradation of ITPRs but this hypothesis - rather central to the mechanisms proposed in this study - is not experimentally substantiated. 

      We appreciate the reviewer’s insightful comments regarding the role of HERP in the degradation of IP3Rs. In the original manuscript, we demonstrated that treating cells with Herp siRNA leads to an increase in the levels of ITPR1 and ITPR2, suggesting that HERP might be involved in the regulation of IP3R stability. This observation is consistent with previous studies, which showed that Herp siRNA treatment increases ITPR levels in HeLa and cardiac cells (Paredes et al., 2016; Torrealba et al., 2017). Torrealba et al. also showed that HERP regulates the polyubiquitination of IP3Rs. Based on our results and previous reports, we hypothesized that HERP similarly regulates ITPR degradation in cultured astrocytes.

      However, as the reviewer rightly pointed out, further evidence is needed to confirm that HERP specifically regulates ITPR degradation. To address this, we conducted new experiments examining the effect of XesC, an inhibitor of IP3Rs, on ER Ca2+ release. XesC treatment reduced ER Ca2+ release and abolished the enhancement of ER Ca2+ release by Herp KD. These results demonstrate that HERP influences the ER Ca2+ response through IP3Rs. These new findings have been added to Fig. 3N – 3P and explained in the Results section (lines 217-221).

      We believe these additional experiments and clarifications strengthen our hypothesis that HERP regulates IP3R degradation, thereby modulating ER Ca2+ responses.

      - There is no clear demonstration of the functional relevance of the circadian rhythms of ATP-mediated calcium signaling.

      As mentioned in the previous response, we examined Cx43 phosphorylation, which is linked to GJC conductance, in the context of ATP-mediated Ca2+ signaling. Our results demonstrated circadian variations in Cx43 Ser368 phosphorylation leading to variations in gap junction channel (GJC) conductance (Fig. 6C – F and Fig. 7D – I). We have discussed the significance of this circadian rhythm in ATP-driven ER Ca2+ signaling for astrocytic function during sleep/wake states in the manuscript (lines 357 – 382) as follows.

      “ATP-stimulated Cx43 (S368) phosphorylation is higher at 30hr (subjective night phase) than at 42hr (subjective day phase) (Fig. 6C and 6D), a finding further supported by in vivo experiments showing higher pCx43(S368) levels in the prefrontal cortex during the subjective night than during the day (Fig. 6E and 6F). What are the implications of this day/night variation in Cx43 (S368) phosphorylation? We reasoned that the circadian variation in Cx43 phosphorylation could significantly impact astrocyte functionality within the syncytium. Indeed, our cultured astrocytes exhibited circadian phase-dependent variation in gap junctional communication (Fig. 7D – 7F). Astrocytes influence synaptic activity through the release of gliotransmitters such as glutamate, GABA, D-serine, and ATP, triggered by increases in intracellular Ca2+ in response to the activity of adjacent neurons and astrocytes (Verkhratsky & Nedergaard, 2018). Importantly, this increase in Ca2+ spreads to adjacent astrocytes through GJCs (Fujii et al., 2017), influencing a large area of the neuronal network. Considering that Cx43 Ser368 phosphorylation occurs to uncouple specific pathways in the astrocytic syncytium to focus local responses (Enkvist & McCarthy, 1992), our findings suggest that astrocytes are better equipped for localized responses when presented with a stimulus during the active phase in mice. Conversely, during the rest period, characterized by more synchronous neuronal activity across broad brain areas (Vyazovskiy et al., 2009), higher GJC conductance might allow astrocytes to exert control over a larger area. In support of this idea, a recent study showed that synchronized astrocytic Ca2+ activity advances the slow wave activity (SWA) of the brain, a key feature of non-REM sleep (Szabó et al., 2017). Blocking GJC was found to reduce SWA, further supporting this interpretation. However, conflicting findings have also been reported. For instance, Ingiosi et al. (2020) found that astrocytic synchrony was higher during wakefulness than sleep in the mouse frontal cortex. Whether these differing results in astrocyte synchrony during resting and active periods are attributable to differences in experimental context (e.g., brain regions, sleep-inducing conditions) remains unclear. Indeed, astrocyte Ca2+ dynamics during wakefulness/sleep vary according to brain regions (Tsunematsu et al., 2021). While the extent of astrocyte synchrony might differ depending on brain region and/or stimulus, our results suggest that the baseline state of astrocyte synchrony, which is affected by GJC conductance, varies with the day/night cycle.”

      Reviewer #2 (Public Review): 

      Summary: 

      The article entitled "Circadian regulation of endoplasmic reticulum calcium response in mouse cultured astrocytes" submitted by Ryu and colleagues describes the circadian control of astrocytic intracellular calcium levels in vitro. 

      Strengths: 

      The authors used a variety of technical approaches that are appropriate 

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript.

      Weaknesses: 

      Statistical analysis is poor and could lead to a misinterpretation of the data 

      Thank you for the comment. We have carefully reviewed our statistical analyses and applied appropriate methods where necessary. Please see below for the specific revisions and improvements made.

      For Fig. 2D-E, we initially used a t-test. However, after adding more replicates and conducting a normality test, we found that the data did not follow a normal distribution. Therefore, we switched to the Mann-Whitney U test. In Fig. 5D-E, we originally used a repeated measures two-way ANOVA, but we have now changed it to a standard two-way ANOVA. For Fig. 7C and I, the normality test also indicated a non-normal distribution, and we consequently replaced the t-test with the Mann-Whitney U test. For other analyses not specifically mentioned, normality tests confirmed a normal distribution, allowing us to use t-tests or ANOVA as appropriate.
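      For readers unfamiliar with the rank-based test mentioned above, the following is a minimal, illustrative sketch of a Mann-Whitney U calculation. It does not reproduce the authors' software or settings, which are not stated here. The helper names are hypothetical, and the p-value is an exact two-sided permutation p-value, an assumption that is practical only for small samples.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u_exact(x, y):
    """Return (U for sample x, exact two-sided p-value by full enumeration)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) / 2
    u_observed = sum(ranks[:n1]) - offset
    u_mean = n1 * n2 / 2
    extreme = 0
    total = 0
    # Enumerate every way of assigning n1 of the pooled ranks to group x.
    for group in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in group) - offset
        total += 1
        if abs(u - u_mean) >= abs(u_observed - u_mean) - 1e-12:
            extreme += 1
    return u_observed, extreme / total


# Made-up example values, not data from the study.
u_stat, p_value = mann_whitney_u_exact([1, 2, 3], [4, 5, 6])
```

      Because the test uses only ranks, it does not rely on the normality assumption that motivated moving away from the t-test.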

      Several conceptual issues have been identified. 

      We have addressed the reviewer’s concerns. Please see our detailed point-by-point responses below.

      Overinterpretation of the data should be avoided. This is a mechanistic paper done completely in vitro, all references to the in vivo situation are speculative and should be avoided. 

      We appreciate the reviewer’s insightful comment. Following the reviewer’s suggestion, we have removed the interpretations of GO pathways in the context of the in vivo situation.

      Reviewer #3 (Public Review): 

      Astrocyte biology is an active area of research and this study is timely and adds to a growing body of literature in the field. The RNA-seq, Herp expression, and Ca2+ release data across wild-type, Bmal1 knockout, and Herp knockdown cellular models are robust and lend considerable support to the study's conclusions, highlighting their importance. Despite these strengths, the manuscript presents a gap in elucidating the dynamics of HERP and the involvement of ITPR1/2 in modulating Ca2+ release patterns and their circadian variations, which remains insufficiently supported and characterized. While the Connexin data underscore the importance of rhythmic Ca2+ release triggered by ATP, the relationship here appears correlational and the role of HERP and ITPR in Cx function remains to be characterized. Moreover, enhancing the manuscript's clarity and readability could significantly benefit the presentation and comprehension of the findings. 

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript. Regarding the identified gaps, we have conducted several new experiments to clearly demonstrate the HERP-ITPR-Cx phosphorylation axis. Please see our detailed point-by-point responses below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      - While HERP appears to be a clock-controlled gene and its protein levels appear to demonstrate rhythmicity as well, the data quality of the western blotting in Bmal1 knockout raises some concern about the accuracy of HERP protein quantification. 

      We understand the reviewer’s concern regarding the proximity of the HERP band to a nonspecific band in the Western blotting for the Bmal1 knockout. However, we took great care to ensure the accuracy of our HERP band quantification, selecting only the specific HERP band and excluding the nonspecific band. Therefore, we are confident in the accuracy of our HERP protein measurements.

      - If HERP is rhythmic and ITPRs are not, if their model is correct, might we expect HERP suppression to result in 'unmasking' an ITPR rhythm? 

      Our model suggests that both HERP and ITPRs are rhythmic, with HERP regulating the degradation of ITPR proteins and driving their rhythms. Consistent with this, we observed day/night variations in ITPR2 levels (Fig. 4N and 4O). Therefore, we concluded that circadian variations in HERP are sufficient to drive ITPR2 rhythms. We have explained this in detail in the Results section (lines 236-241) and the Discussion section (lines 324-332).

      - The authors make a rather abrupt switch to examining gap junctions and connexin 43 phosphorylation. While the data demonstrating that the phosphorylation of S368 may indeed be rhythmic - the authors do not connect these data to the rest of the manuscript by showing a connection to HERP-mediated calcium signaling, limiting the coherence of the narrative. 

      We thank the reviewer for these insightful comments. To address the reviewer's concern regarding the connection between Herp and the phosphorylation of Cx43 at S368, we conducted new experiments to test whether KD of Herp abolishes the rhythm of Cx43 phosphorylation at S368. In control siRNA-treated astrocytes, phosphorylation of Cx43 at S368 was significantly higher at 30hr post-synchronization than at 42hr post-synchronization, consistent with our previous results (Fig. 6C & 6D). In contrast, this circadian phase-dependent difference in phosphorylation was abolished in Herp siRNA-treated astrocytes. These results indicate that circadian variations in Cx43 phosphorylation are driven by HERP. These new results are now included in Fig. 6G and 6H and explained in the Results section (lines 276-281).

      - Comment on data presentation: the authors repeatedly present histograms with attached lines between data points - from my understanding of the experiments, this is inappropriate unless these were repeated measures from the same cells. Otherwise, the lines connecting one data point to another between different conditions (e.g., Ctrl or Herp knockdown) are arbitrary and possibly misleading (i.e., Figure 3K, 3M, 4L, 6D). 

      We thank the reviewer for this comment. We have removed the lines connecting data points in the relevant figures (Fig. 3K, 3M, Fig. 4N, and Fig. 6D).

      Reviewer #2 (Recommendations For The Authors): 

      Most of the suggestions of this reviewer are related to the conceptual interpretation and presentation of the data and to the statistical analysis 

      In Figure 1 the authors analyzed the rhythmic transcriptome of cortical astrocytes synchronized with a serum shock in two different ways. The authors need to discuss what is the difference between the two methods used to detect rhythmic transcripts and make sense of them. 

      Following the reviewer’s suggestion, we have provided a more detailed explanation about MetaCycle and BioCycle, as well as the rationale for using both packages in our analysis as follows: “Various methods have been used to identify periodicity in time-series data, such as Lomb-Scargle (Glynn et al., 2006), JTK_CYCLE (Hughes et al., 2010) and ARSER (Yang & Su, 2010), each with distinct advantages and limitations. MetaCycle integrates these three methods, facilitating the evaluation of periodicity in time-series data without requiring the selection of an optimal algorithm (Wu et al., 2016). Additionally, BioCycle has been developed using a deep neural network trained with extensive synthetic and biological time series datasets (Agostinelli et al., 2016). Because MetaCycle and BioCycle identify periodic signals using different algorithms, we applied both packages to identify periodicity in our time-series transcriptome data. BioCycle and MetaCycle analyses detected 321 and 311 periodic transcripts, respectively (FDR corrected, q-value < 0.05) (Fig. 1B). Among these, 220 (53.4%) were detected by both methods, but many transcripts did not overlap. MetaCycle is known for its inability to detect asymmetric waveforms (Mei et al., 2020). In our analysis, genes with increasing waveforms like Adora1 and Mybph were identified as rhythmic only by BioCycle, while Plat and Il34 were identified as rhythmic only by MetaCycle (Fig. S1C). Despite these discrepancies, the clear circadian rhythmic expression profiles of these genes led us to conclude that using the union of the two lists compensates for the limitations of each algorithm.”

      Please refer to lines 105-117 in the Results section.
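
      The overlap figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, assuming that the 53.4% refers to the shared transcripts as a fraction of the union of the two lists; the function name `overlap_summary` is hypothetical and is not part of MetaCycle or BioCycle.

```python
def overlap_summary(n_a, n_b, n_both):
    """Summarize the overlap between two hit lists from their sizes alone."""
    union = n_a + n_b - n_both  # inclusion-exclusion
    return {
        "union": union,
        "only_a": n_a - n_both,
        "only_b": n_b - n_both,
        "percent_shared": round(100.0 * n_both / union, 1),
    }


# Counts quoted in the response: 321 (BioCycle), 311 (MetaCycle), 220 shared.
summary = overlap_summary(321, 311, 220)
print(summary)
```

      Under this assumption the union contains 412 transcripts, which reproduces the quoted 53.4%.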

      The reasoning for comparing CT0 with the phase of the clock 8 hs after SS needs to be explained. Circadian time (CT) conceptually refers to the clock phase in the absence of entrainment cues in vivo, the direct transformation of "time after synchronization" in vitro to CT is misleading. 

      Thank you for the reviewer’s insightful comments. Initially, we believed that transforming TASS to CT, despite being in vitro data, might provide a more intuitive and physiologically relevant interpretation of our results. However, we agree that this approach might be misleading. Following the reviewer’s suggestion, we have revised our terminology by changing “CT” to “Time post sync (hr)”. Nonetheless, for the circular peak phase map in Fig. 1F, we set 8hrs post sync to ZT0, based on the phase comparison result in Fig. 1D, to allow a physiologically relevant interpretation. We hope these revisions clarify our approach.
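
      The alignment described above amounts to a fixed 8-hour shift. A minimal sketch of that mapping follows, assuming a 24-hour cycle; the names `ZT0_OFFSET_HR` and `post_sync_to_zt` are hypothetical and introduced only for illustration.

```python
ZT0_OFFSET_HR = 8  # 8 hr post sync is aligned to ZT0, as stated in the response


def post_sync_to_zt(hours_post_sync, wrap=True):
    """Shift 'time post sync' by the offset; optionally wrap onto a 24 h clock."""
    shifted = hours_post_sync - ZT0_OFFSET_HR
    return shifted % 24 if wrap else shifted


for t in (8, 30, 42):
    print(t, "hr post sync ->", post_sync_to_zt(t))
```

      With wrapping disabled, 30 hr and 42 hr post sync map to 22 and 34, which matches the earlier CT22 and CT34 labels.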

      Moreover, also by definition a CT cannot be defined in terms of "dark" or "light". Figure 6M needs to be changed. 

      Following the reviewer’s suggestion, we removed the labels CT22 and CT34. Instead, we have labeled the respective periods as “30hr post sync” and “42hr post sync”.

      In Figure 1D, the authors present a gene ontology analysis that is certainly interesting, however, it should not be overinterpreted when trying to explain processes that take place only in vivo (e.g. wound repair). 

      Thank you for the insightful comment. Following the reviewer’s feedback, we have removed the paragraph interpreting the cell migration process in relation to wound repair and have focused instead on Ca2+ ion homeostasis.

      In Figure 2A the relative expression of clock genes and Herp is again misleading by a white/grey shading indicating subjective night and subjective day when the system under study is a cell culture. 

      We understand the reviewer’s concern that a cell culture system is not equivalent to light/dark entrainment condition. However, we apply time-synchronizing stimuli to recapitulate in vivo entrainment. In addition, by comparing our data with CircaDB, we defined 8hrs post sync as corresponding to ZT0, thus aligning it with the beginning of the day. We have retained the shading to facilitate easier interpretation of our data in relation to in vivo situations. However, in response to the reviewer’s concern, we have revised the shading from white/grey to light grey/dark grey. We hope this adjustment addresses the reviewer’s concern, but if the reviewer still believes it is inappropriate, please let us know, we will gladly update it.

      In the Figure 2A legend, it is indicated that rhythmicity is assessed using MetaCycle with mean values obtained from n=2. The authors need to make clear whether this n=2 mean: 2 biological replicates or 2 technical replicates. This difference is relevant because it would make the analysis statistically valid or invalid, respectively. 

      Thank you for your feedback. n=2 refers to 2 biological replicates. Therefore, the analysis is statistically valid.

      In Figures 2C and D the authors applied a T-test, a parametric statistical test for one-to-one comparison that requires normality distribution of the data to be tested first. To test normality, the authors need at least 4 biological replicates. The suggestion of this reviewer is that these experiments have to be repeated and proper statistics applied. 

      Thank you for your feedback. In response to the reviewer's suggestion, we conducted additional experiments to increase the number of biological replicates to 4. After verifying the normality of the data, we applied a t-test for Figure 2C and a Mann-Whitney test for Figure 2D and 2E. These tests confirmed significant statistical difference between groups.
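
      For readers unfamiliar with the Mann-Whitney test at this sample size, the sketch below computes the U statistic and an exact two-sided p-value by enumerating every possible split of the pooled values. It is an illustration only: the input values are made up, the helper names are hypothetical, and it is not the software used for the analysis.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U for group a: count of pairs with a > b, ties counted as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_two_sided_p(a, b):
    """Exact two-sided p-value from all relabelings of the pooled data."""
    pooled = list(a) + list(b)
    n1 = len(a)
    center = n1 * len(b) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(a, b) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(g1, g2) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical values, four per group, chosen so the groups do not overlap.
ctrl = [1.0, 1.1, 1.2, 1.3]
kd = [2.0, 2.1, 2.2, 2.3]
p_value = exact_two_sided_p(ctrl, kd)
print(mann_whitney_u(ctrl, kd), p_value)
```

      With four values per group and no overlap between groups, only 2 of the 70 possible splits are as extreme as the observed one, so the exact two-sided p-value is 2/70.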

      Further evidence of Bmal1-dependent control of HERP circadian expression authors could check the presence of E-Box elements in the Herp promoter. 

      Thank you for the reviewer’s insightful comment. In the original version of our manuscript's Discussion section, we mentioned the absence of a canonical E-Box upstream of the Herp gene. However, following the reviewer’s suggestion and considering the potential role of non-canonical E-Boxes, we conducted an additional analysis. This analysis identified several non-canonical E-Boxes within the 6 kb upstream region of the Herp gene (Table S2). Notably, one non-canonical E-Box, “CACGTT,” known to regulate circadian expression (Yoo et al., 2005), lies close to the transcription start site (chr8:94386194-94386543). Moreover, this element is evolutionarily conserved across various mammals, including humans, rats, mice, dogs, and opossums (See Author response image 2). Therefore, we reasoned that these non-canonical E-Boxes might drive the CLOCK/BMAL1-dependent expression of Herp. We have updated the Discussion to reflect these findings in lines 315-319.
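
      A motif scan of this kind can be sketched as a simple two-strand string search. The example below is a hedged illustration and not the analysis behind Table S2: the function names are hypothetical and the input is a toy sequence, not the Herp upstream region.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) hits of motif on both strands."""
    seq = seq.upper()
    patterns = {motif.upper(): "+"}
    # A palindromic motif (e.g. the canonical CACGTG) is reported once, as '+'.
    patterns.setdefault(reverse_complement(motif.upper()), "-")
    hits = []
    for pattern, strand in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


toy_sequence = "GGCACGTTAAAACGTGCC"  # made-up sequence for illustration
hits = find_motif(toy_sequence, "CACGTT")
print(hits)
```

      The scan reports positions relative to the start of the supplied string, so genomic coordinates would have to be added back separately.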

      Author response image 2.

      The calcium experiments shown in Figures 3A-I, could be more convincing if the authors showed that the different Ca2+ sensors are compartment-specific by showing co-localization with a subcellular marker. In the pictures shown it is not even possible to recognize the cell dimensions. 

      Following the reviewer’s suggestion, we performed co-staining experiments with organelle specific Ca2+ indicators and organelle markers. First, astrocytes were co-transfected with G-CEPIA1er, an ER specific Ca2+ indicator and ER targeted DsRed2 (with Calreticulin signal sequence). Live imaging analysis showed that the fluorescent intensities of G-CEPIA1er and DsRed2-ER-5 significantly overlapped in co-transfected cells. Secondly, astrocytes were transfected with Mito-R-GECO1 and Mitotracker, a cell permeable mitochondria dye, was applied. The fluorescent intensities of Mito-R-GECO1 and Mitotracker also significantly overlapped. These new data are included in Figure S4 and explained in the Result section (lines 194-195).

      Data analysis in Figure 3 K and M is misleading. According to the explanations of the results, each of the experiments to assess ITPR1 or 2 is run independently. Then it is not clear why the relative levels obtained with control or Herp siRNA are plotted as pairs. Same comment as above for Figure 4L and Figure 6D. 

      Thank you for the reviewer’s insightful comments. Reviewer1 raised similar issues. Following the reviewers’ suggestions, we have removed the lines connecting the data points in Fig. 3K, 3M, 4L, and 6D.

      In Figure 5E the authors need to explain why they consider that repeated measures 2-way ANOVA is the right statistical test to apply. According to the explained experimental design, cells were transfected, synchronized, and then harvested independently at the indicated time after synchronization. 

      Thank you for the reviewer’s insightful comment. Upon reviewing the statistical methods as suggested, we have revised our approach. Instead of using repeated measures 2-way ANOVA, we have now applied a standard 2-way ANOVA, which is more appropriate given the experimental procedures were independent, as the reviewer pointed out.

      The English language needs to be revised throughout the text. 

      We have thoroughly revised the English language throughout the text.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figure 3. Clarify the physiological importance of 100 µM ATP. Would the Herp rhythm warrant Ca2+ release rhythms under basal conditions? In 3J-K, the relatively weak effect of Herp knockdown on ITPR1/2 levels, albeit statistically significant, may not be physiologically significant. This calls into question the claimed Herp-ITPR axis that underlies the Ca2+ release phenotype. Further, the correlation certainly exists but further characterization of Herp KD cells would be required to address the mechanism. 

      As previously reported, a broad range of ATP concentrations can induce Ca2+ activity in the astrocytes (Neary et al., 1988). Originally, we conducted an ATP dose-response analysis to observe ER Ca2+ release in our primary astrocyte culture. Our results show that ER Ca2+ release begins at 50 µM ATP and plateaus at 500 µM. Please refer to Author response image 3. We selected 100 µM ATP for our experiments because it induces a medium level of ER Ca2+ response. Importantly, although measuring ATP concentrations at the synapse in vivo is challenging (Tan et al., 2017), estimates suggest synaptic ATP concentrations range from 5-500 µM (Pankratov et al., 2006). Thus, 100 µM ATP is a physiologically relevant concentration that can affect nearby cells, including astrocytes, in the nervous system.

      Author response image 3.

      Cultured astrocytes were transfected with G-CEPIA1er ER and at 48hrs post transfection, cultured astrocytes were treated with various concentrations of ATP and Ca2+ imaging analysis was performed. (A) ΔF/F0 values over time following ATP application. (B) Area above curve values. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; one-way ANOVA).
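
      The two quantities named in this legend can be sketched as follows. This is an assumption-laden illustration rather than the authors' analysis code: F0 is taken as the mean of a pre-stimulus baseline window, the area above the curve is the trapezoidal integral of the drop below baseline, and the function names and example trace are hypothetical.

```python
def delta_f_over_f0(trace, n_baseline):
    """Normalize a fluorescence trace to the mean of its first n_baseline frames."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]


def area_above_curve(dff, dt):
    """Trapezoidal area between the baseline (0) and the parts of dff below it."""
    drop = [max(0.0, -value) for value in dff]
    return sum((drop[i] + drop[i + 1]) / 2.0 * dt for i in range(len(drop) - 1))


# Made-up trace: ER signal falls after stimulation and then recovers.
trace = [100, 100, 100, 80, 60, 80, 100]
dff = delta_f_over_f0(trace, n_baseline=3)
area = area_above_curve(dff, dt=1.0)
print(dff, area)
```

      An area above, rather than under, the curve is used in this sketch because ER Ca2+ release lowers the signal of an ER-targeted indicator.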

      Regarding the comment on Ca2+ release rhythms under basal conditions, we interpret this as referring to Ca2+ release in the absence of a stimulus. We typically observe Ca2+ release only upon stimulation, such as ATP treatment. However, we acknowledge that the modest effects of HERP knockdown on ITPR1/2 levels could call into question the HERP-ITPR axis’s role in ER Ca2+ release.

      To address this, we analyzed whether the Herp KD-induced increases in ER Ca2+ release were mediated through ITPRs by treating cells with Xestospongin C (XesC), an IP3R inhibitor. XesC treatment reduced ATP-induced ER Ca2+ release and eliminated the differences in ER Ca2+ release between control and Herp KD astrocytes (Fig. 3N – 3P). These results clearly indicate that the HERP-ITPR axis plays a critical role in controlling ER Ca2+ release. These new experiments have been included in Fig. 3 and explained in the Results section (lines 217-221).

      Furthermore, following the reviewer’s suggestion, we examined whether HERP rhythms underlie the rhythms of ER Ca2+ response by analyzing ER Ca2+ response in Herp KD astrocytes at two different times following synchronization. In control astrocytes, ATP-induced ER Ca2+ responses vary depending on time, whereas these time-dependent variations were abolished in Herp KD astrocytes. These new experiments have been included in Fig. 4K – 4M and explained in the Results section (lines 232-235).

      Collectively, these results indicate that HERP rhythms lead to time-dependent differences in ER Ca2+ response through ITPRs.

      (2) Figure 4K-L. As data suggested the involvement of ITPR1 and ITPR2 (circadian effect), a reasonable next step is to determine their involvement, but the study did not pursue the hypothesis. 

      Thank you for your insightful comment. Our results indeed suggest that rhythms in ITPR2 levels may drive the time-dependent variations in ATP-induced ER Ca2+ release following synchronization. The newly conducted experiments demonstrated that treatment with the ITPR inhibitor XesC suppressed ATP-induced ER Ca2+ release under both control and Herp siRNA treatment conditions (Fig. 3). Based on these findings, we now further confirm that rhythms of ITPR levels, specifically ITPR2, underlie the circadian variations in ER Ca2+ release. While examining the effect of ITPR2 siRNA would directly prove the involvement of ITPR2, we have decided to pursue this experiment in future studies.

      (3) Figure 5A-C. Data from WT cells should be included side by side with Bmal1-/- cells for comparison which is expected to be consistent with the HERP levels as in 5D-E. Again, the role of ITPR2 is suggested but not demonstrated. 

      Following the reviewer's suggestion, we conducted additional experiments including both WT and Bmal1-/- cultured astrocytes side-by-side. The results were consistent with our previous findings: WT astrocytes showed rhythms of ER Ca2+ release while Bmal1-/- astrocytes did not. We have updated Figure 5A to 5C and the corresponding Results section in lines 242-245 accordingly.

      Regarding the second comment, as mentioned in our previous response, we plan to examine the role of ITPR2 in further studies.

      (4) Figure 6. The Connexin data seems an addon and is correlative with the Ca2+ release. The role of Herp and Itpr in Connexin function is not addressed. Figure 6E-F was not called out in the results section. Suggest providing additional data to support the role of the HERP-ITPR axis in regulating Ca2+ release and Connexin activity. 

      We agree that additional data are needed to support the role of HERP in regulating Cx43 phosphorylation. Therefore, we have conducted further experiments to determine whether rhythms of Cx43 phosphorylation are regulated by HERP. In the control astrocytes, ATP treatment induced time-dependent variations in Cx43 phosphorylation. However, these rhythms were abolished in Herp KD astrocytes. These results indicate that rhythms in HERP levels contribute to the time-dependent variations in Cx43 phosphorylation. These new experiments have been included in Fig. 6G and 6H and explained in the Results section (lines 276-281).

      Regarding the second comment, we have corrected our oversight by properly referencing Figures 6E-F in the Results section. Please refer to lines 357-359 for clarification.

      (5) Discussion. This section should focus on noteworthy points to discuss, not repeating the results. 

      Based on the reviewer's valuable suggestions, we have revised the Discussion section to minimize repetition of the results. Thank you for your guidance.

      (6) The manuscript exhibits numerous grammatical and textual inaccuracies that necessitate careful revision by the authors. My observations here are confined to the title and the abstract alone. I recommend altering the title from "mouse cultured astrocytes" to "cultured mouse astrocytes" for clarity and grammatical correctness. The abstract, meanwhile, needs enhancements both in terms of its content and language. It should incorporate the results of the partitioning among the ER, cytoplasm, and mitochondria, and provide clear definitions for some of the critical terms used. It's worth noting that the abstract's second sentence contains a grammatical error. 

      Thank you for the reviewer’s valuable feedback. We have carefully revised the title, abstract, and main text to address the grammatical and textual issues. The title has been changed to “cultured mouse astrocytes”. Additionally, the abstract now includes results related to cytoplasmic Ca2+ dynamics and has been revised in several places. We appreciate your insights and have worked to enhance the content and language accordingly.

      References

      Agostinelli, F., Ceglia, N., Shahbaba, B., Sassone-Corsi, P., & Baldi, P. (2016). What time is it? Deep learning approaches for circadian rhythms. Bioinformatics, 32(12), i8-i17. https://doi.org/10.1093/bioinformatics/btw243

      Cahoy, J. D., Emery, B., Kaushal, A., Foo, L. C., Zamanian, J. L., Christopherson, K. S., Xing, Y., Lubischer, J. L., Krieg, P. A., Krupenko, S. A., Thompson, W. J., & Barres, B. A. (2008). A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 28(1), 264-278. https://doi.org/10.1523/JNEUROSCI.4178-07.2008

      Carreras-Sureda, A., Pihán, P., & Hetz, C. (2018). Calcium signaling at the endoplasmic reticulum: fine-tuning stress responses. Cell Calcium, 70, 24-31. https://doi.org/10.1016/j.ceca.2017.08.004

      Enkvist, M. O., & McCarthy, K. D. (1992). Activation of protein kinase C blocks astroglial gap junction communication and inhibits the spread of calcium waves. J Neurochem, 59(2), 519-526. https://doi.org/10.1111/j.1471-4159.1992.tb09401.x

      Fujii, Y., Maekawa, S., & Morita, M. (2017). Astrocyte calcium waves propagate proximally by gap junction and distally by extracellular diffusion of ATP released from volume-regulated anion channels. Scientific Reports, 7(1), 13115. https://doi.org/10.1038/s41598-017-13243-0

      Giorgi, C., Marchi, S., & Pinton, P. (2018). The machineries, regulation and cellular functions of mitochondrial calcium. Nature Reviews Molecular Cell Biology, 19(11), 713-730. https://doi.org/10.1038/s41580-018-0052-8

      Glynn, E. F., Chen, J., & Mushegian, A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310-316. https://doi.org/10.1093/bioinformatics/bti789

      Hughes, M. E., Hogenesch, J. B., & Kornacker, K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms, 25(5), 372-380. https://doi.org/10.1177/0748730410379711

      Ingiosi, A. M., Hayworth, C. R., Harvey, D. O., Singletary, K. G., Rempe, M. J., Wisor, J. P., & Frank, M. G. (2020). A Role for Astroglial Calcium in Mammalian Sleep and Sleep Regulation. Curr Biol, 30(22), 4373-4383.e4377. https://doi.org/10.1016/j.cub.2020.08.052

      Mei, W., Jiang, Z., Chen, Y., Chen, L., Sancar, A., & Jiang, Y. (2020). Genome-wide circadian rhythm detection methods: systematic evaluations and practical guidelines. Briefings in Bioinformatics, 22(3). https://doi.org/10.1093/bib/bbaa135

      Neary, J. T., van Breemen, C., Forster, E., Norenberg, L. O., & Norenberg, M. D. (1988). ATP stimulates calcium influx in primary astrocyte cultures. Biochem Biophys Res Commun, 157(3), 1410-1416. https://doi.org/10.1016/s0006-291x(88)81032-5

      Pankratov, Y., Lalo, U., Verkhratsky, A., & North, R. A. (2006). Vesicular release of ATP at central synapses. Pflugers Arch, 452(5), 589-597. https://doi.org/10.1007/s00424-006-0061-x

      Paredes, F., Parra, V., Torrealba, N., Navarro-Marquez, M., Gatica, D., Bravo-Sagua, R., Troncoso, R., Pennanen, C., Quiroga, C., Chiong, M., Caesar, C., Taylor, W. R., Molgó, J., San Martin, A., Jaimovich, E., & Lavandero, S. (2016). HERPUD1 protects against oxidative stress-induced apoptosis through downregulation of the inositol 1,4,5-trisphosphate receptor. Free Radic Biol Med, 90, 206-218. https://doi.org/10.1016/j.freeradbiomed.2015.11.024

      Szabó, Z., Héja, L., Szalay, G., Kékesi, O., Füredi, A., Szebényi, K., Dobolyi, Á., Orbán, T. I., Kolacsek, O., Tompa, T., Miskolczy, Z., Biczók, L., Rózsa, B., Sarkadi, B., & Kardos, J. (2017). Extensive astrocyte synchronization advances neuronal coupling in slow wave activity in vivo. Scientific Reports, 7(1), 6018. https://doi.org/10.1038/s41598-017-06073-7

      Tan, Z., Liu, Y., Xi, W., Lou, H. F., Zhu, L., Guo, Z., Mei, L., & Duan, S. (2017). Glia-derived ATP inversely regulates excitability of pyramidal and CCK-positive neurons. Nat Commun, 8, 13772. https://doi.org/10.1038/ncomms13772

      Torrealba, N., Navarro-Marquez, M., Garrido, V., Pedrozo, Z., Romero, D., Eura, Y., Villalobos, E., Roa, J. C., Chiong, M., Kokame, K., & Lavandero, S. (2017). Herpud1 negatively regulates pathological cardiac hypertrophy by inducing IP3 receptor degradation. Sci Rep, 7(1), 13402. https://doi.org/10.1038/s41598-017-13797-z

      Tsunematsu, T., Sakata, S., Sanagi, T., Tanaka, K. F., & Matsui, K. (2021). Region-specific and state-dependent astrocyte Ca2+ dynamics during the sleep-wake cycle in mice. The Journal of Neuroscience, JN-RM-2912-2920. https://doi.org/10.1523/jneurosci.2912-20.2021

      Verkhratsky, A., & Nedergaard, M. (2018). Physiology of Astroglia. Physiol Rev, 98(1), 239-389. https://doi.org/10.1152/physrev.00042.2016

      Vyazovskiy, V. V., Olcese, U., Lazimy, Y. M., Faraguna, U., Esser, S. K., Williams, J. C., Cirelli, C., & Tononi, G. (2009). Cortical firing and sleep homeostasis. Neuron, 63(6), 865-878. https://doi.org/10.1016/j.neuron.2009.08.024

      Wu, G., Anafi, R. C., Hughes, M. E., Kornacker, K., & Hogenesch, J. B. (2016). MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics, 32(21), 3351-3353. https://doi.org/10.1093/bioinformatics/btw405

      Yang, R., & Su, Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168-174. https://doi.org/10.1093/bioinformatics/btq189

      Yoo, S. H., Ko, C. H., Lowrey, P. L., Buhr, E. D., Song, E. J., Chang, S., Yoo, O. J., Yamazaki, S., Lee, C., & Takahashi, J. S. (2005). A noncanonical E-box enhancer drives mouse Period2 circadian oscillations in vivo. Proc Natl Acad Sci U S A, 102(7), 2608-2613. https://doi.org/10.1073/pnas.0409763102

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang, Hu et al. examined the molecular mechanisms underlying astrocyte activation and its implications for multiple sclerosis. This study shows that the glycolytic enzyme PKM2 relocates to astrocyte nuclei upon activation in EAE mice. Inhibiting PKM2's nuclear import reduces astrocyte activation, as evidenced by decreased proliferation, glycolysis, and inflammatory cytokine release. Crucially, the study identifies TRIM21 as pivotal in regulating PKM2 nuclear import via ubiquitination. TRIM21 interacts with PKM2, promoting its nuclear translocation and enhancing its activity, affecting multiple signaling pathways. Confirmatory analyses using single-cell RNA sequencing and immunofluorescence demonstrate TRIM21 upregulation in EAE astrocytes. Modulating TRIM21 expression in primary astrocytes impacts PKM2-dependent glycolysis and proliferation. In vivo experiments targeting this mechanism effectively mitigate disease severity, CNS inflammation, and demyelination in EAE.

      The authors supported their claims with various experimental approaches, however, some results should be supported with higher-quality images clearly depicting the conclusions and additional quantitative analyses of Western blots.

      Thanks for the reviewer’s comments. We agree with the reviewer and have added higher magnification images, for example Fig.2A to better visualize the localization of PKM2 in DASA-treated conditions, and Fig. 3A and Fig.3B to better visualize the pSTAT3 and pp65. Moreover, we have added quantitative analyses of Western blots for some key experiments, for example quantitative results for Fig.2D is added in Fig.S3 to show the change of PKM2 and p-c-myc in DASA-58-treated conditions and quantitative results for Fig. 3D are added in Fig.S4B and S4C to show the change of nuclear and cytoplasmic PKM2, STAT3 and NF-κB in different conditions.

      Strength:

      This study presents a comprehensive investigation into the function and molecular mechanism of metabolic reprogramming in the activation of astrocytes, a critical aspect of various neurological diseases, especially multiple sclerosis. The study uses the EAE mouse model, which closely resembles MS. This makes the results relevant and potentially translational. The research clarifies how TRIM21 regulates the nuclear import of PKM2 through ubiquitination by integrating advanced techniques. Targeting this axis may have therapeutic benefits since lentiviral vector-mediated knockdown of TRIM21 in vivo significantly reduces disease severity, CNS inflammation, and demyelination in EAE animals.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The authors reported that PKM2 levels are elevated in the nucleus of astrocytes at different EAE phases compared to cytoplasmic localization. However, Figure 1 also shows elevated cytoplasmic expression of PKM2. The authors should clarify the nuclear localization of PKM2 by providing zoomed-in images. An explanation for the increased cytoplasmic PKM2 expression should be provided. Similarly, while PKM2 translocation is inhibited by DASA-58, in addition to its nuclear localization, a decrease in the cytoplasmic localization of PKM2 is also observed. This raises the possibility that a degradation mechanism is involved when the nuclear translocation of PKM2 is inhibited.

      According to the results of immunofluorescence staining of PKM2 in the spinal cord of EAE mice and in cultured primary astrocytes, in addition to the nuclear translocation of PKM2 in EAE conditions, we observed elevated expression of PKM2 in astrocytes, in both the cytoplasm and the nucleus. Various studies of neurological diseases have shown consistent results; for example, following spinal cord injury (SCI), not only upregulated expression of PKM2 but also its nuclear translocation was observed in astrocytes (Zhang et al., 2015). In EAE conditions, CNS inflammation is elevated and several proinflammatory cytokines and chemokines might contribute to the upregulated expression of PKM2 in astrocytes. We have tested TNFα and IL-1β, which are recognized to play important roles in EAE and MS (Lin and Edelson, 2017, Wheeler et al., 2020), and results from western blots showed increased expression of PKM2 upon stimulation with TNFα and IL-1β (Author response image 1). Moreover, according to the reviewer’s suggestions, we have added zoomed-in images for Figure 2A.

      Additionally, as the reviewer noted, the cytoplasmic PKM2 level also decreases; a degradation-related mechanism or other mechanisms might be involved in this process.

      Author response image 1.

      Upregulated expression of PKM2 in astrocytes following stimulation with TNF-α and IL-1β. Primary astrocytes were stimulated with TNF-α and IL-1β (50 ng/mL) for 48 h and western blotting analysis was performed.

      In Figure 3D, the authors claim that PKM2 expression causes nuclear retention of STAT3, p65, and p50, and inhibiting PKM2 localization with DASA-58 suppresses this retention. The western blot results for the MOG-stimulated group show high levels of STAT3, p50, and p65 in nuclear localization. However, in the MOG and DASA-58 treated group, one would expect high levels of p50, p65, and STAT3 proteins in the cytoplasm, while their levels decrease in the nucleus. These western blot results could be expanded. Additionally, intensity quantification for these results would be beneficial to see the statistical difference in their expressions, especially to observe the nuclear localization of PKM2.

      We agree with the reviewer’s comments and have added quantification of STAT3, p50 and p65 for Fig. 3D in Fig. S4B and Fig. S4C. Nevertheless, given that DASA-58 did not trigger a notable increase in the cytoplasmic level of PKM2, we did not detect an upregulation of STAT3, p50, or p65 in the cytoplasm of the MOG and DASA-58-treated groups. With the quantification results, the changes in these proteins across conditions are easier to see.

      The discrepancy between Figure 7A and its explaining text is confusing. The expectation from the knocking down of TRIM21 is the amelioration of activated astrocytes, leading to a decrease in inflammation and the disease state. The presented results support these expectations, while the images showing demyelination in EAE animals are not highly supportive. Clearly labeling demyelinated areas would enhance readers' understanding of the important impact of TRIM21 knockdown on reducing the disease severity.

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have made the corrections in the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have accordingly revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      Additionally, we have added the whole image of the spinal cord for MBP in Author Response image 2. Moreover, we have labelled the demyelinated areas to facilitate readers’ understanding.

      Author response image 2.

      MBP staining of the whole spinal cord in EAE mice from shVec and shTRIM21 group. Scale bar: 100 μm. Demyelinated areas are marked with dashed lines.

      Reviewer #2 (Public Review):

      This study significantly advances our understanding of the metabolic reprogramming underlying astrocyte activation in neurological diseases such as multiple sclerosis. By employing an experimental autoimmune encephalomyelitis (EAE) mouse model, the authors discovered a notable nuclear translocation of PKM2, a key enzyme in glycolysis, within astrocytes.

      Preventing this nuclear import via DASA 58 substantially attenuated primary astrocyte activation, characterized by reduced proliferation, glycolysis, and inflammatory cytokine secretion. Moreover, the authors uncovered a novel regulatory mechanism involving the ubiquitin ligase TRIM21, which mediates PKM2 nuclear import. TRIM21 interaction with PKM2 facilitated its nuclear translocation, enhancing its activity in phosphorylating STAT3, NFκB, and c-myc. Single-cell RNA sequencing and immunofluorescence staining further supported the upregulation of TRIM21 expression in astrocytes during EAE.

      Manipulating this pathway, either through TRIM21 overexpression in primary astrocytes or knockdown of TRIM21 in vivo, had profound effects on disease severity, CNS inflammation, and demyelination in EAE mice. This comprehensive study provides invaluable insights into the pathological role of nuclear PKM2 and the ubiquitination-mediated regulatory mechanism driving astrocyte activation.

      The authors' use of diverse techniques, including single-cell RNA sequencing, immunofluorescence staining, and lentiviral vector knockdown, underscores the robustness of their findings and interpretations. Ultimately, targeting this PKM2-TRIM21 axis emerges as a promising therapeutic strategy for neurological diseases involving astrocyte dysfunction.

      While the strengths of this piece of work are undeniable, some concerns could be addressed to further refine its impact and clarity, as outlined in the recommendations for the authors.

      We thank the reviewer for the comments and the positive evaluation of our work. We have answered each question in the recommendations section.

      Reviewer #3 (Public Review):

      Summary:

      Pyruvate kinase M2 (PKM2) is a rate-limiting enzyme in glycolysis, and its translocation to the nucleus in astrocytes in various nervous system pathologies has been associated with a metabolic switch to glycolysis, which is a sign of reactive astrogliosis. The authors investigated whether this occurs in experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS). They show that in EAE, PKM2 is ubiquitinated by TRIM21 and transferred to the nucleus in astrocytes. Inhibition of the TRIM21-PKM2 axis efficiently blocks reactive gliosis and partially alleviates symptoms of EAE. The authors conclude that this axis can be a potential new therapeutic target in the treatment of MS.

      Strengths:

      The study is well-designed, controls are appropriate and a comprehensive battery of experiments has been successfully performed. Results of in vitro assays, single-cell RNA sequencing, immunoprecipitation, RNA interference, molecular docking, and in vivo modeling etc. complement and support each other.

      Weaknesses:

      Though EAE is a valid model of MS, a proposed new therapeutic strategy based on this study needs to have support from human studies.

      We agree that, although we have clarified the therapeutic potential of targeting TRIM21 or PKM2 in EAE, a mouse model of MS, application in humans warrants further study. Before TRIM21 can be considered as a target for treating multiple sclerosis in clinical trials, several issues need to be addressed to ensure safety, efficacy, and feasibility. One such issue is the development of drugs that specifically target TRIM21 in the brain, cross the blood-brain barrier, and have minimal off-target effects. Translating preclinical findings into clinical trials poses a significant challenge. To provide evidence for the similarities between the EAE model and multiple sclerosis, we screened GEO datasets (Author response image 3). GSE214334 analyzed transcriptional profiles of normal-appearing white matter from non-MS controls and different disease subtypes (RRMS, SPMS, and PPMS). Although no statistical difference was observed among the groups, TRIM21 expression tended to increase in SPMS (secondary progressive MS) and PPMS (primary progressive MS) patients. In GSE83670, astrocytes from 3 control white matter samples and 4 multiple sclerosis normal-appearing white matter (NAWM) samples were analyzed. TRIM21 mRNA expression was higher in the MS group (78.73 ± 10.44) than in the control group (46.67 ± 24.15). Although neither GEO dataset yielded a statistically significant difference, TRIM21 expression appears to be elevated in the white matter of MS patients compared to controls.

      To address this limitation, we have incorporated the following statement in the discussion section: “However, whether TRIM21-PKM2 could potentially serve as therapeutic targets in multiple sclerosis warrants further studies.”

      Author response image 3.

      TRIM21 expression in control and MS patients based on published GEO datasets. (A) The expression of TRIM21 in normal-appearing white matter in non-MS (Ctl) and different clinical subtypes of MS (RRMS, SPMS, PPMS) based on GSE214334 (one-way ANOVA). (B) The expression of TRIM21 in multiple sclerosis normal-appearing white matter (NAWM) and control WM based on GSE83670 (unpaired Student's t test). RRMS, relapsing-remitting MS; SPMS, secondary progressive MS; PPMS, primary progressive MS. Data are represented as the means ± SEM.
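
      The comparison in panel B can be approximated from the summary values quoted in the response. The sketch below is illustrative only and is not the authors' analysis. It assumes that the reported values (78.73 ± 10.44, n = 4; 46.67 ± 24.15, n = 3) are means ± SEM, as stated in the legend, and it applies the pooled-variance (Student's) unpaired t statistic. The function name and the constant holding the two-sided 5% critical value for 5 degrees of freedom (2.571, from standard t tables) are introduced here for illustration.

```python
import math


def student_t_from_sem(mean_a, sem_a, n_a, mean_b, sem_b, n_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t test
    computed from group means, SEMs, and group sizes."""
    # Recover each group's variance from its SEM: SD^2 = SEM^2 * n.
    var_a = (sem_a ** 2) * n_a
    var_b = (sem_b ** 2) * n_b
    df = n_a + n_b - 2
    # Pooled variance, weighting each group by its own degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


# Two-sided critical value at alpha = 0.05 for df = 5 (standard t table).
T_CRIT_TWO_SIDED_05_DF5 = 2.571

# Summary values quoted for GSE83670; treating "±" as SEM is an assumption.
t_stat, df = student_t_from_sem(78.73, 10.44, 4, 46.67, 24.15, 3)
significant = abs(t_stat) > T_CRIT_TWO_SIDED_05_DF5
print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

      Under these assumptions the statistic falls below the critical value, which is consistent with the authors' statement that the difference did not reach statistical significance.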

      Reviewer #4 (Public Review):

      Summary:

      The authors report the role of the Pyruvate Kinase M2 (PKM2) enzyme nuclear translocation as fundamental in the activation of astrocytes in a model of autoimmune encephalitis (EAE). They show that astrocytes, activated through culturing in EAE splenocytes medium, increase their nuclear PKM2 with consequent activation of NFkB and STAT3 pathways. Prevention of PKM2 nuclear translocation counteracts this activation. The authors found that the E3 ubiquitin ligase TRIM21 interacts with PKM2 and promotes its nuclear translocation. In vivo, either silencing of TRIM21 or inhibition of PKM2 nuclear translocation ameliorates the severity of the disease in the EAE model.

      Strengths:

      This work contributes to the knowledge of the complex action of the PKM2 enzyme in the context of an autoimmune-neurological disease, highlighting its nuclear role and a novel partner, TRIM21, and thus adding a novel rationale for therapeutic targeting.

      Weaknesses:

      Despite the relevance of the work and its goals, some of the conclusions drawn would require more thorough proof:

      I believe that the major weakness is the fact that TRIM21 is known to have per se many roles in autoimmune and immune pathways and some of the effects observed might be due to a PKM2-independent action. Some of the experiments to link the two proteins, besides their interaction, do not completely clarify the issue. On top of that, the in vivo experiments address the role of TRIM21 and the nuclear localisation of PKM2 independently, thus leaving the matter unsolved.

      We agree that TRIM21 has multifunctional roles and that some of its effects may be due to PKM2-independent actions. TRIM21 functions as a ubiquitin ligase with various substrates. Here we identify PKM2 as one of its interacting proteins, and our focus is the relationship between TRIM21 and the nuclear translocation of PKM2. We used diverse experiments to clarify this relationship, including immunoprecipitation, western blotting, immunofluorescence, and cyto-nuclear protein extraction. These experiments are the key points of our study. The in vitro results indicated that either TRIM21 or PKM2 might be a potential target for EAE treatment. As expected, in vivo, targeting either TRIM21 or PKM2 nuclear transport ameliorated EAE. To test the relationship between TRIM21 and PKM2 nuclear transport in vivo, we stained PKM2 in shVec- and shTRIM21-treated mice. As expected, knocking down TRIM21 decreased the nuclear staining of PKM2 in spinal cord astrocytes in EAE models (Figure S7A). This observation suggests that the therapeutic effect of inhibiting TRIM21 in astrocytes in vivo might be partially due to reduced nuclear translocation of PKM2.

      Some experimental settings are not described to a level that is necessary to fully understand the data, especially for a non-expert audience: e.g. the EAE model and MOG treatment; action and reference of the different nuclear import inhibitors; use of splenocyte culture medium and the possible effect of non-EAE splenocytes.

      Following the reviewer’s suggestions, we have added more detailed descriptions to the materials and methods section; for example, the use of splenocyte culture medium, mass spectrometry, and HE and LFB staining have been added. More details are given in the part on “EAE induction and isolation and culture of primary astrocytes”. Moreover, references for the use of DASA-58 in vitro and TEPP-46 in vivo as inhibitors of PKM2 nuclear transport were added.

      The statement that PKM2 is a substrate of TRIM21 ubiquitin ligase activity is an overinterpretation. There is no evidence that this interaction results in ubiquitin modification of PKM2; the ubiquitination experiment is minimal and is not performed in conditions that would allow us to see ubiquitination of PKM2 (e.g. denaturing conditions, reciprocal pull-down, catalytically inactive TRIM21, etc.).

      To prevent misunderstanding, we have revised certain statements in the manuscript. In the updated version, the description is as follows: Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General recommendations:

      - The whole manuscript needs language editing.

      We appreciate the comments of the reviewers. We have improved the writing of the manuscript. All modifications are underlined.

      - Details of many experiments are not given in the materials and methods.

      According to the reviewer’s suggestions, we have added more details for experiments in the materials and methods. For example, “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes”, “mass spectrometry”, “Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining” were added in the section of Materials and Methods. More detailed information is given for EAE induction and isolation and culture of primary astrocytes.

      - Line properties in graphics should be corrected, some lines in box plots and error bars are very weak and hardly visible. Statistical tests should be included in figure legends as well. Statistical differences should be mentioned for control vs DASA-58 (alone) in all related figures.

      We have revised the figures to improve visibility by thickening the lines and error bars. In accordance with the reviewer’s suggestions, we have included the statistical tests in the figure legends. Statistical analysis was performed among all groups; where no asterisk is shown in the figure panels or legend, there is no statistical difference between the control and DASA-58 groups. For most of the experiments in our study, including lactate production, glucose consumption, EdU and CCK8 analyses, and the changes in the STAT3 and NF-κB pathways, no statistical difference was observed between the control and DASA-58 groups. A possible reason is that, in unstimulated astrocytes, PKM2 expression is low and little PKM2 translocates to the nucleus, which may explain why DASA-58 did not exert the anticipated effect. We therefore used MOGsup to stimulate astrocytes, which enabled us to observe the impact of DASA-58 on astrocyte proliferation and glycolysis under this condition.

      - Scale bars, arrows, and labeling in the images are not visible.

      We have improved the images according to the reviewer’s suggestions. The scale bars and arrows have been made thicker and the labels larger, so that they are now clearly visible in the updated figures.

      - Quantitative analysis of all western blot results and their statistics could be provided in every image and for every protein.

      For western blotting results that were further processed with quantitative analysis (for example, Fig. 2D, Fig. 5G, Fig. 6A and 6B, and Fig. S4), we have added their statistics in the raw data sections. The other western blot results, such as the IP analyses used to assess protein-protein binding, were not further processed with quantitative analysis.

      - Proteins that are used for normalizations in western blots should be stated in the text.

      We have added description of proteins that are used for normalization in western blots in figure legends. Moreover, in figure panels, proteins used for normalization are indicated. Globally, whole protein level is normalized to protein level of β-actin. For nuclear and cytoplasmic proteins, nuclear protein is normalized to the expression of lamin, cytoplasmic protein is normalized to the expression of tubulin. 
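
      The normalization scheme described above can be sketched as follows. This is a minimal illustration, not the authors' quantification script: the dictionary of loading controls mirrors the text (β-actin for whole lysate, lamin for nuclear fractions, tubulin for cytoplasmic fractions), while the function names, lane names, and densitometry values are hypothetical.

```python
# Loading control used for each fraction, as stated in the response.
LOADING_CONTROL = {
    "whole": "beta_actin",
    "nuclear": "lamin",
    "cytoplasmic": "tubulin",
}


def normalized_intensity(lane, target, fraction):
    """Divide the target band intensity by the loading-control band of the same lane."""
    control = LOADING_CONTROL[fraction]
    if lane[control] <= 0:
        raise ValueError(f"loading control '{control}' must be positive")
    return lane[target] / lane[control]


def fold_change(lanes, target, fraction, reference):
    """Express each lane's normalized intensity relative to a reference lane."""
    ref = normalized_intensity(lanes[reference], target, fraction)
    return {
        name: normalized_intensity(lane, target, fraction) / ref
        for name, lane in lanes.items()
    }


# Hypothetical densitometry values (arbitrary units) for a nuclear fraction.
nuclear_lanes = {
    "Ctl": {"PKM2": 1000.0, "lamin": 2000.0},
    "MOGsup": {"PKM2": 3000.0, "lamin": 2400.0},
}
nuclear_fold = fold_change(nuclear_lanes, "PKM2", "nuclear", reference="Ctl")
print(nuclear_fold)
```

      Normalizing within each lane before comparing lanes keeps differences in protein loading from being read as differences in PKM2 abundance.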

      - The manuscript investigates the role of TRIM21 in the nuclear localization of PKM2 in astrocytes in EAE mice, however almost no information is given about TRIM21 in the introduction. Extra information is given for PKM2, yet can be concisely explained.

      We have added a paragraph that describes the information of TRIM21 in the introduction section. The description is as follows: “TRIM21 belongs to the TRIM protein family which possess the E3 ubiquitin ligase activity. In addition to its well-recognized function in antiviral responses, emerging evidences have documented the multifaceted role of TRIM21 in cell cycle regulation, inflammation and metabolism (Chen et al., 2022). Nevertheless, the precise mechanisms underlying the involvement of TRIM21 in CNS diseases remain largely unexplored.”

      - "As such, deciphering glycolysis-dominant metabolic switch in astrocytes is the basis for understanding astrogliosis and the development of neurological diseases such as multiple sclerosis." The sentence could be supported by references.

      To support this sentence, we have added the following references:

      (1) Xiong XY, Tang Y, Yang QW. Metabolic changes favor the activity and heterogeneity of reactive astrocytes. Trends in endocrinology and metabolism: TEM 2022;33(6):390-400.

      (2) das Neves SP, Sousa JC, Magalhães R, Gao F, Coppola G, Mériaux S, et al. Astrocytes Undergo Metabolic Reprogramming in the Multiple Sclerosis Animal Model. Cells 2023;12(20):2484.

      Figure 1/Result 1:

      - Figure 1A-B: Quality of the images should be improved.

      Following the reviewer’s suggestion, we have improved the image quality: higher-resolution images were added in Figure 1A and Figure 1B.

      - Control images of Figure 1B are not satisfying. GFAP staining is very dim. Images from control cells should be renewed.

      As suggested by the reviewer, we have replaced the control images and added DAPI staining for all groups. Compared with MOGsup-stimulated astrocytes, the control cells are not in an activated state and their GFAP levels are relatively low.

      - Labelings on the images are not sufficient, arrows and scale bars are not visible.

      We have improved the images including labels, arrows and scale bars in all figures.

      - How splenocytes were obtained from MOG-induced mice was not given in the materials and methods section. Thus, it should be clearly stated how the splenocyte supernatant is generated (treatment details).

      We have added the detailed information relating to splenocyte isolation and splenocyte supernatant entitled “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes” in the section of Materials and methods. “Splenocytes were isolated from EAE mice 15 d (disease onset) after MOG35-55 immunization. Briefly, spleen cells were suspended in RPMI-1640 medium containing 10% FBS. Splenocytes were plated in 12-well plates at 1 × 10⁶ cells/well containing 50 μg/mL MOG35-55 and cultured at 37°C in 5% CO2. After stimulation for 60 h, cell suspension was centrifuged at 3000 rpm for 5 min and supernatants were collected. For the culture of MOGsup-stimulated astrocytes, astrocytes were grown in medium containing 70% DMEM supplemented with 10% FBS and 30% supernatant from MOG35-55-stimulated-splenocytes.”

      - For general astrocyte morphology: authors showed the cells are GFAP+ astrocytes. It is surprising that these cells do not bear classical astrocyte morphology in cell culture. How long do you culture astrocytes before treatment? How do you explain their morphological difference?

      Astrocytes were cultured for 2 to 3 weeks, corresponding to 2-3 passages, before treatment. There are several possible reasons why the GFAP+ astrocytes differ from the classical morphology. The first is cell density: in low-density cultures, as shown in Figure 1B, we observed that astrocytes adopt a more flattened morphology, whereas in high-density cultures they adopt a stellate shape. Moreover, variations in culture conditions, such as the use of different fetal bovine serum, can also influence astrocyte morphology. In addition, mechanical injury induced by the isolation procedure might contribute to variations in morphology during in vitro cultivation. In summary, the morphological differences observed in GFAP+ astrocytes in culture likely result from a combination of culture conditions, cell density, and mechanical injury that occurred during astrocyte isolation.

      - Additional verification of reactive astrocytes could be performed by different reactive astrocyte markers, such as GLAST, Sox9, S100ß. Thus, quantitative analysis of activated astrocytes can be done by counting DAPI vs GLAST, Sox9 or S100ß positive cells.

      We agree with the reviewer that there are other markers of reactive astrocytes, such as GLAST, Sox9, and S100β. However, substantial evidence supports GFAP as the most commonly used marker of reactive astrocytes. In most cases, reactive astrocytes overexpress GFAP. GFAP is one of the most consistently induced genes in transcriptomic datasets of reactive astrocytes, confirming its usefulness as a reactive marker (Escartin et al., 2019). Thus, we used GFAP as the marker of astrocyte activation in our study.

      - How you performed quantifications for Figures 1C and 1D should be clearly explained, details are not given.

      Details of the quantification for Figures 1C and 1D were added to the figure legend. In brief, the mean fluorescence intensity of PKM2 in the different groups in (B) was calculated with ImageJ. Nuclear PKM2 was counted manually in Image-Pro Plus (e.g., PKM2 was scored as nuclear or cytoplasmic based on DAPI blue staining). The proportion of nuclear PKM2 was determined by normalizing the count of nuclear PKM2 to the count of nuclear DAPI, which represents the number of cell nuclei.
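
      The ratio described above amounts to a simple per-field calculation. The sketch below is illustrative only: the function name and the example counts are hypothetical and do not come from the authors' data. It shows the proportion both per imaging field and pooled across fields.

```python
def nuclear_pkm2_fraction(nuclear_pkm2_count, dapi_nuclei_count):
    """Proportion of cells with nuclear PKM2: nuclear PKM2 count / DAPI nuclei count."""
    if dapi_nuclei_count <= 0:
        raise ValueError("at least one DAPI-stained nucleus is required")
    if not 0 <= nuclear_pkm2_count <= dapi_nuclei_count:
        raise ValueError("nuclear PKM2 count must lie between 0 and the DAPI count")
    return nuclear_pkm2_count / dapi_nuclei_count


# Hypothetical manual counts per imaging field: (nuclear PKM2, DAPI nuclei).
fields = [(12, 40), (18, 45), (9, 30)]

per_field = [nuclear_pkm2_fraction(pkm2, dapi) for pkm2, dapi in fields]
pooled = nuclear_pkm2_fraction(
    sum(pkm2 for pkm2, _ in fields),
    sum(dapi for _, dapi in fields),
)
print(per_field, pooled)
```

      Reporting per-field values keeps the field-to-field variability visible, whereas the pooled value weights each field by its cell number.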

      - "Together, these data demonstrated the nuclear translocation of PKM2 in astrocytes from EAE mice." Here, "suggest" should be used instead of "demonstrated".

      Based on the reviewer's suggestion, we have changed "demonstrated" to "suggest" in this sentence.

      Result 2 and 3:

      - In the literature, DASA-58 is shown to be the activator of PKM2 (https://www.nature.com/articles/nchembio.1060; https://doi.org/10.1016/j.cmet.2019.10.015).

      - Providing references for the inhibitory use of DASA-58 for PKM2 would be appreciated.

      DASA-58 is referred to as a “PKM2 activator” because it enforces the tetramerization of PKM2, enhancing the enzymatic ability of PKM2 to catalyze the conversion of PEP to pyruvate. However, enforced tetramerization reduces the dimeric form of PKM2, thereby inhibiting its nuclear translocation. For this reason, DASA-58 is also used as an inhibitor of PKM2 nuclear translocation. In primary BMDMs, LPS induced nuclear PKM2, whereas driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references to the manuscript.

      - Western blot results and statistics for PKM2 should be quantitatively given for all groups.

      Following the reviewer’s suggestions, we have added the quantification of PKM2 for the western blots in Figure 2 and Figure 3. Quantification of PKM2 in Figure 2D is shown in Fig. S3. Quantification of PKM2 in Figure 3D is shown in Fig. S4B and Fig. S4C.

      - Figure 3A-B: staining method/details are not mentioned in materials and methods.

      The staining methods are described in the paragraph entitled “Immunofluorescence” in the materials and methods section. The description is as follows:

      For cell immunochemistry, cells cultured on glass coverslips were fixed with 4% PFA for 10 min at RT, followed by permeabilization with 0.3% Triton X-100. Non-specific binding was blocked with buffer containing 3% BSA for 30 min at RT. Briefly, samples were then incubated with primary antibodies and secondary antibodies. DAPI was used to stain the nuclei. Tissues and cells were observed and images were acquired using an EVOS FL Auto 2 Cell image system (Invitrogen). The fluorescence intensity was measured by ImageJ.

      - In Figure 3A, in only DASA-58 treated cells, it looks like GFAP staining is decreased. It would be better to include MFI analysis for GFAP in the supplementary information.

      We have added the MFI analysis for GFAP in Figure 3A as Fig. S4A. GFAP expression is decreased after DASA-58 treatment (in both the control and MOGsup conditions). A possible reason is that DASA-58 inhibits PKM2 nuclear transport, which in turn suppresses astrocyte activation and leads to decreased GFAP expression.

      Result 4

      - Detailed explanation of the mass spectrometry and IP experiments should be given in materials and methods. What are the conditions of the cells? Which groups were analyzed? Are they only MOG stimulated, MOG-DASA-58 treated, or only primary astrocytes without any treatment? The results should be interpreted according to the experimental group that has been analyzed.

      We have added detailed information on mass spectrometry and immunoprecipitation to the materials and methods. Two groups of cells were subjected to mass spectrometry analysis: primary astrocytes without any treatment and MOGsup-stimulated primary astrocytes. Both groups were immunoprecipitated with anti-PKM2 antibody. Moreover, we have revised the sentence in the manuscript describing the mass spectrometry. The description is as follows: “To illustrate underlying mechanism accounting for nuclear translocation of PKM2 in astrocytes, we sought to identify PKM2-interacting proteins. Here, unstimulated and MOGsup-stimulated primary astrocytes were subjected to PKM2 immunoprecipitation, followed by mass spectrometry”. Furthermore, a description of these two groups of cells was added to the figure legend of Fig. 4.

      Result 5:

      - For the reader, it would be better to start this part by explaining the role of TRIM21 in cells by referring to the literature.

      We agreed with the reviewer that beginning this part by explaining the role of TRIM21 would be better. Accordingly, we have added the following descriptions at the beginning of this part: “TRIM21 is a multifunctional E3 ubiquitin ligase that plays a crucial role in orchestrating diverse biological processes, including cell proliferation, antiviral responses, cell metabolism and inflammatory processes (Chen X. et al., 2022).” The relevant literature has been included: Chen X, Cao M, Wang P, Chu S, Li M, Hou P, et al. The emerging roles of TRIM21 in coordinating cancer metabolism, immunity and cancer treatment. Front Immunol 2022;13:968755.

      - The source and the state of the cells (control vs MOG induced) should be stated (Figure 5A).

      In Figures 5A to 5D, single-cell RNA-seq was performed on CNS tissues from naive mice and from EAE mice at different phases (peak and chronic). We have added this information to the legend of Figure 5.

      - Figure 5D can be placed after 5A. Data in Figure 5A is probably from naive animals, if so, it should be stated in the legend where A is explained. The group details of the data shown in Figure 5 should be clearly stated.

      Following the reviewer’s suggestions, we have placed 5D after 5A. Single-cell RNA-seq analysis was performed on CNS tissues from naive mice and EAE mice. This information is stated in the legend of Figure 5A-D: “Single-cell RNA-seq profiles from naive and EAE mice (peak and chronic phase) CNS tissues. Naive (n=2); peak (dpi 14–24, n=3); chronic (dpi 21–26, n=2).”

      - Immunofluorescence images should be replaced with better quality images, in control images, stainings are not visible.

      We have replaced the images in Figure 5H with better-quality images; the staining in the control images is now visible.

      Result 6:

      - Experimental procedures should be given in detail in materials and methods.

      We have revised the section of materials and methods, and more details are added. Detailed information was added for astrocyte isolation, immunoprecipitation. Moreover, mass spectrometry, Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining, Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes were added in materials and methods.

      Result 7:

      - In Figure 7A, the mean clinical score seems significantly reduced in the shTRIM21-treated group, although the result text states that it is not significant. Could you explain the discrepancy between Figure 7A and the accompanying text?

      Thank you for pointing this out, and we apologize for the oversight. We have corrected the manuscript accordingly. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      - The staining methods for Luxol fast blue and HE are not given in materials and methods.

      According to the reviewer’s comments, we have added the staining methods for HE and LFB in materials and methods.

      - In Figure 7E, authors claim that MBP staining is low in an image, however the image covers approximately 500 μm area. One would like to see the demyelinated areas in dashed lines, and also the whole area of the spinal cord sections.

      In Author response image 2, we have added the images for MBP staining of the whole area of spinal cord sections. Demyelinated areas are marked with dashed lines.

      - "TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization." should be supported by references.

      We have added two references for this sentence. Anastasiou D et al. showed that TEPP-46 acts as an activator by stabilizing subunit interactions and promoting tetramer formation of PKM2. Angiari S et al. showed that TEPP-46 prevented the nuclear transport of PKM2 by promoting its tetramerization in T cells.

      These two references are added:

      Angiari S, Runtsch MC, Sutton CE, Palsson-McDermott EM, Kelly B, Rana N, et al. Pharmacological Activation of Pyruvate Kinase M2 Inhibits CD4(+) T Cell Pathogenicity and Suppresses Autoimmunity. Cell metabolism 2020;31(2):391-405.e8.

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      - Could you explain what the prevention stage is?

      The term “prevention stage” was used to describe the administration of TEPP-46 before disease onset. To be more accurate, we have revised the phrase from “prevention stage” to “preventive treatment” as described in other references. For example, Ferrara et al. (Ferrara et al., 2020) used “preventive” and “preventive treatment” to mean administration before disease onset.

      The revised sentences are as follows: “To test the effect of TEPP-46 on the development of EAE, the “preventive treatment” (i.e., administration before disease onset) was administered. Intraperitoneal treatment with TEPP-46 at a dosage of 50 mg/kg every other day from day 0 to day 8 post-immunization with MOG35-55 resulted in decreased disease severity (Fig. S8A).”

      - In in vitro experiments, authors used DASA-58, and in vivo they used TEPP-46. What might be the reason that DASA-58 is not applied in vivo?

      The effects of DASA-58 and TEPP-46 in promoting PKM2 tetramerization have been tested in vitro and are well documented. Based on in vitro absorption, distribution, metabolism and excretion profiling studies, Anastasiou et al. predicted that TEPP-46 had better in vivo drug exposure compared to DASA-58. Moreover, TEPP-46, but not DASA-58, is pharmacokinetically validated in vivo (Anastasiou et al., 2012). Thus, we used TEPP-46 for in vivo studies.

      - Authors claim that TEPP-46 activates PKM2 and leads to its nuclear translocation, however, they did not verify PKM2 expression in the nucleus.

      To support that TEPP-46 inhibits PKM2 nuclear translocation both in vivo and in vitro, we performed western blotting analysis and immunofluorescence staining. In vitro, TEPP-46 administration inhibited the MOGsup-induced nuclear translocation of PKM2, an effect similar to that of DASA-58 (Author response image 4). The in vivo effects of TEPP-46 were analyzed by co-immunostaining of PKM2 and GFAP. The results showed reduced nuclear staining of PKM2 in spinal cord astrocytes in TEPP-46-treated EAE mice compared with control EAE mice (Figure S7B).

      Author response image 4.

      TEPP-46 inhibited the nuclear transport of PKM2 in primary astrocytes. Nuclear-cytoplasmic protein extraction analysis showed the nuclear and cytoplasmic changes of PKM2 in TEPP-46 treated astrocytes and MOGsup-stimulated astrocytes. Primary astrocytes were pretreated with 50 μM TEPP-46 for 30 min and stimulated with MOGsup for 24 h.

      Supplementary Figure 3:

      - In Figure 3D, merge should be stated on top of the merged images, it is confusing to the reader.

      According to the reviewer’s comments, we have added merge on top of the merged images.

      Discussion:

      All results should be discussed in detail by interpreting them according to the literature.

      We have further discussed the results in the discussion section. First, we added a paragraph describing the role of nuclear translocation of PKM2 in diverse CNS diseases. Second, we added a paragraph discussing the nuclear function of PKM2 as a protein kinase or transcriptional co-activator. The discussion section is now more comprehensive and interprets nearly all of the results in detail in light of the literature.

      Reviewer #2 (Recommendations For The Authors):

      The authors could address the following points:

      (1) In Figure 1A, the authors present immunofluorescence staining of PKM2 in both control mice and MOG35-55-induced EAE mice across different stages of disease progression: onset, peak, and chronic stages. Observing the representative images suggests a notable increase in PKM2 levels, particularly within the nucleus of MOG35-55-induced EAE mice. However, to provide a more comprehensive analysis, it would be beneficial for the authors to include statistical data, such as average intensities ± standard deviation (SD), along with the nuclear PKM2 ratio, akin to the presentation for cultured primary astrocytes in vitro in panels B-D. Additionally, the authors should clearly specify the number of technical repeats and the total number of animals utilized for these data sets to ensure transparency and reproducibility of the findings.

      We thank the reviewer for the suggestion. Accordingly, for Figure 1A, we have added the nuclear PKM2 ratio in astrocytes in control mice and at different stages of EAE in Supplementary Figure S1A. The quantification of mean fluorescence intensity (MFI) for PKM2 was added in Figure S1B. In addition, we have added the number of animals used in each group to the figure legend.

      (2) The blue hue observed in the merged images of Figure 1B (lower panel) presents a challenge for interpretation. The source of this coloration remains unclear from the provided information. Did the authors also include a co-stain for the nucleus in their imaging? To enhance clarity, especially for individuals with color vision deficiency, the authors might consider utilizing different color combinations, such as presenting PKM2 in green and GFAP in magenta, which would aid in distinguishing the two components. Furthermore, for in vitro cell analysis, incorporating a nuclear stain could provide valuable insights into estimating the cytosolic-to-nuclear ratio of PKM2.

      For the question relating to the merged images in figure 1B, PKM2 was presented in green, GFAP was presented in red and blue represents the nuclear staining by DAPI. “Merge” represents the merged images of these three colors. To enhance the clarity, we have added the images for the nuclear staining of DAPI.

      (3) To substantiate the conclusion of the authors regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes, employing supplementary methodologies such as high-resolution respirometry and metabolomics could offer valuable insights. These techniques would provide a more comprehensive understanding of metabolic alterations and further validate the observed changes in glycolytic activity.

      While we recognize the merits of techniques such as high-resolution respirometry and metabolomics, we believe that the conclusions regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes are sufficiently supported by the current experimental evidence. Our study has relied on a robust set of experiments, including lactate production, glucose consumption, cyto-nuclear localization analysis and western blotting analysis of key enzymes in glycolysis. These results, in conjunction with the literature on the role of PKM2 in various cancer cells, keratinocytes and immune cells, provide a strong foundation for our conclusions. Although metabolomics could offer a global view of the changes in metabolic state in astrocytes, the end product of aerobic glycolysis is lactate, so our analysis of lactate levels under different experimental conditions may be a more direct readout. However, we fully acknowledge that future studies employing these advanced methodologies could provide further insights into the precise mechanisms underlying PKM2's effects on aerobic glycolysis.

      (4) Minor: Why is the style of the columns in Fig 2 panel D different from that in panels B, C, and G of Figure 2?

      To maintain consistency in the column style across Figure 2, we have updated the columns in Figure 2D. We now use the same column style in Fig 2B, C, D and G.

      (5) The effect of stimulating astrocytes with MOGsup on cell proliferation, as shown in Figure 2E, is very moderate. Does DASA-58 reduce the proliferation of control cells in this assay?

      In response to the reviewer’s questions, we conducted a CCK8 analysis in astrocytes subjected to DASA-58 treatment. As depicted in Author response image 5, administration of DASA-58 did not reduce the proliferation of control cells. This result aligns with our other findings in the glycolysis assays and EdU analysis, where there is no statistical difference between control group and DASA-58-treated group. One plausible explanation for this is that in their steady state, astrocytes in the control group are not in a hyperproliferative state. Under such conditions, inhibiting the translocation of PKM2 via DASA-58 or other inhibitors did not significantly affect the proliferation of astrocytes.

      Author response image 5.

      CCK8 analysis of astrocyte proliferation. Primary astrocytes were pretreated with 50 μM DASA-58 for 30 min before stimulation with MOGsup. Data are represented as mean ± SEM. ***P<0.001. SEM, standard error of the mean.
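      For readers less familiar with the summary statistic named in this legend, the following minimal sketch shows how a mean ± SEM is obtained from replicate readings. The function name `mean_sem` and the absorbance values are hypothetical and are not data from the study.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return mean, sd / math.sqrt(n)


# Hypothetical absorbance readings for one group (illustration only).
control = [0.50, 0.52, 0.48]
mean, sem = mean_sem(control)
print(f"{mean:.3f} ± {sem:.3f}")
```

      Because the SEM shrinks with the square root of the number of replicates, it is always smaller than the SD of the same data, which is why legends should state which of the two is plotted.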

      (6) The tables and lists in Figure 4, panels A-D, are notably small, hindering readability and comprehension. Consider relocating these components to the supplementary materials as larger versions.

      We have updated the tables and lists and made the lines thicker. As suggested by the reviewer, we have relocated these components to Supplementary Figure S5.

      Reviewer #3 (Recommendations For The Authors):

      Higher magnification images that more clearly show nuclear translocation of PKM2 and pp65 and pSTAT3 immunoreactivity should be added to the figure panels, for example as insets.

      Thank you for pointing out this issue in the manuscript. According to the reviewer’s comments, we have included higher magnification images as insets for Figure 3A, Figure 3B and Figure 2A. These enlarged images now provide a clearer visualization of the nuclear translocation state of PKM2, pp65, and pSTAT3.

      There are occasional wording errors, like features => feathers at line 364.

      We are very sorry for our incorrect writing. We have corrected this spelling mistake in the manuscript.

      Reviewer #4 (Recommendations For The Authors):

      Here below are major and minor concerns on the data presented:

      (1) It is not clear from the Methods section what are the culture conditions defined as 'control' in Figure 1B-D. I believe the control should be culturing with the conditioned medium of normal (non-EAE) mice splenocytes to be sure the effect is not from cytokines naturally secreted by these cells.

      Thanks for the reviewer’s comments; we fully understand the reviewer's concern. The control refers to non-treated primary astrocytes cultured in standard DMEM medium supplemented with 10% FBS. In fact, we have performed experiments to exclude the possibility that the observed effect of MOGsup on the activation of astrocytes is due to cytokines secreted by splenocytes. Splenocytes from normal (non-EAE) mice were isolated and cultured in RPMI-1640 medium containing 10% FBS for 60 hours, and the supernatant was collected. Immunofluorescence staining of PKM2 and GFAP was performed in non-treated primary astrocytes and in astrocytes stimulated with supernatant from control splenocytes. As shown in Figure S1C, no difference in PKM2 expression or localization was observed between the two groups; PKM2 was located mainly in the cytoplasm under these conditions. These results indicate that the observed effect on PKM2 in the MOGsup-stimulated condition is not due to cytokines secreted by splenocytes. Thus, we used non-treated primary astrocytes as controls in our study. To clarify the control group, we have revised the description in the figure legend. The revised expression is as follows: “Immunofluorescence staining of PKM2 (green) with GFAP (red) in non-treated primary astrocytes (control) or primary astrocytes cultured with splenocytes supernatants of MOG35–55-induced EAE mice (MOGsup) for different time points (6 h, 12 h and 24 h).”

      (2) Figure 3D: the presence of PKM2 in the nuclear fraction upon MOGSUP together with DASA-58 (last lane of Figure 3D) does not support the proposed hypothesis and may further indicate that the observed reduction of pSTAT3, pp65, etc. is independent of PKM2 nuclear translocation/astrocyte activation, since it is observed even in the absence of MOGSUP.

      Thank you for pointing out this problem in the manuscript. The representative image of the nuclear level of PKM2 in Figure 3D was not clear, which raised the reviewer's doubts. To strengthen our conclusion that the reduction of the STAT3 and p65 pathways is related to the decrease in nuclear PKM2 induced by DASA-58, the nuclear PKM2 level was quantified and added in Figure S4B. The quantification shows that DASA-58 administration decreased the nuclear level of PKM2 in MOGsup-stimulated astrocytes. To address this concern, we have updated the immunoblot image for PKM2 in Figure 3D and incorporated the quantification results in Supplementary Figure S4.

      (3) The molecular docking prediction and the deletion co-immunoprecipitation data reported in Figure 4 are not concordant for TRIM21: the N-terminal Phe23 and Thr87 (Figure 4E) predicted by MD to bind PKM2 are not in the PRY-SPRY domain suggested by the co-IP experiment (Figure 4I).

      The discrepancy between the molecular docking prediction and the co-immunoprecipitation can be explained as follows:

      Firstly, molecular docking is a computational method that predicts protein-protein interactions based on the 3D structures of the proteins. However, the accuracy of this prediction can be influenced by the different 3D structural models of TRIM21 and PKM2, as well as by factors such as post-translational modifications and the flexibility of the proteins. Proteins in vivo are subject to post-translational modifications that can affect their interactions. These modifications are not fully captured in molecular docking analysis. For example, in our analysis, the predicted N-terminal Phe23 and Thr87 in TRIM21 hold the potential to interact with PKM2 by hydrogen bonds. However, such binding can be influenced by diverse biological environments, such as different cells and pathological conditions. Molecular docking prediction may suggest the specific residues and binding pocket within the protein complex; however, the accuracy should be verified by experimental techniques such as immunoprecipitation. To address the prediction results of molecular docking, the description has been revised as follows: “TRIM21 is predicted to bound to PKM2 via hydrogen bonds between the amino acids of the two molecules.”

      Co-immunoprecipitation using truncated domains of TRIM21 and PKM2 is an experimental technique that relies on the specific interaction between an antibody and its target proteins. This technique can provide insights into the precise binding domains between TRIM21 and PKM2. As demonstrated in our study, the PRY-SPRY domain of TRIM21 is involved in this binding. In summary, while molecular docking and Co-IP are valuable tools for studying protein-protein interactions, their differing focus and limitations may result in discrepancies between the predicted interaction sites and the experimentally identified interaction domains.

      (4) The Authors state that PKM2 is a substrate of TRIM21 E3 ligase activity; however, this is not proved: i) interaction does not imply a ligase-substrate relationship; ii) the ubiquitination shown in Figure 6C is not performed in denaturing conditions, thus the K63-Ub antibody can also detect interacting FLAG-IPed proteins (besides, only a single strong band is seen, not a chain; molecular weights in immunoblot should be indicated); iii) use of a catalytically inactive TRIM21 would be required as well.

      We appreciate the reviewer’s comments regarding the limitations of the immunoprecipitation and K63-antibody test, which could not lead to the conclusion that PKM2 is a substrate of TRIM21. To avoid any misunderstandings, we have revised the relevant sentence from “Hereby, we recognized PKM2 as a substrate of TRIM21” to “Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21”. Moreover, we have revised the title of the relevant part in the results section, the previous title, “TRIM21 ubiquitylates and promotes the nuclear translocation of PKM2” has been replaced with “TRIM21 promotes ubiquitylation and the nuclear translocation of PKM2”. Moreover, molecular weights for all proteins in western blotting were indicated.

      (5) As above, molecular weights should always be indicated in immunoblot.

      Thanks for pointing out this problem in the figures. Accordingly, we have added the molecular weights for every protein tested in immunoblot.

      (6) The authors should describe the EAE mouse model, and the basic principle of MOG35-55 stimulation, in the text and in the materials and methods, as the model may not be well known to the entire readership and this is needed to understand the rationale of the experimental plan.

      We appreciate the reviewer’s comments highlighting the importance of clarifying EAE model for a broader understanding of the reader audience. In response, we have described the EAE model both in the text and in the materials and methods section. In the text, the description of EAE model was added at the beginning of the first paragraph in the Results section. The description is as follows: “EAE is widely used as a mouse model of multiple sclerosis, which is typically induced by active immunization with different myelin-derived antigens along with adjuvants such as pertussis toxin (PTX). One widely used antigen is the myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide (Nitsch et al., 2021), which was adopted in our current studies.”

      We have also added the detailed experimental procedures for EAE induction in the materials and methods section.

      (7) The authors should better explain and give the rationale for the use of splenocytes and why directly activated astrocytes (isolated from the EAE model) cannot be employed to confirm/prove some of the presented data.

      Firstly, splenocytes offer a heterogenous cell population, encompassing T cells and antigen presenting cells (APC), which may better mimic the microenvironment and complex immune responses observed in vivo.

      Myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide is a widely used antigen for EAE induction. MOG35-55 elicits strong T-cell responses and is highly encephalitogenic. Moreover, MOG35-55 induces a T cell-mediated phenotype of multiple sclerosis in animal models. Thus, by isolating splenocytes, which contain APC and effector T cells, from EAE mice at the onset stage, followed by stimulation with the antigen MOG35-55 in vitro for 60 hours, the T-cell response in the acute stage of EAE disease could be mimicked in vitro. The supernatant from MOG35-55-stimulated splenocytes has high levels of IFN-γ and IL-17A, which in part mimic the pathological process and environment in EAE, and this technique has been documented in the references (Chen et al., 2009, Kozela et al., 2015).

      Correspondingly, we have revised the sentence on the use of MOG35-55-stimulated splenocytes from EAE mice and added the relevant references: “Supernatant of MOG35-55-stimulated splenocytes isolated from EAE mice were previously shown to elicit a T-cell response in the acute stage of EAE and are frequently used as an in vitro autoimmune model to investigate MS and EAE pathophysiology (Chen et al., 2009, Du et al., 2019, Kozela et al., 2015).”

      Secondly, activated astrocytes (isolated from the EAE model) cannot be employed for in vitro culture for the following reasons:

      (1) Low cell viability. Compared to embryonic or neonatal mice, adult mice yield a limited number of viable cells. This is mainly because adult tissues possess less proliferative capacity.

      (2) Disease changes. Astrocytes in EAE mice are exposed to a microenvironment that includes inflammatory cytokines, antigens and other pathological factors. Without this environment, the function and morphology of astrocytes change, which makes the in vitro results difficult to interpret.

      For these reasons, the primary astrocytes cultured in vitro were derived from neonatal mice.

      (8) The authors should indicate the phosphorylation sites they are referring to when analysing p-c-myc, pSTAT3, pp65, etc...

      According to the reviewer’s suggestions, we have added the phosphorylation sites for pSTAT3 (Y705), pp65 (S536), p-c-myc (S62) and pIKK (S176+S180) in the figure panels.

      (9) Reference of DASA-58 and TEPP-46 inhibitors and their specificity should be given.

      According to the reviewer’s comments, we have added the relevant references for the use of DASA-58 and TEPP-46 as inhibitors of PKM2 nuclear transport. In primary BMDMs, LPS induced nuclear PKM2. However, driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1 induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references in the manuscript.

      To address the selectivity of TEPP-46 and add the references, the relevant sentence has been revised from “TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization” to “TEPP-46 is a selective allosteric activator for PKM2, showing little or no effect on other pyruvate isoforms. It promotes the tetramerization of PKM2, thereby diminishing its nuclear translocation (Anastasiou et al., 2012, Angiari et al., 2020).”

      Reviewing Editor (Recommendations For The Authors):

      The reviewing editor would appreciate it if the original blots from the western blot analysis, which were used to generate the final figures, could be provided.

      Thanks for the reviewing editor’s comment. Accordingly, we will add the original blots for the western blot analysis.

      References

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      Escartin C, Guillemaud O, Carrillo-de Sauvage M-A. Questions and (some) answers on reactive astrocytes. Glia 2019;67(12):2221-47.

      Ferrara G, Benzi A, Sturla L, Marubbi D, Frumento D, Spinelli S, et al. Sirt6 inhibition delays the onset of experimental autoimmune encephalomyelitis by reducing dendritic cell migration. Journal of neuroinflammation 2020;17(1):228.

      Lin CC, Edelson BT. New Insights into the Role of IL-1β in Experimental Autoimmune Encephalomyelitis and Multiple Sclerosis. Journal of immunology (Baltimore, Md : 1950) 2017;198(12):4553-60.

      Palsson-McDermott Eva M, Curtis Anne M, Goel G, Lauterbach Mario AR, Sheedy Frederick J, Gleeson Laura E, et al. Pyruvate Kinase M2 Regulates Hif-1α Activity and IL-1β Induction and Is a Critical Determinant of the Warburg Effect in LPS-Activated Macrophages. Cell metabolism 2015;21(1):65-80.

      Rao J, Wang H, Ni M, Wang Z, Wang Z, Wei S, et al. FSTL1 promotes liver fibrosis by reprogramming macrophage function through modulating the intracellular function of PKM2. Gut 2022;71(12):2539-50.

      Wheeler MA, Clark IC, Tjon EC, Li Z, Zandee SEJ, Couturier CP, et al. MAFG-driven astrocytes promote CNS inflammation. Nature 2020;578(7796):593-9.

      Zhang J, Feng G, Bao G, Xu G, Sun Y, Li W, et al. Nuclear translocation of PKM2 modulates astrocyte proliferation via p27 and β-catenin pathway after spinal cord injury. Cell Cycle 2015;14(16):2609-18.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Aging is associated with a number of physiologic changes including perturbed circadian rhythms. However, mechanisms by which rhythms are altered remain unknown. Here authors tested the hypothesis that age-dependent factors in the sera affect the core clock or outputs of the core clock in cultured fibroblasts. They find that both sera from young and old donors are equally potent at driving robust ~24h oscillations in gene expression, and report the surprising finding that the cyclic transcriptome after stimulation by young or old sera differs markedly. In particular, genes involved in the cell cycle and transcription/translation remain rhythmic in both conditions, while genes associated with oxidative phosphorylation and Alzheimer's Disease lose rhythmicity in the aged condition. Also, the expression of cycling genes associated with cholesterol biosynthesis increases in the cells entrained with old serum. Together, the findings suggest that age-dependent blood-borne factors, yet to be identified, affect circadian rhythms in the periphery. The most interesting aspect of the paper is that the data suggest that the same system (BJ-5TA), may significantly change its rhythmic transcriptome depending on how the cells are synchronized. While there is a succinct discussion point on this, it should be expanded and described whether there are parallels with previous works, as well as what would be possible mechanisms for such an effect.

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Major points: 

      Fig 1 and Table S1. Serum composition and levels of relevant blood-borne factors probably change as a function of time of day. At what time of the day were the serum samples from the old and young groups collected? This important information should be provided in the text and added to Table S1.

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00¹ and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in the Introduction and Methods. Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      Fig 2A. Luminescence traces: the manuscript would greatly benefit from inclusion of raw luminescence traces.

      Raw luminescence traces have been added to Figure S3 (S3A).

      Fig 2. Of the many genes that change their rhythms after stimulation with young and old sera, what are the typical fold changes? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? 

      We’ve presented these data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      Fig. 2 Gene expression. Also here, the presentation would benefit from showing a few key examples for different types of responses. 

      Sample traces of genes that gain rhythmicity, lose rhythmicity, phase shift, and change MESOR are now illustrated in Figure S6.

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? This could easily be assessed using published datasets. 

      The original rationale to use BJ-5TA fibroblast cells was that we were aiming to build upon an observation found in a previous study² which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a carcinoma-derived cell line. We’ve added this point in lines 98-101.

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak to trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2, PER3, and are about 25% lower for CRY1 and somewhat higher for TEF (about 15%) in our data. We find that the MESORs are generally similar with the exception of NR1D1, which is much lower, and NR1D2, which is much higher, in our data.

      Author response image 1.

      BJ-5TA and U2OS Cells Exhibit Similar Profiles of Circadian Gene Transcription. We compared the transcriptomic profiles of the BJ-5TA cells in young and old serum (left) to the U2OS transcriptomic data (right) available on CircaDB, a database containing profiles of several circadian reference genes in U2OS cells. This figure suggests that circadian profiles of these genes exhibit many similarities. We find that the peak to trough ratios (amplitudes) are similar for ARNTL, NR1D1, NR1D2, PER2, PER3, and that the MESORs are similar (with the exception of NR1D1, which is much lower, and NR1D2, which is much higher, in the BJ-5TA cells). We find that the amplitude of CRY1 is ~25% lower and that of TEF is ~15% higher for the BJ-5TA cells. The axes of the plots on the left show counts divided by 3.5 to make the MESORs of ARNTL similar and ease comparison.
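      The quantities compared in this legend (MESOR, amplitude, peak to trough ratio) can be made concrete with a single-component cosinor fit. The sketch below is an illustration only: the response does not say which fitting procedure was used, and the function names `solve3` and `cosinor_fit`, the fixed 24 h period, the ratio definition (MESOR + amplitude) / (MESOR - amplitude), and the synthetic time series are all assumptions.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct time points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]


def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y = M + a*cos(wt) + b*sin(wt).

    Returns (MESOR, amplitude, peak time in hours).
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in t_hours]
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    mesor, a, b = solve3(XtX, Xty)
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / w) % period
    return mesor, amplitude, peak_time


# Synthetic series sampled every 4 h for 48 h: MESOR 100, amplitude 30, peak at 6 h.
times = list(range(0, 48, 4))
values = [100 + 30 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]

mesor, amplitude, peak_time = cosinor_fit(times, values)
peak_to_trough = (mesor + amplitude) / (mesor - amplitude)
print(f"MESOR={mesor:.1f} amplitude={amplitude:.1f} peak={peak_time:.1f} h "
      f"ratio={peak_to_trough:.2f}")
```

      Note that rescaling the counts by a constant, as done for the left-hand plots, changes the MESOR and the amplitude by the same factor but leaves the peak to trough ratio unchanged, which is why the ratio is a convenient basis for comparing data sets with different normalization.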

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors³, we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference, as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue⁴.

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      We are indeed suggesting this, although it is also possible that it is not cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity⁵,⁶ further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction⁷ has been seen in aged fibroblasts⁸ and contributes to age-related diseases⁹. We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? could it be related to altered chronotypes in the elderly? 

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, delays the transcription of clock-controlled genes and negative arm genes. Since the RORs do not seem to be affected, BMAL1 is transcribed and translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to be transcribed, causing the phase delay.

      Reviewer #2 (Public Review): 

      Schwarz et al. have presented a study aiming to investigate whether circulating factors in the sera of subjects are able to synchronize the circadian rhythms of fibroblasts in an age-dependent manner. The authors used human serum taken from either old (age 70-76) or young (age 25-30) individuals to synchronise cultured fibroblasts containing a clock gene promoter driven luciferase reporter, followed by RNA sequencing to investigate whole gene expression.

      This study has the potential to be very interesting, as evidence of circulating factors in sera that mediate peripheral rhythms has long been sought after. Moreover, it raises the possibility that those factors are affected by age, which could contribute to the weakened circadian rhythmicity observed with aging.

      Here, the authors concluded that both old and young sera are equally competent at driving robust 24 hour oscillations, in particular for clock genes, although the cycling behaviour and nature of different genes is altered between the two groups, which is attributed to the age of the individuals. This conclusion could however be influenced by individual variabilities within and between the two age groups. The groups are relatively small, with only four individuals (two females and two males) per group. In addition, factors such as food intake and exercise prior to the blood draw, and/or chronotype, known to affect systemic signals, are not taken into consideration. As seen in Figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the true data. More focus should be given to investigating the effects of serum from each individual and observing common patterns. Additionally, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text.

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion.  

      The authors also note in the introduction that rhythms in different peripheral tissues vary in different ways with age; however, the entire study is performed only on fibroblasts, classified as peripheral tissue by the authors. It would be very interesting to investigate whether the changes observed in fibroblasts extend to other cell lines from diverse organ origins. This could provide information about whether circulating circadian synchronising factors exert their function systemically or on specific tissues. At the very least, this hypothesis should be addressed within the discussion.

      It is likely that factors circulating in serum act on several tissues, and so their effects are relatively broad. However, this would require extensive investigation of other tissues. We now discuss this in the manuscript.

      In addition to the limitations indicated above, I consider that the data of the study are insufficiently analysed beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive, and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways.

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168. 

      Recommendations for the authors:

      The two reviewers and reviewing editor have agreed on the following recommendations for the authors: 

      Major: 

      (1) The bioinformatic analysis would benefit from a more thorough focus on variability between individuals. Specifically, the main conclusion of the manuscript could be significantly influenced by individual variabilities within and between the two age groups. This is of particular concern, as the groups are relatively small (four individuals, two females and two males, per group). In addition, the consideration of factors such as food intake and exercise prior to the blood draw, and/or chronotype, known to affect systemic signals, should be more adequately explained. The lab is an experienced chronobiology lab, and thus we are confident that these factors had been thought of, but this needs to be made clearer.

      As seen in Figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the relevant data. Furthermore, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text.

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion. 

      (2) The study would benefit from a more thorough analysis of the data beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways. This would provide additional value to the study, especially given the otherwise apparent lack of any mechanistic explanation. 

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168.

      (3) Some questions were raised about the amplitude of the core circadian clock gene rhythms, which would be much higher in other human cell types. A comment on this matter and the provision of the raw luminescence traces for Fig 2A would be greatly beneficial.

      Addressing the same topic: what are the typical fold changes of the many genes that change their rhythms after stimulation with young and old sera? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? The presentation of the manuscript would further benefit from showing a few key examples for different types of responses. 

      The average luminescence trace for each individual serum sample from Fig 2A has been added to Fig S3A.

      We’ve presented the fold change data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      (4) There are several points that we recommend to consider to add to the discussion: 

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? It should be relatively easy to address this point by assessing published datasets. 

      The original rationale for using BJ-5TA fibroblast cells was that we were aiming to build upon an observation from a previous study (2), which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a carcinoma-derived cell line. We’ve added this point in lines 98-101. 

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak-to-trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2, and PER3, are about 25% lower for CRY1, and are somewhat higher for TEF (about 15%) in our data. We find that the MESORs are generally similar, with the exception of NR1D1, which is much lower, and NR1D2, which is much higher in our data.
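
      As an illustration of the two quantities compared above, the following minimal sketch fits a fixed-period cosinor model, y = MESOR + a·cos(ωt) + b·sin(ωt), and derives a peak-to-trough ratio from it. It is not the analysis pipeline used in the study: the function names, the 24 h period, the 4 h sampling interval, and the synthetic values are all assumptions made for the example, and the projection shortcut is only valid for evenly spaced samples covering a whole number of periods.

```python
import math


def cosinor_fixed_period(times_h, values, period_h=24.0):
    """Estimate MESOR, amplitude, and acrophase for a fixed period.

    Assumption: samples are evenly spaced and span a whole number of
    periods, so the constant, cosine, and sine regressors are orthogonal
    and the least-squares fit reduces to simple projections.
    """
    n = len(values)
    w = 2.0 * math.pi / period_h
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    b = 2.0 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    acrophase_h = (math.atan2(b, a) / w) % period_h  # time of peak, in hours
    return mesor, amplitude, acrophase_h


def peak_to_trough(mesor, amplitude):
    """Ratio of fitted peak to fitted trough on a linear expression scale."""
    if mesor <= amplitude:
        raise ValueError("fitted trough is not positive; ratio is undefined")
    return (mesor + amplitude) / (mesor - amplitude)


# Synthetic illustration only (not data from the study):
# sampled every 4 h for 48 h, MESOR 10, amplitude 3, peak at 6 h.
times = [4.0 * k for k in range(12)]
values = [10.0 + 3.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]
mesor, amp, phase = cosinor_fixed_period(times, values)
ratio = peak_to_trough(mesor, amp)
```

      With unevenly spaced or incomplete time series, a general least-squares solver would be needed instead of the projection shortcut.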

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors (3), we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference, as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue.

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      It may not be the cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity (5, 6), further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction (7) is seen also in aged fibroblasts (8) and contributes to age-related diseases (9). We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? Could it be related to altered chronotypes in the elderly? 

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, results in the delay of the transcription of clock-controlled genes/negative arm genes. Since the RORs do not seem to be affected, BMAL1 is transcribed/translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to transcribe, causing the phase delay.

      The discussion would also benefit from mentioning parallels and dissimilarities with previous works, as well as what would be possible mechanisms for such an effect. 

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Minor: 

      While time of serum collection is provided in the methods, it would be very useful to provide this information, along with the accompanying argumentation also at a more prominent position and to also add it to Table S1. 

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00 (1) and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in the Introduction and Methods sections. Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      L73 EKG: define the abbreviation 

      We rewrote this paragraph, but defined the term where it is used in the paper.  

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right? 

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature in our discussion.

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files.

      Both output and input files are included in this submission as additional files.  

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, two (one old, one young) had one technical replicate excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to broadly identify relevant pathways defined by different algorithms. From these data, we focused in particular on KEGG pathways.

      Reviewer #1 (Recommendations For The Authors): 

      These comments are in addition to those provided above: 

      Minor: 

      L73 EKG: define the abbreviation 

      We rewrote this paragraph, but defined the term where it is used in the paper.  

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right? 

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature.

      Fig.4 The fold change amplitude of the clock gene seems quite a bit lower than what is usually expected (for Nr1d1 it is usually 10 fold). The authors should provide an explanation and discuss this. 

      There are a variety of factors that contribute to the fold change amplitude of clock genes. First, the change in amplitude of clock genes is lower in vitro compared to in vivo samples. For example, in U2OS cell cultures the fold change in the cycling of Nr1d1 is only 2 fold and is not significantly different from the fold change we observe (as shown in the U2OS data from CircaDB plotted in Figure 1R). Second, the method of synchronization contributes to the strength of the rhythms. Serum synchronization is generally less effective at driving strong clock cycling than forskolin or dexamethasone although, as noted in the manuscript, it may promote the cycling of more genes. Lastly, rhythm amplitude is also dependent on the cell type in question so cell to cell variability also contributes to differences. However, overall, we do not find major differences in comparing the U2OS data and ours. Please note that the y-axis has a logarithmic scale.

      What is the authors' strategy to identify which serum components that are responsible for the reported changes? This should be discussed. 

      In the future, we intend to analyze the serum factors using a combination of fractionation and either proteomics or metabolomics to identify relevant factors. We have added this to the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the article is well-written but lacks some more rigorous data analysis as mentioned in the public review above. In addition to a more thorough analysis approach focusing much more heavily on individual variability, several other changes can be made to strengthen this study:

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files. 

      Both output and input files are included in this submission as additional files.

      Fig 1A. - Only n=5 participants were used for this analysis, explanation of the exclusion criteria for the other participants would be useful. 

      As Figure 1A is a schematic, we assume the reviewer is referring to Figure 1B. We’ve provided a flow chart of subject inclusion/exclusion in Figure S2.

      Fig 2. - For circadian transcriptome analysis only n=4 participants were used - what criteria were used to exclude individuals, and why were only these individuals used in the end? 

      As patient recruitment was interrupted by COVID, we selected samples where we had sufficient serum to effectively carry out the RNA seq experiment and control for age and sex.

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, two (one old, one young) had one technical replicate excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to identify relevant pathways. We do not present any STRING networks in the paper.

      Line 68 - "These novel findings suggest that it may be possible to treat impaired circadian physiology and the associated disease risks by targeting blood borne factors." This is a complete overstatement that cannot be sustained by the limited findings provided by the authors. 

      We’ve modified this statement to avoid overstating results.

      (1) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proceedings of the National Academy of Sciences 108, 7218–7223 (2011).

      (2) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proceedings of the National Academy of Sciences 108, 7218–7223 (2011).

      (3) Lee, Y. et al. G1/S cell cycle regulators mediate effects of circadian dysregulation on tumor growth and provide targets for timed anticancer treatment. PLOS Biology 17, e3000228 (2019).

      (4) Tomasetti, C. et al. Cell division rates decrease with age, providing a potential explanation for the age-dependent deceleration in cancer incidence. Proceedings of the National Academy of Sciences 116, 20482–20488 (2019).

      (5) Cela, O. et al. Clock genes-dependent acetylation of complex I sets rhythmic activity of mitochondrial OxPhos. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1863, 596–606 (2016).

      (6) Scrima, R. et al. Mitochondrial calcium drives clock gene-dependent activation of pyruvate dehydrogenase and of oxidative phosphorylation. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1867, 118815 (2020).

      (7) Lesnefsky, E. J. & Hoppel, C. L. Oxidative phosphorylation and aging. Ageing Research Reviews 5, 402–433 (2006).

      (8) Greco, M. et al. Marked aging-related decline in efficiency of oxidative phosphorylation in human skin fibroblasts. The FASEB Journal 17, 1706–1708 (2003).

      (9) Federico, A. et al. Mitochondria, oxidative stress and neurodegeneration. Journal of the Neurological Sciences 322, 254–262 (2012).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Author response:

      Reviewer #1:

      The main objective of this study is to achieve the development of a synthetic autotroph using adaptive laboratory evolution. To accomplish this, the authors conducted chemostat cultivation of engineered E. coli strains under xylose-limiting conditions and identified autotrophic growth and the causative mutations. Additionally, the mutational mechanisms underlying these causative mutations were also explored with drill down assays. Overall, the authors demonstrated that only a small number of genetic changes were sufficient (i.e., 3) to construct an autotrophic E. coli when additional heterologous genes were added. While natural autotrophic microorganisms typically exhibit low genetic tractability, numerous studies have focused on constructing synthetic autotrophs using platform microorganisms such as E. coli. Consequently, this research will be of interest to synthetic biologists and systems biologists working on the development of synthetic autotrophic microorganisms. The conclusions of this paper are mostly well supported by appropriate experimental methods and logical reasoning. However, further experimental validation of the mutational mechanisms involving rpoB and crp would enhance readers' understanding and provide clearer insights, despite acknowledgement that these genes impact a broad set of additional genes. Additionally, a similar study, 10.1371/journal.pgen.1001186, where pgi was deleted from the E. coli genome and evolved to reveal an rpoB mutation is relevant to this work and should be placed in the context of the presented findings.

      We thank the reviewer for pointing this study out. It is very interesting that a mutation in a similar region in RpoB was observed in a related context of Pgi loss of activity. We have added a reference to this study in our text (Page 11, line 21).

      The authors addressed rpoB and crp as one unit and performed validation. They cultivated the mutant strain and wild type in a minimal xylose medium with or without formate, comparing their growth and NADH levels. The authors argued that the increased NADH level in the mutant strain might facilitate autotrophic growth. Although these phenotypes appear to be closely related, their relationship cannot be definitively concluded based on the findings presented in this paper alone. Therefore, one recommendation is to investigate transcriptomic changes induced by the rpoB and crp mutations. Otherwise, conducting experimental verification to determine whether the NADH level directly causes autotrophic growth would provide further support for the authors' claim.

      We appreciate the valuable comment and agree that the work was lacking such an analysis. Due to various reasons we have opted to use a proteomic approach which we feel fulfills the same purpose as the transcriptomics suggestion. We found interesting evidence in up-regulation of the fdoGH operon (comprising the native formate dehydrogenase O enzyme complex) which could indicate why there is an increase in NADH/NAD+ levels. We also hypothesize that this upregulation might be important more generally by drawing comparisons to natural chemo-autotrophs.

      Further experimental work (which we were not able to include in the current study) could help validate this link by deleting fdoGH and observing a loss of phenotype and, on the flip side, directly overexpressing the fdoGH operon and observing an increase in the NADH/NAD+ ratio. Indeed, if this overexpression were to prove sufficient for achieving an autotrophic phenotype without the mutations in the global transcription regulators, it would be a much more transparent design.

      We have added a section titled "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes" to the Results based on this analysis.

      • It would be beneficial to provide a more detailed explanation of the genetic background before the evolution stage, specifically regarding the ∆pfk and ∆zwf mutations. Furthermore, it is suggested to include a figure that provides a comprehensive depiction of the reductive pentose phosphate pathway and the bypass pathway. These will help readers grasp the concept of the "metabolic scaffold" as proposed by the authors.

      We agree with the reviewer that this could be helpful and we added a reference to the original paper Gleizer et al. 2019 that reported this design and also includes the relevant figure. We feel that the figure should not be added to the current manuscript as we continue to show that this design is not relevant in the context of the three reported mutations and such a figure could distract the attention of the reader from the main takeaways of the current study.

      • Despite the essentiality of the rpoB mutation (A1245V) to the autotrophic phenotype in the final strain, the inclusion of this mutation in step C1 does not appear to be justified. According to line 37 on page 3, the authors chose to retain the unintended mutation in rpoB based on its essentiality to the phenotype observed in other evolved strains. However, it should be noted that the mutations found in the evolved strain I, II, and III (P552T or D866E) were entirely different from the unintended mutation (A1245V) during genetic engineering. This aspect should be revised to avoid confusion among readers.

      Thank you for pointing this issue out, we added a clarification in the text (page 4 line 7) to avoid such confusion. We believe this point is much clearer now.

      The rpoB mutation which was shown to be essential in the study is indeed known to be common in ALE experiments in E. coli. Thus, I searched the different rpoB mutations in ALEdb in E. coli and I was able to find a similar mutation in a study where pgi was knocked out and then evolved. https://doi.org/10.1371/journal.pgen.1001186 This study seems very relevant given that pgi was a key mutation in the compact set of this work and the section "Modulation of a metabolic branch-point activity increased the concentration of rPP metabolites" informs that loss of function mutations in pgi were also found. The findings of this study should thus be put in the context of the previous related ALE study. I would recommend a similar analysis of crp mutations from studies in ALEdb to see if there are similar mutations in this gene as well or if this is a unique mutation.

      We thank the reviewer for bringing this publication to our attention. We have addressed this observation in the main text (page 11, line 21). We agree that it could have some connection to the pgi mutation, yet we would not want to overspeculate about this role, as we also found the exact same mutation (A1245V) as an adaptation to higher temperature in another E. coli study (Tenaillon et al. 2012). We would like to bring forward the fact that the two reported rpoB mutations are always accompanied by another mutation with pleiotropic effects, either in the transcription factor Crp or in another RNA polymerase subunit (e.g., RpoC). As such, many epistatic effects could occur, one of which we also report here on page 13, line 18. In conclusion, although there could be a connection between the rpoB and pgi mutations, it could be a mere coincidence and the two mutations could exhibit two distinct roles in two distinct phenotypes.

      We would also like to thank the reviewer for suggesting a similar analysis for crp. We found another mutation at a nearby residue with strong adaptive effects and have mentioned it in our main text.

      Can the typical number of mutations found in a given ALE experiment be directly compared to those found in this study? It seems like a retrospective analysis of other ALE studies to show how many mutations typically occur in an ALE study and sets which were found to be causal to reproduce the phenotype of interest (through similar reverse engineering in the starting strain) should be presented. Again, the authors cite ALEdb which should provide direct numbers of mutations found in similar ALE studies with E. coli and one could then examine them to find sets of clearly causal mutations which recreate phenotypes of interest. Such an analysis would go a long way in supporting the main finding of "small number" of mutations.

      Discussion, page 12, line 42. "This could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts". There is an entire body of work around growth-coupled production which can be predicted and evolved with a genome-scale metabolic model and ALE. Thus, if this statement is going to be made, relevant studies should be cited and placed in context.

      The reviewer raises an important point which could indeed yield an interesting perspective. However, it would be difficult to perform this comparison in practice since many of the studies published on ALEdb have not isolated essential mutations from other mutation incidents nor have they determined the role of each mutation in the reported phenotypes. For example, many ALE trajectories include a hypermutator that greatly increases the number of irrelevant mutations and it is nearly impossible to sieve through them to find an essential set.

      Moreover, it is hard to compare the “level of difficulty” of achieving one phenotype over another. We therefore feel that, even though such an analysis would be insightful, it requires an amount of work which is outside the scope of this study.

      Finally, we would like to highlight our iterative approach: isolating the relevant consensus mutations and repeating this process until no further evolution is required. We are not aware of prior studies that used this approach.

      We have now clarified what we mean by "promising strategy" in the discussion in order to avoid any false claims about novelty (page 16 line 32): "Using metabolic growth-coupling as a temporary 'metabolic scaffold' that can be removed, could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts."

      Reviewer #2:

      Synthetic autotrophy of biotechnologically relevant microorganisms offers exciting chances for CO2 neutral or even CO2 negative production of goods. The authors' lab has recently published an engineered and evolved Escherichia coli strain that can grow on CO2 as its only carbon source. Lab evolution was necessary to achieve growth. Evolved strains displayed tens of mutations, of which likely not all are necessary for the desired phenotype.

      In the present paper the authors identify the mutations that are necessary and sufficient to enable autotrophic growth of engineered E. coli. Three mutations were identified, and their phenotypic role in enhancing growth via the introduced Calvin-Benson-Bassham cycle were characterized. It was demonstrated that these mutations allow autotrophic growth of E. coli with the introduced CBB cycle without any further metabolic intervention. Autotrophic growth is demonstrated by 13C labelling with 13C CO2, measured in proteinogenic amino acids. In Figures 2B and S1, the labeling data are shown, with an interval of the "predicted range under 13CO2".

      Here, the authors should describe how this interval was derived.

      The methodology is clearly described and appropriate.

      The present results will allow other labs to engineer E. coli and other microorganisms further to assimilate CO2 efficiently into biomass and metabolic products. The importance is evident in the opportunity to employ such strain in CO2 based biotech processes for the production of food and feed protein or chemicals, to reduce atmospheric CO2 levels and the consumption of fossil resources.

      Please describe in the methodology how the interval of the predicted range of 13C labeling was derived for Figures 2B and S1. Was it calculated by the dilution factor during 4 generations, or did you predict the label incorporation individually with a metabolic model?

      The text needs careful editing, some sentences are incomplete and there are frequent inconsistencies in writing metabolites and enzymes.

      P2L6: unclear sentence (incomplete?)

      P2L19: pastoris with lower case "p"

      P2L40: incomplete sentence

      P2L42: here, and at many other places, the writing of RuBisCO needs to be aligned. It is an abbreviation and should begin with a capital letter. Most commonly it is written as RuBisCO which I would suggest - please unify throughout the text.

      P3L3: formate dehydrogenase ... metabolites and enzymes with lower case letter. And, no hyphen here.

      P5L4: delete the : after unintentionally

      P6L16: carboxylation of RuBP (it is not CO2 that is carboxylated - if any, CO2 is carboxylating)

      P7L25: phosphoglucoisomerase (lower case)

      P8L5: in line

      P8L9: part of glycolysis/ ...

      P10L4: pentose phosphates (lower case, no hyphen).

      P10L4: all metabolites lower case

      P12L28: incomplete sentence

      P18L4: Escherichia coli in italics

      P18L15: Pseudomonas sp. in italics

      P18L16: ... promoter and with a strong ...

      P20, chapter Metabolomics: put the numbers of 12C and 13C in superscript

      P23L9: pentose phosphates; all metabolites in lower case (as above)

      P23: all 12C and 13C with superscript numbers.

      Response to reviewer #2:

      We thank the reviewer for their comments, and for pointing out the need to clarify how we derived the predicted range of 13C labeling. We edited the text accordingly, and added the relevant calculation to the methods section (under the “13C Isotopic labeling experiment”). We would like to also thank the reviewer for the required text improvements, which were implemented. 
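
      For orientation, the first option the reviewer mentions (dilution of the initial unlabeled biomass over a number of generations) can be sketched as below. This is an illustrative calculation only and is not necessarily the one added to the methods section. It assumes that all newly synthesized biomass carbon comes from the labeled CO2 pool and that there is no other carbon exchange; the function and parameter names are hypothetical.

```python
def expected_labeled_fraction(generations, tracer_purity=1.0, initial_labeled=0.0):
    """Expected 13C fraction of biomass carbon after a number of doublings.

    After n doublings, a fraction 0.5**n of the biomass carbon is carried
    over from the inoculum; the remainder was fixed during the experiment
    and carries the isotopic purity of the supplied CO2 (assumption).
    """
    carryover = 0.5 ** generations
    return initial_labeled * carryover + tracer_purity * (1.0 - carryover)


# Four generations, fully labeled CO2, unlabeled inoculum.
fraction_after_four = expected_labeled_fraction(4)  # 0.9375
```

      Any unlabeled carbon entering biomass by another route would lower this value, which is one reason a predicted range rather than a single number may be reported.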

      Reviewer #3:

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      We thank the reviewer for their comments and constructive feedback.

      We agree that there is probably a broader effect caused by the rpoB and crp mutations, besides the change in the NADH/NAD+ ratio. Hence, we performed a proteomics analysis, comparing the rpoB and crp mutations on a WT background to an autotrophic E. coli, searching for changes shared by both strains compared to their "ancestors". We found up-regulation of rPP cycle and formate-associated genes, and down-regulation of catabolic genes. We added a section dedicated to this matter under the title "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes".

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      We appreciate this recommendation and indeed tried to delete pgi, but the genetic manipulation caused a knockout of other genes along with pgi (pepE, rluF, yjbD, lysC) so in the time available to us we cannot confidently determine whether the deletion alone is sufficient and can replace the mutation.

      Regarding crp, we do not think there is a reason to believe the mutation is a loss-of-function. In any case, the proteomics-based characterization of the crp mutation is now included in the SI.

      1. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      Indeed, the fact that the proteins are interacting in the cell does not necessarily mean that the mutations are functionally connected. We therefore added the following as further justification in the new section:

      "As far as we know, the mutations in the Crp and RpoB genes affect the binding of the RNA polymerase complex to DNA and/or its transcription rates. Depending on the transcribed gene target, the effect of the two mutations might be additive, antagonistic, or synergistic. Since each one of these mutations individually (in combination with the pgi mutation) is not sufficient to achieve autotrophic growth, it is reasonable to assume that only the target genes whose levels of expression change significantly in the double-mutant are the ones relevant for the autotrophic phenotype".

      In our proteomics analysis we considered each mutation separately. We found that in some cases the two mutations together have an additive effect, but in other cases the two mutations together affect the proteome differently from each mutation alone. Since both mutations are essential to the phenotype, we decided to address the two mutations as one unit for the physiological and metabolic experiments.

      1. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      The mutation is located in “Activating Region 2”, which interacts with RNA polymerase. We tried an in vivo assay to determine the CRP H22N activity and got inconclusive results. We believe the proteomics analysis serves as a good method for understanding the global effect of the mutation.

      1. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      We added a supplementary figure regarding the structural location of the two mutations, where it is demonstrated that crp H22N is located in a region interacting with the RNA polymerase and rpoB A1245V is located in proximity to regions interacting with the DNA.

      1. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

      Indeed, we agree that an omics approach to infer the global effect of these mutations is beneficial. We opted to use a proteomics approach and think it serves the purpose of clarifying the final, downstream effect on the cell.

      1. Page 2, lines 40-45, the authors should more clearly explain that the deletion of pfkA, pfkB and zwf was part of the experimental evolution strategy in their earlier work (Gleizer et al., 2019), and not a new strategy in the current study.

      We thank you for pointing this out, and edited the text accordingly.

      1. Page 3, line 27. Why did the authors compare the newly acquired mutants to only two mutants from the earlier work, not all 6?

      The 6 clones that were isolated in Gleizer et al. had 2 distinct mutation profiles. During the isolation process the lineage split into two groups. Three out of the 6 clones (clones 1, 2, 6) came from the same ancestor, and the other three (clones 3, 4, 5) came from another ancestor. Hence, the clones within each group shared almost all of their mutations (see Venn diagram). For our comparison, we decided to use the representative with the highest number of mutations from each group (clones 5 and 6).

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rühling et al analyzes the mode of entry of S. aureus into mammalian cells in culture. The authors propose a novel mechanism of rapid entry that involves the release of calcium from lysosomes via NAADP-stimulated activation of TPC1, which in turn causes lysosomal exocytosis; exocytic release of lysosomal acid sphingomyelinase (ASM) is then envisaged to convert exofacial sphingomyelin to ceramide. These events not only induce the rapid entry of the bacteria into the host cells but are also described to alter the fate of the intracellular S. aureus, facilitating escape from the endocytic vacuole to the cytosol.

      Strengths:

      The proposed mechanism is novel and could have important biological consequences.

      Weaknesses:

      Unfortunately, the evidence provided is unconvincing and insufficient to document the multiple, complex steps suggested. In fact, there appear to be numerous internal inconsistencies that detract from the validity of the conclusions, which were reached mostly based on the use of pharmacological agents of imperfect specificity.

      We thank the reviewer for the detailed evaluation of our manuscript. We will address the criticism below.

      We agree with the reviewer that many of the experiments presented in our study rely on the usage of inhibitors. However, we want to emphasize that the main conclusion (invasion pathway affects the intracellular fate/phagosomal escape) was demonstrated without the use of inhibitors or genetic ablation in two key experiments (Figure 5D/E). These experiments were in line with the results we obtained with inhibitors (amitriptyline [Figure 4D], ARC39, PCK310 [Figure 4C] and Vacuolin-1 [Figure 4E]). Importantly, the hypothesis was also supported by another key experiment, in which we showed the intracellular fate of bacteria is affected by removal of SM from the plasma membrane before invasion, but not by removal of SM from phagosomal membranes after bacteria internalization (Figure 5A-C). Taken together, we thus believe that the main hypothesis is strongly supported by our data.

      Moreover, we either used different inhibitors for the same molecule (ASM was inhibited by ARC39, amitriptyline and PCK310 with similar outcome) or supported our hypothesis with gene-ablated cell pools (TPC1, Syt7, SARM1), as we will point out in more detail below.

      Firstly, the release of calcium from lysosomes is not demonstrated. Localized changes in the immediate vicinity of lysosomes need to be measured to ascertain that these organelles are the source of cytosolic calcium changes. In fact, 9-phenantrol, which the authors find to be the most potent inhibitor of invasion and hence of the putative calcium changes, is not a blocker of lysosomal calcium release but instead blocks plasmalemmal TRPM4 channels. On the other hand, invasion is seemingly independent of external calcium. These findings are inconsistent with each other and point to non-specific effects of 9-phenantrol. The fact that ionomycin decreases invasion efficiency is taken as additional evidence of the importance of lysosomal calcium release. It is not clear how these observations support involvement of lysosomal calcium release and exocytosis; in fact treatment with the ionophore should itself have induced lysosomal exocytosis and stimulated, rather than inhibited invasion. Yet, manipulations that increase and others that decrease cytosolic calcium both inhibited invasion.

      With respect to lysosomal Ca<sup>2+</sup> release, we agree with the reviewer that direct visual demonstration of lysosomal Ca<sup>2+</sup> release upon infection will improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca<sup>2+</sup> release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that are incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca<sup>2+</sup>-sensitive and can be used to estimate the lysosomal Ca<sup>2+</sup> content. The second dye, AF647, is Ca<sup>2+</sup>-insensitive and is used to visualize the lysosomes. A decrease in the Rhod-2/AF647 ratio within the lysosomes indicates lysosomal Ca<sup>2+</sup> release. We monitored lysosomal Ca<sup>2+</sup> content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided not to include these data in the main manuscript. Nevertheless, one could speculate that lysosomal Ca<sup>2+</sup> content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells, as indicated by a decrease in the Rhod-2/AF647 ratio.

      Author response image 1.

      Lysosomal Ca<sup>2+</sup> imaging during S. aureus infection. The lysosomes of HuLEC were stained with two dextran-coupled fluorescent dyes: the Ca<sup>2+</sup>-sensitive dye Rhod-2 and the Ca<sup>2+</sup>-insensitive dye AF647. Cells were infected with fluorescent S. aureus JE2 and monitored by live cell imaging (see Author response video 1). The intensities of Rhod-2 and AF647 were measured close to a S. aureus-host contact site, and the ratio of Rhod-2 vs. AF647 fluorescence intensity was calculated.
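
      The ratiometric readout described above can be sketched in a few lines. This is only an illustration, not the analysis used for the image: the function names, the background subtraction, the baseline normalization and all intensity values are assumptions made for the example.

```python
from statistics import mean


def ratio_trace(rhod2, af647, background=0.0):
    """Per-frame Rhod-2/AF647 ratio for one region of interest (ROI).

    rhod2 and af647 are mean ROI intensities per frame; background is a
    constant offset subtracted from both channels (an assumption here).
    """
    if len(rhod2) != len(af647):
        raise ValueError("both channels need the same number of frames")
    ratios = []
    for r, a in zip(rhod2, af647):
        denominator = a - background
        if denominator <= 0:
            raise ValueError("AF647 signal must exceed the background")
        ratios.append((r - background) / denominator)
    return ratios


def normalize_to_baseline(ratios, n_baseline):
    """Divide every ratio by the mean of the first n_baseline frames."""
    baseline = mean(ratios[:n_baseline])
    return [x / baseline for x in ratios]


# Hypothetical intensities: three frames before contact, two frames after.
rhod2 = [200.0, 198.0, 202.0, 150.0, 120.0]
af647 = [100.0, 99.0, 101.0, 100.0, 100.0]
trace = normalize_to_baseline(ratio_trace(rhod2, af647), n_baseline=3)
print(trace)
```

      A normalized ratio below 1 in the later frames would be read as a drop in lysosomal Ca<sup>2+</sup> content in that ROI. Because the Ca<sup>2+</sup>-insensitive channel is the denominator, changes in dye load or focus affect both channels and largely cancel out.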

      As to the TRPM4 involvement in S. aureus host cell internalization, it has been reported that TRPM4 is activated by cytosolic Ca<sup>2+</sup>. However, the channel conducts monovalent cations such as K<sup>+</sup> or Na<sup>+</sup> but is impermeable to Ca<sup>2+</sup> [2, 3]. The following observations support this:

      i) S. aureus invasion is dependent on intracellular Ca<sup>2+</sup>, but is independent of extracellular Ca<sup>2+</sup> (Figure 1A).

      ii) 9-phenantrol treatment reduces S. aureus internalization by host cells, illustrating the dependence of this process on TRPM4 (data removed from the manuscript). We therefore hypothesize that TRPM4 is activated by Ca<sup>2+</sup> released from lysosomes (see above).

      TRPM4 is localized to focal adhesions and is connected to the actin cytoskeleton[4, 5] – a requisite of host cell entry of S. aureus.[6, 7] This speaks for an important function of TRPM4 in uptake of S. aureus in general, but TRPM4 does not necessarily have to be involved exclusively in the rapid uptake pathway.

      TRPM4 itself is not permeable to Ca<sup>2+</sup> but is activated by the cation. Thus, it is unlikely to cause lysosomal exocytosis. The stronger reduction of bacterial uptake by treatment with 9-phenantrol compared to Ned19 may thus be caused by the involvement of TRPM4 in additional pathways of S. aureus host cell entry involving the association of TRPM4 with focal adhesions or, as pointed out by the reviewer, by unspecific side effects of 9-phenantrol that we currently cannot exclude. However, we think that experiments with 9-phenantrol distract from the main story (lysosomal Ca<sup>2+</sup> and exocytosis) and might be confusing for the reader. We thus removed all data and discussion concerning 9-phenantrol in the revised manuscript.

      Regarding the reduced S. aureus invasion after ionomycin treatment, we agree with the reviewer that ionomycin is known to lead to lysosomal exocytosis, as was previously shown by others[8] as well as our laboratory.[9]

      We hypothesized that pretreatment with ionomycin would trigger lysosomal exocytosis and thus would reduce the pool of lysosomes that can undergo exocytosis before host cells are contacted by S. aureus. As a result, we should observe a marked reduction of S. aureus internalization in such “lysosome-depleted cells”, if the lysosomal exocytosis is coupled to bacterial uptake. Our observation of reduced bacterial internalization after ionomycin treatment supports this hypothesis.

      However, ionomycin treatment and S. aureus infection of host cells are distinct processes.  

      While ionomycin results in strong global and non-directional lysosomal exocytosis of all “releasable” lysosomes (~5-10 % of all lysosomes according to previous observations),[8] we hypothesize that lysosomal exocytosis upon contact with S. aureus only involves a small proportion of lysosomes at host-bacteria contact sites. This is supported by experiments that demonstrate that ~30% of the lysosomes that are released by ionomycin treatment are exocytosed during S. aureus infection (see below and Figure 2, A-C). We added these new data as well as a corresponding section to the discussion (line 563 ff). Moreover, we moved the data obtained with ionomycin to Figure 2E and described our idea behind this experiment more precisely (line 166 ff).

      The proposed role of NAADP is based on the effects of "knocking out" TPC1 and on the pharmacological effects of Ned-19. It is noteworthy that TPC2, rather than TPC1, is generally believed to be the primary TPC isoform of lysosomes. Moreover, the gene ablation accomplished in the TPC1 "knockouts" is only partial and rather unsatisfactory. Definitive conclusions about the role of TPC1 can only be reached with proper, full knockouts. Even the pharmacological approach is unconvincing because the high doses of Ned-19 used should have blocked both TPC isoforms and presumably precluded invasion. Instead, invasion is reduced by only ≈50%. A much greater inhibition was reported using 9-phenantrol, the blocker of plasmalemmal calcium channels. How is the selective involvement of lysosomal TPC1 channels justified?

      As to partial gene ablation of TPC1: To avoid clonal variances, we usually perform pool sorting to obtain a cell population that predominantly contains cells deficient in the target (here, TPC1), but also a small proportion of wildtype cells, as seen by the residual TPC1 protein on the Western blot. We observe a significant reduction in bacterial uptake in this cell pool, suggesting that the uptake reduction in a pure K.O. population may be even more pronounced.

      As to the inhibition by Ned19: 

      The scale of invasion reduction upon Ned19 treatment (50%, Figure 1B) is comparable with the reduction caused by other compounds that influence the ASM-dependent pathway (such as amitriptyline, ARC39 [Figure 2G], BAPTA-AM [Figure 1A], Vacuolin-1 [Figure 2D], β-toxin [Figure 2L] and ionomycin [Figure 2E]). Further, the partial reduction of invasion is most likely due to the concurrent activity of multiple internalization pathways which are not all targeted by the used compounds and which we briefly discuss in the manuscript.

      We agree with the reviewer that Ned19 inhibits TPC1 and TPC2. Since ablation of TPC1 reduced invasion of S. aureus, we concluded that TPC1 is important for S. aureus host cell invasion. However, we agree with the reviewer that a role for TPC2 cannot be excluded. We clarified this in the revised manuscript (line 552). It needs to be noted, however, that deficiency in either TPC1 or TPC2 alone was sufficient to prevent Ebola virus infection,[10] which is in line with our observations.

      In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these cell lines, supporting a role of lysosomal Ca<sup>2+</sup> release in S. aureus host cell entry and a role for both TPC channels (Author response image 2, see end of the document). Since we did not have a single TPCN2 knock-out available, we decided to exclude these data from the main manuscript.

      Author response image 2.

      Invasion efficiency is reduced in TPC1/TPC2 double K.O. HeLa cells. Invasion efficiency of S. aureus JE2 was determined in TPC1/TPC2 double K.O. cells after 10 and 30 min. Results were normalized to the parental HeLa WT cell line (set to 100 %).  
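
      The normalization mentioned in this legend (parental WT set to 100 %) amounts to a simple rescaling. The sketch below assumes that each raw value is divided by the mean of the WT replicates; the legend does not state whether normalization was done per experiment, and the function name and all numbers are hypothetical.

```python
from statistics import mean


def percent_of_control(sample_values, control_values):
    """Express raw invasion values as a percentage of the control mean."""
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return [100.0 * value / control_mean for value in sample_values]


# Hypothetical raw invasion efficiencies from three replicates each.
wt_raw = [0.40, 0.50, 0.60]
dko_raw = [0.20, 0.25, 0.30]

wt_percent = percent_of_control(wt_raw, wt_raw)    # averages to 100 %
dko_percent = percent_of_control(dko_raw, wt_raw)  # relative to WT
print(round(mean(wt_percent), 1), round(mean(dko_percent), 1))
```

      With this convention the WT group averages 100 % by construction, so only its spread carries information, while the knock-out value reads directly as residual invasion.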

      Invoking an elevation of NAADP as the mediator of calcium release requires measurements of the changes in NAADP concentration in response to the bacteria. This was not performed. Instead, the authors analyzed the possible contribution of putative NAADP-generating systems and reported that the most active of these, CD38, was without effect, while the elimination of SARM1, another potential source of NAADP, had a very modest (≈20%) inhibitory effect that may have been due to clonal variation, which was not ruled out. In view of these data, the conclusion that NAADP is involved in the invasion process seems unwarranted.

      Our results from two independent experimental set-ups (Ned19 [Figure 1B] and TPC1 K.O. [Figure 1C & Figure 2N]) indicate the involvement of NAADP in the process. Together with the metabolomics unit at the Biocenter Würzburg, we attempted to measure cellular NAADP levels; however, this proved to be non-trivial and requires further optimization. Nevertheless, we can rule out clonal variation in the SARM1 mutant, since experiments were conducted with a cell pool as described above in order to avoid clonal variation of single clones.

      The mechanism behind biosynthesis of NAADP is still debated. CD38 was the first enzyme discovered to possess the ability to produce NAADP. However, it requires acidic pH to produce NAADP,[11] which does not match the characteristics of a cytosolic NAADP producer. HeLa cells do not express CD38 and hence, it is not surprising that inhibition of CD38 had no effect on S. aureus invasion in HeLa cells. However, NAADP production by HeLa cells was observed in the absence of CD38.[12] Thus, CD38-independent NAADP generation is likely. SARM1 can produce NAADP at neutral pH[13] and is expressed in HeLa, thus providing a more promising candidate.

      We agree with the reviewer that the reduction of S. aureus internalization after ablation of SARM1 is less pronounced than in other experiments of ours. This may be explained by NAADP originating from other enzymes, such as the recently discovered DUOX1, DUOX2, NOX1 and NOX2,[14] which, with the exception of DUOX2, possess a low expression even in HeLa cells. We added this to the discussion in the revised manuscript (line 579).

      We can, however, rule out clonal variation for the inhibitory effect. As stated above we generated K.O. cell pools specifically to avoid inherent problems of clonality. Thus, we also detect some residual wildtype cells within our cell pools.  

      The involvement of lysosomal secretion is, again, predicated largely on the basis of pharmacological evidence. No direct evidence is provided for the insertion of lysosomal components into the plasma membrane, or for the release of lysosomal contents to the medium. Instead, inhibition of lysosomal exocytosis by vacuolin-1 is the sole source of evidence. However, vacuolin-1 is by no means a specific inhibitor of lysosomal secretion: it is now known to act primarily as a PIKfyve inhibitor and to cause massive distortion of the endocytic compartment, including gross swelling of endolysosomes. The modest (20-25%) inhibition observed when using synaptotagmin 7 knockout cells is similarly not convincing proof of the requirement for lysosomal secretion.

      We agree with the reviewer that the manuscript will benefit from a functional analysis of lysosomal exocytosis and therefore conducted assays to investigate exocytosis in the revised manuscript. We previously i) showed by addition of specific antisera that LAMP1 is transiently exposed on the plasma membrane during ionomycin and pore-forming toxin challenge and ii) demonstrated the release of ASM activity into the culture medium under these conditions.[9] However, both measurements are not compatible with S. aureus infection, since LAMP1 antibodies are also non-specifically bound by protein A and other IgG-binding proteins on the S. aureus surface, which would bias the results. Since protein A may also serve as an adhesin in the investigated pathway, we cannot simply delete the ORF without changing other aspects of staphylococcal virulence. Further, FBS contains an ASM background activity that impedes activity measurements of cell culture medium. We previously removed this background activity by a specific heat-inactivation protocol.[9] However, S. aureus invasion is strongly reduced in culture medium containing this heat-inactivated FBS.

      We therefore developed a luminescence assay based on split NanoLuc luciferase that enables detection of LAMP1 exposed on the plasma membrane without usage of antibodies (Figure 2, A-C). We added a section on the assay in the revised manuscript. Briefly, we generated reporter cells by fusing a short peptide fragment of NanoLuc called HiBiT between the signal peptide and the mature luminal domain of LAMP1 and stably expressed the resulting protein in HeLa cells by lentiviral transduction. The LgBiT protein domain of NanoLuc luciferase (Promega) as well as the substrate Furimazine are added to the culture medium. HiBiT can reconstitute a functional NanoLuc with LgBiT and process Furimazine when lysosomes are exocytosed thereby generating luminescence measurable in a suitable plate reader. 

      With this assay we detected that about 30% of lysosomes that were “releasable” by treatment with ionomycin are exocytosed during S. aureus infection. Lysosomal exocytosis was strongly reduced (even below the levels of untreated controls) if we treated cells with Vacuolin-1 or Ned19.
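
      One way to arrive at such a percentage from plate-reader luminescence is sketched below. The formula is an assumption made for illustration (the untreated control is taken as 0 and the ionomycin-treated sample as 1), and the function names and luminescence values are hypothetical. The second function simply combines the ~30 % figure with the ~5-10 % releasable pool quoted earlier in this response.

```python
def fraction_of_releasable(signal, untreated, ionomycin):
    """Scale a luminescence reading between untreated (0) and ionomycin (1)."""
    span = ionomycin - untreated
    if span <= 0:
        raise ValueError("ionomycin signal must exceed the untreated control")
    return (signal - untreated) / span


def share_of_all_lysosomes(releasable_fraction, releasable_pool):
    """Convert a fraction of the releasable pool into a share of all lysosomes."""
    return releasable_fraction * releasable_pool


# Hypothetical luminescence readings (arbitrary units).
untreated, ionomycin = 1000.0, 11000.0
infected = fraction_of_releasable(4000.0, untreated, ionomycin)
vacuolin = fraction_of_releasable(800.0, untreated, ionomycin)

# Releasable pool of 5-10 % of all lysosomes, as quoted in the response.
low = share_of_all_lysosomes(infected, 0.05)
high = share_of_all_lysosomes(infected, 0.10)
print(infected, vacuolin, low, high)
```

      On this scale, a reading below the untreated control gives a negative value, which is how "below the levels of untreated controls" would appear. With the quoted figures, 30 % of a 5-10 % releasable pool corresponds to roughly 1.5-3 % of all lysosomes.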

      We agree with the reviewer that Vacuolin-1 to some extent has unspecific side effects as has been shown by others and which we addressed in the revised version of the manuscript (line 541 ff). However, our new results with the HiBiT reporter cell line clearly demonstrate a reduction of lysosomal exocytosis after Vacuolin-1 treatment. Supported by this and our other results we hypothesize that Vacuolin-1 decreases S. aureus internalization due to the inhibition of lysosomal exocytosis.

      As to the involvement of synaptotagmin 7: The effect of Syt7 K.O. on invasion was moderate in initial experiments, likely due to a high culture passage and presumably overgrowth of WT cells. However, reduction of invasion in Syt7 K.O.s was more pronounced in experiments with β-toxin complementation (Figure 2, N) and hence, we combined the two data sets (Figure 2, F). This demonstrates the reduction of bacterial invasion by ~40% in Syt7 K.O. cell pools. Moreover, Syt7 is not the only protein possibly involved in Ca<sup>2+</sup>-dependent exocytosis. For instance, Syt1 has been shown to possess an overlapping function.[15] This may explain the differences between our Vacuolin-1 and Syt7 ablation experiments. We added this information to the discussion.

      ASM is proposed to play a central role in the rapid invasion process. As above, most of the evidence offered in this regard is pharmacological and often inconsistent between inhibitors or among cell types. Some drugs affect some of the cells, but not others. It is difficult to reach general conclusions regarding the role of ASM. The argument is made even more complex by the authors' use of exogenous sphingomyelinase (beta-toxin). Pretreatment with the toxin decreased invasion efficiency, a seemingly paradoxical result. Incidentally, the effectiveness of the added toxin is never quantified/validated by directly measuring the generation of ceramide or the disappearance of SM.

      Although pharmacological inhibitors can have unspecific side effects, we want to emphasize that the inhibitors used in our study act on the enzyme ASM by completely different mechanisms. Amitriptyline is a so called functional inhibitor of ASM (FIASMA) which induces the detachment of ASM from lysosomal membranes resulting in degradation of the enzyme.[16] By contrast, ARC39 is a competitive inhibitor.[17, 18] 

      There are no inconsistencies in our data obtained with ASM inhibitors. Amitriptyline and ARC39 both reduce the invasion of S. aureus in HuLEC, HuVEC and HeLa cells (Figure 2G). ARC39 needs a longer pre-incubation, since its uptake by host cells is slower (to be published elsewhere). We observe a different outcome in 16HBE14o- and Ea.Hy 926 cells, with 16HBE14o- even demonstrating a slightly increased invasion of S. aureus upon ARC39 treatment. Amitriptyline had no effect (Figure 2G). 

      Thus, the ASM-dependent S. aureus internalization is cell type/line specific, which we state in the manuscript. The molecular origin of these differences is unclear and will require further investigation, e.g. in testing cell lines for potential differences in surface receptors. In a separate study we have already developed a biotinylation-based approach to identify potential novel host cell surface interaction partners during S. aureus infection.[19]

      Moreover, both inhibitors affected the invasion dynamics (Figure 3D), phagosomal escape (Figure 4C and Figure 4D) and Rab7 recruitment (Figure 4A and Supp. Figure 4A-C) in a similar fashion. Proper inhibition of ASM by both compounds in all cell lines used was validated by enzyme assays (Supp. Figure 2H), which again suggests that the ASM-dependent pathway exists only in specific cell lines and also supports that we do not observe unspecific side effects of the compounds. We clarified this in the revised manuscript.

      ASM is a key player for SM degradation and recycling. In a clinical context, deficiency in ASM results in the so-called Niemann Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered,[20] which will result in severe side effects. Short-term inhibition by small molecules therefore poses a clear benefit when compared to the usage of ASM K.O. cells. In order to satisfy the query of the reviewer, we generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested these for S. aureus invasion efficiency (Figure 2, I). We did not observe bacterial invasion differences between WT and K.O. cells. However, when we treated the cells additionally with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in inhibitor-treated WT cells is predominantly due to absence of ASM, while the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects.

      We performed lipidomics on these cells and demonstrated a strongly altered sphingolipid profile in ASM K.O. cells compared to untreated and inhibitor-treated WT cells (Figure 2, K). We speculate that other ASM-independent bacterial invasion pathways are upregulated in ASM K.O.s., thereby obscuring the effect contributed by absence of ASM. We discussed this in the revised manuscript (line 518 ff).

      Moreover, we introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I. The latter strain is non-cytotoxic and serves as negative control, since it is known to possess a very low escape rate, due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As observed for JE2, “early invaders” possess lower escape rates than “early+late invaders”.

      We did not observe differences between WT and ASM K.O. cells if we infected for only 10 min. By contrast, we observed a lower escape rate in ASM K.O. cells compared to WT cells when we infected for 30 min (Author response image 3, see end of the document).

      However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). Reduced phagosomal escape of intracellular S. aureus in ASM K.O. cells may be caused by the altered sphingolipid profile (e.g., by interference with binding of bacterial toxins to phagosomal membranes or altered vesicular acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Author response image 3.

      Phagosomal escape rates were established in either HeLa wild-type or ASM K.O. cells expressing the phagosomal escape reporter RFP-CWT. Host cells were infected with the cytotoxic S. aureus strain JE2 or the non-cytotoxic strain Cowan I for 10 or 30 minutes, and escape rates were determined by microscopy 3 h p.i.

      As to the treatment with a bacterial sphingomyelinase:

      Treatment with the bacterial SMase (bSMase, here: β-toxin) was performed in two different ways:

      i) Pretreatment of host cells with β-toxin to remove SM from the host cell surface before infection. This removes the substrate of ASM from the cell surface prior to addition of the bacteria (Figure 2L, Figure 4A-C). Since SM is not present on the extracellular plasma membrane leaflet after treatment, a release of ASM cannot cause localized ceramide formation at the sites of lysosomal exocytosis. Similar observations were made by others.[21] 

      ii) Addition of bSMase to host cells together with the bacteria to complement for the absence of ASM (Figure 2N).  

      Removal of the ASM substrate before infection (i) prevents localized ASM-mediated conversion of SM to Cer during infection and resulted in a decreased invasion, while addition of the SMase during infection resulted in an increased invasion in TPC1 and Syt7 ablated cells. Thus, both experiments are consistent with each other and in line with our other observations. 

      Removal of SM from the plasma membrane by β-toxin was indirectly demonstrated by the absence of Lysenin recruitment to phagosomes/escaped bacteria when host cells were pretreated with the toxin before infection (Figure 5C). We also added another data set that demonstrates degradation of a fluorescent SM derivative upon β-toxin treatment of host cells (Supp. Figure 2, M). In another publication, we recently quantified the effectiveness of β-toxin treatment, albeit with slightly longer treatment times (75 min vs. 3 h).[22]

      To clarify our experimental approaches for the readership, we added an explanatory section to the revised manuscript (line 287 ff) and a scheme in Figure 2M describing the experimental settings.

      As to the general conclusions regarding the role of ASM: ASM and lysosomal exocytosis have been shown to be involved in the uptake of a variety of pathogens,[21, 23-27] supporting their role in the process.

      The use of fluorescent analogs of sphingomyelin and ceramide is not well justified and it is unclear what conclusions can be derived from these observations. Despite the low resolution of the images provided, it appears as if the labeled lipids are largely in endomembrane compartments, where they would presumably be inaccessible to the secreted ASM. Moreover, considering the location of the BODIPY probe, the authors would be unable to distinguish intact sphingomyelin from its breakdown product, ceramide. What can be concluded from these experiments? Incidentally, the authors report only 10% of BODIPY-positive events after 10 min. What are the implications of this finding? That 90% of the invasion events are unrelated to sphingomyelin, ASM, and ceramide?

      During the experiments with fluorescent SM analogues (Figure 3a,b), S. aureus was added to the samples immediately before the start of video recording. Hence, bacteria slowly trickle onto the host cells, and we can image the initial contact between host cells and bacteria. For instance, the bacteria depicted in Figure 3A contact the host cell about 9 min before becoming BODIPY-FL-positive (see Supp. Video 1, 55 min). Hence, in these cases we see the formation of phagosomes around bacteria rather than bacteria in endomembrane compartments. Since generation of phagosomes happens at the plasma membrane, SM is accessible to secreted ASM.

      The “trickling” approach for infection differs experimentally from our invasion measurements, in which we synchronized the infection by centrifugation. Synchronization ensures that all bacteria have contact with host cells and are not just floating in the culture medium. However, live cell imaging of the initial bacterial-host contact and synchronization of infection are technically hard to combine.

      In our invasion measurements (with synchronization), we typically see internalization of ~20% of all added bacteria after 30 min. Hence, most bacteria that are visible in our videos are likely still extracellular, and only a small proportion was internalized. This explains why only 10% of total bacteria are positive for BODIPY-FL-SM after 10 min. The proportion of internalized bacteria that are positive for BODIPY-FL-SM should be considerably higher but cannot be determined with this method.

      We agree with the reviewer that we cannot observe conversion of BODIPY-FL-SM by ASM. To address this, we attempted to visualize the conversion of a visible-range SM FRET probe (Supp. Figure 3), but the structure of the probe is not compatible with measuring conversion on the plasma membrane, since the FITC fluorophore is released into the culture medium by ASM activity and is thereby lost for imaging. In general, the visualization of SM conversion with subcellular resolution is challenging, and even with novel tools developed in our lab[28] visualization of SM on the plasma membrane is difficult.

      The conclusions we draw from these experiments are that i.) S. aureus invasion is associated with SM and ii.) SM-associated invasion can be very fast, since bacteria are rapidly engulfed by BODIPY-FL-SM containing membranes.

      It is also unclear how the authors can distinguish lysenin entry into ruptured vacuoles from the entry of RFP-CWT, used as a criterion of bacterial escape. Surely the molecular weights of the probes are not sufficiently different to prevent the latter one from traversing the permeabilized membrane until such time that the bacteria escape from the vacuole.

      We here want to clarify that both Lysenin as well as the CWT reporter have access to ruptured vacuoles (Figure 4B). We used the Lysenin reporter in these experiments for estimation of SM content of phagosomal membranes. If a vacuole is ruptured, both the bacteria and the luminal leaflet of the phagosomal membrane remnants get in contact with the cytosol and hence with the cytosolically expressed reporters YFP-Lysenin as well as RFP-CWT resulting in “Lysenin-positive escape” when phagosomes contained SM (see Figure 5C). By contrast, either β-toxin expression by S. aureus or pretreatment with the bSMase resulted in absence of Lysenin recruitment suggesting that the phagosomal SM levels were decreased/undetectable (Figure 5C, Supp Figure 6F, G, I, J).

      Although this approach does not enable a quantitative measurement of phagosomal SM, this method is sufficient to show that β-toxin expression and pretreatment result in markedly decreased phagosomal SM levels in the host cells.

      The approach we used here to analyze “Lysenin-positive escape” can clearly be distinguished from Lysenin-based methods that were used by others.[29] There, Lysenin was used to show trans-bilayer movement of SM before rupture of bacteria-containing phagosomes.

      To clarify the function of Lysenin in our approach, we added further figures (Figure 4F, Supp. Figure 5) and a movie (Supp. Video 4) to the revised manuscript.

      Both SMase inhibitors (Figure 4C) and SMase pretreatment increased bacterial escape from the vacuole. The former should prevent SM hydrolysis and formation of ceramide, while the latter treatment should have the exact opposite effects, yet the end result is the same. What can one conclude regarding the need and role of the SMase products in the escape process?

      As pointed out above, pretreatment of host cells with SMase removes SM from the plasma membrane, so that ASM does not have access to its substrate. Thus, both treatment with ASM inhibitors and pretreatment with bacterial SMase prevent ASM from being active on the plasma membrane and therefore block the ASM-dependent uptake (Figure 2 G, L). Although overall fewer bacteria were internalized by host cells under these conditions, the bacteria that invaded host cells did so in an ASM-independent manner.

      Since blockage of the ASM-dependent internalization pathway (with ASM inhibitor [Figure 4C, D], SMase pretreatment [Figure 5B] and Vacuolin-1 [Figure 4E]) always resulted in enhanced phagosomal escape, we conclude that bacteria that were internalized in an ASM-independent fashion show enhanced escape. Vice versa, bacteria that enter host cells in an ASM-dependent manner demonstrate lower escape rates.

      This is supported by comparing the escape rates of “early” and “late” invaders [Figure 5D, E], which in our opinion is a key experiment that supports this hypothesis. The “early” invaders are predominantly ASM-dependent (see e.g. Figure 3E) and thus, bacteria that entered host cells in the first 10 min of infection should have been internalized predominantly in an ASM-dependent fashion, while slower entry pathways are active later during infection. The early ASM-dependent invaders possessed lower escape rates, which is in line with the data obtained with inhibitors (e.g. Figure 4C, D).

      We hypothesize that the activity of ASM on the plasma membrane during invasion mediates the recruitment of a specific subset of receptors, which then influences downstream phagosomal maturation and escape. This hypothesis is supported by the fact that the subset of receptors interacting with S. aureus is altered upon inhibition of the ASM-dependent uptake pathway. We describe this in another study that is currently under evaluation elsewhere.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Rühling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca<sup>2+</sup> and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry.

      The evidence provided is solid, methods used are appropriate and results largely support their conclusions, but can be substantiated further as detailed below. The weakness is a reliance on chemical inhibitors that can be non-specific to delineate critical steps.

      Specific comments:

      A large number of experiments rely on treatment with chemical inhibitors. While this approach is reasonable, many of the inhibitors employed, such as amitriptyline and Vacuolin-1, have other or undefined cellular targets, and pleiotropic effects cannot be ruled out. Given the centrality of ASM for the manuscript, it will be important to replicate some key results with ASM KO cells.

      We thank the reviewer for the critical evaluation of our manuscript and plenty of constructive comments. 

      We agree with the reviewer, that ASM inhibitors such as functional inhibitors of ASM (FIASMA) like amitriptyline used in our study have unspecific side effects given their mode-of-action. FIASMAs induce the detachment of ASM from lysosomal membranes resulting in degradation of the enzyme.[16]  However, we want to emphasize that we also used the competitive inhibitor ARC39 in our study[17, 18] which acts on the enzyme by a completely different mechanism. All phenotypes (reduced invasion [Figure 2G], effect on invasion dynamics [Figure 3D], enhanced escape [Figure 4C, D] and differential recruitment of Rab7 [Supp. Figure 4A-C]) were observed with both inhibitors thereby supporting the role of ASM in the process.  

      We further agree that experiments with genetic evidence usually support and improve scientific findings. However, ASM is a key cellular player in SM degradation and recycling. In a clinical context, deficiency in ASM results in Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which in itself will result in severe side effects. Thus, the use of inhibitors provides a clear benefit when compared to ASM K.O. cells, since ASM activity can be targeted in a short-term fashion, thereby preventing larger alterations in cellular lipid composition.

      We nevertheless generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested for invasion efficiency (Figure 2, I). Here, we did not observe differences between WT and mutants. However, when we additionally treated the cells with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. cells was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in WT cells upon inhibitor treatment is predominantly due to inhibition of ASM, whereas the small reduction observed in ARC39-treated ASM K.O. cells is likely due to unspecific side effects. We also demonstrated a strongly altered sphingolipid profile in ASM K.O. cells when compared to untreated and inhibitor-treated WT cells (new Figure 2, K). We speculate that other ASM-independent invasion pathways are upregulated in ASM K.O. cells, thereby making up for the absence of ASM. We discuss this in the revised manuscript (line 518 ff).

      We introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I (Author response image 3). The latter serves as a negative control, since it is known to possess a very low escape rate due to its inability to produce toxins. Again, we compared early invaders (infection for 10 min) with early + late invaders (infection for 30 min). As seen before for JE2, early invaders possess lower escape rates than early + late invaders. We did not observe differences between WT and K.O. cells when we infected for 10 min. By contrast, we observed a lower escape rate in ASM K.O. compared to WT cells when we infected for 30 min. However, we usually observe increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). We think that the reduced phagosomal escape in ASM K.O. cells is caused by the altered sphingolipid profile, which could have versatile effects (e.g., interference with binding of bacterial toxins to phagosomal membranes or changes in acidification). We hence think that these data are difficult to interpret, and clarification would require extensive additional experimentation. Thus, we did not include these data in the manuscript.

      Most experiments are done in HeLa cells. Given the pathway is projected as generic, it will be important to further characterize cell type specificity for the process. Some evidence for a similar mechanism in other cell types S. aureus infects, perhaps phagocytic cell type, might be good. 

      Whenever possible we performed the experiments not only in HeLa but also in HuLECs. For example, we refer to experiments concerning the role of Ca<sup>2+</sup> (Figure 1A/Supp. Figure 1A), lysosomal Ca<sup>2+</sup>/Ned19 (Figure 1B/Supp. Figure 1C), lysosomal exocytosis/Vacuolin-1 (Figure 2D/Supp. Figure 2D), ASM/ARC39 and amitriptyline (Figure 2G), surface SM/β-toxin (Figure 2L/Supp. Figure 2L), analysis of invasion dynamics (complete Figure 3) and measurement of cell death during infection (Figure 6C+E, Supp. Figure 8A+B).

      HuLECs, however, are not readily amenable to genetic manipulation. Hence, we were not able to generate gene deletions in these cells, and upon introduction of the fluorescent escape reporter the cells do not grow readily.

      As to ASM involvement in phagocytic cells: a role for ASM during the uptake of S. aureus by macrophages was previously reported by others.[25] However, in professional phagocytes S. aureus does not escape from the phagosome and replicates within the phagosome.[30]

      I'm a little confused about the role of ASM on the surface. Presumably, it converts SM to ceramide, as the final model suggests. Overexpression of b-toxin results in the near complete absence of SM on phagosomes (having representative images will help appreciate this), but why is phagosomal SM detected at high levels in untreated conditions? If bacteria are engulfed by SM-containing membrane compartments, what role does ASM play on the surface? If surface SM is necessary for phagosomal escape within the cell, do the authors imply that ASM is tuning the surface SM levels to a certain optimal range? Alternatively, can there be additional roles for ASM on the cell surface? Can surface SM levels be visualized (for example, in Figure 4 E, F)?

      We initially hypothesized that we would detect higher phagosomal SM levels upon inhibition of ASM, since our model suggests SM cleavage by ASM on the host cell surface during bacterial cell entry. However, we did not detect any changes in our experiments (Supp. Figure 4F). We currently favor the following explanation: SM is the most abundant sphingolipid in human cells.[31] If peripheral lysosomes are exocytosed and thereby release ASM, only a localized and relatively small proportion of SM may get converted to Cer, which most likely is below our detection limit. In addition, the detection of cytosolically exposed phagosomal SM by YFP-Lysenin is not quantitative and provides a “Yes or No” measurement. Hence, we think that the rather limited SM to Cer conversion in combination with the high abundance of SM in cellular membranes does not visibly affect the recruitment of the Lysenin reporter.

      In our experiments that employ BODIPY-FL-SM (Figure 3a+b), we cannot distinguish between native SM and downstream metabolites such as Cer. Hence, again we cannot make any assumptions on the extent to which SM is converted on the surface during bacterial internalization. Although our laboratory recently used trifunctional sphingolipid analogs to analyze the SM to Cer conversion[22], the visualization of this process on the plasma membrane is currently still challenging.

      Overall, we hypothesize that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms. Subsequently, a certain subset of receptors may be recruited to these platforms and influence the uptake process. These platforms are supposed to be very small, which also would explain that we did not detect changes in Lysenin recruitment.

      Related to that, why is ASM activity on the cell surface important? Its role in non-infectious or other contexts can be discussed.

      ASM release by lysosomal exocytosis is implicated in plasma membrane repair upon injury. We added a short description of the role of extracellular ASM in the introduction (line 35).

      If SM removal is so crucial for uptake, can exocytosis of lysosomes alone provide sufficient ASM for SM removal? How much or to what extent is lysosomal exocytosis enhanced by initial signaling events? Do the authors envisage the early events in their model happening in localized confines of the PM, this can be discussed.

      Ionomycin treatment led to a release of ~10% of all lysosomes and also increased extracellular ASM activity.[8, 9] In the revised manuscript, we developed an assay to determine lysosomal exocytosis during S. aureus infection (Figure 2, A-C). During infection, we detected lysosomal exocytosis amounting to ~30% of that observed upon ionomycin treatment. Since this is only a fraction of the “releasable lysosomes”, we assume that the effects (lysosomal Ca<sup>2+</sup> liberation, lysosomal exocytosis and ASM activity) are very localized and take place only at host-pathogen contact sites (see also above). We discuss this in the revised manuscript (line 563 ff). To our knowledge it is currently unclear to what extent the released ASM affects surface SM levels. We attempted to visualize the local ASM activity on the cell surface by using a visible-range FRET probe (Supp. Fig. 3). Cleavage of the probe by ASM on the surface leads to release of FITC into the cell culture medium, which does not contribute a measurable signal at the surface.
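
      As a rough illustration of the scale implied by these numbers, the following sketch combines the two percentages quoted above. It assumes that the ~10% ionomycin-induced release and the ~30% relative exocytosis during infection can simply be multiplied, which is a simplification (the values stem from different assays), and the function name is hypothetical.

```python
def absolute_release(relative_to_reference: float, reference_fraction: float) -> float:
    """Convert a release measured relative to a reference treatment into a
    fraction of the total lysosome pool (simple multiplication; an assumption)."""
    for value in (relative_to_reference, reference_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return relative_to_reference * reference_fraction


# Approximate values quoted in the response above.
ionomycin_release = 0.10        # ~10% of all lysosomes released by ionomycin
infection_vs_ionomycin = 0.30   # infection reaches ~30% of the ionomycin response

infection_release = absolute_release(infection_vs_ionomycin, ionomycin_release)
print(f"Estimated share of lysosomes exocytosed during infection: {infection_release:.1%}")
```

      Under these assumptions only a few percent of all lysosomes would be exocytosed during infection, which is consistent with the interpretation that the effects are highly localized.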

      How are inhibitor doses determined? How efficient is the removal of extracellular bacteria at 10 min? It will be good to substantiate the cfu experiments for infectivity with imaging-based methods. Are the roles of TPC1 and TPC2 redundant? If so, why does silencing TPC1 alone result in a decrease in infectivity? For these and other assays, it would be better to show raw values for infectivity. Please show alterations in lysosomal Ca<sup>2+</sup> at the doses of inhibitors indicated. Is lysosomal Ca<sup>2+</sup> released upon S. aureus binding to the cell surface? Will be good to directly visualize this.

      Concerning the inhibitor concentrations, we either used values established in published studies or recommendations of the suppliers (e.g. 2-APB, Ned19, Vacuolin-1). For ASM inhibitors, we determined proper inhibition of ASM by activity assays. Concentrations of ionomycin resulting in Ca<sup>2+</sup> influx and lysosomal exocytosis were determined in earlier studies of our lab.[9, 32]

      As to the removal of bacteria at 10 min p.i.: Lysostaphin is very efficient for removal of extracellular S. aureus and sterilizes the tissue culture supernatant. It significantly lyses bacteria within a few minutes, as determined by turbidity assays.[33]

      As to imaging-based infectivity assays: We performed imaging-based invasion assays to show reduced invasion efficiency with two ASM inhibitors in the revised manuscript with similar results as obtained by CFU counts (Supp. Figure 2, J).

      Regarding the roles of TPC1 and TPC2: from our data we cannot conclude whether the roles of TPC1 and TPC2 are redundant. One could speculate that, since blockage of TPC1 alone is sufficient to reduce internalization of bacteria, both channels may have distinct roles. On the other hand, there might be a Ca<sup>2+</sup> threshold for initiating lysosomal exocytosis that can only be attained if TPC1 and TPC2 are activated in parallel. Our observations are in line with another study that shows reduced Ebola virus infection in the absence of either TPC1 or TPC2.[34] In order to address the role of TPC2 for this review process, we were kindly provided with TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these double KO cell lines, further supporting a role of lysosomal Ca<sup>2+</sup> release in S. aureus host cell entry (Author response image 2, see end of the document). Since we did not have a single TPCN2 knockout available, we decided to exclude these data from the main manuscript.

      As to raw CFU counts: whereas the observed effects upon blocking the invasion of S. aureus are stable, the number of internalized bacteria varies between individual biological replicates, for instance, by differences in host cell fitness or growth differences in bacterial cultures, which are prepared freshly for each experiment.

      With respect to visualization of lysosomal Ca<sup>2+</sup> release: we agree with the reviewer that direct visual demonstration of lysosomal Ca<sup>2+</sup> release upon infection would improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca<sup>2+</sup> release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca<sup>2+</sup>-sensitive and can be used to estimate the lysosomal Ca<sup>2+</sup> content. The second dye, AF647, is Ca<sup>2+</sup>-insensitive and is used to visualize the lysosomes. If the ratio Rhod-2/AF647 within the lysosomes is decreasing, lysosomal Ca<sup>2+</sup> release is indicated. We monitored lysosomal Ca<sup>2+</sup> content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the final manuscript. However, one could speculate that lysosomal Ca<sup>2+</sup> content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.
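
      The ratiometric readout described above can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for the author response images: the function name, the background handling and the intensity values are hypothetical assumptions.

```python
def normalized_ratio_trace(rhod2, af647, background=0.0):
    """Return the Rhod-2/AF647 ratio per frame, normalized to the first frame.

    rhod2, af647: mean ROI intensities per frame (Ca2+-sensitive and
    Ca2+-insensitive dye, respectively). background: constant offset
    subtracted from both channels (an assumption of this sketch).
    """
    if len(rhod2) != len(af647) or not rhod2:
        raise ValueError("need two non-empty traces of equal length")
    ratios = []
    for sensitive, reference in zip(rhod2, af647):
        denominator = reference - background
        if denominator <= 0:
            raise ValueError("reference channel must exceed background")
        ratios.append((sensitive - background) / denominator)
    if ratios[0] == 0:
        raise ValueError("first-frame ratio is zero; cannot normalize")
    return [ratio / ratios[0] for ratio in ratios]


# Hypothetical ROI intensities (arbitrary units) over four frames.
rhod2_trace = [200.0, 190.0, 150.0, 120.0]
af647_trace = [100.0, 100.0, 100.0, 100.0]

trace = normalized_ratio_trace(rhod2_trace, af647_trace)
# A falling normalized ratio would be read as lysosomal Ca2+ release.
ratio_decreases = trace[-1] < trace[0]
print(trace, ratio_decreases)
```

      Dividing by the Ca<sup>2+</sup>-insensitive channel compensates for changes in dye load or focus within the ROI, but it does not solve the tracking problem caused by highly mobile lysosomes noted above.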

      The precise identification of cytosolic vs phagosomal bacteria is not very easy to appreciate. The methods section indicates how this distinction is made, but how do the authors deal with partial overlaps and ambiguities generally associated with such analyses? Please show respective images.

      The number of events (individual bacteria) for the live cell imaging data should be clearly mentioned.

      We apologize for not having sufficiently explained the technology to detect escaped S. aureus. The cytosolic location of S. aureus is indicated by recruitment of RFP-CWT.[35] CWT is the cell wall targeting domain of lysostaphin, which efficiently binds to the pentaglycine cross bridge in the peptidoglycan of S. aureus. This reporter is exclusively and homogeneously expressed in the host cytosol. Only upon rupture of phagoendosomal membranes can the reporter be recruited to the cell wall of the now cytosolically located bacteria. S. aureus mutants, for instance in the agr quorum sensing system, cannot break down the phagosomal membrane in non-professional phagocytes and thus stay unlabeled by the CWT reporter.[35] We include several images (Figure 4, F, Supp. Figure 5) and movies (Supp. Video 4) of escape events in the revised manuscript. The bacteria numbers for live cell experiments are now shown in Supp. Figure 7.

      In the phagosome maturation experiments, what is the proportion of bacteria in Rab5 or Rab7 compartments at each time point? Will the decreased Rab7 association be accompanied by increased Rab5? Showing raw values and images will help appreciate such differences. Given the expertise and tools available in live cell imaging, can the authors trace Rab5 and Rab7 positive compartment times for the same bacteria?

      We included the proportion of Rab7-associated bacteria in the revised manuscript (Supp. Figure 4A and C) and also briefly mention these proportions in the text (line 353). Usually, we observe that Rab5 is only transiently (for a few minutes) present on phagosomes and only afterwards the phagosomes become positive for Rab7. We do not think that a decrease in Rab7-positive phagosomes would increase the proportion of Rab5-positive phagosomes. However, we cannot exclude this hypothesis with our data.

      We can trace the recruitment of Rab5/Rab7 to individual bacteria only manually, which impedes a quantitative evaluation. However, we included a video (Supp. Video 3) that illustrates the consecutive recruitment of the GTPases.

      The results with longer-term infection are interesting. Live cell imaging suggests that ASM-inhibited cells show accelerated phagosomal escape that reduces by 6 hpi. Where are the bacteria at this time point ? Presumably, they should have reached lysosomes. The relationship between cytosolic escape, replication, and host cell death is interesting, but the evidence, as presented is correlative for the populations. Given the use of live cell imaging, can the authors show these events in the same cell?

      We think that most bacteria-containing phagoendosomes should have fused with lysosomes 6 h p.i. as we have previously shown by acidification to pH of 5 and LAMP1 decoration.[36]

      The correlation between phagosomal escape and replication in the cytosol of non-professional phagocytes has been observed by us and others. In the revised manuscript we also provide images (Supp. Figure 5)/videos (Supp. Video 4) to show this correlation in our experiments.

      Given the inherent heterogeneity in uptake processes and the use of inhibitors in most experiments, the distinction between ASM-dependent and independent pathways might not be as clear-cut as the authors suggest. Some caution here will be good. Can the authors estimate what fraction of intracellular bacteria are taken up ASM-dependent?

      We agree with the reviewer that an overlap between internalization pathways is likely. A clear distinction is therefore certainly non-trivial. Alternative to ASM-dependent and ASM-independent pathways, the ASM activity may also accelerate one or several internalization pathways. We address this limitation in the discussion of the revised manuscript (line 596 ff).

      Early in infection (~10 min after contact with the cells), the proportion of bacteria that enter host cells in an ASM-dependent manner is relatively high, amounting to roughly 75-80% in HuLEC. After 30 min, this proportion decreases to about 50%. We included a paragraph in the discussion of the revised manuscript (line 593 ff).
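
      One way such a proportion could be estimated is from invasion counts with and without ASM inhibition. The sketch below assumes that the ASM-dependent fraction equals the relative loss of invasion upon inhibition; this is an assumption for illustration and not necessarily how the values above were derived. The counts are hypothetical and chosen only to fall in the quoted ranges.

```python
def asm_dependent_fraction(control_count: float, inhibited_count: float) -> float:
    """Estimate the share of invasion lost upon ASM inhibition.

    control_count: internalized bacteria without inhibitor (e.g. CFU).
    inhibited_count: internalized bacteria with ASM inhibitor.
    Assumes complete and specific inhibition, which is a simplification.
    """
    if control_count <= 0:
        raise ValueError("control count must be positive")
    if inhibited_count < 0:
        raise ValueError("inhibited count must not be negative")
    # Clamp at zero in case the inhibited sample exceeds the control.
    return max(0.0, 1.0 - inhibited_count / control_count)


# Hypothetical counts, not measured data.
early = asm_dependent_fraction(control_count=1000, inhibited_count=220)   # 10 min
late = asm_dependent_fraction(control_count=4000, inhibited_count=2000)   # 30 min
print(f"early: {early:.0%}, late: {late:.0%}")
```

      Because inhibitors can have off-target effects and pathways may overlap, such an estimate is best read as an upper bound on the ASM-dependent share rather than an exact partition.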

      Reviewer #2 (Recommendations for the authors):

      (1) The experiment in Figure 4H is interesting. Details on what proportion of the cell is double positive, and if only this fraction was used for analysis will be good.

      We used all bacteria found in the images, regardless of whether host cells were infected with only one or both strains. We unfortunately cannot properly determine the proportion of cells that are double infected, since i) we record the samples with CLSM and hence cannot exclude that there are intracellular bacteria in higher or lower optical sections, and ii) we visualized cells by staining nuclei and did not stain the cell borders; thus we cannot precisely tell to which host cell the bacteria localize.

      (2) Data is sparse for steps 5 and 6 of the model (line 330).

      We apologize for the inconvenience. There is a related study published  elsewhere[19], in which we identified NRCAM and PTK7 as putative receptors involved in this invasion pathway. We included a section in the discussion with the corresponding citation (line 569).

      (3) Data for the reduced number of intracellular bacteria upon blocking ASM-dependent uptake (line 235) is not clear. Do they mean decreased invasion efficiency? These two need not be the same.

      We changed “reduced number of intracellular bacteria” to “invasion efficiency”.

      (4) b-toxin added to the surface can get endocytosed. Can its surface effect be delineated from endo/phagosomal effect?

      We attempted to delineate effects contributed by the toxin activity on the surface vs. within phagosomes (Figure 5 A-C). We see an increased phagosomal escape when we pretreated host cells with β-toxin (removal of SM from the surface) and infected either in the presence (toxin will be taken up together with the bacteria into the phagosome) or in the absence (toxin was washed away shortly before infection) of β-toxin. By contrast, overexpression of β-toxin by S. aureus did not affect phagosomal escape rates. The proper activity of β-toxin was confirmed by absence of Lysenin recruitment during phagosomal escape in all three conditions. We concluded that the activity on the surface and not the activity in the phagosome is important.

      (5) The potential role(s) of bacterial factors in the uptake and subsequent intracellular stages can be discussed.

      There are multiple bacterial adhesins known in S. aureus. These include factors covalently attached to the bacterial cell wall, such as the sortase-dependently anchored fibronectin-binding proteins A and B, but also secreted and “cell wall binding” proteins as well as non-proteinaceous factors such as wall teichoic acids. A discussion of these factors would thus be out of the scope of this manuscript, and we refer the reader to specialized reviews on that topic.

      (6) The manuscript is not very easy to read. The abstract could be rephrased for better clarity and succinctness, with a clearly stated problem statement. The introduction is somewhat haphazard, I feel it can be better structured.

      We apologize for the inconvenience. We stated the problem/research question in the abstract and tried to improve the introduction without adding too much unnecessary detail. In general, we tried to improve the readability of the manuscript and hope that our results and conclusions can be understood more easily by the reader in the revised version.

      (7) Typo in Figure 5F. Step 6 should read "accessory receptors"

      The typo was corrected.

      References

      (1) Lloyd-Evans, E. et al. Niemann-Pick disease type C1 is a sphingosine storage disease that causes deregulation of lysosomal calcium. Nature Medicine 14, 1247-1255 (2008).

      (2) Launay, P. et al. TRPM4 Is a Ca<sup>2+</sup>-Activated Nonselective Cation Channel Mediating Cell Membrane Depolarization. Cell 109, 397-407 (2002).

      (3) Nilius, B. et al. The Ca<sup>2+</sup>-activated cation channel TRPM4 is regulated by phosphatidylinositol 4,5-biphosphate. The EMBO Journal 25, 467-478 (2006).

      (4) Cáceres, M. et al. TRPM4 Is a Novel Component of the Adhesome Required for Focal Adhesion Disassembly, Migration and Contractility. PLoS One 10, e0130540 (2015).

      (5) Silva, I., Brunett, M., Cáceres, M. & Cerda, O. TRPM4 modulates focal adhesion-associated calcium signals and dynamics. Biophysical Journal 123, 390a (2024).

      (6) Schlesier, T., Siegmund, A., Rescher, U. & Heilmann, C. Characterization of the Atl-mediated staphylococcal internalization mechanism. International Journal of Medical Microbiology 310, 151463 (2020).

      (7) Jevon, M. et al. Mechanisms of Internalization of Staphylococcus aureus by Cultured Human Osteoblasts. Infection and Immunity 67, 2677-2681 (1999).

      (8) Rodriguez, A., Webster, P., Ortego, J. & Andrews, N.W. Lysosomes behave as Ca<sup>2<sup>+</sup></sup>-regulated exocytic vesicles in fibroblasts and epithelial cells. J Cell Biol 137, 93-104 (1997).

      (9) Krones & Rühling et al. Staphylococcus aureus alpha-Toxin Induces Acid Sphingomyelinase Release From a Human Endothelial Cell Line. Front Microbiol 12, 694489 (2021).

      (10) Sakurai, Y. et al. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (11) Aarhus, R., Graeff, R.M., Dickey, D.M., Walseth, T.F. & Lee, H.C. ADP-ribosyl cyclase and CD38 catalyze the synthesis of a calcium-mobilizing metabolite from NADP. J Biol Chem 270, 30327-30333 (1995).

      (12) Schmid, F., Fliegert, R., Westphal, T., Bauche, A. & Guse, A.H. Nicotinic acid adenine dinucleotide phosphate (NAADP) degradation by alkaline phosphatase. J Biol Chem 287, 32525-32534 (2012).

      (13) Angeletti, C. et al. SARM1 is a multi-functional NAD(P)ase with prominent base exchange activity, all regulated by multiple physiologically relevant NAD metabolites. iScience 25, 103812 (2022).

      (14) Gu, F. et al. Dual NADPH oxidases DUOX1 and DUOX2 synthesize NAADP and are necessary for Ca(2<sup>+</sup>) signaling during T cell activation. Sci Signal 14, eabe3800 (2021).

      (15) Schonn, J.-S., Maximov, A., Lao, Y., Südhof, T.C. & Sørensen, J.B. Synaptotagmin-1 and -7 are functionally overlapping Ca<sup>2<sup>+</sup></sup> sensors for exocytosis in adrenal chromaffin cells. Proceedings of the National Academy of Sciences 105, 3998-4003 (2008).

      (16) Kornhuber, J. et al. Functional Inhibitors of Acid Sphingomyelinase (FIASMAs): a novel pharmacological group of drugs with broad clinical applications. Cell Physiol Biochem 26, 9-20 (2010).

      (17) Naser, E. et al. Characterization of the small molecule ARC39, a direct and specific inhibitor of acid sphingomyelinase in vitro. J Lipid Res 61, 896-910 (2020).

      (18) Roth, A.G. et al. Potent and selective inhibition of acid sphingomyelinase by bisphosphonates. Angew Chem Int Ed Engl 48, 7560-7563 (2009).

      (19) Rühling, M., Schmelz, F., Kempf, A., Paprotka, K. & Fraunholz Martin, J. Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (20) Schuchman, E.H. & Desnick, R.J. Types A and B Niemann-Pick disease. Mol Genet Metab 120, 27-33 (2017).

      (21) Miller, M.E., Adhikary, S., Kolokoltsov, A.A. & Davey, R.A. Ebolavirus Requires Acid Sphingomyelinase Activity and Plasma Membrane Sphingomyelin for Infection. Journal of Virology 86, 7473-7483 (2012).

      (22) M. Rühling, L.K., F. Wagner, F. Schumacher, D. Wigger, D. A. Helmerich, T. Pfeuffer, R. Elflein, C. Kappe, M. Sauer, C. Arenz, B. Kleuser, T. Rudel, M. Fraunholz, J. Seibel Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nat Commun accepted in principle (2024).

      (23) Peters, S. et al. Neisseria meningitidis Type IV Pili Trigger Ca(2<sup>+</sup>)-Dependent Lysosomal Trafficking of the Acid Sphingomyelinase To Enhance Surface Ceramide Levels. Infect Immun 87 (2019).

      (24) Grassmé, H. et al. Acidic sphingomyelinase mediates entry of N. gonorrhoeae into nonphagocytic cells. Cell 91, 605-615 (1997).

      (25) Li, C. et al. Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (26) Fernandes, M.C. et al. Trypanosoma cruzi subverts the sphingomyelinase-mediated plasma membrane repair pathway for cell invasion. J Exp Med 208, 909-921 (2011).

      (27) Luisoni, S. et al. Co-option of Membrane Wounding Enables Virus Penetration into Cells. Cell Host & Microbe 18, 75-85 (2015).

      (28) Rühling, M. et al. Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nature Communications 15, 7456 (2024).

      (29) Ellison, C.J., Kukulski, W., Boyle, K.B., Munro, S. & Randow, F. Transbilayer Movement of Sphingomyelin Precedes Catastrophic Breakage of Enterobacteria-Containing Vacuoles. Curr Biol 30, 2974-2983 e2976 (2020).

      (30) Moldovan, A. & Fraunholz, M.J. In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (31) Slotte, J.P. Biological functions of sphingomyelins. Progress in Lipid Research 52, 424-437 (2013).

      (32) Stelzner, K. et al. Intracellular Staphylococcus aureus Perturbs the Host Cell Ca(2<sup>+</sup>) Homeostasis To Promote Cell Death. mBio 11 (2020).

      (33) Kunz, T.C. et al. The Expandables: Cracking the Staphylococcal Cell Wall for Expansion Microscopy. Front Cell Infect Microbiol 11, 644750 (2021).

      (34) Sakurai, Y. et al. Ebola virus. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (35) Grosz, M. et al. Cytoplasmic replication of Staphylococcus aureus upon phagosomal escape triggered by phenol-soluble modulin alpha. Cell Microbiol 16, 451-465 (2014).

      (36) Giese, B. et al. Staphylococcal alpha-toxin is not sufficient to mediate escape from phagolysosomes in upper-airway epithelial cells. Infect Immun 77, 3611-3625 (2009).

    1. Author response:

      The following is the authors’ response to the original reviews.

      In summary, the changes made in the revision process include:

      An addition of a paragraph in the result section that discusses the absolute values of measured Young’s moduli in the light of probing frequencies, accompanied by a new supplementary figure and a supplementary table that support that discussion

      - Fig. S10. Absolute Young’s modulus values across the frequencies characteristic for the three measurement methods.

      - Table S9. Operation parameters of the three methods used for characterizing the mechanical properties of cells.

      Three new supplementary figures that display the expression matrices for the genes from the identified modules in carcinoma datasets used for validation:

      - Fig. S4. Expression of identified target genes in the CCLE microarray dataset used for validation.

      - Fig. S5. Expression of identified target genes in the CCLE RNA-Seq dataset used for validation.

      - Fig. S6. Expression of identified target genes in the Genentech dataset used for validation.

      An addition of a paragraph in the discussion section that discusses the intracellular origins of resistance to deformation and the dominance of actin cortex at low deformations.

      Refinement of the manuscript text and figures based on the specific feedback from the Reviewers.

      Please see below for detailed responses to the Reviewers’ comments.

      Reviewer #1 (Public Review)

      In this work, Urbanska and colleagues use a machine-learning based crossing of mechanical characterisations of various cells in different states and their transcriptional profiles. Using this approach, they identify a core set of five genes that systematically vary together with the mechanical state of the cells, although not always in the same direction depending on the conditions. They show that the combined transcriptional changes in this gene set are strongly predictive of a change in the cell mechanical properties, in systems that were not used to identify the genes (a validation set). Finally, they experimentally alter the expression level of one of these genes, CAV1, that codes for the caveolin 1 protein, and show that, in a variety of cellular systems and contexts, perturbations in the expression level of CAV1 also induce changes in cell mechanics, cells with lower CAV1 expression being generally softer.

      Overall the approach seems accessible, sound and is well described. My personal expertise is not suited to judge its validity, novelty or relevance, so I do not make comments on that. The results it provides seem to have been thoroughly tested by the authors (using different types of mechanical characterisations of the cells) and to be robust in their predictive value. The authors also show convincingly that one of the genes they identified, CAV1, is not only correlated with the mechanical properties of cells, but also that changing its expression level affects cell mechanics. At this stage, the study appears mostly focused on the description and validation of the methodological approach, and it is hard to really understand what the results obtained really mean, the importance of the biological finding - what is this set of 5 genes doing in the context of cell mechanics? Is it really central, or is it just one of the set of knobs on which the cell plays - and it is identified by this method because it is systematically modulated but maybe, for any given context, it is not the dominant player - all these fundamental questions remain unanswered at this stage. On one hand, it means that the study might have identified an important novel module of genes in cell mechanics, but on the other hand, it also reveals that it is not yet easy to interpret the results provided by this type of novel approach.

      We thank Reviewer #1 for the thoughtful evaluation of our manuscript. The primary goal of the manuscript was to present a demonstration of an unbiased approach for the identification of genes involved in the regulation of cell mechanics. The manuscript further provides a comprehensive computational validation of all genes from the identified network, and experimental validation of a selected gene, CAV1.

      We agree that at the current stage, far-reaching conclusions about the biological meaning of the identified network cannot be made. We are, however, convinced that the identification of an apparently central player such as CAV1 across various cellular systems is per se meaningful, in particular since CAV1 modulation shows clear effects on the cell mechanical state in several cell types. 

      We anticipate that our findings will encourage more mechanistic studies in the future, investigating how these identified genes regulate mechanical properties and interact with each other. Notwithstanding, the identified genes (after testing in the specific system of interest) can be readily used as genetic targets for modulating mechanical properties of cells. Access to such modifications is highly relevant not only for performing further research on the functional consequences of cell mechanics changes (in particular in in-vivo systems where using chemical perturbations is not always possible), but also for potential future use in modulating the mechanical properties of cells to prevent disease (for example to inhibit cancer metastasis or increase the efficacy of cancer cell killing by cytotoxic T cells).

      We have now added the following sentence to the first paragraph of the discussion to acknowledge the open ends of our study:

      “(...). Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      Reviewer #2 (Public Review)

      A key strength is the quantitative approaches all add rigor to what is being attempted. The approach with very different cell culture lines will in principle help identify constitutive genes that vary in a particular and predictable way. To my knowledge, one other study that should be cited posed a similar pan-tissue question using mass spectrometry proteomics instead of gene expression, and also identified a caveolae component (cavin-1, PTRF) that exhibited a trend with stiffness across all sampled tissues. The study focused instead on a nuclear lamina protein that was also perturbed in vitro and shown to follow the expected mechanical trend (Swift et al 2013). 

      We thank Reviewer #2 for the positive evaluation of the breadth of the results and for pointing us to the relevant reference for the proteomic analysis related to tissue stiffness (Swift et al., 2013). This study, which focused primarily on tissue-level mechanical properties, identified PTRF, a caveolar component, which links to our observation of another caveolar component, CAV1, at the single-cell level.

      We have now included the citation in the following paragraph of the discussion:

      “To our knowledge, there are no prior studies that aim at identifying gene signatures associated with single-cell mechanical phenotype changes, in particular across different cell types. There are, however, several studies that investigated changes in expression upon exposure of specific cell types to mechanical stimuli such as compression (87, 88) or mechanical stretch (22, 80, 89), and one study that investigated difference in expression profiles between stiffer and softer cells sorted from the same population (90). Even though the studies concerned with response to mechanical stimuli answer a fundamentally different question (how gene expression changes upon exposure to external forces vs which genes are expressed in cells of different mechanical phenotype), we did observe some similarities in the identified genes. For example, in the differentially expressed genes identified in the lung epithelia exposed to compression (87), three genes from our module overlapped with the immediate response (CAV1, FHL2, TGLN) and four with the long-term one (CAV1, FHL2, TGLN, THBS1). We speculate that this substantial overlap is caused by the cells undergoing change in their stiffness during the response to compression (and concomitant unjamming transition). Another previous study explored the association between the stiffness of various tissues and their proteomes. Despite the focus on the tissue-scale rather than single-cell elasticity, the authors identified polymerase I and transcript release factor (PTRF, also known as cavin 1 and encoding for a structural component of the caveolae) as one of the proteins that scaled with tissue stiffness across samples (91).”

      Reviewer #3 (Public Review)

      In this work, Urbanska et al. link the mechanical phenotypes of human glioblastoma cell lines and murine iPSCs to their transcriptome, and using machine learning-based network analysis identify genes with putative roles in cell mechanics regulation. The authors identify 5 target genes whose transcription creates a combinatorial marker which can predict cell stiffness in human carcinoma and breast epithelium cell lines as well as in developing mouse neurons. For one of the target genes, caveolin1 (CAV1), the authors perform knockout, knockdown, overexpression and rescue experiments in human carcinoma and breast epithelium cell lines. They determine the cell stiffness via RT-DC, AFM indentation and AFM rheology and confirm that high CAV1 expression levels correlate with increased stiffness in those model systems. This work brings forward an interesting approach to identify novel genes in an unbiased manner, but surprisingly the authors validate caveolin 1, a target gene with known roles in cell mechanics regulation. 

      I have two main concerns with the current version of this work: 

      (1) The authors identify a network of 5 genes that can predict mechanics. What is the relationship between the 5 genes? If the authors aim to highlight the power of their approach by knockdown, knockout or over-expression of a single gene why choose CAV1 (which has an individual p-value of 0.16 in Fig S4)? To justify their choice, the authors claim that there is limited data supporting the direct impact of CAV1 on mechanical properties of cells but several studies have previously shown its role in for example zebrafish heart stiffness, where a knockout leads to higher stiffness (Grivas et al., Scientific Reports 2020), in cancer cells, where a knockdown leads to cell softening (Lin et al., Oncotarget 2015), or in endothelial cells, where a knockout leads to cell softening (Le Master et al., Scientific Reports 2022).

      We thank the reviewer for their comments. First, we do acknowledge that studying the relationship between the five identified genes is an intriguing question and would be a natural extension of the currently presented work. It is, however, beyond the scope of the presented manuscript, in which our primary goal was to introduce a general pipeline for de novo identification of genes related to cell mechanics. We added the following statement to the discussion (yellow highlight) to acknowledge the open ends of our study:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76).

      The increasing availability of transcriptional profiles accompanying cell state changes has recently been complemented by the ease of screening for mechanical phenotypes of cells thanks to the advent of high-throughput microfluidic methods (77). This provides an opportunity for data-driven identification of genes associated with the mechanical cell phenotype change in a hypothesis-free manner. Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      Regarding the selection of CAV1 as the gene that we used for the validation experiments: as mentioned in the introductory paragraph of the result section “Perturbing expression levels of CAV1 changes cells stiffness” (copied below), we were encouraged by the previous data already linking CAV1 with cell mechanics when selecting it as our first target. The relationship between CAV1 and cell mechanics regulation, however, is not very well established (of note, two of the latest manuscripts came out after the initial findings of our study).

      Regarding the citations suggested by the reviewer: two are already included in the original manuscript (Lin et al., Oncotarget 2015 – Ref (63); Le Master et al., 2022 – Ref (67)), along with an additional one (Hsu et al., 2018 (66)), and the third one (Grivas et al., 2020 (68)) is now also added to the manuscript. We would like to highlight, however, that although Grivas et al. state that the CAV1 KO cells are stiffer, the AFM indentation measurements were performed on cardiac tissue with a spherical tip of 30 μm radius and likely reflect primarily supracellular, tissue-scale properties, as opposed to the cell-scale measurements performed in our study (we used cultured cells, which mostly lack the extracellular tissue structures; deformability cytometry was performed on dissociated cells and picks up on cell properties exclusively; and in the case of AFM measurements a spherical tip with 5 μm radius was used).

      “We decided to focus our attention on CAV1 as a potential target for modulating mechanical properties of cells, as it has previously been linked to processes intertwined with cell mechanics. In the context of mechanosensing, CAV1 is known to facilitate buffering of the membrane tension (45), play a role in β1-integrin-dependent mechanotransduction (58) and modulate the mechanotransduction in response to substrate stiffness (59). CAV1 is also intimately linked with actin cytoskeleton: it was shown to be involved in cross-talk with Rho-signaling and actin cytoskeleton regulation (46, 60–62), filamin A-mediated interactions with actin filaments (63), and co-localization with peripheral actin (64). The evidence directly relating CAV1 levels with the mechanical properties of cells (47, 62, 65, 66) and tissues (66, 67) is only beginning to emerge.”

      Regarding the cited p-value of 0.16, we would like to clarify that it is the p-value associated with the coefficient of the crude linear regression model fitted to the data for illustrative purposes in Fig S4. This value only says that from the linear fit we cannot conclude much about the correlation of the level of Cav1 with the Young’s modulus change. Much more relevant parameters to look at are the AUC-ROC values and associated p-values reported in Table 4 in the main text (see below), which show good performance of CAV1 in separating soft and stiff cell states.

      The positive hypothesis I assumes that markers are discriminative of samples with stiff/soft mechanical phenotype regardless of the studied biological system, and CAV1 has a clear trend with the minimum AUC-ROC on 3 datasets of 0.78, even though the p-value is below the significance level. The positive hypothesis II assumes that markers are discriminative of samples with stiff/soft mechanical phenotype in carcinoma regardless of data source, and CAV1 has a clear significance because the minimum AUC-ROC on 3 datasets is 0.89 and the p-value is 0.02.
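
      To make the "minimum AUC-ROC across datasets" criterion concrete, the following is a minimal sketch, not the authors' code. It assumes the AUC-ROC is computed as the probability that a randomly chosen stiff sample has a higher marker value than a randomly chosen soft sample (ties counted as one half). The function names and all numbers are hypothetical and chosen only for illustration; the associated p-values are not computed here.

```python
def auc_roc(stiff_scores, soft_scores):
    """AUC-ROC as the fraction of (stiff, soft) pairs in which the stiff
    sample has the higher marker value; ties count as half a win."""
    wins = 0.0
    for s in stiff_scores:
        for t in soft_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(stiff_scores) * len(soft_scores))


def min_auc(datasets):
    """Worst-case separation of stiff vs. soft samples over several datasets."""
    return min(auc_roc(stiff, soft) for stiff, soft in datasets)


# Hypothetical marker values (stiff samples, soft samples) for two datasets.
example = [
    ([5.1, 6.0, 5.8], [4.2, 5.0, 4.9]),
    ([3.3, 2.9, 3.8, 3.5], [3.0, 2.5]),
]
print(min_auc(example))
```

      Reporting the minimum rather than the mean is a conservative choice: a marker only scores well if it separates the two mechanical states in every dataset considered.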

      (2) The authors do not show by how much PC-Corr outperforms classical co-expression network analysis or an alternative gold standard. It is worth noting that PC-Corr was previously published by the same authors to infer phenotype-associated functional network modules from omics datasets (Ciucci et al., Scientific Reports 2017).

      As pointed out by the Reviewer, PC-corr has been introduced and characterized in detail in a previous publication (Ciucci et al, 2017, Sci. Rep.), where it was compared against standard co-expression analysis (below reported as: p-value network) on molecules selected using univariate statistical analysis. 

      See the following fragment of Discussion in Ciucci et al, 2017:

      “The PC-corr networks were always compared to P-value networks. The first strategical difference lies in the way features are selected: while the PC-corr adopts a multivariate approach, i.e. it uses a combination of features that are responsible for the sample discrimination, in the P-value network the discriminating features are singly selected (one by one) with each Mann-Whitney test (followed by Benjamini-Hochberg procedure). The second strategical difference lies in the generation of the correlation weights in the network. PC-corr combines in parallel and at the same time in a unique formula the discrimination power of the PC-loadings and the association power of the Pearson correlation, directly providing in output discriminative omic associations. These are generated using a robust (because we use as merging factor the minimum operator, which is a very penalizing operator) mathematical trade-off between two important factors: multivariate discriminative significance and correlation association. In addition, as mentioned above, the minimum operator works as an AND logical gate in a digital circuit, therefore in order to have a high link weight in the PC-corr network, both the discrimination (the PC-loadings) and the association (the Pearson correlations) of the nodes adjacent to the link should be simultaneously high. Instead, the P-value procedure begins with the pre-selection of the significant omic features and, only in a second separated step, computes the associations between these features. Therefore, in P-value networks, the interaction weights are the result neither of multivariate discriminative significance, nor of a discrimination/association interplay.”
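
      The minimum-operator idea described in this quote can be sketched as follows. This is an illustrative sketch only, not the published implementation: it assumes the link weight is the minimum of the two absolute PC loadings and the absolute Pearson correlation, signed by the correlation, and it assumes the loadings are already scaled to be comparable with correlations. The helper names, gene labels, loadings and expression values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pc_corr_weight(v_i, v_j, c_ij):
    """Assumed form: sign(c_ij) * min(|V_i|, |V_j|, |c_ij|).
    The minimum acts like a logical AND: the weight is high only if both
    loadings and the correlation are high in absolute value."""
    return math.copysign(min(abs(v_i), abs(v_j), abs(c_ij)), c_ij)


def pc_corr_network(expression, loadings, cutoff):
    """Keep gene pairs whose absolute weight reaches the cutoff."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        c = pearson(expression[g1], expression[g2])
        w = pc_corr_weight(loadings[g1], loadings[g2], c)
        if abs(w) >= cutoff:
            edges[(g1, g2)] = w
    return edges


# Hypothetical expression profiles (one value per sample) and PC loadings.
expr = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 3, 2, 4]}
loadings = {"g1": 0.9, "g2": -0.8, "g3": 0.1}
edges = pc_corr_network(expr, loadings, cutoff=0.5)
print(edges)
```

      In this toy example g3 correlates strongly with g1 but has a small loading, so its links are removed; only the g1-g2 link, where discrimination and association are both high, survives the cutoff.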

      Here we implement PC-corr for a particular application and do not see it as central to the message of the present manuscript to compare it with other available methods. We considered it much more relevant to focus on an in-silico validation on datasets not used during the PC-corr analysis (see Tables 3 and 4 for details).

      Altogether, the authors provide an interesting approach to identify novel genes associated with cell mechanics changes, but the current version does not fulfill such potential by focusing on a single gene with known roles in cell mechanics. 

      Our manuscript presents a demonstration of an overall approach for the identification of genes involved in the regulation of cell mechanics, and the perturbations performed on CAV1 have a demonstrative role (please also refer to the explanations of why we decided to perform the verification focused on CAV1 above). The fact that we identify CAV1, which has been implicated in regulating cell mechanics in a handful of studies, de novo and in an unbiased way speaks to the power of our approach. We do agree that investigation into the effect of manipulating the expression of the remaining genes from the identified network module, as well as into the mutual relationships between those genes and their covariance in perturbation experiments, constitutes a desirable follow-up on the presented results. It is, however, beyond the scope of the current manuscript. Regardless, the other genes identified can be readily tested in systems of interest and used as potential knobs for tuning mechanical properties on demand.

      Reviewer #1 (Recommendations For Authors)

      I am not a specialist of the bio-informatics methods used in this study, so I will not make any specific technical comments on them. 

      In terms of mechanical characterisation of cells, the authors use well established methods and the fact that they systematically validate their findings with at least two independent methods (RT-DC and AFM for example) makes them very robust. So I have no concerns with this part. The experiments of perturbations of CAV1 are also performed to the best standards and the results are clear, no concern on that.

      My main concerns are rather questions I was asking myself and could not answer when reading the article. Maybe the authors could find ways to clarify them - the discussion of their article is already very long and maybe it should not be lengthened too much. In my opinion, some of the points discussed are not really essential and rather redundant with other parts of the paper. This could be improved to give some space to clarify some of the points below:

      We thank the Reviewer #1 for an overall positive evaluation of the manuscript as well as the points of criticism which we addressed in a point-by-point manner below.

      (1) This might be a misunderstanding of the method on my side, but I was wondering whether it is possible to proceed through the same steps but choose other pairs of training datasets amongst the 5 systems available (there are 10 such pairs if I am not mistaken) and ask whether they always give the same set of 5 genes. And if not, are the other sets also then predictive, robust, etc. Or is it that there are 'better' pairs than others in this respect. Or the set of 5 genes is the only one that could be found amongst these 5 datasets - and then could it imply that it is the only 'universal' group of predictive genes for cell mechanics (when applied to any other dataset comprising similar mechanical measures and expression profiles, for other cells, other conditions)?

      I apologize in case this question is just the result of a basic misunderstanding of the method on my side. But I could not answer the question myself based on what is in the article and it seems to be important to understand the significance of the finding and the robustness of the method. 

      We thank the Reviewer for this question. To clarify: while in general it is possible to proceed through the same analysis steps choosing a different pair of datasets (see below for examples), we have purposefully chosen those two and not any other datasets because they encompassed the highest number of samples per condition in the RNAseq data (see Fig 4 and Table R1 below), originated from two different species and concerned least related tissues (the other option for mouse would be neural progenitors which in combination with the glioblastoma would likely result in focusing on genes expressed in neural tissues). This is briefly explained in the following fragment of the manuscript on Page 10:

      “For the network construction, we chose two datasets that originate from different species, concern unrelated biological processes, and have a high number of samples included in the transcriptional analysis: human glioblastoma and murine iPSCs (Table 1).”

      To further address the comment of the reviewer: there is indeed a total of 10 possible two-set combinations of datasets, 6 of those pairs are human-mouse combinations (highlighted in orange in Author response Table 1), 3 are human-human combinations (highlighted in blue), and 1 is mouse-mouse (marked in green).
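
      The pair counts can be checked with a short enumeration. In this sketch the set labels A to E follow the response, and the species assignment (A, B, C human; D, E mouse) is inferred from the combinations listed in the response rather than stated in a table here; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Species per dataset label; assignment inferred from the text of the response.
species = {"A": "human", "B": "human", "C": "human", "D": "mouse", "E": "mouse"}


def classify_pairs(species_by_set):
    """Count unordered dataset pairs by their species combination."""
    counts = Counter()
    for a, b in combinations(sorted(species_by_set), 2):
        kind = "-".join(sorted((species_by_set[a], species_by_set[b])))
        counts[kind] += 1
    return counts


counts = classify_pairs(species)
print(sum(counts.values()), dict(counts))
```

      With three human and two mouse datasets this gives 3 × 2 = 6 cross-species pairs, 3 human-human pairs and 1 mouse-mouse pair, 10 in total.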

      Author response table 1.

      Possible two-set combinations of datasets. For each combination, the number of common genes is indicated. The number on the diagonal represents total number of transcripts in the individual datasets, n corresponds to the number of samples in the respective datasets.  * include non-coding genes.

      To reiterate, we have chosen the combination of set A (glioblastoma) and set D (iPSCs) in order to use datasets from different species and with the highest sample numbers.

      As for the other combinations of human-mouse datasets:

      • set A & E lead to the derivation of a conserved module; however, as expected, this module includes genes specific for neuronal tissues (such as brain & testis specific immunoglobulin IGSF11, or genes involved in neuronal development such as RFX4, SOX8)

      Author response image 1.

      • the remaining combinations (set B&D, B&E, C&D and C&E) do not lead to a derivation of a highly interconnected module

      Author response image 2.

      Author response image 3.

      Author response image 4.

      Author response image 5.

      Finally, it would have also been possible to perform the combined PC-corr procedure on all 5 datasets. However, this would prevent us from doing validation using unknown datasets.

      Hence, we decided to proceed with the 2 discovery and 4 validation datasets.

      For the sake of completeness, we present below some of the networks obtained from the analysis performed on all 5 datasets (which intersect at 8059 genes).

      Author response image 6.

      The above network was created by calculating mean/minimum PC-corr among all five datasets and applying the threshold. The thresholding can be additionally restricted in that we:

      a. constrain the directionality of the correlation between the genes (𝑠𝑔𝑛(𝑐) ) to be the same among all or at least n datasets

      b. constrain the directionality of the correlation between the cell stiffness and gene expression level (𝑠𝑔𝑛(𝑉)) for individual genes.
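
      The multi-dataset thresholding with restriction (a) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the networks shown here: it assumes that a signed PC-corr matrix has already been computed for each dataset over the common genes, that the mean or minimum is taken over absolute values, and that the retained edge carries the majority sign. The function name and parameters are hypothetical, and restriction (b) on 𝑠𝑔𝑛(𝑉) is not implemented.

```python
import numpy as np


def combine_pc_corr(pc_corr_stack, threshold, reduce="min", min_same_sign=0):
    """Combine signed PC-corr matrices from several datasets into one network.

    pc_corr_stack: array of shape (n_datasets, n_genes, n_genes).
    reduce: "min" or "mean" of |PC-corr| across datasets (assumed convention).
    min_same_sign: minimum number of datasets that must agree on the sign of
        an edge; 0 disables the constraint.
    Returns an (n_genes, n_genes) matrix holding signed weights for retained
    edges and 0 elsewhere.
    """
    stack = np.asarray(pc_corr_stack, dtype=float)
    magnitude = np.abs(stack)
    if reduce == "min":
        combined = magnitude.min(axis=0)
    elif reduce == "mean":
        combined = magnitude.mean(axis=0)
    else:
        raise ValueError("reduce must be 'min' or 'mean'")

    # Count in how many datasets each edge is positive or negative.
    n_pos = (stack > 0).sum(axis=0)
    n_neg = (stack < 0).sum(axis=0)
    sign_ok = np.maximum(n_pos, n_neg) >= min_same_sign

    keep = (combined >= threshold) & sign_ok
    np.fill_diagonal(keep, False)  # no self-edges
    majority_sign = np.where(n_pos >= n_neg, 1.0, -1.0)
    return np.where(keep, majority_sign * combined, 0.0)


# Toy input: two datasets, three genes (synthetic values, not study data).
toy = np.array([
    [[0.0, 0.8, 0.6], [0.8, 0.0, -0.7], [0.6, -0.7, 0.0]],
    [[0.0, 0.9, -0.65], [0.9, 0.0, -0.75], [-0.65, -0.75, 0.0]],
])
network = combine_pc_corr(toy, threshold=0.5, reduce="min", min_same_sign=2)
print(network)
```

      In this toy input, the edge between the first and third gene changes sign between the two datasets, so it is dropped once sign agreement in both datasets is required, even though its magnitude passes the threshold.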

      Some of the resulting networks for such restrictions are presented below.

      Author response image 7.

      Author response image 8.

      Of note, some of the nodes from the original network presented in the paper (CAV1, FHL2, and IGFBP7) are preserved in the 5-set network (highlighted with blue rims).

      (2) The authors already use several types of mechanical characterisation of the cells, but there are even more of them, in particular, some that might not directly correspond to global cell stiffness but to other aspects, like traction forces, or cell cortex rheology, or cell volume or passage time through constrictions (active or passive) - they might all be in one way or another related, but they are a priori independent measures. Would the authors anticipate finding very different 'universal modules' for these other mechanical properties, or again the same one? Is there a way to get at least a hint based on some published characterisations for the cells used in the study? Basically, the question is whether the gene set identified is specific for a precise type of mechanical property of the cell, or is more generally related to cell mechanics modulation - maybe, as suggested by the authors because it is a set of molecular knobs acting upstream of general mechanics effectors like YAP/TAZ or acto-myosin?

      We thank the Reviewer for this comment. We would like to first note that in our study, we focused on single-cell mechanical phenotype understood as a response of the cells to deformation at a global (RT-DC) or semi-local (AFM indentation with 5-μm bead) level and comparatively low deformations (1-3 μm, see Table S9). There is of course a variety of other methods for measuring cell mechanics and mechanics-related features, such as traction force microscopy mentioned by the reviewer. However, traction force microscopy probes how the cells apply forces and interact with their environment rather than the inherent mechanical properties of the cells themselves, which were the main interest of our study.

      Nevertheless, as mentioned in the discussion, we found some overlap with the genes identified in other mechanical contexts, for example in the context of mechanical stretching of cells:

      “Furthermore, CAV1 is known to modulate the activation of transcriptional cofactor yes-associated protein, YAP, in response to changes in stiffness of cell substrate (60) and in the mechanical stretch-induced mesothelial to mesenchymal transition (74).”

      This suggests that the genes identified here may be more broadly related to mechanical aspects of cells.

      Of note, we do have some insights connected to the changes of cell volume — one of the biophysical properties mentioned by the reviewer — from our experiments.  For all measurements performed with RT-DC, we can also calculate cell volumes from 2D cell contours (see Author response images 9, 10, and 11). For most of the cases (all apart from MEF CAV1KO), the stiffer phenotype of the cells, associated with higher levels of CAV1, shows a higher volume.

      Author response image 9.

      Cell volumes for the divergent cell states in the five characterized biological systems. (A) Glioblastoma. (B) Carcinoma. (C) MCF10A. (D) iPSCs. (E) Developing neurons. Data corresponds to Figure 2. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.

      Author response image 10.

      Cell volumes for CAV1 perturbation experiments. (A) CAV1 knock down performed in TGBC cells. (B) CAV1 overexpression in ECC4 and TGBC cells. Data corresponds to Figure 5. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.  

      Author response image 11.

      Cell volumes for WT and CAV1KO MEFs. Data corresponds to Figure S9. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.  

      (3) The authors have already tested a large number of conditions in which perturbations of the level of expression of CAV1 correlates with changes in cell mechanics, but I was wondering whether it also has some direct explanatory value for the initial datasets used - for example for the glioblastoma cells from Figure 2, in the different media, would a knock-down of CAV1 prevent the increase in stiffness observed upon addition of serum, or for the carcinoma cells from different tissues treated with different compounds - if I understand well, the authors have tested a subset of these (ECC4 versus TGBC in figure 5) - how did they choose these and how general is it that the mechanical phenotype changes reported in Figure 2 are all mostly dependent on CAV1 expression level? I must say that the way the text is written and the results shown, it is hard to tell whether CAV1 is really having a dominant effect on cell mechanics in most of these contexts or only a partial effect. I hope I am being clear in my question - I am not questioning the conclusions of Figures 5 and 6, but asking whether the level of expression of CAV1, in the datasets reported in Figure 2, is the dominant explanatory feature for the differences in cell mechanics.

      We thank the reviewer for this comment and appreciate the value of the question about the generality and dominance of CAV1 in influencing cell mechanics.

      On the computational side, we have addressed these issues by looking at the performance of CAV1 (among other identified genes) in classifying soft and stiff phenotypes across biological systems (positive hypothesis I), as well as across data of different type (sequencing vs microarray data) and origin (different research institutions) (positive hypothesis II). CAV1 showed strong classification performance (Table 4), suggesting it is a general marker of stiffness changes.  

      On the experimental side, we conducted the perturbation experiments in two systems of choice: two intestinal carcinoma cell lines (ECC4 and TGBC) and the MCF10A breast epithelial cell line. These choices were driven by ease of handling, accessibility, as well as (for MCF10A) connection with a former study (Tavares et al, 2017). While we observed correlations between CAV1 expression and cell mechanics in a wide range of datasets, the precise role of CAV1 in each system may vary, and further perturbation experiments in specific systems could be performed to solidify the direct/dominant role of CAV1 in cell mechanics. We hypothesize that the suggested knockdown of CAV1 upon serum addition in glioblastoma cells could reduce or prevent the increase in stiffness observed, though this experiment has not been performed.

      In conclusion, while the computational analysis gives us confidence that CAV1 is a good indicator of cell stiffness, we predict that it acts in concert with other genes and in specific contexts could be replaced by other changes. We suggest that the suitability of CAV1 for manipulation of the mechanical properties should be tested in each system of interest before use.

      To highlight the fact that the relevance of CAV1 for modulating cell mechanics in specific systems of interest should be tested and the mechanistic insights into how CAV1 regulates cell mechanics are still missing, we have added the following sentence in the discussion:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76). The increasing availability of transcriptional profiles accompanying cell state changes has recently been complemented by the ease of screening for mechanical phenotypes of cells thanks to the advent of high-throughput microfluidic methods (77). This provides an opportunity for data-driven identification of genes associated with the mechanical cell phenotype change in a hypothesis-free manner. Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      (4) It would be nice that the authors try to more directly address, in their discussion, what is the biological meaning of the set of 5 genes that they found - is it really mostly a product of the methodology used, useful but with little specific relevance to any biology, or does it have a deeper meaning? Either at a system level, or at an evolutionary level. 

      We would like to highlight that our manuscript is focused on the method that we introduce to identify sets of genes involved in the regulation of cell mechanics. The first implementation included here is only the beginning of this line of work which, in the future, will include looking in detail at the biological meaning and the interconnectivity of the genes identified. Most likely, there is a deeper meaning of the identified module which could be revealed with a lot of dedicated future work. As this is mere speculation at this point, we would like to refrain from going into more detail about it in the current manuscript. We provide below a few words of extended explanation and additional analysis that can shed light on the current limited knowledge of the connections between the genes and evolutionary preservation of the genes.

      While it is difficult to prove at present, we do believe that the identified node of genes may have an actual biological meaning and is not a mere product of the used methodology. The PC-corr score used for applying the threshold and obtaining the gene network is high only if the Pearson’s correlation between the two genes is high, meaning that the highly connected module of genes identified shows correlated expression and is likely co-regulated. Additionally, we performed the GO Term analysis using DAVID to assess the connections between the genes (Figure S3). We have now performed an additional analysis using two orthogonal tools: the functional protein association tool STRING and KEGG Mapper.
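
      To illustrate why a high PC-corr score requires a high Pearson correlation, the sketch below implements the pairwise score as we understand it from Ciucci et al. (2017): the sign of the correlation multiplied by the minimum of the two absolute gene loadings and the absolute correlation. It assumes that the loadings V of the discriminating principal component have already been normalised to the range [-1, 1]; that normalisation step is omitted, and the function name and toy values are illustrative only.

```python
import numpy as np


def pc_corr(expression, loadings):
    """Pairwise PC-corr scores.

    expression: array (n_samples, n_genes) of expression values.
    loadings: array (n_genes,) of normalised PC loadings V (assumed in [-1, 1]).
    Returns an (n_genes, n_genes) matrix with
    sgn(c_ij) * min(|V_i|, |V_j|, |c_ij|) off the diagonal and 0 on it.
    """
    c = np.corrcoef(expression, rowvar=False)   # gene-gene Pearson correlation
    v = np.abs(np.asarray(loadings, dtype=float))
    v_pair = np.minimum.outer(v, v)             # min(|V_i|, |V_j|)
    score = np.sign(c) * np.minimum(v_pair, np.abs(c))
    np.fill_diagonal(score, 0.0)
    return score


# Synthetic toy data: genes 0 and 1 are perfectly anti-correlated,
# gene 2 is uncorrelated with both despite a high loading.
expression = np.array([
    [1.0, 4.0, 1.0],
    [2.0, 3.0, -1.0],
    [3.0, 2.0, -1.0],
    [4.0, 1.0, 1.0],
])
loadings = np.array([0.9, 0.8, 0.9])
scores = pc_corr(expression, loadings)
print(np.round(scores, 3))
```

      Because the minimum is taken over the loadings and the correlation, gene 2 receives scores close to zero in this toy example even though its loading is as high as that of gene 0.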

      With STRING, we found a moderate connectivity using the five network nodes identified in our study, and many of the obtained connections were based on text mining and co-expression, rather than direct experimental evidence (Author response image 12A). A more connected network can be obtained by allowing STRING to introduce further nodes (Author response image 12B). Interestingly, some of the nodes included by STRING in the extended network are nodes identified with milder PC-corr thresholds in our study (such as CNN2 or IGFBP3, see Table S3).

      With KEGG Mapper, we did not find an obvious pathway-based clustering of the genes from the module either. A maximum of two genes were assigned to one pathway and those included: 

      • focal adhesions (pathway hsa04510): CAV1 and THBS1

      • cytoskeleton in muscle cells (pathway hsa04820): FHL2 and THBS1

      • proteoglycans in cancer (pathway hsa05205): CAV1 and THBS1.

      As for the BRITE hierarchy, the following classification was found:

      • membrane trafficking (hsa04131): CAV1, IGFBP7, TAGLN, THBS1, with the following subcategories:

      - endocytosis / lipid raft mediated endocytosis / caveolin-mediated endocytosis: CAV1

      - endocytosis / phagocytosis / opsonins: THBS1

      - endocytosis / others / insulin-like growth factor-binding proteins: IGFBP7

      - others / actin-binding proteins / others: TAGLN.

      Taken together, all these analyses (DAVID, STRING, KEGG) show that at present no direct relationship or single pathway can be found that integrates all the genes from the identified module. Future experiments, including investigations of how other module nodes are affected when one of the genes is manipulated, will help to establish actual physical or regulatory interactions between the genes from our module.

      To touch upon the evolutionary perspective, we provide an overview of occurrence of the genes from the identified module across the evolutionary tree. This overview shows that the five identified genes are preserved in phylum Chordata with quite high sequence similarity, and even more so within mammals (Author response image 13).

      Author response image 12.

      Visualisation of interactions between the nodes in the identified module using functional protein association networks tool STRING. (A) Connections obtained using multiple proteins search and entering the five network nodes. (B) Extended network that includes further genes to increase indirect connectivity. The genes are added automatically by STRING. Online version of STRING v12.0 was used with Homo sapiens as species of interest.   

      Author response image 13.

      Co-occurrence of genes from the network module across the evolutionary tree. Mammals are indicated with the green frame, glires (include mouse), as well as primates (include human) are indicated with yellow frames. The view was generated using online version of STRING 12.0.

      Reviewer #2 (Recommendations For Authors) 

      (1) The authors need to discuss the level of sensitivity of their mechanical measurements with RT-DC for changes to the membrane compared to changes in microtubules, nucleus, etc. The limited AFM measurements also seem membrane/cortex focused. For these and further reasons below, "universal" doesn't seem appropriate in the title or abstract, and should be deleted. 

      We thank the reviewer for this comment. Indeed, RT-DC is a technique that deforms the entire cell to a relatively low degree (inducing ca 17% mean strain, i.e. a deformation of approximately 2.5 µm on a cell with a 15 µm diameter, see Table S9 and Urbanska et al., Nat Methods 2020). Similarly, the AFM indentation experiments performed in this study (using a 5-µm diameter colloidal probe and 1 µm indentation) induce low strains, at which, according to current knowledge, the actin cortex dominates the measured deformations. However, other cellular components, including the membrane, microtubules, intermediate filaments, nucleus, other organelles, and cytoplasmic packing, can also contribute. We have reviewed these contributions in detail in a recent publication (Urbanska and Guck, 2024, Ann Rev Biophys., PMID 38382116). For a particular system, it is hard to speculate without further investigation which parts of the cell have a dominant effect on the measured deformability. We have now added the following paragraph in the discussion to include this information:

      “The mechanical phenotype of single cells is a global readout of cell’s resistance to deformation that integrates contributions from all cellular components. The two techniques implemented for measuring cell mechanical in this study — RT-DC and AFM indentation using a spherical indenter with 5 µm radius — exert comparatively low strain on cells (< 3 µm, see Table S9), at which the actin cortex is believed to dominate the measured response. However, other cellular components, including the membrane, microtubules, intermediate filaments, nucleus, other organelles, and cytoplasmic packing, also contribute to the measured deformations (reviewed in detail in (79)) and, for a particular system, it is hard to speculate without further investigation which parts of the cell have a dominant effect on the measured deformability.”

      The key strength of measuring the global mechanics is that such measurements are agnostic of the specific origin of the resistance to shape change. As such, the term “universal” could be seen as rather appropriate, as we are not testing specific contributions to cell mechanics, and we see the two methods used (RT-DC and AFM indentation) as representative when it comes to measuring global cell mechanics. We have also highlighted many times throughout the text that we are measuring the global single-cell mechanical phenotype.

      Most importantly, however, we have used the term “universal” to capture that the genes are preserved across different systems and species, not in relation to the type of mechanical measurements performed and as such we would like to retain the term in the title.

      (2) Fig.2 cartoons of tissues is a good idea to quickly illustrate the range of cell culture lines studied. However, it obligates the authors to examine the relevant primary cell types in single-cell RNAseq of human and/or mouse tissues (e.g. Tabula Muris). They need to show CAV1 is expressed in glioblastoma, iPSCs, etc and not a cell culture artifact. CAV1 and the other genes also need to be plotted with literature values of tissue stiffness.

      We thank the reviewer for this comment; however, we do believe that the cartoons in Figure 2 should assist the reader to readily understand whether cultured cells derived from the respective tissues were used (see cartoons representing dishes), or the cells directly isolated from the tissue were measured (this is the case for the developing neurons dataset).

      We did, however, follow the suggestion of the reviewer to use available resources and checked the expression of genes from the identified network module across various tissues in mouse and human. We first used the Mouse Genome Informatics (MGI; https://www.informatics.jax.org/) to visualize the expression of the genes across organs and organ systems (Author response image 14) as well as across more specific tissue structures (Author response image 15). These two figures show that the five identified genes are expressed quite broadly in mouse. We next looked at the expression of the five genes in the scRNASeq dataset from Tabula Muris (Author response image 16). Here, the expression of respective genes seemed more restricted to specific cell clusters. Finally, we also collected the cross-tissue expression of the genes from our module in human tissues from Human Protein Atlas v23 at both mRNA (Author response image 17) and protein (Author response image 18) levels. CAV1, IGFBP7, and THBS1 showed low tissue specificity at mRNA level, FHL2 was enriched in heart muscle and ovary (the heart enrichment is also visible in Author response image 15 for mouse) and TAGLN in endometrium and intestine. Interestingly, the expression at the protein level (Author response image 18) did not seem to follow faithfully the mRNA levels (Author response image 17). Overall, we conclude that the identified genes are expressed quite broadly across mouse and human tissues. 

      Author response image 14.

      Expression of genes from the identified module across various organs and organ systems in mouse. The expression matrices for organs (A) and organ systems (B) were generated using Tissue x Gene Matrix tool of Gene eXpression Database (https://www.informatics.jax.org/gxd/, accessed on 22nd September 2024). No pre-selection of stage (age) and assay type (includes RNA and protein-based assays) was applied. The colors in the grid (blues for expression detected and reds for expression not detected) get progressively darker when there are more supporting annotations. The darker colors do not denote higher or lower levels of expression, just more evidence.

      Author response image 15.

      Expression of genes from the identified module across various mouse tissue structures. The expression matrices for age-selected mouse marked as adult (A) or young individuals (collected ages labelled P42-84 / P w6-w12 / P m1.5-3.0) (B) are presented and were generated using RNASeq Heatmap tool of Gene eXpression Database (https://www.informatics.jax.org/gxd/, accessed on 2nd October 2024).

      Author response image 16.

      Expression of genes from the identified module across various cell types and organs in t-SNE embedding of Tabula Muris dataset. (A) t-SNE clustering color-coded by organ. (B-F) t-SNE clustering color-coded for expression of CAV1 (B), IGFBP7 (C), FHL2 (D), TAGLN (E), and THBS1 (F). The plots were generated using FACS-collected cells data through the visualisation tool available at https://tabulamuris.sf.czbiohub.org/ (accessed on 22nd September 2024).

      Author response image 17.

      Expression of genes from the identified module at the mRNA level across various human tissues. (A-E) Expression levels of CAV1 (A), IGFBP7 (B), FHL2 (C), TAGLN (D), and THBS1 (E). The plots were generated using consensus dataset from Human Protein Atlas v23 https://www.proteinatlas.org/ (accessed on 22nd September 2024).

      Author response image 18.

      Protein levels of genes from the identified module across various human tissues. (A-E) Protein levels of CAV1 (A), IGFBP7 (B), FHL2 (C), TAGLN (D), and THBS1 (E). The plots were generated using Human Protein Atlas v23 https://www.proteinatlas.org/ (accessed on 22nd September 2024).

      Regarding literature values and tissue stiffness, we would like to argue that cell stiffness is not equivalent to tissue stiffness, and we are interested in the former. Tissue stiffness is governed by a combination of cell mechanical properties, cell adhesions, packing and the extracellular matrix. There can be, in fact, mechanically distinct cell types (for example characterized by different metabolic state, malignancy level etc) within one tissue of given stiffness. Hence, we consider that testing for the correlation between tissue stiffness and expression of identified genes is not immediately relevant.

      (3) Fig.5D,H show important time-dependent mechanics that need to be used to provide explanations of the differences in RT-DC (5B,F) and in standard AFM indentation expts (5C,G). In particular, it looks to me that RT-DC is a high-f/short-time measurement compared to the AFM indentation, and an additional Main or Supp Fig needs to somehow combine all of this data to clarify this issue. 

      We thank the reviewer for this comment. It is indeed the case that cells typically display higher stiffness when probed at higher rates. We have now expanded on this aspect of the results and added a supplementary figure (Fig. S10) that illustrates the frequencies used in different methods and summarizes the apparent Young’s moduli values in one plot in a frequency-ordered manner. Of note, we typically acquire RT-DC measurements at up to three flow rates, and the increase in measurement frequency accompanying the increase in flow rate also results in higher extracted apparent Young’s moduli (see Fig. S10 B,D). We have further added Table S9, which summarizes the operating parameters of all three methods used for probing cell mechanics in this manuscript:

      “The three techniques for characterizing mechanical properties of cells — RT-DC, AFM indentation and AFM microrheology — differ in several aspects (summarized in Table S9), most notably in the frequency at which the force is applied to cells during the measurements, with RT-DC operating at the highest frequency (~600 Hz), AFM microrheology at a range of frequencies in-between (3–200 Hz), and AFM indentation operating at lowest frequency (5 Hz) (see Table S9 and Figure S10A). Even though the apparent Young’s moduli obtained for TGBCS cells were consistently higher than those for ECC4 cells across all three methods, the absolute values measured for a given cell line varied depending on the methods: RT-DC measurements yielded higher apparent Young’s moduli compared to AFM indentation, while the apparent Young’s moduli derived from AFM microrheology measurements were frequency-dependent and fell between the other two methods (Fig. 5B–D, Fig. S10B). The observed increase in apparent Young’s modulus with probing frequency aligns with previous findings on cell stiffening with increased probing rates observed for both AFM indentation (68, 69) and microrheology assays (70–72).”

      (4) The plots in Fig.S4 are important as main Figs, particularly given the cartoons of different tissues in Fig.1,2. However, positive correlations for a few genes (CAV1, IGFBP7, TAGLN) are most clear for the multiple lineages that are the same (stomach) or similar (gli, neural & pluri). The authors need to add green lines and pink lines in all plots to indicate the 'lineage-specific' correlations, and provide measures where possible. Some genes clearly don't show the same trends and should be discussed.

      We thank the reviewer for this comment. It is indeed an interesting observation (and worth highlighting by adding the fits to lineage-restricted data) that the relationship between relative change in Young’s modulus and the selected gene expression becomes steeper for samples from similar tissue contexts.

      For the sake of keeping the main manuscript compact, we decided to keep Fig. S7 (formerly Fig. S4) in the supplement, however, we did add the linear fit to the glioblastoma dataset (pink line) and a fit to the related neural/embryonic datasets (gli, neural & pluri – purple line) as advised — see below.

      We did not pool the stomach data since it is represented by a single point in the figure, aligning with how the data is presented in the main text—stomach adenocarcinoma cell lines (MKN1 and MKN45) are pooled in Fig. 1B (see below).
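
      A lineage-restricted linear fit of the kind described above can be sketched as follows. This is a minimal illustration on synthetic values (not data from the study), assuming that the expression fold-change is placed on the x-axis and the relative change in Young’s modulus on the y-axis; the array names and lineage labels are hypothetical.

```python
import numpy as np

# Synthetic, illustrative values only (not data from the study).
log2_fold_change = np.array([0.5, 1.0, 2.0, 1.0, 3.0, 2.5])
rel_modulus_change = np.array([0.3, 0.6, 1.2, 0.1, 0.4, 0.2])
lineage = np.array(["neural", "neural", "neural", "other", "other", "other"])


def fit_line(x, y):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Fit to all samples, then to the lineage-restricted subset.
slope_all, intercept_all = fit_line(log2_fold_change, rel_modulus_change)
mask = lineage == "neural"
slope_neural, intercept_neural = fit_line(
    log2_fold_change[mask], rel_modulus_change[mask]
)

print(f"all samples: slope = {slope_all:.3f}")
print(f"neural only: slope = {slope_neural:.3f}")
```

      In this synthetic example the subset fit is steeper than the pooled fit, which mirrors the qualitative pattern discussed above; it says nothing about the actual magnitudes in Fig. S7.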

      We have also amended the respective results section to emphasize that, in certain instances, the correlation between changes in mechanical phenotype and alterations in the expression of analysed genes may be less pronounced:

      “The relation between normalized apparent Young’s modulus change and fold-change in the expression of the target genes is presented in Fig. S7. The direction of changes in the expression levels between the soft and stiff cell states in the validation datasets was not always following the same direction (Fig. 4, C to F, Fig. S7). This suggests that the genes associated with cell mechanics may not have a monotonic relationship with cell stiffness, but rather are characterized by different expression regimes in which the expression change in opposite directions can have the same effect on cell stiffness. Additionally, in specific cases a relatively high change in Young’s modulus did not correspond to marked expression changes of a given gene — see for example low CAV1 changes observed in MCF10A PIK3CA mutant (Fig. S7A), or low IGFBP7 changes in intestine and lung carcinoma samples (Fig. S7C). This indicates that the importance of specific targets for the mechanical phenotype change may vary depending on the origin of the sample.”

      (5) Table-1 neuro: Perhaps I missed the use of the AFM measurements, but these need to be included more clearly in the Results somewhere. 

      To clarify: there were no AFM measurements performed for the developing neurons (neuro) dataset, and it is not marked as such in Table 1. There are previously published AFM measurements for the iPSCs dataset (maybe that caused the confusion?), and we referred to them as such in the table by citing the source (Urbanska et al (30)) as opposed to the statement “this paper” (see the last column of Table 1). We did not consider it necessary to include these previously published data. We have added horizontal lines to the table, which will hopefully improve its readability.

      Reviewer #3 (For Authors) 

      Major 

      -  I strongly encourage the authors to validate their approach with a gene for which mechanical data does not exist yet, or explore how the combination of the 5 identified genes is the novel regulator of cell mechanics. 

      We appreciate the reviewer’s insightful comment and agree that it would be highly interesting to validate further targets and perform combinatorial perturbations. However, it is not feasible at this point to expand the experimental data beyond the one already provided. We hope that in the future, the collective effort of the cell mechanics community will establish more genes that can be used for tuning of mechanical properties of cells.

      - If this paper aims at highlighting the power of PC-Corr as a novel inference approach, the authors should compare its predictive power to that of classical co-expression network analysis or an alternative gold standard. 

      We thank the reviewer for the suggestion to compare the predictive power of PC-Corr with classical co-expression network analysis or an alternative gold standard. PC-corr has been introduced and characterized in detail in a previous publication (Ciucci et al, 2017, Sci. Rep.), where it was compared against standard co-expression analysis methods. Here we implement PC-corr for a particular application. Thus, we do not see it as central to the message of the present manuscript to compare it with other available methods again.

      - The authors call their 5 identified genes "universal, trustworthy and specific". While they provide a great amount of data all is derived from human and mouse cell lines. I suggest toning this down. 

      We thank the reviewer for this comment. To clarify, the terms universal, trustworthy and specific are based on the specific hypotheses tested in the validation part of the manuscript, but we understand that they may cause confusion. We have now toned down the statement by adding “universal, trustworthy and specific across the studied mouse and human systems” in the following text fragments:

      (1) Abstract

      “(…) We validate in silico that the identified gene markers are universal, trustworthy and specific to the mechanical phenotype across the studied mouse and human systems, and demonstrate experimentally that a selected target, CAV1, changes the mechanical phenotype of cells accordingly when silenced or overexpressed. (...)”

      (2) Last paragraph of the introduction

      “(…) We then test the ability of each gene to classify cell states according to cell stiffness in silico on six further transcriptomic datasets and show that the individual genes, as well as their compression into a combinatorial marker, are universally, specifically and trustworthily associated with the mechanical phenotype across the studied mouse and human systems. (…)”

      (3) First paragraph of the discussion

      “We provided strong evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems.”

      Minor suggestions 

      -  The authors point out how genes that regulate mechanics often display non-monotonic relations with their mechanical outcome. Indeed, in Fig.4 developing neurons have lower CAV1 in the stiff group. Perturbing CAV1 expression in that model could show the nonmonotonic relation and strengthen their claim. 

      We thank the reviewer for highlighting this important point. It would indeed be interesting to explore the changes in cell stiffness upon perturbation of CAV1 in a system that has the potential to show an opposing behavior. Unfortunately, we are unable to expand the experimental part of the manuscript at this time. We do hope that this point can be addressed in future research, either by our team or other researchers in the field. 

      -  In their gene ontology enrichment assay, the authors claim that their results point towards reduced transcriptional activity and reduced growth/proliferation in stiff compared to soft cells. Proving this with a simple proliferation assay would be a nice addition to the paper. 

      This is a valuable suggestion that should be followed up on in detail in the future. To give a preliminary insight into this line of investigation, we have looked at the cell count data for the CAV1 knock-down experiments in TGBC cells. Since CAV1 is associated with the GO Term “negative regulation of proliferation/transcription” (high CAV1 – low proliferation), we would expect that lowering the levels of CAV1 results in increased proliferation and higher cell counts at the end of the experiment (3 days post transfection). As illustrated in Author response image 19 below, the cell counts were higher for the samples treated with CAV1 siRNAs, though not in a statistically significant way. Interestingly, the magnitude of the effect partially mirrored the trends observed for the cell stiffness (Figure 5F).

      Author response image 19.

      The impact of CAV1 knock down on cell counts in TGBC cells. (A) Absolute cell counts per condition in a 6-well format. Cell counts were performed when harvesting for RT-DC measurements using an automated cell counter (Countess II, Thermo Fisher Scientific). (B) The event rates observed during the RT-DC measurements. The harvested cells are resuspended in a specific volume of measuring buffer standardized per experiment (50-100 μl); thus, the event rates reflect the absolute cell numbers in the respective samples. Horizontal lines delineate medians with mean absolute deviation (MAD) as error, datapoints represent individual measurement replicates, with symbols corresponding to matching measurement days. Statistical analysis was performed using two sample two-sided Wilcoxon rank sum test.

      Methods

      - The AFM indentation experiments are performed with a very soft cantilever at very high speeds. Why? Also, please mention whether the complete AFM curve was fitted with the Hertz/Sneddon model or only a certain area around the contact point. 

      We thank the reviewer for this comment. However, we believe that the spring constants and indentation speeds used in our study are typical for measurements of cells and not a cause of concern. 

      For the indentation experiments, we used Arrow-TL1 cantilevers (nominal spring constant k = 0.035-0.045 N m<sup>−1</sup>, Nanoworld, Switzerland) which are used routinely for cell indentation (with over 200 search results on Google Scholar using the term: "Arrow-TL1"+"cell", and several former publications from our group, including Munder et al 2016, Tavares et al 2017, Urbanska et al 2017, Taubenberger et al 2019, Abuhattum et al 2022, among others). Additionally, cantilevers with spring constants as low as 0.01 N m<sup>−1</sup> can be used for cell measurements (Radmacher 2002, Thomas et al, 2013). 

      The indentation speed of 5 µm s<sup>−1</sup> is not unusually high and does not result in significant hydrodynamic drag. 

      For the microrheology experiments, we used slightly stiffer and shorter (100/200 µm compared to 500 µm for Arrow-TL1) cantilevers: PNP-TR-TL (nominal spring constant k = 0.08 N m<sup>−1</sup>, Nanoworld, Switzerland). The measurement frequencies of 3-200 Hz correspond to movements slightly faster than 5 µm s<sup>−1</sup>, but cells were indented only to 100 nm, and the data were corrected for the hydrodynamic drag (see equation (8) in Methods section).

      Author response image 20.

      Exemplary indentation curve obtained using an Arrow-TL1 cantilever decorated with a 5-µm sphere on an ECC4 cell. The shown plot is exported directly from JPK Data Processing software. The area shaded in grey is the area used for fitting the Sneddon model.
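
Restricting a fit to the initial part of an indentation curve can be sketched as follows. This is a minimal illustration only: it uses the simpler Hertz approximation for a spherical indenter, F = (4/3) · E/(1 − ν²) · √R · δ^(3/2), as a stand-in for the Sneddon model named above, and it assumes a Poisson ratio of 0.5, a contact point at zero indentation, and an illustrative sphere radius. The function name and the synthetic data are hypothetical and not taken from the manuscript.

```python
import math

def fit_young_modulus(indentation_m, force_n, radius_m, max_indent_m, poisson=0.5):
    """Least-squares fit of F = k * delta**1.5 using only points with
    0 <= delta <= max_indent_m, then convert k to a Young's modulus (Pa)
    via the Hertz relation for a sphere (an assumed simplification)."""
    xs, fs = [], []
    for d, f in zip(indentation_m, force_n):
        if 0.0 <= d <= max_indent_m:
            xs.append(d ** 1.5)
            fs.append(f)
    # Closed-form slope of a line through the origin: k = sum(x*f) / sum(x*x)
    k = sum(x * f for x, f in zip(xs, fs)) / sum(x * x for x in xs)
    return 3.0 * k * (1.0 - poisson ** 2) / (4.0 * math.sqrt(radius_m))

# Synthetic curve (illustrative values): E = 1 kPa, sphere radius 2.5 um.
E_true, R, nu = 1000.0, 2.5e-6, 0.5
depth = [i * 0.05e-6 for i in range(1, 61)]  # 0.05 to 3.0 um
force = [(4.0 / 3.0) * E_true / (1.0 - nu ** 2) * math.sqrt(R) * d ** 1.5 for d in depth]
# Distort the deep part of the curve to mimic data outside the fitted region.
force = [f * (2.0 if d > 1.5e-6 else 1.0) for f, d in zip(force, depth)]

E_fit = fit_young_modulus(depth, force, R, max_indent_m=1.5e-6)
print(f"Fitted Young's modulus: {E_fit:.1f} Pa")
```

Because the fit ignores all points beyond the cut-off, the distorted deep part of the synthetic curve does not affect the recovered modulus.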

      In the indentation experiments, the curves were fitted to a maximal indentation of 1.5 μm (rarely exceeded, see Author response image 20). We have now added this information to the methods section:

      - Could the authors include the dataset wt #1 in Fig 4D? Does it display the same trend? 

      We thank the reviewer for this comment. To clarify: in the MCF10A dataset (GEO: GSE69822) there are exactly three replicates of each wt (wild type) and ki (knock-in, referring to the H1047R mutation in the PIK3CA) samples. The numbering wt#2, wt#3, wt#4 originated from the short names that were used in the working files containing non-averaged RPKM (possibly referring to three different measurement replicates that may not have been exactly paired with the ki samples). We have now renamed the samples as wt#1, wt#2 and wt#3 to avoid confusion. This naming also better reflects the sample description as deposited in the GSE69822 dataset (see Author response table 2).

      Author response table 2.

      - Reference (3) is an opinion article with the last author as the sole author. It is used twice as a self-standing reference, which is confusing, as it suggests there is previous experimental evidence. 

      We thank the reviewer for pointing this out and agree that it may not be appropriate to cite the article (Guck 2019 Biophysical Reviews, formerly Reference (3), currently Reference (76)) in all instances. The references to this opinion article have now been removed from the introduction:

      “The extent to which cells can be deformed by external loads is determined by their mechanical properties, such as cell stiffness. Since the mechanical phenotype of cells has been shown to reflect functional cell changes, it is now well established as a sensitive label-free biophysical marker of cell state in health and disease (1-2).”

      “Alternatively, the problem can be reverse-engineered, in that omics datasets for systems with known mechanical phenotype changes are used for prediction of genes involved in the regulation of mechanical phenotype in a mechanomics approach.”

      But has been kept in the discussion:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76).”

      This reference seems appropriate to us as it expands on the point that our ability to control cell mechanics will enable the exploration of its impact on cell and tissue function, which is central to the discussion of the current manuscript. 

      - The authors should mention what PC-corr means. Principal component correlation? Pearson's coefficient correlation? 

      PC-corr is a combination of loadings from the principal component (PC) analysis and Pearson’s correlation for each gene pair. We have aimed at conveying this in the “Discriminative network analysis on prediction datasets” result section. We have now added an extra sentence at the first appearance of PC-corr to clarify this for the readers from the start:

      “After characterizing the mechanical phenotype of the cell states, we set out to use the accompanying transcriptomic data to elucidate genes associated with the mechanical phenotype changes across the different model systems. To this end, we utilized a method for inferring phenotype-associated functional network modules from omics datasets termed PC-corr (28), that relies on combining loadings obtained from the principal component (PC) analysis and Pearson’s correlation (Corr) for every pair of genes. PC-Corr was performed individually on two prediction datasets, and the obtained results were overlayed to derive a conserved network module. Owing to the combination of the Pearson’s correlation coefficient and the discriminative information included in the PC loadings, the PC-corr analysis does not only consider gene co-expression — as is the case for classical co-expression network analysis — but also incorporates the relative relevance of each feature for discriminating between two or more conditions; in our case, the conditions representing soft and stiff phenotypes. The overlaying of the results from two different datasets allows for a multi-view analysis (utilizing multiple sets of features) and effectively merges the information from two different biological systems.”
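
The idea of combining PC loadings with pairwise Pearson correlation can be illustrated with a short sketch. The scoring rule used here (sign of the correlation times the smallest of the two absolute loadings and the absolute correlation) and the max-abs scaling of the loadings are assumptions chosen for illustration; the exact loading normalization and definition should be taken from the original PC-corr publication (reference 28). The function name `pc_corr_sketch` and the synthetic data are hypothetical.

```python
import numpy as np

def pc_corr_sketch(X, pc=0):
    """X: samples x genes expression matrix.
    Returns a genes x genes matrix of illustrative pairwise scores."""
    Xc = X - X.mean(axis=0)                    # center each gene
    # PCA via SVD: rows of Vt hold the loadings of each principal component
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[pc] / np.max(np.abs(Vt[pc]))        # scale loadings to [-1, 1] (assumption)
    corr = np.corrcoef(Xc, rowvar=False)       # Pearson correlation between genes
    a = np.abs(v)
    # Pair score: limited by the weaker loading and by the correlation strength
    smallest = np.minimum(np.minimum(a[:, None], a[None, :]), np.abs(corr))
    score = np.sign(corr) * smallest
    np.fill_diagonal(score, 0.0)               # no self-edges
    return score

# Synthetic example: genes 0 and 1 separate two groups, genes 2 and 3 are noise.
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 4))
X[:, 0] += 5.0 * group
X[:, 1] += 5.0 * group

score = pc_corr_sketch(X)
print(np.round(score, 2))
```

In this toy case the pair of discriminative, co-expressed genes receives a high score, whereas the noise genes do not; a network would then be obtained by keeping the pairs whose absolute score exceeds a chosen cut-off.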

      - The formatting of Table 1 is confusing. Horizontal lines should be added to make it clear to the reader which datasets are human and which mouse as well as which accession numbers belong to the carcinomas. 

      Horizontal lines have now been added to improve the readability of Table 1. We hope that makes the table easier to follow and satisfies the request. We assume that further modifications to the table appearance may occur during publishing process in accordance with the publisher’s guidelines. 

      - In many figures, data points are shown in different shapes without an explanation of what the shapes represent. 

      We thank the reviewer for this comment and apologize for not adding this information earlier. We have added explanations of the symbols to captions of Figures 2, 3, 5, and 6 in the main text:

      “Fig. 2. Mechanical properties of divergent cell states in five biological systems. Schematic overviews of the systems used in our study, alongside with the cell stiffness of individual cell states parametrized by Young’s moduli E. (…) Statistical analysis was performed using generalized linear mixed effects model. The symbol shapes represent measurements of cell lines derived from three different patients (A), matched experimental replicates (C), two different reprogramming series (D), and four different cell isolations (E). Data presented in (A) and (D) were previously published in ref (29) and (30), respectively.”

      “Fig. 3. Identification of putative targets involved in cell mechanics regulation. (A) Glioblastoma and iPSC transcriptomes used for the target prediction intersect at 9,452 genes. (B, C) PCA separation along two first principal components of the mechanically distinct cell states in the glioblastoma (B) and iPSC (C) datasets. The analysis was performed using the gene expression data from the intersection presented in (A). The symbol shapes in (B) represent cell lines derived from three different patients. (…)”

      “Fig. 5. Perturbing levels of CAV1 affects the mechanical phenotype of intestine carcinoma cells. (…) In (E), (F), (I), and (J), the symbol shapes represent experiment replicates.”

      “Fig. 6. Perturbations of CAV1 levels in MCF10A-ER-Src cells result in cell stiffness changes. (…)  Statistical analysis was performed using a two-sided Wilcoxon rank sum test. In (B), (D), and (E), the symbol shapes represent experiment replicates.”

      As well as to Figures S2, S9, and S11 in the supplementary material (in Figure S2, the symbol explanation was added to the legends in the figure panels as well): 

      “Fig. S2. Plots of area vs deformation for different cell states in the characterized systems. Panels correspond to the following systems: (A) glioblastoma, (B) carcinoma, (C) non-tumorigenic breast epithelia MCF10A, (D) induced pluripotent stem cells (iPSCs), and (E) developing neurons. 95%- and 50% density contours of data pooled from all measurements of given cell state are indicated by shaded areas and continuous lines, respectively. Datapoints indicate medians of individual measurements. The symbol shapes represent cell lines derived from three different patients (A), two different reprogramming series (D), and four different cell isolations (E), as indicated in the respective panels. (…).”

      “Fig. S9. CAV1 knock-out mouse embryonic fibroblasts (CAV1KO) have lower stiffness compared to the wild type cells (WT). (…) (C) Apparent Young’s modulus values estimated for WT and CAV1KO cells using area-deformation data in (B). The symbol shapes represent experimental replicates. (…)”

      “Fig. S11. Plots of area vs deformation from RT-DC measurements of cells with perturbed CAV1 levels. Panels correspond to the following experiments: (A and B) CAV1 knock-down in TGBC cells using esiRNA (A) and ONTarget siRNA (B), (C and D) transient CAV1 overexpression in ECC4 cells (C) and TGBC cells (D). Datapoints indicate medians of individual measurement replicates. The isoelasticity lines in the background (gray) indicate regions of the same apparent Young’s moduli. The symbol shapes represent experimental replicates.”

      - In Figure 2, the difference in stiffness appears bigger than it actually is because the y-axes are not starting at 0. 

      While we acknowledge that starting the y-axes at a value other than 0 is generally not ideal, we chose this approach to better display data variability and minimize empty space in the plots.

      A similar effect can be achieved with logarithmic scaling, which is a common practice (see  Author response image 21 for visualization). We believe our choice of axes cut-off enhances the interpretability of the data without misleading the viewer.

      Author response image 21.

      Visualization of different axis scaling strategies applied to the five datasets presented in Figure 2 of the manuscript. 

      Of note, apparent Young’s moduli obtained from RT-DC measurements typically span 0.5-3.0 kPa (see Figure 2.3 from Urbanska et al 2021, PhD thesis). Differences between treatments rarely exceed a few hundred pascals. For example, in an siRNA screen of mitotic cell mechanics regulators in Drosophila cells (Kc167), the strongest hits (e.g., Rho1, Rok, dia) showed changes in stiffness of 100-150 Pa (see Supplementary Figure 11 from Rosendahl, Plak et al 2018, Nature Methods 15(5): 355-358).

      - In Figure 3, I don't personally see the benefit of showing different cut-offs for PC-corr. In the end, the paper focuses on the 5 genes in the pentagram. I think only showing one of the cutoffs and better explaining why those target genes were picked would be sufficient and make it clearer for the reader. 

      We believe it is beneficial to show the extended networks for a few reasons. First, it demonstrates how the selected targets connect to the broader panel of the genes, and that the selected module is indeed much more interconnected than other nodes. Second, the chosen PC-corr cut-off is somewhat arbitrary and it may be interesting to look through the genes from the extended network as well, as they are likely also important for regulating cell mechanics. This broader view may help readers identify familiar genes and recognize the connections to relevant signaling networks and processes of interest.

      - In Figure 4C, I suggest explaining why the FANTOM5 and not another dataset was used for the visualization here and mentioning whether the other datasets were similar. 

      In Figure 4C, we have chosen to present data corresponding to FANTOM5, because that was the only carcinoma dataset in which all the cell lines tested mechanically are represented. We have now added this information to the caption of Figure 4. Additionally, the clustergrams corresponding to the remaining carcinoma datasets (CCLE RNASeq, Genentech) are presented in supplementary figures S4-S6. 

      “The target genes show clear differences in expression levels between the soft and stiff cell states and provide for clustering of the samples corresponding to different cell stiffnesses in both prediction and validation datasets (Fig. 4, Figs. S4-S6).”

      Typos 

      We would like to thank Reviewer #3 for their detailed comments on the typos and details listed below. This is much appreciated as it improved the quality of our manuscript.

      -  In the first paragraph of the results section the 'and' should be removed from this sentence: Each dataset encompasses two or more cell states characterized by a distinct mechanical phenotype, and for which transcriptomic data is available. 

      The sentence has been corrected and now reads:

      “Each dataset encompasses two or more cell states characterized by a distinct mechanical phenotype, and for which transcriptomic data is available.”

      -  In the methods in the MCF10A PIK3CA cell lines part, it says cell liens instead of cell lines. 

      The sentence has been corrected and now reads:

      “The wt cells were additionally supplemented with 10 ng ml<sup>−1</sup> EGF (E9644, Sigma-Aldrich), while mutant cell lines were maintained without EGF.”

      -  In the legend of Figure 6 "accession number: GSE17941, data previously published in ())" the reference is missing. 

      The reference has been added.

      -  In the legend of Figure 5 "(E) Verification of CAV1 knock-down in TGBC cells using two knock-down system" 'a' between using and two is missing. 

      The legend has been corrected (no ‘a’ is missing, but it should say systems (plural)):

      -  In Figure 5B one horizontal line is missing. 

      The Figure 5B has been corrected accordingly. 

      -  Terms such as de novo or in silico should be written in cursive. 

      We thank the Reviewer for this comment; however, we believe that in the style used by eLife, common Latin expressions such as de novo or in vitro are used in regular font.

      -  In the heading of Table 4 "The results presented in this table can be reproducible using the code and data available under the GitHub link reported in the methods section." It should say reproduced instead of reproducible. 

      Yes, indeed. It has been corrected.

      -  The citation of reference 20 contains several author names multiple times. 

      Indeed, it has been fixed now:

      -  In Figure S2 there is a vertical line in the zeros of the y axis labels. 

      I am not sure if there was some rendering issue, but we did not see a vertical line in the zeros of the y axis label in Figure S2.

      - The text in Figure S4 is too small.

      We thank the reviewer for pointing this out. We have now revised Figure S7 (formerly Figure S4) to increase the text size, ensuring better readability. (It has also been updated to include additional fits as requested by Reviewer #2).

      - In Table 3 "positive hypothesis II markers are discriminative of samples with stiff/soft independent of data source" the words 'mechanical phenotype' are missing. 

      The column headings in Table 3 have now been updated accordingly.

      - In Table S3 explain in the table headline what vi1, vi2 and v are. I assume the loading for PC1, the loading for PC2 and the average of the previous two values. But it should be mentioned somewhere.

      The caption of Table S3 has been updated to explain the meaning of vi1, vi2 and v.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a model of the whole somatosensory non-barrel cortex of the rat, with 4.2 million morphologically and electrically detailed neurons, with many aspects of the model constrained by a variety of data. The paper focuses on simulation experiments, testing a range of observations. These experiments are aimed at understanding how the multiscale organization of the cortical network shapes neural activity.

      Strengths:

      (1) The model is very large and detailed. With 4.2 million neurons and 13.2 billion synapses, as well as the level of biophysical realism employed, it is a highly comprehensive computational representation of the cortical network.

      (2) Large scope of work - the authors cover a variety of properties of the network structure and activity in this paper, from dendritic and synaptic physiology to multi-area neural activity.

      (3) Direct comparisons with experiments, shown throughout the paper, are laudable.

      (4) The authors make a number of observations, like describing how high-dimensional connectivity motifs shape patterns of neural activity, which can be useful for thinking about the relations between the structure and the function of the cortical network.

      (5) Sharing the simulation tools and a "large subvolume of the model" is appreciated.

      We thank the reviewer for these comments and are pleased they appreciated these aspects of the work.

      Weaknesses:

      (1) A substantial part of this paper - the first few figures - focuses on single-cell and single-synapse properties, with high similarity to what was shown in Markram et al., 2015. Details may differ, but overall it is quite similar.

      We thank the reviewer for this useful comment and agree that it is important to better highlight the incremental improvements to the model’s low-level physiology. The validity of any model can continuously be improved at all spatial scales and the validity of emergent network activity increases with improved validity at lower levels. For this reason, we felt it was valuable to improve the low-level physiology of the model.

      Regarding neuron physiology, we have added the following in Section 2.1 on page 5:

      “2.1 Improved modeling and validation of neuron physiology

      Similarly to Markram et al. (2015), electrical properties of single neurons were modelled by optimizing ion channel densities in specific compartment-types (soma, axon initial segment (AIS), basal dendrite, and apical dendrite) (Figure 2B) using an evolutionary algorithm (IBEA; Van Geit et al., 2016) so that each neuron recreates electrical features of its corresponding electrical type (e-type) under multiple standardized protocols. Compared to Markram et al. (2015), electrical models were optimized and validated using 1) additional in vitro data, features and protocols, 2) ion channel and electrophysiological data corrected for the liquid junction potential, and 3) stochastic channels (StochKv3) now including inactivation profiles. The methodology and resulting electrical models are described in Reva et al. (2023) (see Methods), and generated quantitatively more accurate electrical activity, including improved attenuation of excitatory postsynaptic potentials (EPSPs) and back-propagating action potentials.”

      And page 8:

      “The new neuron models saw a 5-fold improvement in generalizability compared to Markram et al. (2015) (Reva et al., 2023).”

      We have also made the descriptions of the improvements to synaptic physiology more explicit in Section 2.2 on page 9:

      “2.2 Improved modeling and validation of synaptic physiology

      The biological realism of synaptic physiology was improved relative to Markram et al. (2015) using additional data sources and by extending the stochastic version of the Tsodyks-Markram model (Tsodyks and Markram, 1997; Markram et al., 1998; Fuhrmann et al., 2002; Loebel et al., 2009) to feature multi-vesicular release, which in turn improved the accuracy of the coefficient of variations (CV; std/mean) of postsynaptic potentials (PSPs) as described in Barros-Zulaica et al. (2019) and Ecker et al. (2020). The model assumes a pool of available vesicles that is utilized by a presynaptic action potential, with a release probability dependent on the extracellular calcium concentration ([Ca<sup>2+</sup>]<sub>o</sub>; Ohana and Sakmann, 1998; Rozov et al., 2001; Borst, 2010). Additionally, single vesicles spontaneously release as an additional source of variability with a low frequency (with improved calibration relative to Markram et al. (2015)). The utilization of vesicles leads to a postsynaptic conductance with bi-exponential kinetics. Short-term plasticity (STP) dynamics in response to sustained presynaptic activation are either facilitating (E1/I1), depressing (E2/I2), or pseudo-linear (I3). E synaptic currents consist of both AMPA and NMDA components, whilst I currents consist of a single GABA<sub>A</sub> component, except for neurogliaform cells, whose synapses also feature a slow GABA<sub>B</sub> component. The NMDA component of E synaptic currents depends on the state of the Mg<sup>2+</sup> block (Jahr and Stevens, 1990), with the improved fitting of parameters to cortical recordings from Vargas-Caballero and Robinson (2003) by Chindemi et al. (2022).”
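
The general mechanism of a stochastic, multi-vesicular short-term plasticity model can be sketched as follows. This is a simplified illustration in the spirit of the Tsodyks-Markram formalism, not the implementation used in the model: it assumes a fixed number of release sites, a utilization variable that jumps at each spike and decays to zero with time constant F, and empty sites that refill independently with time constant D. Spontaneous release and the postsynaptic conductance are omitted, and all parameter values and names are hypothetical.

```python
import math
import random

def simulate_release(spike_times, U, D, F, n_sites, rng):
    """Return the number of vesicles released at each presynaptic spike.
    U: utilization increment, D: recovery time constant (s),
    F: facilitation time constant (s), n_sites: number of release sites."""
    u = 0.0                        # running utilization (release probability)
    occupied = [True] * n_sites    # all sites start with a vesicle
    released = []
    last_t = None
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / F)                   # facilitation decays
            p_refill = 1.0 - math.exp(-dt / D)       # chance an empty site refilled
            occupied = [o or (rng.random() < p_refill) for o in occupied]
        u += U * (1.0 - u)                           # spike increases utilization
        n_rel = 0
        for i in range(n_sites):
            # each occupied site releases independently with probability u
            if occupied[i] and rng.random() < u:
                occupied[i] = False
                n_rel += 1
        released.append(n_rel)
        last_t = t
    return released

rng = random.Random(0)
spikes = [0.05 * k for k in range(10)]   # 20 Hz train, times in seconds
# Depressing-type parameters (illustrative): high U, slow recovery, fast facilitation decay
trials = [simulate_release(spikes, U=0.5, D=0.8, F=0.01, n_sites=5, rng=rng)
          for _ in range(500)]
mean_first = sum(tr[0] for tr in trials) / len(trials)
mean_last = sum(tr[-1] for tr in trials) / len(trials)
print(f"mean release at first spike: {mean_first:.2f}, at last spike: {mean_last:.2f}")
```

With these illustrative parameters the average release declines over the train, as expected for a depressing synapse; choosing a low U with a long F would instead produce facilitation. The trial-to-trial spread of the release counts is what gives rise to PSP variability in such a scheme.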

      (2) Although the paper is about the model of the whole non-barrel somatosensory cortex, out of all figures, only one deals with simulations of the whole non-barrel somatosensory cortex. Most figures focus on simulations that involve one or a few "microcolumns". Again, it is rather similar to what was done by Markram et al., 2015 and constitutes relatively incremental progress.

      We thank the reviewer for this comment and have added the following text to the Discussion on page 33 to explain our rationale:

      “In keeping with the philosophy of compartmentalization of parameters and continuous model refinement (see Introduction), it was essential to improve validity at the columnar scale (relative to Markram et al. (2015)) as part of demonstrating validity of the full nbS1. Indeed, improved parametrization and validation at smaller scales was essential to parameterizing background input which generated robust nbS1 activity within realistic [Ca<sup>2+</sup>]<sub>o</sub> and firing rate ranges. We view this as a major achievement, as it was unknown whether the model would achieve a stable and meaningful regime at the start of our investigation. Whilst we would have liked to go further, our primary goal was to publish a well characterized model as an open resource that others could use to undertake further in-depth studies. In this regard, we are pleased that the parametrization of the nbS1 model has already been used to study EEG signals (Tharayil et al., 2024), as well as propagation of activity between two subregions (Bolaños-Puchet and Reimann, 2024).”

      We also make it clearer in the Introduction on page 4 that the improved validation of the emergent columnar regime was essential to stable activity at the larger scale:

      “These initial validations demonstrated that the model was in a more accurate regime compared to Markram et al. (2015) – an essential step before testing more complex or larger-scale validations. For example, under the same parameterization we then observed selective propagation of stimulus-evoked activity to downstream areas, and…”

      (3) With a model like this, one has an opportunity to investigate computations and interactions across an extensive cortical network in an in vivo-like context. However, the simulations presented are not addressing realistic specific situations corresponding to animals performing a task or perceiving a relevant somatosensory stimulus. This makes the insights into the roles of cell types or connectivity architecture less interesting, as they are presented for relatively abstract situations. It is hard to see their relationship to important questions that the community would be excited about - theoretical concepts like predictive coding, biophysical mechanisms like dendritic nonlinearities, or circuit properties like feedforward, lateral, and feedback processing across interacting cortical areas. In other words, what do we learn from this work conceptually, especially, about the whole non-barrel somatosensory cortex?

      We thank the reviewer for this comment and agree that it would be very interesting to explore such topics. In the Introduction on page 4, we have updated the list of papers which have so far used the model for more in-depth studies:

      “…propagation of activity between cortical areas (Bolaños-Puchet and Reimann, 2024), the role of non-random connectivity motifs on network activity (Pokorny et al., 2024) and reliability (Egas Santander et al., 2024), the composition of high-level electrical signals such as the EEG (Tharayil et al., 2024), and how spike sorting biases population codes (Laquitaine et al., 2024).”

      In the Discussion on page 33 we also add our additional thoughts on this topic:

      “Whilst we would have liked to go further, our primary goal was to publish a well characterized model as an open resource that others could use to undertake further in-depth studies. In this regard, we are pleased that the parametrization of the nbS1 model has already been used to study EEG signals (Tharayil et al., 2024), as well as propagation of activity between two subregions (Bolaños-Puchet and Reimann, 2024). Investigation, improvement and validation must be continued at all spatial scales in follow up papers with detailed description, figures and analysis, which cannot be covered in this manuscript. Each new study increases the scope and validity of future investigations. In this way, this model and paper act as a stepping stone towards more complex questions of interest to the community such as perception, task performance, predictive coding and dendritic processing. This was similar for Markram et al. (2015) where the initial paper was followed by more detailed studies. Unlike the Markram et al. (2015) model, the new model can also be exploited by the community and has already been used in a number of follow up papers studying (Ecker et al., 2024a,b; Bolaños-Puchet and Reimann, 2024; Pokorny et al., 2024; Egas Santander et al., 2024; Tharayil et al., 2024; Laquitaine et al., 2024). We believe that the number of use cases for such a general model is vast, and is made larger by the increased size of the model.”

      (4) Most comparisons with in vivo-like activity are done using experimental data for whisker deflection (plus some from the visual stimulation in V1). But this model is for the non-barrel somatosensory cortex, so exactly the part of the cortex that has less to do with whiskers (or vision). Is it not possible to find any in vivo neural activity data from the non-barrel cortex?

      We agree with the reviewer that this is a weakness. We have expanded our discussion of the need to mix data sources to also consider our view for network level activity:

      “This paper and its companion paper serve to present a methodology for modeling micro- and mesoscale anatomy and physiology, which can be applied for other cortical regions and species. With the rapid increase in openly available data, efforts are already in progress to build models of mouse brain regions with reduced reliance on data mixing thanks to much larger quantities of available atlas-based data. This also includes data for the validation of emergent network level activity. Here we chose to compare network-level activity to data mostly from the barrel cortex, as well as a single study from primary visual cortex. Whilst a lot of the data used to build the model was from the barrel cortex, the barrel cortex also represents a very well characterized model of cortical processing for simple and controlled sensory stimuli. The initial comparison of population-wise responses in response to accurate thalamic input for single whisker deflections was essential to demonstrating that the model was closer to in vivo, and we were unaware of similar data for non-barrel somatosensory regions. Moreover, our optogenetic & lesion study demonstrated the capacity to compare and extend studies of canonical cortical processing in the whisker system.”

      (5) The authors almost do not show raw spike rasters or firing rates. I am sure most readers would want to decide for themselves whether the model makes sense, and for that, the first thing to do is to look at raster plots and distributions of firing rates. Instead, the authors show comparisons with in vivo data using highly processed, normalized metrics.

      We thank the reviewer for this comment and agree that better visualizations of the network activity under different conditions are essential for helping the reader assess the work. In addition to the raster plots in Video 1, Video 3, Fig 6, Fig 5C, Fig S9a and S16a, we have:

      a) Changed the histograms of spontaneous activity in Fig 4G on page 13 to raster plots for the seven column subvolume for two contrasting meta-parameter regimes.

      b) Added 4 new videos (Video 6a,b and 8a,b) showing all spontaneous and evoked meta-parameter combinations in hex0 and hex39 of the nbS1:

      We have added improved plots showing the distributions of firing rates in the seven column subvolume on page 74:

      With more detailed consideration in the Results on page 15:

      “Long-tailed population firing rate distributions with means ∼ 1Hz

      To study the firing rate distributions of different subpopulations and m-types, we ran 50s simulations for the meta-parameter combinations: [Ca<sup>2+</sup>]<sub>o</sub>: 1.05mM, R<sub>OU</sub>: 0.4, P<sub>FR</sub>: 0.3, 0.7 (Figure S4). Different subpopulations showed different sparsity levels (proportion of neurons spiking at least once) ranging from 6.6 to 42.5%. Wohrer et al. (2013) considered in detail the biases and challenges in obtaining ground truth firing rate distributions in vivo, and discussed the wide heterogeneity of reports in different modalities using different recording techniques. They conclude that most evidence points towards long-tailed distributions with peaks just below 1Hz. We confirmed that spontaneous firing rate distributions were long-tailed (approximately lognormally distributed) with means on the order of 1Hz for most subpopulations. Importantly the layer-wise means were just below 1Hz in all layers for the P<sub>FR</sub> = 0.3 meta-parameter combination. Moreover, our recent work applying spike sorting to extracellular activity using this meta-parameter combination found spike sorted firing rate distributions to be lognormally distributed and very similar to in vivo distributions obtained using the same probe geometry and spike sorter (Laquitaine et al., 2024).”
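
      The quantities named in the passage above (per-neuron firing rate, sparsity as the proportion of neurons spiking at least once, and a log-scale summary for judging approximate lognormality) can be sketched as follows. This is a minimal illustration rather than the authors' analysis code; the function names and the toy spike times are hypothetical.

```python
import math
from statistics import mean, pstdev

def firing_rates(spike_times_by_neuron, duration_s):
    """Per-neuron firing rate in Hz: spike count divided by simulated time."""
    return [len(spikes) / duration_s for spikes in spike_times_by_neuron]

def sparsity(rates):
    """Proportion of neurons that spiked at least once."""
    return sum(1 for r in rates if r > 0) / len(rates)

def log_rate_summary(rates):
    """Mean and SD of log10(rate) over spiking neurons only.

    A roughly Gaussian histogram of log10(rate) is what one would expect
    from an approximately lognormal rate distribution.
    """
    logs = [math.log10(r) for r in rates if r > 0]
    return mean(logs), pstdev(logs)

# Hypothetical toy population: spike times (s) from a 50 s simulation.
toy_spikes = [[], [1.0, 12.5], [], [3.2], [0.5, 7.0, 20.1, 33.3, 49.0], []]
rates = firing_rates(toy_spikes, duration_s=50.0)
print("sparsity:", sparsity(rates))
print("log10-rate mean and SD:", log_rate_summary(rates))
```

      Silent neurons are excluded from the log-scale summary because their rate of zero has no logarithm, which is one reason sparsity is reported separately.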

      (6) While the authors claim that their model with one set of parameters reproduces many experimentally established metrics, that is not entirely what one finds. Instead, they provide different levels of overall stimulation to their model (adjusting the target "P_FR" parameter, with values from 0 to 1, and other parameters), and that influences results. If I get this right (the figures could really be improved with better organization and labeling), simulations with P<sub>FR</sub> closer to 1 provide more realistic firing rate levels for a few different cases, however, P<sub>FR</sub> of 0.3 and possibly above tends to cause highly synchronized activity - what the authors call bursting, but which also could be called epileptic-like activity in the network.

      We thank the reviewer for this comment. We can now see that the motivation for the P<sub>FR</sub> parameter was introduced very briefly in the results and that the results of the calibration and analysis of the spontaneous activity regime are not interpreted in relation to this parameter.

      To address this, we have given more detail where it is first introduced in the Results on page 12:

      “to account for uncertainty in the firing rate bias during spontaneous activity from extracellular spike sorted recordings…”

      We then reconsider that it represents an unknown bias when interpreting the calibration and spontaneous activity results on page 15:

      “We reemphasize that the [Ca<sup>2+</sup>]<sub>o</sub>, R<sub>OU</sub> and P<sub>FR</sub> meta-parameters account for uncertainty of in vivo extracellular calcium concentration, the nature of inputs from other brain regions and the bias of extracellularly recorded firing rates. Whilst estimates for [Ca<sup>2+</sup>]<sub>o</sub> are between 1.0 - 1.1mM (Jones and Keep, 1988; Massimini and Amzica, 2001; Amzica et al., 2002; Gonzalez et al., 2022) and estimates for P<sub>FR</sub> are in the range of 0.1 - 0.3 (Olshausen and Field, 2006), combinations of these parameters supporting in vivo-like stimulus responses in later sections will offer a prediction for the true values of these parameters. Both these later results and our recent analysis of spike sorting bias using this model (Laquitaine et al., 2024) predict a spike sorting bias corresponding to P<sub>FR</sub> ∼ 0.3, confirming the prediction of Olshausen and Field (2006).”

      And in relation to the stimulus evoked responses on page 17:

      “Specifically, simulations with P<sub>FR</sub> from 0.1 to 0.5 robustly support realistic stimulus responses, with the middle of this range (0.3) corresponding with estimates of in vivo recording bias; both the previous estimates of Olshausen and Field (2006) and from a spike sorting study using this model (Laquitaine et al., 2024).”

      Following these considerations, the remainder of the experiments using the seven column subvolume only use a single meta-parameter on page 19.

      For the full nbS1 we further discuss the importance of a P<sub>FR</sub> value between 0.1 and 0.3 in the Results on page 26:

      “Stable spontaneous activity only emerges in nbS1 at predicted in vivo firing rates

      After calibrating the model of extrinsic synaptic input for the seven column subvolume, we tested to what degree the calibration generalizes to the entire nbS1. Notably, this included the addition of mid-range connectivity (Reimann et al., 2024). The total number of local and mid-range synapses in the model was 9138 billion and 4075 billion, i.e., on average full model simulations increased the number of intrinsic synapses onto a neuron by 45%. Particularly, we ran simulations for P<sub>FR</sub> ∈ [0.1, 0.15, ..., 0.3] using the OU parameters calibrated for the seven column subvolume for [Ca<sup>2+</sup>]<sub>o</sub> = 1.05mM and R<sub>OU</sub> = 0.4. Each of these full nbS1 simulations produced stable non-bursting activity (Figure 8A), except for the simulation for P<sub>FR</sub> = 0.3, which produced network-wide bursting activity (Video 6). Activity levels in the simulations of spontaneous activity were heterogeneous (Figure 8B, Video 7). In some areas, firing rates were equal to the target P<sub>FR</sub>, whilst in others they increased above the target (Figure 8C). In the more active regions, mean firing rates (averaged over layers) were on the order of 30-35% of the in vivo references for the maximum non-bursting P<sub>FR</sub> simulation (target P<sub>FR</sub> : 0.25). This range of firing rates again fits with the estimate of firing rate bias from our paper studying spike sorting bias (Laquitaine et al., 2024) and the meta-parameter range supporting realistic stimulus responses in the seven column subvolume. This also predicts that the nbS1 cannot sustain higher firing rates without entering a bursting regime.”
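
      The 45% figure quoted above follows directly from the two synapse counts, and the ratio does not depend on the unit in which the counts are expressed. A short check, using only the two numbers given in the passage (the variable names are illustrative):

```python
# Synapse counts as quoted in the passage; the unit cancels in the ratio.
local_synapses = 9138
midrange_synapses = 4075

# Adding mid-range synapses on top of the local ones increases the
# intrinsic synapse count by this fraction, in total and per neuron on average.
relative_increase = midrange_synapses / local_synapses
print(f"{relative_increase:.1%}")  # 44.6%, which rounds to the quoted 45%
```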

      Finally, we also added to our discussion of biases in extracellular firing rates in the Discussion on page 32:

      “This is also in line with our recent work using the model, which estimated a spike sorting bias corresponding to P<sub>FR</sub> = 0.3 using virtual extracellular electrodes (Laquitaine et al., 2024).”

      We also thank the reviewer for pointing out that we did not define the term “bursting” in the main text. We have added the following definition and discussion in the Results on page 15:

      “Note that the most correlated meta-parameter combination [Ca<sup>2+</sup>]<sub>o</sub>: 1.1mM, R<sub>OU</sub>: 0.2, P<sub>FR</sub>: 1.0 produced network-wide “bursting” activity, which we define as highly synchronous all or nothing events (Video 1). Such activity, which may be characteristic of epileptic activity, can be studied with the model but is not the focus of this study.”

      (7) The authors mention that the model is available online, but the "Resource availability" section does not describe that in substantial detail. As they mention in the Abstract, it is only a subvolume that is available. That might be fine, but more detail in appropriate parts of the paper would be useful.

      Firstly, we are pleased to say that the full nbS1 model is now available to download, in addition to the seven hexagon subvolume. In the manuscript, we have:

      a) Added to the Introduction at the bottom of page 4:

      “To provide a framework for further studies and integration of experimental data, the full model is made available with simulation tools, as well as a smaller subvolume with the optional new connectome capturing inhibitory targeting rules from electron microscopy”.

      b) Updated the open source panel of Figure 1:

      Secondly, we thank the reviewer for noticing that the available model is not well described in the “Resource availability” statement and have addressed this by:

      a) Adding the following to the “Resource availability” statement on page 36:

      “Both the full nbS1 model and smaller seven hexagon subvolume are available on Harvard Dataverse and Zenodo respectively in SONATA format (Dai et al., 2020) with simulation code. DOIs are listed under the heading “Final simulatable models” in the Key resources table. An additional link is provided to the SM-Connectome with instructions on how to use it with the seven hexagon subvolume model.”

      b) Creating a new subheading in the “Key resources table” titled: “Final simulatable models” to make it clearer which links refer to the final models.

      Reviewer #2 (Public review):

      Summary:

      This paper is a companion to Reimann et al. (2022), presenting a large-scale, data-driven, biophysically detailed model of the non-barrel primary somatosensory cortex (nbS1). To achieve this unprecedented scale of a bottom-up model, approximately 140 times larger than the previous model (Markram et al., 2015), they developed new methods to account for inputs from missing brain areas, among other improvements. Isbister et al. focus on detailing these methodological advancements and describing the model's ability to reproduce in vivo-like spontaneous, stimulus-evoked, and optogenetically modified activity.

      Strengths:

      The model generated a series of predictions that are currently impossible in vivo, as summarized in Table S1. Additionally, the tools used in this study are made available online, fostering community-based exploration. Together with the companion paper, this study makes significant contributions by detailing the model's constraints, validations, and potential caveats, which are likely to serve as a basis for advancing further research in this area.

      We thank the reviewer for these comments, and are pleased they appreciate these aspects of the work.

      Weaknesses:

      That said, I have several suggestions to improve clarity and strengthen the validation of the model's in vivo relevance.

      Major:

      (1) For the stimulus-response simulations, the authors should also reference, analyze, and compare data from O'Connor et al. (2010; https://pubmed.ncbi.nlm.nih.gov/20869600/) and Yu et al. (2016; https://pubmed.ncbi.nlm.nih.gov/27749825/) in addition to Yu et al. 2019, which is the only data source the authors consider for an awake response. The authors mentioned bias in spike rate measurements, but O'Connor et al. used cell-attached recordings, which do not suffer from activity-based selection bias (in addition, they also performed Ca2+ imaging of L2/3). This was done in the exact same task as Yu et al., 2019, and they recorded from over 100 neurons across layers. Combining this data with Yu et al., 2019 would provide a comprehensive view of activity across layers and inhibitory cell types. Additionally, Yu et al. (2016) recorded VPM neurons in the same task, alongside whole-cell recordings in L4, showing that L4 PV neurons filter movement-related signals encoded in thalamocortical inputs during active touch. This dataset is more suitable for extracting VPM activity, as it was collected under the same behavior and from the same species (unlike Diamond et al., 1992, which used anesthetized rats). Furthermore, this filtering is an interesting computation performed by the network the authors modeled. The validation would be significantly strengthened and more biologically interesting if the authors could also reproduce the filtering properties, membrane potential dynamics, and variability in the encoding of touch across neurons, not just the latency (which is likely largely determined by the distance and number of synapses).

      We thank the reviewer for pointing out these very useful studies. We have taken on board this suggestion for a future model of the mouse barrel cortex.

      (2) The authors mention that in the model, the response of the main activated downstream area was confined to L6. Is this consistent with in vivo observations? Additionally, is there any in vivo characterization of the distance dependence of spiking correlation to validate Figure 8I?

      We are not aware of data confirming that the propagation of activity to downstream areas is confined to layer 6, but we have further considered the connectivity between these two regions on page 27 and studied it in follow up work:

      “Stable propagation of evoked activity through mid-range connectivity only emerges in nbS1 at predicted in vivo firing rates

      We repeated the previous single whisker deflection evoked activity experiment in the full model, providing a synchronous thalamic input into the forelimb sub-region (S1FL; Figure 8E; Video 8 & 9). Responses in S1FL were remarkably similar to the ones in the seven column subvolume, including the delays and decays of activity (Figure 8F). However, in addition to a localized primary response in S1FL within 350μm of the stimulus, we found several secondary responses at distal locations (Figure 8E; Video 9), which was suggestive of selective propagation of the stimulus-evoked signal to downstream areas efferently connected by mid-range connectivity. The response of the main activated downstream area (visible in Figure 8E) was confined to L6 (Figure 8G). In a follow up study using the model to explore the propagation of activity between cortical regions (Bolaños-Puchet and Reimann, 2024), it is described how the model contains both a feedforward projection pattern, which projects principally to synapses in L1 & L23, and a feedback type pattern, which principally projects to synapses in L1 & L6. On visualizing the innervation profile from the stimulated hexagon to the downstream hexagon we can see that we have stimulated a feedback pathway (Figure S16).”

      With referenced Figure S16 on page 85:

      We did find in vivo evidence of similar layer-wise and distance dependence of correlations in the somatosensory cortex discussed on page 27 of the Results:

      “The distance dependence of correlations followed a similar profile to that observed in a dataset characterizing spontaneous activity in the somatosensory cortex (Reyes-Puerta et al., 2015a) (compare red line in Figure 8I with Figure S16). In the in vivo dataset spiking correlation was also low but highest in lower layers, with short “up-states” in spiking activity constrained to L5 & 6 (see Figure 1E,F in (Reyes-Puerta et al., 2015a)). In the model, they are constrained to L6.”

      With Figure S16a on page 85 showing the distance dependence of correlations in the anaesthetized barrel cortex during spontaneous activity (digitization from the reference paper):

      (3) Across the figures, activity is averaged across neurons within layers and E or I cell types, with a limited description of single-cell type and single-cell responses. Were there any predictions regarding the responses of particular cell types that significantly differ from others in the same layer? Such predictions could be valuable for future investigations and could showcase the advantages of a data-driven, biophysically detailed model.

      We thank the reviewer for this comment. In addition to new analyses at higher granularity addressed in other comments, we have added the following comparison of stimulus-evoked membrane potential dynamics in different subpopulations for the original connectome and SM-connectome in Figure 7 on page 24.

      This gave interesting results discussed in a new subsection on page 26:

      “EM targeting trends hyperpolarize Sst+ and 5HT3aR+ late response, and disinhibit L5/6 E

      Studying somatic membrane potentials for different subpopulations in response to whisker deflections shows that PV+, L23E and L4E subpopulations are largely unaffected in the SM-connectome (Figure 7E). Interestingly, Sst+ and 5HT3aR+ subpopulations show a strong hyperpolarization in the late response that isn’t present in the original connectome. Interestingly, this corresponds with a stronger late response in L5/6 E populations, which could be caused by disinhibition due to the Sst+ and 5HT3aR+ hyperpolarization. This could be explored further in follow up studies using our connectome manipulator tool (Pokorny et al., 2024).”

      (4) 2.4: Are there caveats to assuming the OU process as a model for missing inputs? Inputs to the cortex are usually correlated and low-dimensional (i.e., communication subspace between cortical regions), but the OU process assumes independent conductance injection. Can (weakly) correlated inputs give rise to different activity regimes in the model? Can you add a discussion on this?

      We agree with the reviewer that there are caveats to assuming an OU process for the model of missing inputs and have added the following to the Discussion on page 31:

      “The calibration framework could optimize per population parameters for other compensation methods, whilst still offering an interpretable spectrum of firing rate regimes at different levels of P<sub>FR</sub>. For example, more realistic compensation schemes could be explored which introduce a) correlations between the inputs received by different neurons and b) compensation distributed across dendrites, as well as at the soma. We predict that such changes would make spontaneous activity more correlated at the lower spontaneous firing rates which supported in vivo like responses (P<sub>FR</sub> : 0.1 − 0.5), which would in turn make stimulus-responses more noise correlated.”
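
      The contrast drawn above between independent and correlated compensation inputs can be illustrated with a generic Ornstein-Uhlenbeck (OU) sketch. This is not the model's implementation: the function `ou_inputs`, the `shared_fraction` parameter (the fraction of noise variance common to all neurons) and all numerical values are assumptions chosen only for illustration.

```python
import math
import random

def ou_inputs(n_neurons, n_steps, dt, tau, mu, sigma, shared_fraction=0.0, seed=0):
    """Simulate one OU trace per neuron with stationary mean mu and SD sigma.

    shared_fraction = 0.0 gives independent inputs; larger values mix in a
    noise source common to all neurons, which correlates the inputs.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)                        # exact one-step decay
    noise_scale = sigma * math.sqrt(1.0 - decay ** 2)  # keeps stationary SD at sigma
    w_shared = math.sqrt(shared_fraction)
    w_private = math.sqrt(1.0 - shared_fraction)
    state = [mu] * n_neurons
    traces = [[] for _ in range(n_neurons)]
    for _ in range(n_steps):
        shared = rng.gauss(0.0, 1.0)                   # noise seen by every neuron
        for i in range(n_neurons):
            xi = w_shared * shared + w_private * rng.gauss(0.0, 1.0)
            state[i] = mu + (state[i] - mu) * decay + noise_scale * xi
            traces[i].append(state[i])
    return traces

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Arbitrary illustrative parameters (time in ms, input in arbitrary units).
independent = ou_inputs(2, 5000, dt=1.0, tau=2.0, mu=1.0, sigma=0.4)
correlated = ou_inputs(2, 5000, dt=1.0, tau=2.0, mu=1.0, sigma=0.4, shared_fraction=0.8)
print("independent inputs:", round(pearson(*independent), 2))
print("correlated inputs: ", round(pearson(*correlated), 2))
```

      In this sketch the expected correlation between any two input traces equals `shared_fraction`, so a single parameter would control how far the compensation departs from the independent case.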

      (5) 2.6: The network structure is well characterized in the companion paper, where the authors report that correlations in higher dimensions were driven by a small number of neurons with high participation ratios. It would be interesting to identify which cell types exhibit high node participation in high-dimensional simplices and examine the spiking activity of cells within these motifs. This could generate testable predictions and inform theoretical cell-type-specific point neuron models for excitatory/inhibitory balanced networks and cortical processing.

      We thank the reviewer for this suggestion. We have added two supplementary figures to address this suggestion, which are discussed in the Results on Page 16:

      “Additionally, we studied the structural effect on the firing rate (here measured as the inverse of the inter-spike interval, ISI, which can be thought of as a proxy of non-zero firing rate). We found that for the connected circuit, the firing rate increases with simplex dimension; in contrast with the disconnected circuit, where this relationship remains flat (see Figure S6 red vs. blue curves and Methods).

      This also demonstrates high variability between neurons, in line with biology, both structurally (Towlson et al., 2013; Nigam et al., 2016) and functionally (Wohrer et al., 2013; Buzsáki and Mizuseki, 2014). We next identified the cell types that are overexpressed in the group of neurons that have the 5% highest values of node participation across dimensions (Figure S7). This could inform theoretical point neuron models with cell-type specificity, for example. We found that while in dimension one (i.e., node degree) this consists mostly of inhibitory cells, in higher dimensions the cell types concentrate in layers 4, 5 and 6, especially for TPC neurons. This is in line with our structural layer-wise findings in Figure 8B in Reimann et al. (2024).”

      Which reference new Figures S6 and S7:

      With the methodology for S6 described on page 49 of the Methods:

      “For any numeric property of neurons, e.g., firing rate, we evaluate the effect of dimension on it by taking weighted averages across dimensions. That is, for each dimension k, we take the weighted average of the property across neurons where the weights are given by node participation on dimension k. More precisely, let N be the number of neurons and $\vec{V} \in \mathbb{R}^N$ be a vector of a property on all the neurons, e.g., the vector of firing rates. Then in each dimension k we compute

      $$\frac{\vec{V} \cdot \overrightarrow{Par}_k}{\sum_{i=1}^{N} \left(\overrightarrow{Par}_k\right)_i}$$

      where $\overrightarrow{Par}_k$ is the vector of node participation on dimension k for all neurons and $\cdot$ is the dot product.
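
      The participation-weighted average described in this passage can be sketched in a few lines. The function name and the toy numbers below are hypothetical and serve only to show the computation for two dimensions.

```python
def participation_weighted_mean(values, participation):
    """Weighted average of a per-neuron property.

    values:        one number per neuron (e.g. firing rate)
    participation: node participation of each neuron in a given dimension,
                   used as the weight of that neuron
    """
    if len(values) != len(participation):
        raise ValueError("values and participation must have the same length")
    total_weight = sum(participation)
    if total_weight == 0:
        raise ValueError("no neuron participates in this dimension")
    dot = sum(v * w for v, w in zip(values, participation))
    return dot / total_weight

# Hypothetical toy data: three neurons, node participation in two dimensions.
rates = [1.0, 2.0, 4.0]
participation_by_dim = {1: [1, 1, 1], 2: [0, 1, 3]}
for k, weights in participation_by_dim.items():
    print(k, participation_weighted_mean(rates, weights))
```

      In the toy data the highest-rate neuron carries most of the weight in dimension 2, so the weighted mean rises with dimension, which is the kind of trend the passage describes for the connected circuit.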

      To measure the over and underexpression of the different m-types among those with the highest 5% of values of node participation, we used the hypergeometric distribution to determine the expected distribution of m-types in a random sample of the same size. More precisely, for each dimension k and m-type m, let N<sub>total</sub> be the total number of neurons in the circuit, N<sub>m</sub> be the number of neurons of m-type m in the circuit, C<sub>top</sub> be the number of neurons with the highest 5% values of node participation in dimension k, C<sub>m</sub> the number of neurons of m-type m among these, and let P = hypergeom(N<sub>total</sub>, N<sub>m</sub>, C<sub>top</sub>) be the hypergeometric distribution.

      By definition, P(x) describes the probability of sampling x neurons of m-type m in a random sample of size C<sub>top</sub>. Therefore, using the cumulative distribution F(x) = P(Counts ≤ x), we can compute the p-values as follows:

      Small values indicate under- and over-representation respectively….”
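
      A hypergeometric test of this kind can be sketched with the standard library alone. The exact p-value expressions of the manuscript are not reproduced in the text above, so the two tails used here, P(Counts ≤ C<sub>m</sub>) for under-representation and P(Counts ≥ C<sub>m</sub>) for over-representation, are an assumption based on the usual convention. Function names and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_cdf(x, n_total, n_m, c_top):
    """F(x) = P(Counts <= x) when drawing c_top neurons without replacement
    from n_total neurons, of which n_m belong to the m-type of interest."""
    lowest = max(0, c_top - (n_total - n_m))   # smallest possible count
    highest = min(x, n_m, c_top)               # largest count not exceeding x
    favourable = sum(
        comb(n_m, k) * comb(n_total - n_m, c_top - k)
        for k in range(lowest, highest + 1)
    )
    return favourable / comb(n_total, c_top)

def representation_p_values(c_m, n_total, n_m, c_top):
    """Return (p_under, p_over) for observing c_m neurons of the m-type."""
    p_under = hypergeom_cdf(c_m, n_total, n_m, c_top)            # P(Counts <= c_m)
    p_over = 1.0 - hypergeom_cdf(c_m - 1, n_total, n_m, c_top)   # P(Counts >= c_m)
    return p_under, p_over

# Hypothetical toy circuit: 20 neurons, 5 of the m-type, top group of 4,
# of which 3 belong to the m-type.
p_under, p_over = representation_p_values(c_m=3, n_total=20, n_m=5, c_top=4)
print(p_under, p_over)
```

      With these toy numbers the over-representation tail is small (155/4845, about 0.03) while the under-representation tail is close to 1, so the m-type would be flagged as over-represented in the top group.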

      Minor:

      (1) Since the previous model was published in 2015, the neuroscience field has seen significant advancements in single-cell and single-nucleus sequencing, leading to the clustering of transcriptomic cell types in the entire mouse brain. For instance, the Allen Institute has identified ~10 distinct glutamatergic cell types in layer 5, which exceeds the number incorporated into the current model. Could you discuss 1) the relationship between the modeled me-types and these transcriptomic cell types, and 2) how future models will evolve to integrate this new information? If there are gaps in knowledge in order to incorporate some transcriptome cell types into your model, it would be helpful to highlight them so that efforts can be directed toward addressing these areas.

      We thank the reviewer for this suggestion, particularly the idea to describe what types of data would be valuable towards improving the model in future. We have added the following to the Discussion on page 33:

      “In our previous work (Roussel et al., 2023) we linked mouse inhibitory me-models to transcriptomic types (t-types) in a whole mouse cortex transcriptomic dataset (Gouwens et al., 2019). This can provide a direct correspondence in future large-scale mouse models. As we model only a single electrical type for pyramidal cells there is no one-to-one correspondence between our me-models and the 10 different pyramidal cell types identified there. We are not currently aware of any method which can recreate the electrical features of different types of pyramidal cells using only generic ion channel models. To achieve the firing pattern behavior of more specific electrical types, usually ion channel kinetics are tweaked, and this would violate the compartmentalization of parameters. In future we hope to build morpho-electric-transcriptomic type (met-type) models by selecting gene-specific ion channel models (Ranjan et al., 2019, 2024) based on the met-type’s gene expression. Data specific to different neuron sections (i.e. soma, AIS, apical/basal dendrites) of different met-types, such as gene expression, distribution of ion channels, and voltage recordings under standard single cell protocols would be particularly useful.”

      (2) For the optogenetic manipulation, it would be interesting if the model could reproduce the paradoxical effects (for example, Mahrach et al. reported paradoxical effects caused by PV manipulation in S1; https://pubmed.ncbi.nlm.nih.gov/31951197/). This seems a more relevant and non-trivial network phenomenon than the V1 manipulation the authors attempted to replicate.

      We thank the reviewer for this valuable idea. Indeed, our model is able to reproduce paradoxical effects under certain conditions. We added the following new supplementary Figure S12 demonstrating this finding (black arrows).

      Which we discuss in the Results on page 22:

      “However, at high contrasts, we observed a paradoxical effect of the optogenetic stimulation on L6 PV+ neurons, reducing their activity with increasing stimulation strength (Figure S12B; cf. Mahrach et al. (2020)). This effect did not occur under grey screen conditions (i.e., at contrast 0.0) with a constant background firing rate of 0.2 Hz or 5 Hz respectively (not shown). The individual…”

      and added to the Discussion on page 32:

      “Also, we predicted a paradoxical effect of optogenetic stimulation on L6 PV+ interneurons, namely a decrease in firing with increased stimulus strength. This is reminiscent of the paradoxical responses found by Mahrach et al. (2020) in the mouse anterior lateral motor cortex (in L5, but not in L2/3) and barrel cortex (no layer distinction) respectively. While Mahrach et al. (2020) conducted their recordings in awake mice not engaged in any behavior, we observed this effect only when drifting grating patterns with high contrast were presented. Nevertheless, consistent with their findings, we found the effect only in deep but not in superficial layers, and only for PV+ interneurons but not for PCs. Our model could therefore be used to improve the understanding of this paradoxical effect in follow up studies. These examples demonstrate that the approach of modeling entire brain regions can be used to further probe the topics of the original articles and cortical processing.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      My specific comments are in the Public Review. The summarizing point is that this is a sprawling paper, and it is easy for readers to get confused. Focusing on specific connections between known functional properties and findings in this model, especially for the full-scale model, will be helpful.

      We thank the reviewer for this comment and for their related recommendation (4) below, and have added subheadings throughout the results.

      Reviewer #2 (Recommendations for the authors):

      (1) P4. What are the 10 free parameters?

      We thank the reviewer for pointing out that it would be useful to summarize the 10 parameters at this stage of the text, and have adjusted the sentence to:

      “As a result, the emerging in-vivo like activity is the consequence of only 10 free parameters representing the strength of extrinsic input from other brain regions into 9 layer-specific excitatory and inhibitory populations, and a parameter controlling the noise structure of this extrinsic input.”

      (2) Table 1 and S1 are extremely useful. Could you provide a table summarizing the major assumptions or gaps in the model, their potential influence on the results, and possible ways to collect data that could support or challenge these assumptions? Currently, this information is scattered throughout the manuscript.

      We thank the reviewer for this very useful suggestion and have added a Table S8 on page 68:

      (3) Figure 4F is important, but the legend is unclear. What is the unit on the x-axis? The values seem too large to represent per-neuron measurements.

      Thank you to the reviewer for raising this. Indeed the values are estimated mean numbers of missing synapses per neuron by population. Such numbers are difficult to estimate but we have further discussed our rationale, justification and consideration of whether these numbers are accurate in the Results, as follows:

      “Heterogeneity in synaptic density within and across neuron classes and sections makes estimating the number of missing synapses challenging (DeFelipe and Fariñas, 1992). Changing the assumed synaptic density value of 1.1 synapses/μm would only change the slope of the relationship, however. Estimates of mean number of existing and missing synapses per population were within reasonable ranges; even the larger estimate for L5 E (due to higher dendritic length; Figure S3) was within biological estimates of 13,000 ± 3,500 total afferent synapses (DeFelipe and Fariñas, 1992).”
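
      One plausible reading of this estimate is that the expected total number of synapses follows from dendritic length times the assumed density, and that the synapses already present in the model are subtracted from it. The sketch below encodes that reading as an assumption; the function name and the example inputs are hypothetical and are not taken from the model.

```python
ASSUMED_SYNAPSE_DENSITY_PER_UM = 1.1  # synapses per micrometre, as quoted in the text

def estimate_missing_synapses(dendritic_length_um, modeled_synapses,
                              density_per_um=ASSUMED_SYNAPSE_DENSITY_PER_UM):
    """Estimate afferent synapses per neuron that the model does not contain.

    Assumption for illustration: expected total = density * dendritic length,
    and missing = expected total - synapses already present in the model.
    """
    expected_total = density_per_um * dendritic_length_um
    return max(0.0, expected_total - modeled_synapses)

# Hypothetical neuron: 10,000 um of dendrite and 4,000 modeled afferent synapses.
missing = estimate_missing_synapses(dendritic_length_um=10_000, modeled_synapses=4_000)
print(round(missing))
```

      Under this reading, populations with longer dendrites receive larger estimates, consistent with the remark in the text about L5 E.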

      This text references the new supplementary Figure S3:

      Moreover, these numbers represent the number of synapses, rather than the number of connections. The number of connections is usually used for quantifications such as indegree, and is usually much lower.

      We have also updated the caption and axis labels of the original figure:

      (4) Including additional subsections or improving the indexing in the Results section could be beneficial. In its current format, it's difficult to distinguish where the model description ends and where the validation begins. Some readers may want to focus more on the validation than other parts, so clearer segmentation would improve readability.

      We have addressed this comment in our response to the opening comment of the “Recommendations for the authors” section above.

      (5) P4. 2nd paragraph. Original vs rewired connectome. The term "rewired connectome" may give the impression that it refers to an artificial manipulation rather than a modification based on the latest data. It might be helpful to use a different term (e.g., SM-connectome as described later in the paper?).

      We have adjusted the text in the introduction:

      “Additionally, we generated a new connectome which captured recently characterized spatially-specific targeting rules for different inhibitory neuron types (Schneider-Mizell et al., 2023) in the MICrONS electron microscopy dataset (MICrONS-Consortium et al., 2021), such as increased perisomatic targeting by PV+ neurons, and increased targeting of inhibitory populations by VIP+ neurons. Comparing activity to the original connectome gave predictions about the role of these additional targeting rules.”

      (6) Figures 7 B, C, D: what is v1/v2? Original vs SM-Connectome?

      We thank the reviewer for noticing this and have corrected the figure to use “Orig” and “SM” consistent with the rest of the figure.

      (7) Page 23, 2.10: what is phi?

      We thank the reviewer for noticing this inconsistency with the earlier text, and have updated the text to read: “Particularly, we ran simulations for P<sub>FR</sub> ∈ [0.1, 0.15, ..., 0.3] using the OU parameters calibrated for the seven column subvolume for [Ca<sup>2+</sup>] = 1.05 mM and R<sub>OU</sub> = 0.4.”