    1. eLife Assessment

      This valuable study presents a resource for researchers using Drosophila to study neural circuits, in the form of a collection of split-Gal4 lines with an online search engine, which will facilitate the mapping of neuronal circuits. The evidence is convincing to demonstrate the utility of these new tools, and of the search engine, for understanding expression patterns in adults and larvae, and differences between the sexes. These resources will be of broad interest to Drosophila researchers in the field of neurobiology.

    2. Reviewer #1 (Public review):

      Summary:

      Meissner et al. describe an update on the collection of split-GAL4 lines generated by a consortium led by Janelia Research Campus. This follows the same experimental pipeline described before and represents a significant increment to the present collection. This will strengthen the usefulness and relevance of "splits" as a standard tool for labs that already use this tool and attract more labs and researchers to use it.

      Strengths:

      This manuscript presents a solid step to establish Split-GAL4 lines as a relevant tool in the powerful Drosophila toolkit. Not only does the raw number of available lines contribute to the relevance of this tool in the "technical landscape" of genetic tools, but additional features of this effort contribute to the successful adoption. These include:

      (1) A description of expression patterns in the adult and larvae, expanding the "audience" for these tools

      (2) A classification of line combination according to quality levels, which provides a relevant criterion while deciding to use a particular set of "splits".

      (3) Discrimination between male and female expression patterns, providing hints regarding the potential role of these gender-specific circuits.

      (4) The search engine seems to be user-friendly, facilitating the retrieval of useful information.

      (5) An acknowledgement of the caveats and challenges that splits (like any other genetic tool) can carry.

      Overall, the authors employed a pipeline that maximizes the potential of the Split-GAL4 collection to the scientific community.

      Weaknesses:

      My concerns about the caveats that researchers should be aware of when using these tools, particularly those using them for the first time, were resolved.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes the creation and curation of a collection of genetic driver lines that specifically label small numbers of neurons, often just one to a handful of cell types, in the central nervous system of the fruit fly, Drosophila melanogaster. The authors screened over 77,000 split hemidriver combinations to yield a collection of 3060 lines targeting a range of cell types in the adult Drosophila central nervous system and 1373 lines characterized in third-instar larvae. These genetic driver lines have already contributed to several important publications and will no doubt continue to do so. It is a truly valuable resource that represents the cooperation of several labs throughout the Drosophila community.

      Strengths:

      The authors have thoughtfully curated and documented the lines that they have created, so that they may be maximally useful to the greater community. This documentation includes confocal images of neurons labeled by each driver line and when possible, a list of cell types labeled by the genetic driver line and their identity in an EM connectome dataset. The authors have also made available some information from the other lines they created and tested but deemed not specific or strong enough to be included as part of the collection. This additional resource will be a valuable aid for those seeking to label cell types that may not be included in the main collection.

      The added revisions help to clarify important points relating to the creation of the lines, which lines were included as part of this specific collection, and caveats to be mindful of when using any of the described lines. These revisions will increase the manuscript's utility to users who may be less familiar with this resource.

      Weaknesses:

      The major weakness, which is also in some ways a strength, is the stringent requirement that included lines be highly specific across the CNS. As a result, the lines that are part of this specific collection are sparse and specific but also limited in which cell types they cover. Doubtless there are many missing cell types.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Meissner et al. describe an update on the collection of split-GAL4 lines generated by a consortium led by Janelia Research Campus. This follows the same experimental pipeline described before and represents a significant increment to the present collection. This will strengthen the usefulness and relevance of "splits" as a standard tool for labs that already use this tool and attract more labs and researchers to use it.

      Strengths:

      This manuscript presents a solid step to establish Split-GAL4 lines as a relevant tool in the powerful Drosophila toolkit. Not only does the raw number of available lines contribute to the relevance of this tool in the "technical landscape" of genetic tools, but additional features of this effort contribute to the successful adoption. These include:

      (1) A description of expression patterns in the adult and larvae, expanding the "audience" for these tools

      (2) A classification of line combination according to quality levels, which provides a relevant criterion while deciding to use a particular set of "splits".

      (3) Discrimination between male and female expression patterns, providing hints regarding the potential role of these gender-specific circuits.

      (4) The search engine seems to be user-friendly, facilitating the retrieval of useful information.

      Overall, the authors employed a pipeline that maximizes the potential of the Split-GAL4 collection to the scientific community.

      Weaknesses:

      The following aspects apply:

      The use of split-GAL4 lines has tremendously improved the genetic toolkit of Drosophila and this manuscript is another step forward in establishing this tool in the genetic repertoire that laboratories use. Thus, this would be a perfect opportunity for the authors to review the current status of this tool, addressing its caveats and how to effectively implement it into the experimental pipeline.

      (1) While the authors do bring up a series of relevant caveats that the community should be aware of while using split-GAL4 lines, the authors should take the opportunity to address some of the genetic issues that frequently arise while using the described genetic tools. This is particularly important for laboratories that lack the experience using split-GAL4 lines and wish to use them. Some of these issues are covertly brought up, but not entirely clarified.

      First, why do the authors (wisely) rescreen the lines using UAS-CsChrimson-mVenus? One reason is that using another transgene (such as UAS-GFP) and/or another genomic locus can drive a different expression pattern or intensity. Although this is discussed, this should be made more explicit and the readers should be aware of this.

      Second, it would be important to include a discussion regarding the potential of hemidriver lines to suffer from transvection effects whenever there is a genetic element in the same locus. These are serious issues that prevent a more reliable use of split-GAL4 lines that, once again, should be discussed.

      We added additional explanatory text to the discussion.

      (2) The authors simply mention that the goal of the manuscript is to "summarize the results obtained over the past decade." A better explanation would be welcomed in order to understand the need for a dedicated manuscript to announce the availability of a new batch of lines when previous publications already described the Split-GAL4 lines. At the extreme, one might question why we need a manuscript for this when a simple footnote on Janelia's website would suffice.

      We added an additional mention of the cell type split-GAL4 collection at the relevant section and added more emphasis on the curation process adding value to the final selections. We feel that the manuscript is useful to document the methods used for the contained analysis and datasets and gives a starting point to the reader to go through the many split-GAL4 publications and images.

      Reviewer #2 (Public Review):

      Summary: This manuscript describes the creation and curation of a collection of genetic driver lines that specifically label small numbers of neurons, often just one to a handful of cell types, in the central nervous system of the fruit fly, Drosophila melanogaster. The authors screened over 77,000 split hemidriver combinations to yield a collection of 3060 lines targeting a range of cell types in the adult Drosophila central nervous system and 1373 lines characterized in third-instar larvae. These genetic driver lines have already contributed to several important publications and will no doubt continue to do so. It is a truly valuable resource that represents the cooperation of several labs throughout the Drosophila community.

      Strengths:

      The authors have thoughtfully curated and documented the lines that they have created, so that they may be maximally useful to the greater community. This documentation includes confocal images of neurons labeled by each driver line and when possible, a list of cell types labeled by the genetic driver line and their identity in an EM connectome dataset. The authors have also made available some information from the other lines they created and tested but deemed not specific or strong enough to be included as part of the collection. This additional resource will be a valuable aid for those seeking to label cell types that may not be included in the main collection.

      Weaknesses:

      None, this is a valuable set of tools that took many years of effort by several labs. This collection will continue to facilitate important science for years to come.

      We thank the reviewer for their positive feedback.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Meissner et al. describes a collection of 3060 Drosophila lines that can be used to genetically target very small numbers of brain cells. The collection is the product of over a decade of work by the FlyLight Project Team at the Janelia Research Campus and their collaborators. This painstaking work has used the intersectional split-Gal4 method to combine pairs of so-called hemidrivers into driver lines capable of highly refined expression, often targeting single cell types. Roughly one-third of the lines have been described and characterized in previous publications and others will be described in manuscripts still in preparation. They are brought together here with many new lines to form one high-quality collection of lines with exceptional selectivity of expression. As detailed in the manuscript, all of the lines described have been made publicly available accompanied by an online database of images and metadata that allow researchers to identify lines containing neurons of interest to them. Collectively, the lines include neurons in most regions of both the adult and larval nervous systems, and the imaging database is intended to eventually permit anatomical searching that can match cell types targeted by the lines to those identified at the EM level in emerging connectomes. In addition, the manuscript introduces a second, freely accessible database of raw imaging data for many lower quality, but still potentially useful, split-Gal4 driver lines made by the FlyLight Project Team.

      Strengths:

      Both the stock collection and the image databases are substantial and important resources that will be of obvious interest to neuroscientists conducting research in Drosophila. Although many researchers will already be aware of the basic resources generated at Janelia, the comprehensive description provided in this manuscript represents a useful summary of past and recent accomplishments of the FlyLight Team and their collaborators and will be very valuable to newcomers in the field. In addition, the new lines being made available and the effort to collect all lines that have been generated that have highly specific expression patterns is very useful to all.

      Weaknesses:

      The collection of lines presented here is obviously somewhat redundant in including lines from previously published collections. Potentially confusing is the fact that previously published split-Gal4 collections have also touted lines with highly selective expression, but only a fraction of those lines have been chosen for inclusion in the present manuscript. For example, the collection of Shuai et al. (2023) describes some 800 new lines, many with specificity for neurons with connectivity to the mushroom body, but only 168 of these lines were selected for inclusion here. This is presumably because of the more stringent criteria applied in selecting the lines described in this manuscript, but it would be useful to spell this out and explain what makes this collection different from those previously published (and those forthcoming).

      We added more description of how this collection is focused on the best cell-type-specific lines across the CNS. An important requirement for inclusion was this degree of specificity across the CNS, while many prior publications had a greater emphasis on lines with a narrower focus of specificity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Luckily for us, genetics is for the most part an exact science. However, there's still some "voodoo" in a lot of genetic combinations that the authors should disclose and be as clear as possible in the manuscript. This allows for the potential users to gauge expectations and devise a priori alternative plans.

      We attempted to comprehensively cover the caveats inherent in our genetic targeting approach.

      Minor points:

      (1) The authors mention that fly age should be controlled as expression can vary. Is there any reference to support this claim?

      We added a reference describing driver expression changes over development.

      (2) There should be a citation for "Flies were typically 1-5 days old at dissection for the cell type collection rescreening, 1-8 days old for other non-MCFO crosses and 3-8 days old for MCFO".

      We clarified that these descriptions were of our experimental preparations, not describing other citable work.

      Reviewer #3 (Recommendations For The Authors):

      General Points:

      Overall, the manuscript is very clear, but there are a couple of points where more explicit information would be useful. One of these is with respect to the issue of selectivity of targeting. The cell type specificity of lines is often referred to, but cell types can range from single pairs of neurons to hundreds of indistinguishable neurons with similar morphology and function. It would be useful if the authors explained whether their use of the term "cell type" distinguishes cell type from cell number. It would also be useful if lines that target many neurons of a single cell type were identified.

      We added further discussion of cell types vs. cell numbers. Our labeling strategy was not optimized for counting cell numbers labeled by each line. We believe EM studies are best positioned to comprehensively evaluate the number of cells making up each type.

      The second point relates to vagueness about the intended schedule for providing resources that will match (or allow matching of) neurons to the connectome. For example, on pp. 5-6 it is stated that: "In the future all of the neurons in these lines will be uniquely identified and linked to neurons reconstructed in the electron microscopy volume of the larva" but no timeline is provided. Similarly, for the adult neurons it is stated on p. 4 that: "Anatomical searching for comparison to other light microscopy (LM) and EM data is being made available." A more explicit statement about what resources are and are not yet available, a timeline for full availability, and an indication of how many lines currently have been matched to EM data would be helpful.

      During the review and revision period we have made progress on processing the images in the collection. We updated the text with the current status and anticipated timeline for completion.

      Specific Points:

      p. 4 "Although the lines used for these comparisons are not a random sample, the areas of greatest difference are in the vicinity of previously described sexual dimorphisms..." In the vicinity of is a very vague statement of localization. A couple of examples of what is meant here would be useful.

      We added example images to Figure 3.

      p. 5 "...may have specific expression outside our regions of interest." It's not clear what "our regions of interest" refers to here. Please clarify.

      We clarified that we were referring to the regions studied in the publications listed in Table 1.

      p. 5 "...lines that were sparse in VNC but dirty in the brain or SEZ..." A more quantitative descriptor than "dirty" would be helpful.

      We unfortunately did not quantify the extent of undesired brain/SEZ expression, but attempted to clarify the statement.

      p. 6 "...the images are being made instantly searchable for LM and EM comparisons at NeuronBridge..." Here again it is hard to know what is meant by "being made instantly searchable." How many have been made searchable and what is the bottleneck in making the rest searchable?

      We updated the text as described above. The bottleneck has been available processing capacity for the hundreds of thousands of included images.

      Figure 1 Supplemental File 2: The movie is beautiful, but it seems more useful as art than as a reference. Perhaps converting it to a pdf of searchable images for each line would make it more useful.

      We replaced the movie with a searchable PDF.

      Fig. 2(B) legend: "Other lines may have more than two types." It is not clear what "other lines" are being referred to.

      As part of making the quality evaluation more robust, we scored lines for the clear presence of three or more cell types. We updated the text accordingly.

      Fig. 2(C): Presumably the image shown is an example of variability in expression rather than weakness, but it is hard to know without a point of comparison. Perhaps show the expression patterns of other samples? Or describe briefly in the legend what other samples looked like?

      We added Figure 2 - figure supplement 1 with examples of variable expression in a split-GAL4 line.

    1. eLife Assessment

      This important study reports on PI3KR mutations and a paradoxical mechanism of PI3KR signaling. The strength of evidence for the study is mostly convincing, as conclusions are supported by a variety of mutational strategies and cellular systems to look at interactions among signaling pathways.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides convincing data showing that expression of the PIK3R1(deltaExon11) dominant negative mutation in Activated PI3K Delta Syndrome 1/2 (APDS1/2) patient-derived cells reduces AKT activation and p110δ protein levels. Using a 3T3-L1 model cell system, the authors show that overexpressed p85α(deltaExon 11) displays reduced association with the p110α catalytic subunit but strongly interacts with Irs1/2. Overexpression of PIK3R1 dominant negative mutants inhibits AKT phosphorylation and reduces cellular differentiation of preadipocytes. The experimental design, interpretation, and quantification broadly support the authors' conclusions, which establish a new paradigm that warrants future studies.

      Strengths:

      The strength of this study is the clear results derived from Western blot analysis of cell signaling markers (e.g. pAKT1), and co-immunoprecipitation of PI3K holoenzyme complexes and associated regulatory factors (e.g. Irs1/2). The authors analyze a variety of PIK3R1 mutants (i.e. deltaExon11, E489K, R649W, and Y657X), which reveals a range of phenotypes that support the proposed model for dominant negative activity. The use of clonal cell lines with doxycycline-induced expression of the PIK3R1 mutants (deltaExon 11, R649W, and Y657X) provides convincing experimental data concerning the relationship between p85α mutant expression and AKT phosphorylation in vivo. This approach for overexpression is excellent and should be utilized more broadly by cell biologists. The authors convincingly show that p85α(deltaExon11, R649W, or Y657X) is unable to associate with p110α but instead more strongly associates with Irs1/2 compared to wild type p85α. Overall, this article does a great job of motivating future studies of SHORT and APDS2 PIK3R1 mutants expressed from their endogenous loci (e.g. knock-in mice).

      Weaknesses:

      The limitations of this study lie in the complexity of the cell signaling pathway under investigation, rather than a lack of rigor by the authors. Future experimentation will help reconcile the cell type specific differences (e.g. APDS2 patient derived cells vs. the 3T3-L1 cell model system) in PIK3R1 mutant behavior reported by the authors. This is also intimately linked to variable expression of PIK3R1 mutants and cell-type specific regulatory factors. Although beyond the scope of this work, an unbiased proteomic study that broadly evaluates the cell signaling landscape could provide a more holistic understanding of the APDS2 and SHORT mutants compared to a candidate-based approach. Additional structural biochemistry of the p110α/p85α(deltaExon 11) complex is needed to explain why PIK3R1 mutant regulatory subunits do not strongly associate with the p110 catalytic subunit. A more comprehensive biochemical analysis of p110α/p85α, p110β/p85α, and p110δ/p85α mutant protein complexes will also be necessary to explain various cell signaling phenotypes. A minor limitation of this study is the use of single end point assays to measure PI3K lipid kinase activity in the presence of one regulatory input (i.e. RTK-derived pY peptide). An expanded biochemical analysis of purified mutant PI3K complexes across the canonical membrane signaling landscape will be important for deciphering how competition between wild-type and mutant regulatory subunits is regulated in different cell signaling contexts.

    3. Reviewer #2 (Public review):

      Patsy R. Tomlinson et al. investigated the impact of different p85 alpha variants associated with SHORT syndrome or APDS2 on insulin mediated signaling in dermal fibroblasts and preadipocytes. They performed this study because APDS2 patients often present with features of SHORT syndrome. They found no evidence of hyperactive PI3K signalling monitored by pAKT in APDS2 patient-derived dermal fibroblast cells. In these cells p110 alpha protein levels were comparable to levels in control cells, however, p110 delta protein levels were strongly reduced. Remarkably, the truncated APDS2-causal p85 alpha variant was less abundant in these cells than p85 alpha wildtype. Afterwards they studied the impact of ectopically expressed p85 alpha variants on insulin mediated PI3K signaling in 3T3-L1 preadipocytes. Interestingly they found that the truncated APDS2-causal p85 alpha variant impaired insulin induced signaling. Using immunoprecipitation of p110 alpha they did not find truncated APDS2-causal p85 alpha variant in p110 alpha precipitates. Furthermore, by immunoprecipitating IRS1 and IRS2 they observed that the truncated APDS2-causal p85 alpha variant was very abundant in IRS1 and IRS2 precipitates, even in the absence of insulin stimulation. These important findings add, in an interesting way, a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Strengths:

      Based on state-of-the-art functional studies, the authors show that the p85 alpha variant responsible for APDS2, known to be associated with increased PI3K-delta signaling, can attenuate PI3K-alpha signalling in preadipocytes, providing a possible mechanistic explanation for the growing number of APDS2 patients with features of SHORT syndrome.

      Weaknesses:

      The proposed paradigm is based on one cell line derived from an APDS2 patient and an overexpressing system. The investigation of a larger number of cell lines derived from APDS2 patients would further substantiate the conclusion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors identify new mechanisms that link a PIK3R1 mutant to cellular signaling and division in Activated PI3 Kinase Delta Syndrome 1 and 2 (APDS1/2). The conclusion that this mutant serves as a dominant negative form of the protein, impacting PI3K complex assembly and IRS/AKT signaling, is important, and the evidence from constitutive and inducible systems in cultured cells is convincing. Nevertheless, there are several limitations relating to differences between cell lines and expression systems, as well as more global characterization of the protein interaction landscape, which would further enhance the work.

      We are pleased by this fair assessment, while noting that this work relates to APDS2 (PIK3R1-related) rather than APDS1 (PIK3CD-related). We believe our findings are clear, but the observation that studies including more global proteomics/phosphoproteomics in cells expressing mutants at endogenous levels would add further insight is well made. We hope that this report may motivate such studies by laboratories with wider access to primary cells from patients and knock-in mice.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      This study provides convincing data showing that expression of the PIK3R1(delta Exon11) dominant negative mutation in Activated PI3K Delta Syndrome 1/2 (APDS1/2) patient-derived cells reduces AKT activation and p110δ protein levels. Using a 3T3-L1 model cell system, the authors show that overexpressed p85α(delta Exon 11) displays reduced association with the p110α catalytic subunit but strongly interacts with Irs1/2. Overexpression of PIK3R1 dominant negative mutants inhibits AKT phosphorylation and reduces cellular differentiation of preadipocytes. The strength of this article is the clear results derived from Western blot analysis of cell signaling markers (e.g. pAKT1), and co-immunoprecipitation of PI3K holoenzyme complexes and associated regulatory factors (e.g. Irs1/2). The experimental design, interpretation, and quantification broadly support the authors' conclusions.

      Strengths:

      The authors analyze a variety of PIK3R1 mutants (i.e. delta Exon11, E489K, R649W, and Y657X), which reveals a range of phenotypes that support the proposed model for dominant negative activity. The use of clonal cell lines with doxycycline-induced expression of the PIK3R1 mutants (delta Exon 11, R649W, and Y657X) provides convincing experimental data concerning the relationship between p85α mutant expression and AKT phosphorylation in vivo. The authors convincingly show that p85α(delta Exon11, R649W, or Y657X) is unable to associate with p110α but instead more strongly associates with Irs1/2 compared to wild type p85α. This helps explain why the authors were unable to purify the recombinant p110α/p85α(delta Exon 11) heterodimeric complex from insect cells.

      Weaknesses:

      Future experimentation will be needed to reconcile the cell type specific differences (e.g. APDS2 patient-derived cells vs. the 3T3-L1 cell model system) in PIK3R1 mutant behavior reported by the authors.

      This is a fair comment. It has been established for many years that relative protein levels even of wild type PIK3CA and PIK3R1 gene products influence sensitivity of PI3K to growth factor stimulation. Such issues of stoichiometry become exponentially more complicated when the numerous potential interactions among the full repertoire of Class 1 PI3K regulatory subunits (3 splice variants of PIK3R1, and also PIK3R2 and PIK3R3) and corresponding catalytic subunits (PIK3CA, PIK3CB, PIK3CD) are considered, and when different activities and stabilities of PIK3R1 mutants are added to the mix. It thus seems obvious to us that different levels of expression of different mutants in different cellular contexts will have different signalling consequences. We establish a paradigm in this paper using an overexpression system, and we strongly agree that this merits further investigation in a wider variety of primary cells (or cells with knock in at the endogenous locus), where available.

      An unbiased proteomic study that broadly evaluates the cell signaling landscape could provide a more holistic understanding of the APDS2 and SHORT mutants compared to a candidate-based approach.

      We agree. This would be highly informative, but we think it would best be carried out in both “metabolic” and “immune” cells with endogenous levels of expression of SHORT or APDS2 PIK3R1 mutants. These are not all currently available to us, and require follow-up studies.

      Additional biochemical analysis of the p110α/p85α(delta Exon 11) complex is needed to explain why this mutant regulatory subunit does not strongly associate with the p110 catalytic subunit.

      We agree. We present this observation in our overexpression system, which is clear and reproducible, even though somewhat surprising. The failure to bind p110α is likely not absolute, as sufficient p110α-p85α(ΔEx11) was synthesised in vitro in a prior study to permit structural and biochemical studies, although a series of technical workarounds were required to generate enough heterodimeric PI3K to study in vitro given the manifest instability of the complex, particularly when concentrated (PMID 28167755). We already note in discussion that p85α can homodimerize and bind PTEN, likely among other partners, and it may be that the APDS2 deletion strongly favours binding to proteins that effectively compete with p110α. However, this requires further study of the wider interactome of the mutant PIK3R1, which, as noted above, is beyond the scope of the current study.

      It remains unclear why p85α delta Exon 11 expression reduces p110δ protein levels in APDS2 patient-derived dermal fibroblasts.

      We caution that we only had the opportunity to study dermal fibroblasts cultured from a single APDS2 patient, as noted in the paper, and so replication of this finding in future will be of interest. Nevertheless, the observation is robust and reproducible in these cells, and we agree that this apparently selective effect on p110δ is not fully explained. Having said that, it has been observed previously that heterodimers of the ΔEx11 p85α variant with either p110α or p110δ are unstable, and when the unstable complexes were eventually synthesised, p110α and p110δ were demonstrated to show differences in engagement with the mutant p85, with greater disruption of inhibitory interactions observed for p110δ (PMID 28167755). It is thus not a great stretch to imagine that as well as disinhibiting p110δ more, the ΔEx11 p85α variant also destabilises the p85α-p110δ complex more, potentially explaining its near disappearance in cells with low baseline p110δ expression. Following on from the preceding question and response, however, there is an alternative explanation, based on the 3T3-L1 overexpression studies in this paper, wherein we were unable to demonstrate binding of p110α by ΔEx11 p85α. If, in any given cellular context, the mutant p85 could bind p110δ but not p110α, then the destabilising effect would be observed only for p110δ. So in summary, we believe the selective effect on p110δ is explained by differences in binding kinetics and heterodimer stability for different ΔEx11 p85α-containing complexes. The net effect of these differences may vary among cell types depending on relative levels of subunit expression.

      This study would benefit from a more comprehensive biochemical analysis of the described p110α/p85α, p110β/p85α, and p110δ/p85α mutant protein complexes. The study is currently limited to a single endpoint assay measuring PI3K lipid kinase activity in the presence of a single regulatory input (i.e. RTK-derived pY peptide). A broader biochemical analysis of the mutant PI3K complexes across the canonical signaling landscape will be important for establishing how competition between wild-type and mutant regulatory subunits is regulated in different cell signaling pathways.

      We agree that a wider analysis of upstream inputs and downstream network would be of interest, though as noted above the ultimate functional consequences of mutants will be an amalgam of any differential signalling effects of complexes that are stable enough to function, and differential effects of mutant p85a on the kinetics of distinct heterodimer assembly and stability. In this paper we seek to suggest a paradigm worthy of further, deeper assessment. We note that the search space here is large indeed (A. different cell types with differing profiles of PI3K subunit expression, B. multiple upstream stimuli, and C. multiple downstream outputs, with the timecourse of responses an additional important factor to consider). These studies are realistically beyond the scope of the current work, but we hope that further studies, as suggested by the reviewer, will follow.

      Reviewer #2 (Public Review)

      Summary:

      Patsy R. Tomlinson et al. investigated the impact of different p85alpha variants associated with SHORT syndrome or APDS2 on insulin-mediated signaling in dermal fibroblasts and preadipocytes. They find no evidence of hyperactive PI3K signalling monitored by pAKT in APDS2 patient-derived dermal fibroblast cells. In these cells p110alpha protein levels were comparable to levels in control cells; however, the p110delta protein levels were strongly reduced. Remarkably, the truncated APDS2-causal p85alpha variant was less abundant in these cells than p85alpha wildtype. Afterwards, they studied the impact of ectopically expressed p85alpha variants on insulin-mediated PI3K signaling in 3T3-L1 preadipocytes. Interestingly, they found that the truncated APDS2-causal p85alpha variant impaired insulin-induced signaling. Using immunoprecipitation of p110alpha, they did not find the truncated APDS2-causal p85alpha variant in p110alpha precipitates. Furthermore, by immunoprecipitating IRS1 and IRS2, they observed that the truncated APDS2-causal p85alpha variant was very abundant in IRS1 and IRS2 precipitates, even in the absence of insulin stimulation. These important findings add, in an interesting way, a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Strengths:

      Based on state-of-the-art functional investigation, the authors propose a loss-of-function activity of the APDS2-causing p85alpha variant in preadipocytes, providing a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Weaknesses:

      Related to Figure 1: PIK3R1 expression was assessed only by Western blotting; quantification of the RNA transcripts, e.g. mutant and wildtype transcripts, was not performed. RNA expression analysis would further strengthen the suggested impaired stabilization/binding.

      It is not completely clear to us how further PIK3R1 mRNA analysis would enhance the points we seek to make. Perhaps the reviewer’s point is that changes in protein expression could be explained by reduced transcription rather than having anything to do with altered protein turnover? As shown in Figure 1 supplemental figure 1, sequencing cDNA from each of the primary cell lines studied indicates that both mutant and WT alleles are expressed at or close to 50% of the total mRNA for PIK3CA or PIK3R1 as relevant. While this is not strictly quantitative, allied to prior evidence that these are dominant alleles which require to be expressed to exert their effect, with no evidence for altered mRNA expression of these variants in prior studies, we don’t believe any further quantification of mRNA expression would add value.

      Related to Figure 2

      As mentioned by the authors in the manuscript the expression of p110delta but also p110beta in 3T3-L1 preadipocytes ectopically expressing p85alpha variants has not been analyzed.

      We agree that such determination would have been a useful addition to the study, but regretfully it was not undertaken in these modified 3T3-L1 cells at the time of study. However independent bulk RNAseq studies of the founder 3T3-L1 cells from which the stably transduced cells were generated, undertaken as part of an unrelated study, revealed the following relative levels of endogenous expression of PI3K subunit mRNA:

      Author response table 1.

      We have not determined endogenous protein expression, and so have left the text of the discussion unchanged, simply noting that we have not formally assessed protein expression of p110d/p110b. However these transcriptomic findings suggest that p110d protein is likely either undetectable, or else present at extremely low levels compared to endogenous p110a. p110b also appears to be expressed at a much lower level than p110a. In our studies overexpressing mutant PIK3R1 and assessing insulin action, we believe we are largely or perhaps entirely assaying the effect of the mutants on p110a, in keeping with the fact that genetic and pharmacological studies have firmly established that it is p110a that is responsible for mediating the metabolic actions of insulin in adipose tissue and preadipocytes including 3T3-L1 (e.g. PMID 16647110). Indeed, to quote from this study, in 3T3-L1 “… inhibitors of p110b (TGX-115 and TGX-286) and p110d (IC87114 and PIK-23) had no effect on the insulin-stimulated phosphorylation of any protein in the PI3-K pathway.”

      We have added the following sentence to the discussion:

      “The current study has limitations. We have studied primary cells from only a single APDS2 patient, and in the 3T3-L1 cell model, we did not determine whether p110d protein could be detected. If not, this could explain the lack of detectable AKT phosphorylation with induction of Pik3r1 DEx11. Indeed, previous pharmacological studies in 3T3-L1 adipocytes have shown that selective inhibition of p110d or p110b does not alter insulin-induced phosphorylation of any protein studied in the PI3-K pathway, attesting to the dominance of p110a in insulin action in this cell model (Knight et al, 2006).”

      Furthermore, a direct comparison of the truncated APDS2-causal p85alpha variant with SHORT syndrome-causal p85alpha variants in regard to pAKT level, and p85alpha expression level has not been performed.

      These investigations would further strengthen the data.

      The cell lines conditionally expressing SHORT syndrome variants have been reported already, as cited (PMID: 27766312). Remarkably, the degree of inhibition of insulin-stimulated signalling is actually less pronounced for the SHORT syndrome variants than for the overexpressed APDS2 variant, as seen in the excerpt from the prior paper below. In this prior paper the maximum insulin concentration used, 100nM, was the concentration used in the current study. While overexpression of the APDS2 p85a variant ablated the response to insulin entirely, the response is still seen with the SHORT syndrome variants in the prior study, albeit at a clearly reduced level.

      Related to Figure 3

      The E489K and Y657X p85alpha variants should be also tested in combination with p110delta in the PI3K activity in vitro assay. This would help to further decipher the overall impact, especially of the E489K variant.

      We agree that this would make our data more complete, but for logistical reasons (primarily available personnel) we were compelled to constrain the number of p85-p110 combinations we studied. We elected to prioritise the PIK3R1 R649W variant as by far the most common causal SHORT syndrome variant, and as the variant showing the “cleanest” functional perturbation, namely severely impaired or absent ability to dock to phosphotyrosines in cognate proteins. The paradox that we sought to explain in this paper, namely the phenotypic combination of gain-of-function APDS2 with loss-of-function SHORT syndrome features, holds only for APDS2 PIK3R1 variants, and so while it is interesting to document that the canonical SHORT syndrome variant also inhibits PI3Kb and PI3Kd activation in vitro, this was not the main purpose of our study.

      Reviewer #1 (Recommendations For The Authors):

      Points of clarification and suggestions for improving the manuscript:

      (1) Explain whether there are any PIK3R1-independent genetic alterations in the APDS2 and PROS-derived cell lines. For example, are there differences in the karyotype of mutant cell lines compared to wild-type cells?

      Karyotypic abnormalities are not an established feature of either PROS or APDS2, and the patients from whom cells were derived were documented to be of normal karyotype. Karyotypic abnormalities acquired during cell culture would not be unprecedented, but confirming normal karyotypes in primary cell lines where there is no specific reason to suppose any alteration exceeds normal expectations for primary cell studies, and so this has not been undertaken.

      (2) When introducing the APDS2-associated PIK3R1 mutation (lines 126-128), the authors describe both the exon 11 skipping and in-frame deletions. I recommend rewording this sentence to say exon 11 skipping results in an in-frame deletion of PIK3R1. The current wording makes it seem like APDS2-derived cells contain two genetic perturbations: (1) exon 11 skipping and (2) in-frame deletion. Include a diagram in Figure 1 to help explain the location of the mutations being studied in relationship to the PIK3R1 gene sequence and domains (i.e. nSH2, iSH2, cSH2). The description of the exon 11 skipping and in-frame deletions (lines 126-128) would benefit from having a complementary figure that diagrams the location of these mutations in the PIK3R1 gene.

      On review we agree that clarity of description could be enhanced. We have now edited these lines as follows:

      “We began by assessing dermal fibroblasts cultured from a previously described woman with APDS2 due to the common causal PIK3R1 mutation. This affects a splice donor site and causes skipping of exon 11, leading to an in-frame deletion of 42 amino acids (434-475 inclusive) in the inter-SH2 domain, which is shared by all PIK3R1 isoforms (Patient A.1 in (Lucas et al., 2014b))(Figure 1 figure supplement 1).”

      We have moreover introduced a further figure element including a schematic of all PIK3R1 mutations reported in the current study (new Figure 1 figure supplement 1).

      (3) For Figure 2, I recommend including a cartoon that illustrates the experimental design showing the induced expression of PIK3R1 mutants, R649W and Y657X, in the background of the wild-type endogenous gene expression.

      Such a figure element has now been generated and included as Figure 2 figure supplement 1, duly called out in the results section where appropriate.

      (4) For the data plotted in Figure 1B-1C, please clarify whether the experiments represent a single patient or all 3-4 patients shown in Figure 1A.

      Each datapoint shown represents one of the patients in the immunoblots, with all patients included. Each point in turn is the mean from 3 independent experiments. We have added the following to the Figure legend:

      “(B)-(E) quantification of immunoblot bands from 3 independent experiments shown for phosphoAKT-S473, phosphoAKT-T308, p110d and p110a respectively. Each point represents data from one of the patient cell lines in the immunoblots. Paired datapoints +/- insulin are shown in (B) and (C), and dotted lines mark means.”

      (5) I recommend rewording the following sentence: "Given this evidence that APDS2-associated PIK3R1 delta Exon 11 potently inhibits PI3Kα when overexpressed in 3T3-L1 preadipocytes," to say "... potently inhibits PI3Kα signaling when overexpressed in 3T3-L1 preadipocytes." The data shown in Figures 1 and 2 do not support a direct biochemical inhibition of PI3Kα lipid kinase activity by p85α (delta Exon 11).

      This edit has been made.

      (6) Provide more discussion concerning the percentage of humans with APDS2 or SHORT syndrome that contain the mutations discussed in this paper. How strong is the genotype-phenotype link for these diseases? Are these diseases inherited or acquired through environmental stresses?

      Both APDS2 and SHORT syndrome are very well established, highly penetrant and stereotyped monogenic diseases. APDS2 is defined by the presence of activating PIK3R1 mutations such as the one studied here (by far the commonest causal mutation). SHORT syndrome clinically has some superficial resemblance to other human genetic syndromes that include short stature, but when careful attention is paid to characteristic features it is nearly universally attributable to loss-of-function PIK3R1 mutations, with the single exception of one case in which a putatively pathogenic PKCE mutation was described (PMID: 28934384). Although both syndromes are monogenic, it is often not accurate to refer to them as inherited, as, particularly in SHORT syndrome, de novo mutations (i.e. not found in either parent) are common. Environmental modifiers of phenotypes have not been described. We have now added to the introduction the comment that both conditions are highly penetrant and monogenic.

      (7) The data presented in Figure 5 would benefit from additional discussion and citations that describe the molecular basis of the interaction between PI3K and Irs1/2. What studies have previously established this is a direct protein-protein interaction? Are there PI3K mutants that don't interact with Irs1/2 that can be included as a negative control? Alternatively, the authors can simply reference other papers to support the mechanism of interaction.

      There is a voluminous literature dating back to the early 1990s documenting the mode of interaction of PI3K with Irs1/2. Relevant papers have now been cited as requested:

      p85-Irs1 binding: PMID 1332046 (White lab, PNAS 1992)

      p85-Irs2 binding: PMID 7675087 (White lab, Nature 1995)

      “This may be important, as p85a mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 (PMID 1332046) and Irs2 (PMID 7675087).”

      Regarding PI3K mutants that don't interact with Irs1/2, the SHORT syndrome mutant R649W which we include in this study is perhaps the best example of this, so it is both disease-causing and functions as such a negative control.

      (8) To see the effect of the dominant negative delta Exon 11, the truncated p85α needs to be superstoichiometric to the full-length p85α (Figure 2 - Supplemental Figure 2). This is distinct from the results in Figure 1 showing that the APDS2-derived dermal fibroblasts express 5-10x lower levels of p85α delta Exon 11 compared to full-length p85α (Figure 1A), but still show strongly inhibited pAKT S473 and T308 (Figure 1B-1C). The manuscript would benefit from more discussion concerning the cell type specific differences in phenotypes. Alternatively, do the APDS2-derived dermal fibroblasts have other genetic perturbations that are not accounted for that potentially modulate cell signaling differently compared to 3T3-L1 preadipocytes?

      The reviewer is astute to point out this apparent contrast. First of all, we have no reason to suppose there is any specific, PI3K-modifying genetic perturbation present in the primary dermal fibroblasts studied, although of course the genetic background of these cells is very distinct from that of 3T3-L1 mouse embryo fibroblasts. Related to such background differences, however, substantial variability is usually apparent in insulin-responsiveness even of healthy control dermal fibroblasts. This means that caution should be exercised in extrapolating from studies of the primary cells of a single individual. To illustrate this, we point the reviewer to our 2016 study in which we extensively studied the dermal fibroblasts of a proband with SHORT syndrome due to PIK3R1 Y657X:

      From this study we conclude that A. WT controls show quite substantial variation in insulin-stimulated AKT phosphorylation and B. even the SHORT syndrome p85a Y657X variant, expressed at higher levels than WT p85a in dermal fibroblasts, does not produce an obvious decrease in insulin-stimulated AKT phosphorylation, despite extensive evidence from other human cell studies and knock-in mice that it does indeed impair insulin action in metabolic tissues. For both these reasons we are not convinced that the lower insulin-induced AKT phosphorylation we described in Figure 1 should be overinterpreted until reproduced in other studies with primary cells from further APDS2 patients. This is why we did not comment more extensively on this. We now add the following qualifier in results:

      “Despite this, no increase in basal or insulin-stimulated AKT phosphorylation was seen in APDS2 cells compared to cells from wild-type volunteers or from people with PROS and activating PIK3CA mutations H1047L or H1047R (Fig 1A-C, Fig 1 figure supplement 3A,B). Although insulin-induced AKT phosphorylation was lower in fibroblasts from the one APDS2 patient studied compared to controls, we have previously reported extensive variability in insulin-responsiveness of primary dermal fibroblasts from WT controls. Moreover even primary cells from a patient expressing high levels of the SHORT syndrome-associated p85a Y657X did not show attenuated insulin action, so we do not believe the reduced insulin action in APDS2 cells in the current study should be overinterpreted until reproduced in further APDS2 cells.”

      Nevertheless we remind the reviewer that the main purpose of our primary cell experiment was to determine if there were any INCREASE in basal PI3K activity, or any difference in p110a or p110d protein levels, and we regard our findings in these regards to be clear.

      (9) A. The manuscript would benefit from additional explanation concerning why the E489K, R649W, and Y657X are equivalent substitutes for the characterization of p110α/p85α (delta Exon 11). Perhaps a more explicit description of these mutations in relationship to the location of the p85α (delta Exon 11) mutation would help. I recommend including a diagram in Figure 3 showing the position of the delta Exon 11, E489K, R649W, and Y657X mutations in the PIK3R1 coding sequence. B. Also, please clarify whether all three holoenzyme complexes were biochemically unstable (i.e. p110α/p85α, p110β/p85α, p110δ/p85α) when p85α (delta Exon 11) was expressed in insect cells.

      A. Whether or not E489K, R649W and Y657X are “equivalent” to the APDS2 mutant is not really a meaningful issue here. These mutants are being studied because they cause SHORT syndrome without immunodeficiency, while the APDS2 mutant causes APDS2 often with features of SHORT syndrome. That is, it is naturally occurring mutations and the associated genotype-phenotype correlation that we seek to understand. Of the 3 SHORT syndrome causal mutations chosen, R649W is by far the commonest, effectively preventing phosphotyrosine binding, Y657X has the interesting attribute that it can be discriminated from full length p85 on immunoblots due to its truncation, and is moreover a variant that we have studied in cells and mice before, while the rarer E489K is an interesting SHORT syndrome variant as it is positioned more proximally in the p85a protein than most SHORT syndrome causal variants. All variants studied are now illustrated in the new Figure 1 figure supplement 1. B. Regarding stability of PI3K heterodimers containing the APDS2 p85a variant, we tried extensively to purify p110a and p110d complexes without success despite several approaches to optimise production. We did not try to synthesise the p110b-containing complex.

      (10) I recommend presenting the results in Figure 4 before Figure 3 because it provides a good rationale for why it's difficult to purify the p110α/p85α (delta Exon 11) holoenzyme from insect cells.

      This would be true if p110d were studied in Figure 4, but it is not. Figure 4 looks instead at effects on p110a of heterologous overexpression of mutant p85 and is a natural lead-in to the ensuing Figures 5 and 6, so we do not agree that it would add value or enhance flow to swap Figures 3 and 4.

      (11) The authors show that overexpression of the p85α (delta Exon 11) did not result in p110α/p85α (delta Exon 11) complex formation based on co-immunoprecipitation. Do the authors get the same result when they co-immunoprecipitate p110α/p85α and p110δ/p85α in the APDS2-derived dermal fibroblasts used in Figure 1A?

      This is an interesting question but not an experiment we have done. It is not unfeasible, but generating enough cells to undertake IP experiments of this nature in dermal fibroblasts is a significant undertaking, and with finite resources available and only one primary cell line to study we elected not to pursue this.

      Details in Methods section:

      (1) Include catalog numbers and vendors for reagents (e.g. lipids, PhosSTOP, G-Dynabeads, etc.). There is not enough information provided to reproduce this work.

      We have now added all vendors and catalogue numbers where relevant.

      (2) Concerning the stated lipid composition (5/10/15/45/20/5 %) in the liposome preparation protocol. Please clarify whether these numbers represent molar percentages or mg/mL percentages.

      We have now added that this is expressed as “(wt/vol)”

      (3) What is the amino acid sequence of the PDGFR (pY2) peptide used for the PI3K activity assay?

      This assay has been published and references with detailed methods are cited. For clarity, however we now say:

      “PI(3,4,5)P3 production was measured by modified PI3-Kinase activity fluorescence polarisation assay (Echelon Biosciences, Salt Lake City, UT, USA). 10μL reactions in 384-well black microtitre plates used 1mM liposomes containing 50μM PI(4,5)P2, optimised concentrations of purified PI3K proteins, 100μM ATP, 2mM MgCl2, with or without 1μM tyrosine bisphosphorylated 33-mer peptide derived from mouse PDGFRβ residues 735-767, including phosphotyrosine at positions 740 and 751 (“pY2”; 735-ESDGGYMDMSKDESIDYVPMLDMKGDIKYADIE-767;  Cambridge peptides).”
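      The peptide description above can be checked for internal consistency with a short sketch. It assumes, as the "735-...-767" flanks suggest, that the first letter of the quoted sequence is residue 735; the function names are illustrative only and are not part of any published pipeline.

```python
# Sequence copied from the quoted methods text.
# Assumption: the first letter corresponds to PDGFRbeta residue 735.
PEPTIDE = "ESDGGYMDMSKDESIDYVPMLDMKGDIKYADIE"
FIRST_RESIDUE = 735


def residue_at(position: int) -> str:
    """Return the one-letter amino acid at a given residue number."""
    return PEPTIDE[position - FIRST_RESIDUE]


def tyrosine_positions() -> list[int]:
    """List the residue numbers of every tyrosine in the peptide."""
    return [FIRST_RESIDUE + i for i, aa in enumerate(PEPTIDE) if aa == "Y"]


if __name__ == "__main__":
    print(len(PEPTIDE))                      # peptide length
    print(residue_at(740), residue_at(751))  # residues stated to be phosphorylated
    print(tyrosine_positions())              # all tyrosines in the peptide
```

      Under that numbering assumption the sequence is 33 residues long and residues 740 and 751 are both tyrosines, in agreement with the stated "33-mer" and phosphotyrosine positions.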

      (4) Include a Supplemental file containing a comprehensive description of the plasmids and coding sequencing used in this study.

      Such a supplemental file has been created and is included as Table 2

      Minor points of clarification, citations, and typos:

      (1) Clarify why Activated PI3K Delta Syndrome 1 (APDS1) is thus named APDS2. See lines 71-72 of the introduction. Also see line 89: "...is common in APDS2, but not in APDS1." Briefly describe the difference between APDS1 and APDS2?

      This is described in the introduction, but we apologise if our wording was not sufficiently clear. We have tried now to remove any ambiguity:

      “Some PIK3R1 mutations reduce basal inhibition of catalytic subunits, usually due to disruption of the inhibitory inter-SH2 domain, and are found in cancers (Philp et al, 2001) and vascular malformations with overgrowth (Cottrell et al, 2021). In both diseases, hyperactivated PI3Ka, composed of heterodimers of PIK3R1 products and p110a, drives pathological growth. Distinct inter-SH2 domain PIK3R1 mutations, mostly causing skipping of exon 11 and deletion of residues 434-475, hyperactivate PI3Kd in immune cells, causing highly penetrant monogenic immunodeficiency (Deau et al, 2014; Lucas et al, 2014b). This phenocopies the immunodeficiency caused by genetic activation of p110d itself, which is named Activated PI3K Delta Syndrome 1 (APDS1) (Angulo et al, 2013; Lucas et al, 2014a). The PIK3R1-related syndrome, discovered shortly afterwards, is thus named APDS2.”

      (2) Figure legend 1. Clarify reference to "Figure EV2".

      (3) Figure legend 2. Clarify reference to "Figure EV3".

      (4) Figure legend 3. Clarify reference to "Figure EV5".

      Thank you for pointing out this oversight, arising from failure to update nomenclature fully between versions. “EV” figures actually are the figure supplements in the submission. All labels have now been updated.

      (5) For Figure 1 - supplemental figure 1C, indicate experimental conditions on the blot (e.g. -/+ insulin).

      This is now added

      (6) Figure 4B, y-axis. Clarify how data was quantified. Perhaps reword "(% WT without DOX)" for clarity.

      We have left the Y axis label as it is, but have added the following to the figure legend:

      “(B) Quantification of immunoblot bands from immunoprecipitates from 3 independent experiments, expressed as a percentage relative to the intensity of the band in WT cells without doxycycline exposure.”
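      The normalisation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the condition labels and intensity values are hypothetical, and it assumes that each experiment is normalised to its own reference band before the percentages are averaged, which the legend does not state explicitly.

```python
from statistics import mean


def percent_of_reference(band_intensities: dict[str, float], reference: str) -> dict[str, float]:
    """Express each band as a percentage of the reference band in the same blot."""
    ref = band_intensities[reference]
    if ref == 0:
        raise ValueError("reference band intensity must be non-zero")
    return {cond: 100.0 * value / ref for cond, value in band_intensities.items()}


def mean_percent(experiments: list[dict[str, float]], reference: str) -> dict[str, float]:
    """Normalise each experiment separately, then average per condition."""
    normalised = [percent_of_reference(exp, reference) for exp in experiments]
    return {cond: mean(n[cond] for n in normalised) for cond in normalised[0]}


if __name__ == "__main__":
    # Hypothetical densitometry values from two blots (arbitrary units).
    blots = [
        {"WT -DOX": 200.0, "WT +DOX": 100.0},
        {"WT -DOX": 50.0, "WT +DOX": 50.0},
    ]
    print(mean_percent(blots, reference="WT -DOX"))
```

      Normalising within each blot before averaging keeps differences in exposure or loading between blots from dominating the mean; averaging raw intensities first would be a different, equally hypothetical, choice.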

      (7) In the results section (lines 117-124), please explicitly state whether the described mutations are homo- or heterozygous.

      All mutations are heterozygous, as now explicitly stated

      (8) I recommend spelling out the SHORT and APDS2 acronyms in the abstract to make this study more accessible.

      We respectfully disagree that such spelling out in the abstract would improve accessibility. Both acronyms are clunky and wordy and are more likely to obscure meaning by squeezing out other words in the abstract. APDS is already spelled out in the introduction, and we now add the following for SHORT syndrome:

      “More surprisingly, phenotypic overlap is reported between APDS2 and SHORT syndrome. SHORT syndrome, named for the characteristic developmental features (Short stature, Hyperextensibility, Hernia, Ocular depression, Rieger anomaly, and Teething delay) is caused by loss of PI3Ka function due to disruption of the phosphotyrosine-binding C-terminal SH2 domain (Chudasama et al, 2013; Dyment et al, 2013; Thauvin-Robinet et al, 2013).”

      (9) I recommend explaining in more detail or rewording the following jargon/terms to make the writing more accessible to a broad audience: "reduced linear growth" (line 83) and "larger series" (line 86). I assume "reduced linear growth" is height.

      Edited as follows:

      “It features short stature, insulin resistance, and dysmorphic features (Avila et al, 2016). In recent years, both individual case reports (Bravo Garcia-Morato et al, 2017; Petrovski et al, 2016; Ramirez et al, 2020; Szczawinska-Poplonyk et al, 2022) and larger case series (Elkaim et al, 2016; Jamee et al, 2020; Maccari et al, 2023; Nguyen et al, 2023; Olbrich et al, 2016; Petrovski et al., 2016) have established that many people with APDS2 have overt features of SHORT syndrome, while, more generally, linear growth impairment is common in APDS2, but not in APDS1.”

      (10) For clarity, reword lines 214-215 to read, "No increase in p110α levels was seen on conditional overexpression of wild-type or R649W p85α."

      Change made, thank you

      (11) Figure 6A - Western blot label says, "657X" instead of "Y657X."

      Now corrected

      (12) Lines 214-215: For clarity, reword the sentence to say, "No increase in p110α was seen on conditional overexpression...".

      REPEAT OF POINT 10 ABOVE

      (13) Clarify what interactions are being competed for in the following statement: "... delta Ex11 may exert its inhibitory action by competing with PI3K holoenzyme" (lines 237-238). Are you referring to the interaction between p110α and p85α or the interaction between p110α/p85α and another protein?

      We have endeavoured to clarify by editing as follows:

      “As APDS2 p85a DEx11 does not appear to displace wild-type p85a from p110a despite strong overexpression, it is likely that there are high levels of truncated p85a unbound to p110a in the cell. This may be important, as p85a mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 and Irs2. Excess free regulatory subunits compete with heterodimeric PI3K holoenzyme for binding to these phosphotyrosines (Ueki et al., 2002), raising the possibility that excess free, truncated APDS2 p85a DEx11 may exert its inhibitory action similarly by outcompeting PI3K holoenzyme for phosphotyrosine binding.”

      (14) Provide more information about the following statement and how it relates to the mutations in this study: "Homozygous truncating PIK3R1 mutations abolishing p85α expression while preserving p55α and p50α produce agammaglobulinaemia" (lines 271-272). The manuscript would benefit from a more explicit description of the nature of these mutations.

      This wording seems to us to be explicit, however we agree that a schematic of PIK3R1 genotype-phenotype correlation, as requested elsewhere, would help readers. Such a schematic is now included as Figure 1 figure supplement 1.

      (15) Typo on line 299: "unclike".

      Corrected.

      (16) The data presented in this study support a model in which p85α (DExon 11) expression functions as a dominant negative. Please clarify why in the discussion section you explain that p85α (DExon 11) activates PI3K. For example, "...skipping of exon 11, were shown in 2014 to activate PI3K..." (lines 290-291), "...activate PI3Kδ on one hand..." (line 309); "...APDS2 mutations in PIK3R1 has mixed consequences, producing greater hyperactivation of p110δ than p110α" (lines 354-355).

      We do not entirely understand the reviewer’s question and thus request here. p85α (DExon 11) activates PI3Kd in immune cells and in vitro, and this is accepted, based on numerous reports, to be the mechanism underlying immunodeficiency. We do not challenge this, and cite evidence for any such claims in our report. The dominant negative activity we describe here towards PI3Ka activation is based not on inhibition of mutant-containing heterodimer, but rather on destabilisation of and/or competition with heterodimeric WT holoenzyme. This is the basis of the model we present; that is, a finely balanced competition between enzymic activation and mutant holoenzyme destabilisation and competition of mutant free p85a with WT holoenzyme, whose net effect likely differs among cells and tissues, most likely based on the repertoire and proportions of PI3K subunit expression. If the reviewer has specific suggestions for us that will make this point clearer still we should be happy to consider them.

      (17) Provide references for the statements in lines 349-353 of the discussion.

      This brief closing paragraph is a succinct recap and summary of the key points made throughout the manuscript and thoroughly referenced therein. We prefer to keep this section clean to maximise clarity, but are happy to copy references from the various other places in the manuscript to back up these assertions if this is preferred by the editorial team. Current text:

      “In summary, it is already established that: A. genetic activation of PIK3CD causes immunodeficiency without disordered growth, while B. inhibition of PIK3R1 recruitment to RTKs and their substrates impairs growth and insulin action, without immunodeficiency, despite all catalytic subunits being affected and C. loss of p85 alone causes immunodeficiency.”

      Reviewer #2 (Recommendations For The Authors):

      In the abstract, line 42, I would rather speak of SHORT syndrome-like features.

      Some patients do indeed meet the criteria for SHORT syndrome, but there is a spectrum. We have thus added this qualification and removed “short stature” to maintain the word count, as this is itself a SHORT syndrome-like feature.

      Line 74 It would be helpful for the reader to give the amino-acid exchange and affected position of this single case.

      We agree. Now added.

      Furthermore, an illustration indicating the location of the different PIK3R1 variants on the p85 alpha level would be helpful for the reader.

      As noted above, such a figure element is now included as Figure 1 figure supplement 1 and duly called out in the text.

      The sentence in lines 298-300 makes no sense to me. Do you mean, unlike APDS1 murine models?

      We agree, on review, that this paragraph is convoluted and makes a simple observation complex. We have rewritten now in what we hope is a more accessible style:

      “Thus, study of distinct PIK3R1-related syndromes shows that established loss-of-function PIK3R1 mutations produce phenotypes attributable selectively to impaired PI3Kα hypofunction, while activating mutations produce phenotypes attributable to selectively increased PI3Kδ signalling. Indeed, not only do such activating mutations not produce phenotypes attributable to PI3Kα activation, but they surprisingly have features characteristic of impaired PI3Kα function.”

      Line 321 I propose including the notion of different cells: “The balance between expression and signalling in different cells may be a fine one ...”

      This change has been made

      Line 352 C. loss replace with complete loss.

      “C.” actually denotes the last in a list after “A.” and “B.”. We have now used bold to emphasise this, but we imagine house style may dictate how we approach this.

    1. eLife Assessment

      This important study provides insights into the physiological role of RIPK1 in liver physiology, particularly during short-term fasting. The discovery that RIPK1 deficiency sensitizes the liver to acute injury and hepatocyte apoptosis is based on convincing evidence, highlighting the importance of RIPK1 in maintaining liver homeostasis under metabolic stress. The work will be of relevance to anyone studying liver pathologies.

    2. Reviewer #1 (Public review):

      This study presents an investigation into the physiological functions of RIPK1 within the context of liver physiology, particularly during short-term fasting. Through the use of hepatocyte-specific Ripk1-deficient mice (Ripk1Δhep), the authors embarked on an examination of the consequences of Ripk1 deficiency in hepatocytes under fasting conditions. They discovered that the absence of RIPK1 sensitized the liver to acute injury and hepatocyte apoptosis during fasting, a finding of significant interest given the crucial role of the liver in metabolic adaptation. Employing a combination of transcriptomic profiling and single-cell RNA sequencing techniques, the authors uncovered intricate molecular mechanisms underlying the exacerbated proinflammatory response observed in Ripk1Δhep mice during fasting. While the investigation offers valuable insights into the consequences of Ripk1 deficiency in hepatocytes during fasting conditions, there appears to be a primarily descriptive nature to the study with a lack of clear connection between the experiments. Thus, a stronger focus is warranted, particularly on understanding the dialogue between hepatocytes and macrophages. Moreover, the data would benefit from reinforcement through additional experiments such as Western blotting, flow cytometry, and rescue experiments, which would offer a more quantitative aspect to the findings. By incorporating these enhancements, the study could achieve a more comprehensive understanding of the underlying mechanisms and ultimately strengthen the overall impact of the research.

      Comments on revision:

      The authors have addressed my comments accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      Zhang et al. analyzed the functional role of hepatocyte RIPK1 during metabolic stress, particularly its scaffold function rather than kinase function. They show that Ripk1 knockout sensitizes the liver to cell death and inflammation in response to short-term fasting, a condition that would not induce obvious abnormality in wild-type mice.

      Strengths:

      The findings are based on a knockout mouse model and supported by bulk RNA-seq and scRNA-seq. The work consolidates the complex role of RIPK1 in metabolic stress.

      Comments on revision:

      The authors have addressed my concerns. The added experiments consolidated the findings. I do not have further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings on the role of RIPK1 in maintaining liver homeostasis under metabolic stress. Strengths include the intriguing findings that RIPK1 deficiency sensitizes the liver to acute liver injury and apoptosis, but because the conclusions require additional experimental support, the evidence is incomplete.

      We are truly grateful to the reviewers and the editor for the time and effort spent in reviewing our manuscript. We highly appreciate the thorough and constructive comments, which have greatly improved our manuscript. We have conducted new experiments to address the reviewers’ concerns. We also carefully checked and revised our manuscript according to the reviewers’ constructive suggestions. We hope we have adequately addressed all the concerns. In the revised manuscript version, changes are highlighted in yellow. Please find the detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study presents an investigation into the physiological functions of RIPK1 within the context of liver physiology, particularly during short-term fasting. Through the use of hepatocyte-specific Ripk1-deficient mice (Ripk1Δhep), the authors embarked on an examination of the consequences of Ripk1 deficiency in hepatocytes under fasting conditions. They discovered that the absence of RIPK1 sensitized the liver to acute injury and hepatocyte apoptosis during fasting, a finding of significant interest given the crucial role of the liver in metabolic adaptation. Employing a combination of transcriptomic profiling and single-cell RNA sequencing techniques, the authors uncovered intricate molecular mechanisms underlying the exacerbated proinflammatory response observed in Ripk1Δhep mice during fasting. While the investigation offers valuable insights into the consequences of Ripk1 deficiency in hepatocytes during fasting conditions, there appears to be a primarily descriptive nature to the study with a lack of clear connection between the experiments. Thus, a stronger focus is warranted, particularly on understanding the dialogue between hepatocytes and macrophages. Moreover, the data would benefit from reinforcement through additional experiments such as Western blotting, flow cytometry, and rescue experiments, which would offer a more quantitative aspect to the findings. By incorporating these enhancements, the study could achieve a more comprehensive understanding of the underlying mechanisms and ultimately strengthen the overall impact of the research.

      We thank the reviewer for the encouraging comments and helpful suggestions. We agree with the reviewer that additional experiments could reinforce our findings. Therefore, we conducted additional experiments, including flow cytometry, western blotting, and fasting experiments in kinase-dead mutant mice, to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Detailed major concerns:

      (1) Related to Figure 1.

      It is imperative to ensure consistency in the number of animals analyzed across the different graphs. The current resolution of the images appears to be low, resulting in unsharp visuals that hinder the interpretation of data beyond the presence of "white dots". To address this issue, it is recommended to enhance the resolution of the images and consider incorporating zoom-in features to facilitate a clearer visualization of the observed differences. Moreover, it would be beneficial to include a complete WB analysis for the cell death pathways analyzed. These adjustments will significantly improve the clarity and interpretability of Figure 1.

      Thanks very much for the constructive advice. We carefully checked the number of animals and made sure that the animal numbers are consistent across the different figures. We further updated Figure 1 to incorporate zoomed-in views, and the resolution of the figures was greatly improved. Western blot analysis was also included in updated Supplementary Figure 1.

      (2) Related to Figure 2.

      It is essential to ensure consistency in the number of animals analyzed across the different graphs, as indicated by n=6 in the figure legend (similar to Figure 1). Additionally, it is crucial to distinguish between male and female subjects in the dot plots to assess any potential gender-based differences, which should be consistent throughout the paper. To achieve this, the dots plot should be harmonized to clearly differentiate between males and females and investigate if there are any disparities between the genders. Moreover, it is imperative to correlate hepatic inflammation with the activation of Kupffer cells, infiltrating monocytes, and/or hepatic stellate cells (HSCs). Therefore, conducting flow cytometry would be instrumental in achieving this correlation. Additionally, the staining for Ki67 appears to be non-specific, showing a granular pattern reminiscent of bile crystals rather than the expected nuclear staining of hepatocytes or immune cells. It is crucial to ensure specific staining for Ki67, and conducting in vitro experiments on primary hepatocytes could further elucidate the proliferation process. These experiments are relatively straightforward to implement and would provide valuable insights into the mechanisms underlying hepatic inflammation and proliferation.

      Thanks very much for the helpful advice. First, we corrected the number of animals analyzed in the different graphs and made sure that the number of animals listed in each figure legend is consistent with the graphs in all figures. Second, to distinguish the results from male and female mice, the data points are color-coded: blue represents male mice, pink represents female mice, and green represents RIPK1 kinase-inactive mice. The majority of results were obtained from male mice, and our results indicated that there was no difference between male and female mice herein.

      The percentages of immune cell subpopulations isolated from mouse liver tissue were determined. The results were consistent with the single-cell analysis in that a greater number of macrophages were recruited into the liver tissue of Ripk1<sup>Δhep</sup> mice upon 12-hour fasting (updated Figure 4F&G).

      To confirm the Ki67 results, we first measured the transcriptional expression of Ki67 using real-time qPCR, and the results were consistent with the protein expression measured by immunohistochemical analysis. The percentage of Ki67<sup>+</sup> cells among liver cells was also determined, and there were significantly more Ki67<sup>+</sup> cells in Ripk1<sup>Δhep</sup> mouse liver than in WT control mouse liver upon 12-hour fasting. Taken together, our transcriptional analysis, immunohistochemical analysis and flow cytometry data indicate that Ki67 expression was higher in Ripk1<sup>Δhep</sup> mice than in Ripk1<sup>fl/fl</sup> mice (updated Figure 2).

      (3) Related to Figure 3 & related to Figure 4.

      The immunofluorescence data presented are not entirely convincing and are insufficient to conclusively demonstrate the recruitment of monocytes. Previous suggestions for flow cytometry studies remain pertinent and are indeed necessary to bolster the robustness of the data and conclusions. Conducting flow cytometry analyses would provide more accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings and strengthening the overall conclusions of the study. Regarding the single-cell RNA sequencing analysis presented in the manuscript, it's worth questioning its relevance and depth of information provided. While it successfully identifies a quantitative difference in the cellular composition of the liver between control and knockout mice, it may fall short in elucidating the intricate interactions between different cell populations, which are crucial for understanding the underlying mechanisms of hepatic inflammation. Therefore, I propose considering alternative bioinformatic analyses, such as CellPhone-CellChat, which could potentially provide a more comprehensive understanding of the cellular dynamics and interactions within the liver microenvironment. By examining the dialogue between different cell clusters, these analyses could offer deeper insights into the functional consequences of Ripk1 deficiency in hepatocytes and its impact on hepatic inflammation during fasting.

      Thanks very much for the constructive suggestion. We agree with the reviewer that conducting flow cytometry analyses would provide accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings. Following the advice, both WT and Ripk1<sup>Δhep</sup> mice were fasted for 12 hours, and single hepatic cells were then isolated and analyzed by flow cytometry. As indicated in updated Figure 4F&G, the percentage of F4/80<sup>+</sup>CD11b<sup>+</sup> cells was significantly higher in Ripk1<sup>Δhep</sup> mice compared with WT control mice, confirming that more monocytes were recruited into the liver.

      Additionally, we performed CellChat analysis on the single-cell transcriptomic data. As shown in updated Figure 4H-J, both the number of ligand-receptor pairs and the interaction strength among the eight cell types were significantly increased in Ripk1<sup>Δhep</sup> mice, particularly the interactions between macrophages and other cell types. Network analysis indicated that inflammation and proliferation signals were amplified in Ripk1<sup>Δhep</sup> mice. Consistent with the bulk RNA sequencing data, SAA signaling was upregulated in the hepatocytes of Ripk1<sup>Δhep</sup> mice (updated Figure 4K). SAA has been found to play a role in regulating immune responses and tumor development. Based on these findings, we speculate that fasting-induced liver injury in RIPK1 knockout mice may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. The above data analysis and interpretation are included in updated Figure 4 and Figure S4 and in lines 421-443.

      (4) Related to Figure 5.

      What additional insights do the data from Figure 5 provide compared to the study published in Nat Comms, which demonstrated that RIPK1 regulates starvation resistance by modulating aspartate catabolism (PMID: 34686667)?

      Thank you very much for your constructive suggestion. As noted by the reviewer, this study (PMID: 34686667) primarily focuses on metabolomic analyses of Ripk1<sup>-/-</sup> neonatal mouse brain tissue and Ripk1<sup>-/-</sup> MEF cells. The authors propose that Ripk1 regulates starvation resistance by modulating aspartate catabolism.

      In our study, the global metabolic changes induced by fasting were monitored. Fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation, and excessive deposition of free fatty acids has been shown to induce endoplasmic reticulum (ER) stress in the liver. Data from Figure 5 demonstrate that administering the ER stress inhibitor 4-PBA effectively mitigated fasting-induced liver injury and inflammatory responses in Ripk1<sup>Δhep</sup> mice. Our findings suggest that ER stress plays a critical role in fasting-induced liver injury and inflammation in Ripk1<sup>Δhep</sup> mice.

      (5) Related to Figure 6.

      The data presented in Figure 7 are complementary and do not introduce new mechanistic insights.

      Thank you very much for your insightful suggestion. As you mentioned, the AAV-TBG-Cre-mediated liver-specific RIPK1 knockout mice offer complementary validation of the results obtained from Ripk1<sup>Δhep</sup> mice. Moreover, TBG is a promoter that is exclusively expressed in mature hepatocytes, while the ALB promoter is active not only in mature hepatocytes but also in precursor cells and cholangiocytes. Therefore, we think that the inclusion of AAV-TBG-Cre further strengthens our finding that RIPK1 in hepatocytes is responsible for fasting-induced liver injury and inflammatory responses.

      (6) Related to Figure 7.

      The data from Figure 7 suggest that RIPK1 in hepatocytes is responsible for the observed damage. However, it has been previously demonstrated that inhibition of RIPK1 activity in macrophages protects against the development of MASLD (PMID: 33208891). One possible explanation for these findings could be that the overreaction of macrophages to fasting, coupled with the absence of RIPK1 in hepatocytes (an indirect effect), contributes to the observed damage. Considering this, complementing hepatocytes with a kinase-dead version of RIPK1 could be a valuable approach to further refine the molecular aspect of the study. This would allow for a more precise investigation into the specific role of RIPK1's scaffolding or kinase function in response to starvation in hepatocytes. Such experiments could provide additional insights into the mechanisms underlying the observed effects and help delineate the contributions of RIPK1 in different cell types to metabolic stress responses.

      Thank you very much for the constructive suggestion. We fully agree with the reviewer that employing RIPK1 kinase-inactive mutant mice would allow the specific roles of RIPK1's scaffolding and kinase functions in hepatocyte responses to starvation to be investigated precisely. In accordance with this advice, we established a 12-hour fasting model using Ripk1<sup>WT/WT</sup> and Ripk1<sup>K45A/K45A</sup> mice, which were previously established and confirmed to lack RIPK1 kinase activity. As demonstrated in updated Supplementary Figure 2, these mice did not show significant liver damage or inflammatory responses after 12 hours of fasting. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not be attributable to the kinase activity of RIPK1.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. analyzed the functional role of hepatocyte RIPK1 during metabolic stress, particularly its scaffold function rather than kinase function. They show that Ripk1 knockout sensitizes the liver to cell death and inflammation in response to short-term fasting, a condition that would not induce obvious abnormality in wild-type mice.

      Strengths:

      The findings are based on a knockout mouse model and supported by bulk RNA-seq and scRNA-seq. The work consolidates the complex role of RIPK1 in metabolic stress.

      Weaknesses:

      However, the findings are not novel enough because the pro-survival role of RIPK1 scaffold is well-established and several similar pieces of research already exist. Moreover, the mechanism is not very clear and needs additional experiments.

      We thank the reviewer for the encouraging comments and helpful suggestions. Here we conducted additional experiments, including flow cytometry, western blotting, and fasting experiments in kinase-dead mutant mice, to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (7) I recommend that the authors consider reassessing their results, particularly with regards to elucidating the dialogue between macrophages and hepatocytes, as this could further strengthen the study's conclusions.

      Thank you very much for your constructive suggestion. We conducted additional experiments, including flow cytometry and western blotting, to reassess our findings. Furthermore, to clarify the interactions between cells, we employed CellChat for a more in-depth analysis of the single-cell sequencing results. In the revised manuscript version, changes are highlighted in yellow. In this study, we demonstrated that the specific deletion of RIPK1 in hepatocytes exacerbated the liver's vulnerability to metabolic disturbances, such as short-term fasting and high-fat diet feeding, resulting in increased liver damage, apoptosis, inflammation, and compensatory proliferation. The data indicate that fasting-induced liver injury in mice with RIPK1 knockout in hepatic parenchymal cells may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. In summary, we revealed a novel physiological role of RIPK1 as a scaffold in maintaining liver homeostasis during fasting and other nutritional disturbances.

      (8) It would be beneficial for the authors to address the minor weaknesses identified in the study, such as ensuring consistency in the number of animals analyzed across different graphs and enhancing the resolution of images to improve data clarity.

      Thank you for the suggestion. In the revised manuscript, we have addressed these minor weaknesses, and we checked the consistency in the number of animals in different graphs, as well as enhanced the resolution of all images.

      (9) I encourage the authors to incorporate additional experiments, such as Western blotting and flow cytometry, to provide a more quantitative assessment of the observed effects and enhance the robustness of their conclusions.

      Thank you for your insightful suggestion. We completely agree with the reviewer that incorporating flow cytometry and western blotting would strengthen the robustness of our conclusions. We conducted flow cytometry analysis and western blotting, and the results are shown in updated Supplementary Figure 1, Figure 2, Figure 4 and Supplementary Figure 4.

      (10) Furthermore, the authors may consider conducting complementary experiments, such as rescue experiments involving complementing hepatocytes with a kinase-dead version of RIPK1, to further refine the molecular aspect of the study and elucidate the specific roles of RIPK1's scaffolding or kinase function in response to starvation.

      Thank you very much for your constructive suggestion. As shown in updated Supplementary Figure 2, we conducted fasting experiments using RIPK1 kinase-dead mice. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not be attributable to the kinase activity of RIPK1.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (11) What is the upstream signal for RIPK1? The study investigated the change induced by short-term fasting which is metabolic stress. Although RIPK1 knockout promotes cell death and inflammation, how it is involved in this condition is unclear. RIPK1 is never reported as a metabolic sensor and its function is typically downstream of TNFR1 as well as other death receptors such as Fas, TRAIL-R1, TRAIL-R2. Thus, it's probable that metabolic stress induces the expression and secretion of some ligand of the above receptors. Although TNFα expression is upregulated on both mRNA and protein levels, it could not be concluded that TNFα is the upstream signal for RIPK1 because expression difference does not always lead to functional role. In addition, a recent study, which is also reference 33, reports that knockout of TNFR1/2 does not protect against 18 h liver ischemia, a condition that is similar to the present study. Therefore, the link between the metabolic fluctuation and RIPK1 function is elusive and should be addressed. The expression difference analysis should be extended to other relevant ligands. A functional study using neutralizing antibodies in RIPK1ΔHep mice is encouraged. At least, this should be discussed in the discussion section.

      Thank you very much for your insightful comments. The upstream signals of RIPK1 remain a significant area of scientific inquiry. Fasting, as one of the main causes of metabolic stress, is known to trigger a series of physiological changes, including but not limited to decreased blood glucose levels, hepatic glycogen depletion, increased production of hepatic glucose and ketone bodies, adipose tissue lipolysis, and the influx and accumulation of free fatty acids in the liver. It is well-established that the elevated lipid influx and hepatic accumulation during fasting may cause lipotoxic stress in the liver. To investigate whether the elevated influx of free fatty acids might act as the signal to induce cytotoxicity, we isolated primary hepatocytes but observed that a significant number of cells underwent spontaneous death during the isolation and perfusion processes. To address this question, we utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, as illustrated in Author response image 1A.

      To mimic hepatic lipid accumulation induced by short-term fasting, we treated the cells with palmitic acid (PA) or oleic acid (OA) for 12 hours in vitro. Our results indicated a significant increase in cell death among Ripk1<sup>-/-</sup> AML12 cells after PA treatment compared to WT control cells (Author response image 1B). As shown in Author response image 1C, we also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      Collectively, our results highlight the crucial role of RIPK1 in hepatocytes in maintaining the liver's adaptive capacity to counteract lipotoxicity induced by metabolic stress. These in vitro results were not included in the manuscript; however, we addressed them in the discussion section (lines 593-597). If the reviewer recommends it, we would be glad to incorporate them into our manuscript.

      Author response image 1.

      (12) What is the exact relationship between ER stress and RIPK1? In Figure 5A and Figure 6B, Ripk1 knockout only slightly promotes the expression of ER stress markers. The evidence of RIPK1 leading to ER stress is limited in the literature and poorly supported in this study. Also in reference 33, the hypothesis is proposed that ER stress leads to death receptor upregulation and activation, which induces RIPK1 activation. Although the ER stress inhibitor showed good efficacy in rescue experiments, it could not determine whether RIPK1 deficiency leads to ER stress-associated phenotype or ER stress leads to death receptor activation and RIPK1 deficiency-associated phenotype. If RIPK1 deficiency leads to ER stress, the possible mechanism should be investigated.

      Thank you very much for your insightful comments. As the reviewer noted, the specific relationship between endoplasmic reticulum (ER) stress and RIPK1 remains unclear. However, our data, along with findings from other studies (Piccolis M et al., Mol Cell. 2019; Geng Y et al., Hepatol Int. 2021), suggest that fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation. Additionally, excessive deposition of free fatty acids has been shown to induce ER stress in the liver. One possible explanation is that ER stress may trigger the upregulation and activation of death receptors, and the scaffold function of RIPK1 may play a protective and checkpoint role in this process. ER stress during fasting might therefore lie upstream of RIPK1. This could help explain why short-term fasting results in liver damage in Ripk1<sup>Δhep</sup> mice while control mice remain unaffected. Moreover, the inhibition of ER stress using 4-PBA can effectively alleviate this damage.

      Minor:  

      (13) The study starts directly from functional experiments. However, it should be firstly explored whether RIPK1 expression or activation is modulated in wild-type mice.

      Thank you very much for your insightful observation. Previous studies showed that RIPK1 deficiency in hepatocytes does not impact the growth and development of mice, indicating that RIPK1 is dispensable for proper liver development and homeostasis (Filliol A et al., Cell Death Dis. 2016). Furthermore, we did not observe any changes in RIPK1 levels in wild-type mice induced by fasting across different experimental batches. In our bulk transcriptomic analysis, the expression of RIPK1 was not changed before and after 12-hour fasting in Ripk1<sup>fl/fl</sup> mice. Therefore, we focused our attention on the function of RIPK1 and started our study directly with functional experiments.

      (14) Knockout of RIPK1 deprived both its scaffold function and kinase function. It is encouraged to explore whether blocking RIPK1 kinase activity influences the outcome of metabolic stress.

      Thank you for your insightful suggestion. To investigate the role of RIPK1 kinase activity in response to metabolic stress, we added fasting experiments using RIPK1 kinase-inactive mice in the updated Supplementary Figure 2. These experiments show that blocking RIPK1 kinase activity does not affect the outcome of metabolic stress.

      (15) In Figure 1, the number of TUNEL+ cells is about 2 times of c-casp3. What is the possible reason?

      Thank you for your careful reading. Indeed, the number of TUNEL<sup>+</sup> cells in Figure 1 is twice that of cleaved-caspase-3<sup>+</sup> cells. There are two possible reasons. First, we speculate that this discrepancy may be attributed to the higher sensitivity of the TUNEL assay compared to the cleaved-caspase-3 assay. Second, the TUNEL assay detects DNA fragmentation, indicating that these cells are in a pre-apoptotic state or poised to undergo apoptosis. In contrast, cleaved-caspase-3 staining specifically identifies cells that have already committed to the apoptotic pathway. The TUNEL assay can detect all types of apoptosis, and the mechanisms of apoptosis may involve more than just cleaved-caspase-3.

      (16) Infiltrated innate immune cells could lead to hepatocyte death. Is the hepatocyte death in this study partially caused by immune cells?

      Many thanks for the advice. As outlined in the response to the 11th comment from the second reviewer, our findings indicate that metabolic stress induced by short-term fasting is the primary cause of hepatocyte death. Additionally, we demonstrate that infiltrated innate immune cells may also play a partial role in hepatocyte death through subsequent cascade reactions.

      (17) Could the in vivo results be consolidated by in vitro experiments on primary mouse hepatocytes? This would be helpful to answer question 4.

      Thank you for your helpful comments. As demonstrated in the response to the 11th comment by the second reviewer, we attempted to conduct in vitro experiments using primary hepatocytes. However, during the isolation and perfusion processes, we observed that a significant number of cells underwent spontaneous death. To address this issue, we utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, which showed a significant increase in cell death after palmitic acid (PA) treatment compared to WT control cells. We also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      (18) RIPK1 scaffold function is associated with NF-κB signal. Is NF-κB signal transduction influenced by Ripk1 deficiency? If so, to what extent does it contribute to the observed phenotype? If not, what is the direct downstream effect of Ripk1 deficiency?

      Thank you very much for your insightful perspective. As reported by Clucas J et al., RIPK1 serves as a scaffold for downstream NF-κB signaling through the ubiquitin chains generated by its ubiquitination (Clucas J et al., Nat Rev Mol Cell Biol. 2023). The deficiency of RIPK1 in hepatic parenchymal cells can disrupt NF-κB signaling and impair its pro-survival functions, resulting in increased cell death in response to stress. Our current findings suggest that the RIPK1-NF-κB axis serves as a crucial scaffold platform essential for the liver's adaptation to metabolic fluctuations. Any inappropriate inactivation or deletion of components within this scaffold disrupts the delicate balance between cell death, inflammation, and normal function, making the liver susceptible to metabolic changes, ultimately leading to liver damage, hepatic inflammation, and compensatory proliferation.

      (19) In Figure 6B, the 'RIP' should be changed to 'RIPK1'.

      Thank you for your careful observation. We have corrected "RIP" to "RIPK1" in updated Figure 6B.

      (20) For Western blot results, the blot height should be at least the lane width to reveal additional signals and the molecular weight as well as unspecific signals should be denoted.

      Thank you for your valuable advice. We appreciate your suggestions regarding the western blot results. We went through the previous western blot results and did not find any additional nonspecific signals. We added the molecular weights in the updated figures Figure 5, Figure 6 and Supplementary Figure 1.

    1. eLife Assessment

      This important collection of over 800 new cell type-specific driver lines will be an invaluable resource for researchers studying associative learning in Drosophila. Thoroughly characterized and well documented, this collection will permit researchers to selectively target neurons that deliver information to, or receive it from, the memory center of the fly brain called the Mushroom Body. Given the wealth of new drivers and the genetic access they provide to over 300 cell types, this compelling work will be of interest not only to researchers studying the mechanisms of associative learning but more generally to those dissecting sensorimotor circuits in the fly nervous system.

    2. Reviewer #1 (Public Review):

      Summary:

      The emergence of Drosophila EM connectomes has revealed numerous neurons within the associative learning circuit. However, these neurons are inaccessible for functional assessment or genetic manipulation in the absence of cell-type-specific drivers. Addressing this knowledge gap, Shuai et al. have screened over 4000 split-GAL4 drivers and correlated them with identified neuron types from the "Hemibrain" EM connectome by matching light microscopy images to neuronal shapes defined by EM. They successfully generated over 800 split-GAL4 drivers and 22 split-LexA drivers covering a substantial number of neuron types across layers of the mushroom body associative learning circuit. They provide new labeling tools for olfactory and non-olfactory sensory inputs to the mushroom body; interneurons connected with dopaminergic neurons and/or mushroom body output neurons; potential reinforcement sensory neurons; and expanded coverage of intrinsic mushroom body neurons. Furthermore, the authors have optimized the GR64f-GAL4 driver into a sugar sensory neuron-specific split-GAL4 driver and functionally validated it as providing a robust optogenetic substitute for sugar reward. Additionally, a driver for putative nociceptive ascending neurons, potentially serving as optogenetic negative reinforcement, is characterized by optogenetic avoidance behavior. The authors also use their very large dataset of neuronal anatomies, covering many example neurons from many brains, to identify neuron instances with atypical morphology. They find many examples of mushroom body neurons with altered neuronal numbers or mistargeting of dendrites or axons and estimate that 1-3% of neurons in each brain may have anatomic peculiarities or malformations. Significantly, the study systematically assesses the individualized existence of MBON08 for the first time. 
This neuron is a variant shape that sometimes occurs instead of one of two copies of MBON09, and this variation is more common than that in other neuronal classes: 75% of hemispheres have two MBON09's, and 25% have one MBON09 and one MBON08. These newly developed drivers not only expand the repertoire for genetic manipulation of mushroom body-related neurons but also empower researchers to investigate the functions of circuit motifs identified from the connectomes. The authors generously make these flies available to the public. In the foreseeable future, the tools generated in this study will allow important advances in the understanding of learning and memory in Drosophila.

      Strengths:

      (1) After decades of dedicated research on the mushroom body, a consensus has been established that the release of dopamine from DANs modulates the weights of connections between KCs and MBONs. This process updates the association between sensory information and behavioral responses. However, understanding how the unconditioned stimulus is conveyed from sensory neurons to DANs, and the interactions of MBON outputs with innate responses to sensory context remains less clear due to the developmental and anatomic diversity of MBONs and DANs. Additionally, the recurrent connections between MBONs and DANs are reported to be critical for learning. The characterization of split-GAL4 drivers for 30 major interneurons connected with DANs and/or MBONs in this study will significantly contribute to our understanding of recurrent connections in mushroom body function.

      (2) Optogenetic substitutes for real unconditioned stimuli (such as sugar taste or electric shock) are sometimes easier to implement in behavioral assays due to the spatial and temporal specificity with which optogenetic activation can be induced. GR64f-GAL4 has been widely used in the field to activate sugar sensory neurons and mimic sugar reward. However, the authors demonstrate that GR64f-GAL4 drives expression in other neurons not necessary for sugar reward, and the potential activation of these neurons could introduce confounds into training, impairing training efficiency. To address this issue, the authors have elaborated on a series of intersectional drivers with GR64f-GAL4 to dissect subsets of labeled neurons. This approach successfully identified a more specific sugar sensory neuron driver, SS87269, which consistently exhibited optimal training performance and triggered ethologically relevant local searching behaviors. This newly characterized line could serve as an optimized optogenetic tool for sugar reward in future studies.

      (3) MBON08 was first reported by Aso et al. 2014, exhibiting dendritic arborization into both ipsilateral and contralateral γ3 compartments. However, this neuron could not be identified in the previously published Drosophila brain connectomes. In the present study, the existence of MBON08 is confirmed, occurring in one hemisphere of 35% of imaged flies. In brains where MBON08 is present, its dendrite arborization disjointly shares contralateral γ3 compartments with MBON09. This remarkable phenotype potentially serves as a valuable resource for understanding the stochasticity of neurodevelopment and the molecular mechanisms underlying mushroom body lobe compartment formation.

    3. Reviewer #2 (Public Review):

      Summary:

      The article by Shuai et al. describes a comprehensive collection of over 800 split-GAL4 and split-LexA drivers, covering approximately 300 cell types in Drosophila, aimed at advancing the understanding of associative learning. The mushroom body (MB) in the insect brain is central to associative learning, with Kenyon cells (KCs) as primary intrinsic neurons and dopaminergic neurons (DANs) and MB output neurons (MBONs) forming compartmental zones for memory storage and behavior modulation. This study focuses on characterizing sensory input as well as direct upstream connections to the MB both anatomically and, to some extent, behaviorally. Genetic access to specific, sparsely expressed cell types is crucial for investigating the impact of single cells on computational and functional aspects within the circuitry. As such, this new and extensive collection significantly extends the range of targeted cell types related to the MB and will be an outstanding resource to elucidate MB-related processes in the future.

      Strengths:

      The work by Shuai et al. provides novel and essential resources to study MB-related processes and beyond. The resulting tools are publicly available and, together with the linked information, will be foundational for many future studies. The importance and impact of this tool development approach, along with previous ones, for the field cannot be overstated. One of many interesting aspects arises from the anatomical analysis of cell types that are less stereotypical across flies. These discoveries might open new avenues for future investigations into how such asymmetry and individuality arise from development and other factors, and how it impacts the computations performed by the circuitry that contains these elements.

    4. Reviewer #3 (Public Review):

      Summary:

      Previous research on the Drosophila mushroom body (MB) has made this structure the best-understood example of an associative memory center in the animal kingdom. This is in no small part due to the generation of cell-type specific driver lines that have allowed consistent and reproducible genetic access to many of the MB's component neurons. The manuscript by Shuai et al. now vastly extends the number of driver lines available to researchers interested in studying learning and memory circuits in the fly. Its 800-plus collection of new cell-type specific drivers targets neurons that either provide input (direct or indirect) to MB neurons or that receive output from them. Many of the new drivers target neurons in sensory pathways that convey conditioned and unconditioned stimuli to the MB. Most drivers are exquisitely selective, and researchers will benefit from the fact that whenever possible, the authors have identified the targeted cell types within the Drosophila connectome. Driver expression patterns are beautifully documented and are publicly available through the Janelia Research Campus's Flylight database where full imaging results can be accessed. Overall, the manuscript significantly augments the number of cell type-specific driver lines available to the Drosophila research community for investigating the cellular mechanisms underlying learning and memory in the fly. Many of the lines will also be useful in dissecting the function of the neural circuits that mediate sensorimotor circuits.

      Strengths:

      The manuscript represents a huge amount of careful work and leverages numerous important developments from the last several years. These include the thousands of recently generated split-Gal4 lines at Janelia and the computational tools for pairing them to make exquisitely specific targeting reagents. In addition, the manuscript takes full advantage of the recently released Drosophila connectomes. Driver expression patterns are beautifully illustrated side-by-side with corresponding skeletonized neurons reconstructed by EM. A comprehensive table of the new lines, their split-Gal4 components, their neuronal targets, and other valuable information will make this collection eminently useful to end-users. In addition to the anatomical characterization, the manuscript also illustrates the functional utility of the new lines in optogenetic experiments. In one example, the authors identify a specific subset of sugar reward neurons that robustly promotes associative learning.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Comments on revised version: 

      Overall, I thought the authors addressed my comments well with the possible exception of what is actually new here. This was the most important thing that I thought should be included in the revision. Although the authors rewrote the paragraph describing the lines presented in the paper, I still can't tell exactly which ones haven't been previously published. Their revised paragraph says that 40 lines have been "previously used," but Supplemental Table 1 shows references for over 200 of the lines, which sounds more reasonable based on papers that have come out. 

      We have modified the text in lines 112-120 as below.

      “Supplementary File 1 lists 859 lines (including split-LexA) and their detailed information, such as genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type. A small subset of 47 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      For 842 lines among the 859 lines listed in Supplementary File 1, this study is the primary citation for future papers for the following reason: 

      In December 2021, we deposited the confocal images of new split-GAL4 lines at the Janelia Flylight website (http://www.janelia.org/split-gal4) without a publication describing the annotation of expression patterns, and we had already started sharing the lines without restrictions. In September 2023, we released the preprint of this study on bioRxiv (doi: https://doi.org/10.1101/2023.09.15.557808). Up to this point, 47 lines have been used in other studies. In Supplementary File 1, 30 of them attribute the citation credit to both this study and other papers, because this 2023 preprint was cited as the primary citation in those papers. Similarly, the omni paper summarizing all the effort of generating split-GAL4 lines by the Janelia Flylight team (https://doi.org/10.7554/eLife.98405.1) cites many lines from this paper. However, since this summary paper did not provide additional information such as functional characterization by behavioral experiments, we did not include it in Supplementary File 1, to clarify that this study is the primary citation for these lines. The remaining 17 lines were published before 2021. We included them for the convenience of users, and we attributed the primary citation to the already published papers. 

      Also, in the revised paragraph they state that "All transgenic lines newly generated in this study are listed in Supplementary File 2" but that table lists only the 36 LexA hemidriver lines! Confusingly, this comment cites the same 8 references as are cited for the 40 lines that they say were previously published. I am thus only more confused about how many previously uncharacterized lines are presented in this paper. 

      We modified the text as below to clarify that “new lines” indicates LexA or DBD lines but not new combinations of already published AD and DBD lines. We removed the 8 citations, which were mistakenly placed in the previous manuscript.

      “The newly generated LexA, Gal4DBD and LexADBD lines are listed in Supplementary File 2.”

    1. eLife Assessment

      This study provides important insights into the brain activity and connectivity underlying speech comprehension, revealing three brain states. The authors present compelling evidence by leveraging hidden Markov modeling of fMRI data to link brain state dynamics to comprehension scores, though the functional role of these states remains under-explored. These findings advance our understanding of how brain state transitions in narrative comprehension relate to stimulus-specific features.

    2. Reviewer #1 (Public review):

      Summary:

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.

      Strengths:

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.

    3. Reviewer #2 (Public review):

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.

      The correlation between narrative features and brain state expression was reliable, but relatively low (~0.03). As discussed in the manuscript, this could be due to measurement noise, as well as narrative features accounting for a small proportion of cognitive processes underlying the brain states.

      A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. Across tasks, however, the spatial regions associated with each state varied. For example, state 2 during narrative comprehension was similar to both states 2 and 3 during rest (Fig. 5A), suggesting that the organization of the three states was task dependent.

      The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over may offer a more parsimonious account than a multi-process account where the states correspond to distinct processes. This possibility is mentioned briefly in the introduction, but not developed further.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech. 

      Strengths: 

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications. 

      Weaknesses: 

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results. 

      For the Experimental Procedure, we will provide a more detailed description about stimuli, and the comprehension test, and upload the audio files and corresponding transcriptions as the supplementary dataset. 

      For statistics/analyses, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state resembling the baseline state described in Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states were characterized by network activities varying largely from zero. In addition, we have re-generated the null distribution for behavior-brain state correlations using circular shift. The results are largely consistent with the previous findings. We have also made some other adjustments to the analyses and added some new analyses as recommended by the reviewer. We will revise the manuscript to incorporate these changes.

      For the interpretation/rationale: We will add a more detailed interpretation for the association between state occurrence and semantic coherence. Briefly speaking, higher semantic coherence may allow for the brain to better accumulate information over time.

      State #2 seems to be involved in the integration of information at shorter timescales (hundreds of milliseconds) while State #3 seems to be involved in the longer timescales (seconds). 

      We greatly appreciate the reviewer for the insightful comments and constructive suggestions.  

      Reviewer #2 (Public review): 

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension. 

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions. 

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      In the revision, we generated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.
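
      The circular-shift null described above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name `circular_shift_pvalue`, the number of surrogates, the minimum shift, and the two-sided test are all hypothetical choices. The idea is that rolling the state time course preserves its autocorrelation while breaking its alignment with the stimulus feature.

```python
import numpy as np

def circular_shift_pvalue(state_prob, feature, n_perm=1000, min_shift=10, seed=0):
    """Correlate a state time course with a stimulus feature and test it
    against a null built from circularly shifted copies of the time course.

    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    state_prob = np.asarray(state_prob, dtype=float)
    feature = np.asarray(feature, dtype=float)
    n = state_prob.size

    observed = np.corrcoef(state_prob, feature)[0, 1]

    # Draw shifts away from 0 and n so every surrogate is genuinely misaligned.
    shifts = rng.integers(min_shift, n - min_shift, size=n_perm)
    null = np.array(
        [np.corrcoef(np.roll(state_prob, s), feature)[0, 1] for s in shifts]
    )

    # Two-sided p-value; the add-one correction keeps it strictly positive.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value, null
```

      Because each surrogate keeps the temporal structure of the original time course, the null is less likely to be overly narrow than one obtained by shuffling individual time points.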

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will replace Figure 3 and the related supplementary figures with new ones, in which the null distribution is generated via circular shift. Furthermore, we will expand our discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations. 

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of speech comprehension task with the resting and the incomprehensible speech condition, there was some degree of overlap or "confusion."

      In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on the ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
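
      A minimal sketch of this kind of correlation-based labeling is given below. It assumes each state is summarized by a vector of mean activity over the nine networks and resolves conflicts greedily, strongest similarity first; the greedy rule, the function name `match_states`, and the `ambiguous` flag (standing in for the prime symbol) are illustrative assumptions rather than the procedure actually implemented in the study.

```python
import numpy as np

def match_states(candidate, reference):
    """Label candidate states by their most similar reference state.

    candidate, reference: arrays of shape (n_states, n_networks), assumed
    to contain the same number of states. Returns (labels, ambiguous, sim),
    where labels[j] is the reference index assigned to candidate j and
    ambiguous[j] is True when that label is not candidate j's best match.
    """
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n_ref = reference.shape[0]
    n_cand = candidate.shape[0]

    # sim[i, j]: Pearson correlation between reference i and candidate j.
    sim = np.corrcoef(reference, candidate)[:n_ref, n_ref:]

    labels = np.full(n_cand, -1)
    used_refs = set()
    # Visit (reference, candidate) pairs from most to least similar and
    # accept a pair only if neither member has been assigned yet.
    for flat in np.argsort(sim, axis=None)[::-1]:
        ref_idx, cand_idx = np.unravel_index(flat, sim.shape)
        if labels[cand_idx] == -1 and ref_idx not in used_refs:
            labels[cand_idx] = ref_idx
            used_refs.add(ref_idx)

    ambiguous = labels != sim.argmax(axis=0)
    return labels, ambiguous, sim
```

      When every candidate state has a distinct best match, as described for the replication dataset, no state is flagged; a flag appears only when two candidates compete for the same reference state.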

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015).

      Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes. 

      The temporal window hypothesis provides a more fitting explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) The "Participants and experimental procedure" section deserves more details. I've checked Liu et al. (2020), and the dataset contained 43 participants aged 20-75 years, whereas this study contained data from 64 young adults and 30 old adult samples. The previous dataset seems to have two stories, whereas this study seems to have three. Please be specific, given that the dataset does not seem the same. Could the authors also include more descriptions of what the auditory stories were? For example, what were the contents, and how were they recorded? 

      The citation is partially incorrect. The dataset of young adults is shared with our work published in (2022). The 64 participants listened to one of three stories told by a female college student in Mandarin, recounting her real-life experience of hiking, a graduate admission interview, and her first time taking a flight, respectively. The sample of older adults is from our work published in (2020), which includes 30 older adults and additionally 13 young adults. The stimuli in this case were two stories told by an older woman in a Chinese dialect, describing her experience in Thailand and riding a warship, respectively. Since we aim to explore whether the main results can be replicated on a different age group, we excluded the 13 young adults from the analysis. 

      All the stories were recorded during fMRI scanning using a noise-canceling microphone (FOMRI-III; Optoacoustics Ltd, Or-Yehuda, Israel) positioned above the speaker’s mouth. The audio recordings were subsequently processed offline with Adobe Audition 3.0 (Adobe Systems Inc., USA) to further eliminate MRI scanner noise.

      In the revised manuscript, we have updated the citation, and provided a more detailed description of the stimuli in the supplementary material. We have also uploaded the audio files along with their corresponding transcriptions to GitHub.

      (2) I am curious about individual differences in comprehension scores. Did participants have less comprehension of the audio-narrated story because the story was a hard-tocomprehend narrative or because the audio quality was low? Could the authors share examples of comprehension tests? 

      We believe two factors contribute to the individual differences in comprehension scores. First, the audio quality is indeed moderately lower than in dailylife story-listening conditions. This is because those stories were recorded and played during fMRI scanning. Although a noise-canceling equipment was used, there were still some noises accompanying the speech, which may have made speech perception and comprehension more difficult than usual.

      Second, the comprehension test measured how much information about the story (including both main themes and details) participants could recall. Specifically, participants were asked to retell the stories in detail immediately after the scanning session. Following this free recall, the experimenters posed a few additional questions drawn from a pre-prepared list, targeting information not mentioned in their recall. If participants experienced lapses of attention or did not store the incoming information into memory promptly, they might fail to recall the relevant content. In several studies, such a task has been called a narrative recall test. However, memory plays a crucial role in real-time speech comprehension, while comprehension affects the depth of processing during memory encoding, thereby influencing subsequent recall performance. To align with prior work (e.g., Stephens et al., 2010) and our previous publications, we chose to referred to this task as narrative comprehension. 

      In the revised manuscript, we have provided a detailed description of the comprehension test (Line 907-933) and shared the examples on GitHub.

      (3) Regarding Figure 3, what does it mean for a state occurrence to follow semantic coherence? Is there a theoretical reason why semantic coherence was measured and related to brain state dynamics? A related empirical question is: is it more likely for the brain states to transition from one state to another when nearby time points share low semantic similarity compared to chance? 

      We analyzed semantic coherence and sound envelope as they capture different layers of linguistic and acoustic structure that unfold over varying temporal scales. Changes in the sound envelope typically occur on the order of milliseconds to a few hundred milliseconds, changes in word-level semantic coherence span approximately 0.24 ± 0.15 seconds, and changes in clause-level semantic coherence extend to 3.2 ± 1.7 seconds. Previous theory and empirical studies suggest that the timescales of information accumulation vary hierarchically, progressing from early sensory areas to higher-order areas (Hasson et al., 2015; Lerner et al., 2011). Based on this work, we anticipate that the three brain states, which are respectively associated with the auditory and sensory motor network, the language network and the DMN, would be selectively modulated by these speech properties corresponding to distinct timescales. 

      Accordingly, when a state occurrence aligns with (clause-level) semantic coherence, it suggests that this state is engaged in processing information accumulated at the clause level (i.e., its semantic relationship). Higher coherence facilitates better accumulation, making it more likely for the associated brain state to be activated. 

      We analyzed the relationship between state transition probability and semantic coherence, but did not find significant results. Here, the transition probability was calculated as Gamma(t) – Gamma(t-1), where Gamma refers to the state occurrence probability. The lack of significant findings may be because brain state transitions are driven primarily by more slowly changing factors. Indeed, we found that the average dwell time of the three states ranges from 9.66 to 15.29 s, which reflects much slower temporal dynamics than the relatively rapid shifts in acoustic/semantic properties.
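
      The transition measure described above, Gamma(t) – Gamma(t-1), amounts to a first difference of the state occurrence probabilities along time. A minimal sketch, assuming Gamma is stored as a time-by-state array; the function name and the toy values are hypothetical and not taken from the authors' scripts:

```python
import numpy as np

def occurrence_change(gamma):
    """Return Gamma(t) - Gamma(t-1) for every state.

    gamma: array of shape (T, K) holding state occurrence
    probabilities per time point (assumed layout).
    Returns an array of shape (T - 1, K).
    """
    gamma = np.asarray(gamma, dtype=float)
    return np.diff(gamma, axis=0)

# Toy example: 4 time points, 3 states (made-up values).
gamma = np.array([[0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.1, 0.8]])
delta = occurrence_change(gamma)
```

      Because each row of Gamma sums to one, each row of the difference sums to zero, so a gain in one state is always offset by losses in the others.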

      In the revised version, we have updated the Introduction to clarify the rationale for selecting the three speech properties and exploring their relationship with brain dynamics (Line 111-118).

      (4) When running the HMM, the authors iterated K of 2 to 10 and K = 4, 10, and 12. However, the input features of the model consist of only 9 functional networks. Given that the HMM is designed to find low-dimensional latent state sequences, the choice of the number of latent states being higher than the number of input features sounds odd to me - to my speculation, it is bound to generate almost the exact same states as 9 networks and/or duplicates of the same state. I suggest limiting the K iterations from 2 to 8. For replication with Yeo et al.'s 7 networks, K iteration should also be limited to K of less than 7, or optionally, Yeo's 7 network scheme could be replaced with a 17-network scheme.

      We understand your concern. However, the determination of the number (K) of hidden states is not directly related to the number of features (in this case, the number of networks), but rather depends on the complexity of the time series and the number of underlying patterns. Given that each state corresponds to a distinct combination of the features, even a small number of features can be used to model a system with complex temporal behaviors and multiple states. For instance, for a system with n features, assuming each is a binary variable (0 or 1), there are maximally 2^n possible underlying states.

      In our study, we recorded brain activity over 300 time points and used the 9 networks as features. At different time points, the brain can exhibit distinct spatial configurations, reflected in the relative activity levels of the nine networks and their interactions. To accurately capture the temporal dynamics of brain activity, it is essential to explore models that allow for more states than the number of features. We note that in other HMM studies, researchers have also explored states more than the number of networks to find the best number of hidden states (e.g., Ahrends et al., 2022; Stevner et al., 2019). 

      Furthermore, Ahrends et al. (2022) suggested that “Based on the HCP-dataset, we estimate as a rule of thumb that the ratio of observations to free parameters per state should not be inferior to 200”, where free parameters per state is [𝐾 ∗(𝐾 −1)+ (𝐾 −1)+𝐾 ∗𝑁 ∗(𝑁 +1)/2]/𝐾. According to this, there should be above 10,980 observations when the number of states (K) is 10 (the maximal number in our study) and the number of networks (N) is 9. In our group-level HMM model, there were 64 (valid runs) * 300 (TR) = 19200 observations for young adults, and 50 (valid runs) * 210 (TR) = 10500 observations for older adults. Aside from the older adults' data being slightly insufficient (4.37% less than the suggestion), all other hyperparameter combinations in this study meet the recommended number of observations.
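
      The arithmetic behind these numbers can be checked with a short sketch. It uses only the formula and the counts quoted above; the function names are hypothetical, and the threshold of 200 is the rule of thumb cited from Ahrends et al. (2022):

```python
def free_params_per_state(K, N):
    """Free parameters per state: [K(K-1) + (K-1) + K*N*(N+1)/2] / K."""
    total = K * (K - 1) + (K - 1) + K * N * (N + 1) / 2
    return total / K

def required_observations(K, N, ratio=200):
    """Minimum number of observations for a given observations-to-parameters ratio."""
    total = K * (K - 1) + (K - 1) + K * N * (N + 1) / 2
    return ratio * total / K

K, N = 10, 9                      # largest K tested, nine networks
needed = required_observations(K, N)

young_obs = 64 * 300              # valid runs * TRs, young adults
older_obs = 50 * 210              # valid runs * TRs, older adults

# Relative shortfall of the older-adult dataset, as a fraction of the requirement.
older_shortfall = (needed - older_obs) / needed
```

      With K = 10 and N = 9 the requirement comes out at 10,980 observations, so the young-adult dataset (19,200) exceeds it and the older-adult dataset (10,500) falls about 4.37% short, in line with the figures in the response.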

      (5) In Figure 2, the authors write that the states' spatial maps were normalized for visualization purposes. Could the authors also show visualization of brain states that are not normalized? The reason why I ask is, for example, in Song, Shim, & Rosenberg (2023), the base state was observed which had activity levels all close to the mean (which is 0 because the BOLD activity was normalized). If the activity patterns of this brain state were to be normalized after state estimation, the base state would have looked drastically different than what is reported. 

      We derived the spatial maps of the states using unnormalized activity patterns, with the BOLD signals Z-score normalized to a mean of zero. Under the speech comprehension task, the three states exhibited relatively large fluctuations in network activity levels. The activity ranges were as follows: [-0.71 to 0.51] for State #1, [-0.26 to 0.30] for State #2, and [-0.82 to 0.40] for State #3. For the resting state, we observed a state resembling the baseline state as described in Song, Shim, & Rosenberg (2023), with activity values ranging from -0.133 to 0.09. 

      In the revision, we have replaced the states' spatial maps with versions showing unnormalized activity patterns. 

      (6) In line 297, the authors speculate that "This may be because there is too much heterogeneity among the older adults". To support this speculation, the authors can calculate the overall ISC of brain state dynamics among older adults and compare it to the ISC estimated from younger adults.  

      We analyzed the overall ISC of brain state dynamics, and found the ISC was indeed significantly lower among the older adults than that among the younger adults. We have revised this statement as follows:

      These factors can diminish the inter-subject correlation of brain state dynamics— indeed, ISCs among older adults were significantly lower than those among younger adults (Figure S5)—and reduce ISC's sensitivity to individual differences in task performance (Line 321-326).

      Other comments: 

      (7) In Figure 4, the authors showed a significant positive correlation between head movement ISC with the best performer and comprehension scores. Does the average head movement of all individuals negatively correlate with comprehension scores, given that the authors argue that "greater task engagement is accompanied by decreased movement"? 

      We examined the relationship between participants' average head movement across the comprehension task and their comprehension scores. There was no significant correlation (r = 0.041, p = 0.74). In the literature (e.g., Ballenghein et al., 2019), the relationship between task engagement and head movement was also assessed at the moment-by-moment level, rather than by using time-averaged data.

      Real-time head movements reflect fluctuations in task engagement and cognitive state. In contrast, mean head movement, as a static measure, fails to capture these changes, and thus is not effective in predicting task performance.

      (8) The authors write the older adults sample, the "independent dataset". Technically, however, this dataset cannot be independent because they were collected at the same time by the same research group. I would advise replacing the word independent to something like second dataset or replication dataset. 

      We have replaced the phrase “independent dataset” with “replication dataset”. 

      (9) Pertaining to a paragraph starting in line 586: For non-parametric permutation tests, the authors note that the time courses of brain state expression were "randomly shuffled". How was this random shuffling done: was this circular-shifted randomly, or were the values within the time course literally shuffled? The latter approach, literal shuffling of the values, does not make a fair null distribution because it does not retain temporal regularities (autocorrelation) that are intrinsic to the fMRI signals. Thus, I suggest replacing all non-parametric permutation tests with random circular shifting of the time series (np.roll in Python).

      In the original manuscript, the time course was literally shuffled. In the revised version, we circular-shifted the time course randomly (circshift.m in Matlab) to generate the null distribution. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence (Line 230-235). 
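
      For readers unfamiliar with the approach, the circular-shift null can be sketched as follows. This is an illustration in Python using np.roll, as the reviewer suggested, not the authors' Matlab code; the function name, the use of Pearson correlation as the test statistic, and the toy data are assumptions:

```python
import numpy as np

def circular_shift_null(x, y, n_perm=1000, seed=0):
    """Correlate x with circularly shifted copies of y.

    Shifting keeps the autocorrelation of y intact while breaking
    its alignment with x. Returns the observed Pearson r and an
    array of n_perm surrogate r values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Offsets are drawn from 1..n-1 so no surrogate equals the original alignment.
    shifts = rng.integers(1, n, size=n_perm)
    null = np.array([np.corrcoef(x, np.roll(y, s))[0, 1] for s in shifts])
    observed = np.corrcoef(x, y)[0, 1]
    return observed, null

# Toy data: y is a noisy copy of x (made-up, 300 time points).
toy_rng = np.random.default_rng(1)
x = toy_rng.standard_normal(300)
y = x + 0.5 * toy_rng.standard_normal(300)
observed, null = circular_shift_null(x, y)
```

      Unlike literal shuffling, each surrogate contains exactly the same values in the same cyclic order, which is why the null distribution respects the temporal regularities of the signal.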

      (10) The p value calculation should be p = (1+#(chance>=observed))/(1+#iterations) for one-tailed test and p = (1+#(abs(chance)>=abs(observed)))/(1+#iterations) for two-tailed test. Thus, if 5,000 iterations were run and none of the chances were higher than the actual observation, the p-value is p = 1/5001, which is the minimal value it can achieve.

      Have corrected. 
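
      The p-value formula in comment (10) translates directly into code. A minimal sketch, with a hypothetical function name, that follows the one-tailed and two-tailed expressions given by the reviewer:

```python
import numpy as np

def permutation_p(observed, null, two_tailed=False):
    """p = (1 + #exceedances) / (1 + #iterations)."""
    null = np.asarray(null, dtype=float)
    if two_tailed:
        exceed = np.sum(np.abs(null) >= abs(observed))
    else:
        exceed = np.sum(null >= observed)
    return (1 + exceed) / (1 + null.size)

# With 5,000 iterations and no surrogate reaching the observed value,
# the smallest attainable p-value is 1/5001.
p_min = permutation_p(1.0, np.zeros(5000))
```

      Adding one to both numerator and denominator counts the observed statistic as one draw from the null, so the estimate can never be exactly zero.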

      (11) State 3 in Figure S2 does not resemble State 3 of the main result. Could the authors explain why they corresponded State 3 of the Yeo-7 scheme to State 3 of the nineparcellation scheme, perhaps using evidence of spatial overlap? 

      The correspondence of states between the two schemes was established using evidence of state expression time course. 

      To assess temporal overlap, we calculated Pearson’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network parcellation scheme in terms of state expression probabilities. The time courses of the 64 participants were concatenated, resulting in 19200 (300*64) time points for each state. The one that the candidate state most closely resembled was set to be its corresponding state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. As demonstrated in the confusion matrix, each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two schemes.

      We also assessed the spatial overlap between the two schemes. First, a state activity value was assigned to each voxel across the whole brain (including a total of 34,892 voxels covered by both parcellation schemes). This is done for each brain state. Next, we calculated Spearman’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network scheme in terms of whole-brain activities. The pattern of spatial overlap is consistent with the pattern of temporal overlap, such that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively.

      Author response image 1.

      We noted that the networks between the two schemes are not well aligned in their spatial location, especially for the DMN (as shown below). This may lead to the low spatial overlap of State #3, which is dominated by DMN activity. Consequently, establishing state correspondence based on temporal information is more appropriate in this context. We therefore only reported the results of temporal overlap in the manuscript. 

      We have added a paragraph in the main text for “Establishing state correspondence between analyses” (Line 672-699). We have also updated the associated figures (Fig.S2, Fig.S3 and Fig.5).

      Author response image 2.

      (12) Line 839: gamma parameter, on a step size of? 

      (16) Figure 3. Please add a legend in the "Sound envelope" graph what green and blue lines indicate. The authors write Coh(t) and Coh(t, t+1) at the top and Coh(t) and Coh(t+1) at the bottom. Please be consistent with the labeling. Shouldn't they be Coh(t-1, t) and Coh(t, t+1) to be exact for both? 

      Have corrected. 

      (17) In line 226, is this one-sample t-test compared to zero? If so, please write it inside the parentheses. In line 227, the authors write "slightly weaker"; however, since this is not statistically warranted, I suggest removing the word "slightly weaker" and just noting significance in both States 1 and 2.  

      Have corrected.

      (18) In line 288, please fix "we also whether". 

      Have corrected. 

      (19) In Figure 2C, what do pink lines in the transition matrix indicate? Are they colored just to show authors' interests, or do they indicate statistical significance? Please write it in the figure legend.   

      Yes, the pink lines indicate a meaningful trend, showing that the between-state transition probabilities are significantly higher than those in permutation.

      We have added this information to the figure legend. 

      Reviewer #2 (Recommendations for the authors):

      (1) It is unclear how the correspondence between states across different conditions and datasets was computed. Given the spatial autocorrelation of brain maps, I recommend reporting the Dice coefficient along with a spin-test permutation to test for statistical significance.  

      The state correspondence between different conditions and between the two datasets is established using evidence of spatial overlap. The spatial overlap between states was quantified by Pearson’s correlation using the activity values (derived from HMM) of the nine networks. For each candidate state identified in other analyses (for the Rest, MG and older-adult datasets), we calculated the correlation between its network activity pattern and the three predefined states from the main analysis (for the young-adults dataset), and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.
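
      The matching rule described above (correlate each candidate state with every predefined state, then take the best match) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, and the activity patterns are random placeholders standing in for the nine network values per state:

```python
import numpy as np

def match_states(candidates, references):
    """Label each candidate state by its most similar reference state.

    candidates: (C, N) network activity patterns of candidate states.
    references: (R, N) patterns of the predefined states.
    Returns the (C, R) Pearson correlation matrix and, for each
    candidate, the index of the best-matching reference state.
    """
    candidates = np.asarray(candidates, dtype=float)
    references = np.asarray(references, dtype=float)
    n_c = candidates.shape[0]
    # np.corrcoef treats rows as variables; keep the candidate-by-reference block.
    corr = np.corrcoef(candidates, references)[:n_c, n_c:]
    labels = corr.argmax(axis=1)
    return corr, labels

# Toy data: three reference states over nine networks (random placeholders);
# the candidates are the same patterns in a different order plus small noise.
rng = np.random.default_rng(0)
references = rng.standard_normal((3, 9))
candidates = references[[2, 0, 1]] + 0.05 * rng.standard_normal((3, 9))
corr, labels = match_states(candidates, references)
```

      A plain argmax does not by itself guarantee a one-to-one mapping: two candidates can share the same best match, in which case an additional rule is needed to resolve the tie.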

      For the comparison between the young and older adults’ datasets (as shown below), the largest spatial overlap occurred along the diagonal of the confusion matrix, with high correlation values. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two datasets. As the HMM is modelled at the level of networks, which lack accurate coordinates, we did not apply the spin-test to assess the statistical significance of overlap. Instead, we extracted the state activity patterns from the 1000 permutations (wherein the original BOLD time courses were circularly shifted and an HMM was fitted) for the older-adults dataset. Applying the same state-correspondence strategy, we generated a null distribution of spatial overlap. The real overlap of the three states was greater than 97.97%, 95.34% and 92.39% of the instances from the permutations, respectively (as shown below).

      Author response image 3.

      For the comparison of main task with the resting and the incomprehensible speech condition, there was some degree of confusion: there were two candidate states showing the highest similarity to State #2. In this case, we labeled the most similar candidate as State #2. The other candidate was then assigned to the predefined state with which it had the second-highest correlation. We used a prime symbol (e.g., State #3') to denote cases where such confusion occurred. These findings support our conclusion that the tripartite-organization of brain states is not a task-free, intrinsic property.

      When establishing the correspondence between the Yeo-7 network and the nine-network parcellation schemes, we primarily relied on evidence from temporal overlap measures, as a clear network-level alignment between the two parcellation schemes is lacking. Temporal overlap was quantified by calculating the correlation of state occurrence probabilities between the two schemes. To achieve this, we concatenated the time courses of 64 participants, resulting in a time series consisting of 19,200 time points (300 time points per participant) for each state. Each of the three candidate states from the Yeo-7 network scheme was best matched to State #1, State #2, and State #3 from the main analyses, respectively. To determine the statistical significance of the temporal overlap, we circularly shifted each participant’s time course of state expression obtained from the Yeo-7 network scheme 1000 times. Applying the same strategy to find the matching states, we generated a null distribution of overlap. The real overlap was much higher than the instances from the permutations.

      Author response image 4.

      In the revision, we have provided detailed description for how the state correspondence is established and reported the statistical significance of those correspondence (Line 671-699). The associated figures have also been updated (Fig.5, Fig. S2 and Fig.S3).  

      (2) Please clarify if circle-shifting was applied to the state expression time course when generating the null distribution for behavior-brain state correlations reported in Figure (3). This seems important to control for the temporal autocorrelation in the time courses.  

      We have updated the results by using circle-shifting to generate the null distribution. The results are largely consistent with the previous ones obtained without circular shifting (Line 230-242).

      (3) Figure 3: What does the green shaded area around the sound envelope represent? In the caption, specify whether the red line in the null distributions indicates the mean or median R between brain state expression and narrative features. It would also be beneficial to report this value in the main text. 

      The green shaded area indicates the original amplitude of the speech signal, while the blue line indicates the smoothed, low-frequency contour of amplitude changes over time (i.e., speech envelope). We have updated the figure and explained this in the figure caption.

      The red line in the null distributions indicates the R between brain state expression and narrative features for the real data. We have also reported the mean R of the permutations in the main text.

      (4) The manuscript is missing a data availability statement (https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-datasharing-policies). 

      We have added a statement of data availability in the revision, as follows: 

      “The raw and processed fMRI data are available on OpenNeuro: https://openneuro.org/datasets/ds005623. The experimental stimuli, behavioral data and main scripts used in the analyses are provided on GitHub.”

      (5) There is a typo in line 102 ("perceptual alalyses"). 

      Have corrected. 

      We sincerely thank the two reviewers for their constructive feedback, thorough review, and the time they dedicated to improving our work.

      Reference: 

      Ahrends, C., Stevner, A., Pervaiz, U., Kringelbach, M. L., Vuust, P., Woolrich, M. W., & Vidaurre, D. (2022). Data and model considerations for estimating time-varying functional connectivity in fMRI. Neuroimage, 252, 119026.

      Ballenghein, U., Megalakaki, O., & Baccino, T. (2019). Cognitive engagement in emotional text reading: concurrent recordings of eye movements and head motion. Cognition and Emotion. 

      Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J., & Binder, J. R. (2022). Decoding the information structure underlying the neural representation of concepts. Proceedings of the National Academy of Sciences, 119(6), e2108091119. https://doi.org/10.1073/pnas.2108091119

      Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences, 19(6), 304-313. 

      Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011

      Liu, L., Li, H., Ren, Z., Zhou, Q., Zhang, Y., Lu, C., Qiu, J., Chen, H., & Ding, G. (2022). The “two-brain” approach reveals the active role of task-deactivated default mode network in speech comprehension. Cerebral Cortex, 32(21), 4869-4884. 

      Liu, L., Zhang, Y., Zhou, Q., Garrett, D. D., Lu, C., Chen, A., Qiu, J., & Ding, G. (2020). Auditory–Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cerebral Cortex, 30(3), 942-951. https://doi.org/10.1093/cercor/bhz138

    1. eLife Assessment

      This paper presents useful results that extend our understanding of how the visual cortex encodes temporal structure, providing new information about sequence representations in the upper layers of the visual cortex. The evidence for prediction errors is solid; however, support for other claims regarding sparsification and simplification of activity following training is incomplete. The main concerns pertain to the confounds associated with restricted ordering within blocks that does not allow for separate plasticity mechanisms operating on different time scales.

    2. Reviewer #1 (Public review):

      Summary:

      Knudstrup et al. use two-photon calcium imaging to measure neural responses in the mouse primary visual cortex (V1) in response to image sequences. The authors presented mice with many repetitions of the same four-image sequence (ABCD) for four days. Then on the fifth day, they presented unexpected stimulus orderings where one stimulus was either omitted (ABBD) or substituted (ACBD). After analyzing trial-averaged responses of neurons pooled across multiple mice, they observed that stimulus omission (ABBD) caused a small, but significant, strengthening of neural responses but observed no significant change in the response to stimulus substitution (ACBD). Next, they performed population analyses of this dataset. They showed that there were changes in the correlation structure of activity and that many features about sequence ordering could be reliably decoded. This second set of analyses is interesting and exhibited larger effect sizes than the first results about predictive coding. However, concerns about the design of the experiment temper my enthusiasm.

      The most recent version of this manuscript makes a few helpful changes (entirely in supplemental figures--the main text figures are unchanged). It does not resolve any of the larger weaknesses of the experimental design, or even perform single-neuron tracking in the one case where it was possible (between similar FOVs shown in Supplemental Figure 1).

      Strengths:

      (1) The topic of predictive coding in the visual cortex is exciting, and this task builds on previous important work by the senior author (Gavornik and Bear 2014) where unexpectedly shuffling sequence order caused changes in LFPs recorded from visual cortex.

      (2) Deconvolved calcium responses were used appropriately here to look at the timing of the neural responses.

      (3) Neural decoding results showing that the context of the stimuli could be reliably decoded from trial-averaged responses were interesting. But I have concerns about how the data was formatted for performing these analyses.

      Weaknesses:

      (1) All analyses were performed on trial-averaged neural responses that were pooled across mice (except for Supplementary Figure 6, see below). Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform most analyses on individual datasets to assess how behavioral training might differently affect each animal.

      In the most recent draft, a single-mouse analysis was added for Figure 4C (Supplementary Figure 6). This effect of "representational drift" was not statistically quantified in either the single-mouse results or in the main text figure panel. Moreover, the apparent correlational drift could be accounted for by a reduction in SNR as a consequence of photobleaching.

      (2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

      In the most recent draft, this analysis was still not performed on single mice. I was referring to the "decorrelation of responses" analysis in Figure 3, not the "representational drift" analysis in Figure 4. See my comments on Supplementary Figure 6 above.

      (3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher order block structure of the task were disrupted.

      In the rebuttal letter for the most recent draft, the authors refer to recent work in press (Hosmane et al. 2024) suggesting that because sleep may be important for plastic changes between sessions, they do not expect much change to be apparent within a session. However, they admit that this current study is too underpowered to know for sure--and do not cite or mention this yet unpublished work in the manuscript itself.

      As a control, I would be interested to at least know how much variance in neural responses is observed between intermediate "training" sessions with identical stimuli, e.g. between Day 1 and Day 4, but this is not possible as imaging was not performed on these days.

      Despite being referred to as "similar" I do not think early and late responses are clearly shown--aside from the histograms comparing "early traces" to "all traces" which include early traces in Figure 5B and Figure 6A. Showing variance in single-cell responses would be helpful to add in Supplementary Figure 3 and Supplementary Figure 4.

      (4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages, including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

      In the most recent draft, this concern has not been mitigated. Despite Supplementary Figure 1 showing similar FOVs, mostly different neurons were still extracted. In all other sessions, it is not reported how far apart the other recorded FOVs were from each other.

      The rebuttal comment that the PE statistic is computed on an individual cell within-session basis is reasonable. Moreover, the bootstrapped version of the PE analysis in Supplementary Figure 8 is an improvement of the main analysis in the paper. As a control, it would have been helpful to compute the stability of the PE ratio statistics between training days (e.g. between day 1 and day 4). How much change would have been observed when none is expected? Unfortunately, imaging was not performed on these training days so this analysis will not be readily possible to perform. Moreover, the PE statistic requires averaging across cells and trials and is therefore very likely to wash out many interesting effects. Even if it is the population response that is changing, why would it be the arithmetic mean that changes in particular vs. some other projection of the population activity? The experimental and analysis design of the paper here remains weak in my mind.

    3. Reviewer #2 (Public review):

      Knudstrup and colleagues investigate response to short and rapid sequences of stimuli in layer 2/3 of mouse visual cortex. To quote the authors themselves: "the work continues the recent tradition of providing ambiguous support for the idea that cortical dynamics are best described by predictive coding models". Unfortunately, the ambiguity here is largely a result of the choice of experimental design and analysis, and the data provide only incomplete support for the authors' conclusions.

      The authors have addressed some of the concerns of the first revision. However, many still remain.

      (1) From the first review: "There appears to be some confusion regarding the conceptual framing of predictive coding. Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD not just positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in 3rd. Likewise, with ACBD, there is negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation."

      The authors acknowledge in their response that this is a problem, but do not appear to discuss this in the manuscript. This should be fixed.

      (2) From the first review: "Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded. "

      The authors respond by pointing out that what they mean by "drift" is within day changes. This has been clarified. However, the analyses in Figures 3 and 5 still are done across days. Figure 3: "Experience modifies activity in PCA space ..." and figure 5: "Stimulus responses shift with training". Both rely on comparisons of population activity across days. This concern remains unchanged here. It would probably be best to remove any analysis done across days - or use data where the same neurons were tracked. Performing chronic two-photon imaging experiments without tracking the same neurons is simply bad practice (assuming one intends to do any analysis across recording sessions).

      (3) From the first revision: "The block paradigm to test for prediction errors appears ill chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also the first few presentations after block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period. This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret."

      Again, the authors acknowledge the problem and state that "there is no indication that this is a learned effect". However, they provide no evidence for this and perform no analysis to mitigate the concern.

      (4) Some of the minor comments also appear unaddressed and uncommented. E.g. the response amplitudes are still shown in "a.u." instead of dF/F or z-score or spikes.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides insights into predictive coding models of visual cortex processing. These models predict that visual cortex neurons will show elevated responses when there are unexpected changes to learned sequential stimulus patterns. This model is currently controversial, with recent publications providing conflicting evidence. In this work, the authors test two types of unexpected pattern variations in layer 2/3 of the mouse visual cortex. They show that pattern omission evokes elevated responses, in favor of a predictive coding model, but find no evidence for prediction errors with substituted patterns, which conflicts with both prior results in L4, and with the expectations of a predictive coding model. They also report that with sequence training, responses sparsify and decorrelate, but surprisingly find no changes in the ability of an ideal observer to decode stimulus identity or timing.

      These results are an important contribution to the understanding of how temporal sequences and expectations are encoded in the primary visual cortex.

      Comments on revisions:

      In this revision, the authors address several of the concerns in the original manuscript. However, the primary issue, raised by all three reviewers, was the block design of the experiments. This design makes disentangling the effects of any rapid (within block) plasticity from any longer term (across days) plasticity, which nominally is the subject of the paper, extremely difficult.

      Although it may be the case that re-running the experiments with an interleaved design is beyond the scope of this paper, unfortunately, the revised manuscript still does not adequately discuss this potential confound. The authors note that stimulus A in ABCD, ABBD, and ACBD could be distinguished on day 0, indicating that within block changes do occur. In both the original and revised manuscript this finding is discussed in terms of representational drift, but the authors fail to discuss how such within block plasticity may impact their primary findings of prediction error effects.

      This remains a significant concern with the revised manuscript.

      Many of the other issues in the original manuscript have been addressed, and in these areas the revised manuscript is both clearer and more accurately reflects the presented data. The additional analyses and controls shown in the supplemental figures aid in the interpretation of the findings.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) All analyses were performed on trial-averaged neural responses that were pooled across mice. Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform at least some analyses on individual analyses to assess how behavioral training might differently affect each animal.

      In order to image at a relatively fast rate (30Hz) appropriate to the experimental conditions, we restricted our imaging to a relatively small field of view (412x412um with 512x512 pixels). This entails a smaller number of ROIs per animal, which can lead to an unbalanced distribution of cells responsive to different stimuli for individual fields-of-view. We used the common approach of pooling across animals (Homann et al., 2021; Kim et al., 2019) to overcome limitations imposed by sampling a smaller number of cells per animal. In response to this comment, we included supplemental analyses (Supp. Fig. 6) showing that representational drift (which was not performed on trial-averaged data) looks substantially the same (albeit noisier) for individual animals as at the population level. Additional analyses (PE ratio, etc.) were difficult since the distribution of cells selective for individual stimuli is unbalanced between individual animals and few mice have multiple cells representing all of the different stimuli.

      (2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

      We repeated the correlation analysis performed on mice individually and included them in the supplement (Supp. Fig. 6). The overall result generally mirrors the result found by pooling across animals.

      (3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

      Our work builds primarily on previous studies (Gavornik & Bear, 2014; Price et al., 2023) that demonstrated clear changes in neural responses over days while employing a similar block structure. Notably, Price et al. found that trial number (within a block) was not a significant factor in the generation of prediction-error responses which strongly suggests short-term plasticity does not play a significant role in shaping responses within the block structure. This finding is consistent with our previous LFP recordings which have not revealed any significant plasticity occurring within a training session, a conclusion bolstered by a collaborative work currently in press (Hosmane et al. 2024, Sleep) revealing the requirement for sleep in sequence plasticity expression.

      It is possible that layer 2/3 adapts to sequences more rapidly than layer 4/5. While manual inspection does not reveal an obvious difference between early and late blocks in this dataset, the n for this subset is too small to draw firm conclusions. It is our view that the block structure provides the strongest comparison to previous work, but agree it would be interesting to randomize or fully interleave sequences in future studies to determine what effect, if any, short-term changes might have. 

      (4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages-including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

      The hypothesis being tested was whether expectation violations, as described in Keller & Mrsic-Flogel 2018, exist under a multi-day sequence learning paradigm. For this, tracking cells across days is not necessary as our PE metric compared responses of individual neurons to multiple stimuli within a single session. Given the speed/FOV tradeoff discussed above, we wanted to consider all cells irrespective of whether they were visible/active or trackable across days, especially since we would expect cells that learn to signal prediction errors to be inactive on day 0 and not selected by our segmentation algorithm. Though we did not compare the responses of single cells before/after training, we did analyze cells from the same field of view on days 0 and 5 (see Supp.Fig. 1) and not distinct populations.

      Reviewer #2:

      (1) There appears to be some confusion regarding the conceptual framing of predictive coding.

      Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD is not just for positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in the 3rd. Likewise, with ACBD, there is a negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, but more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation.

      The reviewer has identified a real problem with the framing of “positive” and “negative” prediction errors in context of sensory stimuli where substitution simultaneously introduces unexpected “positive” violation and “negative” omission. Simply put, even if there are separate mechanisms to represent positive and negative errors, there may be no way to isolate the positive response experimentally since an unexpected input always replaces the unseen expected input. For example, had a cell fired solely to ACBD (and not during either ABCD or ABCD), then whether it was signaling the unexpected occurrence of C or the unexpected absence of B would be inherently ambiguous. In either case, such a cell would have been labeled as C-responsive, and its activity would have been elevated compared with ABCD and would have been included in our substitution-type analysis of prediction errors. We accept that there is some ambiguity regarding the description in this particular case, but overall, this cell’s activity pattern would have informed the PE analysis for which the result was essentially null for the substitution-type violation ACBD.

      Omission, in which the sensory input does not change, may experimentally isolate the negative response, though this is only true if there is a temporal expectation of when the change should have occurred. If A is predicting B in an ordinal sense but there is no expectation of when B will occur with respect to A, changing the duration of A would not be expected to produce an error signal since at any point in time B might still be coming and the expectation is not broken until something other than B occurs. With respect specifically to ABBD in our experiments, it is correct that the learned error responses take the form of stronger, sustained responses to B during the time C was expected. This is still in contrast to day 0 in which activation decays after a transient response to ABBD. The data shows that responses during an omitted element are altered with training and take the form of elevated responses to ABBD on day 5. As we say in our discussion, this is somewhat ambiguous evidence of prediction errors since it emerges only with training and is generally consistent with the hypothesis being tested, though it takes a different form than we expected it to.

      (2) Related to the interpretation of the findings, just because something can be described as a prediction error does not mean it is computed in (or even is relevant to) the visual cortex. To the best of our knowledge, it is still unclear where in the visual stream the responses described here are computed. It is possible that this type of computation happens before the signals reach the visual cortex, similar to mechanisms predicting moving stimuli already in the retina (https://pubmed.ncbi.nlm.nih.gov/10192333/). This would also be consistent with the authors' finding (in previous work) that single-cell recordings in V1 exhibit weaker sequence violation responses than the author's earlier work using LFP recordings.

      Our work was aimed at testing the specific hypothesis that PE responses, at the very least, exist in L2/3—a hypothesis that is well-supported under different experimental paradigms (often multisensory mismatch). Our aim was to test this idea under a sequence learning paradigm and connect it with previously found PE responses in L4. We don’t claim that it is the only place in which prediction errors may be computed or useful, especially since (as you mentioned), there is evidence for such responses in layer 4. But it is fundamentally important to predictive processing that we determine whether PE responses can be found in layer 2/3 under this passive sequence learning paradigm, whether or not they reflect upstream processes, feedback from higher areas, or entirely local computations. Our aim was to establish some baseline evidence for or against predictive processing accounts of L2/3 activity during passive exposure to visual sequences.

      (3) Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded.

      Our discussion of drift refers to changes occurring within a population of neurons over the course of a single imaging session. We have added clarifying language to the manuscript to make this clear. Changes to the population-level encoding of stimuli over days are treated separately and with different analytical tools. Regarding tracking single cells across days, please see the response to Reviewer #1, comment 4.

      (4) The block paradigm to test for prediction errors appears ill-chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also, the first few presentations after the block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period.

      This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret.

      This work builds on previous studies that used a block structure to drive plasticity across days. We previously tested whether there are intra-block effects and found no indication of changes occurring within a block or within a session (please see the response to Reviewer #1, comment 3 for further discussion). Observed drift does complicate comparison between blocks. There is no indication in our data that this is a learned effect, though future experiments could test this directly.

      (5) Throughout the manuscript, many of the claims are not statistically tested, and where they are the tests do not appear to be hierarchical (https://pubmed.ncbi.nlm.nih.gov/24671065/), even though the data are likely nested.

      We have modified language throughout the manuscript to be more precise about our claims. We used pooled data between mice and common parametric statistics in line with published literature. The referenced paper offers a broad critique of this approach, arguing that it increases the possibility of type 1 errors, though it is not clear to us that our experimental design carries this risk, particularly since most of our results were negative. To address the specific concern, however, we performed a non-parametric hierarchical bootstrap analysis (https://pmc.ncbi.nlm.nih.gov/articles/PMC7906290/) that re-confirmed the statistical significance of our positive results; see Supplemental Figure 8.
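
The hierarchical bootstrap mentioned in this response can be illustrated with a minimal sketch. It is not the authors' analysis: the function names, the two-level structure (animals, then cells within animals), the toy values, and the choice of the mean as the statistic are all assumptions made for illustration. The idea is to resample animals with replacement first and then resample cells within each sampled animal, so that the nested structure of the data is respected.

```python
import random
from statistics import mean

def hierarchical_bootstrap(data, n_boot=2000, seed=0):
    """data: dict mapping animal id -> list of per-cell values (hypothetical layout).
    Returns bootstrap means obtained by resampling animals, then cells within animals."""
    rng = random.Random(seed)
    animals = list(data)
    boot_means = []
    for _ in range(n_boot):
        # Level 1: resample animals with replacement
        sampled_animals = rng.choices(animals, k=len(animals))
        values = []
        for animal in sampled_animals:
            cells = data[animal]
            # Level 2: resample cells within the sampled animal with replacement
            values.extend(rng.choices(cells, k=len(cells)))
        boot_means.append(mean(values))
    return boot_means

def percentile_interval(samples, alpha=0.05):
    """Simple percentile interval from a list of bootstrap statistics."""
    ordered = sorted(samples)
    low = ordered[int((alpha / 2) * len(ordered))]
    high = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return low, high

# Toy, made-up values standing in for a per-cell metric in three mice
toy = {"m1": [0.9, 1.2, 1.1], "m2": [1.4, 1.0], "m3": [1.3, 1.5, 1.2, 1.1]}
boot = hierarchical_bootstrap(toy)
ci_low, ci_high = percentile_interval(boot)
```

With a real dataset one would typically compare the resulting interval (or the bootstrap distribution of a difference between conditions) against the null value of interest; which statistic and which levels to resample depend on the experimental design.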

      (6) The manuscript would greatly benefit from thorough proofreading (not just in regard to figure references).

      We apologize for the errors in the manuscript. We caught the issue and passed on a corrected draft, but apparently the uncorrected draft was sent for review. The re-written manuscript addresses all identified issues.

      (7) With a sequence of stimuli that are 250ms in length each, the use of GCaMP6s appears like a very poor choice.

      We started our experiments using GCaMP6f but ultimately switched to GCaMP6s due to its improved sensitivity, brightness, and accuracy in spike detection (Huang et al., 2021). When combined with deconvolution (Pachitariu et al., 2018; Pnevmatikakis et al., 2016), we found GCaMP6s provides the most complete and accurate view of spiking within 40ms time bins. The inherent limitations of calcium imaging are more likely to be addressed using electrophysiology rather than a faster sensor in future studies.

      (8) The data shown are unnecessarily selective. E.g. it would probably be interesting to see how the average population response evolves with days. The relevant question for most prediction error interpretations would be whether there are subpopulations of neurons that selectively respond to any of the oddballs. E.g. while the authors state they "did" not identify a separate population of omission-responsive neurons, they provide no evidence for this. However, it is unclear whether the block structure of the experiments allows the authors to analyze this.

      We concluded that there is no clear dedicated subpopulation of omission-responding cells by inspecting cells with large PE responses (i.e., ABBD, see supplemental figure 3). Out of the 107 B-responsive cells on day 5, only one appeared to fire exclusively during the omitted stimulus. Average traces for all B-responsive cells are included in the supplement and we have updated the manuscript accordingly. Similarly, a single C-responsive cell was found with an apparently unique substitution error profile (ABCD and ACBD, supplemental figure 4).

      Our primary concern was to make sure that days 0 and 5 had the highest quality fields-of-view. In work leading up to this study, there were concerns that imaging on all intermediate days resulted in a degradation of quality due to photobleaching. We agree that an analysis of intermediate days would be interesting, but it was excluded due to these concerns. 

      Reviewer #3:

      (1) Experimental design using a block structure. The use of a block structure on test days (0 and 5) in which sequences were presented in 100 repetition blocks leads to several potential confounds. First, there is the potential for plasticity within blocks, which could alter the responses and induce learned expectations. The ability of the authors to clearly distinguish blocks 1 and 2 on Day 0 with a decoder suggests this change over time may be meaningful.

      Repeating the experiments with fully interleaved sequences on test days would alleviate this concern. With the existing data, the authors should compare responses from the first trials in a block to the last trials in a block.

      This block design likely also accounts for the ability of a decoder to readily distinguish stimulus A in ABCD from A in ABBD. As all ABCD sequences were run in a contiguous block separate from ABBD, the recent history of experience is different for A stimuli in ABCD versus ABBD. Running fully interleaved sequences would also address this point, and would also potentially mitigate the impact of drift over blocks (discussed below).

      As described in other responses, the block structure was chosen to align more closely with previous studies. We take the overall point though, and future studies will employ the suggested randomized or interleaved structure in addition to block structures to investigate the effects of short-term plasticity.

      (2) The computation of prediction error differs significantly for omission as opposed to substitutions, in meaningful ways the authors do not address. For omission errors, PE compares the responses of B1 and B2 within ABBD blocks. These responses are measured from the same trial, within tens of milliseconds of each other. In contrast, substitution PE is computed by comparing C in ABCD to C in ACBD. As noted above, the block structure means that these C responses were recorded in different blocks, when the state of the brain could be different. This may account for the authors' detection of prediction error for omission but not substitution. To address this, the authors should calculate PE for omission using B responses from ABCD.

      We performed the suggested analysis (i.e., ABBD vs ABCD) prior to submission but omitted it from the draft for brevity (the effect was the same as with ABBD vs ABBD). We have added the results of standardizing with ABCD as supplementary figure 3.

      (3) The behavior of responses to B and C within the trained sequence ABCD differs considerably, yet is not addressed. Responses to B in ABCD potentiate from d0-> d5, yet responses to C in the same sequence go down. This suggests there may be some difference in either the representation of B vs C or position 2 vs 3 in the sequence that may also be contributing to the appearance of prediction errors in ABBD but not ACBD. The authors do not appear to consider this point, which could potentially impact their results. Presenting different stimuli for A,B,C,D across mice would help (in the current paper B is 75 deg and C is 165 deg in all cases). Additionally, other omissions or substitutions at different sequence positions should be tested (eg ABCC or ABDC).

      We appreciate the suggestion. Ideally, we could test many different variants, but practical concerns regarding the duration of the imaging sessions prevented us from testing other interesting variations (such as ABCC) in the current study. We are uncertain as to how we should interpret the overall depressed response to element C seen on day 5, but since the effect is shared in both ABCD and ACBD, we don’t think it affected our PE calculations. 

      (4) The authors' interpretation of their PCA results is flawed. The authors write "Experience simplifies activity in principal component space". This is untrue based on their data. The variance explained by the first set of PCs does not change with training, indicating that the data is not residing in a lower dimensional ("simpler") space. Instead, the authors show that the first 5 PCs better align with their a priori expectations of the stimulus structure, but that does not mean these PCs necessarily represent more information about the stimulus (and the fact that the authors fail to see an improvement in decoding performance argues against this case). Addressing such a question would be highly interesting, but is lacking in the current manuscript. Without such analysis, referring to the PCs after training as "highly discretized" and "untangled" are largely meaningless descriptions that lack analytical support.

      We meant the terms “simpler”, “highly-discretized”, and “untangled” as qualitative descriptions of changes in covariance structure that occurred despite the maintenance of overall dimensionality. As the reviewer notes, the obvious changes in PC space appear to have had practically no effect on decodability or dimensionality, and we found this surprising and worth describing.

      (5) The authors report that activity sparsifies, yet provide only the fraction of stimulus-selective cells. Given that cell detection was automated in a manner that takes into account neural activity (using Suite2p), it is difficult to interpret these results as presented. If the authors wish to claim sparsification, they need to provide evidence that the total number of ROIs drawn on each day (the denominator for sparseness in their calculation) is unbiased. Including more (or less) ROIs can dramatically change the calculated sparseness.

      The authors mention sparsification as contributing to coding efficiency but do not test this. Training a decoder on variously sized subsets of their data on days 0 and 5 would test whether redundant information is being eliminated in the network over training.

      First, we provide evidence for sparseness using a visual responsiveness metric in addition to stimulus-selectivity. Second, it is true that Suite2p’s segmentation is informed by activity and therefore may possibly omit cells with very minimal activity. However, we detected a comparable number of cells on day 5 (n=1500) to day 0 (n=1368). We report that roughly half as many cells are stimulus-selective on day 5 compared with day 0. In order for that to have been a result of biased ROI segmentation, we would have needed to have detected closer to 2600 cells on day 5 rather than 1500. Therefore, we consider any bias in the segmentation to have had little effect on the main findings.
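
One way to read the denominator argument in this response is as a short calculation. The sketch below is an interpretation, not the authors' own computation: it assumes that "roughly half" means a ratio of exactly 0.5 and that the number of stimulus-selective cells is held fixed, so that the drop in the selective fraction would have to come entirely from extra ROIs inflating the denominator. The variable names are hypothetical.

```python
n_day0 = 1368          # ROIs detected on day 0 (from the response)
n_day5 = 1500          # ROIs detected on day 5 (from the response)
assumed_ratio = 0.5    # ASSUMPTION: "roughly half" taken as exactly 0.5

# If the count of selective cells were unchanged, halving the selective
# fraction would require the ROI count (the denominator) to grow by 1 / ratio.
n_needed = n_day0 / assumed_ratio

# How much larger than the observed day-5 ROI count that would be
excess_factor = n_needed / n_day5
```

Under these assumptions the required day-5 count is well above the 1500 ROIs actually detected, which is the same order of magnitude as the approximate figure quoted in the response; the exact number depends on what ratio "roughly half" stands for.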

      (6) The authors claim their results show representational drift, but this isn't supported in the data. Rather they show that there is some information in the structure of activity that allows a decoder to learn block ID. But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc). To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5). In the absence of this or other tests of representational drift over blocks, the authors should remove the statement that "These findings suggest that there is a measurable amount of representational drift".

      “To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5)”: This is the exact analysis that was performed. Additionally, our analysis of pairwise correlations directly measures representational drift.

      “But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc)”: We have repeated the decoder analysis using normalized population vectors (Supplementary Figure 5) which we believe directly addresses whether the observed drift is due to photobleaching or alertness that would affect the overall magnitudes of response vectors.

      Our analysis of block decoding reflects decoders trained on individual stimulus elements, and we show the average over all such decodings (we have clarified this in the text). For example, we trained a decoder on ABCD presentations from block 1 and tested only against ABCD from other blocks, which we believe is the test being suggested by the reviewer. Furthermore, we do show that representational similarity for all stimulus elements reduces gradually and more-or-less monotonically as the time between presentations increases. We believe this is a fairly straightforward test of representational drift as has been reported and used elsewhere (Deitch et al., 2021).
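
The structure of a train-on-block-1, test-on-later-blocks analysis with normalized population vectors can be sketched as follows. This is a hedged illustration on synthetic data, not the decoder used in the manuscript: the nearest-centroid classifier, the helper names, the simulated tuning, and the gain values are all assumptions. Each population vector is scaled to unit length before decoding, so a global change in response magnitude (for example from bleaching or alertness) cannot by itself change the result.

```python
import math
import random

def unit_normalize(vec):
    """Scale a population vector to unit length so global gain changes cancel."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fit_centroids(vectors, labels):
    """Mean population vector per stimulus label (a nearest-centroid decoder)."""
    grouped = {}
    for vec, lab in zip(vectors, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in grouped.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((c - x) ** 2 for c, x in zip(centroids[lab], vec)))

def accuracy(centroids, vectors, labels):
    hits = sum(predict(centroids, v) == lab for v, lab in zip(vectors, labels))
    return hits / len(labels)

def make_block(rng, gain, n_trials=40, n_cells=20):
    """Synthetic block (made-up data): each of four stimulus elements drives its
    own subset of cells; `gain` scales every response, mimicking a global change."""
    vectors, labels = [], []
    for _ in range(n_trials):
        for k, stim in enumerate("ABCD"):
            raw = [gain * ((2.0 if i % 4 == k else 0.2) + rng.gauss(0, 0.1))
                   for i in range(n_cells)]
            vectors.append(unit_normalize(raw))
            labels.append(stim)
    return vectors, labels

rng = random.Random(0)
gains = [1.0, 0.9, 0.8, 0.7, 0.6]   # hypothetical global gain per block
blocks = [make_block(rng, g) for g in gains]

# Train on block 1 only, then test on each later block
train_vectors, train_labels = blocks[0]
centroids = fit_centroids(train_vectors, train_labels)
cross_block_accuracy = [accuracy(centroids, v, lab) for v, lab in blocks[1:]]
```

In this toy case the simulated tuning is stable across blocks and only the gain changes, so cross-block accuracy stays high after normalization. In real data, a decline in accuracy with block distance after normalization would point to a change in the pattern of activity rather than in its overall magnitude.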

      (7) The authors allude to "temporal echoes" in a subheading. This term is never defined, or substantiated with analysis, and should be removed.

      We hoped the term ‘temporal echo’ would be understood in the context of rebounding activity during gray periods as supported by analysis in figure 6a. We have eliminated the wording in the updated manuscript.

    1. eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca2+ channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

    2. Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory.

      Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

    3. Reviewer #2 (Public review):

      Summary:

      Shin et al. aim, in a very extensive piece of work, to identify a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to match supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal-to-noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin7 function and Ca-dependent phospholipase activity appear critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release provability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Fig 3: I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Relatedly, measuring the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial, as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      The literature on Syt7 function is still quite controversial. It has been observed that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling is unlikely to explain the strong drop in synaptic output.

    4. Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

    5. Author Response:

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca2+ channel inactivation in the strong paired pulse depression (PPD). We have already measured the total (free plus buffered) calcium increments induced by each of the first four APs in a 40 Hz train at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca2+ increments did not differ from one another, arguing against a contribution of Ca2+ channel inactivation to PPD. Please see our reply to the 2nd issue in the Weaknesses section of Reviewer #3.

      The second critical issue concerned the definition of ‘vesicular probability’. Previously, vesicular probability (pv) has been used with reference to the releasable vesicle pool, which includes not only tightly docked vesicles but also reluctant vesicles. In the present study, by contrast, pv means the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weaknesses sections of Reviewer #2 and Reviewer #3.

      Our point-by-point replies to the Reviewers’ other comments are given below.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim to identify, in a very extensive piece of work, a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex on the time scale of seconds. The work draws on a powerful new model that is well suited to testing whether the pool of SVs ready for fusion is dynamically scaled to match supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal-to-noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that regulation of release probability, the alternative contender, is unlikely to be involved in the observed short-term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin-7 function and Ca-dependent phospholipase activity seem critical for the short-term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release provability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * pv, where n = RRP size and pv = vesicular release probability. The value for pv critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: a loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that estimated by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRPhyper) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured pv based on AP-evoked EPSC features such as strong paired pulse depression (PPD) and associated failure rates, pv in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles at baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (pocc) by TS vesicles is subject to dynamic regulation by forward and backward rate constants (denoted by k1 and b1, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and pocc, among which N is a fixed parameter but pocc depends on k1/(k1+b1) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca2+ (Hosoi et al., 2008), pocc is not a fixed parameter. Therefore, release probability should be re-defined as pocc x pv. In this regard, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in pocc rather than pv, because pv is close to unity (Fig. S8). Moreover, strong PPD was observed not only at baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in pv of reluctant vesicles.
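
      As a rough numerical illustration of the relations described above, the following Python sketch computes the occupancy from k1 and b1, the re-defined release probability pocc x pv, and the upper bound on baseline pocc implied by a given fold facilitation when pv and N stay fixed. The function names and the example rate constants are hypothetical and chosen only for illustration. The steady-state form pocc = k1/(k1+b1) is assumed from the simple refilling model named in the text, and this is not the fitting procedure used in the study.

```python
def occupancy(k1: float, b1: float) -> float:
    """Steady-state occupancy of release sites by TS vesicles, k1 / (k1 + b1)."""
    if k1 < 0 or b1 < 0 or (k1 + b1) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k1 / (k1 + b1)


def release_probability(p_occ: float, p_v: float) -> float:
    """Release probability per site, re-defined as p_occ * p_v."""
    return p_occ * p_v


def quantal_content(n_sites: int, p_occ: float, p_v: float) -> float:
    """Mean quantal content m = N * p_occ * p_v."""
    return n_sites * release_probability(p_occ, p_v)


def max_baseline_occupancy(fold_facilitation: float) -> float:
    """Largest baseline p_occ compatible with a given fold facilitation,
    assuming p_v and N are constant and facilitated p_occ cannot exceed 1."""
    if fold_facilitation < 1:
        raise ValueError("fold facilitation must be at least 1")
    return 1.0 / fold_facilitation


if __name__ == "__main__":
    # Hypothetical rate constants (per second), for illustration only.
    p_occ = occupancy(k1=1.0, b1=3.0)
    print(f"p_occ = {p_occ:.2f}")
    print(f"release probability at p_v = 0.9: {release_probability(p_occ, 0.9):.3f}")
    # 2.3-fold facilitation is the value quoted in the text.
    print(f"max baseline p_occ for 2.3-fold STF: {max_baseline_occupancy(2.3):.2f}")
```

      Under these assumptions, a 2.3-fold facilitation with pv near unity would bound the baseline pocc at about 0.43, since the facilitated occupancy cannot exceed 1.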

      If the Reviewer meant vesicular release or fusion probability (pv) by ‘release provability’, pv (of TS vesicles) is not a major player in STF, because the baseline pv is already higher than 0.8 even when it is most parsimoniously estimated (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that pv is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca2+-dependent step increase in pv of the RRP (Dittman et al., 2000) or of reluctant vesicles (Turecek et al., 2016). Given that pv of TS vesicles is close to one, an increase in pv of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as pv,LS) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that pv,LS is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in pv,LS of LS vesicles that reside in a distinct pool. Because the increase in pv,LS during facilitation recruits new release sites (an increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in an upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with the results of our V-M analysis (Fig. 3), which show that EPSCs during STF fell on the same parabola regardless of stimulation frequency. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.
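
      The variance-mean argument above can be sketched with a simple binomial model. The Python example below is an illustration under stated assumptions, not the analysis used in the study: it assumes N independent release sites, a uniform per-site release probability, and a fixed quantal size q with no quantal variability. All function names and parameter values are hypothetical. It shows that raising the release probability moves points along one parabola, whereas adding release sites pushes the variance above the original parabola.

```python
def binomial_mean_variance(n_sites: int, p_release: float, q: float):
    """Mean and variance of the EPSC amplitude for a simple binomial model."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted at a given mean by the parabola var = q*mean - mean^2/N."""
    return q * mean - mean ** 2 / n_sites


def excess_over_parabola(n_sites: int, p_release: float, q: float,
                         n_reference: int) -> float:
    """Variance of a synapse with n_sites sites minus the value predicted by
    the parabola of a reference synapse with n_reference sites."""
    mean, variance = binomial_mean_variance(n_sites, p_release, q)
    return variance - parabola_variance(mean, n_reference, q)


if __name__ == "__main__":
    N0, Q = 5, 10.0  # hypothetical site number and quantal size (pA)

    # Case 1: release probability rises, N fixed -> points stay on the parabola.
    for p in (0.2, 0.4, 0.6):
        print(f"p = {p}: excess = {excess_over_parabola(N0, p, Q, N0):.3f}")

    # Case 2: sites are recruited (N rises from 5 to 8) at fixed p
    # -> variance lies above the original parabola.
    print(f"N = 8: excess = {excess_over_parabola(8, 0.4, Q, N0):.1f}")
```

      In this sketch the excess is zero whenever only the release probability changes and positive when N increases at fixed release probability, which is the upward deviation referred to in the text.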

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in the fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in pv,LS that occurs in parallel with STF. Strong PPD, indicative of high pv, was consistently found not only at baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in pv,LS. One may argue that STF may be mediated by a drastic step increase of pv,LS from zero to one, but this is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we will incorporate these perspectives into the discussion and further clarify the reasoning behind our conclusions.

      <References>

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Fig 3: I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Relatedly, measuring the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial, as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition of release probability, it can be factored into pv and pocc, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= pocc x pv). Our novel finding is that the increase in release probability should be attributed to an increase in pocc, not to an increase in pv.

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the fast-releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, the maximal STF (ratio of facilitated to baseline EPSCs) is expected to become lower as long as the number of release sites (N) is limited. As mentioned above, the baseline pv is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline pocc seems to be increased by OAG.

      <Reference>

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      The literature on Syt7 function is still quite controversial. It has been observed that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling is unlikely to explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial pv, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial pv. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial pv has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial pv, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial pv already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial pv. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on pv.

      <References>

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We note that the different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We will add an explanation to the Discussion of the difference in the meaning of ‘vesicular release probability’ between the present study and previous studies [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)]. In summary, pv in the present study denotes the vesicular release probability of TS vesicles, while previous studies used it as the vesicular release probability of vesicles in the RRP, which include LS and TS vesicles. Accordingly, pocc in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at pv close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high pv, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that pv is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in pocc (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high pv too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low pv or for a gradual increase in pv of reluctant vesicles during short-term facilitation.  

      <Following statement will be added to _Discussion_ in the revised manuscript>

      Previous studies suggested that an increase in pv is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz, 60-pulse train (denoted as ‘RRPhyper’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high pv based on the observation of strong PPD and failure rates upon paired stimulations at an ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, pv in the present study indicates the fusion probability of TS vesicles. For the same reason, pocc denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in pocc (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRPhyper.

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca2+ channel inactivation in the observed paired-pulse depression (PPD). As noted by the reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca2+ channel inactivation. This interpretation seems to rest on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k1 and b1, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be attributed solely to Ca2+ channel inactivation. Nevertheless, we acknowledge the possibility that Ca2+ channel inactivation may contribute to PPD, and we have therefore investigated it further. Specifically, we measured action potential (AP)-evoked Ca2+ transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using a two-dye ratiometry technique. Our analysis revealed no evidence for Ca2+ channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca2+ channel inactivation is unlikely to contribute to the pronounced PPD.

      Author response image 1 below shows how we measured the total Ca2+ increments at axonal boutons. First, we estimated the endogenous Ca2+-binding ratio from analyses of single AP-induced Ca2+ transients at different concentrations of Ca2+ indicator dye (panels A to E). Then, using these Ca2+ buffer properties, we converted free [Ca2+] amplitudes to total calcium increments for the first four AP-evoked Ca2+ transients in a 40 Hz train (panels G-I). We will incorporate these results into the revised version of the reviewed preprint to provide evidence against Ca2+ channel inactivation.

      Author response image 1.

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in pfusion of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (pr) within a single active zone rather than a heterogeneity in pfusion of individual docked vesicles. Therefore both pocc and pv of TS vesicles would contribute to the pr distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). In addition, strong PPD was observed not only at baseline but also during the early and late phases of facilitation, indicating that vesicles with very high pv contribute to EPSCs throughout train stimulation (Fig. 2, 3, and 7). These findings argue against both the recruitment of new release sites harboring low-pv vesicles and a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we will incorporate these perspectives into the Discussion and further clarify the reasoning behind our conclusions.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the reviewer suggested, low external Ca2+ concentration can lower release probability (pr). Given that both pv and pocc are regulated by [Ca2+]i, low external [Ca2+] may affect not only pv but also pocc, both of which would contribute to low pr. Under such conditions, it is plausible that the baseline pr becomes much lower than 0.1 because of low pv and pocc (for instance, if pv decreases from 1 to 0.5 and pocc from 0.3 to 0.1, then pr = 0.05), leaving room for pr (= pv x pocc) to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca2+] accumulates during a train.
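
      The arithmetic in this reply can be checked with a short sketch. The numbers are the illustrative values quoted in the reply, not measurements, and the function name `release_probability` is a hypothetical helper introduced here.

```python
def release_probability(p_v, p_occ):
    """Release probability per site: pr = pv * pocc."""
    return p_v * p_occ


# Illustrative values quoted in the reply (not measured data).
pr_normal_ca = release_probability(p_v=1.0, p_occ=0.3)  # normal external Ca2+
pr_low_ca = release_probability(p_v=0.5, p_occ=0.1)     # low external Ca2+

# Example facilitated pr from the reply; the fold change is relative
# to the low-Ca2+ baseline.
pr_facilitated = 0.5
fold_increase = pr_facilitated / pr_low_ca

print(f"baseline pr (normal Ca2+): {pr_normal_ca:.2f}")
print(f"baseline pr (low Ca2+):    {pr_low_ca:.2f}")
print(f"fold increase to pr = 0.5: {fold_increase:.1f}")
```

      Under these assumed values, the same facilitated pr corresponds to a much larger fold change when the baseline is lowered.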

      If pv is close to one, pr depends on pocc, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP in a train. Post-train recovery from facilitation would therefore depend on the restoration of the equilibrium between TS and LS vesicles to baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS) comprising a slow first step (-> LS) and a fast second step (-> TS). Within this framework, the slow first step (the de novo priming step) is the rate-limiting step that governs the development and recovery kinetics of facilitation. Given that the on and off rates for Ca2+ binding to Syt7 are slow, it is plausible that Syt7 contributes to short-term facilitation (STF) by Ca2+-dependent acceleration of the first step (as shown in Fig. 9). During train stimulation, LS vesicles would slowly accumulate in a Syt7- and Ca2+-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      <Reference>

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Higher baseline occupancy does not necessarily imply faster PPR recovery either. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k1 / (k1 + b1) and (k1 + b1), respectively. The baseline occupancy thus depends on the ratio k1/b1, whereas PPR recovery depends on the absolute values of k1 and b1. Based on the pocc and PPR recovery time constants of WT and KD synapses, we expect a higher k1/b1 but a lower (k1 + b1) in Syt7 KD synapses than in WT synapses.
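
      The relationship stated here can be made concrete with a minimal sketch that inverts the two expressions of the simple refilling model. The recovery rates are the ones quoted in this reply; the occupancy values are hypothetical placeholders chosen only to illustrate a higher occupancy in KD, and the helper name `refilling_rates` is introduced here.

```python
def refilling_rates(p_occ, recovery_rate):
    """Invert the simple refilling model.

    p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1, so
    k1 = p_occ * recovery_rate and b1 = (1 - p_occ) * recovery_rate.
    """
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Recovery rates (per second) as quoted in the reply; the occupancy
# values are hypothetical placeholders, not measured data.
k1_wt, b1_wt = refilling_rates(p_occ=0.3, recovery_rate=23.0)
k1_kd, b1_kd = refilling_rates(p_occ=0.4, recovery_rate=18.5)

print(f"WT: k1 = {k1_wt:.2f}/s, b1 = {b1_wt:.2f}/s, k1/b1 = {k1_wt / b1_wt:.2f}")
print(f"KD: k1 = {k1_kd:.2f}/s, b1 = {b1_kd:.2f}/s, k1/b1 = {k1_kd / b1_kd:.2f}")
```

      With these assumed occupancies, KD has the larger k1/b1 even though its k1 + b1 is smaller, which is the combination described in the reply.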

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As the reviewer suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. However, our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al., 2020). This was not the case in our data (Fig. 3). We argued against this possibility in lines 203-208 of the first version of the manuscript.
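
      The variance argument can be illustrated with a textbook binomial release model, sketched below under simplifying assumptions: identical sites, a fixed quantal size q, and no quantal variability. The site numbers and EPSC means are arbitrary illustrative values, not fits to the data in Fig. 3.

```python
import numpy as np


def binomial_variance(mean, n_sites, q):
    """Variance predicted by a binomial release model with n_sites sites.

    With mean = N * p * q and var = N * p * (1 - p) * q**2,
    eliminating p gives the parabola var = q * mean - mean**2 / N.
    """
    mean = np.asarray(mean, dtype=float)
    return q * mean - mean**2 / n_sites


# Arbitrary illustrative values (units of q), not measured data.
mean_epsc = np.array([2.0, 4.0, 6.0])
var_fixed = binomial_variance(mean_epsc, n_sites=10, q=1.0)
var_recruited = binomial_variance(mean_epsc, n_sites=15, q=1.0)

print("fixed N = 10:     ", var_fixed)
print("recruited N = 15: ", var_recruited)
```

      At any given mean, a larger N predicts a larger variance, so points obtained after recruitment of additional sites would lie above the fixed-N parabola in this idealized model.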

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and thereby facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is sufficiently distant from axon terminals that the local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      <Reference>

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate which aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline pocc and short-term facilitation (STF) are enhanced by the baseline DAG level and by train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore indicated that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knockdown experiments also revealed a critical role of Syt7 in the replenishment of TS vesicles. This approach does not deviate from standard scientific reasoning but rather builds upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

    1. eLife Assessment

      This important work uses an innovative approach to understand similarities between haemodynamic and electrophysiological activity of the human brain. The study provides incomplete evidence to indicate that while similar functional brain networks are used in both modalities, there is a tendency for these multi-modal networks to spatially converge at synchronous rather than asynchronous time points. This work will be of interest to neurophysiological and brain imaging researchers.

    2. Reviewer #1 (Public review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous timepoints. However, my confidence in the findings and their interpretation is undermined by an incomplete justification for the expected outcomes for each of the proposed scenarios.

      Main Concern

      Fig 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.:

      - Scenario1. Temporally convergent, same trajectories through connectome state space
      - Scenario2. Temporally divergent, different trajectories through connectome state space

      However, based on my understanding (and apologies if I am mistaken), I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in fig 2C, or the statements in the main text, i.e.:

      - For scenario1, "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"
      - For scenario2, "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in fig2C from? They do not seem to be fully justified via previous literature, theory, or simulations?

      In particular, I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points. It seems to me that you would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenario 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, the authors could use surrogate data for scenario 1 and 2. For example:

      a. For scenario 1, run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).
      b. For scenario 2, run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).
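
      The block shuffling suggested in option b could look roughly like the sketch below. It is a generic illustration, not code from the paper: the function name `block_shuffle`, the sampling rate, and the random stand-in for EEG data are all assumptions.

```python
import numpy as np


def block_shuffle(data, block_len, rng=None):
    """Shuffle a time series in contiguous blocks along axis 0.

    Samples keep their order inside each block; only the order of the
    blocks changes. A trailing partial block is kept as its own block.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    n_blocks = int(np.ceil(data.shape[0] / block_len))
    blocks = [data[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
    order = rng.permutation(n_blocks)
    return np.concatenate([blocks[i] for i in order], axis=0)


# Hypothetical example: 60 s of 4-channel data at an assumed 250 Hz,
# shuffled in blocks of about 10 s.
fs = 250
eeg = np.random.default_rng(0).standard_normal((60 * fs, 4))
surrogate = block_shuffle(eeg, block_len=10 * fs, rng=1)
```

      Shuffling in blocks rather than sample by sample preserves the short-range temporal structure within each block while breaking the alignment between the two inputs to the CRP.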

      The authors have provided CRP plots for option a. It shows a CRP, as expected, consistent with scenario 1. This is a useful sanity check. However, as mentioned above, it does not ensure that all CRPs under this scenario will look like this.

      However, the authors have not shown a CRP as per option b. As such, there is an incomplete justification for the expected outcomes of the scenarios.

      Note that another option, which has not been carried out, is to use full simulations, with clearly specified assumptions, for scenario1 and 2. One way of doing this is to use a simplified (state-space) setup where you randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      Using this, you can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities
      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities

      Minor Concern

      Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." It is great that the authors provide both. However, given that FC in EEG is almost totally dominated by spatial leakage (see Hipp paper), the main results/figures for the scalp EEG should be done using spatial leakage corrected EEG data.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also did a control analysis and verified the effect of temporal window size or different functional connectivity operationalizations. I also applaud their effort to make the analysis code open-sourced.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous time points. However, my confidence in the findings and their interpretation is undermined by an apparent lack of justification for the expected outcomes for each of the proposed scenarios, and in the analysis pipeline itself.

      Main Concerns

      (1) Figure 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.

      Scenario 1: Temporally convergent, same trajectories through connectome state space

      Scenario 2: Temporally divergent, different trajectories through connectome state space

      However, based on my understanding I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in Figure 2C, or the statements in the main text:

      For Scenario 1: "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      For Scenario 2: "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in Figure 2C from? Are they based on previous literature, theory, or simulations?

      I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points? It seems to me that one would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless, a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenarios 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, a couple of ways I can think of to gain insight into this key aspect are:

      (1) Use surrogate data for scenarios 1 and 2:

      a. For scenario 1: Run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check)

      b. For scenario 2: Run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      (2) Do simulations, with clearly specified assumptions, for scenarios 1 and 2. One way of doing this is to use a simplified (state-space) setup and randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      The authors would only need to worry about simulating the network activation time courses, i.e. they would not need to bother with specifying the spatial configuration of each network, instead, they would make the implied assumption that each of these networks has the same spatial configuration in modality 1 and modality 2.

      With that assumption, the CRP calculation should simply correspond to calculating, at each time i in modality 1 and time j in modality 2, the number of networks that are activating in both modality 1 and modality 2, by using their activation time courses. Using this, one can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities
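
      A minimal sketch of the simulation outlined in (2) is given below. It assumes binary activations that flip state with a fixed per-step probability and a CRP defined as the count of co-active networks at each pair of time points. The function names, the switching probability, and the ratio definition (mean diagonal entry divided by mean off-diagonal entry) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_activations(n_networks, n_times, p_switch, rng):
    """Binary on/off time courses, shape (n_times, n_networks).

    Each network starts in a random state and flips with probability
    p_switch at every time step, independently of the other networks.
    """
    flips = rng.random((n_times, n_networks)) < p_switch
    start = rng.integers(0, 2, size=n_networks)
    return ((start + np.cumsum(flips, axis=0)) % 2).astype(int)


def coactivation_crp(act_a, act_b):
    """CRP[i, j] = number of networks active at time i in A and time j in B."""
    return act_a @ act_b.T


def on_off_diagonal_ratio(crp):
    """Mean of main-diagonal entries divided by mean of all other entries."""
    off_mask = ~np.eye(crp.shape[0], dtype=bool)
    return np.mean(np.diag(crp)) / np.mean(crp[off_mask])


rng = np.random.default_rng(0)
modality_1 = simulate_activations(n_networks=8, n_times=1000, p_switch=0.05, rng=rng)

# Scenario 1: identical activation time courses in both modalities.
ratio_s1 = on_off_diagonal_ratio(coactivation_crp(modality_1, modality_1.copy()))

# Scenario 2: activation time courses simulated separately per modality.
modality_2 = simulate_activations(n_networks=8, n_times=1000, p_switch=0.05, rng=rng)
ratio_s2 = on_off_diagonal_ratio(coactivation_crp(modality_1, modality_2))

print(f"scenario 1 ratio: {ratio_s1:.2f}")
print(f"scenario 2 ratio: {ratio_s2:.2f}")
```

      Under these assumptions the ratio sits well above one for scenario 1 and close to one for scenario 2. How far that separation survives slower switching, unequal network prevalence, or measurement noise would need to be explored separately.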

      We thank the reviewer for raising this important matter, as it directly relates to our study hypothesis. To address this point, we chose to focus on the first of the reviewer's two alternative suggestions, as it provides evidence based on empirical data. In line with the reviewer’s suggestion 1, recurrence plots have indeed been previously applied to connectome dynamics data from the same modality [Hansen et al., NeuroImage 2015; Fig. 2B]. As shown in the referenced study, where the recurrence plot was estimated within fMRI connectome dynamics, the on-diagonal entries have noticeably larger correlation values than the off-diagonal entries. As the authors state, this contrast emphasizes the autocorrelation of connectome dynamics in their single-modality recurrence plot. Extending these findings to our cross-modal recurrence plots, greater synchronicity of connectome dynamics across fMRI and EEG will, in theory, translate into stronger correlation values along the diagonal axis, as it represents neighboring timepoints in the data. Conversely, less cross-modal synchronicity translates into a lack of such correlation prevalence along the diagonal axis.

      Complementing these statements with empirical data, Author response image 1 shows the fMRI-to-iEEG and fMRI-to-fMRI CRPs side by side as suggested by the reviewer. For simplicity, we thresholded each CRP at the top 5% of entries and calculated their corresponding on-/off-diagonal ratios. The on/off-diagonal ratio for fMRI-to-fMRI CRP was 4.32 ± 6.26 across -5 to +5 TR lags (with a maximum of 16.56 at a lag of one TR), while this value was 1.00 ± 0.31 for fMRI-to-iEEG CRP. Thus, it becomes apparent that synchronicity of connectome dynamics directly translates to the on-/off-diagonal ratio in CRP.
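
      The thresholding step described above can be sketched as follows. The exact ratio definition used by the authors is not given in this response, so the sketch assumes one plausible reading: the density of suprathreshold entries on the lag-k diagonal divided by their density everywhere else. The toy CRPs are synthetic noise, not data.

```python
import numpy as np


def thresholded_diag_ratio(crp, lag=0, top_fraction=0.05):
    """On-/off-diagonal ratio of a CRP thresholded at its top entries.

    Entries at or above the (1 - top_fraction) quantile count as hits.
    The ratio is the hit density on the lag-k diagonal divided by the
    hit density elsewhere (inf if no hits fall off the diagonal).
    """
    crp = np.asarray(crp, dtype=float)
    hits = crp >= np.quantile(crp, 1.0 - top_fraction)
    on_mask = np.eye(crp.shape[0], crp.shape[1], k=lag, dtype=bool)
    return hits[on_mask].mean() / hits[~on_mask].mean()


# Synthetic toy CRPs: pure noise, and noise with extra similarity at
# matching time points (a stand-in for synchronous dynamics).
rng = np.random.default_rng(0)
n = 200
noise = rng.standard_normal((n, n))
synchronous = noise + 3.0 * np.eye(n)

ratio_null = thresholded_diag_ratio(noise, lag=0)
ratios_sync = {lag: thresholded_diag_ratio(synchronous, lag) for lag in range(-5, 6)}

print(f"noise only, lag 0:  {ratio_null:.2f}")
print(f"synchronous, lag 0: {ratios_sync[0]:.2f}")
```

      In this toy case the ratio is large only at the lag where the extra similarity was placed, and stays near one elsewhere.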

      Author response image 1.

      Sample CRPs from one subject comparing two cases: fMRI-to-iEEG (left) and fMRI-to-fMRI (right). The comparison shows that in the presence of genuine synchronous connectome dynamics, as expected for the within-modality case (right panel), the on-/off-diagonal ratio is expected to show noticeably higher values. This figure establishes a strong link between our proposed metric of on-/off-diagonal ratio and the extent of synchronicity of connectome dynamics.

      Author response image 2

      On-/off-diagonal ratio in the fMRI-to-fMRI recurrence plot is considerably higher than in the cross-modal fMRI-to-iEEG case. The horizontal axis shows the lag at which the metric was calculated in the CRP. The bars reflect the group-average metric, while the whiskers show the standard deviation. Note that for the within-modality case, the ratio is not defined at lag zero because of identical connectome frames.

      (2) Choices in the analysis pipeline leading up to the computation of FC in fMRI or EEG will affect the quality of information available in the FC. For example, but not only, the choice of parcellation (in the study, the number of parcels is very high given the number of EEG sensors). I think it is important that we see the impact of the chosen pipeline on the time-averaged connectomes, an output that the field has some idea about what is sensible. This would give confidence that the information being used in the main analyses in the paper is based on a sensible footing and relates to what the field is used to thinking about in terms of FC. This should be trivial to compute, as it is just a case of averaging the time-varying FCs being used for the CRP over all time points. Admittedly, this approach is less useful for the intracranial EEG.
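
      The averaging step mentioned in this comment, and a simple way to compare the resulting static connectomes, can be sketched as below. The array layout (time, region, region), the helper names, and the use of upper-triangle Pearson correlation as the similarity measure are assumptions for illustration; the random stack stands in for real time-varying FC.

```python
import numpy as np


def time_averaged_connectome(dynamic_fc):
    """Average a (time, region, region) stack of FC matrices over time."""
    return np.asarray(dynamic_fc, dtype=float).mean(axis=0)


def connectome_similarity(fc_a, fc_b):
    """Pearson correlation between upper triangles (diagonal excluded)."""
    iu = np.triu_indices_from(fc_a, k=1)
    return np.corrcoef(fc_a[iu], fc_b[iu])[0, 1]


# Synthetic stand-in: 50 symmetric 6-region FC frames.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 6, 6))
dynamic_fc = (frames + frames.transpose(0, 2, 1)) / 2.0

static_fc = time_averaged_connectome(dynamic_fc)
self_similarity = connectome_similarity(static_fc, static_fc)
```

      Restricting the comparison to the upper triangle avoids counting each symmetric connection twice and excludes the diagonal.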

      We agree with the reviewer on ensuring that the time-averaged FC aligns with expectations of the field and prior work. For this reason, our supplementary analysis already included an analysis that replicates the well-established (albeit modest) spatial similarity between fMRI static connectome and EEG/iEEG static connectomes:

      “In scalp EEG-fMRI data, cross-modal spatial (2D) Pearson correlation of group-level time-averaged connectomes between fMRI and EEG-FCAmp or fMRI and EEG-FCPhase were calculated across all frequency bands. The average spatial correlation value across frequency bands r = 0.28 and r = 0.28 for EEG-FCAmp and EEG-FCPhase, respectively. The spatial correlation values across all frequency bands and connectivity measures were significantly higher than the corresponding null distributions generated by phase-permuted group-level fMRI-FC spatial organization (p<0.005; 200 repetitions; FDR-corrected at q<0.05 for the number of frequency bands). …. Of note, the small effect sizes are strongly in line with prior literature (Hipp and Siegel, 2015; Wirsich et al., 2017; Betzel et al., 2019) and may point to possible divergence in the dynamic domain as investigated in the main manuscript.”

      This replication directly confirms the validity of our selected atlas for further investigations into the connectome dynamics. We acknowledge that with 64 EEG channels, one can only estimate a relatively coarse connectome. Among the well-known coarse atlases, we chose the Desikan-Killiany atlas as it is based on anatomical features, eliminating possible biases towards a particular functional data modality. Moreover, this atlas has been commonly used for multimodal functional connectivity studies, facilitating the confirmation of prior findings in the time-averaged domain [Deligianni et al., Front. Neurosci. 2014; Wirsich et al., NeuroImage 2020; Wirsich et al., NeuroImage 2021].

      (3) Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." Given that FC in EEG is dominated by spatial leakage (see Hipp paper), then I cannot see how it can be justified to look at non-spatial leakage correction results at all, let alone put them up front as the main results. All main results/figures for the scalp EEG should be done using spatial leakage-corrected EEG data.

      We agree that relying on leakage-uncorrected scalp EEG alone would be problematic. It is for this reason that the intracranial data form the core of our results, emphasizing that the observed multiplex architecture of connectomes is indeed present in the absence of source leakage. Only once this finding is established in the intracranial EEG do we provide the scalp EEG data as a generalization to whole-cortex coverage connectomes of healthy subjects. Moreover, it is known that existing source-leakage correction algorithms may inadvertently remove some of the genuine zero-lag connectivity. For instance, Finger and colleagues have shown that the similarity of functional connectivity to structural connectivity diminishes after correction for source leakage (Finger et al., PLOS Comp. Biol. 2016). Therefore, we have deliberately chosen to include our generalization findings before source-leakage correction (main text) as well as after source-leakage correction, reflecting a more stringent approach (supplementary analysis). Importantly, our conclusions hold true both before and after source-leakage correction.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the time points were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involve a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      Weaknesses:

      Despite the impressive work, the paper still lacks some analyses to make it complete.

      Firstly, the effect of the window size is unclear, especially in the case of different frequencies where the number of cycles that fall in a window will vary drastically. A typical oscillation lasts just a few cycles (see Myrov et al., 2024), and brain states are usually short-lived because of meta-stability (see Roberts et al., 2019).

      We now replicate our results with an additional window size. Please see section “Recommendations for the authors”.

      Secondly, the authors didn't examine frequencies lower than 1Hz despite similarities between fMRI and infra-slow oscillations found in prior literature (see Palva et al., 2014; Zhang et al., 2023).

      We address this issue below. Please see section “Recommendations for the authors”.

      On a minor note, the phase-locking value (PLV) is positively biased for EEG data (see Palva et al., 2018) and a different metric for phase coupling could be a more appropriate choice (e.g., iPLV/wPLI, see Vinck et al., 2011).

      While iPLV and wPLI are not positively biased, they may reduce genuine zero-phase connectivity as they were initially designed to address spurious zero-phase connectivity from source leakage in scalp EEG. Indeed, PLV connectivity is shown to be more strongly correlated with structural connectivity than wPLI and other phase coupling methods [Finger et al., PLOS Comp. Biol. 2016], emphasizing that it contains genuine connectivity that may be lacking when zero-phase connectivity is removed. We chose PLV because it is a widely used functional connectivity metric, particularly in intracranial data where source leakage is not a critical concern. Thus, using PLV facilitates cross-study comparisons including to our prior work [e.g. Mostame et al. NeuroImage 2020, Mostame et al. J Neurosci 2021].
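
      To make the trade-off above concrete, the following is a minimal sketch (not the authors' pipeline) of PLV and the imaginary part of PLV (iPLV) computed from two phase time series. The function names and the toy phases are illustrative assumptions; in practice the phases would come from band-limited signals, for example via a Hilbert transform.

```python
import cmath
import math


def mean_phasor(phase_a, phase_b):
    """Average unit phasor of the sample-wise phase difference."""
    phasors = [cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b)]
    return sum(phasors) / len(phasors)


def plv(phase_a, phase_b):
    """Phase-locking value: magnitude of the mean phasor (keeps zero-lag coupling)."""
    return abs(mean_phasor(phase_a, phase_b))


def iplv(phase_a, phase_b):
    """Imaginary part of the mean phasor: discards zero-lag coupling by construction."""
    return abs(mean_phasor(phase_a, phase_b).imag)


# Toy phases (hypothetical): a steadily advancing phase sampled 1000 times.
phase = [2 * math.pi * 0.01 * t for t in range(1000)]

# Perfect locking at zero lag: PLV sees it, iPLV does not.
zero_lag_plv = plv(phase, phase)
zero_lag_iplv = iplv(phase, phase)

# Perfect locking at a quarter-cycle lag: both metrics see it.
lagged = [p - math.pi / 2 for p in phase]
lag_plv = plv(phase, lagged)
lag_iplv = iplv(phase, lagged)
```

      In this toy case the zero-lag pair yields a PLV of 1 and an iPLV of 0, which illustrates why metrics built to reject zero-phase coupling can also discard genuine zero-phase connectivity.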

      The repository with the code is also unavailable.

      Thank you for bringing this to our attention. We have now made our repository publicly accessible at: https://github.com/connectlab/Mostame2024_Multiplex_iEEG_fMRI.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The window widths used to compute FC as a function of time are an important aspect, so I feel that this should be briefly described up-front in the main Results text.

      Methods. "Finally, to compensate for the time lag between hemodynamic and neural responses of the brain (Logothetis et al., 2001), we shifted the fMRI-FC time course 6 seconds backwards in time." What about the effects of temporal blurring from the HRF? Do we need to care about that?

      We agree that it is important to investigate the effect of temporal blurring by the HRF. The main text already included a replication of findings from CRPs generated using fMRI data and EEG amplitude signals convolved with the canonical HRF. This method serves as an alternative to the 6-second shifting. Both approaches produced similar results.

      Methods. In fMRI connectome computation it is common to look at partial correlation rather than full correlation. Partial correlation focuses more on direct connections. It would be good if the paper acknowledged and justified why it is OK to use full correlation.

      We have now added a brief explanation in this regard in the main text (Methods section) as follows:

      “In fMRI connectome computation, some prior work has used partial correlation instead of full correlation. Partial correlation emphasizes direct connections by calculating correlation between any pair of brain regions after regressing out the timeseries of all other regions. However, we have opted to use full correlation because this permits interpretation of our outcomes in the context of the vast existing literature that uses full correlations in fMRI including the majority of bimodal (EEG-fMRI) connectome studies (e.g. Tagliazucchi et al., 2012; Deligianni et al., 2014; Wirsich et al., 2017b, 2020, 2021; Allen et al., 2018).”
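
      As a hedged illustration of the difference between the two measures (a toy sketch, not the authors' code), consider three simulated regions in which a third region drives the other two. With only one region to regress out, the partial correlation has a closed form; the variable names and noise levels below are assumptions chosen for the example.

```python
import math
import random


def pearson(x, y):
    """Full (Pearson) correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr_given_third(x, y, z):
    """Correlation of x and y after regressing z out of both (three-variable case)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)  # fixed seed so the toy example is reproducible
n_samples = 2000

# Hypothetical regions: z drives both x and y; x and y have no direct link.
region_z = [rng.gauss(0, 1) for _ in range(n_samples)]
region_x = [z + rng.gauss(0, 0.5) for z in region_z]
region_y = [z + rng.gauss(0, 0.5) for z in region_z]

full_xy = pearson(region_x, region_y)                               # high: shared driver
partial_xy = partial_corr_given_third(region_x, region_y, region_z) # near zero
```

      Here the full correlation between x and y is high because of the shared driver, while the partial correlation is close to zero. With more than three regions, the same quantity is usually obtained from the inverse covariance matrix rather than this closed form.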

      The paper should relate the results to findings showing clear links between simultaneously recorded EEG and fMRI beyond FC. E.g. Mantini (PNAS) 2007 and Van De Ville (PNAS) 2010 to name two.

      In line with this important point, we have extended the existing discussion section that compares our outcomes to EEG-fMRI beyond functional connectivity:

      “Prior multi-modal studies of neural dynamics have predominantly aimed at methodologically cross-validating hemodynamic and electrophysiological observations, thus focusing on their convergence. These important foundational studies include e.g., the cross-modal comparison of region-wise (Mukamel et al., 2005; Nir et al., 2007) or ICN-wise (Mantini et al., 2007) activity fluctuations, instantaneous activity maps (Hunyadi et al., 2019; Zhang et al., 2020) or EEG microstates (Van de Ville 2010), infraslow connectome states (Abreu et al., 2020), or connection-wise FC including studies in the iEEG-fMRI and scalp EEG-fMRI data used in the current study (Ridley et al., 2017; and Wirsich et al., 2020, respectively). In contrast to this prior work, the current study investigated the highly time-resolved cross-modal temporal relationship at the level of FC patterns distributed over all available pairwise connections, and found a connectome-level temporal divergence. The discrepancy between temporal divergence in our study and convergence in prior studies implies that infraslow fluctuations of activity in individual regions or of FC in individual region-pairs observable in both modalities (prior studies) are neurally distinct from connectome-wide FC dynamics observable separately in each modality (current study). Indeed, we confirmed the existence of infraslow electrophysiological FC dynamics driving cross-modal temporal associations at the level of individual connections (Fig. S3) …”

      Reviewer #2 (Recommendations For The Authors):

      (1) Check different window sizes and stability of the FC patterns as a function of it.

      We thank the reviewer for the helpful feedback. We agree that the window size could possibly affect the estimation of individual connectome frames, particularly given that neural processes unfold at hundreds of milliseconds rather than seconds. However, we expect that the asynchronous nature of cross-modal convergence observed in our data would remain intact regardless of the specific window length used for FC calculations. To confirm this, we replicated some of our main analyses in the iEEG-fMRI data with a window length of 500ms (as opposed to 3s, equivalent to one TR) as follows:

      First, we showed that changing the window length does not substantially impact the overall architecture of the connectomes (Author response image 3). Particularly, the time-averaged connectome patterns across different frequency bands were all strongly correlated between the two analyses (500ms and 3s window lengths).

      Author response image 3.

      Time-averaged connectome patterns are highly replicable when calculated using 3s or 500ms window lengths. Horizontal axis represents frequency bands, while each dot represents a subject. Vertical axis shows 2D Pearson correlation of the two connectomes. The group average within each frequency band is marked by a horizontal line.

      Second, we replicated our major findings of CRP and its on-/off-diagonal ratio in the iEEG-fMRI dataset using a window length of 500ms for FC calculations. Indeed, the data does not show a substantial difference in the on-/off-diagonal ratios of the CRP entries between the 3s and 500ms window lengths. Specifically, the ratio was equal to 1.02 ± 0.07 for 500ms window length, emphasizing absence of significant temporal convergence of the connectome dynamics (see Author response image 4). A paired t-test between group-averaged ratios across different lags confirms a lack of significant difference between the two analyses (p= 0.50). This finding further emphasizes the genuine asynchronous nature of connectome dynamics across the neural timescales measured in fMRI and electrophysiology. We have added this analysis to the supplementary data.

      Author response image 4.

      On-/off-diagonal ratio is shown across lags for both analyses: 3s window length (blue) and 500ms window length (red). Each bar shows the mean across subjects, while the whiskers show the corresponding standard deviations.
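
      For readers unfamiliar with the measure, the sketch below shows one plausible way to compute an on-/off-diagonal ratio from a cross-modal recurrence plot. The exact definition used in the paper is not given in this response, so the functions, the use of Pearson correlation between connectome frames, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def on_off_diagonal_ratio(frames_a, frames_b):
    """Mean same-time frame similarity divided by mean different-time similarity.

    Assumes both modalities have the same number of frames.
    """
    n = len(frames_a)
    # Cross-modal recurrence plot: entry (i, j) compares frame i of A with frame j of B.
    crp = [[pearson(fa, fb) for fb in frames_b] for fa in frames_a]
    on_diag = sum(crp[i][i] for i in range(n)) / n
    off_diag = sum(crp[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return on_diag / off_diag


rng = random.Random(0)  # fixed seed for reproducibility
n_frames, n_edges = 100, 45

# Shared time-averaged connectome pattern (hypothetical edge weights).
static = [rng.gauss(0, 1) for _ in range(n_edges)]


def noisy_frames():
    """Frames that share the static pattern but fluctuate independently over time."""
    return [[s + rng.gauss(0, 1) for s in static] for _ in range(n_frames)]


modality_a = noisy_frames()
modality_b = noisy_frames()  # same spatial pattern, independent dynamics
ratio_independent = on_off_diagonal_ratio(modality_a, modality_b)

# A modality whose frames track modality A frame by frame (temporal convergence).
modality_c = [[v + rng.gauss(0, 0.5) for v in frame] for frame in modality_a]
ratio_aligned = on_off_diagonal_ratio(modality_a, modality_c)
```

      Under these assumptions, two modalities that share a spatial pattern but fluctuate independently give a ratio close to 1, whereas temporally aligned dynamics push the ratio well above 1.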

      (2) Try to decrease the lowest frequency of the analysis below 1Hz or just compute it for multiple log-spaced frequencies from infra-slow delta to high-gamma band.

      Thank you for pointing out this matter. We do not expect considerable signal in the frequency range below the current lower bound of delta (1Hz) because as in most other EEG recordings, EEG was not recorded in DC setting and has a hardware high-pass filter of 0.1Hz. Nonetheless, we investigated the power spectral density of our iEEG-fMRI data and found that there is indeed little signal power left in the available infraslow range [0.5 – 1 Hz] after the preprocessing steps (Author response image 5).

      Author response image 5.

      Power spectral density of all subjects in the fMRI-iEEG dataset shows lack of sufficient power in the infraslow range. Infraslow range signals are almost always filtered out during recording unless the recording setup includes a DC amplifier. The infraslow EEG signal that the literature often reports as correlated with fMRI signals is most commonly extracted from the slowly changing envelope of band-limited signals, such as the envelope of gamma oscillations.

      Accordingly, when the iEEG signals are filtered within the range of [0.5, 1], there is little signal variation observed in the signal timeseries, in contrast to the adjacent delta band signal (Author response image 6). Importantly, the power envelope of the delta band (and all other canonical bands not shown here) comprises major fluctuations in the infraslow range, as expected. We would like to emphasize that the existing studies addressing infraslow EEG signal dynamics typically consider the infraslow envelope fluctuations of band-limited signals in traditional frequency bands [e.g. Nir et al., Nat Neurosci 2008] rather than direct recordings in the infraslow frequency range. Investigating HRF-convolved EEG signals similarly captures the infraslow characteristics of the timeseries [e.g. Mantini et al., PNAS 2007; Sadaghiani et al., J Neurosci 2010] (and note that HRF-convolved analyses are included as supplementary investigation in the current study). To the best of our knowledge, very few studies have investigated direct infraslow EEG signals using DC EEG, and we are aware of only two DC-EEG studies with concurrent fMRI [Hiltunen et al., J Neurosci 2014; Grooms et al., Brain Connectivity 2017]. The infraslow correlates of fMRI in electrophysiological signals reported in prior work therefore reflect the slow changes in faster activity or connectivity of traditional frequency bands, which are indeed already included in the current study.

      Author response image 6.

      Sample timeseries of the iEEG signal of the nine subjects (nine rows) for a 400 second interval. Blue signals show the bandlimited delta with its envelope shown as darker blue. The red signal represents the infraslow signal component left in the data, which is much lower in power.

    1. eLife Assessment

      This paper addresses an important topic (normative trajectory modelling), seeking to provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model. The evidence provided for the new methods is solid.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.

      Strengths:

      The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.

      Strengths:

      The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.

      In the 1st revision, the authors added a simulation study to show how the performance of the classification based on z-diff scores relatively changes with different disruptions (and autocorrelation). Unfortunately, in my view this is insufficient as it only shows how the performance of using z-diff score relatively changes in different scenarios. I would suggest adding the comparison of performance to using the naïve difference in two simple z-scores to first show its better performance, which should also further highlight the inappropriate use of simple z-scores in inferring within-subject longitudinal changes.

      Thank you for the suggestion for additional comparison, which we have now implemented in the simulated methods comparison, see Figure 2 and the extended text of Section 2.1.4 Simulation study.

      Specifically, we have revised the simulation section to not only illustrate the performance of our z-diff method under various scenarios but also to include a direct comparison with a naïve approach that subtracts two z-scores.

      The updated results demonstrate that, compared to the naïve method, the z-diff score consistently maintains a fixed false-positive rate, making it a more robust and controllable approach. Additionally, we show that under conditions of high autocorrelation, the z-diff method is significantly more sensitive in detecting smaller changes than the subtraction method. Importantly, our analysis of a sample from our dataset indicates that high autocorrelation is a prevalent characteristic in real-world data, further supporting the utility of the z-diff method.

      We believe that these findings strengthen the case for adopting the z-diff method and underscore the limitations of more intuitive approaches, which, while simple, lack mathematical rigour.
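
      The false-positive argument can be illustrated with a simplified calculation. This is not the authors' z-diff score, which is defined in the manuscript; it only assumes that, with no true change, the two visit z-scores are standard normal with correlation rho, so their difference has variance 2(1 - rho). The function names and the 1.96 threshold are assumptions for the example.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_positive_rate(rho, scale, threshold=1.96):
    """Two-sided false-positive rate of (z2 - z1) / scale tested against +/-threshold.

    Under the null, z2 - z1 has standard deviation sqrt(2 * (1 - rho)). Requires rho < 1.
    """
    sd_of_statistic = math.sqrt(2.0 * (1.0 - rho)) / scale
    return 2.0 * (1.0 - normal_cdf(threshold / sd_of_statistic))


def naive_fpr(rho):
    """Raw difference of z-scores treated as if it were standard normal."""
    return false_positive_rate(rho, scale=1.0)


def rescaled_fpr(rho):
    """Difference divided by its true null standard deviation."""
    return false_positive_rate(rho, scale=math.sqrt(2.0 * (1.0 - rho)))


# False-positive rates at a few illustrative autocorrelation values.
naive_rates = {rho: naive_fpr(rho) for rho in (0.0, 0.5, 0.9)}
rescaled_rates = {rho: rescaled_fpr(rho) for rho in (0.0, 0.5, 0.9)}
```

      In this simplified setting the naive difference is too liberal at low autocorrelation and far too conservative at high autocorrelation, while the rescaled statistic stays at the nominal 5% level. This is consistent with the loss of sensitivity described above for the subtraction method when autocorrelation is high.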

      Additionally, Figure 1 is hard to read and obtain the actual values of the performance measure. I would suggest reducing it to several 2-dimensional figures. For example, for several fixed values of rho, how the performance changes with different values of the true disruption (and also adding the comparison to the naïve method (difference in two z-scores)).

      We believe that the Reviewer meant Figure 2; indeed, the 3-dimensional visualization, while attractive to some, may have been difficult to read, so we have now replaced it with several 2-dimensional figures as requested.

      I would also suggest changing the title to reflect that the evaluation of "intra-subject" longitudinal change is the method's focus.

      Thanks for the suggestion. We have now implemented it by changing the title to Using normative models pre-trained on cross-sectional data to evaluate intra-individual longitudinal changes in neuroimaging data.

      We hope the changes implemented fulfill the expectations of the Reviewer.

    1. eLife Assessment

      Yonk and colleagues provide a valuable, timely, and in-depth study showcasing the role of thalamostriatal inputs in learning and action selection. After characterizing the synaptic properties of these inputs onto different striatal cell types in vitro, they provide solid evidence that posterior medial thalamic nucleus (POm) terminals in striatum are activated during reward expectation and arousal. The overall function of this pathway and the degree to which results are confounded by viral contamination of surrounding nuclei and movements remain open questions.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims at understanding the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow to conclude if more neurones are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main targets of the striatum from thalamic input are the cholinergic neurons that weren't investigated here, is there information that could be provided?

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neurons or if new neurons are progressively recruited in the process.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figure 1 and 2 are very clean and consistent with POm injection.

      Comment after review: The weaknesses remain as concerns have not been addressed. The dataset is interesting but the interpretation, due partly to the lack of control (especially relative to VPM contamination), is difficult.

    3. Reviewer #2 (Public review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      Comments on revisions:

      The revision has a lot of thoughtful discussion added. I think overall the paper is more thorough and will also be a nice set up for a number of future research questions.

    4. Reviewer #3 (Public review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analysis of these data also appears inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      Comments on revisions:

      The authors have only tweaked the Discussion and not necessarily in ways that addressed our previous comments. They could have fairly easily analyzed the effect of distance of recording from injection site and compared subsets of data depending on contamination beyond PO (my comments 1 and 2) or effects of movements (3 and 4). Minimally, they could have given caveats in the Results and Discussion about these, and I would strongly encourage them to be explicit about the caveats. The analyses would probably be better.

      The suggestion that the effects have something to do with priming (5), seems a grasp for function of the circuit.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work aims to understand the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task, they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow us to conclude if more neurons are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      We thank the reviewer for these constructive comments. The points are addressed below.  

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main targets of the striatum from thalamic input are the cholinergic neurons that weren't investigated here, is there information that could be provided?

      This is true of the parafascicular (Pf) thalamic nucleus, which has been well studied in this context. However, there is much less known about the striatal projections of other thalamic nuclei, including POm, and their inputs to cholinergic neurons. Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here would be an important area of further study.

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      Tracing of single POm axons by Ohno et al. (2012) indicated that POm axons form a branched collateral that innervates striatum, while the main axon continues in the rostral-dorsal direction to innervate cortex. We think it is reasonable, based on the morphology, that our optogenetic suppression experiment restricted the suppression of glutamate release to this branch and avoided the other branches of the axon that project to cortex. However, testing this would require monitoring S1 activity during the POm-striatal axon suppression, which we did not do in this study.

      It is a very interesting question whether there could be different axon activity behavior in striatum versus S1. There is surprising evidence that POm synaptic terminals are different sizes in S1 and M1 and show different synaptic physiological properties depending on these cortical projection targets (Casas-Torremocha et al., 2022). Based on this, it is possible that POm-striatal synapses show distinct properties compared to cortex; however, this will need to be tested in future work.

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      This is a good point. It would be necessary to perform chronic two-photon imaging of POm neurons (or chronic electrophysiological recordings) to determine whether the activity of individual neurons increased versus whether individual neuron activity levels remained similar but new neurons became active with learning. Even under baseline conditions, it is not known in detail what fraction of the population of POm neurons is active during sensory processing or behavior, highlighting how much is still to be discovered in this exciting area of neuroscience.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Although this image was selected to demonstrate the position of the POm injection site and optical fiber implant above striatal axons, the reviewer is correct that there appears to be mixed labeling of axons in L4 and L5a. In some cases, there was expression slightly outside the border of POm (see Fig. 1B, right), which might explain the cortical innervation pattern in this figure. While cortically bound VPM axons pass through the striatum, they do not form synaptic terminals until reaching the cortex (Hunnicutt et al., 2016). If, as may be the case, inhibitory opsins suppress release of neurotransmitter at synaptic terminals more effectively than action potential propagation in axons, it may be likely that optogenetic suppression of POm-striatal terminals is more effective than suppression of action potentials in off-target-labelled VPM axons of passage. Ideally, we could compare effects of suppression of POm-striatal synapses with POm-cortical synapses and VPM-cortical synapses, but this was outside the bandwidth of the present study.

      Reviewer #2 (Public Review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in the striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to the striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      We are very interested in understanding the potentially distinct learning-related synaptic and circuit changes that occur at the POm synapses with D1- and D2-SPNs, PV interneurons, and other striatal cell types. We agree that this would be an important topic for further investigation.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      We appreciate this point and agree that higher N can be important for observing robust effects. A factor of our experiments that helped reduce the number of animals used was the longitudinal design, with repeated measures in the same subjects. This allowed for the internal control of comparing learning effects in the same subject from naïve to expert stages and therefore increased robustness. Even with relatively small group sizes, results were statistically significant, suggesting that the use of more mice was unnecessary, which we considered consistent with best practice in the use of animals in research. We also note that our group sizes were consistent with other studies in the field.  

      Reviewer #3 (Public Review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in-vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn, PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does not appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analyses of these data also appear inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by the thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact, the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      We thank the reviewer for these constructive comments. The points are addressed below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Do POm neurons innervate CINs also? The connection between the PF thalamus and CINs is mentioned in a couple of places - one question is how unique are the input patterns for the POm versus adjacent sensorimotor thalamic regions, including the PF? This isn't a weakness per se but knowing the answer to that question would help in forming a more complete picture of how these different thalamostriatal circuits do or do not contribute uniquely to learning and action selection.

      Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing.

      Another difference between Pf and other thalamic nuclei (likely including POm) comes from anatomical tracing evidence (Smith et al., 2014; PMID: 24523677) which indicates that Pf inputs form the majority of their synapses onto dendritic shafts of SPNs, while other thalamic nuclei form synapses onto dendritic spines. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here, including ChAT neurons and subcellular localization, would be an important area of further study.

      It would be useful to know to what extent these POm-striatum neurons are activated generally during movement, versus this discrimination task specifically.

      We agree that distinguishing general movement-related activity from task-specific activity would be very useful. Earlier work (Petty et al., 2021) showed a close relationship between POm neuron activity, spontaneous (task-free) whisker movements, and pupil-indexed arousal in head-restrained mice. Oram et al. (2024; PMID: 39003286) recently recorded VPM and POm in freely moving mice during natural movements, finding that activity of both nuclei correlated with head and whisker movements. These studies indicate that POm is generally coactive with exploratory head and whisker movements.

      During task performance, the situation may change with training and attentional effects. For example, Petty and Bruno (2024) (https://elifesciences.org/reviewed-preprints/97188) showed that POm activity correlates more closely with task demands than tactile or visual stimulus modality. Our data indicate that POm axonal signals are increased at trial start during anticipation of tactile stimulus delivery and through the sensory discrimination period, then decrease to baseline levels during licking and water reward collection (Fig. 3). Results of Petty and Bruno (2024) together with ours suggest that POm is particularly active during the context of behaviorally relevant task performance. Thus, we think it is likely that, while pupil dilation indexes general movement and arousal, POm activity is more specific to movement and arousal associated with task engagement and behavioral performance. We have strengthened this point in the Discussion.

      Many of the data panels and text for legends/axes are quite small, and the stroke on line art is quite faint - overall figures could be improved from a readability standpoint.

      We thank the reviewer for their careful attention to the figures. 

      Reviewer #3 (Recommendations For The Authors):

      Major

      (1) Page 4, the Results regarding PSP and distance from injection site. The r-squared is the wrong thing to look at to test for a relationship. One should look at the p-value on the coefficient corresponding to the slope. The p-value is probably significant given the figures, in which case there may be a relationship contrary to what is stated. All the low r-squared value says is that, if there is a relationship, it does not explain a lot of the PSP variability.

      We thank the reviewer for alerting us to this oversight. We have included the p value (p = 0.0293) in the figure and legend, and indicated that the relationship is “small but significant”.
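
      The distinction raised in point (1), that a low r-squared and a significant slope can coexist, can be illustrated with a minimal sketch. The data below are synthetic and purely hypothetical (they are not the PSP measurements from the paper), and the p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for moderately large samples.

```python
import math

def slope_test(x, y):
    """Ordinary least-squares fit of y on x.

    Returns the slope, r-squared, the t statistic for the slope,
    and an approximate two-sided p-value (normal approximation).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    residual_ss = syy - slope * sxy           # sum of squared residuals
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / slope_se
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, r_squared, t_stat, p_approx

# Hypothetical data: a weak linear trend buried in large alternating scatter.
x = list(range(100))
y = [0.02 * xi + (2.0 if xi % 2 == 0 else -2.0) for xi in x]

slope, r_squared, t_stat, p_approx = slope_test(x, y)
print(f"slope={slope:.4f}  r^2={r_squared:.3f}  t={t_stat:.2f}  p~{p_approx:.4f}")
```

      In this toy case the slope is reliably different from zero even though it explains only a small fraction of the variance, which is the situation the phrase “small but significant” describes.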

      (2) Figure 1B suggests that the virus injections extend beyond POm and into other thalamic structures. Do any of the results change if the injections contaminating other nuclei are excluded from the analysis? I am not suggesting the authors change the figures/analyses. I am simply suggesting they double-check.

      We selected for injections that were predominantly expressing in POm as determined by post-hoc histological analysis (see Fig. 1, right). As above, we think that axons of passage that do not form striatal synapses are less likely to be suppressed than axons with terminals; however, this would need to be determined in further experiments. Because the preponderance of expression is within POm, we think the results would be similar even with a stricter selection criterion. 

      (3) The authors conclude that POm and licking are not correlated (bottom of page 6 pertaining to Figures 3A-F). The danger of these analyses is that they assume that GCaMP8 is a perfect linear reporter of POm spikes. The reliability of GCaMP8 has been quantified in some cell types, but not thalamic neurons, which have relatively higher firing rates.

      The reviewer is correct that the relationship between GCaMP8 fluorescence changes and spiking has not been sufficiently characterized in thalamic neurons, and that this would be important to do.

      What if the indicator is simply saturated late into the trial (after the average reaction time)? It would look like there is no response and one would conclude no correlation, but there could be a very strong correlation.

      While saturation is worthy of concern, the signal dynamics here argue against this possibility. The reason is that the signal increased in the early part of the trial and decreased by the end. If saturation was an issue, this would have been apparent during the initial increase. When the signal decreased in amplitude at the end of the trial, this indicates that the signal is not saturated because it is returning from a point closer to its maximum (and is becoming less saturated).

      Also, what happens between trials? Are the correlations the same, stronger, weaker? Ideally, the authors would analyze the data during and between trials.

      Between trials the signal did not show further changes in baseline beyond what was displayed at the start and end of behavioral trials. There were no consistent increases or decreases in signals between trials, except perhaps during strong whisking bouts. This is anecdotal because we did not analyze between-trial data. However, it is interesting and important to note that signals increased dramatically in amplitude from naïve, early learning to expert behavioral performance (Fig. 3), highlighting that POm-axonal signals relate to behavioral engagement and performance rather than spontaneous behaviors.  

      (4) Axonal activity could also appear more correlated with the pupil than licking because pupil dynamics are slow like the dynamics of calcium indicators. These kernels could artificially inflate the correlation. Ideally, the authors could consider these temporal effects. Perhaps they could deconvolve the temporal profiles of calcium and pupil before correlating? Or equivalently incorporate the profiles into their analysis?

      We analyzed the lick probability histograms, which had a temporal profile similar to the calcium signals (Fig. 3D,E), ruling out concerns that the slow temporal profiles alone drive the correlations. It is also worth noting that we observed changes in correlations between calcium signals and pupil with learning stage (Fig. 3I), even though the temporal profiles (signal dynamics) are not changing. Thus, temporal effects of the signals themselves are not the driver of correlations, but rather the changes in relative timing between calcium signals and pupil, as occur with learning.
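
      The general concern in point (4), that slow kernels can inflate correlations between signals whose underlying events do not coincide, can be sketched with a toy example. Everything here is hypothetical: the event times, the exponential kernel, and its time constant are illustrative assumptions and are not derived from the photometry or pupil data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def convolve_causal(signal, kernel):
    """Causal convolution: each sample is a weighted sum of past samples."""
    return [
        sum(kernel[k] * signal[t - k] for k in range(min(t + 1, len(kernel))))
        for t in range(len(signal))
    ]

n_samples = 200
events_a = [0.0] * n_samples
events_b = [0.0] * n_samples
for onset in range(10, n_samples, 40):
    events_a[onset] = 1.0        # hypothetical events in signal A
    events_b[onset + 5] = 1.0    # signal B events lag by 5 samples, never coincide

# Hypothetical slow kernel shared by both signals (decay constant of 20 samples).
slow_kernel = [math.exp(-k / 20) for k in range(100)]

raw_r = pearson(events_a, events_b)
slow_r = pearson(convolve_causal(events_a, slow_kernel),
                 convolve_causal(events_b, slow_kernel))
print(f"raw events r={raw_r:.3f}, after slow kernel r={slow_r:.3f}")
```

      The raw event trains are slightly anticorrelated at zero lag, while the smoothed versions are positively correlated. This only shows why kernel width is worth considering; it does not by itself say whether such an effect is present in any particular dataset.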

      (5) The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing. The data here support the first part. It is not clear that the data support the second part, largely because it is vague what "priming" of sensory processing or "a key role in the initial stages of action selection" (p.9) even means here. Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? Some conceptual proposals from the authors, even if speculative and not offered as a conclusion, would be helpful.

      We appreciate these good points and have added further consideration and revision of the concept of priming and potential roles in an extensively revised Discussion section.

      (6) The photometry shows that PO turns on about 2 seconds before the texture presentation. PO's activity seems locked to the auditory cue, not the texture (Figure 2). This means that the attempt to suppress the thalamostriatal pathway with JAWS (Figure 4) is rather late, isn't it? Some PO signals surely go through. This seems to contradict the idea of priming above. It would be good if the authors could factor this into their narrative. Perhaps labelling the time of the auditory cue in Figure 4C would also be helpful.

      The start of texture presentation (movement of the texture panel toward the mouse) and auditory cue occur at the same time. To clarify this, we added a label “start tone” in Figure 4C and also in Figure 2C.

      For optogenetic (JAWS) suppression, we intentionally chose a time window between start tone onset and texture presentation, because our photometry experiments showed that this was when the preponderance of the signal occurred. However, the reviewer is correct that our chosen optogenetic suppression (JAWS) onset occurs shortly after the photometry signal has already started, potentially leaving the early photometry signal un-suppressed. Our motivation for choosing a restricted time window surrounding the texture presentation time was 1) to minimize illumination and potential heating of brain tissue; 2) to target a time window that avoids the auditory cue but covers stimulus presentation. We did not want to extend the duration of the suppression to before the trial started, because this could produce task-non-specific effects, such as distraction or loss of attention before the start of the trial.

      Even if some signal were getting through before suppression, we don’t think this contradicts the possibility of ‘priming’, because the process underlying priming would still be disrupted even if not totally suppressed. This would alter the temporal relationship between POm-striatal inputs and further corticostriatal inputs (from S1 and M1 cortex, for example). We have included further consideration of these points and possible relation to the priming concept in the Discussion.

      Minor

      (1) Page 5, "the sensitivity metric is artificially increased". What do you mean "artificially"? The mice are discriminating better. It is true that either a change in HR or FAR can cause the sensitivity metric to change, but there is nothing artificial or misleading about this.

      We removed the word artificial and clarified our definition of behaviorally Expert in this context:

      “Mice were considered Expert once they had reached ≥ 0.80 Hit Rate and ≤ 0.30 FA Rate for two consecutive sessions in lieu of a strict sensitivity (d’) threshold; we found this definition more intuitive because d’ is enhanced as Hit Rate and FA Rate approach their extremes (0 or 1)”
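
      The criterion quoted above can be made concrete with a short sketch. It assumes the standard signal-detection definition d' = z(Hit Rate) − z(FA Rate); the function names and the example session values are hypothetical and only illustrate why d' grows steeply as the rates approach 0 or 1.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def is_expert(sessions, min_hit=0.80, max_fa=0.30, consecutive=2):
    """True once `consecutive` sessions in a row meet both rate thresholds.

    `sessions` is a list of (hit_rate, fa_rate) pairs in chronological order.
    """
    streak = 0
    for hit_rate, fa_rate in sessions:
        if hit_rate >= min_hit and fa_rate <= max_fa:
            streak += 1
        else:
            streak = 0
        if streak >= consecutive:
            return True
    return False

# d' at the Expert thresholds versus closer to the extremes.
for hit_rate, fa_rate in [(0.80, 0.30), (0.90, 0.10), (0.99, 0.01)]:
    print(hit_rate, fa_rate, round(d_prime(hit_rate, fa_rate), 2))

# Hypothetical learning curve: the criterion is met on the last two sessions.
example_sessions = [(0.85, 0.25), (0.70, 0.20), (0.82, 0.28), (0.90, 0.10)]
print(is_expert(example_sessions))
```

      A rate-based criterion of this kind is insensitive to the steep growth of d' near the extremes, which is the motivation given in the quoted Methods text.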

      (2) Page 7, "Upon segmentation (Figure S4G-J)". Do you mean "segregation by trial outcome"?

      Corrected.

      (3) Page 9, "POm projections may have discrete target-specific functions, such that POm-striatal inputs may play a distinct role in sensorimotor behavior compared to POm-cortical inputs". Would POm-cortical inputs not also be sensorimotor? The somatosensory cortex contains a lot of corticostriatal cells. It also has various direct and indirect links to the motor cortex as well.

      We have clarified the wording here to convey the possibility that POm signals could be received and processed differently by striatal versus cortical circuitry, and have moved this statement to later in the discussion for better elaboration.

      (4) The Methods state that male and female mice were used. Why not say how many of each and whether or not there are any sex-specific differences?

      We added the following information to the Methods:

      The number of male and female mice were as follows, by experiment type: 6 male, 4 female (electrophysiology); 3 male, 2 female (fiber photometry); 4 male, 5 female (optogenetics). Data were not analyzed for sex differences.

    1. eLife Assessment

      Somatostatin-expressing neurons of the entopeduncular nucleus (EPNSst+) provide a limbic output of the basal ganglia and co-release GABA and glutamate in their projection to the lateral habenula, a structure that is key for reward-based learning. Combining fiber photometry and computational modeling, the authors provide compelling evidence that EPNSst+ neural activity represents movement, choice direction and reward outcomes in a probabilistic switching task but, surprisingly, neither chronic genetic silencing of these neurons nor selective elimination of glutamate release affected behavioral performance in well-trained animals. This valuable study shows that despite their representation of key task variables, EPNSst+ neurons are dispensable for ongoing performance in a task requiring outcome monitoring to optimize reward. This work will be of interest to those interested in neural circuits, learning, and/or decision making.

    2. Reviewer #1 (Public review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated that EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb), and that LHb activity is required for the outcome evaluation needed to perform probabilistic decision-making tasks. Previous slice physiology work has shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modelling to demonstrate that EPNSst+ neural activity represents movement, choice direction and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPN Sst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This review appreciates the thorough explanation of the behavioral modelling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care increases accessibility for non-experts and bolsters confidence in the results. To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (e.g. Fig 1 F-J, Supp 1 F,K, etc.), for instance via inclusion of faint hash lines connecting individual data points across variables. Additionally, Fig 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out one or both of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms. Finally, the authors claimed calcium activity increased following ipsilateral movements. However, figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point? Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected. Alternative means of glutamate packaging (other VGluT isoforms, other transporters etc) could also compensate for the partial absence of VGluT2, which should be discussed. The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern, as the authors acknowledge, is the timing of these manipulations. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although these limitations are transparently discussed by the authors.

      Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Comments on the latest version:

      The authors have included more thorough analyses to address several concerns related to interpreting activity patterns of EPNSst+ neurons. The authors clearly point out that calcium activity increased during ipsilateral movements, and the increase was statistically larger during the choice phase (Figure 2 supplement 1F-G), indicating that these neurons may represent movement and additional factors (e.g. executive decision-making). Correspondingly, we appreciate the thorough explanation of using a GLM to determine which behavioural variables contribute to observed physiological signals and the addition of the example reconstructed signal with direction and reward variables omitted in Figure 3 supplements 1 and 2.

      Although no new manipulation experiment is added to the manuscript, the authors respond to common critiques related to testing the behavioural effect after the manipulations in well-trained mice. The discussion related to technical limitations, possible compensatory mechanisms and alternative interpretations is thorough and overall satisfying. Based on the behaviour modeling results, the authors speculate that animals need to integrate more evidence from the past to guide choice in a more uncertain environment (70/30 version), instead of adopting a 'win-stay, lose-shift' strategy in the more deterministic 90/10 version. The authors expand the discussion, but the possibility that EPNSst+ neurons contribute to task performance in well-trained animals under uncertainty is not directly tested. Along with other alternative explanations discussed in the manuscript, we think the paper is a valuable contribution to the literature for future studies aiming to understand the basal ganglia circuits in learning and decision-making.

    3. Reviewer #2 (Public review):

      Summary:

      This paper aimed to determine the role EP sst+ neurons play in a probabilistic switching task.

      Strengths:

      - The in vivo recording of EP sst+ neuron activity in the task is one of the strongest parts of this paper. Previous work had recorded from the EP-LHb population in rodents and primates in head-fixed configurations; recording this population in a freely moving context is a valuable addition to these studies and has highlighted more clearly that these neurons respond both at the time of choice and outcome.

      - The use of a refined intersectional technique to record specifically the EP sst+ neurons is also an important strength of the paper. This is because previous work has shown that there are two genetically different types of glutamatergic EP neurons that project to the LHb. Previous work had not distinguished between these types in their recordings so the current results showing that the bidirectional value signaling is present in the EP sst+ population is valuable.

      Weaknesses:

      - One of the main weaknesses of the paper is to do with how the effect of the EP sst+ neurons on the behavior was assessed.

      o All the manipulations (blocking synaptic release and blocking glutamatergic transmission) are chronic and more importantly the mice are given weeks of training after the manipulation before the behavioral effect is assessed. This means that as the authors point out in their discussion the mice will have time to adjust to the behavioral manipulation and compensate for the manipulations. The results do show that mice can adapt to these chronic manipulations and that the EP sst+ are not required to perform the task. What is unclear is whether the mice have compensated for the loss of EP sst+ neurons and whether they play a role in the task under normal conditions. Acute manipulations or chronic manipulations without additional training would be needed to assess this.

      o Another weakness is that the effect of the manipulations was assessed in the 90/10 contingency version of the task. Under these contingencies, mice integrate past outcomes over fewer trials to determine their choice and animals act closer to a simple win-stay-lose switch strategy. Due to this it is unclear if the EP sst+ neurons would play a role in the task when they must integrate over a larger number of conditions in the less deterministic 70/30 version of the task. Indeed it is not clear that lesioning any other regions involved in evaluation of action outcomes such as VTA dopamine neurons, that encode reward prediction errors, would have any deficit when assessed in this way. Due to this, it's not clear if the mice have adapted to solve the task without evaluating action outcomes at all and are just acting in a more deterministic lose switch manner that would not presumably involve any of the circuitry in evaluating action outcomes.

      - The authors conclude that they do not see any evidence for bidirectional prediction errors. It is not possible to conclude this. First, they see a large response in the EP sst+ neurons to the omission of an expected reward. This is what would be expected of a negative reward prediction error. Much more specific, well-controlled tests for this are commonplace in head-fixed and freely moving paradigms and could be used to probe it. The authors do look at the effect of previous trials on the response and do not see strong consistent results, but this is not a strong formal test of what would be expected of a prediction error, either positive or negative. The other way they assess this is by looking at the size of the responses in different recording sessions with different reward contingencies. They claim that the size of the reward expectation and prediction error should scale with the different reward probabilities. If all the reward probabilities were present in the same session, this should be true, as many others have shown for RPE. However, because these data were taken from different sessions, the responses are not expected to scale: reward prediction errors have been shown to adaptively scale to cover the range of values on offer (Tobler et al., Science 2005). A better test of positive prediction error would be to give a larger-than-expected reward on a subset of trials. Either way, there is already evidence in their data that responses reflect a negative prediction error, and more specific tests would be needed to formally rule in or out prediction error coding, especially as previous primate and rodent recordings have shown it is present.

      - There are a lot of variables in the GLM that occur extremely close in time such as the entry and exit of a port. If two variables occur closely in time and are always correlated it will be difficult if not impossible for a regression model to assign weights accurately to each event. This is not a large issue, but it is misleading to have regression kernels for port entry and exits unless the authors can show these are separable due to behavioral jitter or a lack of correlation under specific conditions, which does not seem to be the case.

    4. Reviewer #3 (Public review):

      Summary:

      The authors find that Sst-EPN neurons, which project to the lateral habenula, encode information about response directionality (left vs right) and outcome (rewarded vs unrewarded). Surprisingly, chronic impairment of vesicular signaling in these neurons onto their LHb targets did not impair probabilistic choice behavior.

      Strengths:

      Strengths of the current work include extremely detailed and thorough analysis of data at all levels, not only of the physiological data, but also an uncommonly thorough analysis of behavioral response patterns.

      Weaknesses:

      In this revised manuscript, the authors have addressed my earlier critiques.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated that EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb); LHb activity is also necessary for the outcome evaluation required for performance in probabilistic decision-making tasks. Previous slice physiology work has shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modeling to demonstrate that EPNSst+ neural activity represents movement, choice direction, and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPNSst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This reviewer appreciates the thorough explanation of the behavioral modeling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care increases accessibility for non-experts and bolsters confidence in the results.

      (1) To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (i.e. Figures 1 F-J, Supplementary Figures 1 F, K, etc.), e.g. via the inclusion of faint hash lines connecting individual data points across variables.

      Thank you for the suggestion. The same mouse is now represented in Fig 1 and Fig 1—Figure Supplement 1 as a darkened circle so it can be followed across different panels. Photometry from this mouse was used as sample data in Figure 2b and Figure 2—figure supplement 1a-b.

      (2) Additionally, Figure 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out of both or one of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms.

      We have now added analyses and reconstructed photometry signals from GLMs excluding important predictors in Figure 3—figure supplement 1 and 2. We use the model where both “Direction and reward” were omitted as predictors for the GLM and showed photometry reconstructions aligned to behavioral events used for the full model (Figure 3—figure supplement 1) and partial model (Figure 3—figure supplement 2) to compare model performance.  

      (3) Finally, the authors claimed calcium activity increased following ipsilateral movements. However, Figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point?

      We observe that calcium activity increases during ipsilateral choices as the animal moves toward the ipsilateral side port (e.g. CX<sub>ipsi</sub> to SE<sub>ipsi</sub>; Fig 2C and Fig 3C). The animal also makes other ipsiversive movements not during the “choice” phase of a trial such as when it is returning to the center port following a contralateral choice (e.g. SX<sub>Contra</sub> to CE; Fig 2—figure supplement 1F and Fig 3C). We also observe an increase in calcium activity during these ipsiversive movements (e.g. SX<sub>Contra</sub> to CE), but they are not as large as those observed during the choice phase (Fig 2—figure supplement 1G). Therefore, during the choice phase of a trial, activity contains signals related to ipsilateral movement and additional factors (e.g. executive decision making).    

      (4) Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      Unfortunately, neither licks nor locomotion were recorded during the behavioral sessions when photometry was recorded. In Figure 2—figure supplement 1a we now show individual trials sorted by trial duration (time elapsed between CE and SE) to illustrate the dynamics of the photometry signal on fast vs slow trials within a session.  

      (5) There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on the co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected.

      For the Tettx experiments in Figure 4 we estimate that approximately 70±15% of EP<sup>Sst+</sup> neurons expressed Tettx based on our histological counts and published stereological counts in EP (Miyamoto and Fukuda, 2015). It is true that channelrhodopsin expression will not overlap 100% with cells infected by the other virus. Indeed, our in vitro synaptic physiology shows small residual postsynaptic currents following optogenetic stimulation, arising either from incomplete blockade of synaptic release or from neurons that expressed channelrhodopsin but not Tettx (Figure 4—figure supplement 1J-K). The same is shown for CRISPR-mediated deletion of Slc17a6 (Fig 5 – Fig supplement 1J-K).

      (6) Alternative means of glutamate packaging (other VGluT isoforms, other transporters, etc) could also compensate for the partial absence of VGluT2, which should be discussed.

      While single cell sequencing (Wallace et al, 2017) has shown that EP<sup>Sst+</sup> neurons do not express Slc17a7/8 (vGlut1 or vGlut3), it is possible that these genes could be upregulated following CRISPR-mediated deletion of Slc17a6. However, we do not see evidence of this in our in vitro synaptic physiology (EPSCs are significantly suppressed, Figure 5 – Fig supplement 1J-K) and therefore conclude it is highly unlikely to occur to a significant degree in our experiments. This is now included in the Discussion.

      (7) The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern is the timing of these manipulations, as the authors acknowledge. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well-learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although the limitation is transparently discussed by the authors.

      As mentioned above, it is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.

      (8) Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization, it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      In a previous study we confirmed that EP<sup>Sst+</sup> neurons project exclusively to the LHb using cell-type specific rabies infection and examining all reported downstream regions for axon collaterals (Wallace et al 2017, Suppl. Fig 6F-G). When EP<sup>Sst+</sup> neurons were labeled we did not observe axon collaterals in known targets of EP such as ventro-antero lateral thalamus, red nucleus, parafascicular nucleus of the thalamus, or the pedunculopontine tegmental nucleus, only in the LHb. Additionally, using single cell tracing techniques, others have shown EP neurons that exclusively project to the LHb (Parent et al, 2001).

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to determine the role EP sst+ neurons play in a probabilistic switching task.

      Strengths:

      The in vivo recording of EP sst+ neuron activity in the task is one of the strongest parts of this paper. Previous work had recorded from the EP-LHb population in rodents and primates in head-fixed configurations; recording this population in a freely moving context is a valuable addition to these studies and has highlighted more clearly that these neurons respond both at the time of choice and outcome.

      The use of a refined intersectional technique to record specifically the EP sst+ neurons is also an important strength of the paper. This is because previous work has shown that there are two genetically different types of glutamatergic EP neurons that project to the LHb. Previous work had not distinguished between these types in their recordings so the current results showing that the bidirectional value signaling is present in the EP sst+ population is valuable.

      Weaknesses:

      (1) One of the main weaknesses of the paper is to do with how the effect of the EP sst+ neurons on the behavior was assessed.

      (a) All the manipulations (blocking synaptic release and blocking glutamatergic transmission) are chronic and more importantly the mice are given weeks of training after the manipulation before the behavioral effect is assessed. This means that as the authors point out in their discussion the mice will have time to adjust to the behavioral manipulation and compensate for the manipulations. The results do show that mice can adapt to these chronic manipulations and that the EP sst+ are not required to perform the task. What is unclear is whether the mice have compensated for the loss of EP sst+ neurons and whether they play a role in the task under normal conditions. Acute manipulations or chronic manipulations without additional training would be needed to assess this.

      Unfortunately, when mice are given a three-week break from behavioral training (the time required to allow for adequate viral expression), behavioral performance on the task (p(highport), p(switch), trial number, trial time, etc.) is significantly degraded. Animals do eventually recover to previous performance levels, but this takes place during a 4-5 day “relearning” period. Here we sought to examine whether EP<sup>Sst+</sup> neurons are required for continued task performance. We chose to continue training the animals following viral injection to avoid the “relearning” period that occurs after an extended break from behavioral training, which may have made it difficult to attribute changes in behavioral performance to the viral manipulation rather than to relearning.

      Acute manipulations were not used because we planned to compare complete synaptic ablation (Tettx) and single neurotransmitter ablation (CRISPR Slc17a6) over similar time courses and we know of no acute manipulation that could achieve single neurotransmitter ablation. 

      (b) Another weakness is that the effect of the manipulations was assessed in the 90/10 contingency version of the task. Under these contingencies, mice integrate past outcomes over fewer trials to determine their choice and animals act closer to a simple win-stay-lose switch strategy. Due to this, it is unclear if the EP sst+ neurons would play a role in the task when they must integrate over a larger number of conditions in the less deterministic 70/30 version of the task.

      It is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.

      The authors show an intriguing result that the EP sst+ neurons are excited when mice make an ipsilateral movement in the task either toward or away from the center port. This is referred to as a choice response, but it could be a movement response or related to the predicted value of a specific action. Recordings while mice perform movement outside the task or well-controlled value manipulations within the session would be needed to really refine what these responses are related to.

      If activity of EP<sup>Sst+</sup> neurons included a predicted value component, we would expect to see a change in activity during ipsilateral movements when the previous trial was rewarded vs unrewarded. This is examined in Fig 2—figure suppl. 2C, where we compare EP<sup>Sst+</sup> responses during ipsilateral trials when the previous trials were either rewarded (blue) or unrewarded (gray). We show that EP<sup>Sst+</sup> activity prior to side port entry (SE) is identical in these two trial types indicating that EP<sup>Sst+</sup> neurons do not show evidence of predicted value of an action in this context. Therefore, we conclude that increased EP<sup>Sst+</sup> activity during ipsilateral trials is primarily related to ipsilateral movement following CX (we call this the “choice” phase of the trial). We also show that other ipsiversive movements outside of the “choice” phase of a trial (such as the return to center port following a contralateral trial) show a smaller but significant increase in activity (Figure 2—figure supplement 1F-G). Therefore, whereas the activity observed during ipsilateral choice contains signals related to ipsilateral movement and additional factors, our data suggest that predicted value is not one of those factors. We will clarify this point and our definition of “choice” in the narrative.  

      (2) The authors conclude that they do not see any evidence for bidirectional prediction errors. It is not possible to conclude this. First, they see a large response in the EP sst+ neurons to the omission of an expected reward. This is what would be expected of a negative reward prediction error. Much more specific, well-controlled tests for this are commonplace in head-fixed and freely moving paradigms and could be used to probe it. The authors do look at the effect of previous trials on the response and do not see strong consistent results, but this is not a strong formal test of what would be expected of a prediction error, either positive or negative. The other way they assess this is by looking at the size of the responses in different recording sessions with different reward contingencies. They claim that the size of the reward expectation and prediction error should scale with the different reward probabilities. If all the reward probabilities were present in the same session, this should be true, as many others have shown for RPE. However, because these data were taken from different sessions, the responses are not expected to scale: reward prediction errors have been shown to adaptively scale to cover the range of values on offer (Tobler et al., Science 2005). A better test of positive prediction error would be to give a larger-than-expected reward on a subset of trials. Either way, there is already evidence in their data that responses reflect a negative prediction error, and more specific tests would be needed to formally rule in or out prediction error coding, especially as previous primate and rodent recordings have shown it is present.

      We do not conclude that we see no evidence for RPE, and the reviewer is correct in stating that a large increase in EP<sup>Sst+</sup> activity following omission of an expected reward would be expected of a negative reward prediction error. However, this observation alone is not strong enough evidence that EP<sup>Sst+</sup> neurons signal RPE. When we looked for additional evidence of RPE within our experiments we did not find consistent demonstrations of its existence in our data. When performing photometry measurements of dopamine release in the striatum, RPE signals are readily observed with a task identical to ours using trial history as a modifier of reward prediction (Chantranupong, et al 2023). Of course, there could be a weaker, more heterogeneous RPE signal in EP<sup>Sst+</sup> neurons that we cannot detect with our methods. As we state in the discussion, RPE signals may be present in a subset of individual neurons (as observed in Stephenson-Jones et al, 2016 and Hong and Hikosaka, 2008) which are below our detection threshold using fiber photometry. Additionally, Hong and Hikosaka, 2008 show that LHb-projecting GPi neurons show both positive and negative reward modulations which may obscure observation of RPE signals with photometry recordings that arise from population activity of genetically defined neurons.

      (3) There are a lot of variables in the GLM that occur extremely close in time such as the entry and exit of a port. If two variables occur closely in time and are always correlated it will be difficult if not impossible for a regression model to assign weights accurately to each event. This is not a large issue, but it is misleading to have regression kernels for port entry and exits unless the authors can show these are separable due to behavioral jitter or a lack of correlation under specific conditions, which does not seem to be the case.

      It is true that two variables that are always correlated are redundant in a GLM. For example, center entry (CE) and center exit (CX) occur in quick succession in most trials and are highly correlated (Figure 1C). For this reason, when only one is removed as a predictor from the model but not the other, there is a very small change in the MSE of the fit (Figure 3E, -CE or -CX). However, when both are removed, model performance decreases further, indicating that center-port nose-pokes do contribute to model performance (Figure 3E, -CE/CX). Because reward is present or absent following side port entry, there is substantial behavioral jitter (due to water consumption in rewarded trials) such that SE and SX are not always correlated; the model therefore performs worse when either is omitted alone, and worse still when both SE/SX are omitted together (Figure 3E, -SE/SX). We will update Figure 3 and the narrative to make this more explicit.
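
      The argument about correlated predictors can be illustrated with a toy simulation. The sketch below is not the authors' GLM: it uses continuous Gaussian regressors rather than event kernels, ordinary least squares, in-sample MSE, and arbitrary jitter and noise values, all of which are assumptions made only for illustration. It shows that dropping one of two tightly coupled regressors (labeled CE and CX here) barely changes the fit, dropping both degrades it sharply, and dropping one of two loosely coupled regressors (labeled SE and SX) already degrades it noticeably.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # number of simulated samples (arbitrary)

def correlated_pair(jitter_sd):
    """Return two regressors; the second equals the first plus Gaussian jitter."""
    first = rng.normal(size=n)
    second = first + rng.normal(scale=jitter_sd, size=n)
    return first, second

# Hypothetical regressors: CE/CX tightly coupled, SE/SX loosely coupled.
ce, cx = correlated_pair(jitter_sd=0.1)
se, sx = correlated_pair(jitter_sd=1.0)
X = {"CE": ce, "CX": cx, "SE": se, "SX": sx}

# Simulated signal: every regressor contributes equally, plus noise.
y = ce + cx + se + sx + rng.normal(scale=0.5, size=n)

def fit_mse(names):
    """In-sample MSE of an ordinary least-squares fit using the named regressors."""
    design = np.column_stack([np.ones(n)] + [X[k] for k in names])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((y - design @ beta) ** 2))

# Each label lists the regressors left out of the fit.
ablations = {
    "None": [],
    "-CE": ["CE"], "-CX": ["CX"], "-CE/CX": ["CE", "CX"],
    "-SE": ["SE"], "-SX": ["SX"], "-SE/SX": ["SE", "SX"],
}
mse = {label: fit_mse([k for k in X if k not in dropped])
       for label, dropped in ablations.items()}

for label, value in mse.items():
    print(f"{label:>7}: MSE = {value:.3f}")
```

      In this toy setting the size of the single-omission penalty tracks the jitter between the paired regressors, which is the separability property the reviewer asked about.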

      Reviewer #3 (Public Review):

      Summary:

      The authors find that Sst-EPN neurons, which project to the lateral habenula, encode information about response directionality (left vs right) and outcome (rewarded vs unrewarded). Surprisingly, impairment of vesicular signaling in these neurons onto their LHb targets did not impair probabilistic choice behavior.

      Strengths:

      Strengths of the current work include extremely detailed and thorough analysis of data at all levels, not only of the physiological data but also an uncommonly thorough analysis of behavioral response patterns.

      Weaknesses:

      Overall, I saw very few weaknesses, with only two issues, both of which should be possible to address without new experiments:

      (1) The authors note that the neural response difference between rewarded and unrewarded trials is not an RPE, as it is not affected by reward probability. However, the authors also show the neural difference is partly driven by the rapid motoric withdrawal from the port. Since there is also a response component that remains different apart from this motoric difference (Figure 2, Supplementary Figure 1E), it seems this is what needs to be analyzed with respect to reward probability, to truly determine whether there is no RPE component. Was this done?

      We thank the reviewer for this comment. We believe this is particularly important for unrewarded trials, as SE and SX occur in rapid succession. In Figure 2—figure supplement 2A-B we now show the photometry signal from Rewarded and Unrewarded ipsilateral trials aligned to SX for different reward probabilities. We quantify the signals for different reward probabilities during a 500 ms window immediately prior to SX but find no differences between groups.
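
      A window-based quantification of this kind could be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the 20 Hz sampling rate, and the toy ramp trace are all hypothetical, and only the 500 ms window ending at SX comes from the response above.

```python
import numpy as np

def pre_event_mean(trace, event_index, fs_hz, window_s=0.5):
    """Mean of `trace` over the `window_s` seconds ending just before `event_index`."""
    n_samples = int(round(window_s * fs_hz))
    start = event_index - n_samples
    if n_samples <= 0 or start < 0:
        raise ValueError("window does not fit before the event")
    # The slice excludes the event sample itself.
    return float(np.mean(trace[start:event_index]))

# Toy usage: a ramp sampled at an assumed 20 Hz, with a side exit (SX) at sample 50.
trace = np.arange(100, dtype=float)
sx_mean = pre_event_mean(trace, event_index=50, fs_hz=20.0)
print(sx_mean)
```

      Applying the same function to each trial and grouping the results by reward probability would give the per-group values that are then compared statistically.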

      (2) The current study reaches very different conclusions than a 2016 study by Stephenson-Jones and colleagues despite using a similar behavioral task to study the same Sst-EPN-LHb circuit. This is potentially very interesting, and the new findings likely shed important light on how this circuit really works. Hence, I would have liked to hear more of the authors' thoughts about possible explanations of the differences. I acknowledge that a full answer might not be possible, but in-depth elaboration would help the reader put the current findings in the context of the earlier work, and give a better sense of what work still needs to be done in the future to fully understand this circuit.

      For example, the authors suggest that the Sst-EPN-LHb circuit might be involved in initial learning, but play less of a role in well-trained animals, thereby explaining the lack of observed behavioral effect. However, it is my understanding that the probabilistic switching task forces animals to continually update learned contingencies, rendering this explanation somewhat less persuasive, at least not without further elaboration (e.g. maybe the authors think it plays a role before the animals learn to switch?).

      Also, as I understand it, the 2016 study used manipulations that likely impaired phasic activity patterns, e.g. precisely timed optogenetic activation/inhibition, and/or deletion of GABA/glutamate receptors. In contrast, the current study's manipulations - blockade of vesicle release using tetanus toxin or deletion of VGlut2, would likely have blocked both phasic and tonic activity patterns. Do the authors think this factor, or any others they are aware of, could be relevant?

      We have added further discussion of the Stephenson-Jones, et al 2016 study as well as the Lazaridis, et al 2019 study which shows no effect of phasic stimulation of EP when specifically manipulating EP<sup>Sst+</sup> (vGat+/vGlut2+) neurons rather than vGlut2+ neurons as in the Stephenson-Jones study.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In some places, there seems to be a mismatch between referenced figures and texts. For example:

      (1) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (2) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 - Figure Supplement 1d). I presume that they mean Figure 2 - Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Minor suggestions related to specific figure presentation are below:

      Figure 2 and supplement figures:

      (1) Figure 2B: the authors may consider presenting outcome-related signals recorded from all trials, including both ipsilateral and contralateral events, and align signals to SE when reward consumption presumably begins, rather than aligning to CE.

      We have added sample recordings from ipsilateral and contralateral trials and sorted them by trial duration to allow for clearer presentation of activity following CE and SE (Figure 2—figure supplement 1a-b).

      (2) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (3) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 -Figure Supplement 1d). I presume that they mean Figure 2 -Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Figure 3 and supplement figures:

      (1) Figure 3C-F: it is hard to compare the amplitude of calcium signals between different behaviour events without a uniform y-axis.

      The scale for the y-axis on Figure 3C-D is uniform for all panels. Figure 3E is also uniform for all boxplots. The reviewer may be referring to Figure 2C-F, but the y-axis for all of the photometry data is uniform for all panels and the horizontal line represents zero. The y-axis for the quantification on the right of each panel is scaled to the max/min for each comparison.

      (2) Figure 3E is difficult to follow. The authors explained that the 'SE' variable is generated by collapsing the ipsilateral and contralateral port entries, and hence the variable has no choice of direction information. I assumed that the 'SX', 'CE', and 'CX' variables are generated similarly. It is not clear if this is the case for the 'side', 'centre' and 'choice' variables. The authors explained that 'omitting center port entry/exit together or individually also resulted in decreased GLM performance but to a smaller degree than the omission of choice direction (Figure 3e, "-Center")'. My understanding is that they created the Centre variable by collapsing ipsilateral and contralateral centre port entry/exit together. The Centre variable should have no choice of direction information. How is the Center variable generated differently from omitting centre port entry/exit together? I would ask the authors to explain the model and different variables a bit more thoroughly in the text.

      We apologize for the confusion. All ten variables used to train the full GLM are listed in Fig. 3C. In Figure 3E variable(s) were omitted to test how they contributed to GLM performance (data labeled “None” is the full model with all variables). Omitted variables are now defined as follows: -Rew = Rew+Unrew removed, -Direction = Ipsi/Contra designation removed and collapsed into CE, CX, SE, SX, -Direction & Rew = Ipsi/Contra info removed from all variables + Rew/Unrew removed, -CE/CX = Ipsi/Contra CE and CX removed, -CE = Ipsi/contra CE removed, -CX = Ipsi/contra CX removed, -SE/SX = Ipsi/Contra SE and SX removed, -SE = Ipsi/contra SE removed, -SX = Ipsi/contra SX removed. This clarification has also been added to the Generalized Linear Model section of Materials and Methods.
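
      As a rough illustration of the omission scheme described above, the reduced variable sets can be written out explicitly. This is only a sketch and not the authors' code: it assumes that the ten variables of the full model are the four port events (CE, CX, SE, SX), each split by Ipsi/Contra, plus Rew and Unrew, which is inferred from the definitions in the response. The names `FULL`, `REDUCED` and the helper functions are hypothetical.

```python
# Sketch of the variable sets compared in Figure 3E (assumed layout, see lead-in).
EVENTS = ("CE", "CX", "SE", "SX")
SIDES = ("Ipsi", "Contra")
OUTCOMES = ("Rew", "Unrew")

# Assumed full model: 4 events x 2 directions + 2 outcomes = 10 variables.
FULL = [f"{side}_{event}" for event in EVENTS for side in SIDES] + list(OUTCOMES)


def drop_outcomes(variables):
    """Remove the Rew and Unrew variables."""
    return [v for v in variables if v not in OUTCOMES]


def drop_events(variables, events):
    """Remove both the Ipsi and Contra versions of the given port events."""
    return [v for v in variables if v.split("_")[-1] not in events]


def collapse_direction(variables):
    """Strip the Ipsi/Contra designation, merging each event pair into one variable."""
    collapsed = []
    for v in variables:
        name = v.split("_")[-1]
        if name not in collapsed:
            collapsed.append(name)
    return collapsed


# Labels follow the response text; "None" is the full model.
REDUCED = {
    "None": FULL,
    "-Rew": drop_outcomes(FULL),
    "-Direction": collapse_direction(FULL),
    "-Direction & Rew": collapse_direction(drop_outcomes(FULL)),
    "-CE/CX": drop_events(FULL, {"CE", "CX"}),
    "-CE": drop_events(FULL, {"CE"}),
    "-CX": drop_events(FULL, {"CX"}),
    "-SE/SX": drop_events(FULL, {"SE", "SX"}),
    "-SE": drop_events(FULL, {"SE"}),
    "-SX": drop_events(FULL, {"SX"}),
}
```

      Each entry of `REDUCED` lists the regressors that remain in the corresponding reduced model; fitting and scoring the models is outside the scope of this sketch.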

      Figure 5 and supplement figures:

      There are no representative or summary figures showing the specificity and efficiency of oChief-tdTomato or Tetx-GFP expression. Body weight changes following virus injection are not well described.

      A representative image of Tettx GFP expression is shown in Fig. 4A and the percentage of infected EP<sup>Sst+</sup> neurons is described in the text (70±15.1% (mean±SD), 1070±230 neurons/animal, n=6 mice). Most oChief-tdTom animals were used for post-hoc electrophysiology experiments and careful quantification of viral expression was not possible. However, Slc17a6 deletion was confirmed in these animals (Fig. 5 – Fig supplement 1J-K), indicating that the manipulation was effective in the experimental group. A representative image of oChief-tdTom expression is shown in Fig. 5A.

      We now mention the body weight changes observed following Tettx injection in the narrative.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the RFLR section you state that "this variable decays..."; a variable can't decay, only the value of a variable can change. Also, it is not mentioned which variable is being discussed. There are lots of variables in the model, so this should be made clear.

      We now state, “This variable (β) changes over trials and is updated with new evidence from each new trial’s choice and outcome with an additional bias towards or away from its most recent choice (Figure 1-figure supplement 2A-C).”

      (2) I couldn't find the details for the Tet tx experiments in either the results section or the methods section. Were mice trained and tested on 90/10 only? Were they trained while the virus was expressing, etc.? This should be added.

      In the methods section we state, “For experiments where we manipulated synaptic release in EP<sup>Sst+</sup> neurons (Figures 4-5) we trained mice (reward probabilities 90/10, no transparent barrier present) to the following criteria for the 5 days prior to virus injection: 1) p(highport) per session was greater than or equal to 0.80 with a variance less than 0.003, 2) p(switch) per session was less than or equal to 0.15 with a variance less than 0.001, 3) the p(left port) was between 0.45-0.55 with a variance less than 0.005, and 4) the animal performed at least 200 trials in a session. The mean and variance for these measurements were calculated across the five sessions immediately preceding surgery. The criteria were determined by comparing performance profiles in separate animals and chosen based on when animals first showed stable and plateaued behavioral performance. Following surgery, mice were allowed to recover for 3 days and then continued to train for 3 weeks during viral expression. Data collected during the 5 day pre-surgery period was then compared to data collected for 10 sessions following the 3 weeks allotted for viral expression (i.e. days 22-31 post-surgery).”
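
      The four criteria quoted above amount to a simple check over the five pre-surgery sessions. The following is a minimal sketch under stated assumptions, not the authors' implementation: the thresholds are applied to the across-session mean, the variance is the sample variance (the text does not say whether sample or population variance was used), and the function name and dictionary keys are hypothetical.

```python
from statistics import mean, variance


def meets_training_criteria(sessions):
    """Check the quoted criteria over the five sessions preceding surgery.

    `sessions` is a list of five dicts with the (hypothetical) keys
    'p_highport', 'p_switch', 'p_left' and 'n_trials'.
    """
    if len(sessions) != 5:
        raise ValueError("expected exactly five pre-surgery sessions")

    def summary(key):
        values = [s[key] for s in sessions]
        # Assumption: sample variance across the five sessions.
        return mean(values), variance(values)

    high_mean, high_var = summary("p_highport")
    switch_mean, switch_var = summary("p_switch")
    left_mean, left_var = summary("p_left")

    return (
        high_mean >= 0.80 and high_var < 0.003          # criterion 1
        and switch_mean <= 0.15 and switch_var < 0.001  # criterion 2
        and 0.45 <= left_mean <= 0.55 and left_var < 0.005  # criterion 3
        and all(s["n_trials"] >= 200 for s in sessions)     # criterion 4
    )
```

      A mouse whose five sessions all lie comfortably inside these bounds passes the check, whereas a single session with fewer than 200 trials is enough to fail it.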

      Reviewer #3 (Recommendations For The Authors):

      (1) The kernel in Figure 3C shows an activation prior to CE on "contra" trials that is not apparent in Figure 2C which shows no activation prior to CE on either contra or ipsi trials. Given that movement directionality prior to CE is dictated by the choice on the PREVIOUS trial, is the "contra" condition in 3C actually based on the previous trial? If so, this should be clarified.

      On most “contra” trials the animal is making an ipsiversive movement just prior to CE as it returns to the center from the contralateral side-port (as most trials are no “switch” trials). Therefore, an increase in activity is expected and shown most clearly following SX for contralateral trials in Fig 2 –Fig suppl 1F. A significant increase in activity prior to CE on contra trials compared to ipsi trials can also be seen in Fig 2C; it's just not as large a change as the increase observed following CE for ipsi trials. The comparison between activity observed during the two types of ipsiversive movements is now shown directly in Figure 2—figure supplement 1G.

      (2) Paragraph 7 of the discussion uses a phrase "by-in-large", which probably should be "by and large".

      Thank you for the correction.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      Sex breakdown has been added to figure legends for each experiment, and full statistical reporting is now also included in the figure legends.

    1. eLife Assessment

      Cav2 voltage-gated calcium channels play key roles in regulating synaptic strength and plasticity. In contrast to mammals, invertebrates like Drosophila encode a single Cav2 channel, raising questions on how diversity in Cav2 is achieved from a single gene. Here, the authors present solid evidence that two alternatively spliced Cac isoforms enable important changes in Cav2 expression, localization, and function in synaptic transmission and plasticity at the Drosophila neuromuscular junction. How the isoforms affect synaptic calcium channel levels remains less clear. This study provides insights into the roles of voltage-gated calcium channel splice isoforms in synaptic transmission.

    2. Reviewer #2 (Public review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones. Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function generated through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There remain some questions regarding the mechanisms underlying the changes in Cac localization to somatodendritic compartments. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

    3. Reviewer #3 (Public review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      Comments on revisions:

      The authors addressed most points. However, from my point of view, the new data (somatodendritic cac currents in adult motoneurons of IS4B mutants without the pre-pulse, and localization of IS4A channels in the larval brain) do not strongly support that the IS4B exon is required for cacophony localization. According to their definition of localization, IS4B is required for cacophony channels to enter motoneuron boutons and to localize to active zones. In case of a true localization defect (without degradation, as they claim), IS4A channels should mislocalize to the soma, axon, or dendrite. However, they do not find them in motoneurons of IS4B mutants. Furthermore, I find the interpretation of the voltage clamp data in flight motoneurons rather difficult. On the one hand, sustained HVA cac currents are strongly attenuated/absent in IS4B mutants. On the other hand, total cac currents (without the -50 mV pre-pulse, not shown in the original submission) are less affected in IS4B mutants. Based on these data, they conclude that IS4B is required for sustained HVA cac currents and that IS4A channel isoforms are expressed and functional. How does this relate to a localization defect at the NMJ? Finally, detecting IS4A channels in other cell types and regions is not a strong argument for a localization defect at the NMJ. I, therefore, suggest toning down the conclusions regarding a localization defect in IS4B mutants/a role for the IS4B exon in cac localization. It should be also discussed how a splice isoform in S4 may result in no detectable cac channels at the NMJ or regulate subcellular channel localization.

      I have a few additional points, mainly related to the responses to my previous points:

      (1) The authors state "active zones at the NMJ contain only cac isoforms with the IS4B exon". Nevertheless, the small representative EPSC remaining in IS4B mutants suggests that there is synchronous release in the absence of IS4B (Fig. 3B). Are the small EPSCs in dIS4B (Fig. 3B) indeed different from noise/indicative of evoked release? If yes, which cac channels may drive these EPSCs? IS4A channels?

      (2) (Related to previous point 4) The authors argue that EPSC amplitudes are not statistically different between Canton S and IS4A mutants (Fig. 2F). However, the Canton S group appears undersampled, thus precluding conclusions based on statistics. Moreover, the effect size Canton S vs. dIS4A looks similar to the one of Canton S vs. dIS4A/dIS4B.

      (3) (Related to previous point 11): Can they cite a paper relating calcium channel inactivation to EPSC half width/decay kinetics to support their speculation that "decreased EPSC half width could be caused by significantly faster channel inactivation kinetics" (p. 42, l.42)? In addition, many papers have demonstrated that mini decay kinetics provide valuable insights into GluR subunit composition at the Drosophila NMJ (e.g., Schmid et al., 2008 https://doi.org/10.1038/nn.2122). Thus, the statement "Mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells" is not valid and should be revised.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Bell et al. describes an analysis of the effects of removing one of two mutually exclusive splice exons at two distinct sites in the Drosophila CaV2 calcium channel Cacophony (Cac). The authors perform imaging and electrophysiology, along with some behavioral analysis of larval locomotion, to determine whether these alternatively spliced variants have the potential to diversify Cac function in presynaptic output at larval neuromuscular junctions. The authors provide valuable insights into how alternative splicing at two sites in the calcium channel alters its function.

      Strengths:

      The authors find that both of the second alternatively spliced exons (I-IIA and I-IIB) that are found in the intracellular loop between the first and second set of transmembrane domains can support Cac function. However, loss of the I-IIB isoform (predicted to alter potential beta subunit interactions) results in 50% fewer channels at active zones and a decrease in neurotransmitter release and the ability to support presynaptic homeostatic potentiation. Overall, the study provides new insights into Cac diversity at two alternatively spliced sites within the protein, adding to our understanding of how presynaptic calcium channel function can be regulated by splicing.

      Weaknesses:

      The authors find that one splice isoform (IS4B) in the first S4 voltage sensor is essential for the protein's function in promoting neurotransmitter release, while the other isoform (IS4A) is dispensable. The authors conclude that IS4B is required to localize Cac channels to active zones. However, I find it more likely that IS4B is required for channel stability and that its absence leads to the protein being degraded, rather than any effect on active zone localization. More analysis would be required to establish that as the mechanism for the unique requirement for IS4B.

      (1) We thank the reviewer for this important point. In fact, all three reviewers raised the same question, and the reviewing editor pointed out that caution or additional experiments were required to distinguish between IS4 splicing being important for cac channel localization versus channel stability/degradation. We provide multiple sets of experiments as well as text and figure revisions to strengthen our claim that the IS4B exon is required for cacophony channels to enter motoneuron presynaptic boutons and localize to active zones.

      a. If IS4B were indeed required for cac channel stability (and not for localization to active zones), IS4A channels should be unstable wherever they are. This is not the case because we have recorded somatodendritic cacophony currents from IS4A expressing adult motoneurons that were devoid of cac channels with the IS4B exon. Therefore, IS4A cac channels are not unstable but underlie somatodendritic voltage dependent calcium currents in these motoneurons. These new data are now shown in the revised figure 3C and referred to in the text on page 7, line 42 to page 8 line 9.

      b. Similarly, if IS4B were required for channel stability, IS4A channels should not be present anywhere in the nervous system. We tested this by immunohistochemistry for GFP tagged IS4A channels in the larval CNS. Although IS4A channels are sparsely expressed, which is consistent with low expression levels seen in the Western blots (Fig. 1E), there are always defined and reproducible patterns of IS4A label in the larval brain lobes as well as in the anterior part of the VNC. This again shows that the absence of IS4A from presynaptic active zones is not caused by channel instability, because the channel is expressed in other parts of the nervous system. These data are shown in the new supplementary figure 1 and referred to in the text on page 15, lines 3 to 8.

      c. As suggested in a similar context by reviewers 1 and 2, we now show enlargements of the presence of IS4B channels in presynaptic active zones as well as enlargements of the absence of IS4A channels in presynaptic active zones in the revised figures 2A-C and 3A. In these images, no IS4A label is detectable in active zones or anywhere else throughout the axon terminals, thus indicating that IS4B is required for expressing cac channels in the axon terminal boutons and localizing them to active zones. Text and figure legends have been adjusted accordingly.

      d. Related to this, reviewer 1 also recommended quantifying the IS4A and IS4B channel intensity and co-localization with the active zone marker brp (recommendation for authors). After following the reviewers’ suggestion to adjust the background values in IS4A and IS4B immunolabels to identical levels (revised Figs. 2A-C), it becomes obvious that IS4A channels are not detectable above background in presynaptic terminals or active zones, thus intensity is close to zero. We still calculated the Pearson’s co-localization coefficient for both IS4 variants with the active zone marker brp. For IS4B channels the Pearson’s correlation coefficient is control-like, just above 0.6, whereas for IS4A channels we do not find colocalization with brp (Pearson’s below 0.25). These new analyses are now shown in the revised figure 2D and referred to on page 6, lines 33 to 38.
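
      For readers unfamiliar with the measure, a Pearson's coefficient between two image channels can be computed pixel by pixel as sketched below. This is a generic illustration under stated assumptions, not the authors' analysis pipeline: the response does not specify the region of interest, thresholding or software used, and the function names and the nested-list image representation are hypothetical.

```python
import math


def flatten(image):
    """Turn a 2D image (list of rows) into a flat list of float intensities."""
    return [float(pixel) for row in image for pixel in row]


def pearson_overlap(channel_a, channel_b):
    """Pearson's correlation coefficient between two equally sized channels."""
    a = flatten(channel_a)
    b = flatten(channel_b)
    if not a or len(a) != len(b):
        raise ValueError("channels must be non-empty and of equal size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)

    # A channel with no intensity variation has an undefined coefficient.
    if spread_a == 0 or spread_b == 0:
        return math.nan
    return covariance / math.sqrt(spread_a * spread_b)
```

      Values near 1 mean that the two labels vary together across pixels, and values near 0 mean that they do not. The coefficient is sensitive to background, which is why the identical background adjustment described above matters for comparing genotypes.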

      e. Consistent with our finding that IS4B is required for cac channel localization to presynaptic active zones, upon removal of IS4B we find no evoked synaptic transmission (Fig. 2 in initial submission, now Fig. 3B).

      Together these data are in line with a unique requirement of IS4B at presynaptic active zones (not excluding additional functions of IS4B), whereas IS4A containing cac isoforms are not found in presynaptic active zones and mediate different functions.

      Reviewer #2 (Public Review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones.

      Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function generated through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There are some outstanding questions regarding the mechanisms underlying the changes in Cac localization and function, and some additional suggestions are listed below for the authors to consider in strengthening this study. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

      (2) We believe that the additional data on cac IS4A isoform localization and function as detailed above (response to public review 1) has strengthened the manuscript and answered some of the remaining questions the reviewer refers to. We are also grateful for the specific additional reviewer suggestions which we have addressed point-by-point and refer to below (section recommendations for authors).

      Reviewer #3 (Public Review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      (1) Based on the data shown in Figures 2A-C, and 2H, it is difficult to judge the localization of the cac isoforms. Could they analyze cac localization with regard to Brp localization (similar to Figure 3; the term "co-localization" should be avoided for confocal data), as well as cac and Brp fluorescence intensity in the different genotypes for the experiments shown in Figure 2 and 3 (Brp intensity appears lower in the dI-IIA example shown in Figure 3G)? Furthermore, heterozygous dIS4B imaging data (Figure 2C) should be quantified and compared to heterozygous cacsfGFP/+.

      According to the reviewer’s suggestion, we have quantified cac localization relative to brp localization by computing the Pearson’s correlation coefficient for controls and IS4A as well as IS4B animals. These new data are shown in the revised Fig. 2D and referred to on page 6, lines 33-38. Furthermore, we now confirm control-like Pearson’s correlation coefficients for all exon out variants except ΔIS4B and show Pearson’s correlation coefficients for all genotypes side-by-side in the revised Fig. 4D (legend has been adjusted accordingly). In addition, in response to the recommendations to authors, we now provide selective enlargements for the co-labeling of Brp and each exon out variant in the revised figures 2-4. We have also adjusted the background in Fig. 2C (ΔIS4B) to match that in Figs. 2A and B (control and ΔIS4A). This allows a fair comparison of cac intensities following excision of IS4B versus excision of IS4A and control (see also Fig 3). Together, this demonstrates the absence of IS4A label in presynaptic active zones much more clearly. As suggested, we have also quantified brp puncta intensity on m6/7 across homozygous exon excision mutants and found no differences (this is now stated for IS4A/IS4B in the results text on page 6, lines 37/38 and for I-IIA/I-IIB on page 8, lines 42-44). We did not quantify the intensity of cacophony puncta upon excision of IS4B because the label revealed no significant difference from background (which can be seen much better in the images now), but the brp intensities remained control-like even upon excision of IS4B.

      (2) They conclude that I-II splicing is not required for cac localization (p. 13). However, cac channel number is reduced in dI-IIB. Could the channels be mis-localized (e.g., in the soma/axon)? What is their definition of localization? Could cac be also mis-localized in dIS4B? Furthermore, the Western Blots indicate a prominent decrease in cac levels in dIS4B/+ and dI-IIB (Figure 1D). How do the decreased protein levels seen in both genotypes fit to a "localization" defect? Could decreased cac expression levels explain the phenotypes alone?

      We have now precisely defined what we mean by cac localization, namely the selective label of cac channels in presynaptic active zones that are defined as brp puncta, but no cac label elsewhere in the presynaptic bouton (page 6, lines 18 to 20). On the level of CLSM microscopy this corresponds to overlapping cac puncta and brp puncta, but no cac label elsewhere in the bouton. Based on the additional analysis and data sets outlined in our response 1 (see above) we conclude that excision of IS4B does not cause channel mislocalization because we find reproducible expression patterns elsewhere in the nervous system as well as somatodendritic cac current in ΔIS4B (for detail see above). Therefore, the isoforms containing the mutually exclusive IS4A exon are expressed and mediate other functions, but cannot substitute IS4B containing isoforms at the presynaptic AZ. In fact, our Western blots are in line with reduced cac expression if all isoforms that mediate evoked release are missing, again indicating that the presynapse specific cac isoforms cannot be replaced by other cac isoforms. This is also in line with the sparse expression of IS4A throughout the CNS as seen in the new supplementary figure 1 (for detail see above).

      (3) Cac-IS4B is required for Cav2 expression, active zone localization, and synaptic transmission. Similarly, loss of cac-I-IIB reduces calcium channel expression and number. Hence, the major phenotype of the tested splice isoforms is the loss of/a reduction in Cav2 channel number. What is the physiological role of these isoforms? Is the idea that channel numbers can be regulated by splicing? Is there any data from other systems relating channel number regulation to splicing (vs. transcription or post-transcriptional regulation)?

      Our data are not consistent with the idea that splicing regulates channel numbers. Rather, splicing can be used to generate channels with specific properties that match the demand at the site of expression. For the IS4 exon pair we find differences in activation voltage between IS4A and IS4B channels (revised Fig. 3C), with IS4B being required for sustained HVA current. IS4A does not localize to presynaptic active zones at the NMJ and is only sparsely expressed elsewhere in the NS (new supplementary Fig. 1). By contrast, IS4B is abundantly expressed in many neuropils. Therefore, taking out IS4B takes out the more abundant IS4 isoform. This is consistent with different expression levels for IS4 isoforms that have different functions, but we do not find evidence for splicing regulating expression levels per se.

      Similarly, the I-II mutually exclusive exon pair differs markedly in the presence or absence of G-protein βγ binding sites that play a role in acute channel regulation as well as in the conservation of the sequence for β-subunit binding (see page 5, lines 9-17). Channel number reduction in active zones occurs specifically if expression of the cac channels with the G<sub>βγ</sub>-binding site as well as the more conserved β-subunit binding is prohibited by excision of the I-IIB exon (see Fig. 5F). Vice versa, excision of I-IIA does not result in reduced channel numbers. This scenario is consistent with the hypothesis that conserved β-subunit binding affects channel number in the active zone (see page 17, lines 3 to 6 and lines 33-36), but we have no evidence that I-II splicing per se affects channel number.

      (4) Although not supported by statistics, and as appreciated by the authors (p. 14), there is a slight increase in PSC amplitude in dIS4A mutants (Figure 2). Similarly, PSC amplitudes appear slightly larger (Figure 3J), and cac fluorescence intensity is slightly higher (Figure 3H) in dI-IIA mutants. Furthermore, cac intensity and PSC amplitude distributions appear larger in dI-IIA mutants (Figures 3H, J), suggesting a correlation between cac levels and release. Can they exclude that IS4A and/or I-IIA negatively regulate release? I suggest increasing the sample size for Canton S to assess whether dIS4A mutant PSCs differ from controls (Figure 2E). Experiments at lower extracellular calcium may help reveal potential increases in PSC amplitude in the two genotypes (but are not required). A potential increase in PSC amplitude in either isoform would be very interesting because it would suggest that cac splicing could negatively regulate release.

      There are several possibilities to explain this, but as none of the effects is statistically significant, we prefer not to investigate this in further depth. However, given that we cannot find IS4A in presynaptic active zones (revised figures 2C and 3A plus the new enlargements 2Ci and 3Ai, revised text page 6, lines 22 to 24 and 29 to 31, and page 7, second paragraph, same as public response 1D), IS4A channels cannot have a direct negative effect on release probability. Nonetheless, given that IS4A containing cac isoforms mediate functions in other neuronal compartments (see revised Fig. 3C), they may regulate release indirectly by affecting e.g. action potential shape. Moreover, in response to the more detailed suggestions to authors we provide new data that give additional insight.

      (5) They provide compelling evidence that IS4A is required for the amplitude of somatic sustained HVA calcium currents. However, the evidence for effects on biophysical properties and activation voltage (p. 13) is less convincing. Is the phenotype confined to the sustained phase, or are other aspects of the current also affected (Figure 2J)? Could they also show the quantification of further parameters, such as CaV2 peak current density, charge density, as well as inactivation kinetics for the two genotypes? I also suggest plotting peak-normalized HVA current density and conductance (G/Gmax) as a function of Vm. Could a decrease in current density due to decreased channel expression be the only phenotype? How would changes in the sustained phase translate into altered synaptic transmission in response to AP stimulation?

      Most importantly, sustained HVA current is abolished upon excision of IS4B (not IS4A, we think the reviewer accidentally mixed up the genotype) and presynaptic active zones at the NMJ contain only cac isoforms with the IS4B exon. This indicates that the cac isoforms that mediate evoked release encode HVA channels. The somatodendritic currents shown in the revised figure 3C (previously 2J) that remain upon excision of IS4B are mediated by IS4A-containing cac isoforms. Please note that these never localize to the presynaptic active zone, and thus do not contribute to evoked release. Therefore, the interpretation is that specifically sustained HVA current encoded by IS4B cac isoforms is required for synaptic transmission. Reduced cac current density due to decreased channel expression is not the cause of impaired evoked release upon IS4B excision; instead, the cause is the absence of any cac channels in active zones. IS4B-containing cac isoforms encode sustained HVA current, and we speculate that this might be a well-suited current to minimize cacophony channel inactivation in the presynaptic active zone. Given that HVA current shows fast voltage-dependent activation and fast inactivation upon repolarization, it is useful at large intraburst firing frequencies as observed during crawling (Kadas et al., 2017) without excessive cac inactivation (see page 15, lines 16 to 20).

      However, we agree with the reviewer that a deeper electrophysiological analysis of splice isoform specific cac currents will be instructive. We have now added traces of control and ΔIS4B from a holding potential of -90 mV (revised Fig. 3C, bottom traces and revised text on page 7, line 43 to page 8, lines 1 to 10), and these are also consistent with IS4B mediating sustained HVA cac current. However, further analysis of activation and inactivation voltages and kinetics suffers from space clamp issues in recordings from the somata of such complex neurons (DLM motoneurons of the adult fly contain roughly 6000 µm of dendrites with over 4000 branches, Ryglewski et al., 2017, Neuron 93(3):632-645). Therefore, we will analyze the currents in a heterologous expression system and present these data to the scientific community as a separate study at a later time point.

      (6) Why was the STED data analysis confined to the same optical section, and not to max. intensity z-projections? How many and which optical sections were considered for each active zone? What were the criteria for choosing the optical sections? Was synapse orientation considered for the nearest neighbor Cac - Brp cluster distance analysis? How do the nearest-neighbor distances compare between "planar" and "side-view" Brp puncta?

      Maximum intensity z-projections would be imprecise because they can artificially suggest close proximity of label that is close by in x and y but far away in z. Therefore, the analysis was executed in xy-direction of various planes of entire 3D image stacks. We considered active zones of different orientations (Figs. 5C, D) to account for all planes. In fact, we searched the entire z-stacks until we found active zones of all orientations within the same boutons, as shown in figures 5C1-C6. The same active zone orientations were analyzed for all exon-out mutants with cac localization in active zones. The distance between cac and brp did not change if viewed from the side or any other orientation. We now explain this in more clarity in the results text on page 9, lines 23/24.
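
      The projection artifact described above can be illustrated with a minimal sketch. The coordinates below are hypothetical punctum centroids chosen for illustration, not measured values, and the function names are our own; the sketch only shows why a distance read from a z-projection can be much smaller than the true distance in the stack.

```python
import math

# Hypothetical centroid coordinates in micrometers (x, y, z); not measured data.
cac_punctum = (1.00, 1.00, 0.20)
brp_punctum = (1.05, 1.02, 1.40)


def distance_3d(a, b):
    """Euclidean distance using all three coordinates."""
    return math.dist(a, b)


def distance_xy(a, b):
    """Distance as it would appear in a z-projection (z is discarded)."""
    return math.dist(a[:2], b[:2])


print(f"apparent distance in projection: {distance_xy(cac_punctum, brp_punctum):.3f} um")
print(f"distance in the 3D stack:        {distance_3d(cac_punctum, brp_punctum):.3f} um")
```

      In this made-up case the two puncta appear almost superimposed in the projection although they are more than a micrometer apart in z, which is the situation that analysis within individual planes avoids.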

      (7) Cac clusters localize to the Brp center (e.g., Liu et al., 2011). They conclude that Cav2 localization within Brp is not affected in the cac variants (p. 8). However, their analysis is not informative regarding a potential offset between the central cac cluster and the Brp "ring". Did they/could they analyze cac localization with regard to Brp ring center localization of planar synapses, as well as Brp-ring dimensions?

      In the top views (planar) we did not find any clear offset in cac orientation to brp between genotypes. In such planar synapses (top views, Fig. 5D, left row) we did not find any difference in Brp ring dimensions. We did not quantify brp ring dimensions rigorously, because this study focusses on cac splice isoform-specific localization and function. Possible effects of different cac isoforms on brp-ring dimensions or other aspects of scaffold structure are not central to our study, in particular given that brp puncta are clearly present even if cac is absent from the synapse (Fig. 3A), indicating that cac is not instructive for the formation of the brp scaffold.

      (8) Given the accelerated PSC decay/ decreased half width in dI-IIA (Fig. 5Q), I recommend reporting PSC charge in Figure 3, and PPR charge in Figures 5A-D. The charge-based PPRs of dI-IIA mutants likely resemble WT more closely than the amplitude-based PPR. In addition, miniature PSC decay kinetics should be reported, as they may contribute to altered decay kinetics. How could faster cac inactivation kinetics in response to single AP stimulation result in a decreased PSC half-width? Is there any evidence for an effect of calcium current inactivation on PSC kinetics? On a similar note, is there any evidence that AP waveform changes accelerate PSC kinetics? PSC decay kinetics are mainly determined by GluR decay kinetics/desensitization. The arguments supporting the role of cac splice isoforms in PSC kinetics outlined in the discussion section are not convincing and should be revised.

      We agree that reporting charge in figure 3 is informative and do so in the revised text. Since the result (no significant difference in the PSCs between CS, cac<sup>GFP</sup>, <sup>ΔI-IIA</sup>, and transheterozygous I-IIA/I-IIB, but significantly smaller values in ΔI-IIB) remained unchanged no matter whether charge or amplitude were analyzed, we decided to leave the figure as is and report the additional analysis in the text (page 8, lines 40 to 42). This way, both types of analysis are reported. Please note that EPSC amplitude is slightly but not significantly increased upon excision of I-IIA (Fig. 4J), whereas EPSC half amplitude width is significantly smaller (Fig. 5Q, now revised Fig. 6R). Together, a tendency of increased EPSC amplitudes and smaller half amplitude width result in statistically insignificant changes in EPSC in ∆I-IIA (now discussed on page 15, lines 37 to 40). We also understand the reviewer’s concern about attributing altered EPSC kinetics to presynaptic cac channel properties. We have toned down our interpretation in the discussion and list possible alterations in presynaptic AP shape or cac channel kinetics as alternative explanations (not conclusions; see revised discussion on page 15, line 40 to page 16, line 2). Moreover, we have quantified postsynaptic GluRIIA abundance to test whether altered PSC kinetics are caused by altered GluRIIA expression. In our opinion, the latter is more instructive than mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells. Although we find no difference in GluRIIA expression levels we now clearly state that we cannot exclude other changes in GluR receptor fields, which of course, could also explain altered PSC kinetics. We have updated the discussion on page 16, lines 2/3 accordingly.

      (9) Paired-pulse ratios (PPRs): On how many sweeps are the PPRs based? In which sequence were the intervals applied? Are PPR values based on the average of the second over the first PSC amplitudes of all sweeps, or on the PPRs of each sweep and then averaged? The latter calculation may result in spurious facilitation, and thus to the large PPRs seen in dI-IIB mutants (Kim & Alger, 2001; doi: 10.1523/JNEUROSCI.21-24-09608.2001).

      We agree that the PP protocol and analyses had to be described more precisely in the methods and have done so on page 23, lines 31 to 37 in the methods. Mean PPR values are based on the PPRs of each sweep and then averaged. We are aware of the study of Kim and Alger 2001 and have re-analyzed the PP data in both ways outlined by the reviewer. We get identical results with either analysis method. Spurious facilitation is thus not an issue in our data. We now explain this in the methods section along with the PPR protocol. The large spread seen in dI-IIB is indeed caused by reduced calcium influx into active zones with fewer channels, as anticipated by the reviewer (see next point).
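
      For readers unfamiliar with the two calculations, the following minimal sketch contrasts them. It is not the analysis code used in the study; the amplitudes are hypothetical and deliberately variable, and the function names are our own. It shows how averaging per-sweep ratios can exceed the ratio of the averaged amplitudes when the first PSC varies strongly between sweeps, which is the source of spurious facilitation discussed by Kim and Alger (2001).

```python
from statistics import mean


def ppr_mean_of_ratios(first, second):
    """Compute the PPR for every sweep, then average the ratios."""
    return mean(s / f for f, s in zip(first, second))


def ppr_ratio_of_means(first, second):
    """Average the amplitudes across sweeps, then take a single ratio."""
    return mean(second) / mean(first)


# Hypothetical PSC amplitudes (nA) for three sweeps; not recorded data.
first_psc = [1.0, 4.0, 2.0]
second_psc = [2.0, 2.0, 2.0]

print(f"mean of per-sweep ratios: {ppr_mean_of_ratios(first_psc, second_psc):.3f}")
print(f"ratio of mean amplitudes: {ppr_ratio_of_means(first_psc, second_psc):.3f}")
```

      With these made-up values the first method suggests facilitation and the second suggests depression. When both methods return the same value for a real data set, as stated above, sweep-to-sweep variability of the first PSC is not distorting the PPR.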

      (10) Could the dI-IIB phenotype be simply explained by a decrease in channel number/ release probability? To test this, I propose investigating PPRs and short-term dynamics during train stimulation at lower extracellular Ca2+ concentration in WT. The Ca2+ concentration could be titrated such that the first PSC amplitude is similar between WT and dI-IIB mutants. This experiment would test if the increased PPR/depression variability is a secondary consequence of a decrease in Ca2+ influx, or specific to the splice isoform.

      In fact, that decreased PSC amplitude upon I-IIB excision is caused mainly by reduced channel number is precisely our interpretation (see discussion page 14, last paragraph to page 15, first paragraph in the original submission, now page 16, second paragraph). In addition, we are grateful for the reviewer’s suggestion to titrate the external calcium such that the first PSC amplitude matches in ∆I-IIB and control. This experiment tests whether altered short term plasticity is solely a function of altered channel number or whether additional causes, such as altered channel properties, also play into this. We titrated the first pulse amplitude in ∆I-IIB to match control and find that paired pulse ratio and the variance thereof are not different anymore. Therefore, the differences observed in identical external calcium can be fully explained by altered channel numbers. This additional dataset is shown in the revised figures 6D and E and referred to in the results section on page 10, lines 14 to 25 and the discussion on page 16, lines 36 to 38.

      (11) How were the depression kinetics analyzed? How many trains were used for each cell, and how do the tau values depend on the first PSC amplitude? Time constants in the range of a few (5-10) milliseconds are not informative for train stimulations with a frequency of 1 or 10 Hz (the unit is missing in Figure 5H). Also, the data shown in Figures 5E-K suggest slower time constants than 5-10 ms. Together, are the data indeed consistent with the idea that dI-IIB does not only affect cac channel number, but also PPR/depression variability (p. 9)?

      For each animal the amplitudes of all subsequent PSCs in each train were plotted over time and fitted with a single exponential. For depression at 1 and 10 Hz, we used one train per animal, and 5-6 animals per genotype (as reflected in the data points in Figs. 6I, M). This is now explained in more detail in the revised methods section (page 23, lines 39 to 41). The tau values are not affected by the amplitude of the first PSC. First, we carefully re-fitted new and previously presented depression data and find that the taus for depression at low stimulation frequencies (1 and 10 Hz) are not affected by exon excisions at the I-II site. We thank the reviewer for detecting our error in units and tau values in the previous figure panels 5H and L (this has now been corrected in the revised figure panels 6I and M). Given that PSC amplitude upon I-IIB excision is significantly smaller than in controls and following I-IIA excision, we suspected that the time course of depression at low stimulation frequency is not significantly affected by the amount of calcium influx during the first PSC. To further test this, we followed the reviewer’s suggestion and re-measured depression at 1 and 10 Hz for cac-GFP controls and for delta I-IIB in a higher external calcium concentration (1.8 mM), so that the first PSC was increased in amplitude in both genotypes (1.8 mM external calcium titrates the PSC amplitude in delta I-IIB to match that of controls measured in 0.5 mM external calcium, see revised Figs. 6H, L). Neither in control, nor in delta I-IIB did this affect the time course of synaptic depression (see revised Figs. 6I, M). This indicates that at low stimulation frequencies (1 and 10 Hz) the time course of depression is not affected by mean quantal content. This is consistent with the paired pulse ratio at 100 ms interpulse interval shown in figures 6A-D. 
However, for synaptic depression at 1 Hz stimulation the variability of the data is higher for delta I-IIB (independent of external calcium concentration, see rev. Fig. 6I), which might also be due to reduced channel number in this genotype. Taken together, the data are in line with the idea that altered cac channel numbers in active zones are sufficient to explain all effects that we observe upon I-IIB excision on PPRs and synaptic depression at low stimulation frequencies. This is now clarified in the revised text on page 12, lines 3 to 7.
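
      As an illustration of the fitting step described above, the following sketch fits a single exponential with a plateau to the PSC amplitudes of one train. It is not the analysis code used in the study: the model form (decay to a steady-state plateau), the grid search over candidate time constants, the function name, and the noise-free synthetic 10 Hz train are all assumptions made for the example.

```python
import math


def fit_single_exponential(times, amplitudes, candidate_taus):
    """Fit y = plateau + span * exp(-t / tau) by scanning candidate taus.

    For a fixed tau the model is linear in (plateau, span), so both are
    obtained by ordinary least squares in closed form. The candidate tau
    with the smallest summed squared error is returned.
    """
    n = len(times)
    mean_y = sum(amplitudes) / n
    best = None
    for tau in candidate_taus:
        x = [math.exp(-t / tau) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # regressor is constant, span is undefined
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, amplitudes))
        span = sxy / sxx
        plateau = mean_y - span * mean_x
        sse = sum((plateau + span * xi - yi) ** 2 for xi, yi in zip(x, amplitudes))
        if best is None or sse < best[0]:
            best = (sse, tau, plateau, span)
    return best[1], best[2], best[3]


# Synthetic, noise-free train at 10 Hz (one PSC every 0.1 s); not recorded data.
times = [i * 0.1 for i in range(20)]
amplitudes = [0.6 + 0.4 * math.exp(-t / 0.5) for t in times]  # normalized amplitude

candidate_taus = [0.05 * k for k in range(1, 61)]  # 0.05 s to 3.0 s
tau, plateau, span = fit_single_exponential(times, amplitudes, candidate_taus)
print(f"tau = {tau:.2f} s, plateau = {plateau:.2f}, span = {span:.2f}")
```

      Because the time constant is reported in seconds and is compared with an inter-stimulus interval of 0.1 s or 1 s, a sketch of this kind also makes it easy to check that the fitted tau is on a time scale that is informative for the stimulation frequency used.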

      (12) The GFP-tagged I-IIA and mEOS4b-tagged I-IIB cac puncta shown in Figure 6N appear larger than the Brp puncta. Endogenously tagged cac puncta are typically smaller than Brp puncta (Gratz et al., 2019). Also, the I-IIA and I-IIB fluorescence sometimes appear to be partially non-overlapping. First, I suggest adding panels that show all three channels merged. Second, could they analyze the area and area overlap of I-IIA and I-IIB with regard to each other and to Brp, and compare it to cac-GFP? Any speculation as to how the different tags could affect localization? Finally, I recommend moving the dI-IIA and dI-IIB localization data shown in Figure 6N to an earlier figure (Figure 1 or Figure 3).

      We now show panels with the two I-II cac isoforms merged in the revised figure 7H (previously 6N). We also tested merging all three labels as suggested, but found this not instructive for the reader. We thank the reviewer for pointing out that the Brp puncta appeared smaller than the cac puncta in some panels. We carefully went through the data and found that the Brp puncta are not systematically smaller than the cac puncta. Please note that punctum size can appear quite different, depending on different staining qualities as well as different laser intensities and different point spread in different imaging channels. The purpose of this figure was not to analyze punctum size and labeling intensity, but instead, to demonstrate that I-IIA and I-IIB are both present in most active zones, but some active zones show only I-IIB labeling, as quantified in figure 7I. We did not follow the suggestion to conduct additional co-localization analyses and compare them with cac-GFP controls, because Pearson co-localization coefficients for cac-GFP and all exon-out variants analyzed, including delta I-IIA and delta I-IIB, are presented in the revised figure 4D. Moreover, delta I-IIA and delta I-IIB show similar Manders 1 and 2 co-localization coefficients with Brp (see Figs. 4E, F). We do not want to speculate whether the different tags have any effect on localization precision. Artificial differences in localization precision can also be suggested by different antibodies, but we know from our STED analyses with identical tags and antibodies for all isoforms that I-IIA and I-IIB co-localize identically with Brp (see Figs. 5A-E). Finally, we prefer to not move the figure because we believe it is informative to show our finding that active zones usually contain both splice I-II variants together with the finding that only I-IIB is required for PHP.
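
      For reference, the Pearson and Manders coefficients mentioned above can be sketched as follows. This is a generic textbook formulation, not the image analysis pipeline used in the study; the pixel intensities are hypothetical, the function names are our own, and the Manders thresholds default to zero as a simplifying assumption.

```python
import math


def pearson(a, b):
    """Pearson correlation of paired pixel intensities from two channels."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def manders(a, b, threshold_a=0.0, threshold_b=0.0):
    """Manders coefficients M1 and M2.

    M1: fraction of channel-a intensity found in pixels where channel b
        exceeds its threshold. M2 is the converse for channel b.
    """
    m1 = sum(x for x, y in zip(a, b) if y > threshold_b) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > threshold_a) / sum(b)
    return m1, m2


# Hypothetical background-subtracted intensities for five pixels; not real data.
cac_channel = [0, 10, 20, 0, 5]
brp_channel = [0, 8, 18, 6, 0]

r = pearson(cac_channel, brp_channel)
m1, m2 = manders(cac_channel, brp_channel)
print(f"Pearson r = {r:.3f}, M1 = {m1:.3f}, M2 = {m2:.3f}")
```

      Pearson's coefficient reports how well the intensities of the two channels co-vary, whereas the Manders coefficients report which fraction of each label lies within the territory of the other, so the two measures are complementary.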

      Recommendations for the authors:

      Reviewing Editor Comments:

      We thank you for your submission. All three reviewers urge caution in interpreting the S4 splice variant playing a role specifically in Cac localization, as opposed to just leading to instability and degradation. There are other issues with the electrophysiological experiments, a need for improved imaging and analyses, and some areas of interpretation detailed in the reviews.

      We agree that additional data was required to conclude that IS4 splicing plays a specific role in cac channel localization and is not just leading to channel instability and degradation. As outlined in detail in our response to reviewer 1, comment 1, we conducted several sets of experiments to support our interpretation. First, electrophysiological experiments show that upon removal of IS4B, which eliminates synaptic transmission at the larval NMJ and cac positive label in presynaptic active zones, somatodendritic cac current is reliably recorded (new data in revised figure 3C). This is not in line with a channel instability or degradation effect, but instead with IS4B-containing isoforms being required and sufficient for evoked release from NMJ motor terminals, whereas IS4A isoforms are not sufficient for evoked release from axon terminals, but IS4A isoforms alone can mediate a distinct component of somatodendritic calcium current. Second, immunohistochemical analyses reveal that IS4A, which is not present in NMJ presynaptic active zones, is expressed sparsely, but in reproducible patterns in the larval brain lobes and in specific regions of the anterior VNC parts (new supplementary figure 1). Again, the absence of IS4A-containing cac isoforms from presynaptic active zones but their simultaneous presence in other parts of the nervous system is in accord with isoform specific localization, but not with general channel isoform instability. Third, enlargements of NMJ boutons with brp positive presynaptic active zones confirm the absence of IS4A and the presence of IS4B in active zones (these enlargements are now shown in the revised figures 2A-C, 3A, and 4A-C). Fourth, as suggested we have quantified the Pearson co-localization of IS4 isoforms with Brp in presynaptic active zones (revised Fig. 2D). This confirms quantitatively similar co-localization of IS4B and control with Brp, but no co-localization of IS4A with Brp. 
In fact, the labeling intensity of IS4A in presynaptic active zones is quantitatively not significantly different from background, no IS4A label is detected anywhere in the axon terminals at the NMJ, but we find IS4A label in the CNS. Together, these data strongly support our interpretation that the IS4 splice site plays a distinct role in cac channel localization. Figure legends as well as results and discussion section have been modified accordingly (the respective page and line numbers are listed in our point-by-point responses).

      In addition, we have carefully addressed all other public comments as well as all other recommendations for authors by providing multiple new data sets, new image analyses, and revising text. Addressing the insightful comments of all three reviewers and the reviewing editor has greatly helped to make the manuscript better.

      Reviewer #1 (Recommendations For The Authors):

      The conclusion that the IS4B exon controls Cac localization to active zones versus simply being required for channel abundance is not well supported. The authors need to either mention both possibilities or provide stronger support for the active zone localization model if they want to emphasize this point.

      We agree and have included several additional data sets as outlined in our response to point 1 of reviewer 1 and to the reviewing editor (see above). These new data strongly support our interpretation that the IS4B exon controls Cac localization to active zones and is not simply required for channel abundance. The additions to the figures and accompanying text (including the respective figure panel, page, and line numbers) are listed in the point-by-point responses to the reviewers’ public suggestions.

      Figure 2C staining for Cac localization in the delta 4B line is difficult to compare to the others, as the background staining is so high (muscles are green for example). As such, it is hard to determine whether the arrows in C are just background.

      We had over-emphasized the green label to show that there really is no cacophony label in active zones. However, we agree that this hampered image interpretation. Thus, we have adjusted brightness such that it matches the other genotypes (see new figure panel 2C, and figure 3A, bottom). Revising the figure as suggested by the reviewer shows much more clearly that IS4B puncta are detected exclusively in presynaptic active zones, whereas IS4A channels are not detectable in active zones or anywhere else in the axon terminal boutons. Quantification of IS4A label in brp positive active zones confirms that labeling intensity is not significantly above background (page 6, lines 29 to 31 and page 7, lines 19 to 21). Therefore, IS4A is not detectable in active zones at the NMJ.

      It seems more likely that the removal of the 4B exon simply destabilizes the protein and causes it to be degraded (as suggested by the Western), rather than mislocalizing it away from active zones. It's hard to imagine how some residue changes in the S4 voltage sensor would control active zone localization to begin with. The authors should note that the alternative explanation is that the protein is just degraded when the 4B exon is removed.

      Based on additional data and analyses, we disagree with the interpretation that removal of IS4B disrupts protein integrity and present multiple lines of evidence that support sparse expression of IS4A channels (ΔIS4B). As outlined in our response to reviewer 1 and to the reviewing editor, we show (1) in new immunohistochemical stainings (new supplementary figure 1) that upon removal of IS4B, sparse label is detectable in the VNC and the brain lobes (for detail see above). (2) In our new figure 3C, we show cacophony-mediated somatodendritic calcium currents recorded from adult flight motoneurons in a control situation and upon removal of IS4B that leaves only IS4A channels. This clearly demonstrates that IS4A underlies a substantial component of the HVA somatodendritic calcium current, although it is absent from axon terminals. This is in line with isoform specific functions at different locations, but not with IS4A instability/degradation. (3) We do not agree with the reviewer’s interpretation of the Western Blot data in figure 1E (formerly figure 1D). Together with our immunohistochemical data that show sparse cacophony IS4A expression, we think that the faint band upon removal of IS4B in a heterozygous background (that reduces labeled channels even further) reflects the sparseness of IS4A expression. This sparseness is not due to channel instability, but to IS4A functions that are less abundant than the ubiquitously expressed cac<sup>IS4B</sup> channels at presynaptic active zones of fast chemical synapses (see page 15, lines 24 to 29).

      If they really want to claim the 4B exon governs active zone localization, much higher quality imaging is required (with enlarged views of individual boutons and their AZs, rather than the low-quality full NMJ imaging provided). Similarly, higher resolution imaging of Cac localization at Muscle 12 (Figure 2H) boutons would be very useful, as the current images are blurry and hard to interpret. Figure 6N shows beautiful high-resolution Cac and Brp imaging in single boutons for the I-II exon manipulations - the authors should do the same for the 4B line. For all immuno in Figure 2, it is important to quantify Cac intensity as well. There is no quantification provided, just a sample image. The authors should provide quantification as they do for the delta I-II exons in Figure 3.

      We did as suggested and added figure panels to figure 2A-C and to new figures 3A (formerly part of figure 2) and 4A-C (formerly figure 3) showing magnified label at the NMJ AZs to allow a better assessment of cacophony expression after exon excision. These data are now referred to in the results section on page 6, lines 22 to 24, page 7, lines 18 to 21 and page 8, lines 17/18.

      As suggested, we now also provide quantification of co-localization with brp puncta as Pearson’s correlation coefficient for control, IS4B, and IS4A in the new figure panel 2D (text on page 6, lines 34 to 38). This further underscores control-like active zone localization of IS4B but no significant active zone localization of IS4A. As suggested, we now also quantified the intensity of IS4B label in active zones, and it was not different from control (see revised figure 4H and text on page 8, lines 38/39). We did not quantify the intensity of IS4A label, because it was not above background (text, page 6, lines 30/31).

      Reviewer #2 (Recommendations For The Authors):

      (1a) Questions about the engineered Cac splice isoform alleles:

      The authors use CRISPR gene editing to selectively remove the entire alternatively spliced exons of interest. Do the authors know what happens to the cac transcript with the deleted exon? Is the deleted exon just skipped and spliced to the next exon? Or does the transcript instead undergo nonsense-mediated decay?

      We do not believe that there is nonsense-mediated mRNA decay, because for all exon excisions the respective mRNA and protein are made. Protein has been detected at the level of Western blotting and immunocytochemistry. Therefore, we are certain that the mRNA is viable for each exon excision (and we have confirmed this for low abundance cac protein isoforms by RT-PCR), but only subsets of cac isoforms can be made from mRNAs that are lacking specific exons. However, we cannot make any statements as to whether the lack of specific protein isoforms exerts feedback on mRNA stability, the rate of transcription and translation, or other unknown effects.

      (1b) While it is clear that the IS4 exons encode part of the voltage sensor in the first repeat, are there studies in Drosophila to support the putative Ca-beta and G-protein beta-gamma binding sites in the I-II loop? Or are these inferred from Mammalian studies?

      To the best of our knowledge, there are no studies in Drosophila that unambiguously show Caβ and Gβγ binding sites in the I-II loop of cacophony. However, sequence analysis strongly suggests that I-IIB contains both a Caβ and a Gβγ binding site (AID: α-interacting domain) because the binding motif QXXER is present. In mouse Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels the sequence is QQIER, while in Drosophila cacophony I-IIB it is QQLER. In the alternative I-IIA, this motif is not present, strongly suggesting that G<sub>βγ</sub> subunits cannot interact at the AID. However, as already suggested by Smith et al. (1998), based on sequence analysis, Ca<sub>β</sub> should still be able to bind, although possibly with a lower affinity. We agree that this information should be given to the reader and have revised the text accordingly on page 5, lines 9 to 17.
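
      The motif comparison above amounts to a simple pattern search, sketched here for illustration. Only the two five-residue motifs QQIER and QQLER are taken from the text; the third fragment is a made-up example of a sequence without the motif and is not the actual I-IIA sequence, and the function name is our own.

```python
import re

# QXXER: glutamine, any two residues, glutamate, arginine.
QXXER = re.compile(r"Q..ER")


def has_qxxer(sequence):
    """Return True if the amino acid sequence contains a QXXER motif."""
    return QXXER.search(sequence) is not None


fragments = {
    "mouse Cav2.1/Cav2.2 (from text)": "QQIER",
    "Drosophila cac I-IIB (from text)": "QQLER",
    "hypothetical fragment without motif": "QQLDK",
}

for label, fragment in fragments.items():
    print(f"{label}: {fragment} -> motif present: {has_qxxer(fragment)}")
```

      A pattern match of this kind only indicates that the consensus residues are present; it does not by itself demonstrate binding, which is why the statement above is phrased as a sequence-based inference.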

      (1c) The authors assert that splicing of Cav2/cac in flies is a means to encode diversity, as mammals obviously have 4 Cav2 genes vs 1 in flies. However, as the authors likely know, mammalian Cav2 channels also have various splice isoforms encoded in each of the 4 Cav2 genes. The authors should discuss in more detail what is known about the splicing of individual mammalian Cav2 channels and whether there are any homologous properties in mammalian channels controlled by alternative splicing.

      We agree and now provide a more comprehensive discussion of vertebrate Ca<sub>v</sub>2 splicing and its impact on channel function. In line with what we report in Drosophila, properties like G<sub>βγ</sub> binding and activation voltage can also be affected by alternative splicing in vertebrate Ca<sub>v</sub>2 channels, though the exon patterns are quite different from Drosophila. We integrated this part in the revised discussion (page 14, first paragraph). The respective text is below for the reviewer’s convenience:

      “However, alternative splicing increases functional diversity also in mammalian Ca<sub>v</sub>2 channels. Although the mutually exclusive splice site in the S4 segment of the first homologous repeat (IS4) is not present in vertebrate Cav channels, alternative splicing in the extracellular linker region between S3 and S4 is at a position to potentially change voltage sensor properties (Bezanilla 2002). Alternative splice sites in rat Ca<sub>v</sub>2.1 exon 24 (homologous repeat III) and in exon 31 (homologous repeat IV) within the S3-S4 loop modulate channel pharmacology, such as differences in the sensitivity of Ca<sub>v</sub>2.1 to Agatoxin. Alternative splicing is thus a potential cause for the different pharmacological profiles of P- and Q-channels (both Ca<sub>v</sub>2.1; Bourinet et al. 1999). Moreover, the intracellular loop connecting homologous repeats I and II is encoded by 3-5 exons and provides strong interaction with G<sub>βγ</sub>-subunits (Herlitze et al. 1996). In Ca<sub>v</sub>2.1 channels, binding to G<sub>βγ</sub> subunits is potentially modulated by alternative splicing of exon 10 (Bourinet et al. 1999). Moreover, whole cell currents of splice forms α1A-a (no Valine at position 421) and α1A-b (with Valine) represent alternative variants for the I-II intracellular loop in rat Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels. While α1A-a exhibits fast inactivation and more negative activation, α1A-b has delayed inactivation and a positive shift in the IV-curve (Bourinet et al. 1999). This is phenotypically similar to what we find for the mutually exclusive exons at the IS4 site, in which IS4B mediates high voltage activated cacophony currents while IS4A channels activate at more negative potentials and show transient current (Fig. 3; see also Ryglewski et al. 2012). Furthermore, altered Ca<sub>β</sub> interaction have been shown for splice isoforms in loop III (Bourinet et al. 1999), similar to what we suspect for the I-II site in cacophony. 
Finally, in mammalian VGCCs, the C-terminus presents a large splicing hub affecting channel function as well as coupling distance to other proteins. Taken together, Ca<sub>v</sub>2  channel diversity is greatly enhanced by alternative splicing also in vertebrates, but the specific two mutually exclusive exon pairs investigated here are not present in vertebrate Ca<sub>v</sub>2 genes.”

      (1d) In Figure 1, it would be helpful to see the entire cac genomic locus with all introns/exons and the 4 specific exons targeted for deletion.

      We agree and have changed figure 1 accordingly.

      (2a) Cav2.IS4B deletion alleles:

      More work is necessary to explain the localization of Cac controlled by the IS4B exon. First, can the authors determine whether actual Cac channels are present at NMJ boutons? The authors seem to indicate that in the IS4B deletion mutants, some Cac (GFP) signal remains in a diffuse pattern across NMJ boutons. However, from the imaging of wild-type Cac-GFP (and previous studies), there is no Cac signal outside of active zones defined by the BRP signal. It would benefit the study to a) take additional, higher resolution images of the remaining Cac signal at NMJs in IS4B deletion mutants, and b) comment on whether the apparent remaining signal in these mutants is only observed in the absence of IS4B-containing Cac channels, or if the IS4A-positive channels are normally observed (but perhaps mis-localized?).

      We have conducted additional analyses to show convincingly that IS4A channels (that remain upon IS4B deletion) are absent from presynaptic active zones. Please see also responses to reviewers 1 and 3. After adjusting the background values of the CLSM images to identical values in control, delta IS4A, and delta IS4B, and providing selective enlargements as suggested, the figure panels 2C, Ci and 3A now show much more clearly that upon deletion of IS4B no cac label remains in active zones or anywhere else in the axon terminal boutons (see text on page 6, lines 22 to 24). This is further confirmed by quantification showing that in IS4B mutants cac labeling intensity in active zones is not above background (see text on page 6, lines 27 to 31). We never intended to indicate that there was cac signal outside of active zones defined by the brp signal, and we have now carefully gone through the text so as not to suggest this possibility unintentionally anywhere in the manuscript.

      (2b) Do the authors know whether any presynaptic Ca2+ influx is contributed by IS4A-positive Cac channels at boutons, given the potential diffuse localization? There are various approaches for doing presynaptic Ca2+ imaging that could provide insight into this question.

      We agree that this is an interesting question. However, based on the revisions made, we now show with more clarity that IS4A channels are absent from the presynaptic terminal at the NMJ. IS4A labeling intensities within active zones and anywhere else in the axon terminals are not different from background (see text on page 6, lines 27 to 31 and revised Figs. 2C, Ci, and 3A with new selective enlargements in response to comments of both other reviewers). This is in line with our finding that evoked synaptic transmission from NMJ axon terminals to muscle cells is mostly absent upon excision of IS4B (see Fig. 3B). The very small amplitude EPSC (below 5 % of the normal amplitude of evoked EPSCs) that can still be recorded in the absence of IS4B is similar to what is observed in cac null mutant junctions and is mediated by calcium influx through another voltage-gated calcium channel, a Ca<sub>v</sub>1 homolog named Dmca1D, as we have previously published (Krick et al., 2021, PNAS 118(28):e2106621118). Gathering additional support for the absence of IS4A from presynaptic terminals by calcium imaging experiments would suffer significantly from the presence of additional types of VGCCs in presynaptic terminals (certainly Dmca1D (Krick et al., 2021) and potentially also the Ca<sub>v</sub>3 homolog DmαG or Dm-α1T). Such experiments would require mosaic null mutants for cac and DmαG channels in a mosaic IS4B excision mutant, which, if feasible at all, would be very hard and time-consuming to generate. In the light of the additional clarification that IS4A is not located in NMJ axon terminal boutons, as shown by additional labeling intensity analysis, revised figures with selective enlargement, and revised text, we feel confident to state that IS4A is not sufficient for evoked SV release.

      (2c) Mechanistically, how are amino acid changes in one of the voltage sensing domains in Cac related to trafficking/stabilization/localization of Cac to AZs?

      This is an exciting question that has occupied our discussions a lot. Some sorting mechanism must exist that recognizes the correct protein isoforms, just as sorting and transport mechanisms exist that transport other synaptic proteins to the synapse. We do not think that the few amino acid changes in the voltage sensor are directly involved in protein targeting. We rather believe that the cacophony variants that happen to contain this specific voltage sensor are selected for transport out to the synapse. There are cell biological possibilities to achieve this, but we have not further addressed potential mechanisms because we do not want to enter the realms of speculation.

      (3) How are auxiliary subunits impacted in the Cac isoform mutants?

      Recent work by Kate O'Connor-Giles has shown that both Stj and Ca-Beta subunits localize to active zones along with Cac at the Drosophila NMJ. Endogenously tagged Stj and Ca-Beta alleles are now available, so it would be of interest to determine if Stj and particular Ca-Beta levels or localization change in the various Cac isoform alleles. This would be particularly interesting given the putative binding site for Ca-Beta encoded in the I-II linker.

      We agree that the synthesis of the work of Kate O'Connor-Giles' group and our study opens up new avenues to explore exciting hypotheses about differential coupling of specific cacophony splice isoforms with distinct accessory proteins such as Caβ and α<sub>2</sub>δ subunits. However, this requires numerous full sets of additional experiments and is beyond the scope of this study.

      (4a) Interpretation of short-term plasticity in the I-IIB exon deletion:

      The changes in short-term plasticity presented in Figure 5 are interpreted as an additional phenotype due to the loss of the I-IIB exon, but it seems this might be entirely explained simply due to the reduced Cac levels. Reduced Cac levels at active zones will obviously reduce Ca2+ influx and neurotransmitter release. This may be really the only phenotype/function of the I-IIB exon. Hence, to determine whether loss of the I-IIB exon encodes any functions in short-term plasticity, separate from reduced Cac levels, the authors should compare short-term plasticity in I-IIB loss alleles to wild type when starting EPSC amplitudes are equal (for example by reducing extracellular Ca2+ levels in wild type to achieve the same levels as in Cac I-IIB exon deleted alleles). Reduced release probability, simply by reduced Ca2+ influx (either by reduced Cac abundance or extracellular Ca2+) should result in more variability in transmission, so I am not sure there is any particular function of the I-IIB exon in maintaining transmission variability beyond controlling Cac abundance at active zones.

      For two reasons we are particularly grateful for this comment. First, it shows us that we needed to explain much more clearly that our interpretation is that changes in paired pulse ratios (PPRs) and in depression at low stimulation frequencies are a causal consequence of lower channel numbers upon I-IIB exon deletion, precisely as pointed out by the reviewer. We have carefully revised the text accordingly on page 10, lines 14-25, page 11, lines 3-7 and 22-28; page 16, lines 36-38. Second, the experiment suggested by the reviewer is superb to provide additional evidence that the cause of altered PPRs is in fact reduced channel number, but not altered channel properties. Accordingly, we have conducted additional TEVC recordings in elevated external calcium (1.8 mM) so that the single PSC amplitudes in I-IIB excision animals match those of controls in 0.5 mM extracellular calcium. This makes the amplitudes and the variance of PPR for all interpulse intervals tested control-like (see revised Figs. 6D, E). This strongly indicates that differences observed in PPRs as well as the variance thereof were caused by the amount of calcium influx during the first EPSC, and thus by different channel numbers in active zones.

      (4b) Another point about the data in Figure 5: If "behaviorally relevant" motor neuron stimulation and recordings are the goal, the authors should also record under physiological Ca2+ conditions (1.8 mM), rather than the highly reduced Ca2+ levels (0.5 mM) they are using in their protocols.

      Although we doubt that the effective extracellular calcium concentration that determines the electromotive force for calcium to enter the ensheathed motoneuron terminals in vivo during crawling is known, we partly followed the reviewer’s suggestion and have repeated the high frequency stimulation trains for ΔI-IIB in 1.8 mM calcium. As for short-term plasticity, this brings the charge conducted to values as observed in control and in ΔI-IIA in 0.5 mM calcium. Therefore, all differences observed in previous figure 5 (now revised figure 6) can be attributed to different channel numbers in presynaptic active zones. This is now explained on page 11, lines 19-28. For controls, recordings at high frequency stimulation in higher external calcium (e.g. 2 mM) have previously been published and show significant synaptic depression (e.g. Krick et al., 2021, PNAS). Given that in the exon out variants we do not expect any differences except for those caused by different channel numbers, we did not repeat these experiments for control and ΔI-IIA.

      (5a) Mechanism of Cac's role in PHP :

      As the authors likely know, mutations in Cac were previously reported to disrupt PHP expression (see Frank et al., 2006 Neuron). Inexplicably, this finding and publication were not cited anywhere in this manuscript (this paper should also be cited when introducing PhTx, as it was the first to characterize PhTx as a means of acutely inducing PHP). In the Frank et al. paper (and in several subsequent studies), PHP was shown to be blocked in mutations in Cac, namely the CacS allele. This allele, like the I-IIB excision allele, reduces baseline transmission presumably due to reduced Ca2+ influx through Cac. The authors should at a minimum discuss these previous findings and how they relate to what they find in Figure 6 regarding the block in PHP in the Cac I-IIB excision allele.

      We thank the reviewer for pointing this out and apologize for this oversight. We agree that it is imperative to cite the 2006 paper by Frank et al. when introducing PhTx mediated PHP as well as when discussing the effects of cac mutants on PHP together with other published work. We have revised the text accordingly on page 12, lines 9-11 and 21-23 and on page 17, lines 29-33.

      In terms of data presentation in Fig. 6, as is typical in the field, the authors should normalize their mEPSC/QC data as a percentage of baseline (+PhTx/-PhTx). This makes it easier to see the reduction in mEPSC values (the "homeostatic pressure" on the system) and then the homeostatic enhancement in QC. Similarly, in Fig. 6M, the authors should show both mEPSC and QC as a percentage of baseline (wild type or non-GluRIIA mutant background).

      We agree and have changed figure presentation accordingly. Figure 7 (formerly figure 6) was updated as was the accompanying results text on page 12, lines 23-40.
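
      The normalization requested here can be illustrated with a minimal sketch. It assumes that quantal content is computed as evoked EPSC amplitude divided by mEPSC amplitude and that "percent of baseline" means each +PhTx value divided by the mean -PhTx value. The function names and all amplitudes below are hypothetical and are not taken from the study.

```python
from statistics import mean


def percent_of_baseline(treated, baseline):
    """Express each treated value as a percentage of the mean baseline value."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return [100.0 * value / base for value in treated]


def quantal_content(epsc_amplitude, mepsc_amplitude):
    """Quantal content as evoked EPSC amplitude / mEPSC amplitude (unitless)."""
    if mepsc_amplitude == 0:
        raise ValueError("mEPSC amplitude must be non-zero")
    return epsc_amplitude / mepsc_amplitude


# Hypothetical amplitudes in nA, for illustration only (not data from the study)
mepsc_minus_phtx = [0.8, 1.0, 1.2]
mepsc_plus_phtx = [0.4, 0.5, 0.6]
epsc_minus_phtx = [40.0, 50.0, 60.0]
epsc_plus_phtx = [40.0, 50.0, 60.0]

# mEPSC amplitude after PhTx, as percent of the untreated baseline
mepsc_pct = percent_of_baseline(mepsc_plus_phtx, mepsc_minus_phtx)

# Quantal content per recording, then normalized to the untreated baseline
qc_minus = [quantal_content(e, m) for e, m in zip(epsc_minus_phtx, mepsc_minus_phtx)]
qc_plus = [quantal_content(e, m) for e, m in zip(epsc_plus_phtx, mepsc_plus_phtx)]
qc_pct = percent_of_baseline(qc_plus, qc_minus)
```

      Plotted this way, a reduction of mEPSC amplitude below 100 % and a matching rise of quantal content above 100 % would both be read against the same baseline, which is why the normalized values carry no units.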

      (6) Cac I-IIA and I-IIB excision allele colocalization at AZs:

      These are very nice and important experiments shown in Figures 6N and O, which I suggest the authors consider analyzing in further detail. Most significantly:

      (6i) The authors nicely show that most AZs have a mix of both Cac IIA and IIB isoforms. Using simple intensity analysis, can the authors say anything about whether there is a consistent stoichiometric ratio of IIA vs IIB at single AZs? It is difficult to extract actual numbers of IIA vs IIB at individual AZs without having both isoforms labeled mEOS4b, but as a rough estimate can the authors say whether the immunofluorescence intensity of IIA:IIB is similar across each AZ? Or is there broad heterogeneity, with some AZs having low vs high ratios of each isoform (as the authors suggest across proximal to distal NMJ AZs)?

      We agree and have conducted experiments and analyses to provide these data. We measured the cac puncta fluorescence intensities for heterozygous cac<sup>sfGFP</sup>/cac, cacI-IIA<sup>sfGFP</sup>/cacI-IIB, and cacI-IIB<sup>sfGFP</sup>/cacI-IIA animals. We preferred this strategy, because intensity was always measured from cac puncta with the same GFP tag. Next, we normalized all values to the intensities obtained in active zones from heterozygous cac<sup>sfGFP</sup>/cac controls and then plotted the intensities of I-IIA versus I-IIB containing active zones side by side. Across junctions and animals, we find a consistent ratio of 2:1 in the relative intensities of I-IIB and I-IIA, thus indicating on average roughly twice as many I-IIB as compared to I-IIA channels across active zones. This is consistent with the counts in our STED analysis (see Fig. 5F). These new data are shown in the new figure panel 7J and referred to on page 13, lines 10-16 in the revised text.

      (6ii) Intensity analysis of Cac IIA vs IIB after PHP: Previous studies have shown Cac abundance increases at NMJ AZs after PHP. Can the authors determine whether both Cac IIA vs IIB isoforms increase after PHP or whether just one isoform is targeted for this enhancement?

      We already show that PHP is not possible in the absence of I-IIB channels (see figure 7). However, we agree that it is an interesting question to test whether I-IIA channels are added in the presence of I-IIB channels during PHP, but we consider this a detail beyond the scope of this study.

      Minor points:

      (1) Including line numbers in the manuscript would help to make reviewing easier.

      We agree and now provide line numbers.

      (2) Several typos (abstract "The By contrast", etc).

      We carefully double checked for typos.

      (3) Throughout the manuscript, the authors refer to Cac alleles and channels as "Cav2", which is unconventional in the field. Unless there is a compelling reason to deviate, I suggest the authors stick to referring to "Cac" (i.e. cacdIS4B, etc) rather than Cav2. The authors make clear in the introduction that Cac is the sole fly Cav2 channel, so there shouldn't be a need to constantly reinforce that cac=Cav2.

      We agree and have changed all fly Ca<sub>v</sub>2 references to cac.

      (4) In some figures/text the authors use "PSC" to refer to "postsynaptic current", while in others (i.e. Figure 6) they switch to the more conventional terms of mEPSC or EPSC. I suggest the authors stick to a common convention (mEPSC and EPSC).

      We have changed PSC to EPSC throughout.

      Reviewer #3 (Recommendations For The Authors):

      (1) The abstract could focus more on the results at the expense of the background.

      We agree and have deleted the second introductory background sentence and added information on PPRs and depression during low frequency stimulation.

      (2) What does "strict" active zone localization refer to? Could they please define the term strict?

      Strict active zone localization means that cac puncta are detected in active zones but no cac label above background is found anywhere else throughout the presynaptic terminal, now defined on page 6, lines 27-29.

      (3) Single boutons/zoomed versions of the confocal images shown in Figures 2A-C, 2H, and 3A-C would be very helpful.

      We have provided these panels as suggested (see above and revised figures 2-4). Figure 3 is now figure 4.

      (4) The authors cite Ghelani et al. (2023) for increased cac levels during homeostatic plasticity. I recommend citing earlier work making similar observations (Gratz et al., 2019; DOI: 10.1523/JNEUROSCI.3068-18.2019), and linking them to increased presynaptic calcium influx (Müller & Davis, 2012; DOI: 10.1016/j.cub.2012.04.018).

      We agree and have added Gratz et al. 2019 and Müller and Davis 2012 to the results section on page 12, lines 17/18 and lines 21-23, and to the discussion on page 17, lines 29-33.

      (5) The data shown in Figure 3 does not directly support the conclusion of altered release probability in dI-IIB. I therefore suggest changing the legend's title.

      We have reworded to “Excisions at the I-II exon do not affect active zone cacophony localization but can alter cacsfGFP label intensity in active zones and PSC amplitude” as this is reflecting the data shown in the figure panels more directly.

      (6) It would be helpful to specify "adult flight muscle" in Figure 2J.

      We agree that it is helpful to specify in the figure (now revised figure 3C) that the voltage clamp recordings of somatodendritic calcium current were conducted in adult flight motoneurons and have revised the headline of figure panel 3C and the legend accordingly. Please note, these are not muscle cells but central neurons.

      (7) Do dIS4B/Cav2null MNs indeed show an inward or outward current at -90 to -70 mV/-40 and -50 mV, or is this an analysis artifact?

      No, this is due to baseline fluctuations as typical for voltage clamp in central neurons with more than 6000 µm dendritic length and more than 4000 dendritic branches.

      (8) Loss of several presynaptic proteins, including Brp (Kittel et al., 2006), and RBP (Liu et al., 2011), induce changes in GluR field size (without apparent changes in miniature amplitude). The statement regarding the Cav2 isoform and possible effects on GluR number (p. 8) should be revised accordingly.

      We understand and have done two things. First, we measured the intensity of GluRIIA immunolabel in ΔI-IIA, ΔI-IIB, and controls and found no differences. Second, we reworded the statement. It now reads on page 9, lines 1-6: “It seems unlikely that presynaptic cac channel isoform type affects glutamate receptor types or numbers, because the amplitude of spontaneous miniature postsynaptic currents (mEPSCs, Fig. 4K) and the labeling intensity of postsynaptic GluRIIA receptors are not significantly different between controls, I-IIA, and I-IIB junctions (see suppl. Fig. 2, p = 0.48, ordinary one-way ANOVA, mean and SD intensity values are 61.0 ± 6.9 (control), 55.8 ± 8.5 (∆I-IIA), 61.1 ± 17.3 (∆I-IIB)). However, we cannot exclude altered GluRIIB numbers and have not quantified GluR receptor field sizes.”

      (9) The statement relating miniature frequency to RRP size is unclear (p. 8). Is there any evidence for a correlation between miniature frequency to RRP size? Could the authors please clarify?

      We agree that this statement requires caution. Although there is some published evidence for a correlation of RRP size and mini frequency (Neuron, 2009 61(3):412-24. doi: 10.1016/j.neuron.2008.12.029 and Journal of Neuroscience 44 (18) e1253232024; doi: 10.1523/JNEUROSCI.1253-23.2024), which we now refer to on page 9, it is not clear whether this is true for all synapses and how linear such a relationship may be. Therefore, we have revised the text on page 9, lines 6-9. It now reads: “Similarly, the frequency of miniature postsynaptic currents (mEPSCs) remains unaltered. Since mEPSCs frequency has been related to RRP size at some synapses (Pan et al., 2009; Ralowicz et al., 2024) this indicates unaltered RRP size upon I-IIB excision, but we have not directly measured RRP size.”

      (10) Please define the "strict top view" of synapses (p. 8).

      Top view is what this reviewer referred to as “planar view” in the public review points 6 and 7. In our responses to these public review points we now also define “strict top view”, see page 9, lines 17-19.

      (11) Two papers are cited regarding a linear relationship between calcium channel number and release probability (p. 15). Many more papers could be cited to demonstrate a supralinear relationship (e.g., Dodge & Rahamimoff, 1967; Weyhersmüller et al., 2011 doi: 10.1523/JNEUROSCI.6698-10.2011). The data of the present study were collected at an extracellular calcium concentration of 0.5 mM, whereas Medeiros et al. (2023) used 1.5 mM. The relationship between calcium and release is supra-linear around 0.5 mM extracellular calcium (Weyhersmüller et al. 2011). This should be discussed/the statements be revised. Also, the reference to Medeiros et al. (2023) should be included in the reference list.

      We have now updated the Medeiros reference (updated version of that paper appeared in eLife in 2024) in the text and reference list. We agree that the relationship of the calcium concentration and P<sub>r</sub> can also be non-linear and refer to this on page 16, lines 26-32, but the point we want to make is to relate defined changes in calcium channel number (not calcium influx) as assessed by multiple methods (CLSM intensity measures and sptPALM channel counting) to release probability. We now also clearly state that we measured at 0.5 mM external calcium (page 16, lines 27/28) whereas Medeiros et al. 2024 measured at 1.5 mM calcium (page 16, lines 31/32).

      (12) Figure 6: Quantal content does not have any units - please remove "n vesicles".

      We have revised this figure in response to reviewer 2 (comment 5) and quantal content is now expressed as percent baseline, thus without units (see revised figure 7).

      (13) Figure 6C should be auto-scaled from zero.

      This has been fixed by revising that figure in response to reviewer 2 (comment 5)

      (14) The data supporting the statement on impaired motor behavior and reduced vitality of adult IS4A should be either shown, or the statement should be removed (p. 13). Any hypotheses as to why IS4A is important for behavior and or viability?

      As suggested, we have removed that statement.

      (15) They do not provide any data supporting the statement that changes in PSC decay kinetics "counteract" the increase in PSC amplitude (p. 14). The sentence should be changed accordingly.

      We agree and have toned it down. It now reads on page 16, lines 7-9: “During repetitive firing, the median increase of PSC amplitude by ~10 % is potentially counteracted by the significant decrease in PSC half amplitude width by ~25 %...”.

      (16) How do they explain the net locomotion speed increase in dI-IIA larvae? Although the overall charge transfer is not affected during the stimulus protocols used, could the accelerated PSC decay affect PSP summation (I would actually expect a decrease in summation/slower speed)? Independent of the voltage-clamp data, is muscle input resistance changed in dI-IIA mutants?

      Muscle input resistance is not altered in I-II mutants. We refer to potential causes of the locomotion effects of I-IIA excision in the discussion. On page 16, lines 12 to 21 it reads: “there is no difference in charge transfer from the motoneuron axon terminal to the postsynaptic muscle cell between ∆I-IIA and control. Surprisingly, crawling is significantly affected by the removal of I-IIA, in that the animals show a significantly increased mean crawling speed but no significant change in the number of stops. Given that the presynaptic function at the NMJ is not strongly altered upon I-IIA excision, and that I-IIA likely mediates also Ca<sub>v</sub>2 functions outside presynaptic AZs (see above) and in other neuron types than motoneurons, and that the muscle calcium current is mediated by Ca<sub>v</sub>1 and Ca<sub>v</sub>3, the effects of I-IIA excision of increasing crawling speed is unlikely caused by altered pre- or postsynaptic function at the NMJ. We judge it more likely that excision of I-IIA has multiple effects on sensory and pre-motor processing, but identification of these functions is beyond the scope of this study.”

    1. eLife Assessment

      This important study presents an evaluation of several tools used for detecting Identity-By-Descent (IBD) segments in highly recombining genomes, using simulated data to replicate the high recombination and low marker density of Plasmodium falciparum, the parasite responsible for malaria. Most of the evidence presented by the authors is solid demonstrating that users should be cautious calling IBD when SNP density is low and recombination rate is high. This study will be of interest to scientists working in the field of genome evolution and infectious diseases.

    2. Reviewer #1 (Public review):

      Summary:

      The authors benchmarked 5 IBD detection methods (hmmIBD, isoRelate, hap-IBD, phasedIBD, and Refined IBD) in Plasmodium falciparum using simulated and empirical data. Plasmodium falciparum has a mutation rate similar to humans but a much higher recombination rate and lower SNP density. Thus, the authors evaluated how recombination rate and marker density affect IBD segment detection. Next, they performed parameter optimization for Plasmodium falciparum and benchmarked the robustness of downstream analyses (selection detection and Ne inference) using IBD detected by each of the methods. They also tracked the computational efficiency of these methods. The authors' work is valuable for the tested species and the analyses presented appear to support their claim that users should be cautious calling IBD when SNP density is low and recombination rate is high.

      Strengths:

      The study design was solid. The authors set up their reasoning for using P. falciparum very well. The high recombination rate and similar mutation rate to human is indeed an interesting case. Further, they chose methods that were developed explicitly for each species. This was a strength of the work, as well as incorporating both simulated and empirical data to support their goal that IBD detection should be benchmarked in P. falciparum.

      Weaknesses:

      The scope of the optimization and application of results from the work are narrow, in that everything is fine-tuned for Plasmodium. Some of the results were not entirely unexpected for users of any of the tested software that was developed for humans. For example, it is known that Refined IBD is not going to do well with the combination of short IBD segments and low SNP density. Lastly, it appears the authors only did one large-scale simulation (there are no reported SDs).

    3. Reviewer #2 (Public review):

      Summary:

      Guo et al. benchmarked and optimized methods for detecting Identity-By-Descent (IBD) segments in Plasmodium falciparum (Pf) genomes, which are characterized by high recombination rates and low marker density. Their goal was to address the limitations of existing IBD detection tools, which were primarily developed for human genomes and do not perform well in the genomic context of highly recombinant genomes. They first analysed various existing IBD callers: hmmIBD, isoRelate, hap-IBD, phasedIBD, and Refined IBD. They focused on the impact of recombination on accuracy, which was calculated based on two metrics, the false negative rate and the false positive rate. The results suggest that high recombination rates significantly reduce marker density, leading to higher false negative rates for short IBD segments. This effect compromises the reliability of IBD-based downstream analyses, such as effective population size (Ne) estimation.<br /> They showed that the best tool for IBD detection in Pf is hmmIBD, because it has relatively low FN/FP error rates and is less biased for relatedness estimates. However, this method is the least computationally efficient.<br /> Their suggestion is to optimize human-oriented IBD methods and use hmmIBD only for the estimation of Ne.
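
      To make the two accuracy metrics concrete, the following is a minimal sketch under one assumed, length-based definition: the false negative rate is the fraction of true IBD length not covered by inferred segments, and the false positive rate is the fraction of inferred IBD length not covered by true segments. The manuscript's exact definitions may differ, and the function names and coordinates are hypothetical.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total length covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Total length shared by two interval sets (each merged first)."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def error_rates(true_segments, inferred_segments):
    """Return (false_negative_rate, false_positive_rate) by covered length."""
    shared = overlap_length(true_segments, inferred_segments)
    true_len = total_length(true_segments)
    inferred_len = total_length(inferred_segments)
    fn = 1 - shared / true_len if true_len else 0.0
    fp = 1 - shared / inferred_len if inferred_len else 0.0
    return fn, fp


# Hypothetical segment coordinates for one pair of genomes
fn_rate, fp_rate = error_rates([(0, 100)], [(50, 150)])
```

      Under this definition, short true segments that a caller misses entirely contribute their full length to the false negative rate, which matches the qualitative pattern described above for low marker density.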

      Strengths:

      Although I am not an expert on Plasmodium falciparum genetics, I believe the authors have developed a valuable benchmarking framework tailored to the unique genomic characteristics of this species. Their framework enables a thorough evaluation of various IBD detection tools for non-human data, such as high recombination rates and low marker density, addressing a key gap in the field.<br /> This study provides a comparison of multiple IBD detection methods, including probabilistic approaches (hmmIBD, isoRelate) and IBS-based methods (hap-IBD, Refined IBD, phased IBD). This comprehensive analysis offers researchers valuable guidance on the strengths and limitations of each tool, allowing them to make informed choices based on specific use cases. I think this is important beyond the study of Pf.<br /> The authors highlight how optimized IBD detection can help identify signals of positive selection, infer effective population size (Ne), and uncover population structure.<br /> They demonstrate the critical importance of tailoring analytical tools to suit the unique characteristics of a species. Moreover, the authors provide practical recommendations, such as employing hmmIBD for quality-sensitive analyses and fine-tuning parameters for tools originally designed for non-P. falciparum datasets before applying them to malaria research.

      Overall, this study represents a meaningful contribution to both computational biology and malaria genomics, with its findings and recommendations likely to have an impact on the field.

      Weaknesses:

      One weakness of the study is the lack of emphasis on the broader importance of studying Plasmodium falciparum as a critical malaria-causing organism. Malaria remains a significant global health challenge, causing hundreds of thousands of deaths annually. The authors could have introduced the topic better, even though I understand this is a methodological paper. While the study provides a thorough technical evaluation of IBD detection methods and their application to Pf, it does not adequately connect these findings to the broader implications for malaria research and control efforts. Additionally, the discussion on malaria and its global impact could have framed the study in a more accessible and compelling way, making the importance of these technical advances clearer to a broader audience, including researchers and policymakers in the fight against malaria.

    4. Author response:

      Provisional responses to Reviewer #1's comments:

      We thank the reviewer for the comments, which highlight both strengths and weaknesses.

      We acknowledge that the optimized parameter values are somewhat specific to Plasmodium, as demographic and mutation/recombination rates can vary across species. However, we would like to emphasize that our simulation and benchmarking framework, along with associated tools like the efficient ibdutils, should be broadly applicable to many species, such as Apicomplexan parasites and other high-recombining eukaryotes, especially when their demographic and evolutionary parameters can be provided or estimated. We will update relevant paragraphs in the discussion to highlight this point.

      Results related to Refined IBD may not seem unexpected, but our work demonstrates that its direct application to malaria parasites without species-specific optimization can be suboptimal, as has previously occurred in malaria research without its validity being formally evaluated. We believe it is crucial for the research community focusing on non-standard model organisms to validate assumptions made in methods developed for standard models, such as humans, before they are applied to new species.

      Although standard deviations (SDs) are not provided for many analyses, we argue that simulating 14 chromosomes independently serves as repeats (data were shown as means over chromosomes), particularly when assessing the accuracy of IBD segments or scanning for selection signals. For analyses that aggregate information across chromosomes, we are planning to conduct additional repeated simulations or analyses to quantify the uncertainty of estimates. In the upcoming revised version, we will provide SDs where appropriate, and explanations where repeated simulations are not necessary because a large number of data points already capture the empirical distributions well.
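
      As a minimal sketch of how per-chromosome results could be reported with a spread, the snippet below treats each independently simulated chromosome as one replicate and returns the mean together with the sample standard deviation. The function name and the 14 values are hypothetical placeholders, not results from the study.

```python
from statistics import mean, stdev


def summarize(per_chromosome_values):
    """Return (mean, sample SD) across chromosomes treated as replicates."""
    if len(per_chromosome_values) < 2:
        raise ValueError("need at least two chromosomes to compute an SD")
    return mean(per_chromosome_values), stdev(per_chromosome_values)


# Hypothetical false negative rates, one per simulated chromosome (14 in total)
fn_rates = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08,
            0.12, 0.11, 0.09, 0.10, 0.14, 0.11, 0.10]

fn_mean, fn_sd = summarize(fn_rates)
print(f"FN rate: {fn_mean:.3f} ± {fn_sd:.3f} (n = {len(fn_rates)} chromosomes)")
```

      For analyses that pool all chromosomes into a single estimate, such as Ne inference, this per-chromosome summary does not apply and repeated simulations would be needed instead.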

      Provisional response to Reviewer #2's comments:

      Thank you to the reviewer for the suggestions. We agree with the comments, and addressing the mentioned weakness will improve the manuscript's clarity and impact. We plan to enhance the introduction by highlighting the significance of studying malaria and specifically focusing on P. falciparum in this work. We will also update the discussion to reinforce the connection between our findings and malaria research and control and further emphasize the broader implications for the field.

    1. eLife Assessment

      This useful observational study was conducted in Dar es Salaam, Tanzania, to investigate potential associations between genetic variation in Mycobacterium tuberculosis and human host vs. disease severity. The authors conclude that human genetic ancestry did not contribute to tuberculosis severity, but the evidence for this conclusion is currently incomplete, as the analysis did not fully leverage the genome-wide data available in a human-strain association study, and there was no comparison group from the general population (or household controls), to which the ancestry findings could be compared. The findings have significance for the understanding of the influence of host / bacillary genetics on tuberculosis disease.

    2. Reviewer #1 (Public review):

      Summary:

      This Tanzanian study focused on the relationship between human genetic ancestry, Mycobacterium tuberculosis complex (MTBC) diversity, and tuberculosis (TB) disease severity. The authors analyzed the genetic ancestry of 1,444 TB patients and genotyped the corresponding MTBC strains isolated from the same individuals. They found that the study participants predominantly possess Bantu-speaking genetic ancestry, with minimal European and Asian ancestry. The MTBC strains identified were diverse and largely resulted from introductions from South or Central Asia. Unfortunately, no associations were identified between human genetic ancestry, the MTBC strains, or TB severity. The authors suggest that social and environmental factors are more likely to contribute to TB severity in this setting.

      Strengths:

      In comparison to other studies investigating the role of human genetics in TB phenotypes, this study is relatively large, with more than 1,400 participants.

      The matched human-MTBC strain collection is valuable and offers the opportunity to address questions about human-bacterium co-evolution.

      Weaknesses:

      Although the authors had genome-wide genotyping and whole genome sequencing data, they only compared the associations between human ancestry and MTBC strains. Given the large sample size, they had the opportunity to conduct a genome-wide association study similar to that of Muller et al. (https://doi.org/10.1016/j.ygeno.2021.04.024).

      The authors tested whether human genetic ancestry is associated with TB severity. However, the basis for this hypothesis is unclear. The studies cited as examples all focused on progression to active TB (from a latent infection state), which should not be conflated with disease severity. It is difficult to ascertain whether the role of genetic ancestry in disease severity would be detectable through this study design, as some participants might simply have been sicker for longer before being diagnosed (despite the inquiry about cough duration). This delay in diagnosis would not be influenced solely by human genetics, which is the conclusion of the study.

      Additionally, the study only included participants who attended the TB clinic.

      Including healthy controls from the general population would have provided an interesting comparison to see if ancestry proportions differ.

      Although the authors suggest that social and environmental factors contribute to TB severity, only age, smoking, and HIV status were characterised in the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the results of an observational study conducted in Dar es Salaam, Tanzania, investigating potential associations between genetic variation in M. tuberculosis and human host vs. disease severity. The headline finding is that no such associations were found, either for host / bacillary genetics as main effects or for interactions between them.

      Strengths:

      Strengths of the study include its large size and rigorous approaches to classification of genetic diversity for host and bacillus.

      Weaknesses:

      (1) There are some limitations of the disease severity read-outs employed: X-ray scores and Xpert cycle thresholds from sputum analysis can only take account of pulmonary disease. CXR is an insensitive approach to assessing 'lung damage', especially when converted to a binary measure. What was the basis for selection of Ralph score of 71 to dichotomise patients? If outcome measures were analysed as continuous variables, would this have been more sensitive in capturing associations of interest?

      (2) There is quite a lot of missing data, especially for TB scores - could this have introduced bias? This issue should be mentioned in the discussion.

      (3) The analysis adjusted for age, sex, HIV status, smoking and cough duration - but not for socio-economic status. This will likely be a major determinant of disease severity. Was adjustment made for previous TB (i.e. new vs repeat episode) and drug-sensitivity of the isolate? Cough duration will effectively be a correlate/consequence of more severe disease - thus likely highly collinear with disease severity read-outs - not a true confounder. How does removal of this variable from the model affect results? Data on socioeconomic status should be added to models, or if not possible then lack of such data should be noted as a limitation.

      (4) Recruitment at hospitals may have led to selection bias due to exclusion of less severe, community cases. The authors already acknowledge this limitation in the Discussion however.

      (5) Introduction: References refer to disease susceptibility, but the authors should also consider the influences of host/pathogen genetics on host response - both in vitro (PMIDs 11237411, 15322056) and in vivo (PMID 23853590). The last of these studies encompassed a broader range of ethnic variation than the current study, and showed associations between host ancestry and immune response - null results from the current study may reflect the relative genetic homogeneity of the population studied.

    1. eLife Assessment

      This important study introduces a fully differentiable variant of the Gillespie algorithm as an approximate stochastic simulation scheme for complex chemical reaction networks, allowing kinetic parameters to be inferred from empirical measurements of network outputs using gradient descent. The concept and algorithm design are convincing and innovative. While the proofs of concept are promising, the determination of the range of applicability and of the errors is incomplete, leaving open some questions about implications for more complex systems that cannot be addressed by existing methods. This work has the potential to be of significant interest to a broad audience of quantitative and synthetic biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This work introduces the differentiable Gillespie algorithm, DGA, which is a differentiable variant of the celebrated (and exact) Gillespie algorithm commonly used to perform stochastic simulations across numerous fields, notably in the life sciences. The proposed DGA approximates the exact Gillespie algorithm using smooth functions, yielding an approximate differentiable stochastic system as a proxy for the underlying discrete stochastic system, in which both the reaction index and the species abundances are continuous. To illustrate their methodology, the authors specifically consider in detail the case of a well-studied two-state promoter gene regulation system that they analyze using a machine learning approach, and by combining simulation data with analytical results. For the two-state promoter gene system, the DGA is benchmarked by accurately reproducing the results of the exact Gillespie algorithm. For this same simple system, the authors also show how the DGA can be used to estimate kinetic parameters from both simulated and real noisy experimental data. This lets them argue convincingly that the DGA can become a powerful computational tool for applications in quantitative and synthetic biology. In order to argue that the DGA can be employed to design circuits with ad-hoc input-output relations, these considerations are then extended to a more complex four-state promoter model of gene regulation.

      Strengths:

      The main strength of the paper is its clarity and its pedagogical presentation of the simulation methods.

      Weaknesses:

      It would have been useful to have a brief discussion, based on a concrete example, of what can be achieved with the DGA and is totally beyond the reach of the Gillespie algorithm and the numerous existing stochastic simulation methods. A more comprehensive and quantitative analysis of the limitations of the DGA, e.g. for rare events, would have also been helpful.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors present a differentiable version of the widely-used Gillespie Algorithm. The Gillespie Algorithm has been used for decades to simulate the behavior of stochastic biochemical reaction networks. But while the Gillespie Algorithm is a powerful tool for the forward simulation of biochemical systems given some set of known reaction parameters, it cannot be used for the reverse process, i.e. inferring reaction parameters given a set of measured system characteristics. The Differentiable Gillespie Algorithm ("DGA") overcomes this limitation by approximating two discontinuous steps in the Gillespie Algorithm with continuous functions. This makes it possible to calculate gradients for each step in the simulation process which, in turn, allows the reaction parameters to be optimized via powerful backpropagation techniques. In addition to describing the theoretical underpinnings of DGA, the authors demonstrate different potential use-cases for the algorithm in the context of simple models of stochastic gene expression.

      Overall, the DGA represents an important conceptual step forward for the field, and should lay the groundwork for exciting innovations in the analysis and design of stochastic reaction networks. At the same time, significantly more work is needed to establish when the approximations made by DGA are valid, and to demonstrate the viability of the algorithm in the context of complicated reaction networks.

      Strengths:

      This work makes an important conceptual leap by introducing a version of the Gillespie Algorithm that is end-to-end differentiable. This idea alone has the potential to drive a number of exciting innovations in the analysis, inference, and design of biochemical reaction networks. Beyond the theoretical adjustments, the authors also implement their algorithm in a Python-based codebase that combines DGA with powerful optimization libraries like PyTorch. This codebase has the potential to be of interest to a wide range of researchers, even if the true scope of the method's applicability remains to be fully determined.

      The authors also demonstrate how DGA can be used in practice both to infer reaction parameters from real experimental data (Figure 7) and to design networks with user-specified input-output characteristics (Figure 8). These illustrations should provide a nice roadmap for researchers interested in applying DGA to their own projects/systems.

      Finally, although it does not stem directly from DGA, the exploration of pairwise parameter dependencies in different network architectures provides an interesting window into the design constraints (or lack thereof) that shape the architecture of biochemical reaction networks.

      Weaknesses:

      While it is clear that the DGA represents an important conceptual advancement, the authors do not do enough in the present manuscript to (i) validate the robustness of DGA inference and (ii) demonstrate that DGA inference works in the kinds of complex biochemical networks where it would actually be of legitimate use.

      It is to the authors' credit that they are open and explicit about the potential limitations of DGA due to breakdowns in its continuous approximations. However, they do not provide the reader with nearly enough empirical (i.e. simulation-based) or theoretical context to assess when, why, and to what extent DGA will fail in different situations. In Figure 2, they compare DGA to GA (i.e. ground-truth) in the context of a simple two-state model of stochastic transcription. Even in this minimal system, we see that DGA deviates notably from ground-truth both in the simulated mRNA distributions (Figure 2A) and in the ON/OFF state occupancy (Figure 2C). This raises the question of how DGA will scale to more complicated systems, or systems with non-steady state dynamics. Will the deviations become more severe? This is important because, in practice, there is really not much need for using DGA with a simple two-state system, since we have analytic solutions for this case. It is the more complex systems where DGA has the potential to move the needle.

      A second concern is that the authors' present approach for parameter inference and error calculation does not seem to be reliable. For example, in Figure 5A, they show DGA inference results for the ON rate of a two-state system. We see substantial inference errors here, even though the inference problem should be non-degenerate in this case. One reason for this seems to be that the inference algorithm does not reliably find the global minimum of the loss function (Figure 2B). To turn DGA into a viable approach, it is paramount that the authors find some way to improve this behavior, perhaps by using multiple random initializations to better search the loss space.
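
      The multiple-initialization idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_loss` is a hypothetical one-dimensional non-convex function standing in for a DGA loss, and the step size, number of steps, and search interval are arbitrary choices, not values from the manuscript.

```python
import random


def toy_loss(theta):
    # Hypothetical non-convex loss: a local minimum near theta = +1
    # and a lower (global) minimum near theta = -1.
    return (theta ** 2 - 1.0) ** 2 + 0.3 * theta


def toy_grad(theta):
    # Analytic derivative of toy_loss with respect to theta.
    return 4.0 * theta * (theta ** 2 - 1.0) + 0.3


def gradient_descent(theta0, lr=0.01, steps=2000):
    # Plain gradient descent from a single starting point.
    theta = theta0
    for _ in range(steps):
        theta -= lr * toy_grad(theta)
    return theta


def multi_start(n_starts, seed=0, low=-2.0, high=2.0):
    # Run gradient descent from several random starting points and
    # keep the fit with the lowest final loss.
    rng = random.Random(seed)
    fits = [gradient_descent(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(fits, key=toy_loss)
```

      In this toy setting, a single run started at `theta0 = 1.5` settles in the local minimum, whereas keeping the best of several random starts recovers the lower one. Whether the same holds for a real DGA loss surface would need to be checked empirically.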

      Finally, the authors do a good job of illustrating how DGA might be used to infer biological parameters (Figure 7) and design reaction networks with desired input-output characteristics (Figure 8). However, analytic solutions exist for both of the systems they select for examples. This means that, in practice, there would be no need for DGA in these contexts, since one could directly optimize, e.g., the expressions for the mean and Fano Factor of the system in Figure 7A. I still believe that it is useful to have these examples, but it seems critical to add a use-case where DGA is the only option.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a differentiable variant of the Gillespie algorithm (DGA) that allows gradient calculation using backpropagation. The most significant contribution of this work is the development of the DGA itself, a novel approach to making stochastic simulations differentiable. This is achieved by replacing discontinuous operations in the traditional Gillespie algorithm with smooth, differentiable approximations using sigmoid and Gaussian functions. This conceptual advance opens up new avenues for applying powerful gradient-based optimization techniques, prevalent in machine learning, to studying stochastic biological systems.
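
      To make the replacement of discontinuous operations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the reaction chosen in a Gillespie step can be written as a count of normalized cumulative-propensity thresholds lying below a uniform random draw `u`; the function names, the `sharpness` and `width` parameters, and the way they enter the formulas are illustrative assumptions.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_reaction_index(u, propensities):
    # Exact selection: count how many cumulative thresholds lie below u.
    total = sum(propensities)
    cumulative = 0.0
    index = 0
    for rate in propensities[:-1]:
        cumulative += rate / total
        if u > cumulative:
            index += 1
    return index


def soft_reaction_index(u, propensities, sharpness):
    # Smooth surrogate: each step function becomes a sigmoid, so the
    # returned "index" is a continuous, differentiable number.
    total = sum(propensities)
    cumulative = 0.0
    index = 0.0
    for rate in propensities[:-1]:
        cumulative += rate / total
        index += sigmoid(sharpness * (u - cumulative))
    return index


def soft_indicator(index, j, width):
    # Gaussian surrogate for the indicator "reaction j was selected",
    # used to weight the abundance update of reaction j.
    return math.exp(-((index - j) ** 2) / (2.0 * width ** 2))
```

      As `sharpness` grows and `width` shrinks, the smooth versions approach the exact discrete selection, at the cost of steeper gradients.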

      The method was tested on a simple two-state promoter model of gene expression. The authors found that the DGA accurately captured the moments of the steady-state distribution and other major qualitative features. However, it was less accurate at capturing information about the distribution's tails, potentially because rare events result from frequent low-probability reaction events where the approximations made by the DGA have a greater impact. The authors could further use the DGA to design a four-state promoter model of gene regulation that exhibited a desired input-output relationship. The DGA could learn parameters that produced a sharper response curve, which was achieved by consuming more energy.

      The authors conclude that the DGA is a powerful tool for analyzing and designing stochastic systems.

      Strengths:

      The DGA allows gradient-based optimization techniques to estimate parameters and design networks with desired properties.

      The DGA's efficacy in estimating kinetic parameters from both synthetic and experimental data. This capability highlights the DGA's potential to extract meaningful biophysical parameters from noisy biological data.

      The DGA's ability to design a four-state promoter architecture exhibiting a desired input-output relationship. This success indicates the potential of the DGA as a valuable tool for synthetic biology, enabling researchers to engineer biological circuits with predefined behaviours.

      Weaknesses:

      The study primarily focuses on analysing the steady-state properties of stochastic systems. It is unclear how and if this framework can be used beyond the steady-state data presented in the case studies, where it is already quite computationally heavy.

      A more in-depth exploration of the DGA's performance in analysing dynamic trajectories, which capture the system's evolution over time, would provide a more comprehensive view of the algorithm's capabilities.

      Gradient computations in the DGA can be susceptible to numerical instability, particularly when the sharpness parameters of the sigmoid and Gaussian approximations are set to high values. This issue could lead to challenges in convergence during the optimization process.
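
      The sensitivity to the sharpness setting can be illustrated with a short Python sketch. It assumes a sigmoid of the form sigmoid(sharpness * x), which is an illustrative parameterization rather than the one used in the manuscript, and evaluates its derivative with respect to x.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_slope(x, sharpness):
    # d/dx sigmoid(sharpness * x) = sharpness * s * (1 - s)
    s = sigmoid(sharpness * x)
    return sharpness * s * (1.0 - s)
```

      Under this assumption the slope at the threshold (x = 0) equals sharpness / 4 and therefore grows without bound, while away from the threshold it decays exponentially and can underflow to zero in floating point. Both regimes are unfavourable for gradient-based optimization.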

    1. eLife Assessment

      This systematic review presents valuable insights into CCR5 antagonist drugs for neuroprotection and stroke management. The strength of the evidence is convincing, and the review methods and reporting adhere to the expected standards. A sensitivity analysis based on the risk of bias assessment of the included studies would be beneficial, and a more focused/detailed acknowledgment of key limitations of the review would add value to the quality of the reporting and interpretations of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The paper is well-organized, with clearly defined sections. The systematic review methodology is thorough, with clear eligibility criteria, search strategy, and data collection methods. The risk of bias assessment is also detailed and useful for evaluating the strength of evidence. The involvement of a patient panel is noticeable and positive, ensuring the research addresses real-world concerns and aligning scientific inquiry with patient perspectives. The statistical approach used for the analysis seems appropriate.

      The authors are encouraged to take into account the following points:

      As the authors have acknowledged, there is a high risk of bias across all included studies, particularly in randomization, selective outcome reporting, and incomplete data, which could be highlighted more explicitly in the paper's discussion section, particularly the potential implications for the generalizability of the results. The authors can also suggest mitigation strategies for future studies (e.g., better randomization, blinding, reporting standards, etc.). None of the studies include female animals, and the use of young adult animals (instead of aged models) limits the applicability of the findings to the human stroke population, where stroke incidence is higher in older adults and perhaps the gender issue must be included to reflect the translational aspects. The authors can add to the paper's discussion section that perhaps future preclinical studies should include both sexes and aged animals to align better with the clinical population and improve the translation of findings. Another point is the comorbidity. Comorbidities such as diabetes and hypertension are prevalent in stroke patients. How can these be considered in preclinical designs? The authors should emphasize the importance of future research incorporating such comorbid models to enhance clinical relevance.

      None of the studies had independent replication of their findings, which is a key limitation, especially for a field with high translational expectations. This should be highlighted as a critical next step for validating the efficacy of CCR5 antagonists.

      The studies assessed limited cognitive outcomes (only one reported a cognitive outcome). Given the importance of cognitive recovery post-stroke, this is a gap to highlight in the discussion. Future studies should include more diverse and comprehensive behavioral assessments, including cognitive and emotional domains, to fully evaluate the therapeutic potential.

      The timing of CCR5 antagonist administration across studies varies widely (from pre-stroke to several days post-stroke), complicating the interpretation and comparison of results. The authors are encouraged to add that future preclinical studies could focus on narrowing the therapeutic window to more clinically relevant time points.

      The paper identifies some alignment with clinical trials, but there are several gaps, too, particularly in the types of behavioral tests used in preclinical studies versus those in clinical trials. If this systematic review and meta-analysis aim to formulate a set of recommendations for future studies, it is important that the authors also propose specific preclinical behavioral tasks that could better align with clinical measures used in trials, like functional assessments related to human stroke outcomes.

      The discussion needs some revisions. It could benefit from an expanded explanation of CCR5's mechanistic role in neuroplasticity and stroke recovery. For instance, linking CCR5 antagonism more closely with molecular pathways related to synaptic repair and remyelination would enhance the quality of the discussion and understanding of the drugs' potential.

      While the tool is used to assess the risk of bias, it might be helpful to integrate a broader framework for evaluating the quality of included studies. This could include sample size justifications, statistical power analysis, or the use of pre-registration in animal studies. The absence of these elements can introduce bias, whereas their presence can help minimize it.

      Please also highlight confounding factors that might have influenced the results in the included studies, such as variation in stroke models, dosing regimens, or behavioral assessment methods.

      There is some discussion of the meta-analysis' limitations due to the few studies, but this point could be more thoroughly addressed. Please consider including a more critical discussion of the limitations of pooling data from heterogeneous study designs, stroke models, and outcome measures. What can this lead to? Is it reliable to do so, or does it lack scientific rigor? The authors are encouraged to formulate a balanced discussion, adding positive and negative aspects.

      The conclusion should more explicitly acknowledge that while CCR5 antagonists show potential, the findings are still preliminary due to the limitations in the preclinical studies (high bias risk, lack of diverse animal models). Overall, the conclusion can end with a call for rigorous, well-controlled, and replicated studies with improved alignment to clinical populations and trials, making clear that the evidence analyzed here remains inconclusive.

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting, timely, and high-quality study on the potential neuroprotective capabilities of C-C chemokine receptor type 5 (CCR5) antagonists in ischemic stroke. The focus is on preclinical investigations.

      Strengths:

      The results are timely and interesting. An outstanding feature is that stroke patient representatives have directly participated in the work. Although this is often called for, it is hardly realized in research practice, so the work goes beyond established standards.

      The included studies were assessed regarding the therapeutic impact and their adherence to current quality assurance guidelines such as STAIR and SRRR, another important feature of this work. While overall results were promising, there were some shortcomings regarding guideline adherence.

      The paper is very well written and concise yet provides much highly useful information. It also has very good illustrations and extremely detailed and transparent supplements.

      Weaknesses:

      Although the paper is of very high quality, a couple of items may require the authors' attention to further increase the impact of this exciting work. Specifically:

      Major aspects:

      (1) I hope I did not miss that (apologies if I did), but when exactly was the search conducted? Is it possible to screen the recent literature (maybe up to 12/2024) to see whether any additional studies were published?

      (2) Please clearly define the difference between "study" and "experiment," as this is not entirely clear. Is an "experiment" a distinct investigation within a particular publication (=study) that can describe more than one such "experiment"? Thanks for clarifying.

      (3) Is there an opportunity to conduct a correlation analysis between the quality of a study (for instance, after transforming the ROB assessment into a kind of score) and reported effect sizes for particular experiments or studies? This might be highly interesting.

    1. eLife Assessment

      This study presents a platform to implement closed-loop experiments in mice based on auditory feedback. The authors provide solid evidence that their platform enables a variety of closed-loop experiments using neural or movement signals, indicating that it will be a valuable resource to the neuroscience community. However, the demonstration experiments could be strengthened by increasing the sample size for several groups in the neurofeedback experiments, as well as a more thorough description of the results in the text.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a resource to the systems neuroscience community, by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      There are several weaknesses in the paper that diminish the impact of its strengths. First, the value of the CLoPy platform is not clearly articulated to the systems neuroscience community. Similarly, the resource could be better positioned within the context of the broader open-source neuroscience community. For an example of how to better frame this resource in these contexts, I recommend consulting the pyControl paper. Improving this framing will likely increase the accessibility and interest of this paper to a less technical neuroscience audience, for instance by highlighting the types of experimental questions CLoPy can enable.

      While the dataset contains an impressive number of animals and cortical regions for the neurofeedback experiment, and an analysis of the movement-feedback experiments, my excitement for these experiments is tempered by the relative incompleteness of the dataset, as well as its description and analysis in the text. For instance, in the neurofeedback experiment, many of these regions only have data from a single mouse, limiting the conclusions that can be drawn. Additionally, there is a lack of reporting of the quantitative results in the text of the document, which is needed to better understand the degree of the results. Finally, the writing of the results section could use some work, as it currently reads more like a methods section.

      Suggestions for improved or additional experiments, data or analyses:

      Not necessary for this paper, but it would be interesting to see if the CLNF group could learn without auditory feedback.

      There are no quantitative results in the results section. I would add important results to help the reader better interpret the data. For example, in: "Our results indicated that both training paradigms were able to lead mice to obtain a significantly larger number of rewards over time," you could show a number, with an appropriate comparison or statistical test, to demonstrate that learning was observed.

      For: "Performing this analysis indicated that the Raspberry Pi system could provide reliable graded feedback within ~63 ± 15 ms for CLNF experiments." The LED test shows the sending of the signal, but the actual delay for the audio generation might be longer. This is also longer than the 50 ms mentioned in the abstract.

      It could be helpful to visualize an individual trial for each experiment type, for instance how the audio frequency changes as movement speed / calcium activity changes.

      The sample sizes are small (n=1) for a few groups. I am excited by the variety of regions recorded, so it could be beneficial for the authors to collect a few more animals to beef up the sample sizes.

      I am curious as to why 60-trial sessions were used. Was it mostly for the convenience of a 30 min session, or were the animals getting satiated? If the former, would learning have occurred more rapidly with longer sessions?

      Figure 4E is interesting: it seems like the changes in the distribution of deltaF were in both positive and negative directions, instead of just positive. I'd be curious as to the authors' thoughts as to why this is the case. Relatedly, I don't see Figure 4E, and a few other subplots, mentioned in the text. As a general comment, I would address each subplot in the text.

      For: "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time," I would provide a visual summary showing the learning curves for the different types of regions.

      Relatedly, I would further explain the fast vs slow learners, and if they mapped onto certain regions.

      Also I would make the labels for these plots (e.g. Supp Fig3) more intuitive, versus the acronyms currently used.

      The CLMF animals showed a decrease in latency across learning, what about the CLNF animals? There is currently no mention in the text or figures.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed is described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates the fact that body movement tracking and calcium imaging can be carried out using readily available experimental building blocks, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers that may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study closely inspired by the Clancy et al. 2020 paper on the modulation of local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during the learning of forelimb movements. Sadly, the analysis of the physiological data appears incomplete, and more generally the paper tends to overstate several points:

      - In contrast to the introductory statements of the article, closed-loop physiology in rodents is a well-established research topic. Beyond auditory feedback, this includes optogenetic feedback (O'Connor et al. 2013, Abbasi et al. 2018, 2023), electrical feedback in hippocampus (Girardeau et al. 2009), and much more.
      - The behavioral setups that are presented are representative of the state of the art in the mesoscale imaging/head-fixed behavior community, rather than a highly innovative design. In particular, the closed-loop latency that they achieve (>60 ms) may be perceived by the mice. This is in contrast with other available closed-loop setups.
      - Throughout the paper, several statements stress how important it is to carry out this work in a closed-loop setting with auditory feedback, but sadly there is no "no feedback" control in the cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that the task can be learned in the absence of feedback.
      - The analysis of the closed-loop neuronal data lacks controls. Increased performance can be achieved by actively modulating only one of the two ROIs; this is not clearly analyzed (for instance, by looking at the timing of the calcium signal modulation across the two ROIs). It seems that overall ROIs 1 and 2 covary, in contrast to Clancy et al. 2020. How can this be explained?

    4. Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system, which the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols needed to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. The study also shows that, using the closed-loop system, mice performed better, learned an arbitrary task, and could adapt to a change in the rule. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of the feedback systems designed. Simplicity of implementation and potential for adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

      Major comments:

      (1) Page 5 paragraph 1: "We tested our CLNF system on Raspberry Pi for its compactness, general-purpose input/output (GPIO) programmability, and wide community support, while the CLMF system was tested on an Nvidia Jetson GPU device." Can these programs and hardware be integrated with a Windows-based system and a microcontroller (Arduino/Teensy)? For broad adaptability, that is what many labs would already have (please comment/discuss).

      (2) Hardware Constraints: The reliance on Raspberry Pi and Nvidia Jetson (which is expensive) for real-time processing could introduce latency issues (~63 ms for CLNF and ~67 ms for CLMF). This latency might limit precision for faster or more complex behaviors, which the authors should address in the discussion section.

      (3) Neurofeedback Specificity: The task focuses on mesoscale imaging and ignores finer spatiotemporal details. Sub-second events might be significant in more nuanced behaviors. Can this be discussed in the discussion section?

      (4) The activity over 6s is being averaged to determine if the threshold is being crossed before the reward is delivered. This is a rather long duration of time during which the mice may be exhibiting stereotyped behaviors that may result in the changes in DFF that are being observed. It would be interesting for the authors to compare (if data is available) the behavior of the mice in trials where they successfully crossed the threshold for reward delivery and in those trials where the threshold was not breached. How is this different from spontaneous behavior and behaviors exhibited when they are performing the test with CLNF?

    1. eLife Assessment

      This important study conducted experiments to quantify how changes in blood flow result in apparent fluorescence changes when imaging neural activity sensors using two-photon microscopy. While the study highlights the prevalence of neural-activity-independent artifacts in two-photon imaging, the evidence linking the observed signals to hemodynamic occlusion remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Fluorescence imaging has become an increasingly popular technique for monitoring neuronal activity and neurotransmitter concentrations in the living brain. However, factors such as brain motion and changes in blood flow and oxygenation can introduce significant artifacts, particularly when activity-dependent signals are small. Yogesh et al. quantified these effects using GFP, an activity-independent marker, under two-photon and wide-field imaging conditions in awake behaving mice. They report significant GFP responses across various brain regions, layers, and behavioral contexts, with magnitudes comparable to those of commonly used activity sensors. These data highlight the need for robust control strategies and careful interpretation of fluorescence functional imaging data.

      Strengths:

      The effect of hemodynamic occlusion in two-photon imaging has been previously demonstrated in sparsely labeled neurons in V1 of anesthetized animals (see Shen and Kara et al., Nature Methods, 2012). The present study builds on these findings by imaging a substantially larger population of neurons in awake, behaving mice across multiple cortical regions, layers, and stimulus conditions. The experiments are extensive, the statistical analyses are rigorous, and the results convincingly demonstrate significant GFP responses that must be accounted for in functional imaging experiments. However, whether these GFP responses are driven by hemodynamic occlusion remains less clear, given the complexities associated with awake imaging and GFP's properties (see below).

      Weaknesses:

      (1) The authors primarily attribute the observed GFP responses to hemodynamic occlusion. While this explanation is plausible, other factors may also contribute to the observed signals. These include uncompensated brain movement (e.g., axial-direction movements), leakage of visual stimulation light into the microscope, and GFP's sensitivity to changes in intracellular pH (see e.g., Kneen and Verkman, 1998, Biophysical Journal). Although the correlation between GFP signals and blood vessel diameters supports a hemodynamic contribution, it does not rule out significant contributions from these (or other) factors. Consequently, whether GFP fluorescence can reliably quantify hemodynamic occlusion in two-photon microscopy remains uncertain.

      (2) Regardless of the underlying mechanisms driving the GFP responses, these activity-independent signals must be accounted for in functional imaging experiments. However, the present manuscript does not explore potential strategies to mitigate these effects. Exploring and demonstrating even partial mitigation strategies could have significant implications for the field.

      (3) Several methodological details are missing from the Methods section. These include: (a) signal extraction methods for two-photon imaging data; (b) neuropil subtraction methods (whether they are performed and, if so, how); (c) methods used to prevent visual stimulation light from being detected by the two-photon imaging system; (d) methods to measure blood vessel diameter/area in each frame. The authors should provide more details in their revision.

    3. Reviewer #2 (Public review):

      Approach

      In this study, Yogesh et al. aimed at characterizing hemodynamic occlusion in two photon imaging, where its effects on signal fluctuations are underappreciated compared to that in wide field imaging and fiber photometry. The authors used activity-independent GFP fluorescence, GCaMP and GRAB sensors for various neuromodulators in two-photon and widefield imaging during a visuomotor context to evaluate the extent of hemodynamic occlusion in V1 and ACC. They found that the GFP responses were comparable in amplitude to smaller GCaMP responses, though exhibiting context-, cortical region-, and depth-specific effects. After quantifying blood vessel diameter change and surrounding GFP responses, they argued that GFP responses were highly correlated with changes in local blood vessel size. Furthermore, when imaging with GRAB sensors for different neuromodulators, they found that sensors with lower dynamic ranges such as GRAB-DA1m, GRAB-5HT1.0, and GRAB-NE1m exhibited responses most likely masked by the hemodynamic occlusion, while a sensor with larger SNR, GRAB-ACh3.0, showed much more distinguishable responses from blood vessel change.

      Strengths

      This work is of broad interest to two photon imaging users and GRAB developers and users. It thoroughly quantifies the hemodynamic driven GFP response and compares it to previously published GCaMP data in a similar context, and illustrates the contribution of hemodynamic occlusion to GFP and GRAB responses by characterizing the local blood vessel diameter and fluorescence change. These findings provide important considerations for the imaging community and a sobering look at the utility of these sensors for cortical imaging.

      Importantly, they draw clear distinctions between the temporal dynamics and amplitude of hemodynamic artifacts across cortical regions and layers. Moreover, they show context-dependent (darkness versus visual stimulation) effects on locomotion and optogenetic light-triggered hemodynamic signals.

      Most of the first-generation neuromodulator GRAB sensors showed relatively small responses, comparable to blood vessel changes in two-photon imaging, which emphasizes the need to improve the dynamic range and response magnitude of future sensors and encourages sensor users to consider removing hemodynamic artifacts when analyzing GRAB imaging data.

      Weaknesses

      The largest weakness of the paper is that, while they convincingly quantify hemodynamic artifacts across a range of conditions, they do not quantify any methods of correcting for them. The utility of the paper could have been greatly enhanced had they tested hemodynamic correction methods (e.g. from Ocana-Santero et al., 2024) and applied them to their datasets. This would serve both to verify their findings (by showing that hemodynamic correction removes the hemodynamic signal) and to act as a guide to the field for how to address the problem they highlight.

      The paper attributes the source of 'hemodynamic occlusion' primarily to blood vessel dilation, but leaves unanswered how much may be due to shifts in blood oxygenation. Figure 4 directly addresses the question of how much of the signal can be attributed to occlusion by measuring the blood vessel dilation, but notably fails to reproduce any of the positive transients associated with locomotion in Figure 2. Thus, an investigation into or at least a discussion of what other factors (movement? Hb oxygenation?) may drive these distinct signals would be helpful.

      Along these lines, the authors carefully quantified the correlation between local blood vessel diameter and GFP response (or neuropil fluorescence vs blood vessel fluorescence with GRAB sensors). To what extent does this effect depend on proximity to the vessels? Do GFP/ GRAB responses decorrelate from blood vessel activity in neurons further from vessels (refer to Figure 5A and B in Neyhart et al., Cell Reports 2024)?

      Raw traces are shown in Figure 2, but we are never presented with the unaveraged data for locomotion or stimulus presentation times, which limits the reader's ability to independently assess variability in the data. Inclusion of heatmaps comparing event-aligned GFP to GCaMP6f may be of value to the reader.

      More detailed analysis of differences between the kinds of dynamics observed in GFP- vs GCaMP6f-expressing neurons could aid in identifying artifacts in otherwise clean data. The example neurons in Figure 2A hint at this, as each displays a unique waveform, and the question of whether certain properties of their dynamics can reveal the hemodynamic rather than indicator-driven nature of the signal is left open. E.g., do the decay rates and rise times differ significantly from GCaMP6f signals?

      The authors suggest that the signal-to-noise ratio of an indicator likely affects the ability to separate the hemodynamic response from the underlying fluorescence signal. Does the degree of background fluorescence affect the size of the artifact? If there was variation in background and overall expression level in the data, this could potentially be used to answer this question. Could lower (or higher!) expression levels increase the effects of hemodynamic occlusion?

      The choice of the phrase 'hemodynamic occlusion' may cause some confusion, as the authors address both positive and negative responses in the GFP-expressing neurons, and there may be additional contributions from changes in blood oxygenation state.

      The choice of ACC as the frontal region provides a substantial contrast in location, brain movement, and vascular architecture as compared to V1. As the authors note, ACC is close to the superior sagittal sinus and thus is the region where the largest vascular effects are likely to occur. The reader is left to wonder how much of the ROI may or may not have included vasculature in the ACC vs V1 recordings, as the only images of the recording sites provided are for V1. We are left unable to conclude whether the differences observed between these regions are due to the presence of visible vasculature, capillary blood flow, or differences in neurovascular coupling between regions. A less medial portion of M2 may have been a more appropriate comparison. At the least, inclusion of more example imaging fields for ACC in the supplementary figures would be of value.

      In Figure 3, how do the proportions of responsive GFP neurons compare to GCaMP6f neurons?

      How is variance explained calculated in Figure 4? Is this from a linear model and R^2 value? Is this variance estimate for separate predictors by using single variable models? The methods should describe the construction of the model including the design matrix and how the model was fit and if and how cross validation was run.
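
      To make the question concrete, one common way to report "variance explained" is a cross-validated R^2 from a single-predictor linear model. The sketch below is only an illustration of that option under stated assumptions: the function names (`fit_line`, `cv_variance_explained`), the 5-fold split, and the synthetic "diameter" and "dF/F" values are hypothetical and are not taken from the manuscript, whose actual procedure is not described.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cv_variance_explained(x, y, n_folds=5, seed=0):
    """Cross-validated R^2: every sample is predicted by a model fit without it."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Interleaved assignment of the shuffled indices to folds.
    folds = [order[k::n_folds] for k in range(n_folds)]
    pred = [0.0] * n
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in order if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            pred[i] = intercept + slope * x[i]
    my = fmean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic illustration only: a hypothetical vessel-diameter trace and a
# fluorescence trace that decreases linearly with it, plus noise.
rng = random.Random(1)
diameter = [rng.gauss(0.0, 1.0) for _ in range(200)]
dff = [-0.5 * d + rng.gauss(0.0, 0.1) for d in diameter]
print(round(cv_variance_explained(diameter, dff), 3))
```

      Unlike an in-sample R^2, this estimate can be negative when the predictor carries no information, which is one reason the Methods should state which variant was used.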

      Cortical depth is coarsely defined as L2/3 or L5, without numerical ranges in depth from pia.

      Overall Assessment:

      This paper is an important contribution to our understanding of how hemodynamic artifacts may corrupt GRAB and calcium imaging, even in two-photon imaging modes. Certain useful control experiments, such as intrinsic optical imaging in the same paradigms, were not reported, nor were any hemodynamic correction methods investigated. Thus, this limits both mechanistic conclusions and the overall utility with respect to immediate applications by end users. Nevertheless, the paper is of significant importance to anyone conducting two-photon or widefield imaging with calcium and GRAB sensors and deserves the attention of the broader neuroscience and in-vivo imaging community.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to investigate whether hemodynamic occlusion contributes to fluorescent signals measured with two-photon microscopy. For this, they image the activity-independent fluorophore GFP in 2 different cortical areas, at different cortical depths, and in different behavioral conditions. They compare the evoked fluorescent signals with those obtained with calcium sensors and neuromodulator sensors and evaluate their relationship to vessel diameter as a readout of blood flow.

      They find that GFP fluorescence transients are comparable to GCaMP6f stimulus-evoked signals in amplitude, although they are generally smaller. Yet, they are significant even at the single-neuron level. They show that GFP fluorescence transients resemble those measured with the dopamine sensor GRAB-DA1m and the serotonin sensor GRAB-5HT1.0 in amplitude and nature, suggesting that signals with these sensors are dominated by hemodynamic occlusion. Moreover, the authors perform similar experiments with wide-field microscopy, which reveals the similarity between the two methods in generating the hemodynamic signals. Together, the evidence presented calls for the development and use of high-dynamic-range sensors to avoid measuring signals that have an origin other than the one intended. In the meantime, the evidence highlights the need to control for those artifacts, such as with the parallel use of activity-independent fluorophores.

      Strengths:

      - Comprehensive study comparing different cortical regions in diverse behavioral settings in controlled conditions.
      - Comparison to the state-of-the-art, i.e. what has been demonstrated with wide-field microscopy.
      - Comparison to diverse activity-dependent sensors, including the widely used GCaMP.

      Weaknesses:

      - The kinetics of GCaMP are stereotypic. An analysis of, or comment on, whether and how the kinetics of the signals could be used to distinguish hemodynamic occlusion artefacts from calcium signals would be useful.
      - Is it possible that motion is affecting the signals to a certain degree? This issue is not made clear.
      - The causal relationship with blood flow remains open. Hemodynamic occlusion seems a good candidate cause of the changes in GFP fluorescence, but this remains to be addressed in further research.

    1. eLife Assessment

      This study examined how multidimensional social relationships influence social attention in rhesus macaques, linking individual and group-level behaviors to attentional processes. The findings that oxytocin altered social attention and its relationship to both social tendencies and dyadic relationships are important, as recent technological advances allow for the exploration of neuronal activities and mechanisms in free-moving macaques. This work is convincing and will be of interest to those studying the interplay between social dynamics and information processing in primates.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the links between social behaviors observed in free-moving situations and behavioral performances measured in well-controlled, laboratory settings. The authors assessed general social tendencies and dyadic relationships among four monkeys in a group by scoring agonistic (aggression) and affiliative (grooming and proximity) behaviors in each pair. By measuring the saccadic reaction time in a classic social interference task, the authors reported that the monkeys with higher SEIs (i.e., more social individuals) were less distracted by the faces of other monkeys. These effects were enhanced when the distractors were out-group monkey faces rather than in-group ones. Lastly, oxytocin administration increased the impact of the out-group monkey faces in the social interference task, while reducing the magnitude of general social tendencies measured with SEI.

      Strengths:

      (1) The combination of behavioral data obtained in a colony room and in a laboratory environment is rare and important.
      (2) The evaluation of social interactions was successfully performed based on an automated target detection algorithm. The resulting multi-dimensional, complicated social interactions were summarized into simple indices (SEI and IEI). These indices provide a good measure of the social tendencies of each monkey.
      (3) Well-designed and robust experiments in the laboratory environment that are linked nicely with the general social tendencies observed in spontaneous behaviors.

      Weaknesses:

      (1) While the overall results are interesting, I am left somewhat confused about how to interpret the difference in the scores derived from different conditions. For example, the authors stated "Comparing the weights for in-group and out-group distractors, the effect of proximity was larger than that of aggression and grooming" on p. 8. Does this mean that proximity is indeed the type of behavior most affected in the out-group condition compared to the in-group condition? The out-group effects are difficult to examine with actual behavioral data, but some in-group effects, such as those involving OT, can be tested, which could provide good insights into interpreting the differences in the weights observed across the experimental conditions.

      (2) I think it is important to provide how variable spontaneous social interactions were across sessions and how impactful the variability of the interactions is on the SEI and IEI, as it helps to understand how meaningful the differences of weights are across the conditions, but such data are missing. In line with this point, although the conclusions still hold as those data were obtained during the same experimental periods, shouldn't the weights in Fig. 3f and Figs. 4g and 4h (saline) be expected to be similar, if not the same?

    3. Reviewer #2 (Public review):

      Summary:

      The study presents significant findings that elucidate the relationship between multi-dimensional social relationships and social attention in rhesus macaques. By integrating advanced computational methods, behavioral analyses, and neuroendocrine manipulation, the authors provide strong evidence for how oxytocin modulates attention within social networks. The results are robust and address critical gaps in understanding the dynamics of social attention in primates.

      Strengths:

      (1) The use of YOLOv5 for automatic behavioral detection is an exceptional methodological advance. The combination of automated analyses with manual validation enhances confidence in the data.
      (2) The study's focus on three distinct dimensions of social interaction (aggression, grooming, and proximity) is comprehensive and provides nuanced insights into the complexity of primate social networks.
      (3) The investigation of oxytocin's role adds a compelling neuroendocrine dimension to the findings, providing a bridge between behavioral and neural mechanisms.

      Weaknesses:

      (1) The study's conclusions are based on observations of only four monkeys, which limits the generalizability of the findings. Larger sample sizes could strengthen the validity of the results.
      (2) The limited set of stimulus images (in-group and out-group faces) may introduce unintended biases. This could be addressed by increasing the diversity of stimuli or incorporating a broader range of out-group members.

    1. eLife Assessment

      This valuable study uses a novel method to record spine calcium responses without the confounds of backpropagating action potentials to study how the dendritic integration of large numbers of inputs generates the tuned output of cortical neurons. While the results are generally solid, the study would benefit from more details, characterizations, and quantifications, including better validation of the method to suppress backpropagating action potentials.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Kondo et al. developed a method to suppress somatic action potentials while recording spine calcium signals using two-photon imaging in the L2/3 visual cortex in response to visual stimuli. The authors identified different patterns of dendritic spine activation by visual stimuli and analyzed how the different patterns of spine responses may contribute to somatic visual responses. Their analysis results suggest that spines on dendrites with a clustered arrangement can potentially generate sharply tuned output.

      Strengths:

      This is an interesting study addressing a long-standing question of how the previously reported salt-and-pepper-like distribution of sensory inputs on individual spines may give rise to somatic sensory selectivity. The method of somatic inhibition to prevent bAPs appears new and effective. The measurements of spine activity are carefully done. The finding that a small number of spines located in the same branch with similar tuning properties would predict the somatic tuning is consistent with local dendritic nonlinear integration mechanisms.

      Weaknesses:

      (1) The demonstration of the effectiveness of soma-specific inhibition is inadequate. Figure 1 only provides a single example trace showing the inhibition of somatic visual responses. The authors should provide statistical analysis over grouped data. For the effect of soma-specific inhibition on spine activity, the authors provided mostly negative results, lacking effects on spine responses for both soma inhibition and bAP subtraction. This is confusing. One possible explanation is that bAPs normally have little influence on spine activity. However, this would conflict with the known fact that somatic APs can easily invade spines in L2/3 neurons (e.g., Chen et al., Nature 2011). Another possibility is that under the current experimental conditions, somatic APs were rarely evoked by the visual stimulus. The authors should also rule out the possibility that the spines they imaged are from different neurons than the ones with somatic inhibition. The authors may consider identifying those cases where somatic APs have a significant impact on spine activity or spine tuning and show how bAP inhibition influences the dendritic and spine responses.

      (2) Figure 4 shows that the proportion of spines with a preferred orientation similar to the soma (ΔOri ≤ 30°) was 60%, which is surprisingly high. It is intriguing that without somatic AP invasion, there could be such a high degree of similarity between spine activity and somatic tuning. What is the ratio without soma inhibition? One could reason that with bAP invasion, there should be even more spines showing visual responses similar to those of the soma. Moreover, with such a high proportion of spines showing similar sensory tuning to the soma, it is inevitable that many branches contain more spines with similar tuning as the soma, exhibiting an apparent branch-specific clustering. While such apparent clustering may well predict somatic tuning, it primarily reflects a correlational relationship rather than a causal synaptic integration mechanism.

      (3) There has been extensive work studying how the integration of spine activity or sub-branch activity gives rise to somatic output. The proposed main contribution of this study is to use an improved method to inhibit somatic activity in order to more confidently measure spine-specific activity and examine the integration mechanisms. However, the results showed that the measured spine-specific activity under soma inhibition was not significantly different from that measured under normal conditions (see point 1). It becomes unclear how this new method contributes to obtaining new insights into the synaptic integration mechanism.

      (4) Figure 6 shows how the tuning similarity between spines depends on the distance between them. It is unclear what new information was acquired regarding the functional clustering of spines. This result can be largely explained by the overall higher proportion of similarly tuned spines (60%) compared to the soma's preferred orientations. Moreover, the authors did not demonstrate how such clustering may contribute to nonlinear synaptic integration.

      (5) The results shown in Figure 7 can again be largely explained by the static property of a higher proportion of spines tuned similarly to the soma. These results do not reveal any active dendritic integration mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      The paper from Kondo et al. addresses how the functional organization of synaptic inputs in L2/3 pyramidal neurons contributes to their output firing. Expressing GCaMP6s to monitor calcium activity and the bi-stable inhibitory opsin SwiChR++ to inhibit the somatic activity of the imaged neurons, the authors were able to image up to ~5700 spines in basal dendrites from 6 neurons. Mapping the functional responses of such a large number of dendritic spines and relating them to the output firing of the parent neuron is a remarkable feat. The authors studied the clustering of similarly tuned spines within individual dendrites and found that while some dendrites are tuned to the same orientation as the parent neuron, other dendrites exhibit tuning to other orientations, and moreover a significant proportion of dendrites exhibit no tuning. Modelling work suggests that the clustering of spines in a small proportion of dendrites should suffice to give rise to the tuning of the parent cell.

      Strengths:

      (1) Removal of the potential confound of somatic firing via optogenetic inhibition is convincing and validates a useful tool for the neuroscientific community. As discussed by the authors the tool would be most valuable for the study of excitatory inputs in inhibitory neurons.

      (2) The comparison of optogenetic inhibition of somatic responses and isolation of spine-specific signals using the removal of backpropagating action potential by robust regression is an important control and constitutes an important affirmation of previously published work.

      (3) The large dataset size provides enough statistical power to test for clustering of similarly tuned spines in basal dendrites.

      (4) The study provides a useful replication of previously published results.

      (5) Modelling work in the study shows that as in the ferret visual cortex (Wilson et al., 2016), a combination of dendritic nonlinearity and spike thresholding contribute to the sharpness of orientation tuning in the mouse visual cortex.

      Weaknesses:

      (1) One of the main conclusions of the study, the classification of dendrites according to the presence or absence of visual responses, lacks quantification.

      (2) Some of the statistics employed in combination with shuffling controls are not adequate.

      (3) All the neurons imaged are very highly tuned (with a very high orientation selectivity index (OSI)). The performance of the models is evaluated by the correlation coefficient between the predicted and the measured somatic tuning curve. The high OSI of the neurons reduces the sensitivity of the evaluation of the models, as it results in extremely high or low correlation coefficients (Figure 8a). It would be important to recapitulate the results from the model for neurons with lower OSI, given that not all L2/3 neurons are so highly tuned.
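
      The sensitivity concern in point (3) can be illustrated with a toy calculation. The sketch below assumes a global OSI defined as 1 minus the circular variance and a plain Pearson correlation between tuning curves; both are common conventions, but the manuscript's exact definitions may differ, and the function names and the six-orientation tuning curves are hypothetical.

```python
import cmath
import math


def osi(responses, orientations_deg):
    """Global OSI = |sum R(theta) * exp(2i*theta)| / sum R(theta).

    Assumes non-negative responses; equals 1 - circular variance.
    """
    vec = sum(r * cmath.exp(2j * math.radians(t))
              for r, t in zip(responses, orientations_deg))
    return abs(vec) / sum(responses)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length curves."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


orientations = [0, 30, 60, 90, 120, 150]      # degrees
measured = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # maximally sharp soma (toy data)
pred_match = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # model peak at the same orientation
pred_miss = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]    # model peak at the orthogonal one

print(osi(measured, orientations))
print(pearson(measured, pred_match), pearson(measured, pred_miss))
```

      With curves this sharp, the correlation depends almost entirely on whether the predicted peak lands on the measured peak, so it takes only a few distinct values; broader measured tuning curves would yield graded correlations and a more informative comparison between models.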

      (4) It is very hard to understand how the modelling results relate to the experimental data, as the definitions of what constitutes a clustered dendrite in the model or in the experimental data are unclear.

    1. eLife Assessment

      This study presents a valuable finding on a new role of glia in activity-dependent synaptic remodeling using the Drosophila NMJ as a model system. The evidence supporting the claims of the authors is solid. However, the unaddressed cell-type specific mechanisms of Shv secretion and regulation on the extracellular glutamate levels and lack of details on the methods for statistical analysis have hindered further evaluation of the claims. The work will be of interest to neuroscientists working on glia-neuron interaction and synaptic remodeling.

    2. Reviewer #1 (Public review):

      In this manuscript, Chang et al. investigated the cell type-specific role of the integrin activator Shv in activity-dependent synaptic remodeling. Using the Drosophila larval neuromuscular junction as a model, they show that glial-secreted Shv modulates synaptic plasticity by maintaining the extracellular balance of neuronal Shv proteins and regulating ambient extracellular glutamate concentrations, which in turn affects postsynaptic glutamate receptor abundance. Furthermore, they report that genetic perturbation of glial morphogenesis phenocopies the defects observed with the loss of glial Shv. Altogether, their findings propose a role for glia in activity-induced synaptic remodeling through Shv secretion. While the conclusions are intriguing, several issues related to experimental design and data interpretation merit further discussion.

    3. Reviewer #2 (Public review):

      In this paper, Chang et al. follow up on their lab's previous findings about the secreted protein Shv and its role in activity-induced synaptic remodeling at the fly NMJ. Previously they reported that shv mutants have impaired synaptic plasticity. Normally a high stimulation paradigm should increase bouton size and GluR expression at synapses, but this does not happen in shv mutants. The phenotypes relating to activity-dependent plasticity were completely recapitulated when Shv was knocked down only in neurons and could be completely rescued by incubation in exogenously applied Shv protein. The authors also showed that Shv activation of integrin signaling on both the pre- and postsynapse was the molecular mechanism underlying its function. Here they extend their study to consider the role of Shv derived from glia in modulating synaptic features at baseline and under remodeling conditions. This study is important to understand if and how glia contribute to these processes. Cell-type-specific knockdown of Shv only in glia causes abnormally high baseline GluR expression and prevents activity-dependent increases in bouton size or GluR expression post-stimulation. This does not appear to be a developmental defect, as the authors show that knocking down Shv in glia after basic development has the same effects as lifelong knockdown, so Shv is acting in real time. Restoring Shv in ONLY glia in mutant animals is sufficient to completely rescue the plasticity phenotypes and baseline GluR expression, but glial Shv does not appear to activate integrin signaling, which was shown to be the mechanism for neuronally derived Shv to control plasticity. This led the authors to hypothesize that glial Shv works by controlling the levels of neuronal Shv and extracellular glutamate. They provide evidence that in the absence of glial Shv, synaptic levels of Shv go up overall, presumably indicating that neurons secrete more Shv, which in this context could then work via integrin signaling as described to control plasticity. They use a glutamate sensor and observe decreased signal (extracellular glutamate) from the sensor in glial Shv KD animals; however, this background has extremely high GluR levels at the synapse, which may account for some or all of the decreases in sensor signal in this background. Additional controls to test if increased GluR density alone affects sensor readouts, and/or independently modulating GluR levels in the glial KD background, would help strengthen this data. In fact, glial-specific shv KD animals have baseline levels of GluR that are potentially high enough to have hit a ceiling of expression or detection that accounts for the inability of these levels to increase any further after strong stimulation, and such a ceiling effect should be considered when interpreting the data and conclusions of this paper. Several outstanding questions remain: why can't glial-derived Shv activate integrin pathways when exogenously applied recombinant Shv protein can? The effects of neuron-specific rescue of shv in a shv mutant are not provided vis-à-vis GluR levels and bouton size to compare to the glial-only rescue. Inclusion of this data might provide more insight into outstanding questions of how and why the source of Shv seems to matter for some aspects of the phenotypes but not others, despite the fact that exogenous Shv can rescue in some experimental paradigms but not others.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides compelling evidence that glia-derived Shriveled (Shv) modulates activity-dependent synaptic plasticity at the Drosophila neuromuscular junction (NMJ). This mechanism differs from the previously reported function of neuronally released Shv, which activates integrin signaling. They further show that this requirement of Shv is acute and that glial Shv supports synaptic plasticity by modulating neuronal Shv release and the ambient glutamate levels. However, there are a number of conceptual and technical issues that need to be addressed.

      Major comments

      (1) From the images provided for Fig 2B +RU486, the bouton size appears to be bigger in shv RNAi + stimulation, especially judging from the outline of GluR clusters.

      (2) The shv result needs to be replicated with a separate RNAi.

      (3) The phenotype of shv mutant resembles that of neuronal shv RNAi - no increased GluR baseline. Any insights why that is the case?

      (4) In Fig 3B, SPG shv RNAi has elevated GluR baseline, while PG shv RNAi has a lower baseline. In both cases, there is no activity induced GluR increase. What could explain the different phenotypes?

      (5) In Fig 4C, the rescue of PTP is only partial. Does that suggest neuronal shv is also needed to fully rescue the deficit of PTP in shv mutants?

      (6) The observation in Fig 5D is interesting. While there is a reduction in Shv release from glia after stimulation, it is unclear what the mechanism could be. Is there a change in glial shv transcription, translation or the releasing machinery? It will be helpful to look at the full shv pool vs the released ones.

      (7) In Fig 5E, what will happen after stimulation? Will the elevated glial Shv after neuronal shv RNAi be retained in the glia?

      (8) It would be interesting to see if the localization of shv differs based on if it is released by neuron or glia, which might be able to explain the difference in GluR baseline. For example, by using glia-Gal4>UAS-shv-HA and neuronal-QF>QUAS-shv-FLAG. It seems important to determine if they mix together after release? It is unclear if the two shv pools are processed differently.

      (9) Alternatively, do neurons and glia express and release different Shv isoforms, which would bind different receptors?

      (10) It is claimed that Sup Fig 2 shows no observable change in gross glial morphology, further bolstering support that glial Shv does not activate integrin. This seems quite an overinterpretation. There is only one image for each condition without quantification. It is hard to judge if glia, which is labeled by GFP (presumably by UAS-eGFP?), is altered or not.

      (11) The hypothesis that glutamate regulates GluR level as a homeostatic mechanism makes sense. What is the explanation of the increased bouton size in the control after glutamate application in Fig 6?

      (12) What could be a mechanism that prevents elevated glial released Shv to activate integrin signaling after neuronal shv RNAi, as seen in Fig 5E?

      (13) Any speculation on how the released Shv pool is sensed?

    1. eLife Assessment

      In this important manuscript, Ryan et al perform a genome-wide CRISPR based screen to identify genes that modulate TDP-43 levels in neurons. They identify a number of genes and pathways and highlight the BORC complex, which is required for anterograde lysosome transport as one such regulator of TDP-43 protein levels. Overall, this is a convincing study, which opens the door for additional future investigations on the regulation of TDP-43.

    2. Reviewer #1 (Public review):

      Summary:

      As TDP-43 mislocalization is a hallmark of multiple neurodegenerative diseases, the authors seek to identify pathways that modulate TDP-43 levels. To do this, they use a FACS based genome wide CRISPR KD screen in a Halo tagged TDP-43 KI iPSC line. Their screen identifies a number of genetic modulators of TDP-43 expression including BORC which plays a role in lysosome transport.

      Strengths:

      Genome wide CRISPR based screen identifies a number of modulators of TDP-43 expression to generate hypotheses regarding RNA BP regulation and perhaps insights into disease.

      Weaknesses:

      It is unclear how altering TDP-43 levels may relate to disease where TDP-43 is not altered in expression but mislocalized. This is a solid cell biology study, but the relation to disease is not clear without providing evidence of BORC alterations in disease or manipulation of BORC reversing TDP-43 pathology in disease.

      The mechanisms by which BORC and lysosome transport modulate TDP-43 expression are unclear. Presumably, this may be through altered degradation of TDP protein but this is not addressed.

      Previous studies have demonstrated that TDP-43 levels can be modulated by altering lysosomal degradation so the identification of lysosomal pathways is not particularly novel.

      It is unclear whether this finding is specific to TDP-43 levels or whether lysosome localization may more broadly impact proteostasis in particular of other RNA BPs linked to disease.

      Unclear whether BORC depletion alters lysosome function or simply localization.

    3. Reviewer #2 (Public review):

      Summary:

      The authors employ a novel CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. They also find that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. This is highly significant for the field given that a) other proteins could also be regulated in this way, b) understanding mechanisms that influence TDP-43 levels are significant given that its dysregulation is considered a major driver of several neurodegenerative diseases and c) the novelty of the proposed mechanism.

      Strengths:

      The novelty and information provided by the CRISPRi screen. The authors provide evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels, and show a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. Further, the methods and concepts will have an impact on the larger community as well. The work also sets up further work to understand the somewhat paradoxical findings that even though the tagged TDP-43 protein is reduced in the screen, it does not alter cryptic exon splicing and there is a longer TDP-43 half-life with BORC KD.

      Weaknesses:

      While the data is very strong, the work requires some additional clarification.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Ryan et al. have performed a state-of-the-art full genome CRISPR-based screen of iNeurons expressing a tagged version of TDP-43 in order to determine expression modifiers of this protein. Unexpectedly, using this approach the authors have uncovered a previously undescribed role of the BORC complex in affecting the levels of TDP-43 protein, but not mRNA expression. Taken together, these findings represent a very solid piece of work that will certainly be important for the field.

      Strengths:

      - BORC is a novel TDP-43 expression modifier that has never been described before, and it seemingly acts by regulating protein half-life rather than at the transcriptome level. It has long been known that different labs have reported different half-lives for TDP-43 depending on the experimental system, but no work has ever explained these discrepancies. Now, the work of Ryan et al. has for the first time identified one of the factors which could account for these differences and play an important role in disease (although this is left to be determined in future studies).

      - The genome-wide CRISPR screening has been shown to yield novel results with high reproducibility and could eventually be used to search for expression modifiers of many other proteins involved in neurodegeneration or other diseases.

      Weaknesses:

      - The conclusion that TDP-43 mRNA does not change following BORCS6 KD is based on a single qRT-PCR that does not really cover all possibilities. For example, the total mRNA levels may not change, but the polyA sites may have switched from the highly efficient pA1 to the less efficient and nuclear-retained pA4. There are therefore a few other experiments that could have been performed to make this conclusion more compelling, for example RNAscope experiments to make sure that no change occurred in TDP-43 mRNA localisation in cells.

      - Even assuming that the mRNA does not change, no explanation for the change in TDP-43 protein half-life has been proposed by the authors. This will presumably be addressed in future studies: for example, are mutants that lack different domains of TDP-43 equally affected in their half-lives by BORC KD? Alternatively, can mass spectrometry be attempted to see whether TDP-43 PTMs change following BORCS6 KD?

    1. eLife Assessment

      This valuable study by Cui et al. investigates mechanisms generating sighs, which are crucial for respiratory function and linked to emotional states. Utilizing advanced methods in mice, they provide solid evidence that increased excitability in specific preBötzinger complex neuronal subpopulations expressing Neuromedin B receptors, gastrin-releasing peptide receptors, or somatostatin can induce sigh-like large amplitude inspirations. With additional technical clarifications and further elaboration of the limitations in terms of how the results are interpreted in the revised manuscript, the study will interest neuroscientists studying respiratory neurobiology and rhythmic motor systems.

    2. Reviewer #1 (Public review):

      Summary of what is achieved: This manuscript validates and extends the sigh-generating circuit between the NMB/GRP+ RTN/parafacial neurons and the NMBR/GRPR+ preBötC neurons established in Li et al., 2016. The authors generate multiple transgenic lines that enable selective targeting of these various sub-populations of cells and demonstrate the sufficiency of each type in generating a sigh breath. Additionally, they show that NMBR and GRPR preBötC neurons are glutamatergic, have overlapping and distinct expression, and do not express SST. Beyond this validation, the authors show that ectopic stimulation of SST neurons is sufficient to evoke sighs and that these neurons are necessary for NMB/GRP-induced sighing. These data are the first to identify preBötC neurons downstream of NMBR/GRPR neurons that transform a eupneic breath into a sigh breath. The five conclusions stated at the end of the introduction are supported by the data.

      Summary of a primary weakness: A strong emphasis throughout the manuscript is the identification of an unsubstantiated slow sigh rhythm that is produced by NMBR/GRPR neurons. It is even suggested that this is an intrinsic property of these neurons. However, to make such a novel (and quite surprising) claim requires many more studies and the conclusion is dependent on how the authors have defined a sigh. Moreover, some data within the paper conflicts with this idea. The resubmitted manuscript does not contain any revisions and the rebuttal does not sufficiently address the critiques.

      In summary, the optogenetic and chemogenetic characterization of the neuropeptide pathway transgenic lines nicely aligns with and provides important validation of the previous study by Li et al., 2016, and the SST neuron studies provide a new mechanism for the transformation of NMBR/GRPR neuropeptide activation into a sigh. These are important findings, and they should be the points emphasized. The proposal of a slow sigh rhythm should be more rigorously established with new experiments and analysis or should be more carefully described and discussed.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates in mice neural mechanisms generating sighs, which are periodic large-amplitude breaths occurring during normal breathing that subserve physiological pulmonary functions and are associated with emotional states such as relief, stress, and anxiety. Sighs are generated by a structure called the preBötzinger complex (preBötC) in the medulla oblongata that generates various forms of inspiratory activity including sighs. The authors have previously described a circuit involving neurons producing bombesin-related peptides Neuromedin B (NMB) and gastrin releasing peptide (GRP) that project to preBötC neurons expressing receptors for NMB (NMBRs) and GRP (GRPRs) and that activation of these preBötC neurons via these peptide receptors generates sighs. In this study the authors further investigated mechanisms of sigh generation by applying optogenetic and chemogenetic strategies to selectively activate the subpopulations of preBötC neurons expressing NMBRs and/or GRPRs, and a separate subpopulation of neurons expressing somatostatin (SST) but not NMBRs and GRPRs. The authors present convincing evidence that sigh-like inspirations can be evoked by photostimulation of the preBötC neurons expressing NMBRs or GRPRs. Photostimulation of SST neurons can independently evoke sighs, and chemogenetic inhibition of these neurons can abolish sighs. The results presented support the authors' conclusion that the preBötC neurons expressing NMBRs or GRPRs produce sighs via pathways to downstream SST neurons. Thus, these studies have identified some of the preBötC cellular elements likely involved in generating sighs.

      Strengths:

      (1) This study employs an effective combination of electrophysiological, transgenic, optogenetic, chemogenetic, pharmacological, and neuron activity imaging techniques to investigate sigh generation by distinct subpopulations of preBötC neurons in mice.

      (2) The authors extend previous studies indicating that there is a peptidergic circuit consisting of NMB and GRP expressing neurons that project from the parafacial (pF) nucleus region to the preBötC and provides sufficient input to generate sighs, since photoactivation of either pF NMB or GRP neurons evoke ectopic sighs in this study.

      (3) Solid evidence is presented that sighs can be evoked by direct photostimulation of preBötC neurons expressing NMBRs and/or GRPRs, and also a separate subpopulation of neurons expressing somatostatin (SST) but not NMBRs and GRPRs.

      (4) The mRNA-expression data presented from in situ hybridization indicates that most preBötC neurons expressing NMBR, GRPR (or both) are glutamatergic and excitatory.

      (5) Measurements in slices in vitro indicate that only the NMBR expressing neurons are normally rhythmically active during normal inspiratory activity and endogenous sigh activity.

      (6) Evidence is presented that activation of preBötC NMBRs and/or GRPRs is not necessary for sigh production, suggesting that sighs are not the unique product of the preBötC bombesin-peptide signaling pathway.

      (7) The novel conclusion is presented that the preBötC neurons expressing NMBRs and/or GRPRs produce sighs via the separate downstream population of preBötC SST neurons, which the authors demonstrate can independently generate sighs, whereas chemogenetic inhibition of preBötC SST neurons selectively abolishes sighs generated by activating NMBRs and GRPRs.

      Weaknesses:

      (1) While these studies have identified subpopulations of preBötC neurons capable of episodically evoking sigh-like inspiratory activity, mechanisms producing the normal slow sigh rhythm were not investigated and remain unknown.

      (2) The authors have addressed some of the reviewers' main technical concerns and issues relating to interpretation of the results in their rebuttal letter, but have minimally revised the manuscript. Accordingly, there remain important technical and interpretation issues requiring resolution in the revised manuscript.

      Comments on revisions:

      The authors have clarified in their rebuttal letter the rationale for utilizing two different photostimulation paradigms but have not incorporated any of this explanation in Methods, which would be helpful for readers.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Cui et al. studies the mechanisms for the generation of sighing, an essential breathing pattern. This is an important and interesting topic, as sighing maintains normal pulmonary function and is associated with various emotional conditions. However, the mechanisms of its generation are not fully understood. The authors employed different approaches, including optogenetics, chemogenetics, an intersectional genetic approach, slice electrophysiology, and calcium imaging, to address the question, and found that several neuronal populations are sufficient to induce sighing when activated. Furthermore, ectopic sighs can be triggered without the involvement of neuromedin B (NMB) or gastrin releasing peptide (GRP) or their receptors in the preBötzinger Complex (preBötC) region of the brainstem. Additionally, activating SST neurons in the preBötC region induces sighing, even when other receptors are blocked. Based on these results, the authors concluded that increased excitability in certain neurons (NMBR or GRPR neurons) activates pathways leading to sigh generation, with SST neurons serving as a downstream component in converting regular breaths into sighs.

      Strengths:

      The authors employed a combination of various sophisticated approaches, including optogenetics, chemogenetics, intersectional genetic approach, and slice electrophysiology and calcium imaging, to precisely pinpoint the mechanism responsible for sigh generation. They utilized multiple genetically modified mouse lines, enabling them to selectively manipulate and observe specific neuronal populations involved in sighing.

      Using genetics and calcium imaging, the authors record the neuronal activity of NMBR and GRPR neurons, respectively, and identified their difference in activity pattern. Furthermore, by applying the intersectional approach, the authors were able to genetically target and manipulate several distinct neuronal populations, such as NMBR+, GRPR- neurons and GRPR+, NMBR- neurons, and conducted a detailed characterization of their functions in influencing sighing.

      Weaknesses:

      (1) The authors employed two conditions for optogenetic activation: long pulse photostimulation (LPP) and short pulse photostimulation (SPP), with durations ranging from 4-10s for LPP and 100-500 ms for SPP. These could generate huge variability in the experiments. The rationale behind the selection of these conditions in each experiment remains unclear in the manuscript. Additionally, it is not explained why these specific durations were chosen. Furthermore, the interpretation for the varied responses observed under these conditions is not provided. Clarification on the rationale and interpretation of these experimental parameters would enhance the understanding of the results. The description of the experiment conditions should be consistent throughout the manuscript.

      (2) Regarding the fiber optics, my understanding is that they are placed outside of the brainstem from the ventral side. Given the locations of the pF and preBötC neurons, could the differences in responses be attributed to the varying distances of each population from the ventral surface? In fact, in Figure 8, NMBR is illustrated as being closer to the ventral surface. Does it represent the actual location of these neurons?

      (3) The results of recording on NMBR neurons in Figure 4 were compelling. However, I'm curious why the recording of GRPR neurons and their response to the neuropeptide were not presented or examined. Additionally, considering the known cross-reaction between peptides and their receptors, it might be worthwhile to investigate how GRP modulates NMBR neurons and how NMB modulates GRPR neurons.

      (4) The authors found that activation of several preBötC populations, including NMBR, GRPR, and SST neurons, despite pharmacological inhibition of NMBR and GRPR, can still induce sighing, and concluded that "activation of preBötC NMBRs and/or GRPRs is not necessary for sigh production". I disagree with this conclusion. Even when the receptors are silenced, artificial (optogenetic or chemogenetic) activation could still activate the same downstream pathways. This cannot be used as evidence to claim that the receptors are not required for sighing in vivo, because it is possible that the receptors are still necessary for the activation of these neurons under natural conditions. For instance, while diaphragm activation induces breathing, it does not negate the crucial role of the nervous system in regulating this process in physiological conditions.

      (5) The authors noted varied responses upon activating specific subpopulations of the preBötC neurons, namely NMBR, GRPR, and SST neurons. Could these differences be attributed to variations in viral labeling efficiency among different mouse genetic lines? Are there discrepancies in the number of labeled neurons across the lines? Additionally, the authors did not thoroughly characterize the specificities of AAV targeting in their Cre and Flp lines. It's uncertain whether the AAV-labeled neurons are strictly restricted to the designated population without notable leakage into other populations. This is particularly crucial for the experiments manipulating SST neurons. If there's substantial labeling of NMBR or GRPR neurons, it could undermine the conclusions drawn. Further examination of the precision and selectivity of the labeling techniques is necessary to ensure the accurate interpretation of the experimental findings.

      (6) The authors have addressed some of the reviewers' concerns in the revision; however, many important issues remain unaddressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      (1) Reviewer 3: Moreover, the conclusion that preBötC NMBR and GRPR activations are unnecessary for sighing is not fully supported by the current experimental design. While the study shows that sighing can still be induced despite pharmacological inhibition of NMBR and GRPR, this does not conclusively prove that these receptors are not required under natural conditions. 

      We concluded that “NMBR and GRPR receptors are not necessary for sigh generation”. We acknowledge that under normal conditions these receptors almost certainly play a role; in fact, microinjection of saporin conjugated to bombesin, which presumably ablates NMBR+ and GRPR+ preBötC neurons, completely eliminated endogenous sighing activity in awake mice (Li et al., Nature, 2015). However, that study did not establish that the receptors per se are essential in this context, since the protocol ablated not just the receptors but also the preBötC neurons that happened to express these receptors. Here, we show that we could evoke sighs AFTER complete pharmacological blockade of NMBRs and GRPRs. Also, we show that sighs can be elicited by stimulation of a distinct subpopulation of preBötC neurons expressing the peptide somatostatin (SST+). These results demonstrate that sighs can be evoked in absence of activation of NMBRs and/or GRPRs, leading to the conclusion that NMBRs and/or GRPRs are not required for sighs but rather contribute to periodic sigh generation under normal conditions.

      (2) Reviewer 1: To make such a novel (and quite surprising) claim requires many more studies and the conclusion is dependent on how the authors have defined a sigh. Moreover, some data within the paper conflicts with this idea.

      Our definition of sighs was carefully chosen so that it applied across different experimental conditions, including in vitro slices, anesthetized or awake in vivo. We defined sighs as transient changes in minute ventilation on a time scale slower than eupneic breathing period, to avoid classifying breathing after vagotomy or under isoflurane anesthesia as “all-sigh breathing”. This is why induction of persistent large amplitude breaths (such as in Figures 5-6) were not counted as sighs.

      (3) Reviewer 2: Several key technical aspects of the study require further clarification to aid in interpreting the experimental results, including issues relating to the validation of the transgenic mouse lines and virally transduced expressions of proteins utilized for optogenetic and chemogenetic experiments, as well as justifying the optogenetic photostimulation paradigms used to evoke sighs.

      The rationale for using SPP and LPP stems from our published observations of the effects of optogenetic stimulation of various preBötC neuronal subpopulations. Thus, SPP and LPP evoke the same responses in GlyT2 (Sherman et al., 2015) and Dbx1 (Cui et al., 2016) neurons, while for other subpopulations, e.g., SST (Cui et al., 2015), the effects of SPP are markedly different from LPP. Hence, in this study we examined both. As effects of SPP and LPP of SST neurons were examined previously (Cui et al., 2016), these protocols were not repeated except for evoking sighs after blockade of NMBR/GRPRs. SPP of pF NMB or GRP did not evoke any respiratory responses and hence were not presented in any figures (see Results, section “Activation of Nmb- or Grp-expressing pF neurons induces sighs”).

      (4) Reviewer 3: however, the rationale and experimental details require further explanation, and their impacts on the conclusion require clarification. For instance, how and why the variability in optogenetic activation conditions could impact the experimental outcomes. 

      Refractory periods reported here for pF NMB, pF GRP, preBötC NMBR and preBötC GRPR were all obtained using the same intensity LPP. We acknowledge the possibility, even the likelihood that higher intensity LPP would shorten refractory periods. In line with this, we observed that ectopic sighs were evoked earlier during the LPP as the sigh phase progressed. As described in RESULTS, such effects were observed for pF NMB, pF GRP, preBötC NMBR and preBötC GRPR only and not for preBötC SST, which might suggest that timing of intrinsically generated sighs depends on the NMB-GRP signaling pathway, yet sigh production depends on the SST pathway.

    1. eLife Assessment

      In this revised manuscript, Dong et al. investigate the role of the small Ras-like GTPase Rab10 in the exocytosis of DCVs in mouse hippocampal neurons, showing that Rab10 depletion hinders DCV exocytosis independently of its effects on neurite outgrowth. The revised work provides compelling evidence that Rab10 depletion leads to altered ER morphology, impaired ER-based calcium buffering, and decreased ribosomal protein expression, which collectively contribute to defective DCV secretion. The study comes to the fundamental conclusion that Rab10 is critical for DCV release by ensuring ER calcium homeostasis.

    2. Reviewer #1 (Public review):

      Summary:

      Dong et al here have studied the impact of the small Ras-like GTPase Rab10 on the exocytosis of dense core vesicles (DCVs), which are important mediators of neuropeptide signaling in the brain. They use optical imaging to show that lentiviral depletion of Rab10 in mouse hippocampal neurons in culture hampers DCV exocytosis independently of the established defects in neurite outgrowth. They further demonstrate that such defects are paralleled by changes in ER morphology and defective ER-based calcium buffering as well as reduced ribosomal protein expression in Rab10-depleted neurons. Re-expression of Rab10 or supplementation of exogenous L-leucine to restore defective neuronal protein synthesis rescues impaired DCV secretion. Based on these results they propose that Rab10 regulates DCV release by maintaining ER calcium homeostasis and neuronal protein synthesis.

      Strengths:

      This work provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. The authors combine advanced optical imaging with light and electron microscopy, biochemistry and proteomics approaches to thoroughly assess the effects of Rab10 knockdown at the cellular level in primary neurons. The proteomic dataset provided may be valuable in facilitating future studies regarding Rab10 function. This work will thus be of interest to neuroscientists and cell biologists.

      Weaknesses:

      Whether and how the phenotypes of Rab10 reported in this study are linked remains an open question. Likewise, a possible role of Rab10 in exocytosis cannot be excluded at this stage.

      Comments on revisions:

      My previous questions and concerns have been satisfactorily addressed by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors assess the function of Rab10 in dense core vesicle (DCV) exocytosis using RNAi and cultured neurons. The authors provide evidence that their knockdown (KD) is effective and that DCV exocytosis is compromised. They also perform proteomic analysis to identify potential pathways that are affected upon KD of Rab10 and that may be involved in DCV release. Upon focusing on ER morphology and protein synthesis, the authors conclude that defects in protein synthesis and ER Ca2+ homeostasis contribute to the DCV release defect upon Rab10 KD.

      Strengths:

      The data related to Rab10's role in DCV release seem to be strong and carried out with rigor. While the paper lacks in vivo evidence that this gene is indeed involved in DCV release in a living mammalian organism, I feel the cellular studies have value. The identification of an ER defect upon Rab10 manipulation is not truly novel, but it is a good confirmation of studies performed in other systems. The finding that the DCV release defect and protein synthesis defect seen upon Rab10 KD can be significantly suppressed by leucine supplementation is also a strength of this work.

      Weaknesses:

      The weaknesses mentioned in my previous comments have been addressed through the revision process.

    4. Reviewer #3 (Public review):

      In this study, Dong and colleagues set out to dissect the role of the Rab10 small GTPase in the intracellular trafficking and exocytosis of dense core vesicles (DCVs). While the authors have already shown that Rab3 plays a central role in the exocytosis of DCVs in mammalian neurons, the roles of several other Rab-members have been identified genetically, but their precise mechanism of action in mammalian neurons remains unclear. In this study, the authors use a carefully designed and thoroughly executed series of experiments, including live-cell imaging, functional calcium-imaging, proteomics, and electron microscopy, to identify that DCV secretion upon Rab10 depletion in adult neurons is primarily a result of dysregulated protein synthesis and, to a lesser extent, disrupted intracellular calcium buffering. Given that the full deletion of Rab10 has a deleterious effect on neurons and that Rab10 has a major role in axonal development, the authors cautiously employed the knock-down strategy from 7 DIV, to focus on the functional impact of Rab10 in mature neurons. The experiments in this study were meticulously conducted, incorporating essential controls and thoughtful considerations, ensuring rigorous and comprehensive results that fully support the conclusions.

      Comments on revisions:

      The authors have addressed all the comments and suggestions raised by reviewers, making this an excellent and timely study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dong et al here have studied the impact of the small Ras-like GTPase Rab10 on the exocytosis of dense core vesicles (DCVs), which are important mediators of neuropeptide signaling in the brain. They use optical imaging to show that lentiviral depletion of Rab10 in mouse hippocampal neurons in culture hampers DCV exocytosis independently of the established defects in neurite outgrowth. They further demonstrate that such defects are paralleled by changes in ER morphology and defective ER-based calcium buffering as well as reduced ribosomal protein expression in Rab10-depleted neurons. Re-expression of Rab10 or supplementation of exogenous L-leucine to restore defective neuronal protein synthesis rescues impaired DCV secretion. Based on these results they propose that Rab10 regulates DCV release by maintaining ER calcium homeostasis and neuronal protein synthesis.

      Strengths:

      This work provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. The authors combine advanced optical imaging with light and electron microscopy, biochemistry, and proteomics approaches to thoroughly assess the effects of Rab10 knockdown at the cellular level in primary neurons. The proteomic dataset provided may be valuable in facilitating future studies regarding Rab10 function. This work will thus be of interest to neuroscientists and cell biologists.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      While the main conclusions of this study are comparably well supported by the data, I see three major weaknesses:

      (1) For some of the data the statistical basis for analysis remains unclear. I.e. is the statistical assessment based on N= number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing.

      This is an important point and we agree that multiple samples from the same biological replicate are not independent observations. We reanalyzed all nested data using a linear mixed model and indicated this in the Methods section and the relevant figure legends (Brunner et al., 2022). In brief, biological replicates (individual neuronal cultures) were used as a linear predictor. Outliers were identified and excluded using the ROUT method in GraphPad. A fixed linear regression model was then fitted to the data using the lm() function in R. A one-way ANOVA (analysis of variance) was used to assess whether including the experimental group as a second linear predictor (formula = y ~ Group + Culture) statistically improved the fit over a model without group information (formula = y ~ 1 + Culture). Post-hoc analysis was performed using the emmeans() function with Tukey’s adjustment when more than two experimental groups were present. Importantly, our conclusions remain unchanged.
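
      The nested-model comparison described in this reply can be illustrated with a minimal sketch. It is not the authors' R code: it is a plain-Python stand-in for fitting y ~ 1 + Culture and y ~ Group + Culture by ordinary least squares and forming the F statistic that an ANOVA on the two fits would report. The function names and all data values below are hypothetical, and the p-value lookup is omitted because the standard library has no F distribution.

```python
def solve_least_squares(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def dummy_columns(labels):
    """Treatment coding: one 0/1 column per level, first level as reference."""
    levels = sorted(set(labels))[1:]
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def residual_ss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta = solve_least_squares(X, y)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def nested_f_test(y, group, culture):
    """F statistic for adding Group to a model that already contains Culture."""
    cult = dummy_columns(culture)
    grp = dummy_columns(group)
    X_reduced = [[1.0] + c for c in cult]               # y ~ 1 + Culture
    X_full = [[1.0] + c + g for c, g in zip(cult, grp)]  # y ~ Group + Culture
    rss_reduced = residual_ss(X_reduced, y)
    rss_full = residual_ss(X_full, y)
    df_num = len(X_full[0]) - len(X_reduced[0])
    df_den = len(y) - len(X_full[0])
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Hypothetical measurements: 3 cultures, 2 groups, 2 cells per combination.
culture = ["c1"] * 4 + ["c2"] * 4 + ["c3"] * 4
group = ["control", "control", "kd", "kd"] * 3
y = [10.2, 9.8, 6.1, 5.9, 12.1, 11.9, 8.2, 7.8, 11.0, 11.2, 7.1, 6.9]

f_stat, df_num, df_den = nested_f_test(y, group, culture)
print(f"F({df_num}, {df_den}) = {f_stat:.1f}")
```

      Treating the culture as a predictor removes between-culture variance from the residual, so the group effect is judged against within-culture scatter rather than against pooled cells treated as independent replicates.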

      (2) As it stands the paper reports on three partially independent phenotypic observations, the causal interrelationship of which remains unclear. Based on prior studies (e.g. Mercan et al 2013 Mol Cell Biol; Graves et al JBC 1997) it is conceivable that defective ER-based calcium signaling and the observed reduction in protein synthesis are causally related. For example, ER calcium release is known to promote pS6K1 phosphorylation, a major upstream regulator of protein synthesis and ribosome biogenesis. Conversely, L-leucine supplementation is known to trigger calcium release from ER stores via IP3Rs. Given the reported impact of Rab10 on axonal transport of autophagosomes and, possibly, lysosomes via JIP3/4 or other mediators (see e.g. Cason and Holzbaur JCB 2023) and the fact that mTORC1, the alleged target of leucine supplementation, is located on lysosomes, which in turn form membrane contacts with the ER, it seems worth analyzing whether the various phenotypes observed are linked at the level of mTORC1 signaling.

      This is a great suggestion that could indeed further clarify the potential interplay between ER-based Ca2+ signaling and protein synthesis. To address this, we assessed the phosphorylation level of pS6K1 in control and Rab10 knockdown (KD) neurons with or without leucine treatment. These data are included in the new Figure 8—figure supplement 1 in the revised manuscript. Our results indicate that pS6K1 phosphorylation was not upregulated in Rab10 KD neurons, suggesting that the level of mTORC1 signaling does not differ between wild-type and KD neurons. Furthermore, leucine treatment increased the pS6K1 phosphorylation level, as expected, but this effect was similar in both groups. Hence, we conclude that differences in mTORC1 signaling induced by Rab10 loss are not a major factor in the observed impairment in protein synthesis.

      Author response image 1.

      Rab10 depletion does not upregulate the mTORC1 pathway. (A) Typical immunoblot showing pS6K1 levels in each condition. (B) Quantification of relative pS6K1 levels in each condition. All data are plotted as mean±s.e.m. (C) Control, Control + Leu: N = 2, n = 2; Rab10 KD, Rab10 KD + Leu: N = 2, n = 4.

      (3) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature.

      We agree that stimulation with 200 APs might be too strong to detect specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies (Emperador-Melero et al., 2018; Granseth et al., 2006; Ivanova et al., 2021; Kwon and Chapman, 2011; Reshetniak et al., 2020). We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulation. The corresponding statements have been modified accordingly in the text (p. 5, l. 98, 124) and in the figure legend (p. 17, l. 490).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors assess the function of Rab10 in dense core vesicle (DCV) exocytosis using RNAi and cultured neurons. The authors provide evidence that their knockdown (KD) is effective and that DCV exocytosis is compromised. They also perform proteomic analysis to identify potential pathways that are affected upon KD of Rab10 and that may be involved in DCV release. Upon focusing on ER morphology and protein synthesis, the authors conclude that defects in protein synthesis and ER Ca2+ homeostasis contribute to the DCV release defect upon Rab10 KD. The authors claim that Rab10 is not involved in synaptic vesicle (SV) release and membrane homeostasis in mature neurons.

      Strengths:

      The data related to Rab10's role in DCV release seem to be strong and carried out with rigor. While the paper lacks in vivo evidence that this gene is indeed involved in DCV release in a living mammalian organism, I feel the cellular studies have value. The identification of an ER defect upon Rab10 manipulation is not truly novel, but it is a good confirmation of studies performed in other systems. The finding that the DCV release defect and protein synthesis defect seen upon Rab10 KD can be significantly suppressed by leucine supplementation is also a strength of this work.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      The data showing Rab10 is NOT involved in SV exocytosis seem a bit weak to me. Since the proteomic analysis revealed that so many proteins involved in SV exo/endocytosis are affected upon Rab10 KD, it is a bit strange that they didn't see an obvious defect. Perhaps this could have been because of the protocol that the authors used to trigger SV release (I am not an E-phys expert but perhaps this could have been a 'sledge-hammer' manipulation that may mask any subtle defects)? Perhaps the authors can claim that DCV release is more sensitive to Rab10 KD than SV release, but I am not sure whether the authors should make a strong claim about Rab10 not being important for SV exocytosis.

      We agree that stimulation with 200 APs might be too strong to see specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies. We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulation. The corresponding statements have been modified accordingly in the text (p. 5, l. 98, 124) and in the figure legend (p. 17, l. 490).

      Also, the authors mention "Rab10 does not regulate membrane homeostasis in mature neurons" but I feel this is an overstatement. Since the authors only performed KD experiments, not knock-out (KO) experiments, I believe they should not make any conclusion about it not being required, especially since there is some level of Rab10 present in their cells. If they want to make these claims, I believe the authors will need to perform conditional KO experiments, which are not performed in this study.

      This is a valid point. We have changed the statement to “membrane homeostasis in mature neurons was unaffected by Rab10 knockdown” (p. 13, l.376-377).

      Finally, the authors show that protein synthesis and ER Ca2+ defects seem to contribute to the defect but they do not discuss the relationship between the two defects. If the authors treat the Rab10 KD cells with both ionomycin and Leucine, do they get a full rescue? Or is one defect upstream of the other (e.g. can they see rescue of ER morphology upon Leucine treatment)? While this is not critical for the conclusions of the paper, several additional experiments could be performed to clarify their model, especially considering there is no clear model that explains how Rab10, protein synthesis, ER homeostasis, and Ca2+ are related to DCV (but not SV) exocytosis.

      This is an important point and a great suggestion. We have now tested the rescue effects of leucine treatment on ER morphology, as suggested. These data are included in the new Figure 8—figure supplement 2 in the revised manuscript. Our results indicate that the same dose of leucine that rescues DCV fusion and protein translation failed to rescue ER morphology. Hence, the defects in ER morphology appear to be independent of the impaired protein translation.

      Author response image 2.

      Leucine supplementation does not rescue ER morphological deficiency in Rab10 KD neurons. (A) Typical examples showing the KDEL signals in each condition. (B) Quantification of RTN4 intensity in MAP2-positive dendrites. (C) The ratio of neuritic to somatic RTN4 intensity (N/S).

      All data are plotted as mean±s.e.m. (B, C) Control: N = 3, n = 10; Rab10 KD: N = 3, n = 11; Rab10 KD + Leu: N = 3, n = 11. A one-way ANOVA tested the significance of adding experimental group as a predictor. **** = p<0.0001, ns = not significant.

      Reviewer #3 (Public Review):

      In the submitted manuscript, Dong and colleagues set out to dissect the role of the Rab10 small GTPase in the intracellular trafficking and exocytosis of dense core vesicles (DCVs). While the authors have already shown that Rab3 plays a central role in the exocytosis of DCVs in mammalian neurons, the roles of several other Rab-members have been identified genetically, but their precise mechanism of action in mammalian neurons remains unclear. In this study, the authors use a carefully designed and thoroughly executed series of experiments, including live-cell imaging, functional calcium-imaging, proteomics, and electron microscopy, to identify that DCV secretion upon Rab10 depletion in adult neurons is primarily a result of dysregulated protein synthesis and, to a lesser extent, disrupted intracellular calcium buffering. Given that the full deletion of Rab10 has a deleterious effect on neurons and that Rab10 has a major role in axonal development, the authors cautiously employed the knock-down strategy from 7 DIV, to focus on the functional impact of Rab10 in mature neurons. The experiments in this study were meticulously conducted, incorporating essential controls and thoughtful considerations, ensuring rigorous and comprehensive results.

      We are grateful for the positive evaluation of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The work by Dong et al provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. I suggest that the authors address the following points experimentally to increase the impact of this potentially important study.

      Major points:

      (1) As alluded to above, for some of the data the statistical basis for analysis remains unclear (examples are Figures 1C-F, J,K; Figure 2 1B-D,I-K; Figure 2 - Supplement 1D-F; Figure 2 - Supplement 2J,K, etc). I.e. is the statistical assessment based on N = number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing. The Ms also lacks a dedicated paragraph on statistics in the methods section.

      See reply to reviewer 1 above. We fully agree and solved this point.

      (2) A main weakness of the paper is the missing connection between neuronal protein synthesis and the observed structural and signaling defects at the level of the ER. I suggest that the authors analyze mTORC1 signaling in Rab10-depleted neurons and under rescue conditions (+Leu or re-expression of Rab10), as ribosome biogenesis is a major downstream target of mTORC1 and mTORC1 activity is related to lysosome position, which may be affected upon Rab10 loss, either directly or via effects on the ER that forms tight contacts with lysosomes.

      See reply to reviewer 1 above. We agreed and followed up experimentally.

      (3) Related to the above: Does overexpression of SERCA2 restore normal DCV exocytosis in Rab10-depleted neurons? This would help to distinguish whether calcium storage and release at the level of the ER indeed contribute to the exocytosis defect.

      This is an important point and a great suggestion. We have now tested the rescue effects of overexpression of SERCA2 on DCV fusion. These data are included in the new Figure 8—figure supplement 3 in the revised manuscript. SERCA2 OE failed to rescue the DCV fusion defects in Rab10 KD neurons.

      Author response image 3.

      Overexpression of SERCA2 does not rescue DCV fusion deficits in Rab10 KD neurons. (A) Typical examples showing the SERCA2 signals in each condition. (B) Cumulative plot of DCV fusion events per cell. (C) Summary graph of DCV fusion events per cell. (D) Total number of DCVs (total pool) per neuron, measured as the number of NPY-pHluorin puncta upon NH4Cl perfusion. (E) Fraction of NPY-pHluorin-labeled DCVs fusing during stimulation.

      All data are plotted as mean±s.e.m. (C-E) Control: N = 2, n = 10; Rab10 KD: N = 2, n = 13; SERCA2 OE: N = 2, n = 15. A one-way ANOVA tested the significance of adding experimental group as a predictor. *** = p<0.001, ** = p<0.01, ns = not significant.

      (4) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature. The authors should conduct additional experiments under conditions of single or few APs (e.g. 4 or 10 APs) to really assess whether or not Rab10 depletion alters SV exocytosis at the level of pHluorin analysis in cultured neurons.

      See reply to reviewer 2 above. We agreed and made textual adjustments to address this.

      (5) Related to the above: I am puzzled by the data shown in Figure 1H-J: From the pHluorin traces shown I would estimate a tau value of about 20-30 s (e.g. decay to 1/e = 37% of the peak value). The bar graph in Figure 1K claims 3-4 s, clearly clashing with the data shown. Were these experiments conducted at RT (where expected tau values are in the range of 30 s) or at 37 °C (one would expect taus of around 10 s in this case for Syp-pH)? I ask the authors to carefully check and possibly re-analyze their datasets.

      This is indeed a mistake. We thank the reviewer for flagging this miscalculation. Our original Matlab script used for calculating the tau value contained an error and the datasets were normalized twice by mistake. We now reanalyzed the data and the corresponding figures and texts have been updated. Our conclusion that Rab10 KD does not affect SV endocytosis remains unchanged since the difference in tau between the control (28.5 s) and Rab10 KD (32.8 s) suffered from the same systematic error and were/are not significantly different.
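
      The reviewer's 1/e rule of thumb can be checked with a short sketch. This is not the authors' Matlab script. It assumes a baseline-subtracted trace that decays mono-exponentially after its peak, and the trace below is synthetic, generated with the control tau of 28.5 s quoted in this reply. Both function names are hypothetical.

```python
import math


def tau_from_one_over_e(times, trace):
    """Time after the peak at which the trace first falls to peak/e.

    Uses linear interpolation between the two samples that bracket the
    crossing. Returns None if the trace never decays that far.
    """
    peak_idx = max(range(len(trace)), key=lambda i: trace[i])
    target = trace[peak_idx] / math.e
    for i in range(peak_idx + 1, len(trace)):
        if trace[i] <= target:
            t0, t1 = times[i - 1], times[i]
            f0, f1 = trace[i - 1], trace[i]
            t_cross = t0 + (f0 - target) * (t1 - t0) / (f0 - f1)
            return t_cross - times[peak_idx]
    return None


def tau_from_log_fit(times, trace):
    """Least-squares slope of ln(F) against time after the peak; tau = -1/slope."""
    peak_idx = max(range(len(trace)), key=lambda i: trace[i])
    pts = [(t - times[peak_idx], math.log(f))
           for t, f in zip(times[peak_idx:], trace[peak_idx:]) if f > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    return -1.0 / slope


# Synthetic decay sampled once per second with tau = 28.5 s.
times = [float(t) for t in range(121)]
trace = [math.exp(-t / 28.5) for t in times]

tau_e = tau_from_one_over_e(times, trace)
tau_fit = tau_from_log_fit(times, trace)
print(f"1/e crossing: {tau_e:.1f} s, log-linear fit: {tau_fit:.1f} s")
```

      Reading the 1/e crossing off a plotted trace is a quick sanity check of this kind against a fitted tau, which is how a discrepancy between a figure and a bar graph can be spotted.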

      (6) How many times was the proteomics experiment shown in Figure 3 conducted? I noticed that the data in panel H missed statistical analysis and error bars. Given the typical variation in these experiments, I suggest to only include data for proteins identified in at least 3 out of 4 experimental replicates.

      We agree that this information has not been clear. We have now explained replication in the Methods section (p. 42, l. 879-885). In brief, the proteomics experiment presented in Fig 3 was conducted with two independent cultures (‘biological replicates’), hence, formally only two independent observations. For each biological replicate, we performed four technical replicates. For our analysis, we only included peptides that were consistently detected across all samples (not only three as this reviewer suggests). Proteins in Panel H are ER-related proteins that are significantly different from control neurons with an adjusted FDR ≤ 0.01 and Log2 fold change ≥ 0.56. The primary purpose of our proteomics experiments was to generate hypotheses and guide subsequent experiments and the main findings were corroborated by other experiments presented in the manuscript.
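
      The selection rule stated in this reply can be written as a small filter. The sketch below assumes the fold-change cutoff applies to the absolute log2 value so that both up- and down-regulated proteins qualify; the reply does not say this explicitly. The record layout and the protein labels are hypothetical placeholders, not values from the dataset.

```python
def passes_threshold(adj_p, log2_fc, max_fdr=0.01, min_abs_log2_fc=0.56):
    """True if a protein meets both the FDR and the fold-change cutoff."""
    return adj_p <= max_fdr and abs(log2_fc) >= min_abs_log2_fc


# Placeholder records for illustration only.
records = [
    {"protein": "protein_A", "adj_p": 0.002, "log2_fc": -0.90},
    {"protein": "protein_B", "adj_p": 0.030, "log2_fc": -1.20},  # fails FDR
    {"protein": "protein_C", "adj_p": 0.001, "log2_fc": 0.30},   # fails fold change
    {"protein": "protein_D", "adj_p": 0.010, "log2_fc": 0.56},   # exactly on both cutoffs
]

selected = [r["protein"] for r in records
            if passes_threshold(r["adj_p"], r["log2_fc"])]

# A log2 fold change of 0.56 corresponds to a linear ratio of 2 ** 0.56.
min_linear_ratio = 2 ** 0.56
print(selected, f"minimum linear fold change: {min_linear_ratio:.2f}")
```

      On a linear scale the 0.56 cutoff corresponds to a change of roughly 1.47-fold in either direction.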

      Minor:

      (7) Figure 2 - supplement 3 and Figure 4 - supplement 3 are only mentioned in the discussion. The authors should consider referring to these data in the results section.

      This is a valid point. We have now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6-7, l. 162-164).

      (8) Where is the pHluorin data shown in Figure 1 bleach-corrected? If so, this should be stated somewhere in the Ms. Moreover, the timing of the NH4Cl pulse should be indicated in the scheme in panel I.

      We thank the reviewer for pointing these omissions out. We have now included information about the timing of NH4Cl pulse in panel I. We did not do bleach-correction for the pHluorin data shown in Figure 1. It has been shown that pHluorin is very stable with a bleaching rate in the alkaline state of 0.06% per second and 0.0024% per second in the quenched state (Balaji and Ryan, 2007). Indeed, we did not observe obvious photobleaching in the first 30s during our imaging as indicated by the average trace of pHluorin intensity in panel I.
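
      The cited bleaching rates can be turned into an upper bound on signal loss over the first 30 s. The sketch below assumes the percentage is lost each second from whatever signal remains (compounding per second); that reading of the rates from Balaji and Ryan (2007) is an assumption, and the function name is hypothetical.

```python
def fraction_bleached(rate_percent_per_s, seconds):
    """Cumulative fraction of signal lost after `seconds` of imaging."""
    remaining = (1.0 - rate_percent_per_s / 100.0) ** seconds
    return 1.0 - remaining


# Rates quoted in the reply: alkaline (fluorescent) and quenched states.
alkaline_loss = fraction_bleached(0.06, 30)
quenched_loss = fraction_bleached(0.0024, 30)
print(f"alkaline: {alkaline_loss:.2%}, quenched: {quenched_loss:.3%}")
```

      Even if every fluorophore sat in the alkaline state for the whole 30 s, the loss would stay below 2%, consistent with the authors' decision not to apply a bleach correction.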

      (9) Page 3/ lines 59-60: "...strongest inhibition of neuropeptide accumulation...". What is probably meant is "...strongest inhibition of neuropeptide release".

      We agree this statement is unclear. Sasidharan et al used a coelomocyte uptake assay as an indirect readout for DCV release. The ‘strongest inhibition of neuropeptide accumulation’ in coelomocytes in the Rab10 mutant indicates DCV fusion deficits. We have now replaced the text with “Rab10 deficiency produces the strongest inhibition of neuropeptide release in C. elegans” to make it clearer.

      Reviewer #3 (Recommendations For The Authors):

      I strongly recommend the publishing of this study as a VOR with minor comments directed to the authors.

      (1) In Figure 4, the authors should include examples of tubular ER at the synapse, especially as this is an interesting point discussed in ln 226-229. Are there noticeable changes in the ER-mitochondria contacts at the synaptic boutons?

      We agree that examples of tubular ER at the synapse would improve the manuscript. We have now replaced the Figure 4A with such examples. We found it challenging to quantify ER-mitochondria contacts based on the electron microscopy (EM) images we currently have. The ER-mitochondria contact sites are quite rare in the cross-sections of our samples, making it difficult to perform a reliable quantitative analysis.

      (2) The limited impairment of calcium-ion homeostasis in Rab10 KD neurons is very interesting. Would the overexpression of Rab10T23N mimic the effect of a KD scenario? Is there a separation of function for Rab10 in calcium homeostasis vs. the regulation of protein synthesis?

      This is an interesting possibility. We tested this and expressed Rab10T23N in a new series of experiments. These data are presented as a new Figure 5 in the revised manuscript (p. 29). We observed that Ca2+ refilling after caffeine treatment was delayed to a similar extent in Rab10T23N-expressing and Rab10 KD neurons. While impaired Ca2+ homeostasis may affect protein synthesis through ER stress or mTORC1 activation, our findings indicate otherwise in Rab10 KD neurons. First, ATF4 levels, a marker of ER stress, were unaffected in Rab10 KD neurons. This indicates that any ER stress present is minimal or insufficient to significantly impact protein synthesis through this pathway. Second, we did not observe significant changes in mTORC1 activation in Rab10 KD neurons as indicated by a normal pS6K1 phosphorylation (see above). Based on these observations, we conclude that Rab10's roles in calcium homeostasis and protein synthesis are most likely separate.

      (3) The authors indicate that the internal release of calcium ions from the ER has no effect on DCV trafficking and fusion without showing the data. It is important to include this data as the major impact of the study is the dissecting of the calcium effects in mammalian neurons from the previous studies in invertebrates.

      We agree this is an important aspect of our reasoning. We are submitting the related manuscript on internal calcium stores to bioRxiv. The link will be added to the consolidated version of our manuscript.

      (4) The distinction between Rab3 and Rab10 co-trafficking on DCVs should be reported in the Results (currently, Figure 2 - supplement 3 is only mentioned in the Discussion) as it helps to understand the effects on DCV fusion.

      We agree. We now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6, l. 162-163).

      Reference:

      Balaji, J., Ryan, T.A., 2007. Single-vesicle imaging reveals that synaptic vesicle exocytosis and endocytosis are coupled by a single stochastic mode. Proceedings of the National Academy of Sciences 104, 20576–20581. https://doi.org/10.1073/pnas.0707574105

      Brunner, J.W., Lammertse, H.C.A., Berkel, A.A. van, Koopmans, F., Li, K.W., Smit, A.B., Toonen, R.F., Verhage, M., Sluis, S. van der, 2022. Power and optimal study design in iPSC-based brain disease modelling. Molecular Psychiatry 28, 1545. https://doi.org/10.1038/s41380-022-01866-3

      Emperador-Melero, J., Huson, V., van Weering, J., Bollmann, C., Fischer von Mollard, G., Toonen, R.F., Verhage, M., 2018. Vti1a/b regulate synaptic vesicle and dense core vesicle secretion via protein sorting at the Golgi. Nat Commun 9, 3421. https://doi.org/10.1038/s41467-018-05699-z

      Granseth, B., Odermatt, B., Royle, S.J., Lagnado, L., 2006. Clathrin-Mediated Endocytosis Is the Dominant Mechanism of Vesicle Retrieval at Hippocampal Synapses. Neuron 51, 773–786. https://doi.org/10.1016/j.neuron.2006.08.029

      Ivanova, D., Dobson, K.L., Gajbhiye, A., Davenport, E.C., Hacker, D., Ultanir, S.K., Trost, M., Cousin, M.A., 2021. Control of synaptic vesicle release probability via VAMP4 targeting to endolysosomes. Science Advances 7, eabf3873. https://doi.org/10.1126/sciadv.abf3873

      Kwon, S.E., Chapman, E.R., 2011. Synaptophysin Regulates the Kinetics of Synaptic Vesicle Endocytosis in Central Neurons. Neuron 70, 847–854. https://doi.org/10.1016/j.neuron.2011.04.001

      Reshetniak, S., Fernández-Busnadiego, R., Müller, M., Rizzoli, S.O., Tetzlaff, C., 2020. Quantitative Synaptic Biology: A Perspective on Techniques, Numbers and Expectations. International Journal of Molecular Sciences 21, 7298. https://doi.org/10.3390/ijms21197298

    1. eLife Assessment

      This important study develops and exploits novel ideas in dendritic integration and implements these ideas in a neural network. Historically, dendritic plateau potentials were thought to exist primarily for maintaining neurons in a depolarized state for 100s of milliseconds, but this study presents a new perspective that dendritic plateau potentials are equally effective in much shorter integration windows. The computational evidence supporting the article's claims is compelling.

    2. Reviewer #2 (Public review):

      Summary.

      Some forms of Artificial Intelligence (AI), particularly those based on artificial neural networks (ANNs), draw inspiration from biological brains and neurons. Understanding the functional repertoire and underlying logic of real neurons could, therefore, help improve ANNs. While the cell bodies and axons of neurons produce rapid, high-amplitude action potentials (~100 mV over ~2 ms), dendrites, which constitute about 80% of neuronal membrane area, generate smaller but longer-lasting electrical signals, known as glutamate-mediated dendritic plateau potentials (~50 mV over >100 ms). The authors have designed artificial neurons capable of producing these dendritic plateau potentials and, through simulations, demonstrate that such prolonged dendritic signals reduce the negative effects of temporal jitter in real or artificial neural networks. Specifically, they show that in ANNs with neurons capable of dendritic plateau potentials, reliable sparse spiking computation can occur without the need for precise input synchronization. This means that despite fluctuations in network activity (such as delays in the brain circuit responses, for example), neurons can still link related network events. Thus, dendritic plateau potentials enable neurons to retain information longer, connecting events that are not exactly simultaneous. Interestingly, one of the indirect conclusions of the current study is that neurons equipped with dendritic plateau potentials may reduce the total number of cells (nodes, units) required to perform robust computations.

      Strengths.

      Most studies in neuroscience are descriptive, focusing on observations and measurements. Fewer tackle the more challenging task of explaining the rationale behind specific natural designs. This study does just that, addressing the fundamental problem of asynchrony in neural communication caused by conduction delays and noise. Given that neurons with short membrane time constants can integrate only nearly simultaneous inputs, the authors propose a solution: dendritic plateau potentials. These potentials, generated through glutamate-mediated depolarization within dendritic branches, effectively broaden the temporal integration window, allowing neurons to handle temporal jitter, variability, and stochasticity while maintaining reliable computation. Thus, dendritic plateau potentials appear to be an adaptive feature evolved to support rapid, reliable CNS computations.
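
      The integration-window argument above can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes unit-amplitude inputs, a purely exponential decay with a 2.5 ms time constant for the passive case, and a rectangular 100 ms "hold" for the plateau case. The function names and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np


def peak_leaky_sum(arrival_ms, tau_ms, dt_ms=0.1, t_max_ms=200.0):
    """Peak of summed unit inputs that each decay exponentially (passive case)."""
    t = np.arange(0.0, t_max_ms, dt_ms)
    total = np.zeros_like(t)
    for t0 in arrival_ms:
        active = t >= t0
        total[active] += np.exp(-(t[active] - t0) / tau_ms)
    return total.max()


def peak_plateau_sum(arrival_ms, duration_ms, dt_ms=0.1, t_max_ms=200.0):
    """Peak of summed unit inputs that are each held for a fixed duration."""
    t = np.arange(0.0, t_max_ms, dt_ms)
    total = np.zeros_like(t)
    for t0 in arrival_ms:
        total[(t >= t0) & (t < t0 + duration_ms)] += 1.0
    return total.max()


# Illustrative arrival times (assumed, not taken from the manuscript).
synchronous = [20.0, 20.0, 20.0]
jittered = [20.0, 35.0, 50.0]

if __name__ == "__main__":
    print("leaky, synchronous:", peak_leaky_sum(synchronous, tau_ms=2.5))
    print("leaky, jittered:   ", peak_leaky_sum(jittered, tau_ms=2.5))
    print("plateau, jittered: ", peak_plateau_sum(jittered, duration_ms=100.0))
```

      Under these assumptions the passive sum approaches three units only when the inputs coincide, whereas the held inputs still overlap fully when they arrive 15 ms apart, so a threshold near three units would be crossed only in the plateau case.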

      Weaknesses.

      The authors have appropriately revised unsupported statements from previous versions, but the manuscript could benefit from examples of testable hypotheses derived from their findings. For example, what specific experimental questions could be investigated to validate these computational predictions? Providing concrete examples of potential experimental tests would make the work more accessible and actionable for experimentalists, assuming such experiments are feasible.

      Additionally, many readers may lack a background in computational modeling or Artificial Neural Networks. To enhance accessibility, key terms and concepts should be explained at a level suitable for first-year graduate students, ensuring clarity for a broader audience.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an elegant didactic exposition showing how dendritic plateau potentials can enable neurons to perform reliable 'binary' computations in the face of realistic spike time jitter in cortical networks. The authors make many good arguments, and the general concept underlying the paper is sound. A strength is their systematic progression from biophysical to simplified models of single neurons, and their parallel investigation of spiking and binary neural networks, with training happening in the binary neural network.

      Reviewer #2 (Public Review):

      Summary:

      Artificial intelligence (AI) could be useful in some applications and could help humankind. Some forms of AI work on the platform of artificial neural networks (ANNs). ANNs are inspired by real brains and real neurons. Therefore understanding the repertoire and logic of real neurons could potentially improve ANNs. Cell bodies of real neurons, and axons of real neurons, fire nerve impulses (nerve impulses are very brief ~2 ms, and very tall ~100 mV). Dendrites, which comprise ~80% of the total neuronal membrane (80% of the total neuronal apparatus) typically generate smaller (~50 mV amplitude) but much longer (~100 ms duration) electrical transients, called glutamate-mediated dendritic plateau potentials. The authors have built artificial neurons capable of generating such dendritic plateau potentials, and through computer simulations the authors concluded that long-lasting dendritic signals (plateau potentials) reduce the negative impact of temporal jitter occurring in real brains, or in ANNs. The authors showed that in ANNs equipped with neurons whose dendrites are capable of generating local dendritic plateau potentials, the sparse, yet reliable spiking computations may not require precisely synchronized inputs. That means, the real world can impose notable fluctuations in the network activity and yet neurons could still recognize and pair the related network events. In the ANNs equipped with dendritic plateaus, the computations are very robust even when inputs are only partially synchronized. In summary, dendritic plateau potentials endow neurons with the ability to hold information longer and connect two events which did not happen at the same moment of time. Dendritic plateaus circumvent the negative impact, which the short membrane time constants arduously inflict on the action potential generation (in both real neurons and model neurons). Interestingly, one of the indirect conclusions of the current study is that neurons equipped with dendritic plateau potentials may reduce the total number of cells (nodes, units) required to perform robust computations.

      Strengths:

      The majority of published studies are descriptive in nature. Researchers report what they see or measure. A smaller number of studies embark on a more difficult task, which is to explain the logic and rationale of a particular natural design. The current study falls into that second category. The authors first recognize that conduction delays and noise make asynchrony unavoidable in communication between circuits in the real brain. This poses a fundamental problem for the integration of related inputs in the real (noisy) world. Neurons with short membrane time constants can only integrate coincident inputs that arrive simultaneously within 2-3 ms of one another. Then the authors considered the role of dendritic plateau potentials. Glutamate-mediated depolarization events within individual dendritic branches can remedy the situation by widening the integration time window of neurons. In summary, the authors recognized that one important feature of neurons, their dendrites, is built to solve the major problems of rapid signal processing: [1] temporal jitter, [2] variation, [3] stochasticity, and [4] reliability of computation. In one word, the dendritic plateau potentials have evolved in the central nervous systems to make rapid CNS computations robust.

      Weaknesses:

      The authors made some unsupported statements, which should either be deleted, or thoroughly defended in the manuscript. But first of all, the authors failed to bring this study to the readers who are not experts in computational modeling or Artificial Neural Networks. Critical terms (syntax) and ideas have not been explained. For example: [1] binary feature space? [2] 13 dimensions binary vectors? [3] the binary network could still cope with the loss of information due to the binarization of the continuous coordinates? [4] accurate summation?

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      However, I have a number of specific points, listed below, that should be addressed. Most of them are relatively minor, but the authors should especially address point 10, which is a major point, by redoing the simulations affected by the erroneous value of the time constant, and by remaking the relevant figures based on the new simulations.

      Specific comments:

      (1) 7f "This feature is conspicuous because it is an order of magnitude longer than unitary synaptic inputs and axonal spikes.": — It is an order of magnitude longer than AMPA receptor-mediated synaptic currents (EPSCs), but more similar in time course to synaptic potentials (EPSPs) whose decay is governed by the passive membrane time constant (about 10 to 20 ms in pyramidal neurons in vivo) and which determines the lifetime of the 'memory' of the neuron for synaptic inputs under conditions of subthreshold, non-spiking dendritic integration. The quoted sentence should be rewritten accordingly.

      Following this suggestion, we have rewritten the sentence (l. 7) to: "This timescale is conspicuous, being many times longer than the fastest signalling processes in the nervous systems, including Excitatory Post-Synaptic Potentials (EPSPs) and axonal spikes."

      (2) 16ff "This is especially relevant to integration of inputs during high conductance states that are prevalent in-vivo. In these states the effective time constant of the neuronal membrane is extremely short and varies substantially depending on synaptic drive [13, 34, 49].": The time-averaged synaptic conductance driven by sensory input in vivo is much lower than implied by this statement (e.g. see Fig. 4 of Haider et al. 2013 https://www.nature.com/articles/nature11665 ), and reduces the passive membrane time constant only by a small percentage. The energy cost of a high prevalence of high-conductance states and extremely short membrane time constants would also exceed the energy budget of the brain (ref. 4). I would therefore suggest dropping this sentence.

      We have clarified this sentence thanks to the reviewer's suggestion. We meant that the instantaneous, rather than the time-averaged, conductance can be very large. To clarify this, we have rewritten this section (l. 15): "This is especially relevant to integration of inputs during high conductance states that are prevalent in vivo, where a typical neuron receives significant synaptic drive. In these states, the effective membrane time constant can be extremely short, and varies substantially depending on synaptic input."

      (3) l. 17f "As a consequence, computations that rely on passive summation of multiple inputs place punishing constraints on spike timing precision.": — Again, the passive membrane time constant is on the order of 10 ms and I would tone down this statement accordingly, removing the word 'punishing' for example.

      Following the suggestion, we have rewritten the sentence to (l. 18): "As a consequence, computations that rely on passive summation of multiple inputs would place strong constraints on spike timing precision."

      (4) l. 18ff "Dendritic action potentials, by contrast, have a consistently long duration that is ensured by the kinetic properties of voltage gated ion channels and NMDA receptors [54, 47, 10, 3]. These properties are largely determined by the amino acid sequence of receptor and channel proteins that are specifically expressed in dendrites [45, 44, 40]. This suggests dendritic properties are specifically tuned to produce localised, suprathreshold events that outlive rapid membrane fluctuations.": — Yes, but see also Attwell & Gibb 2005 ( https://www.nature.com/articles/nrn1784 ), especially the last two of their key points. The slow NMDA receptor decay kinetics (and therefore their high affinity for binding glutamate) may also be the consequence of a design goal to set the temporal coherence window for NMDA receptor-mediated synaptic plasticity such as STDP to be on the order of tens of milliseconds, somewhat longer than the membrane time constant.

      The reviewer is correct; other functions (e.g. synaptic plasticity) are also part of the dendrite's repertoire. To acknowledge this, we added a section (l. 34) where we mention that our idea does not conflict with, for example, synaptic plasticity.

      (5) l. 32f "Numerous studies point out that nonlinear summation in dendrites can make neurons computationally equivalent to entire networks of simplified point models, or 'units' in a traditional neural network [9, 21, 38, 40, 45, 48, 50, 51].": — See also Beniaguev et al. 2021 ( https://www.cell.com/neuron/pdf/S0896-6273(21)00501-8.pdf ), which also speaks to the next sentence.

      We thank the reviewer for the suggestion; the citation has been added.

      (6) Fig. 2E and F: the top of panel F corresponds to the top of panel E, but the bottom of panel F does not correspond to the bottom of panel E; it corresponds to a dendritic neuron with passive dendrites, not a point neuron. Panel E should be changed to reflect this fact.

      We have followed the suggestion to change the figure.

      (7) l. 49f "Despite these dendritic spikes being initiated at different times, they still sum in the soma, leading to a sodium spike there (Figure 2E).": — You probably mean Fig. 2D, and instead of a sodium spike (which could be misunderstood as local and dendritic) you triggered a sodium action potential. Likewise, Fig. 2B (right) shows the timescale of sodium action potentials at the soma (cf. l. 46).

      The error in the referencing to the figure has been corrected. The phrasing has also been changed to "a sodium action potential" (l. 56), following the reviewer's suggestion.

      (8) Please check the scale bars in Fig. 2D. Do they also apply to panel F below? If yes, that should be stated.

      The scale bars are indeed the same; I have repeated them in the figure to avoid any confusion.

      (9) l. 68 "This time constant is consistent with the high-conductance state of pyramidal neurons in the cortex [6]":

      You do not need to invoke a high-conductance state to justify this time constant, which is indeed typical for the membrane time constant of pyramidal neurons in vivo.

      On a related note, Fig. 3B and its legend seem to assume that tau = 1 ms, and calls that one EPSP duration in the legend. An EPSC may have a decay time constant of 1 ms, but an EPSP will have a decay time constant of about 10 ms, similar to the membrane time constant. Fig. 3B (and therefore also the rest of Figure 3) seems to have been constructed with a value of tau that is too small by a factor of 10, and this should be corrected by remaking the figure. If tau = 1 ms was used also in Figure 4 then this figure also needs to be remade.

      Section 3.3 and Table 1 also use tau = 1 ms. This is unrealistic and needs to be changed; an appropriate value of tau = 10 ms is given by the authors themselves in line 67. The incorrect value of tau in Table 1 causes other entries of the Table to be terribly wrong; a leak conductance of 1 µS would imply an input resistance of the neuron of 1 MOhm, but somatic input resistances of pyramidal neurons in vivo are on the order of 20 to 50 MOhm. The total capacitance of 1 nF is slightly too large, and should be adjusted to yield a membrane time constant of 10 ms given an appropriate leak conductance leading to an input resistance of about 20 to 50 MOhm. These are key numbers to get right for both Figures 3 and 4, especially if you want to be able to say "We have been careful to respect the essence of basic physiological facts while trying to build an abstraction of how elementary spiking computations might occur." (l. 215f).

      We thank the reviewer for catching this. We had actually already used tau = 10 ms, but had not yet updated the paper. Moreover, the somatic input resistance was indeed off. To rectify this, we have used the values: $C_m = 0.5\,\mathrm{nF}$, $\tau_m = 10\,\mathrm{ms}$, $R_m = 20\,\mathrm{M\Omega}$, $g_l = 0.05\,\mathrm{\mu S}$. Figure 3 was remade using these values, and Table 1 updated accordingly.
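
      The arithmetic behind these values can be checked directly. The sketch below assumes a single passive compartment, so that the membrane time constant is the capacitance divided by the leak conductance and the input resistance is the reciprocal of the leak conductance. The function names are hypothetical and are not taken from the authors' code.

```python
def membrane_time_constant_ms(capacitance_nF, leak_conductance_uS):
    """tau_m = C_m / g_l; with C in nF and g in uS the result is in ms."""
    return capacitance_nF / leak_conductance_uS


def input_resistance_MOhm(leak_conductance_uS):
    """R_m = 1 / g_l; with g in uS the result is in MOhm."""
    return 1.0 / leak_conductance_uS


if __name__ == "__main__":
    # Values criticised by the reviewer: 1 nF and 1 uS.
    print(membrane_time_constant_ms(1.0, 1.0), "ms,", input_resistance_MOhm(1.0), "MOhm")
    # Values quoted in the response: 0.5 nF and 0.05 uS.
    print(membrane_time_constant_ms(0.5, 0.05), "ms,", input_resistance_MOhm(0.05), "MOhm")
```

      With 1 nF and 1 µS this gives 1 ms and 1 MOhm, the combination the reviewer objected to; with 0.5 nF and 0.05 µS it gives 10 ms and 20 MOhm, consistent with the values stated above.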

      (10) l. 158ff "The assumption that each neuron connects to one dendrite of an upstream neuron is actually grounded in physiology, although it may appear like a strong assumption at first glance: related inputs arrive at local clusters of spines synchronously [60].": — You probably mean "each neuron connects to one dendrite of a downstream neuron." And I would add "But see Beniaguev et al. 2022 https://www.biorxiv.org/content/10.1101/2022.01.28.478132v2.abstract " - your restrictive arrangement of inputs is probably not really needed, especially if postsynaptic neurons have more dendrites.

      The suggested wording was correct, and has now been incorporated (l. 166). I have also added the suggested citation.

      (11) I note that the plateaus in Fig. 4D are much shorter than those in Fig. 2D and F, but this is a good thing: The experimental and simulation results in Fig. 2 are based on ref. 18, which used microiontophoresis of glutamate, leading to much slower glutamate concentration time courses at the dendritic NMDA receptors than synaptic release of glutamate would. The time courses of plateaus in Fig. 4 are much more in line with the NMDA plateau durations shown in ref. 21, especially their Figure 2B. These faster NMDA plateaus (or NMDA spikes as they are called in ref. 21) are based on synaptic release of glutamate in vivo, and on the faster NMDA receptor kinetics at physiological temperature compared to the old models with room temperature kinetics used in ref. 18.

      Here are two additional references that the authors might find interesting:

      Fisek et al. 2023 https://www.nature.com/articles/s41586-023-06007-6

      Dudai et al. 2022 https://www.jneurosci.org/content/42/7/1184.full

      We thank the reviewer for the suggested references. The first has been added to the references in the introduction, on l. 28. The second has been added on l. 78.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 3A, we observed some animal pictures, which were never explained in the figure caption, or text of the manuscript. These pictures were probably explained at the lab meeting, so it is unnecessary to waste effort on these pictures in the manuscript draft.

      We agree with the reviewer; the figures have been removed.

      (2) Figure 1 has not been referenced anywhere in the manuscript text!

      Indeed, this had to be corrected; the figure is now referenced on l. 9.

      (3) Line 45. "[18] triggered two NMDA spikes by glutamate uncaging at the indicated (red,blue) sites". [18] triggered one NMDA spike while recording at three locations simultaneously (two locations in dendrite and one location in the soma).

      The reviewer is correct here. The sentence has now been rephrased to "(ref.) triggered an NMDA spike by glutamate microiontophoresis while recording at the soma and the indicated (red, blue) sites in the dendrite." (l. 49)

      (4) Fig. 2B. The two labels, "Dendrite 2" and "Dendrite 1" incorrectly suggest that two traces were recorded in two dendrites. These two traces were recorded in the same dendrite.

      We agree with the reviewer; labels have now been changed to "Dendrite site".

      (5) Line 45. "[18] triggered two NMDA spikes by glutamate uncaging at the indicated (red,blue) sites". - - One NMDA spike by "glutamate microiontophoresis".

      This is correct, the phrasing on (l. 50) has been changed accordingly.

      (6) Line 47. "... simulated glutamate releases 50 ms apart in the three dendritic sites indicated in Figure 2C, thereby triggering three NMDA spikes at those sites. Despite these dendritic spikes being initiated at different times, they still sum in the soma, leading to a sodium spike there (Figure 2E)". A similar experiment has been performed in real cortical neurons (KD Oikonomou et al., 2012, PMID: 22934081), and could potentially be cited here. Briefly, Oikonomou et al. generated two dendritic plateau potentials in two dendritic branches and monitored the summation of these dendritic plateau potentials in the cell body.

      The reference has been added on l. 54.

      (7) Line 63. "We compared the behaviour of our simplified model with that of the full, detailed biophysical model". Which detailed biophysical model? Please cite here the detailed biophysical model that you used for comparisons with your simplified abstract model.

      The reference to the paper has been added.

      (8) Line 65. "Figure 2F shows that spikes arriving at different times are summed in an integrate and hold-like manner". In Fig. 2F, I am unable to see that spikes arriving at different times are summed in an integrate and hold-like manner. Which feature of Fig. 2F refers to the "hold-like manner"? Please explain in the manuscript.

      To clarify we have added "Figure 2F, top" in the text (l. 71).

      (9) Figure 2 caption. "(F) The voltage traces of the abstract model, with and without plateaus. Because of the extended time duration of the plateau potentials, they sum accurately to produce a somatic spike". I am unable to understand what an "accurate summation" in Fig. 2 is. Could the authors provide an illustrative example of a situation in which the neuronal potentials DID NOT sum accurately?

      To address this confusion, we have changed the wording to "...they are summed to reach threshold."

      (10) Line 75. "This is an important issue we intend to return to in future work". The authors' personal plans should not be in the text discussing scientific results.

      We believe it is entirely reasonable to discuss scientific plans that are part of ongoing work, and this is quite common throughout the literature. Nonetheless, we have now reworded this to "This is an important issue for future work." (l. 81)

      (11) In Fig. 4F, the full-line and the dashed-line have not been identified! The readers are left to guess.

      This has now been addressed both with text inserts in the figure, and specification in the figure caption.

      (12) Line 247. "would amount to scaling up the number of cells in a network to perform computations that could, in principle, be performed by more robust single units". Did the authors mean to say: "would amount to scaling up the number of cells in a network to perform computations that could, in principle, be performed by a fewer (but more robust) single units"?

      We have replaced the sentence with the reviewer's suggestion (l. 259)

      (13) In the abstract, the authors repeat sentences: "the timescale of dendritic potentials allows reliable integration of asynchronous inputs" and "nonlinear dendritic plateau potentials allow reliable integration of asynchronous spikes". Besides this being a bad writing style, the authors cannot decide if inputs to the model neuron are asynchronous, or spiking of the model neuron is asynchronous. Are these asynchronous spikes occurring in the neuron experiencing dendritic plateau potentials, or these asynchronous spikes occur in the neuronal network? This confusion of terms and ideas must be removed from the abstract.

      We have rewritten the second sentence, which now reads: "Using this model, we show that long-lived, nonlinear dendritic plateau potentials allow neurons to spike reliably when confronted with asynchronous input spikes."

      (14) In the abstract, the authors claim: "Our results provide empirically testable hypotheses for the role of dendritic action potentials in cortical function". With great anticipation, I read throughout the manuscript, but I was unable to find one single experimental design that could support the authors' bald statement. In the text of the manuscript, the authors must carefully reveal the precise experimental outline that would test their specific hypothesis, or delete the untrue statement.

      We respectfully challenge the rather critical tone of the reviewer. The central hypothesis that plateaus enable robust summation, and that circuit level computations rely on this is an experimentally testable hypothesis. The precise experimental design of how to test such a hypothesis is always best left to an experimentalist to determine, as there are many possible means of doing this and each will depend on the preparation and methodology at hand. At the same time, we understand that there is an increasing culture of expecting explicit "testable hypotheses" spelled out to the reader. To satisfy this expectation while avoiding overly prescriptive ideas for how future work should proceed, we have now added more explicit descriptions of possible experimental tests in l. 231 and onwards.

      (15) Fig. 2F was submitted for review without a time scale, while at the same time the authors heavily discuss specific numerical values for time intervals. Namely, the authors instruct readers to pay attention to a 10 ms time constant and 2-3 ms input decay (Fig. 2F), but they do not show the time scale in Fig. 2F.

      "We compared this to a situation where all inputs arrive at a soma with standard LIF dynamics and a 10 ms membrane time constant. This time constant is consistent with the high-conductance state of pyramidal neurons in the cortex [6]: Inputs decay after 2-3 ms, and fail to sum to spike threshold (Figure 2F, lower)".

      The time (and voltage) bars have now been added to Fig. 2F.

      (16) Line 75. "In the scope of what remains here we want to ask if integrate-and-hold is minimally feasible". This reviewer is unable to understand the meaning of the syntaxes "integrate-and-hold" and "minimally feasible" in the context of dendritic integration. This reviewer is worried that the majority of the journal readers would feel exactly the same. To alleviate this problem, the authors should explain both terms right here, in line 77.

      Integrate-and-hold is defined on line 57 (to be exact, we write: "We refer to this behavior as “Leaky Integrate-and-Hold” (LIH)."). To be clearer, we could reuse the acronym LIH here to emphasise that we are referring to the same thing. By 'minimally feasible' we mean biologically plausible given assumptions that are not strong. We could use another term, e.g. "biologically plausible under lenient assumptions".

      To address this point, we have rephrased the sentence as "In the scope of what remains here we want to ask if Leaky-Integrate-and-Hold (LIH) can easily and plausibly facilitate network computations with spikes." (l. 81), repeating the LIH definition.

      (17) Line 91. "Spikes arriving even slightly out of sync with each other introduces noise in the membrane potential ..." Introduce.

      The sentence has been fixed using the reviewer's correction.

      (18) The caption of Fig. 3B was submitted for review without any explanation of the normalization procedure used. Also, in the caption of the same figure, one cannot find an explanation of the light-gray area surrounding the black curves. Also, the readers are left to wonder how the results of a simulation could possibly be greater than 1 in some simulation trials.

      We have added a description of the normalization and the shaded area to the caption of Fig. 3B.

      (19) Line 117. "We assumed that inputs to a network arrive at the dendrites within some time window, and their combined depolarisations are either sufficient to either elicit a dendritic spike or not, as shown in Figure 3". We could potentially compact the current text by deleting one instance of "either".

      We agree this is better writing; one of the occurrences of 'either' has been removed.

      (20) Line 127. "where incoming connections can be represented with a 1 (a spike arrives)..." Did you mean "a presynaptic spike arrives"?

      The sentence has been rewritten following the suggestion.

      (21) Line 134. "with each unit only having ..." Dendrite can be a unit. Dendritic spine can be a unit. Did you mean "with each unit (i.e. neuron) having ..."

      We have incorporated the suggestion.

      (22) Fig. 4, Caption. "Each point is a 2D input vector x, the colors represent the different classes". An effort was made to introduce 3 different classes. But then, no mention of "classes" thereafter. The three input vectors, mentioned in Line 170, perhaps represent the remnants of the class concept mentioned in the previous paragraph.

      We have now rewritten the sentence beginning with "These three input vectors ..." on l. 182 to emphasise that a correct answer means a correct classification.

      (23) Line 152. "The 2D input points were first projected onto a binary feature space, to obtain 13D binary vectors". Did you mean to say: "The 2D input points (three classes, Fig. A) were first projected onto a binary feature space, to obtain three binary vectors; each 13D binary vector responding to a specific class".

      The sentence has been replaced with the reviewer's suggestion (l. 159).

      (24) Line 163. "Because our focus is to account for how transient signals can be summed and thresholded robustly, we are assuming that inhibition is implicitly accounted for in the lumped abstraction". Could you please explain your two ideas: [1] "inhibition is implicitly accounted for" and [2] "lumped abstraction", because this reviewer did not get either idea.

      The reviewer is right that as it stood, the sentence was unclear. To clarify the point we have decided to expand upon that sentence and break it out as an individual paragraph (starting l. 171).

    1. eLife Assessment

      This study aggregates across five fMRI datasets and reports that a network of brain areas previously associated with response inhibition processes, including several in the basal ganglia, are more active on failed stop than successful stop trials. This study is valuable as a well-powered investigation of fMRI measures of stopping, and following revisions provides solid evidence for its conclusions.

    2. Reviewer #2 (Public review):

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPE, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

      Comments on revisions:

      The authors have been responsive to the feedback of both reviewers and they have significantly improved the manuscript. I now judge the work as valuable and solid. The authors have achieved their aims to characterize subcortical BOLD activation in the stop-signal paradigm.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI imaging (as well as many 7T imaging sequences) do not afford sufficient signal to noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      Comments on revised version:

      This is my second review of this article, now entitled "Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm" by Isherwood and colleagues.

      The authors have been very responsive to the initial round of reviews.

      I still think it would be helpful to see a combined investigation of the available 7T data, just to really drive the point home that even with the best parameters and a multi-study sample size, fMRI cannot detect any increases in BOLD activity on successful stop compared to go trials. However, I agree with the authors that these "sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST." As such, I don't have any more feedback.

      We thank the reviewer for their positive feedback, and for their thorough and constructive comments on our initial submission. 

      Reviewer 2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPE, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

      Since the initial submission, the authors have improved their theoretical synthesis and changed their SSRT calculation method to the more appropriate integration method with replacement for go omissions. They have also done a better job of explaining how these fMRI results situate within the broader response inhibition literature including work using other neuroscience methods.

      They have also included a new Bayes Factor analysis. In the process of evaluating this new analysis, I recognized the following comments that I believe justify additional analyses and discussion:

      First, if I understand the authors' pipeline, for the ROI analyses it is not appropriate to run FSL's FILM method on the data that were generated by repeating the same time series across all voxels of an ROI. FSL's FILM uses neighboring voxels in parts of the estimation to stabilize temporal correlation and variance estimates and was intended and evaluated for use on voxelwise data. Instead, I believe it would be more appropriate to average the level 1 contrast estimates over the voxels of each ROI to serve as the dependent variables in the ROI analysis.

      We agree with the reviewer’s assertion that this approach could create estimation problems. However, in this instance, we turned off the spatial smoothing procedure that FSL’s FILM normally uses for estimating the amount of autocorrelation – thus, the autocorrelation was estimated based on each voxel’s timeseries individually. We also confirmed that all voxels within each ROI had identical statistics, which would not be the case if the autocorrelation estimates differed per voxel. We have added the following text to the Methods section under fMRI analysis: ROI-wise:

      Note that the standard implementation of FSL FILM uses a spatial smoothing procedure prior to estimating temporal autocorrelations which is suitable for use only on voxelwise data (Woolrich et al., 2001). We therefore turned this spatial smoothing procedure off and instead estimated autocorrelation using each voxel’s individual timeseries.

      Second, for the group-level ROI analyses there seem to be inconsistencies when comparing the z-statistics (Figure 3) to the Bayes Factors (Figure 4), in that very similar z-statistics have very different Bayes Factors within the same contrast across different brain areas, which seemed surprising (e.g., a z of 6.64 has a BF of .858 while another with a z of 6.76 has a BF of 3.18). The authors do briefly discuss some instances in which the frequentist and Bayesian results differ, but they never explain why similar z-statistics yield very different Bayes Factors for a given contrast across different brain areas. I believe a discussion of this would be useful.

      We thank the reviewer for their keen observation, and agree that this is indeed a strange inconsistency. Upon reviewing this issue, we came across an error in our analysis pipeline, which led to inconsistent scaling of the parameter estimates between datasets. We corrected this error, and included new tables (Figures 3, 4, and Supplementary Figure 5) which now show improved correspondence between the frequentist results from FSL and the Bayesian results.

      We have updated the text of the Results section accordingly. In this revision, we have also updated all BFs to be expressed in log<sub>10</sub> form, to ensure consistency for the reader. Updates to the manuscript are given below.

      Results: Behavioural Analyses:

      Consistent with the assumptions of the standard horse-race model (Logan & Cowan, 1984), the median failed stop RT is significantly faster within all datasets than the median go RT (Aron_3T: p < .001, BF<sub>log10</sub> = 2.77; Poldrack_3T: p < .001, BF<sub>log10</sub> = 23.49; deHollander_7T: p < .001, BF<sub>log10</sub> = 8.88; Isherwood_7T: p < .001, BF<sub>log10</sub> = 2.95; Miletic_7T: p = .0019, BF<sub>log10</sub> = 1.35). Mean SSRTs were calculated using the integration method and are all within normal range across the datasets.
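
      For readers unfamiliar with the integration method with replacement of go omissions, a minimal sketch follows. It is an illustration under stated assumptions rather than the authors' pipeline: the function name, the use of `None` to mark omissions, the nearest-rank percentile rule, and the toy numbers are all hypothetical.

```python
import math

def ssrt_integration(go_rts, stop_signal_delays, responded_on_stop):
    """Estimate SSRT (ms) with the integration method.

    go_rts: go-trial RTs in ms, with None marking a go omission (assumed encoding).
    stop_signal_delays: stop-signal delay in ms for every stop trial.
    responded_on_stop: True where the stop trial was a failed stop.
    """
    observed = [rt for rt in go_rts if rt is not None]
    slowest = max(observed)
    # Replace go omissions with the slowest observed go RT.
    filled = sorted(rt if rt is not None else slowest for rt in go_rts)
    # p(respond | stop signal): proportion of failed stops.
    p_respond = sum(responded_on_stop) / len(responded_on_stop)
    # Take the nth fastest go RT, with n = p_respond * number of go trials
    # (nearest-rank rule; other percentile rules are possible).
    rank = max(1, math.ceil(p_respond * len(filled)))
    nth_rt = filled[rank - 1]
    mean_ssd = sum(stop_signal_delays) / len(stop_signal_delays)
    return nth_rt - mean_ssd

# Toy data: one go omission, half of the stop trials failed.
example = ssrt_integration(
    go_rts=[400, 500, None, 600, 700],
    stop_signal_delays=[150, 200, 250, 200],
    responded_on_stop=[True, False, True, False],
)
```

      In this toy case the omission is filled with 700 ms, the third-fastest go RT (600 ms) is selected, and the mean stop-signal delay of 200 ms is subtracted from it.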

      Results: ROI-wise GLMs:

      To further statistically compare the functional results between datasets, we then fit a set of GLMs using the canonical HRF with a temporal derivative to the timeseries extracted from each ROI. Below we show the results of the group-level ROI analyses over all datasets using z-scores (Fig. 3) and log-transformed Bayes Factors (BF; Fig. 4). Note that these values were time-locked to the onset of the go signal. See Supplementary Figure 5 for analyses where the FS and SS trials were time-locked to the onset of the stop signal. To account for multiple comparisons, threshold values were set using the FDR method for the frequentist analyses. 

      For the FS > GO contrast, the frequentist analysis found significant positive z-scores in all regions bar left and right M1, and the left GPi. The right M1 showed a significant negative z-score; left M1 and GPi showed no significant effect in this contrast. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral IFG, preSMA, caudate, STN, Tha, and VTA, and right GPe. Bilateral M1 and left GPi showed moderate evidence for the null. Evidence for other ROIs was anecdotal (see Fig 4). 

      For the FS > SS contrast, we found significant positive z-scores in all regions except the left GPi. The BFs showed moderate or greater evidence for right IFG, right GPi, and bilateral M1, preSMA, Tha, and VTA, and moderate evidence for the null in left GPi. Evidence for other ROIs was anecdotal (see Fig 4).

      For the SS > GO contrast we found significant positive z-scores in bilateral IFG, right Tha, and right VTA, and significant negative z-scores in bilateral M1, left GPe, right GPi, and bilateral putamen. The BFs showed moderate or greater evidence for the alternative hypothesis in bilateral M1 and right IFG, and moderate or greater evidence for the null in left preSMA, bilateral caudate, bilateral GPe, left GPi, bilateral putamen, and bilateral SN. Evidence for other ROIs was anecdotal (see Fig 4).

      Although the frequentist and Bayesian analyses are mostly in line with one another, there were also some differences, particularly in the contrasts with FS. In the FS > GO contrast, the interpretation of the GPi, GPe, putamen, and SN differ. The frequentist models suggest significantly increased activation for these regions (except left GPi) in FS trials. In the Bayesian model, this evidence was found to be anecdotal in the SN and right GPi, and moderate in the right GPe, while finding anecdotal or moderate evidence for the null hypothesis in the left GPe, left GPi, and putamen. For the FS > SS contrast, the frequentist analysis showed significant activation in all regions except for the left GPi, whereas the Bayesian analysis found this evidence to be only anecdotal, or in favour of the null for a large number of regions (see Fig 4 for details).

      Since the Bayes Factor analysis appears to be based on repeated measures ANOVA and the z-statistics are from Flame1+2, the BayesFactor analysis model does not pair with the frequentist analysis model very cleanly. To facilitate comparison, I would recommend that the same repeated measures ANOVA model should be used in both cases. My reading of the literature is that there is no need to be concerned about any benefits of using Flame being lost, since heteroscedasticity does not impact type I errors and will only potentially impact power.

      We agree with the reviewer that there are differences between the two analyses. The advantage of the z-statistics from FSL’s FLAME 1+2 is that these are based on a multi-level model in which measurement error in the first level (i.e., subject level) is taken into account in the group-level analysis. This is an advantage especially in the current paper since the datasets differ strongly in the degree of measurement error, both due to the differences in field strength and in the number of trials (and volumes). Although multilevel Bayesian approaches exist, none (except by use of custom code) allow for convolution with the HRF of a design matrix like typical MRI analyses. Thus, we extracted the participant-level parameter estimates (converted to percent signal change), and only estimated the dataset and group level parameters with the BayesFactor package. As such, this approach effectively ignores measurement error. However, despite these differences in the analyses, the general conclusions from the Bayesian and frequentist analyses are closely aligned after we corrected for the error described above. The Bayesian results are more conservative, which can be explained by the unfiltered participant-level measurement error increasing the uncertainty of the group-level parameter estimates. At worst, the BFs represent the lower bounds of the true effect, and are thus safe to interpret.

      We have also included an additional figure (Supplementary Figure 7) that shows the correspondence between the BFs and the z scores. 

      Though frequentist statistics suggest that many basal ganglia structures are significantly more active in the FS > SS contrast (see 2nd row of Figure 3), the Bayesian analyses are much more equivocal, with no basal ganglia areas showing Log10BF > 1 (which would be indicative of strong evidence). The authors suggest that "the frequentist and Bayesian analyses are mostly in line with one another", but in my view, this frequentist vs. Bayesian analysis for the FS > SS contrast seems to suggest substantially different conclusions. More specifically, the frequentist analyses suggest greater activity in FS than SS in most basal ganglia ROIs (all but 2), but the Bayesian analysis did not find *any* basal ganglia ROIs with strong evidence for the alternative hypothesis (or a difference), and several with more evidence for the null than the alternative hypothesis. This difference between the frequentist and Bayesian analyses seems to warrant discussion, but unless I overlooked it, the Bayesian analyses are not mentioned in the Discussion at all. In my view, the frequentist analyses are treated as the results, and the Bayesian analyses were largely ignored.

      The original manuscript only used frequentist statistics to assess the results, and then added Bayesian analyses later in response to a reviewer comment. We agree that the revised discussion did not consider the Bayesian results in enough detail, and have updated the manuscript throughout to more thoroughly incorporate the Bayesian analyses and improve overall readability. 

      In the Methods section, we have updated the fMRI analysis – general linear models (GLMs): ROI-wise GLMs section to more thoroughly incorporate the Bayesian analyses as follows:

      We compared the full model (H1) comprising trial type, dataset and subject as predictors to the null model (H0) comprising only the dataset and subject as predictors. Datasets and subjects were modeled as random factors in both cases. Since effect sizes in fMRI analyses are typically small, we set the scaling parameter on the effect size prior for fixed effects to 0.25, instead of the default of 0.5, which assumes medium effect sizes (note that the same qualitative conclusions would be reached with the default prior setting; Rouder et al., 2009). We divided the resultant BFs from the full model by the null model to provide evidence for or against a difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Andraszewicz et al., 2014; Jeffreys, 1939). To facilitate interpretation of the BFs, we converted them to the logarithmic scale. The approximate conversion between the interpretation of logarithmic BFs and standard interpretation on the adjusted Jeffreys’ scale can be found in Table 3.
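
      As an illustration of how a log<sub>10</sub> BF maps onto verbal evidence categories, the sketch below uses the commonly cited cut-offs of 1, 3, 10, 30 and 100 on the raw BF scale. These boundaries and labels are an assumption made for this example; the manuscript's Table 3 is not reproduced here and may differ. The function name is hypothetical.

```python
import math

# Assumed cut-offs on the raw BF scale with commonly used labels.
# The adjusted Jeffreys' scale in the manuscript's Table 3 may differ.
CUTOFFS = [
    (100.0, "extreme"),
    (30.0, "very strong"),
    (10.0, "strong"),
    (3.0, "moderate"),
    (1.0, "anecdotal"),
]

def describe_log10_bf(log10_bf):
    """Return a verbal label for a log10 Bayes Factor (H1 over H0)."""
    if log10_bf == 0:
        return "no evidence either way"
    # Positive values favour H1, negative values favour H0.
    favoured = "H1" if log10_bf > 0 else "H0"
    magnitude = abs(log10_bf)
    for cutoff, label in CUTOFFS:
        if magnitude >= math.log10(cutoff):
            return f"{label} evidence for {favoured}"

# The two raw BFs quoted by the reviewer, converted to log10 first.
label_a = describe_log10_bf(math.log10(3.18))
label_b = describe_log10_bf(math.log10(0.858))
```

      Under these assumed cut-offs a raw BF of 3.18 falls just inside the moderate band for H1, while 0.858 is anecdotal evidence for H0, which is why two similar z-statistics could read so differently before the scaling error was corrected.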

      The Bayesian results are also more incorporated into the Discussion as follows: 

      Evidence for the role of the basal ganglia in response inhibition comes from a multitude of studies citing significant activation of either the SN, STN or GPe during successful inhibition trials (Aron, 2007; Aron & Poldrack, 2006; Mallet et al., 2016; Nambu et al., 2002; Zhang & Iwaki, 2019). Here, we re-examined activation patterns in the subcortex across five different datasets, identifying differences in regional activation using both frequentist and Bayesian approaches. Broadly, the frequentist approach found significant differences between most ROIs in FS>GO and FS>SS contrasts, and limited differences in the SS>GO contrast. The Bayesian results were more conservative; while many of the ROIs showed moderate or strong evidence, some with small but significant z scores were considered only anecdotal by the Bayesian analysis. In our discussion, where the findings between analytical approaches differ, we focus mainly on the more conservative Bayesian analysis.

      Here, our multi-study results found limited evidence that the canonical inhibition pathways (the indirect and hyperdirect pathways) are recruited during successful response inhibition in the SST. We expected to find increased activation in the nodes of the indirect pathway (e.g., the preSMA, GPe, STN, SN, GPi, and thalamus) during successful stop compared to go or failed stop trials. We found strong evidence for activation pattern differences in the preSMA, thalamus, and right GPi between the two stop types (failed and successful), and limited evidence, or evidence in favour of the null hypothesis, in the other regions, such as the GPe, STN, and SN. However, we did find recruitment of subcortical nodes (VTA, thalamus, STN, and caudate), as well as preSMA and IFG activation during failed stop trials. We suggest that these results indicate that failing to inhibit one’s action is a larger driver of the utilisation of these nodes than action cancellation itself. 

      These results are at odds with many previous fMRI studies of the stop signal task as well as research using other measurement techniques such as local field potential recordings, direct subcortical stimulation, and animal studies, where activation of particularly the STN has consistently been observed (Alegre et al., 2013b; Aron & Poldrack, 2006; Benis et al., 2014; Fischer et al., 2017; Mancini et al., 2019; Wessel et al., 2016).

    1. eLife Assessment

      In this fundamental study, the authors describe ELF3 as a candidate driver of luminal progenitor transformation, such that its up-regulation during replicative stress conditions and in BRCA1 deficient cells may permit cell proliferation by suppressing genome instability. While the work is certainly of interest, the supporting data remain incomplete as luminal progenitor cells could not be isolated, which would be needed in order to definitively determine whether ELF3 is a driver of transformation in these cells. Overall this paper may offer insight into mechanisms by which BRCA1 deficiency fuels breast tumorigenesis.

    2. Reviewer #1 (Public review):

      The authors set out to define the molecular basis for LP as the origin of BRCA1-deficient breast cancers. They showed that LPs have the highest level of replicative stress, and hypothesise that this may account for their tendency to transform. They went on to identify ELF3 as a candidate driver of LP transformation and showed that ELF3 expression is up-regulated in response to replicative stress as well as BRCA1 deficiency. They went on to show that ELF3 inactivation led to a higher level of DNA damage, which may result from compromised replicative stress responses.

      While the manuscript supports the interesting idea wherein ELF3 may fuel LP cell transformation, it remains obscure how ELF3 promotes cell tolerance to DNA damage. Interestingly the authors proposed that ELF3 suppresses excessive genomic instability, but in my opinion, I do not see any evidence that supports this claim. In fact, one might think that genomic instability is key to cell transformation.

      Comments on revisions:

      The authors have addressed most of my concerns.

      This being said, the one major criticism raised by both Reviewers is the lack of evidence to support ELF3 as a driver of transformation of and in LP cells. The authors appear to have invested considerable time and resources but were not successful in isolating LP cells for experimentation. I would therefore suggest that the authors tone down their claims throughout the manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to define the molecular basis for LP as the origin of BRCA1-deficient breast cancers. They showed that LPs have the highest level of replicative stress, and hypothesise that this may account for their tendency to transform. They went on to identify ELF3 as a candidate driver of LP transformation and showed that ELF3 expression is up-regulated in response to replicative stress as well as BRCA1 deficiency. They went on to show that ELF3 inactivation led to a higher level of DNA damage, which may result from compromised replicative stress responses.

      While the manuscript supports the interesting idea wherein ELF3 may fuel LP cell transformation, it remains obscure how ELF3 promotes cell tolerance to DNA damage. Interestingly the authors proposed that ELF3 suppresses excessive genomic instability, but in my opinion, I do not see any evidence that supports this claim. In fact, one might think that genomic instability is key to cell transformation.

      We greatly appreciate your thorough review and insightful comments on our manuscript. We have taken your feedback seriously and have made several key revisions to address your concerns.

      To your primary point about how ELF3 helps cells tolerate DNA damage, we have expanded our discussion to clarify the role of ELF3 in the context of BRCA1 deficiency and high replicative stress. We clarified that while ELF3 may not directly suppress excessive genomic instability, it plays a role in maintaining a balance that prevents catastrophic damage in BRCA1-deficient cells. Both BRCA1 deficiency and increased replication stress induce up-regulation of ELF3, which acts as a transcription factor, and its up-regulation increases the expression of a variety of DNA replication-associated proteins that help to maintain homeostasis in the DNA replication process (Figure 5 E and F). Defects in ELF3 also lead to disruption of the DNA replication process (Figure 5 G-I). While ELF3 cannot completely eliminate genomic instability, ELF3 essentially maintains genomic instability within a dangerous yet non-lethal range: higher than in normal cells, but not so high as to cause cell death.

      This precarious balance can facilitate the transformation of LPs into a malignant state, as you pointed out.

      In the revised manuscript, we emphasized that in cells with inherently low replicative stress, such as other non-LP mammary cells, the ELF3-associated mechanism might help cells endure the high replicative stress caused by BRCA1 deficiency without leading to cancerous changes. However, in LP cells, which naturally experience higher replicative stress, this ELF3-related mechanism may make them more susceptible to transformation into cancer cells. This supports our hypothesis that the combination of high replicative stress and BRCA1 deficiency specifically predisposes LP cells to tumorigenesis.

      We have modified the working model to make it clearer.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript focuses on a persistent question of why germline mutations in BRCA1 which impair homology-directed repair of DNA double-strand breaks predispose to primarily breast and ovarian cancers but not other tissues. The authors propose that replication stress is elevated in the luminal progenitor (LP) cells and apply the gene signature from Dreyer et al as a measure of replication stress in populations of cells selected by FACS previously (published by Lim et al.) and suggest an enrichment of replication stress among the LP cells. This is followed by single-cell RNA seq data from a small number of breast tissues from a small number of BRCA1 mutation carriers but the pathogenic variants are not listed. The authors perform an elegant analysis of the effects of BRCA1 knockdown in MCF10A cells, but these cells are not considered a model of LP cells.

      Overall, the manuscript suffers from significant gaps and leaps in logic among the datasets used. The connection to luminal progenitor cells is not adequately established because the models used are not representative of this population of cells. Therefore, the central hypothesis is not sufficiently justified.

      Strengths:

      The inducible knockdown of BRCA1 provided compelling data pointing to an upregulation of ELF3 in this setting as well as a small number of other genes. It would be useful to discuss the other genes for completeness and explain the logic for focusing on ELF3. Nonetheless, the connection with ELF3 is reasonable. The authors provide significant data showing a role for ELF3 in breast epithelial cells and its role in cell survival.

      Weaknesses:

      The initial observations in primary breast cells have small sample sizes. The mutations in BRCA1 seem to be presumed to be all the same, but we know that pathogenic variants differ among individuals and range from missense mutations affecting interactions with one critical partner to large-scale truncations of the protein.

      The figure legends are missing critical details that make it difficult for the reader to evaluate the data. The data support the notion that ELF3 may participate in relieving replication stress, but does not appear to be limited to LP cells as proposed in the hypothesis.

      We would like to sincerely thank you for your thorough review and constructive feedback on our manuscript. Your insightful comments and suggestions have been invaluable in guiding our revisions.

      (1) Acknowledgment of Data Set Limitations and Additional Analyses:    We fully acknowledge the importance of the concerns raised regarding the datasets used in our study. We have supplemented our manuscript with the missing information you pointed out and conducted additional analyses as suggested. These efforts have

      (2) Challenges in LP Cell Experiments:

      One of the most critical issues you raised was the lack of validation in LP cells, particularly concerning the role of ELF3 in these cells. We are acutely aware of the significance of this point. Following your review, we made extensive efforts to isolate and culture LP cells from both BRCA1-proficient and BRCA1-deficient patient samples. We tried various methods and invested substantial resources, including time, manpower, and materials, to establish a reliable protocol for isolating and cultivating LP cells in vitro. Unfortunately, despite our best efforts, we were unable to obtain a sufficient number of high-quality cells to generate solid and reproducible results.

      The challenges we faced included the limited availability of patient tissues and the technical difficulties in consistently obtaining viable LP cells. Given the already extended timeline for the revision of this manuscript, we regretfully decided to forgo further attempts to perform these critical experiments with LP cells. In the revised manuscript, we have explicitly addressed the limitations of our cell models and provided a detailed discussion of the challenges faced in isolating LP cells. Despite these limitations, we believe that the consistency between our results and LP cell sequencing data provides valuable insights and a solid foundation for future studies.

      (3) Data Presentation Improvements:

      In response to your feedback, we have also made significant improvements to the data presentation in our manuscript. We updated and optimized figure legends and narrative sections to ensure that the data are clearly and accurately conveyed. These changes aim to enhance the readability and comprehensibility of our findings.

      We greatly appreciate your valuable feedback, which has significantly contributed to the improvement of our manuscript. Your suggestions have helped us refine our arguments and present a more robust and nuanced interpretation of our data. 

      Thank you once again for your critical and constructive review. We look forward to your feedback on our revised manuscript.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):  

      As such, in addition to consolidating the role of ELF3 in promoting cell tolerance to replicative stress (or in suppressing genomic instability), I have a few comments the authors should consider to improve their manuscript.  

      (1) I am not sure how cells have gained a growth advantage if they were arrested (Line 105-106). Perhaps the authors can elaborate.

      Thanks for pointing this out, and we are sorry for the misleading statement. We have revised the manuscript to clarify that “survival advantage” is more accurate than “growth advantage”. Long-term DOX treatment led to decreased cell survival, as indicated by the decreased number of colonies in Supplemental Fig. S1D, so many cells died during DOX treatment. Therefore, the cells that survived throughout DOX treatment and were collected for sequencing may have gained a survival advantage compared with their counterparts that failed to survive.

      (2) Figure 3D - From Western blotting of ELF3, forced expression of E2F6 does not appear to "block" HU-induced ELF3 up-regulation, but merely down-regulate basal level of ELF3, with the effect of HU still notable.

      Thanks for the comment. We agree that E2F6 down-regulates ELF3 baseline expression levels and did not fully block ELF3 up-regulation. After calculating the fold change after E2F6 overexpression, we confirmed that E2F6 overexpression still partially blocks HU-induced ELF3 up-regulation, with the fold change falling from 3.32 to 2.40, supporting our conclusion that HU-induced ELF3 up-regulation is regulated by the ATR-Chk1-E2F axis. However, it cannot be excluded that E2F6 also regulates ELF3 expression in other replication stress-independent ways, and we have revised the manuscript accordingly.
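
      To make the size of this partial block concrete, the short sketch below computes the relative reduction in HU-induced fold change from the two values quoted above. The function name is hypothetical, and the calculation is a reader's illustration rather than the authors' quantification method.

```python
def relative_reduction(fold_without_e2f6, fold_with_e2f6):
    """Fraction by which the HU-induced fold change drops with E2F6."""
    return (fold_without_e2f6 - fold_with_e2f6) / fold_without_e2f6

# Fold changes quoted in the response: 3.32 without and 2.40 with E2F6.
reduction = relative_reduction(3.32, 2.40)
print(round(reduction, 3))
```

      The fold change is roughly 28% lower with E2F6 overexpression but remains well above 1, which matches the description of a partial rather than complete block.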

      (3) Figure 3J & K - In my opinion, if BRCA1 knockdown were more efficient it remains formally possible that co-depletion of BRCA1 and GATA3 may exhibit additive effects in up-regulating ELF3 mRNA level.

      Thank you for the comment. The BRCA1 knockdown efficiency for Figure 3J is shown in Supplemental Fig. S3B; notably, both BRCA1 and GATA3 knockdown were numerically more efficient in the double-knockdown group than in the respective single-knockdown groups. Thus, the higher ELF3 up-regulation in the double-knockdown group in Figure 3J could be caused by the superior knockdown efficiency of both BRCA1 and GATA3. Nonetheless, we agree that BRCA1 and GATA3 may still have separate functions in this experimental setting and that a marginal additive effect may exist, and the manuscript was revised accordingly.

      (4) Figure 4 - Perhaps the authors can change its title to better summarise the findings. Cell sensitivity assays and xenograft experiments may not necessarily relate to genomic instability.

      Thank you for the great suggestion. To summarize the results more accurately, we have revised the title as “ELF3 can help cells tolerate replication stress and sustain cell survival”.

      (5) Figure 5B&C - It would be important to document the time-dependent resolution of HU-induced DNA lesions by including additional time-points before, during, and after HU treatment.

      We appreciate the suggestion to include additional time points to document the time-dependent resolution of HU-induced DNA lesions. In our experiments, we observed that ELF3 knockdown leads to genomic instability both in the presence and absence of HU treatment. Specifically, Figure 5A and Figure S5 demonstrate that ELF3 knockdown increases genomic instability without HU treatment, indicating its role in maintaining genomic stability under normal conditions. On the other hand, Figure 5B, 5C, and 5D show that ELF3 knockdown under HU-induced replication stress further exacerbates genomic instability. This observation aligns with our finding that ELF3 expression increases in response to replication stress, suggesting its critical role in maintaining replication homeostasis under such conditions.

      (6) Figure 5F&I - Which ELF3 siRNA was used in these experiments? Since the authors did not exclude off-target effects, perhaps it may be worthwhile to include both ELF3 siRNAs for Panel F.

      Thanks for your advice. The qPCR (Figure 5F) and DNA fiber assay (Figure 5I) used the siELF3-4 siRNA, and we repeated the qPCR experiments for Panel F using the siELF3-5 siRNA (Supplemental Fig. S5B).

      We sincerely thank you for your thoughtful feedback and constructive suggestions. Addressing these points has strengthened our manuscript, and we are grateful for the opportunity to refine and clarify our work. We appreciate your critical evaluation and look forward to further constructive dialogue.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The data driving the hypothesis uses gene expression signatures as an indirect measure of replication stress. This is a critical concern.

      a. At this time, numerous gene expression signatures have been reported to be biomarkers of replication stress. Therefore, it would be valuable to apply additional gene expression signatures to examine the performance and the overlap in the results.

      The recent work by Takahashi et al., 2022 (https://pubmed.ncbi.nlm.nih.gov/36381660/) provides a signature that was derived independently and offers one that can be used to assess the performance of the signatures and stability of the conclusions.

      Thank you for the valuable suggestion. We have evaluated replication stress in the mammary cell subgroups using the Repstress score developed in the work you mentioned. The results showed that LP cells tend to have higher replication stress than the other subgroups, although the difference was not statistically significant. This result is consistent with our previous analysis and indicates a trend towards higher replication stress levels in LP cells. We have added these data as the last line of Figure 1A in the revised version.

      Author response image 1.

      Replication stress pathway scores of different human normal mammary cell populations. The gene expression data were from Lim et al. (3).

      b. A direct measure of replication stress in LP cells would be important to confirm the gene expression signature. Therefore, performing immunostaining for markers of replication stress (eg gamma-H2AX foci, DNA fiber assays) would provide more direct data to support the assertions.

      Thank you for this suggestion and we totally agree that experiments revealing replication stress levels by investigating common markers, e.g., gamma-H2AX foci, DNA fiber assays, will provide vital evidence for our hypothesis. However, since our last response, we have been diligently trying to obtain LP cells for these experiments but encountered technical challenges while attempting to isolate and culture LP cells in vitro. 

      In the discussion part, we have revised the manuscript to emphasize that the data obtained from MCF10A should be interpreted with caution and there are certain gaps between the cell models and LP cells.

      (2) The depth of single-cell sequencing can often be limiting. Therefore, a supplementary table listing the genes used for the replication stress signature and the frequency with which they are observed in the single-cell sequencing data should be provided. This is needed to ensure that the replication stress score does not reflect a small subset of the replication stress signature genes.

      Thank you very much for this valuable suggestion. We have provided an expression matrix of the genes in the replication stress signature in the revised version (Supplementary Table S1). We also calculated the average expression level of each gene across the cells. As shown in Author response image 2, these genes are expressed at relatively low levels in single cells (counts ≤ 10), and the expression differences among genes are relatively small. We therefore exclude the possibility that a few highly expressed genes dominate the replication stress score.

      Author response image 2.

      Average counts of Top 50 genes for the replication stress signature
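
      The check described in this response (per-gene average counts, plus the detection frequency the reviewer asked about) can be sketched as follows. This is a minimal illustration on a toy matrix, not the authors' pipeline; the gene names, counts, and the helper names `per_gene_summary` and `max_gene_share` are hypothetical.

```python
# Toy count matrix (hypothetical values): rows are cells, columns are signature genes.
GENES = ["GENE_A", "GENE_B", "GENE_C"]
COUNTS = [
    [0, 2, 1],
    [3, 0, 1],
    [1, 1, 0],
    [0, 1, 2],
]


def per_gene_summary(counts, genes):
    """Mean count per cell and fraction of cells with a non-zero count, per gene."""
    n_cells = len(counts)
    summary = {}
    for j, gene in enumerate(genes):
        column = [row[j] for row in counts]
        summary[gene] = {
            "mean_count": sum(column) / n_cells,
            "detection_rate": sum(1 for c in column if c > 0) / n_cells,
        }
    return summary


def max_gene_share(counts):
    """Largest fraction of the total signature counts contributed by a single gene."""
    totals = [sum(column) for column in zip(*counts)]
    grand_total = sum(totals)
    return max(totals) / grand_total if grand_total else 0.0


summary = per_gene_summary(COUNTS, GENES)
share = max_gene_share(COUNTS)
```

      A small `max_gene_share` relative to the number of signature genes would indicate that no single gene dominates a summed score; what counts as "small" is a judgment call that the source does not specify.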

      (3) As only 4 BRCA mutation carriers are analyzed, it is critical that the mutations be reported for these individuals because pathogenic variants differ in their effects and interactions with the DNA repair machinery in cells.

      Thanks for the suggestion. Information on the 4 BRCA1 mutation carriers has been added to Supplemental Table S2.

      (4) The figures throughout lack critical details making it difficult to evaluate. Figure 1A states that these are "replication stress pathway scores..." but there is no evaluation of levels of statistical differences. The heat map has what appears to be a log unit score between +2 and -2 but it is unclear whether it is log2 or log10 or some other unit. In 1B, the replication stress scores are visualized as relative values between 0 and 0.1, but there is no indication of what this means or whether there is a statistically significant difference in the levels among the populations. As tumors are composed of multiple cell types, it should be stated how the "tumor cells" are uniquely identified in the figure legend. The lack of critical information is common across many of the figures making review frustratingly difficult.

      Thanks for the suggestion. We have added the statistical analysis and the scale to the Figure 1A legend. For Figure 1B, the replication stress score was calculated as the sum of the expression of the replication stress genes and is presented as its natural logarithm (ln). We have provided a quantitative figure and statistical tests (Mann-Whitney) of the replication stress scores for the various cell types (Supplementary Figure 1A).
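
      As a rough illustration of the two steps named here (a score equal to the ln of the summed signature-gene expression, compared between groups with a Mann-Whitney test), the following sketch uses only the Python standard library. It is not the authors' code: the function names are hypothetical, the handling of cells with zero summed expression is an assumption, and only the U statistic is computed.

```python
import math


def stress_score(expression, signature):
    """ln of the summed expression of the signature genes in one cell.

    `expression` maps gene name -> expression value. Returning -inf when the
    sum is zero is an assumption; the source does not say how such cells
    were handled.
    """
    total = sum(expression.get(gene, 0.0) for gene in signature)
    return math.log(total) if total > 0 else float("-inf")


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x.

    Counts the (x_i, y_j) pairs with x_i > y_j; tied pairs count 0.5.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical scores for two cell populations.
group_1 = [1.2, 2.5, 3.1]
group_2 = [0.5, 1.0, 2.0]
u_1 = mann_whitney_u(group_1, group_2)
u_2 = mann_whitney_u(group_2, group_1)
```

      The two U statistics always sum to the number of pairs, `len(x) * len(y)`. For a p-value, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally be used instead of this hand-rolled statistic.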

      In addition, we added details on the identification of tumor cells to the Methods section of the revised manuscript. Briefly, the adjacent normal breast sample served as a control to filter the various types of normal cells out of the tumor samples. Normal cells from the tumor sample merged with the same types of normal cells from the adjacent normal breast samples, leaving one cell cluster contributed only by the tumor sample. These tumor-specific clusters were considered malignant cell populations. We further found that the malignant cell populations showed higher UMI counts than the normal cell populations, consistent with active metabolism in malignant cells. More importantly, ER, PR, and HER2 expression in the malignant cells of each case exactly matched the clinical records. Finally, we used InferCNV to validate the malignant cell subset: more copy number alterations (CNAs) were detected in the malignant cells than in the normal cells.

      (5) The hypothesis states that the LP cells are uniquely sensitive to deficiency in BRCA1 compared to other cells. However, the authors use knockdown of BRCA1 in MCF10A cells which are generally considered to be basal cells and not LP cells.

      Thanks for the comment. We fully agree that MCF10A cells do not reflect LP features; they were used mainly as a normal mammary cell line model. We tried to obtain human LP cells for experiments, but all attempts failed because the cells are fragile and difficult to passage in vitro. The gap between MCF10A and LP cells is now stressed in the Discussion.

      (6) Figure 2, the number of samples being compared is not listed for most of the panels. It appears that ELF3 is enriched in subsets of breast cancers, but much of the data is not focused on BRCA1-deficient tumors. Therefore, the data appears to show that ELF3 expression is more of a generalized feature of TNBCs (which has been reported previously) and dilutes the support for the hypothesis. Therefore, panels C-G raise concerns regarding the overall hypothesis that LP cells are the cell type that is affected.

      Thanks for the suggestion. We have added the number of samples in Figure 2 legends.

      Our analysis focuses on the basal subtype because of the well-known relationship between BRCA1 deficiency and this subtype. Our results demonstrate an association between ELF3 expression and the basal, TNBC, and HER2+ subtypes, consistent with previous reports. Since TNBC also has high replication stress levels (NPJ Breast Cancer. 2020 Sep 7;6:40.), ELF3 upregulation in this subtype may not be due solely to BRCA1 deficiency, and we fully agree that this analysis may dilute the relationship between ELF3 and BRCA1. We have revised the Discussion to be more precise on this point.

      (7) Figure 3 provides experimental support for the hypothesis. While panel A is of interest, the legend lacks any description beyond "normal mammary tissue" and that there are non-carriers and carriers of BRCA1 mutations. Is this from bulk RNAseq data or single-cell RNAseq data? How many carriers and how many noncarriers? Panel E is ENCODE data from MCF7 cells that are ER+ luminal subtype so it is unclear if this is relevant to the LP cells that are the focus of the hypothesis.

      Thanks for the comments. Figure 3 panel A is from single-cell RNAseq data, including 3 BRCA1 WT patients and 4 BRCA1 mutant patients. All cells (normal cells and tumor cells) are included, and ELF3 expression was normalized by the reads in each cell. We have added this information to the figure legend.

      It has been difficult to obtain ENCODE data for LP cells. The effect of E2F1 on the regulation of ELF3 was validated experimentally in MCF10A cells and is consistent with the MCF7 ENCODE data. We therefore suggest that this effect may be conserved in mammary cells, but further confirmation in LP cells is needed. We have revised the manuscript to note this.

      (8) In Figure 4, the authors use BRCA1-deficient breast cancer cells to show the reliance on ELF3 and suggest that this is specific to this genetic lesion and not other subtypes. However, there is no data to show that this is not observed using ER+ cells or TNBC that are not BRCA1-deficient cell lines or models.

      Thanks for pointing this out. Because ELF3 knockdown in MCF10A cells increased genomic instability (Supplement Fig S5) and reduced the capacity to resolve replication stress (Figure 5B), we believe that ELF3 helps cells deal with replication stress not only in BRCA1-mutant cells but also in normal mammary cells and in multiple cell lines with distinct backgrounds, as suggested by Figures 4G and 4H and Supplement Fig S4G. The specific link between ELF3 and BRCA1 lies in the significant upregulation of ELF3 upon BRCA1 deficiency, not in the downstream functions of ELF3.

      (9) Figure 5 provides the first direct evaluation of biomarkers of replication stress (gamma H2AX, 53BP1). DNA fiber assays provide the most direct evaluation of replication fork kinetics, and therefore, replication stress. The knockdown of BRCA1 and ELF3 appear to phenocopy one another in the HCC1937, but there is no other cell type to show whether this is specific for BRCA1-deficient cells. For example, the MCF7 cells show E2F1 binding to ELF3 (Figure 3E) and may show replication stress upon knockdown of ELF3. Without testing this, the authors cannot suggest that the effect is linked to BRCA1 status. The authors do not identify the BRCA1 mutation in these cells and whether there is homozygous loss. Similarly, the mutational status in the SUM149PT cells should also be stated. These need to be added to aid interpretation of the results.

      Thank you for the constructive advice. We have added information on the BRCA1 status of HCC1937 and SUM149PT. As discussed above, the results in Figures 4G and 4H suggest that ELF3 expression is associated with sensitivity to replication-stress-inducing drugs across many cell lines. Thus, the ability of ELF3 to maintain the stability of DNA replication is not specific to BRCA1-deficient cells. The reliance on ELF3 in BRCA1 deficiency that we propose rests mainly on the fact that ELF3 is upregulated under BRCA1-deficient conditions and that ELF3 may help cells tolerate replication stress during transformation. The resulting tumor cells, that is, BRCA1-deficient breast cancer cells, may therefore be more sensitive to the loss of ELF3 expression.

      (10) While the data in Figure 6 are valuable extensions of the gene signature derived from the MCF10A cells with BRCA1 knockdown, only 2 BRCA1 carriers are reported. As carriers bear heterozygous mutations in BRCA1, haploinsufficiency would be necessary to generate the signature. The authors do note the publication by Pathania et al, but there are relatively few examples of haploinsufficiency. It should be noted that Sedic et al., 2015 also suggested haploinsufficiency in breast epithelial cell cultures from BRCA1 heterozygotes which appears to cause premature senescence, possibly via replication stress. However, this was observed in the basal epithelial cells. Therefore, this appears to be a feature of the breast epithelium more generally and is not enriched or limited to the LP cells.

      Thanks very much for your valuable suggestion. We have revised the Discussion to include this important work, and we fully agree that the replication stress caused by BRCA1 deficiency is not limited to LP cells. The point we wish to make in Figure 6, however, is that in normal mammary cells BRCA1 deficiency modulates the transcription profile toward that of LP-like cells rather than that of other subtypes. We observed a surprisingly similar profile between BRCA1-deficient cells and LP cells, suggesting that BRCA1 might have an inherent function in mediating the transcription of LP genes. Furthermore, the data indicate that ELF3 has a tighter association with LP genes than other recognized LP-specific transcription factors, such as ELF5 and EHF, which belong to the same family as ELF3. This result is intriguing because ELF3 can be upregulated by BRCA1 deficiency and replication stress. We hypothesize that ELF3 could be a transcriptional node downstream of BRCA1 deficiency that modulates LP gene expression, and that this process might be limited to LP cells because ELF3 has its highest expression levels in LP cells. Nonetheless, this hypothesis still needs to be validated experimentally in LP cells.

      We would like to express our deepest gratitude to the reviewers for their thorough and constructive feedback. Their insightful comments have been invaluable in guiding the revisions of our manuscript, helping us to clarify our hypotheses and strengthen the presentation of our findings. While we encountered some challenges, particularly with the isolation and culturing of LP cells, we made significant efforts to address the reviewers' concerns to the best of our ability. We have updated our manuscript accordingly, ensuring that all issues raised have been addressed comprehensively. We believe that these revisions have substantially improved the quality and clarity of our work, and we are excited to share our findings with the scientific community. Thank you once again for the opportunity to revise our manuscript, and we look forward to your feedback on the updated version.

    1. eLife Assessment

      This important work advances our understanding of factors influencing efficacy assessments and biomarker viability for complement-directed gene therapy against age-related macular degeneration. The data presented is convincing and offers insights and teachings for the design of gene therapy and complement-targeted therapeutics in the eye and more broadly for future ocular biomarker studies.

    2. Reviewer #1 (Public review):

      Summary:

      This study analyzed biomarker data from 28 subjects with geographic atrophy (GA) in a Phase I/II clinical trial of PPY988, a subretinal AAV2 complement factor I (CFI) gene therapy, to evaluate pharmacokinetics and pharmacodynamics. Post-treatment, a 2-fold increase in vitreous humor (VH) FI was observed, correlating with a reduction in FB breakdown product Ba but minimal changes in other complement factors. The aqueous humor (AH) was found to be an unreliable proxy for VH in assessing complement activation. In vitro assays showed that the increase in FI had a minor effect on the complement amplification loop compared to the more potent C3 inhibitor pegcetacoplan. These findings suggest that PPY988 may not provide enough FI protein to effectively modulate complement activation and slow GA progression, highlighting the need for thorough biomarker review to determine optimal dosing in future studies.

      Strengths:

      This manuscript provides critical data on the efficacy of gene therapy for the eye, specifically introducing complement FI expression. It presents the results from a halted clinical trial, making the publication of this data essential for understanding the outcomes of this gene therapy approach. The findings offer valuable insights and lessons for future gene therapy attempts in similar contexts.

      Weaknesses:

      No particular weaknesses. The study was carefully performed and limitations are discussed.

      I have just some concerns about the methodology used. The authors use the MILLIPLEX assays, which allow for multiplexed detection of complement proteins, and they mention extensive validation. How do the measurements with this assay correlate with gold standard methods? Are the specificity and the expected normal ranges preserved with this assay? The same questions apply to the Olink assay. Some of the proteins are measured by both assays and/or by standard ELISA. How do these measurements correlate?

      Comments on revisions:

      The authors answered part of my comments. Only one remains: please provide a comparison between the ELISA/Multiplex and Olink data to judge the robustness of the Olink assay for complement.

    3. Reviewer #2 (Public review):

      Summary:

      The results presented demonstrate that AAV2-CFI gene therapy delivers long-term and marginally higher FI protein in vitreous humor that results in a concomitant reduction in the FB activation product Ba. However, the lack of clinical efficacy in the phase I/II study, possibly due to lower in vitro potency when compared to currently approved pegcetacoplan, raises important considerations for the utility of this therapeutic approach. Despite the early termination of the PPY988 clinical development program, the study achieved significant milestones, including the implementation of subretinal gene therapy delivery in older adults, complement biomarker comparison between serial vitreous humor and aqueous humor samples and vitreous humor proteomic assessment via Olink.

      Strengths:

      Long-term augmentation of FI protein in vitreous humor over 96-weeks and reduction of FB breakdown product Ba in vitreous humor suggests modulation of the complement system. Developed a novel in vitro assay suggesting FI's ability to reduce C3 convertase activity is weaker than pegcetacoplan and FH and may suggest a higher dose of FI will be required for clinical efficacy. Warn of the poor correlation between vitreous humor and aqueous humor biomarkers and suggest aqueous humor may not be a reliable proxy for vitreous humor with regard to complement activation/inhibition studies.

      Weaknesses:

      The vitrectomy required for subretinal route of administration causes long-term loss of total protein and may influence interpretation of complement biomarker results even with normalization. The modified in vitro assay of complement activation suggests a several hundred-fold increase in FI protein is required to significantly affect C3a levels. Interestingly, the in vitro assay demonstrates 100% inhibition of C3a with pegcetacoplan and FH therapeutics, but only a 50% reduction with FI even at the highest concentrations tested. This observation suggests FI may not be rate-limiting for negative complement regulation under the in vitro conditions tested and potentially in the eye. It is unclear if pharmacokinetic and pharmacodynamic properties in aqueous humor and vitreous humor compartments are a reliable predictor of FI level/activity after subretinal delivery AAV2-CFI gene therapy.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Hallam et al describes the analysis of various biomarkers in patients undergoing complement factor I supplementation treatment (PPY988 gene therapy) as part of the FOCUS Phase I/II clinical trial. The authors used validated methods (multiplexed assays and OLINK proteomics) for measuring multiple soluble complement proteins in the aqueous humour (AH) and vitreous humour (VH) of 28 patients over a series of time points, up to and including 96 weeks. Based on biomarker comparisons, the levels of FI synthesised by PPY988 were believed to be insufficient to achieve the desired level of complement inhibition. Subsequent comparative experiments showed that PPY988-delivered FI was much less efficacious than pegcetacoplan (FDA-approved complement inhibitor under the name SYFOVRE) when tested in an artificial VH matrix.

      Strengths:

      The manuscript is well written with data clearly presented and appropriate statistics used for the analysis itself. It's great to see data from real clinical samples that can help support future studies and therapeutic design. The identification that complement biomarker levels present in the AH do not represent the levels found in the VH is an important finding for the field, given the number of complement-targeting therapies in development and the desperate need for good biomarkers for target engagement. This study also provides a wealth of baseline complement protein measurements in both human AH and VH (and companion measurements in plasma) that will prove useful for future studies.

      Weaknesses:

      No real weaknesses in the manuscript itself. It is only a shame that it would appear that FI supplementation is not a viable way forward for treating GA secondary to AMD.

      Comments on revisions:

      I think the authors have done all that they can to present this study in the most robust manner possible.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study analyzed biomarker data from 28 subjects with geographic atrophy (GA) in a Phase I/II clinical trial of PPY988, a subretinal AAV2 complement factor I (CFI) gene therapy, to evaluate pharmacokinetics and pharmacodynamics. Post-treatment, a 2-fold increase in the vitreous humor (VH) FI was observed, correlating with a reduction in FB breakdown product Ba but minimal changes in other complement factors. The aqueous humor (AH) was found to be an unreliable proxy for VH in assessing complement activation. In vitro assays showed that the increase in FI had a minor effect on the complement amplification loop compared to the more potent C3 inhibitor pegcetacoplan. These findings suggest that PPY988 may not provide enough FI protein to effectively modulate complement activation and slow GA progression, highlighting the need for a thorough biomarker review to determine optimal dosing in future studies.

      Strengths:

      This manuscript provides critical data on the efficacy of gene therapy for the eye, specifically introducing complement FI expression. It presents the results from a halted clinical trial, making sharing this data essential for understanding the outcomes of this gene therapy approach. The findings offer valuable insights and lessons for future gene therapy attempts in similar contexts.

      Weaknesses:

      No particular weaknesses. The study was carefully performed and limitations are discussed.

      I have just some concerns about the methodology used. The authors use the MILLIPLEX assays, which allow for multiplexed detection of complement proteins, and they mention extensive validation. How do the measurements with this assay correlate with gold standard methods? Are the specificity and the expected normal ranges preserved with this assay? The same questions apply to the Olink assay. Some of the proteins are measured by both assays and/or by standard ELISA. How do these measurements correlate?

      The authors thank the reviewer for the positive response. The ELISA assays used to measure the array of complement proteins described were extensively validated for the following parameters: specificity, intra-assay and inter-assay precision, accuracy, stability, reference range, and parallelism. All assays were validated in plasma, vitreous and aqueous humour. Due to the limited volume and availability of ocular fluids from individuals in the study, validation in vitreous and aqueous matrices was performed using a pool of several samples from post-mortem donors. At the time this study was initiated, the Millipore Luminex complement panels and the Quidel C3a and Ba EIA were the most sensitive assays and the only commercially available options capable of measuring the proteins of interest given the limited vitreous and aqueous humor sample volumes. The measured concentrations fell within ranges similar to those published in the literature using assays in distinct patient populations, e.g. in (Mandava et al, Invest Ophthalmol Vis Sci, 2020).

      Measurements from vitreous and aqueous from subject samples were deemed reportable if they were within the quantifiable ranges defined for these sample types during the validation (coefficient of variation of 20%, or 30% when results were below the lower limit of quantification but above limit of detection). Notably, given the limited amount of biomarker data due to small sample size, we share results from outlier biomarker measurements, to illustrate the heterogeneity in sample quality. We further publish plasma sample biomarker results in supplemental table 5 wherein complement protein concentrations can be observed and compared to normal ranges in the literature.
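
      One possible reading of the reportability rule stated above (a 20% coefficient of variation within the quantifiable range, relaxed to 30% between the limit of detection and the lower limit of quantification) is sketched below. This is an illustration only: the use of replicate wells, the function names, and the handling of the boundary values are assumptions, not details given in the response.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements (sample SD / mean)."""
    return 100.0 * stdev(replicates) / mean(replicates)


def is_reportable(replicates, lloq, lod):
    """Apply the 20% / 30% CV acceptance limits to one sample.

    Assumptions: at least two replicates are available, a mean at or below
    the limit of detection (lod) is never reportable, and a mean between lod
    and the lower limit of quantification (lloq) uses the 30% limit.
    """
    m = mean(replicates)
    if m <= lod:
        return False
    cv_limit = 20.0 if m >= lloq else 30.0
    return cv_percent(replicates) <= cv_limit


# Hypothetical duplicate readings (arbitrary concentration units).
in_range_ok = is_reportable([100.0, 110.0], lloq=50.0, lod=10.0)
below_lloq_ok = is_reportable([20.0, 28.0], lloq=50.0, lod=10.0)
in_range_noisy = is_reportable([60.0, 90.0], lloq=50.0, lod=10.0)
below_lod = is_reportable([5.0, 6.0], lloq=50.0, lod=10.0)
```

      Any upper limit of quantification would need a further check, which is omitted here because the response does not describe one.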

      Adding confidence to the robustness of our assays was the observation that some of the complement proteins quantified by standard assay (e.g. plate and bead-based ELISAs) were also measured by the OLINK assay, and there was a general trend observed for positive correlation between results from both assays for FI levels post-treatment. However, we did not provide detailed correlative statistical analyses for further complement proteins as OLINK findings were deemed highly exploratory and hypothesis generating, and because the OLINK assay produced normalised results which are challenging to directly compare to ELISA results that were absolute.
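
      Because one assay reports absolute concentrations and the other normalised, relative values, a rank-based statistic is one scale-free way to express the kind of agreement described above. The sketch below computes a Spearman rank correlation with the standard library. It is offered as an illustration only; the source does not state which statistic, if any, was used, and all values and function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements of one protein in the same samples.
absolute_ng_ml = [310.0, 450.0, 520.0, 800.0]
normalised_units = [1.1, 1.9, 2.0, 3.4]
rho = spearman(absolute_ng_ml, normalised_units)
```

      A rank correlation only shows whether the two assays order the samples in the same way; it says nothing about agreement in absolute concentration.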

      Reviewer #2 (Public Review):

      Summary:

      The results presented demonstrate that AAV2-CFI gene therapy delivers long-term and marginally higher FI protein in vitreous humor that results in a concomitant reduction in the FB activation product Ba. However, the lack of clinical efficacy in the phase I/II study, possibly due to lower in vitro potency when compared to currently approved pegcetacoplan, raises important considerations for the utility of this therapeutic approach. Despite the early termination of the PPY988 clinical development program, the study achieved significant milestones, including the implementation of subretinal gene therapy delivery in older adults, complement biomarker comparison between serial vitreous humor and aqueous humor samples and vitreous humor proteomic assessment via Olink.

      Strengths:

      Long-term augmentation of FI protein in vitreous humor over 96 weeks and reduction of FB breakdown product Ba in vitreous humor suggests modulation of the complement system. Developed a novel in vitro assay suggesting FI's ability to reduce C3 convertase activity is weaker than pegcetacoplan and FH and may suggest a higher dose of FI will be required for clinical efficacy. Warn of the poor correlation between vitreous humor and aqueous humor biomarkers and suggest aqueous humor may not be a reliable proxy for vitreous humor with regard to complement activation/inhibition studies.

      Weaknesses:

      The vitrectomy required for the subretinal route of administration causes a long-term loss of total protein and may influence the interpretation of complement biomarker results even with normalization. The modified in vitro assay of complement activation suggests a several hundred-fold increase in FI protein is required to significantly affect C3a levels. Interestingly, the in vitro assay demonstrates 100% inhibition of C3a with pegcetacoplan and FH therapeutics, but only a 50% reduction with FI even at the highest concentrations tested. This observation suggests FI may not be rate-limiting for negative complement regulation under the in vitro conditions tested and potentially in the eye. It is unclear if pharmacokinetic and pharmacodynamic properties in aqueous humor and vitreous humor compartments are reliable predictors of FI level/activity after subretinal delivery AAV2-CFI gene therapy.

      The authors thank the reviewer for the positive response and we agree that a limitation of the biomarker strategy for ocular gene therapy delivered to the retinal tissues is inferring PK/PD from vitreous and aqueous samples, which are the fluid sample compartments accessible from subjects available to measure molecular treatment response. We agree that these compartments may not accurately represent sub-retinal and tissue level complement turnover. In the discussion, line 508, we state: ‘Overall, the data suggests that fully functional FI is being secreted into the VH, but the regulatory effects on the level of Ba may be representative of convertase formation in the VH and not the macula retina/RPE nor the choroid. To validate this hypothesis, one approach would be to conduct vitreal sampling using an effective drug targeting C3 for GA in a larger cohort’.

      However, the observation of elevation of FI in VH (and AH) post treatment, and changes in levels of downstream complement proteins that align with prior knowledge of control of alternative pathway activation, is compelling evidence that these measurements reflect modest but direct consequences of an FI-gene therapy that was delivered to the subretinal space. We add to the discussion, line 479: ‘the findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment.’

      Furthermore, the observation of moderately raised FI levels in modelled VH post treatment being insufficient to control CS activation in vitro accords with the lack of clinical response observed at phase II. We note that measuring FI and complement biomarkers in retinal tissues from treated eyes at post-mortem would be one way to explore the PK/PD effects from AAV2-FI gene therapy.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Hallam et al describes the analysis of various biomarkers in patients undergoing complement factor I supplementation treatment (PPY988 gene therapy) as part of the FOCUS Phase I/II clinical trial. The authors used validated methods (multiplexed assays and OLINK proteomics) for measuring multiple soluble complement proteins in the aqueous humour (AH) and vitreous humour (VH) of 28 patients over a series of time points, up to and including 96 weeks. Based on biomarker comparisons, the levels of FI synthesised by PPY988 were believed to be insufficient to achieve the desired level of complement inhibition. Subsequent comparative experiments showed that PPY988-delivered FI was much less efficacious than pegcetacoplan (FDA-approved complement inhibitor under the name SYFOVRE) when tested in an artificial VH matrix.

      Strengths:

      The manuscript is well written with data clearly presented and appropriate statistics used for the analysis itself. It's great to see data from real clinical samples that can help support future studies and therapeutic design. The identification that complement biomarker levels present in the AH do not represent the levels found in the VH is an important finding for the field, given the number of complement-targeting therapies in development and the desperate need for good biomarkers for target engagement. This study also provides a wealth of baseline complement protein measurements in both human AH and VH (and companion measurements in plasma) that will prove useful for future studies.

      Weaknesses:

      Perhaps the conclusions drawn regarding the lack of observed efficacy are not fully justified. The authors focus on the hypothesis that not enough FI was synthesised in these patients receiving the PPY988 gene therapy, suggesting a delivery/transduction/expression issue. But beyond rare CFI genetic variants, most genetic associations with AMD imply that it is a FI-cofactor disease. This hypothesis is supported by the authors' own experiments, in which supplementing their artificial VH matrix with FH achieves a significantly greater breakdown of C3b than PPY988 treatment alone. Justification around doubling FI levels driving complement turnover refers to studies conducted in blood, which has an entirely different complement protein profile than VH. In Supplemental Table 5 we see there is approx. 10-fold more FH than FI (533 µg/ml vs 50 µg/ml respectively), so increasing FI levels will have a direct effect. Yet in Supplemental Table 3 we see there is more FI than FH in VH (608 ng/ml vs 466 ng/ml respectively). Therefore, adding more FI without more co-factors would have a very limited effect. Surely this demonstrates that the study was delivering the wrong payload, i.e. FI, which hit a natural ceiling of endogenous co-factors within the eye?

      See response to reviewer 3’s review after reviewer 3 recommendations section below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The authors present strong evidence using validated complement biomarker assays and comprehensive proteomic profiling that support their findings. The presentation of complement biomarker data in vitreous humor and aqueous humor after FI augmentation is presented in a clear and concise format. The direct comparison of complement biomarkers in vitreous humor and aqueous humor from the same patients and demonstrating similarities and differences is important for the nascent complement gene therapy field. Developing a novel in vitro complement model and comparing pegcetacoplan, FH, and FI inhibitors provides the field with a valuable assay to benchmark other complement therapeutics. As currently designed, the in vitro assay supports why FI augmentation did not contribute to clinical success. It also suggests that non-physiological concentrations of FI protein (over 100 µg/mL) maximally inhibit C3a signal by ~50%, whereas both pegcetacoplan and FH reduce the signal by 100%. Does this suggest that CFI is not an appropriate therapeutic target to control complement overactivation in the eye?

      We agree with the reviewer that the new data from the novel in vitro assay coupled with the clinical findings from the phase II gene therapy trial does now suggest FI is less attractive as a therapeutic target for controlling complement activation in the retinal tissues of subjects with Geographic Atrophy.

      Reviewer #3 (Recommendations For The Authors):

      I think the authors have done a great job collecting and analysing these clinical samples and elucidating the baseline complement protein profile in both the AH and VH. I only have minimal suggested changes.

      Perhaps a more direct discussion around the limitations of adding more FI into environments where there is no excess of FI-cofactors present? And a discussion around the limitations of VH (and VA for that matter) biomarker sampling for a disease that primarily affects the neurosensory retina and outer blood/retinal barrier: perhaps the landscape of complement proteins is different yet again (although, admittedly, impossible to sample in a patient)? Finally, would it not have been better to perform complement activation experiments using the VH of treated patients directly rather than creating an artificial VH matrix (there may, or may not, be a couple of things in human VH that directly affect complement turnover...)?

      We thank the reviewer for the supportive comments. This study is the first to describe FI and FH levels and respective ratios in vitreous humour (plus aqueous and plasma) from GA subjects, before and after sub-retinal gene therapy. It is compelling to observe that in the VH the levels of FI are greater than those of FH, the primary fluid-phase co-factor for FI enzymatic activity. This new information does indeed argue against further FI supplementation (using gene therapy) being of added benefit in controlling the complement system in the broader population of individuals with Geographic Atrophy. We note that at the start of the clinical development of GT005/PPY988 AAV2-FI gene therapy, there was limited information on FI and FH levels in ocular fluids in AMD to inform the pharmacodynamics of complement activation. Now, by running the FOCUS phase I clinical trial and measuring the complement biomarker data using validated assays, we have added to our understanding of the levels and ratio of FI to FH and other complement proteins in a larger number of GA subjects’ ocular samples. We report the levels of complement proteins measured in ocular and systemic samples, to show the ranges and also the differences in ratios between the different matrices.

      Regarding the statement that FI supplementation is likely to be ineffective due to limited FH cofactor: FH is not the only co-factor that FI may partner with at cell surfaces to become enzymatically active (others include MCP (CD46) and CR1 (CD35), although the latter is known to be of limited expression in the eye). It is therefore certainly true that other proteins may be present in the tissue, altering the kinetics of FI’s activity after sub-retinal gene therapy. In addition, the ratio between FI and FH detected in the VH may not be the same as in retinal tissue. We agree that biomarkers in the VH may not fully reflect the disease processes and treatment response at the retinal cell layers, but the VH is the closest fluid available for sampling soluble proteins released from the tissue and, owing to its proximity, will reflect retina-released soluble proteins. The findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment. We agree that modelling different inhibitor effects on complement activation directly using subjects’ vitreous would be informative, but this was not possible because of the very small sample volumes.

      We add several sentences to the discussion regarding the points above. Line 473: ‘Notably, that FI does not reduce C3a breakdown to baseline even at supermolecular concentrations suggests cofactor limitation that might be more pronounced in VH given FH is not in excess of FI as is the case in blood 27. Moreover, there are additional cell-bound cofactors for FI that may be present in retinal tissue that are not present in the VH and could further alter the kinetics of the assay, such as MCP (CD46) albeit with disease related changes observed 37. However, the findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment.’

      Minor comments:

      Line 237: Missing parenthesis at the end of the sentence

      Manuscript updated.

      Line 435: Missing secondary parenthesis after .....Figure 3A)......

      Manuscript updated.

      Line 536: I don't think suggesting the addition of FHR proteins into the neurosensory retina/VH is such a good idea

      The reference to FHRs has been clarified in the manuscript, line 558. The authors note that FHR dimerization domains have been engineered to dimerize Factor H constructs increasing half-life and potency for drugs currently in development.

    1. eLife Assessment

      In this article, Cheng et al present an important finding that advances the understanding of mitochondrial stress response(s). The authors employed mass spectrometry-based methods in conjunction with standard molecular and cellular biology techniques to provide compelling evidence that phosphatidylethanolamine-binding protein 1 (PEBP1) acts as a pivotal regulator of the mitochondrial component of integrated stress response. Notwithstanding that this discovery is likely to be of significant interest to researchers across a broad spectrum of disciplines ranging from cell biology to neuroscience, it was thought that further mechanistic dissection of the role of PEBP1 in modulating integrated stress response may further strengthen this study.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors use thermal proteome profiling to capture changes in protein stability following a brief (30 min) treatment of cells with various mitochondrial stressors. This approach identified PEBP1 as a potentiator of Integrated Stress Response (ISR) induction by various mitochondrial stressors, although the specific dynamics vary by stressor. PEBP1 deletion attenuates DELE1-HRI-mediated activation of the ISR, independent of its known role in the RAF/MEK/ERK pathway. These effects can be bypassed by HRI overexpression and do not affect DELE1 processing. Interestingly, in cells, PEBP1 physically interacts with eIF2alpha, but not its phosphorylated form (eIF2alpha-P), leading the authors to suggest that PEBP1 functions as a scaffold to promote eIF2alpha phosphorylation by HRI.

      Strengths:

      The authors present a clear and well-structured study, beginning with an original and unbiased approach that effectively addresses a novel question. The investigation of PEBP1 as a specific regulator of the DELE1-HRI signaling axis is particularly compelling, supported by extensive data from both genetic and pharmacological manipulations. Including careful titrations, time-course experiments, and orthogonal approaches strengthens the robustness of their findings and bolsters their central claims.

      Moreover, the authors skillfully integrate publicly available datasets with their original experiments, reinforcing their conclusions' generality and broader relevance. This comprehensive combination of methodologies underscores the reliability and significance of the study's contributions to our understanding of stress signaling.

      Weaknesses:

      While the study presents exciting findings, there are a few areas that could benefit from further exploration. The HRI-DELE1 pathway was only recently discovered, leaving many unanswered questions. The observation that PEBP1 interacts with eIF2alpha, but not with its phosphorylated form, suggests a novel mechanism for regulating the Integrated Stress Response (ISR). However, as they note themselves, the authors do not delve into the biochemical or molecular mechanisms through which PEBP1 promotes HRI signaling. Given the availability of antibodies against phosphorylated HRI, it would have been interesting to explore whether PEBP1 influences HRI phosphorylation. Furthermore, since the authors already have recombinant PEBP1 protein (as shown in Figure 1D), additional in vitro experiments such as in vitro immunoprecipitation, FRET, or surface plasmon resonance (SPR) could have confirmed the interaction with eIF2alpha. Future studies might investigate whether PEBP1 directly interacts with HRI, stimulates its auto-phosphorylation or kinase activity, or serves as a template for oligomerization, potentially supported by structural characterization of the complex and mutational validation.

      Another point of weakness is the unclear significance of the 1.5-2x enhanced interaction with eIF2alpha upon PEBP1 phosphorylation, as there is little evidence to show that this increase has any downstream effects. The ATF4-luciferase reporter experiments, comparing WT and S153D overexpression, may have reached saturation with WT, making it difficult to detect further stimulation by S153D. Additionally, expression levels for WT and mutant forms are not provided, making it challenging to interpret the results. It would also be interesting to explore whether combined mitochondrial stress and PMA treatment further enhance the ISR.

      Lastly, while the authors claim that oligomycin does not significantly alter the melting temperature of recombinant PEBP1 in vitro, the data in Figure S1D suggest a small shift. Without variance measures across replicates or background subtraction, this claim is less convincing. The inclusion of statistical analyses would strengthen the interpretation of these results.

      Impact on the field:

      The study's relevance is underscored by the fact that overactive ISR is linked to a broad range of neurodegenerative diseases and cognitive disorders, a field actively being explored for therapeutic interventions, with several drugs currently in clinical trials. Similarly, mitochondrial dysfunction plays a well-established role in brain health and other diseases. Identifying new targets within these pathways, like PEBP1, could provide alternative therapeutic strategies for treating such conditions. Therefore, gaining a deeper understanding of the mechanisms through which PEBP1 influences ISR regulation is highly pertinent and could have far-reaching implications for the development of future therapies.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Cheng et al use the TPP/MS-CETSA strategy to discover new components for the mitochondrial arm of the Integrated Stress Response. By using short exposures of several drugs that potentially induce mitochondrial stress, they find significant CETSA shifts for the scaffold protein PEBP1 for both antimycin A and oligomycin, making PEBP1 a candidate for mitochondrial-induced ISR signaling. After extensive follow-up work, they provide good support that PEBP1 is likely involved in ISR, and possibly acts through an interaction with the key ISR effector node EIF2a.

      Strengths:

      The work adds an important understanding of ISR signaling where PEBP1 might also constitute a druggable node to attenuate cellular stress. Although CETSA has great potential for dissecting cellular pathways, there are few studies where this has been explored, particularly with such an extensive follow-up, also giving the work methodological implications. Together I therefore think this study could have a significant impact.

      Weaknesses:

      The TPP/MS-CETSA experiment is quite briefly described and might have a too relaxed cut-off. The assays confirming interactions between PEBP1 and EIF2a might not be fully conclusive.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Cheng and Meliala et al. demonstrate that PEBP1 is a modulator of the ISR, specifically through the induction of mitochondrial stress. The authors utilize thermal proteome profiling (TPP), by which they identify PEBP1 as a thermally stabilized protein upon oligomycin treatment, indicating its role in mitochondrial stress. Moreover, RNA-sequencing analysis indicated that PEBP1 may be specifically modulating the mitochondrial stress-induced ISR, as PEBP1 knock-out reduces phosphorylation of eIF2α. They also show that PEBP1 function is independent of ER stress, specifically tunicamycin treatment, and that loss of PEBP1 does affect mitochondrial ISR but in an OMA1- and DELE1-independent manner. Thus, the authors hypothesized that PEBP1 interacts directly with eIF2α, functioning as a scaffolding protein. However, direct co-immunoprecipitation failed to demonstrate a potential interaction between PEBP1 and eIF2α. The authors then used a NanoBiT luminescence complementation assay to show the PEBP1-eIF2α interaction and its disruption by S51 phosphorylation.

      Strengths:

      Taken together, this work is novel, and the data presented suggests PEBP1 has a role as a modulator of the mitochondrial ISR, enhancing the signal to elicit the necessary response.

      Weaknesses:

      The one major issue of this work is the lack of a mechanism showing precisely how PEBP1 amplifies the mitochondrial integrated stress response. The work, as it is described, presents data suggesting PEBP1's role in the ISR but fails to present a more conclusive mechanism.

    5. Author response:

      We thank all the reviewers for their insightful comments, which will help us further improve this work.

      Response to Reviewer #1:

      We greatly appreciate your comments on the general reliability and significance of our work. We fully agree that it would have been ideal to have additional evidence related to the role of PEBP1 in HRI activation. Unfortunately, we have not been able to find phospho-HRI antibodies that work reliably. The literature seems to agree with this, as a band shift using total-HRI antibodies is usually used to study HRI activation. However, in the cell lines showing the most robust effect of PEBP1 knockout or knockdown, we have yet to be convinced by the band shifts we see. This could be addressed by optimizing Phos-tag gels, although these gels can be tricky with complex samples such as cell lysates, which contain many phosphoproteins.

      To address the interaction between PEBP1 and eIF2alpha more rigorously, we were inspired by the insights you and reviewer #2 provided. We now think we might be able to do this using purified proteins, CETSA WB, or both. These experiments could also provide further evidence for the role of PEBP1 phosphorylation. Although phosphorylation of PEBP1 at S153 has been implicated as being important for other functions of PEBP1, we are not sure about its role here. It may indeed have little relevance for ISR signalling. The CETSA WB assay could also provide further insight into the in vitro stability changes of PEBP1 in response to oligomycin.

      For the currently shown in vitro thermal shift assay, we have performed two independent experiments. While it appears that there is a slight destabilization of PEBP1 by oligomycin, this experiment remains inconclusive: despite the apparent simplicity of the assay, there could be alternative explanations because of the fluorescence background from oligomycin alone. We now provide a lysate-based CETSA analysis, which does not display the same PEBP1 stabilization as the intact-cell experiment. As for the signal saturation in the ATF4-luciferase reporter assay, this is a valid point.

      Response to Reviewer #2:

      We strongly agree that CETSA has a lot of potential to inform us about cellular state changes and this was indeed the starting point for this project. We apologize for being (too) brief with the explanations of the TPP/MS-CETSA approach and we have now added a bit more detail. With regard to the cut-offs used for the mass spectrometry analysis, you are absolutely right that we did not establish a stringent cut-off that would show the specificity of each drug treatment. Our take on the data was that using the p values (and ignoring the fold-changes) of individual protein changes as in Fig 1D, we can see that mitochondrial perturbations display a coordinated response. We now realize that the downside of this representation is that it obscures the largest and specific drug effects. As mentioned in the response to Reviewer #1, we now also think that it would be possible to obtain more evidence for the potential interaction between PEBP1 and eIF2alpha using CETSA-based assays.

      Response to Reviewer #3:

      Thank you for your assessment. We agree that this manuscript would have been much stronger with clearer mechanistic insights. As mentioned in the responses to other reviewers above, we aim to address this limitation in part by looking at the putative interaction between PEBP1 and eIF2alpha with orthogonal approaches. However, we do realize that analysis of protein-protein interactions can be notoriously challenging due to false negative and false positive findings. As with any scientific endeavor, we will keep in mind alternative explanations for the observations, which could eventually provide a cohesive model explaining precisely how PEBP1, directly or indirectly, influences ISR signalling.

    1. eLife Assessment

      In this valuable study, the authors propose a model wherein the bacterial redox state plays a crucial role in the differentiation of Chlamydia trachomatis into elementary and reticulate bodies. They provide solid evidence to argue that a highly oxidising environment favours the formation of elementary bodies while a reducing condition slows down development. Overall, the study convincingly demonstrates that Chlamydial redox states play a role in differentiation, an observation that may have implications for the study of other bacterial systems.

    2. Reviewer #1 (Public review):

      Summary:

      Chlamydia spp. has a biphasic developmental cycle consisting of an extracellular, infectious form called an elementary body (EB) and an intracellular, replicative form known as a reticulate body (RB). The structural stability of EBs is maintained by extensive cross-linking of outer membrane proteins while the outer membrane proteins of RBs are in a reduced state. The overall redox state of EBs is more oxidized than RBs. The authors propose that redox state may be a controlling factor in the developmental cycle. To test this, alkyl hydroperoxide reductase subunit C (ahpC) was overexpressed or knocked down to examine effects on developmental gene expression. KD of ahpC induced increased expression of EB-specific genes and accelerated EB production. Conversely, overexpression of ahpC delayed differentiation to EBs. The results suggest that chlamydial redox state may play a role in differentiation.

      Strengths:

      Uses modern genetic tools to explore the difficult area of temporal gene expression throughout the chlamydial developmental cycle.

      Weaknesses:

      The environmental signals triggering ahpC expression/activity are not determined.

      Comments on revisions:

      I am satisfied with the modifications made to the manuscript.

    3. Reviewer #2 (Public review):

      The factors that influence the differentiation of EBs and RBs during Chlamydial development are not clearly understood. A previous study had shown a redox oscillation during the Chlamydial developmental cycle. Based on this observation, the authors hypothesize that the bacterial redox state may play a role in regulating the differentiation in Chlamydia. To test their hypothesis, they make knock-down and overexpression strains of the major ROS regulator, ahpC. They show that the knock-down of ahpC leads to a significant increase in ROS levels leading to an increase in the production of elementary bodies and overexpression leads to a decrease in EB production likely caused by a decrease in oxidation. From their observations, they present an interesting model wherein an increase in oxidation favors the production of EBs.

      Comments on revisions:

      Major concerns have been satisfactorily addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chlamydia spp. has a biphasic developmental cycle consisting of an extracellular, infectious form called an elementary body (EB) and an intracellular, replicative form known as a reticulate body (RB). The structural stability of EBs is maintained by extensive cross-linking of outer membrane proteins while the outer membrane proteins of RBs are in a reduced state. The overall redox state of EBs is more oxidized than RBs. The authors propose that the redox state may be a controlling factor in the developmental cycle. To test this, alkyl hydroperoxide reductase subunit C (ahpC) was overexpressed or knocked down to examine effects on developmental gene expression. KD of ahpC induced increased expression of EB-specific genes and accelerated EB production. Conversely, overexpression of ahpC delayed differentiation to EBs. The results suggest that chlamydial redox state may play a role in differentiation.

      Strengths:

      Uses modern genetic tools to explore the difficult area of temporal gene expression throughout the chlamydial developmental cycle.

      Weaknesses:

      The environmental signals triggering ahpC expression/activity are not determined.

      Thank you for your comments. Our data and those of others have shown that ahpC is expressed as a mid-developmental cycle gene (i.e., when RBs predominate in the population). This is true of most chlamydial genes, and the factors that determine developmental expression are not fully understood. As we noted in the Discussion, Chlamydia lacks AhpF/D orthologs, so it is not clear how AhpC activity is regulated. As for determining the environmental signals that trigger AhpC activity, this is a non-trivial issue in an obligate intracellular bacterium. Our assumption is that AhpC is constitutively active and that the increasing metabolic production of ROS eventually overcomes the innate (and stochastic) capacity of AhpC to handle it, hence the threshold hypothesis. Importantly, the stochasticity is consistent with what we know about secondary differentiation in Chlamydia. We have tried to clarify these points in the Discussion.

      Reviewer #2 (Public Review):

      The factors that influence the differentiation of EBs and RBs during Chlamydial development are not clearly understood. A previous study had shown a redox oscillation during the Chlamydial developmental cycle. Based on this observation, the authors hypothesize that the bacterial redox state may play a role in regulating the differentiation in Chlamydia. To test their hypothesis, they make knock-down and overexpression strains of the major ROS regulator, ahpC. They show that the knock-down of ahpC leads to a significant increase in ROS levels leading to an increase in the production of elementary bodies and overexpression leads to a decrease in EB production likely caused by a decrease in oxidation. From their observations, they present an interesting model wherein an increase in oxidation favors the production of EBs.

      Major concern:

      In the absence of proper redox potential measurements, it is not clear if what they observe is a general oxidative stress response, especially when the knock-down of ahpC leads to a significant increase in ROS levels. Direct redox potential measurement in the ahpC overexpression and knock-down cells is required to support the model. This can be done using the roGFP-based measurements mentioned in the Wang et al. 2014 study cited by the authors.

      Thank you for this suggestion. It is definitely something that we are looking to implement. However, our current vectors don’t allow for roGFP2 in combination with inducible expression of a gene of interest. We will need to completely redesign our vectors, and, in Chlamydia, the validation of such new vectors together with ahpC OE or KD may literally take a year or longer.

      In lieu of this, we used the CellRox redox reactive dye to image live chlamydiae during normal growth or ahpC KD. During ahpC KD, these organisms are clearly much brighter than the control, uninduced conditions. These data are included as new Figure 5 to go along with the data we previously reported from the plate reader measurements. These data also clearly indicate that the readings we observed are from Chlamydia and not the host cell.

      As far as a general oxidative stress response, Chlamydia lacks any transcriptional regulators akin to OxyR. The response we’ve measured, earlier expression of genes associated with secondary differentiation, would be an odd stress response not consistent with a focused program to respond to oxidative stress. We added new RNAseq data further showing this effect of a global earlier increase in late gene transcripts.

      Reviewer #3 (Public Review):

      Summary:

      The study reports clearly on the role of the AhpC protein as an antioxidant factor in Chlamydia trachomatis and speculates on the role of AhpC as an indirect regulator of developmental transcription induced by redox stress in this differentiating obligate intracellular bacterium.

      Strengths:

      The question posed and the concluding model about redox-dependent differentiation in chlamydia is interesting and highly relevant. This work fits with other propositions in which redox changes have been reported during bacterial developmental cycles, potentially as triggers, but have not been cited (examples PMID: 2865432, PMID: 32090198, PMID: 26063575). Here, AhpC over-expression is shown to protect Chlamydia towards redox stress imposed by H2O2, CHP, TBHP, and PN, while CRISPRi-mediated depletion of AhpC curbed intracellular replication and resulted in increased ROS levels and sensitivity to oxidizing agents. Importantly, the addition of ROS scavengers mitigated the growth defect caused by AhpC depletion. These results clearly establish that AhpC affects the redox state and growth in Ct (with the complicated KO genetics and complementation that are very nicely done).

      Weaknesses:

      However, with respect to the most important implication and claims of this work, the role of redox in controlling the chlamydial developmental cycle rather than simply being a correlation/passenger effect, I am less convinced about the impact of this work. First, the study is largely observational and does not resolve how this redox control of the cell cycle could be achieved, whereas in the case of Caulobacter, a clear molecular link between DNA replication and redox has been proposed. How would progressive oxidation in RBs eventually trigger the secondary developmental genes to induce EB differentiation? Is there an OxyR homolog that could elicit this change and why would the oxidation stress in RBs gradually accumulate during growth despite the presence of AhpC? In other words, the role of AhpC is simply to delay or dampen the redox stress response until the trigger kicks in, again, what is the trigger? Is this caused by increasing oxidative respiration of RBs in the inclusion? But what determines the redox threshold?

      Thank you for your comments. As the reviewer notes, our work clearly demonstrates that AhpC acts as an antioxidant in Chlamydia trachomatis. Further, we have shown that transcription of the late cycle genes is altered upon altered activity of AhpC, which acts as a proof of concept that redox is (one of) the key factor(s) controlling developmental cycle progression in Chlamydia. Our new RNAseq data indicate that a broad swath of well characterized late genes is activated, which contradicts the argument that what we’ve measured is a stress response (unless activation of late genes in Chlamydia is generally a stress response (not the case based on other models of stress) – in which case we would not be able to differentiate between these effects). We hypothesize that ROS production from the metabolic activities of RBs serves as a signal to trigger secondary differentiation from RBs to EBs. How this exact threshold is determined is currently unknown as Chlamydia does not have any annotated homolog for OxyR. It is beyond the scope of the present manuscript to identify and then characterize what specific factor(s) control(s) this response. We fully anticipate that multiple factors are likely impacted by increasing oxidation, so dissecting the exact contributions of any one factor will represent (a) potential separate manuscript(s). Nonetheless, this remains an overarching goal of the lab yet remains challenging given the obligate intracellular nature of Chlamydia. Strategies that would work in a model system, like Caulobacter, that can be grown in axenic media are not easily implemented in Chlamydia.

      As we noted above in another response, ahpC is transcribed as a mid-cycle gene with a peak of transcription corresponding to the RB phase of growth. We hypothesize that the gradual accumulation of ROS from metabolic activity will eventually supersede the ability of AhpC to detoxify it. This would result in any given RB asynchronously and stochastically passing this threshold (and triggering EB formation), which is consistent with what we know about secondary differentiation in Chlamydia.

      I also find the experiment with Pen treatment to have little predictive power. The fact that transcription just proceeds when division is blocked is not unprecedented. This also happens during the Caulobacter cell cycle when FtsZ is depleted for most developmental genes, except for those that are activated upon completion of the asymmetric cell division and that is dependent on the completion of compartmentalization. This is a smaller subset of developmental genes in Caulobacter, but if there is a similar subset that depends on division in Chlamydia and if these are affected by redox as well, then the argument about the interplay between developmental transcription and redox becomes much stronger and the link more intriguing. Another possibility to strengthen the study is to show that redox-regulated genes are under the direct control of chlamydial developmental regulators such as Euo, HctA, or others and at least show dual regulation by these inputs - perhaps the feed occurs through the same path.

      Comparisons to other model systems are generally of limited value with Chlamydia. All chlamydial cell division genes are mid-cycle (RB-specific) genes, just like ahpC. There is no evidence of a redox-responsive transcription factor (whether EUO, HctA, or another) that activates or represses a subset of genes in Chlamydia. Similarly, there is no evidence that redox directly and specifically impacts transcription of cell division genes based on our new RNAseq data. The types of experiments suggested are not easily implemented in Chlamydia, but we would certainly like to be able to do them.

      As it pertains to penicillin specifically, we and others have shown that treating chlamydiae with Pen blocks secondary differentiation (meaning late genes are not transcribed). Effectively, Pen treatment freezes the organism in an RB state with continued transcription of RB genes. What we have shown is that, even during Pen treatment (which blocks late gene transcription), ahpC KD can overcome this block, which shows that elevated oxidation is able to trigger late gene expression even when the organisms are phenotypically blocked from progressing to EBs. The comparison from our perspective to Caulobacter is of limited value.

      This redox-transcription shortcoming is also reflected in the discussion, most of which is about the effects and molecular mitigation of redox stress in various systems, with little discussion of its link with developmental transcription in bacteria in general and in Chlamydia.

      We have edited the Discussion to include a broader description of the results and included additional citations as suggested by the reviewer (PMID: 32090198, PMID: 26063575). However, we found that one suggested article (PMID: 2865432) is not relevant to our study, so we did not cite it in the present manuscript. There may have been a typo, so feel free to provide us with the correct PMID to cite.

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 146. A minor point, but inclusion-forming units directly measure infectious EBs. In some cases, the particle-to-infectivity ratio approaches unity. I don't believe IFUs are a "proxy".

      Following the reviewer’s comment, we have modified the statement.

      (2) Figure 2E. Results are normalized to uninduced. The actual number of IFUs in the uninduced should be provided.

      In the revised version of the manuscript, we have provided the actual number of IFUs at 24 and 48 hpi in the uninduced condition for both ahpC OE and EV.

      (3) Figures 3B&D. The shades of gray are not possible to distinguish. I'd suggest color or direct labeling.

      Following the reviewer’s suggestion, we have replaced the gray-shaded graphs with RGB-colored graphs in the latest version of the manuscript for better visualization and understanding.

      (4) Lines 217-224, Figure 4. Is it possible to get micrographs of the reporter retention in chlamydiae to demonstrate that it is chlamydial ROS levels that are being measured and not cellular?

      Following the reviewer’s comment, we performed live-cell microscopy using uninfected HeLa cells and ahpC KD under uninduced and induced conditions at 24 and 40 hpi. We have created a new Fig. 5A with the quantitative ROS measurement graph obtained with the plate reader (old Figure 4E) and these new 24 hpi/40 hpi microscopy images (Fig. 5B and S4).

      (5) The Discussion is overly long and redundant. Large portions of the discussion are simply a rehash of the Results listing by figure number the relevant conclusions.

      Following the reviewer’s suggestion, the discussion has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 2, ahpC is significantly overexpressed at 14 hpi. An IFA as in 2B for 14hpi will be useful. This will help to understand how quick the effect of ahpC overexpression is on development.

      We have added 14 hpi IFA of ahpC and EV as part of Fig 2B.

      (2) In Figure 2E, is there a reason that there is no increase in recoverable IFUs between 24h and 48h for the EV?

      The graph in 2E shows % of uninduced. For clarity, we have reported the absolute IFUs of the uninduced samples in the revised manuscript; the IFU level at 48 hpi is higher than at 24 hpi.

      (3) In Figure 3, can relative levels of RB vs EB be measured? This will provide a convincing case for the production of more EBs even when only less/more RBs are present. The same stands for Figure 4.

      We assumed that the comment refers to Fig. 2 rather than Fig. 3 and, following the reviewer’s constructive suggestion, have attempted to resolve the issue. We normalized log10 IFUs/ml to log10 gDNA for 24 hpi and added the results as Figures 2F and 4E. This may resolve the reviewer’s concern about the levels of RBs and EBs.

      (4) A colour-coded Figure 3B and D, instead of various shades of grey, will be easy for the reader to interpret.

      We agree with the reviewer. For better visualization and understanding of the data, we have replaced the gray-shaded graphs with RGB-colored graphs in the latest version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Other comments:

      (1) The first paragraph of the discussion should be deleted. It's not very useful or revealing and just delivers self-citations.

      Following the reviewer’s suggestion, we rewrote the discussion.

      (2) The first paragraph of the results section does not present results. It's an intro.

      We incorporated this information into the Intro as suggested.

      (3) Has the redox difference between RBs and EBs been experimentally verified by these authors as depicted and claimed in Figure 1A with the cell-permeable, fluorogenic dye CellROX Deep Red for example? It is important to confirm this for EBs and RBs in this setup.

      The difference in redox status between RBs and EBs has been established by previous studies, such as Wang et al., 2014.

      (4) l77. Obligate intracellular alpha-proteobacteria also differentiate ... not only chlamydiae.

      We have modified the sentence.

      (5) l127. Is the redox state altered upon ahpC overexpression?

      The ahpC overexpression strain showed hyper-resistance to the tested oxidizing agents (including the highest concentration tested), indicating highly reduced conditions as a result of higher AhpC activity.

    1. eLife Assessment

      This is a valuable study on the efficacy of a live attenuated vaccine that was tested in different animal models and the evidence is convincing. The study has been strengthened after revisions.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors evaluate the attenuation, immunogenicity, and protection efficacy of a live-attenuated SARS-CoV-2 vaccine candidate (BK2102) against SARS-CoV-2.

      Strengths:

      The authors demonstrate that intranasal inoculation of BK2102 is safe and able to induce humoral and cellular immune responses in hamsters, without apparent signs of damage in the lungs, and that it protects against homologous SARS-CoV-2 and Omicron BA.5 challenge. Safety of BK2102 was further confirmed in a new hACE2 transgenic mouse model generated by the authors.

      Weaknesses:

      The authors have addressed my previous comments on the first submission of the document.

    3. Reviewer #3 (Public review):

      Summary:

      Suzuki-Okutani and colleagues reported a new live-attenuated SARS-CoV-2 vaccine (BK2102) containing multiple deletion/substitution mutations. They show that the vaccine candidate is highly attenuated and demonstrates a strong safety profile in multiple animal models (hamsters and Tg-Mice). Of importance, their data show that a single intranasal immunization with BK2102 leads to strong protection of hamsters against D614G and BA.5 challenge in both lungs and URT (nasal wash). Both humoral and cellular responses were induced, and neutralization activity remained for >360 days after a single inoculation.

      Strengths:

      The manuscript describes a comprehensive study that evaluates safety, immunogenicity, and efficacy of a new live-attenuated vaccine. Strengths of the study include: 1) strong protection against immune evasive variant BA.5 in both lungs and NW; 2) durability of immunity for >360 days; 3) confirmation of URT protection through a transmission experiment.

      While first-generation COVID-19 vaccines have achieved much success, new vaccines that provide mucosal and durable protection remain needed. Thus, the study is significant.

      Weaknesses:

      Lack of a more detailed discussion of this new vaccine approach in the context of reported live-attenuated SARS-CoV-2 vaccines in terms of its advantages and/or weaknesses.

      Antibody endpoint titers could be presented.

      Lack of elaboration on immune mechanisms of protection at the upper respiratory tract (URT) against an immune evasive variant in the absence of detectable neutralizing antibodies.

      Comments on revisions:

      In the revised submission, the authors have added new data and have modified the manuscript accordingly. They have reasonably addressed my comments raised in the previous round of review. The quality and clarity of the manuscript are improved.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the Editor and the Reviewers for their time and effort in thoroughly reviewing our manuscript and providing valuable feedback. We hope we have addressed their comments effectively and improved the clarity of our manuscript as a result.

      The major revisions in the updated manuscript are as follows:

      (1) Immunization experiments using mRNA in Syrian hamsters were performed (Supplementary figures 2A, B and C).

      (2) An ELISPOT assay to evaluate cellular immunity in Syrian hamsters inoculated with BK2102 was conducted (Figure 2F).

      (3) IgA titers in BK2102-inoculated Syrian hamsters were successfully measured (Supplementary figure 2B).

      (4) New immunogenicity data for BK2102 in monkeys was additionally included (Supplementary figure 3B).

      (5) The discussion section has been thoroughly revised to integrate the new data.

      These results have been incorporated into the manuscript, and additional text has been added accordingly.

      Below, we provide point-by-point responses to the reviewers’ comments and concerns.

      Public Reviews:

      Reviewer #1:

      (1) A comparative safety assessment of the available mRNA and live attenuated vaccines will be necessary. The comparison should include details of the doses, neutralizing antibody titers with duration of protection, tissue damage in the various organs, and other risks, including virulence reversal.

      We agree with the Reviewer’s comment regarding the lack of data to compare BK2102 with an mRNA vaccine. Unfortunately, we were unable to obtain commercially available mRNA vaccines for research purposes and could not produce mRNA vaccines of equivalent quality. As a result, a direct comparison of the safety profiles of BK2102 and mRNA vaccines was not possible. To address this, we conducted a GLP study with an additional twelve monkeys to evaluate the safety of BK2102. Following three intranasal inoculations of BK2102 at two-week intervals, no toxic effects were observed in any of the parameters assessed, including tissue damage, respiratory rate, functional observational battery (FOB), hematology, or fever. These results are detailed in lines 115-117.

      Furthermore, we compared the immunogenicity of BK2102 with that of an in-house prepared mRNA vaccine. The mRNA vaccine was designed to target the spike protein of SARS-CoV-2, and its immunogenicity was evaluated in hamsters. Under conditions where serum neutralizing antibody titers were comparable between the two, intranasal inoculation of BK2102 induced higher IgA levels in nasal wash samples than intramuscular injection of the in-house mRNA vaccine (Supplementary figures 2A and B, respectively). Additionally, while the mRNA vaccine induced Th1 and Th2 immune responses, as indicated by the detection of IgG1 and IgG2/3 (Supplementary figure 2C), BK2102 mainly induced a Th1 response in hamsters. These explanatory sentences have been added to the manuscript (lines 140-150).

      (2) The vaccine's effect on primates is doubtful. The study fails to explain why only two of four monkeys developed neutralizing antibodies. Information about the vaccine's testing in monkeys is also missing: What was the level of protection and duration of the persistence of neutralizing antibodies in monkeys? Were the tissue damages and other risks assessed?

      We believe that neutralizing antibody titers were observed in only 2 out of 4 monkeys in the immunogenicity study reported in the original manuscript because only a single dose was administered. We measured the neutralizing antibody titers in sera collected from monkeys used in the GLP study and confirmed the induction of neutralizing antibodies in all 6 monkeys that received three inoculations of BK2102. These data have been included in a new figure (Supplementary figure 3B). While we would have liked to evaluate the persistence of immunity and conduct a protection study in monkeys, limitations related to facility availability and cost prevented us from doing so. As noted in (1), tissue injury and other risks were assessed in the GLP study, which showed no evidence of tissue injury or other toxic effects. These results are described in lines 113-117.

      (3) The vaccine's safety in immunosuppressed individuals or individuals with chronic diseases should be assessed. Authors should make specific comments on this aspect.

      In general, live-attenuated vaccines are contraindicated for immunosuppressed individuals or those with chronic conditions, and therefore BK2102 is also not intended for use in these patients.

      This information has been added to the Discussion section (lines 309-311).

      (4) The candidate vaccine has been tested with a limited number of SARS-CoV-2 strains. Of note, the latest Omicron variants have lesser virulence than many early variants, such as the alpha, beta, and delta strains.

      We have added the results of a protection study against the SARS-CoV-2 gamma strain to Supplementary figures 5A and B. No weight loss was observed in BK2102-inoculated hamsters following infection with the gamma strain. These results are described in lines 109-111, 158-162.

      (5) Limitations of the study have not been discussed.

      We apologize for the ambiguity in the description of the limitations of this paper. One major limitation of this study is that, despite observing high immunogenicity in hamsters, it remains uncertain whether the same positive results would be achieved in humans. Differences in susceptibility exist between species, which are not solely attributed to weight differences. For instance, while a single dose of 10^3 PFU of BK2102 was sufficient to induce neutralizing antibodies in hamsters, a higher dose of 10^7 PFU induced antibodies in only about 50% of the monkeys. Additionally, two more challenges in the development of BK2102 were added to the discussion. The first was the limited availability of analytical reagents for hamster models, which restricted the detailed immunological characterization of the response. Second, it took time to gather preclinical data due to the space-related restrictions of BSL3 facilities, which delayed the clinical trials for BK2102 until many individuals had already acquired immunity against SARS-CoV-2. It remains to be seen whether our candidate will be optimal for human use, as the immunogenicity of live-attenuated vaccines is generally influenced by pre-existing immunity.

      We added these considerations to the discussion section (lines 300-309).

      Reviewer #2:

      No major weaknesses were identified, however, this reviewer notes the following:

      The authors missed the opportunity to include a mRNA vaccine to demonstrate that the immunity and protection efficacy of their live attenuated vaccine BK2102 is better than a mRNA vaccine.

      One of the potential advantages of live-attenuated vaccines is their ability to induce mucosal immunity. It would be great if the authors included experiments to assess the mucosal immunity of their live-attenuated vaccine BK2102.

      We agree with the Reviewer’s suggestion regarding the importance of comparing BK2102 with the mRNA vaccine modality and evaluating the mucosal immunity induced by BK2102. In hamsters, under conditions where serum neutralizing antibody titers were equivalent, intranasal inoculation of BK2102 induced higher levels of antigen-specific IgA in nasal wash compared to intramuscular injection of the conventional mRNA vaccine. These new data have been added to Supplementary figures 2A and B, and corresponding sentences have been included in the Results and Discussion sections (lines 140-145, 292-299).

      Reviewer #3:

      Lack of a more detailed discussion of this new vaccine approach in the context of reported live-attenuated SARS-CoV-2 vaccines in terms of its advantages and/or weaknesses.

      sCPD9 and CoviLiv™, two previously reported live-attenuated vaccines, achieve attenuation through codon deoptimization or a combination of codon deoptimization and FCS deletion. These two strategies affect viral proliferation but do not directly impact virulence. In contrast, the temperature sensitivity-related substitutions in NSP14 included in BK2102 selectively restrict the infection site, reducing the likelihood of lung infection and providing a safety advantage over the other live-attenuated vaccines. As mentioned in the response to comment (5) of Reviewer #1, a limitation of BK2102 is that its development began later than that of the previously reported live-attenuated vaccines. Consequently, we must consider the impact of pre-existing immunity in future human trials. Based on these points, we have added sentences discussing the advantages and disadvantages to the Discussion section (lines 302-305, 312-319).

      Antibody endpoint titers could be presented.

      Thank you for your suggestion. We calculated the antibody endpoint titers for Figure 2A and included the results in lines 105-107 of the revised manuscript.

      Lack of elaboration on immune mechanisms of protection at the upper respiratory tract (URT) against an immune evasive variant in the absence of detectable neutralizing antibodies.

      We appreciate the comment. The potential role of cellular and mucosal immunity in protection has been discussed in more detail in the revised manuscript, specifically in lines 283-295. According to the reference we initially cited, Hasanpourghadi et al. evaluated their adenovirus vector vaccine candidates and reported that the protection was enhanced by co-expression of the nucleocapsid protein rather than relying solely on the spike protein (Hasanpourghadi et al., Microbes Infect, 2023). Therefore, cellular immunity against the nucleocapsid and/or other viral proteins induced by BK2102 may also contribute to protection, as evidenced by more pronounced cellular immunity to the nucleocapsid detected through ELISPOT assay. Moreover, antigen-specific mucosal immunity was successfully detected in additional studies. The involvement of mucosal immunity in protection against mutant strains has been documented in the previously cited reference (Thwaites et al., Nat Commun, 2023). We have included these new data in Figure 2F and Supplementary figure 2B. Additionally, the results and discussion regarding the mechanisms of protection in the upper respiratory tract, in the absence of detectable neutralizing antibodies, have been incorporated into the revised lines 136-139, 143-145 and 283-295, respectively.

      Recommendations for the authors:

      Reviewer #2:

      Figure 1: Please include the LOD and statistical analysis in both panels. Please consider passaging the virus in Vero cells, approved for human vaccine production, to assess the stability of BK2102 after serial passage in vitro, which is important for its implementation as a live-attenuated vaccine. The authors should consider evaluating viral replication in different cell lines, and also assessing the plaque phenotype.

      Thank you for your valuable comments. First, we have added the statistical analysis and the limit of detection (LOD) to Figure 1. In response to the comments regarding the stability of BK2102 after serial passage in Vero cells, as well as its replication and plaque phenotype in different cell lines, we manufactured test substances for GLP studies and clinical trials by passaging BK2102 in Vero cells, which are approved for human vaccine production. We confirmed that BK2102 is stable (data not shown). Additionally, we verified that BK2102 replicates in BHK, Vero E6, and Vero E6/TMPRSS2 cells, in addition to Vero cells. Among these options, we selected Vero cells due to their high proliferative capacity and ability to produce clear plaques.

      Figure 2: Please, include statistical analysis in panels A, B, and D. Please, include the LOD in panels A and D. Please, include viral titers from these experiments in hamsters and NHPs.

      First, we would like to note that Figure 2D has been replaced by Figure 2C in the revised manuscript, and the data on neutralizing antibody titers in non-human primates (NHPs), originally presented as Figure 2C, have been moved to Supplementary figure 3A.

      We have added the statistical analysis to Figure 2B and C, as well as the LOD to Figure 2C. Figure 2A (Spike-specific IgG ELISA) was intended for qualitative evaluation based on OD values, so the LOD was not defined. We have also added a detailed description of virus titer in the Methods section under the headings “Evaluation of Immunogenicity in Hamsters” and “Evaluation of Immunogenicity in Monkeys”, and updated the information in the Figure legends of the revised manuscript (lines 451, 459, 468-474, 566-567, 576-578, 582-584, 661-662).

      Figure 3: Please, include the viral titers of the challenge virus in the NT and lungs.

      We have added the virus titers for the challenge experiments to the Results section under the heading “BK2102 induced protective immunity against SARS-CoV-2 infection” (lines 168-174).

      Figure 4: Please, include statistical analysis in panels B and C and evaluate viral titers.

      We have added the statistical analysis to Figure 4B and C. Unfortunately, all samples in Figure 4 were fixed in formalin for histopathological examination, so virus titers could not be measured. However, in past experiments, we measured viral titers in the nasal wash samples and lungs of hamsters three days post-infection with D614G and BK2102. We confirmed that infectious virus was detected in both the nasal wash and lungs of the hamsters infected with the D614G strain (2.9 log10 PFU/mL and 5.3 log10 PFU/g, respectively), but not in the lungs of the hamsters infected with BK2102. The viral titers in the nasal wash of BK2102-infected hamsters were equivalent to those of the hamsters infected with the D614G wild-type strain (3.0 log10 PFU/mL). However, we did not include these data in the revised manuscript.

      Figure 5: Please, include viral titers in different tissues with the different vaccines (panels A and B). Please, include the body weight changes. Finally, please, consider the possibility of challenging the vaccinated mice with the same SARS-CoV-2 strains used in the manuscript to demonstrate similar protection efficacy in these new ACE2 transgenic mice.

      The different tissues of Tg mice were not sampled, as no gross abnormalities were observed in organs other than lungs and brains during necropsy. We have added new data on the body weight of Tg mice after infection to Supplementary figures 9B and 9C in the revised manuscript, along with additional lines in the Results section (lines 228-230 and 247-248). Although we do not know the reason, we have observed that immunization of this animal model does not lead to an increase in antibody titers. Therefore, we do not consider this animal model suitable for the protection study as you suggested. However, it could be useful in passive immunization experiments.

      Supplementary Figure 1: Since most of the manuscript focuses on BK2102, the authors should consider removing the other live-attenuated vaccines (Supplementary Figure 1A).

      We agree with the Reviewer’s suggestion and have simplified the description for Supplementary Figure 1A (lines 93-97).

      Supplementary Figure 3: Please, include statistical analysis.

      In the revised manuscript, Supplementary Figure 3 from the original manuscript has been moved to Supplementary Figure 2D. The IgG subclass ELISA was intended for a qualitative evaluation based on OD values, and therefore the results were included in the Supplementary figure. However, we realized the description was not clear, so we added further clarification in the Results section (lines 145-147).

      Supplementary Figure 4: Please, include the viral titers in both infected and contact hamsters from this experiment.

      In the revised manuscript, Supplementary Figure 4 in the original manuscript has been moved to Supplementary Figure 6. Unfortunately, due to limited breeding space for the hamsters, we were unable to prepare groups for the evaluation of viral titer, and instead prioritized evaluation by body weight.

      Reviewer #3:

      (1) It would be helpful to discuss this new vaccine in the context of other reported live-attenuated vaccines in terms of its advantages and/or disadvantages.

      Please refer to our response to the Reviewer’s “first comment” above, as well as to the response in Public comment (5) of Reviewer #1. The modifications made in the manuscript are described in lines 302-305 and 312-319.

      (2) Figure 2A: end-point titers could be presented, other than OD values.

      This comment is addressed in the reviewer’s second public comment. The endpoint titer has been included in lines 105-107 of the revised manuscript.

      (3) Figure 2C: it is unclear why only 2 out of 4 NHPs show neutralization titers. This could be moved to a supplementary figure.

      As suggested by the Reviewer, Figure 2C of the original manuscript has been moved to Supplementary Figure 3A in the revised manuscript. In response to Public comment (2) from Reviewer #1, we have also added new data on neutralizing antibodies in the monkeys as Supplementary figure 3B.

      (4) Figures 2E-F: bulk measurement of cytokine production in supernatants is not an optimal way to measure vaccine-induced Ag-specific T cells. ELISPOT or ICS are better. T-cell ELISPOT for hamsters is available. This should at least be discussed.

      Please refer to our response to this Reviewer’s third public comment. We have added the new results in Figure 2F of the revised manuscript.

      (5) It is quite interesting that no N-specific cellular response was observed, given that it is a live-attenuated vaccine. What about N-specific binding Abs?

      We conducted the ELISPOT assay as suggested by the Reviewer and detected cellular immunity against both spike and nucleocapsid proteins (Figure 2F). We did not examine nucleocapsid-specific antibodies, as they do not contribute to the neutralizing activity; however, nucleocapsid-specific cellular immunity was confirmed.

      (6) Figure 3: limit of detection for virological assays could be labeled.

      We have added the LOD in Figures 3C, D, F and G.

      (7) Figures 3E-F: it is interesting to see that the vaccine elicits almost complete protection at URT against BA.5, despite no BA.5 neutralizing titers being detected at all. What mechanism of URT protection by BK2102 would the authors speculate? T cells or other Ab effector functions?

      Please refer to the response to this Reviewer’s third public comment. We have added new results regarding cellular and mucosal immunity (Figure 2F and Supplementary figure 2B) and discussed the mechanisms of protection in the upper respiratory tract in the absence of detectable neutralizing antibodies (lines 136-139, 143-145 and 283-295, respectively).

      (8) Figure 3I: the durability of protection is a strength of the study. Other than body weight changes, what about viral loads in the animals after the challenge?

      We primarily assessed the effect of the vaccine by monitoring changes in body weight, as the differences compared to the naïve group were clear. Unfortunately, we did not collect samples at different time points throughout the study, which prevented us from evaluating the viral titers.

      In addition, we made corrections to several other sections identified during the revision process. The revised parts are as follows:

      - In the Methods section under the title “Evaluation of BK2102 pathogenicity in hamsters”, the infectious virus titer of D614G strain has been corrected (line 478).

      - In the Methods section under the title “In vivo passage of BK2102 in hamsters”, infectious virus titer of BK2102 and A50-18 strain has been corrected (line 487).

      - The collection time of splenocytes after inoculation has been corrected in the figure legend of Figure 2D (line 583).

      - There was an error in Figure 2D. The figure has been replaced with the appropriate version.

      - A new reference on NSP1 deletion (Ueno et al., Virology, 2024) has been added to the references.

      - Several methods have been described more clearly.

    1. eLife Assessment

      In this important study, the authors investigate the biogenesis of extracellular vesicles in mycobacteria and provide several observations to link VirR with vesiculogenesis, peptidoglycan metabolism, lipid metabolism, and cell wall permeability. The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. The evidence presented in the revised manuscript is convincing and creates several avenues for further research.

    2. Reviewer #1 (Public Review):

      Summary:

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability barrier of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study addresses an essential area of TB biology.

      Strengths:

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species.

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins.

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant.

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain.

      Altogether, the work is comprehensive, experiments are designed well, and conclusions were made based on the data generated after verification using multiple complementary approaches.

      Weaknesses:

      The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. Authors suggest that EV release occurs during cell division when PG is most fragile. However, this has yet to be tested in the manuscript-the AFM of the VirR mutant, which produces thicker PG with more pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, and there are differences in cell division to explain increased vesiculogenesis. These observations, counterintuitive to the authors' hypothesis, need detailed experimental verification.

      The transcriptomic data add little of substance. Transcriptomic data do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, a change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl-CoA).

      The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and import/export of drugs. Authors should investigate whether restoration of the normal level of permeability and EV release is not due to the maintenance of cell wall lipid balance upon cholesterol exposure of the VirR mutant.

      Finally, protein interaction data is based on experiments done once without statistical analysis. If the interaction between VirR and LCP protein is expected on the mycobacterial membrane, it is unclear how a split-GFP system expressed in the cytoplasm is physiologically relevant. No explanation was provided as to why VirR interacts with the truncated version of LCP proteins and not with the full-length proteins.

      Comments on revisions:

      The authors have addressed my comments. I have no further issues.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis.

      Strengths:

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with PG-AG ligating canonical LCP proteins, the authors have established that VirR is a central scaffold in the cell envelope remodelling process, which is critical for MEV production.

      Comments on revisions:

      Authors have addressed the concerns, specifically regarding the expression of downstream genes. It appears that they are not altered significantly.

      Data in Fig 6C show significantly higher expression of VirR compared to control or knockdown. In the absence of a regulatable expression system (such as a nitrile-inducible one), this is expected from a constitutive promoter.

      I have no further questions for the author.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Comments on the revised version:

      Concerns flagged about using CRISPR-guide RNA mediated knockdown of virR have yet to be addressed entirely. I understand that the authors could not get a knockout despite attempts and hence they have used a guide RNA mediated knockdown strategy. However, I wondered if the authors looked at the levels of the downstream genes in this knockdown.

      We thank the reviewer for bringing this up since it is known that certain artifacts derived from this approach may be related to changes in expression of downstream genes. We ran a qPCR of Rv0432 and Rv0433 and confirmed that no significant differences in expression of virR downstream genes were detected in the virR mutant or the complemented strains relative to WT. This is now indicated in the Methods section on Generation of the CRISPR mutants. The data are now presented as Supplementary Figure 13.

      Authors have used the virmut-Comp strain for some of the experiments. However, the materials and methods must describe how this strain was generated. Given that the mutant is a CRISPR-guide RNA mediated knockdown, the CRISPR construct may have taken up the L5 locus. Did authors use an episomal construct for complementation? If so, what is the expression level of virR in the complementation construct? What are the expression levels of downstream genes in mutant and complementation strains? This is important because the transcriptome analysis was redone by considering the complementation strain. The complemented strain is written as virmut-C or virmut-Comp. This has to be consistent.

      We apologize for not having included the information about the generation of the complemented strain in our last version of the manuscript. We took the complementing vector from a previous paper on VirR (Rath et al., (2013) PNAS 110(49):E4790). This vector was constructed as follows: Complementation plasmids were cloned using Gateway® Cloning Technology (Invitrogen). E. coli strains expressing the following Gateway vectors were kindly provided by Dirk Schnappinger and Sabine Ehrt: pDO221A, pDO23A, pEN23A-linker1, pEN41A-TO2, pEN21A-Hsp60, pDE43-MEH. PCR was used to amplify the following target sequences from H37Rv genomic DNA: coding sequence of Rv0431, coding sequence of Rv0431 with a FLAG tag either in its C-terminus or its N-terminus, and the predicted cytosolic sequence of Rv0431 with a FLAG tag in its new C-terminus. The primers used for PCR were designed such that the amplicons would be flanked with Gateway® cloning-specific attachment (att) sites. These PCR products were recombined into Gateway® donor vectors using bacteriophage-derived integrase and integration host factor, resulting in entry vectors. The recombination events are specific to the attB sites on the PCR products and to the attP sequences on the donor vectors, such that the orientation of the target sequence is maintained during the recombination reaction, also known as the BP reaction, for attB-attP recombination. Using the MultiSite Gateway® system, three DNA fragments, derived from each of three distinct entry vectors, can be simultaneously inserted into a final complementation vector called the destination vector in a specific order and orientation. Multisite recombination events are mediated by Integrase and Integrase Host Factor, in a process called the LR reaction (for the attL and attR sites in the entry and destination vectors).
The Gateway® entry vectors thus generated were recombined with another entry vector containing either the Hsp60 promoter, an empty entry vector, and a complementation vector (episomal) to give rise to the final destination vector. The destination vector (episomal) was engineered to contain a hygromycin resistance cassette. These vectors were used to transform competent Rv0431-deficient Mtb. The transformation mixture was plated on 7H11 plates containing OADC and hygromycin (50 μg/ml). Colonies, typically observed 3-5 weeks later, were isolated and grown in 7H9 media and characterized.

      For simplicity, we have just referenced our previous paper to indicate that the complementing plasmid is the same used in that study.

      Regarding the virR expression levels in the WT, virR<sup>mut</sup> and complemented virR strains please see previous Figure 6 C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have revised the manuscript in light of previous reviews. The authors have addressed some of my concerns appropriately. However, the specific dataset remains unchanged and unclear.

      Fig 8G and H: In response to a comment on the mechanism of how VirR mediates EV release, the authors have added new data showing an increase in the abundance of deacetylated muropeptides in the mutant. This observation is linked to altered lysozyme activity or PG fragility. In my opinion, this is another indirect observation. More concerning is the complemented strain, which also showed a comparable increase in deacetylated muropeptides, indicating that the altered muropeptides could be unrelated to VirR.

      We must disagree here with the reviewer's assessment that the abundance of deacetylated muropeptides is only an indirect indication of PG fragility. We consider this quantitative observation to be additional evidence indicating a more fragile PG. We acknowledge that each of the supporting facts, considered individually, may be seen as indirect, but we would like the reviewer to consider all the evidence together: (i) sensitivity to lysozyme; (ii) enlargement and altered physicochemical morphological characteristics including porosity or thickness; (iii) altered penetrance of FDAAs; and (iv) increased release of muropeptides. In this latter case, the complemented strain may not display the WT features, but this may be due to some artifacts derived from the complementation.

      Taking all together, we believe that the PG of virR<sup>mut</sup> is more fragile than that of the WT and the complemented strains based on several lines of evidence. We hope the reviewer may consider this perspective when analyzing such a complex feature as PG fragility. So far, there is no direct method to assess this condition.

      Lipid analyses are not comprehensive. The issue related to the need for more clarity of DIMA and DIMB still needs to be addressed. I understand that the authors do not have facilities to perform radioactive assays. However, they could have repeated the experiment to generate a better-quality image. Similarly, the newly generated SL-1, PAT, and DAT TLC could be of better quality. Bands still need to be resolved. The solvent front is irregular. The same is true for PIMs and DPG TLCs. With the evidence provided, the deregulation of cell wall lipids is incomplete.

      We agree with the reviewer that the quality of the TLC is not appropriate. We have now repeated the PDIM TLC (new Fig 7D). In addition, we have repeated the TLCs resolving sulfolipids in a 2D mode. For simplicity we just ran the glycerol condition including the three strains. This is now part of a new Supplementary figure 8 B. For PIMs, we have a 1D and a 2D analysis that, after checking previous papers using similar approaches with no radioactivity, we consider to have the desired quality to identify the indicated lipids.

      We hope these new data and repeated experiments satisfy the reviewer's concerns.

      Thank you very much for your assessment and time to review this paper.

    1. eLife Assessment

      The authors performed extensive coarse-grained molecular dynamics simulations of 140 different prion-like domain variants to interrogate how specific amino acid substitutions determine the driving forces for phase separation. The analyses are solid, and the derived predictive scaling laws can aid in identifying potential phase-separating regions in uncharacterized proteins. Overall, this is a valuable contribution to the field of biomolecular condensates. It exemplifies how data-driven methodologies can uncover new insights into complex biological phenomena.

    2. Reviewer #1 (Public review):

      Summary:

      In this preprint, the authors systematically and rigorously investigate how specific classes of residue mutations alter the critical temperature as a proxy for the driving forces for phase separation. The work is well executed, the manuscript well-written, and the results reasonable and insightful.

      Strengths:

      The introductory material does an excellent job of being precise in language and ideas while summarizing the state of the art. The simulation design, execution, and analysis are exceptional and set the standard for large-scale simulation studies. The results, interpretations, and Discussion are largely nuanced, clear, and well-motivated, and the pedagogical nature with which sampling convergence is discussed is greatly appreciated. Finally, the underlying data are shared in a clear and accessible manner. Overall, the manuscript is a model

      Weaknesses:

      The simplicity of a one-bead-per-residue model parameterized to capture UCST-type phase behavior does perhaps impact some aspects of the generality of this work. That said, the authors carefully acknowledge these limitations, and overall, this is not seen as a major weakness of the conclusions drawn or the manuscript, given those conclusions are appropriately couched.

    3. Reviewer #2 (Public review):

      This is an interesting manuscript where a CA-only CG model (Mpipi) was used to examine the critical temperature (Tc) of phase separation of a set of 140 variants of prion-like low complexity domains (PLDs). The key result is that Tc of these PLDs seems to have a linear dependence on substitutions of various sticker and spacer residues. This is potentially useful for estimating the Tc shift when making novel mutations of a PLD.

      Comments on revisions: The authors have addressed concerns raised previously.

    4. Reviewer #3 (Public review):

      Summary:

      "Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws" by Maristany et al. offers a significant contribution to the understanding of phase separation in prion-like domains (PLDs). The study investigates the phase separation behavior of PLDs, which are intrinsically disordered regions within proteins that have the propensity to undergo liquid-liquid phase separation (LLPS). This phenomenon is crucial in forming biomolecular condensates, which play essential roles in cellular organization and function. The authors employ a data-driven approach to establish predictive scaling laws that describe the phase behavior of these domains.

      Strengths:

      The study benefits from a robust dataset encompassing a wide range of PLDs, which enhances the generalizability of the findings. The authors' meticulous curation and analysis of this data add to the study's robustness. The scaling laws derived from the data provide predictive insights into the phase behavior of PLDs, which can be useful in the future for the design of synthetic biomolecular condensates.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer # 1 (Public Review):

      Summary:

      In this preprint, the authors systematically and rigorously investigate how specific classes of residue mutations alter the critical temperature as a proxy for the driving forces for phase separation. The work is well executed, the manuscript well-written, and the results reasonable and insightful.

      Strengths:

      The introductory material does an excellent job of being precise in language and ideas while summarizing the state of the art. The simulation design, execution, and analysis are exceptional and set the standard for these types of large-scale simulation studies. The results, interpretations, and Discussion are largely nuanced, clear, and well-motivated.

      We thank the reviewer for their assessment of our work and for highlighting the key strengths of the paper.

      Weaknesses:

      This is not exactly a weakness, but I think it would future-proof the authors’ conclusions to clarify a few key caveats associated with this work. Most notably, given the underlying implementation of the Mpipi model, temperature dependencies for intermolecular interactions driven by solvent effects (e.g., hydrophobic effect and charge-mediated interactions facilitated by desolvation penalties) are not captured. This itself is not a “weakness” per se, but it means I would imagine CERTAIN types of features would not be well captured; notably, my expectation is that at higher temperatures, proline-rich sequences drive intermolecular interactions, but at lower temperatures, they do not. This is likely also true for the aliphatic residues, although these are found less frequently in IDRs. As such, it may be worth the authors explicitly discussing.

      We also thank the reviewer for pointing out that a more detailed discussion of the model limitations is needed. The original Mpipi model was designed to probe UCST-type transitions (that are associative in nature) of disordered sequences. The reviewer is correct, that in its current form, the model does not capture LCST-type transitions that depend on changes in solvation of hydrophobic residues with temperature. We have amended the discussion to highlight this fact.

      Similarly, prior work has established the importance of an alpha-helical region in TDP-43, as well as the role of aliphatic residues in driving TDP-43’s assembly (see Schmidt et al 2019). I recognize the authors have focussed here on a specific set of mutations, so it may be worth (in the Discussion) mentioning [1] what impact, if any, they expect transient or persistent secondary structure to have on their conclusions and [2] how they expect aliphatic residues to contribute. These can and probably should be speculative as opposed to definitive.

      Again - these are not raised as weaknesses in terms of this work, but the fact they are not discussed is a minor weakness, and the preprint’s use and impact would be improved on such a discussion.

      We agree with the reviewer that the effects of structural changes/propensities on these scaling behaviors would be an interesting and important angle to probe. We also comment on this in the discussion.

      Reviewer # 2 (Public Review):

      This is an interesting manuscript where a CA-only CG model (Mpipi) was used to examine the critical temperature (Tc) of phase separation of a set of 140 variants of prion-like low complexity domains (PLDs). The key result is that Tc of these PLDs seems to have a linear dependence on substitutions of various sticker and spacer residues. This is potentially useful for estimating the Tc shift when making novel mutations of a PLD. However, I have strong reservations about the significance of this observation as well as some aspects of the technical detail and writing of the manuscript.

      We thank the reviewer for their thoughtful and detailed feedback on the manuscript.

      (1) Writing of the manuscript: The manuscript can be significantly shortened with more concise discussions. The current text reads as very wordy in places. It even appears that the authors may be trying a bit too hard to make a big deal out of the observed linear dependence.

      The manuscript needs to be toned down to minimize self-promotion throughout the text. Some of the glaring examples include the wording “unprecedented”, “our research marks a significant milestone in the field of computational studies of protein phase behavior ..”, “Our work explores a new framework to describe, quantitatively, the phase behavior ...”, and others.

      We thank the reviewer for their suggestions on the writing of the manuscript. We understand the concern regarding the length and tone of the manuscript, and in response to their feedback, we have revised the language throughout the manuscript.

      There is really little need to emphasize the need to manage a large number of simulations for all 140 variants. Yes, some thought needs to go into designing and managing the jobs and organizing the data, but it is pretty standard in computational studies. For example, large-scale protein-ligand free energy calculations can require one to a few orders of magnitude more runs, and it is pretty routine.

      We fully agree with the reviewer that this aspect of the study is relatively standard in computational research and does not require special emphasis. In response, we have revised the manuscript to shorten the aforementioned section, focusing instead on the scientific insights gained from the simulations rather than the logistical challenges of managing them.

      When discussing the agreement with experimental results on Tm, it should be noted that the values of R > 0.93 and RMSD < 14 K are based on only 16 data points. I am not sure that one should refer to this as “extended validation”. It is more like a limited validation given the small data size.

      We thank the reviewer for their consideration of our validation set. Indeed, the agreement with experimental results is based on 16 data points, as this set represents the available published data at the time of writing of this manuscript. The term “extended validation” is used to signify that our current dataset builds upon previous validations (in Joseph, Reinhardt et al. Nat Comput. Sci. 2021), incorporating additional variants not previously examined. The metrics of r > 0.93 and a low RMSD indicate a strong agreement between the model and experiments, and an improvement with respect to other reported models. We are committed to continuing to validate our methods.
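
      To make the small-sample caveat in this exchange concrete, the sketch below computes Pearson's r and the RMSD for paired predicted and experimental values, and attaches an approximate 95% confidence interval to r using the Fisher z-transformation. The function name and the example temperatures are hypothetical and are not taken from the manuscript. The interval assumes roughly bivariate-normal data and |r| < 1.

```python
import math

def agreement_metrics(predicted, measured):
    """Return Pearson r, an approximate 95% CI for r, and the RMSD."""
    n = len(predicted)
    if n != len(measured) or n < 4:
        raise ValueError("need at least 4 equally sized pairs")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    r = cov / math.sqrt(var_p * var_m)
    rmsd = math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)
    # Fisher z-transformation: atanh(r) is approximately normal
    # with standard error 1 / sqrt(n - 3); requires |r| < 1.
    z = math.atanh(r)
    half_width = 1.96 / math.sqrt(n - 3)
    ci = (math.tanh(z - half_width), math.tanh(z + half_width))
    return r, ci, rmsd

# Hypothetical critical temperatures in kelvin (illustrative only).
pred = [290.0, 300.0, 310.0, 320.0, 330.0, 340.0]
meas = [292.0, 297.0, 312.0, 318.0, 333.0, 338.0]
r, (low, high), rmsd = agreement_metrics(pred, meas)
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f}), RMSD = {rmsd:.1f} K")
```

      Under the same approximation, r = 0.93 with n = 16 corresponds to an interval of roughly 0.81 to 0.98, which is one way to express how much a 16-point validation constrains the correlation.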

      Results of linear fitting shown in Eq 4-12 should be summarized in a single table instead of scattering across multiple pages.

      We considered the reviewer’s suggestion to compile all the laws into a single table. However, we believe it would be more effective for readers to reference each relationship directly where it is first discussed in the text. That said, we do include Table 1 in the original manuscript, which provides a summary of all the laws.

      The title may also be toned down a bit given the limited significance of the observed linear dependence.

      We respectfully disagree with the reviewer and believe that the current title accurately captures the scope of the manuscript.

      (2) Significance and reliability of Tc: Given the simplicity of Mpipi (a CA-only model that can only describe polymer chain dimension) and the low complexity nature of PLDs, the sequence composition itself is expected to be the key determinant of Tc. This is also reflected in various mean-field theories. It is well known that other factors will contribute, such as patterning (examined in this work as well), residual structures, and conformational preferences in dilute and dense phases. The observed roughly linear dependence is a nice confirmation but really unsurprising by itself. It appears how many of the constructs deviate from the expected linear dependence (e.g., Figure 4A) may be more interesting to explore.

      While linear dependencies in critical solution temperatures may appear expected for certain systems, for example, symmetric hard spheres, the heterogeneity of intrinsically disordered regions (IDRs), like prion-like domains (PLDs), makes this finding notable. The simplicity of our linear scaling law belies the underlying complexity of multivalent interactions and sequence-dependent behaviors in a certain sequence regime, which has not been quantitatively characterized in this manner before. Likewise, although linear dependencies may be expected in simplified models, the real-world applicability and empirical validation of these laws in biologically relevant systems are not guaranteed. Our chemically based model provides the robustness needed to do that. The linear relationship observed is significant because it provides a predictive framework for understanding how specific mutations affect a diverse set of PLDs. The framework presented can be extended to other protein families upon the application of a validated model, which might or might not yield linear relationships depending on the cooperative effects of their collective behavior. This extends beyond confirming known theories: it offers a practical tool for predicting phase behavior based on sequence composition.

      We agree with the reviewer that, while the overarching linear trend is clear, deviations from linearity observed in constructs like those in Figure 4A point to additional, and interesting, layers of complexity. These deviations offer interesting avenues for future research and suggest that while linearity might dominate PLD critical behavior, other factors may modulate this behavior under specific conditions.

      This is an excellent suggestion from the reviewer that, while it falls outside the scope of the current study, we are interested in exploring in the future.

      Finally, although the relationships are all linear, they have been normalized in different ways, and the strength of the study also lies in that. Instead of focusing solely on linearity, our study explores the physical mechanisms that underlie these relationships. This approach provides a more complete understanding of how sequence composition and the underlying chemistry of the mutated residues influence T<sub>c</sub>.

      The assumption that all systems investigated here belong to the same universality class as a 3D Ising model and the use of Eqn 20 and 21 to derive Tc is poorly justified. Several papers have discussed this issue, e.g., see Pappu Chem Rev 2023 and others. Muthukumar and coworkers further showed that the scaling of the relevant order parameters, including the conserved order parameter, does not follow the 3D Ising model. More appropriate theoretical models including various mean field theories can be used to derive binodal from their data, such as using Rohit Pappu’s FIREBALL toolset. Imposing the physics of the 3D Ising model as done in the current work creates challenges for equivalence relationships that are likely unjustified.

      We thank the reviewer for raising this point and for highlighting the FIREBALL toolset. Based on our understanding, FIREBALL is designed to fit phase diagrams using mean-field theories, such as Flory–Huggins and Gaussian Cluster Theory. Our experience with this toolset suggests that it places a higher weight on the dilute arm of the binodal. However, in our slab simulations, we observe greater uncertainty in the density of the dilute arm. This leads to only a moderate fit of the data to the mean-field theories employed in the toolset. While we agree that there is no reason to assume the phase behavior of these systems is fully captured by the 3D Ising model, we expect that such a model will describe the behavior near the critical point better than mean-field theories. Testing our results further with different critical exponents would be valuable in assessing how these predictions compare to a broader set of experimental data. Additionally, we have made the raw data points for the phase diagrams available on our GitHub, enabling practitioners to apply alternative fitting methods.
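
      One common way to extract a critical temperature from slab-simulation coexistence densities, sketched below, combines a universal scaling law for the density difference with the law of rectilinear diameters. The exact form of the manuscript's Eqns 20 and 21 is not reproduced here, so the functional forms, the exponent value β = 0.325, the function names, and the synthetic data are all assumptions made for illustration.

```python
BETA_3D_ISING = 0.325  # assumed 3D Ising order-parameter exponent

def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_critical_point(temps, rho_dilute, rho_dense, beta=BETA_3D_ISING):
    """Estimate (T_c, rho_c) from coexistence densities below T_c.

    Assumed forms:
      rho_dense - rho_dilute       = A * (T_c - T) ** beta
      (rho_dense + rho_dilute) / 2 = rho_c + C * (T_c - T)
    """
    # Raising the density gap to 1/beta makes it linear in T,
    # so T_c is where the fitted line crosses zero.
    gap = [(hi - lo) ** (1.0 / beta) for lo, hi in zip(rho_dilute, rho_dense)]
    slope, intercept = linear_fit(temps, gap)
    t_c = -intercept / slope
    # Rectilinear diameters: the midpoint density is linear in (T_c - T),
    # and its intercept is the critical density.
    mid = [0.5 * (lo + hi) for lo, hi in zip(rho_dilute, rho_dense)]
    _, rho_c = linear_fit([t_c - t for t in temps], mid)
    return t_c, rho_c

# Synthetic binodal generated from the assumed forms (illustrative only).
true_tc, true_rho_c, amp, diam = 300.0, 0.4, 0.2, 0.002
temps = [250.0, 260.0, 270.0, 280.0, 290.0]
dense = [true_rho_c + diam * (true_tc - t) + 0.5 * amp * (true_tc - t) ** BETA_3D_ISING for t in temps]
dilute = [true_rho_c + diam * (true_tc - t) - 0.5 * amp * (true_tc - t) ** BETA_3D_ISING for t in temps]
t_c, rho_c = fit_critical_point(temps, dilute, dense)
print(f"T_c = {t_c:.2f} K, rho_c = {rho_c:.3f}")
```

      Re-running the fit with a different `beta` (for example the mean-field value 0.5) is a quick way to gauge how sensitive the fitted T<sub>c</sub> is to the assumed universality class, which is the point under debate in this exchange.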

      While it has been a common practice to extract Tc when fitting the coexistence densities, it is not a parameter that is directly relevant physiologically. Instead, Csat would be much more relevant to think about if phase separation could occur in cells.

      While it is true that C<sub>sat</sub> is directly relevant to whether phase separation can occur in cells under physiological conditions, T<sub>c</sub> should not be dismissed as irrelevant. T<sub>c</sub> provides fundamental insights into the thermodynamics of phase separation, reflecting the overall stability and strength of interactions driving condensate formation. This stability is crucial for understanding how environmental factors, such as temperature or mutations, might affect phase behavior. In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      Reviewer # 3 (Public Review):

      Summary:

      “Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws” by Maristany et al. offers a significant contribution to the understanding of phase separation in prion-like domains (PLDs). The study investigates the phase separation behavior of PLDs, which are intrinsically disordered regions within proteins that have a propensity to undergo liquid-liquid phase separation (LLPS). This phenomenon is crucial in forming biomolecular condensates, which play essential roles in cellular organization and function. The authors employ a data-driven approach to establish predictive scaling laws that describe the phase behavior of these domains.

      Strengths:

      The study benefits from a robust dataset encompassing a wide range of PLDs, which enhances the generalizability of the findings. The authors’ meticulous curation and analysis of this data add to the study’s robustness. The scaling laws derived from the data provide predictive insights into the phase behavior of PLDs, which can be useful in the future for the design of synthetic biomolecular condensates.

      We thank the reviewer for highlighting the importance of our work and for their critical feedback.

      Weaknesses:

      While the data-driven approach is powerful, the study could benefit from more experimental validation. Experimental studies confirming the predictions of the scaling laws would strengthen the conclusions. For example, in Figure 1, the Tc of TDP-43 is below 300 K even though it can undergo LLPS under standard conditions. Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable.

      In the manuscript, we have leveraged existing experimental data for the A1-LCD variants, extracting critical temperatures and saturation concentrations to compare with our model and scaling law predictions. We acknowledge that a larger set of experiments would be beneficial. By selecting sequences that are related, we hypothesize that the scaling laws described herein should remain robust. In the case of TDP-43, to our knowledge this protein does not phase separate on its own under standard conditions. In vitro experiments that report phase separation at/above 300 K involve either the use of crowding agents (such as dextran or PEG) or multicomponent mixtures that include RNA or other proteins. Therefore, our predictions for TDP-43 are consistent with experiments. In general, we hope that the scaling laws presented in our work will inspire other researchers to further test their validity.

      The authors may wish to consider checking if the scaling behavior is only observed for Tc or if other experimentally relevant quantities such as Csat also show similar behavior. Additionally, providing more intuitive explanations could make the findings more broadly accessible.

      In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      The study focuses on a particular subset of intrinsically disordered regions. While this is necessary for depth, it may limit the applicability of the findings to other types of phase-separating biomolecules. The authors may wish to discuss why this is not a concern. Some statements in the paper may require careful evaluation for general applicability, and I encourage the authors to exercise caution while making general conclusions. For example, “Therefore, our results reveal that it is almost twice more destabilizing to mutate Arg to Lys than to replace Arg with any uncharged, non-aromatic amino acid...” This may not be true if the protein has a lot of negative charges.

      A significant number of proteins, in addition to those mentioned in the manuscript, that contain prion-like low complexity domains have been reported to exhibit phase separation behaviors and/or are constituents of condensates inside cells. We therefore expect these laws to be applicable to such systems and have further revised the text to emphasize this point. As the reviewer suggests, we have also clarified that the reported scaling of various mutations applies to these systems.

      I am surprised that a quarter of a million CPU hours are described as staggering in terms of computational requirements.

      We have removed the note on CPU hours from the manuscript. However, we would like to clarify that the amount of CPU hours was incorrectly reported. The correct estimate is 1.25 million hours, but this value was unfortunately misrepresented during the editing process. We thank the reviewer for catching this mistake on our part.

      Reviewer #1 (Recommendations For The Authors):

      Some minor points here:

      “illustrating that IDPs indeed behave like a polymer in a good solvent [43].” Whether or not an IDP behaves as a polymer in a good solvent depends on the amino acid sequence. The referenced paper selected a set of sequences that do indeed appear, on average, to map to a good-solvent-like polymer. However, SAXS experiments require high protein concentrations, and until the recent advent of SEC-SAXS a protein essentially needed to be near-infinitely soluble to be measured. As such, this paper’s conclusions are, apparently, ignorant of the limitations associated with the data they are describing, drawing sweeping generalizations that are clearly not supported by a multitude of studies in which sequence dependencies have led to ensembles with a scaling exponent far below 0.59 (see Riback et al 2017, Peng et al 2019, Martin et al 2020, etc).

      We thank the reviewer for raising this point. To avoid making incorrect generalizations and potentially misleading readers, we have removed the quoted statement from our manuscript.

      As of right now, the sequences are provided in a convenient multiple-sequence alignment figure. However, it would be important also to provide all sequences in an Excel table to make it easy for folks to compare.

      In addition to the sequence alignment figure, we now provide all tested sequences in an Excel table format in the GitHub repository.

      Maybe I’m missing it, but it would be extremely valuable if the coexistence points plotted in all the figures were provided as so-called source data; this could just be on the GitHub repository, but I’m envisaging a scenario where for each sequence you have a 4-column file where col1=concentration, col2=temperature, col3=fit concentration and col4=fit temperature, such that someone could plot col1 vs. col2 and col3 vs. col4 and reproduce the binodals in the various figures. Given the tremendous amount of work done to achieve binodals:

      The coexistence points used to plot the figures are now provided in the GitHub repository, in a format similar to that suggested by the reviewer.
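
      A file in the four-column layout suggested by the reviewer could be read as sketched below. This is only an illustration: the whitespace-separated layout, the `#` comment convention, the function name `read_binodal`, and the numbers in the example are assumptions, not the format actually used in the repository.

```python
import io


def read_binodal(handle):
    """Parse a 4-column table into simulated and fitted coexistence points.

    Assumed columns: concentration, temperature, fit concentration,
    fit temperature. Blank lines and lines starting with '#' are skipped.
    """
    simulated, fitted = [], []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        conc, temp, fit_conc, fit_temp = (float(x) for x in line.split())
        simulated.append((conc, temp))
        fitted.append((fit_conc, fit_temp))
    return simulated, fitted


# Hypothetical file content, for illustration only (not data from the study).
example = io.StringIO(
    "# conc temp fit_conc fit_temp\n"
    "0.02 280 0.021 280\n"
    "0.65 280 0.648 280\n"
)
simulated, fitted = read_binodal(example)
```

      Plotting the first pair of columns against each other, and likewise the second pair, would then reproduce a binodal and its fit.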

      It would be nice to visually show how finite size effects are considered/tested for (which they are very nicely) because I think this is something the simulation field should be thinking about more than they are.

      Thank you for highlighting this point. In our previous work (supporting information of the original Mpipi paper), we demonstrated a thorough approach by varying both the cross-sectional area of the box and the long axis while keeping the overall density constant. In this work, we verified that the cross-sectional area was larger than the average R<sub>g</sub> of the protein. We then maintained a fixed cross-sectional area to long-axis ratio, varying the number of proteins while keeping the overall density constant. We have updated Appendix 1–Figure 2 to clarify our procedure and revised the caption to better explain how we ensured the number of proteins was adequate.
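
      The box-scaling procedure described above can be sketched as follows. The sketch assumes a square cross-section and reads the fixed ratio as a fixed aspect ratio between the long axis and the side of the cross-section; the function names, the density, and the chain counts are hypothetical and are not taken from the manuscript.

```python
def slab_box(n_chains, number_density, aspect_ratio):
    """Return (side, long_axis) of a box with a square cross-section.

    Assumes long_axis = aspect_ratio * side, so that
    volume = aspect_ratio * side**3 = n_chains / number_density.
    """
    volume = n_chains / number_density
    side = (volume / aspect_ratio) ** (1.0 / 3.0)
    return side, aspect_ratio * side


def cross_section_is_adequate(side, radius_of_gyration):
    """Criterion interpreted here as: box side larger than the average Rg."""
    return side > radius_of_gyration


# Hypothetical system sizes at one fixed overall density and aspect ratio.
boxes = {
    n: slab_box(n, number_density=1.0e-3, aspect_ratio=8.0)
    for n in (64, 128, 256)
}
```

      Under these assumptions, increasing the number of chains enlarges both box dimensions together while the overall density stays constant, which is the condition a finite-size comparison needs.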

      When explaining the law of rectilinear diameters, it would be good to explain where the 3.06 exponent comes from.

      Based on the reviewer’s suggestion, we have added to the text: “The constant 3.06 in the equation is a dimensionless empirical factor that was derived from simulations of the 3D Ising model.”
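
      To show how such a constant enters a critical-temperature estimate, the sketch below assumes the commonly used scaling form (ρ<sub>h</sub> − ρ<sub>l</sub>)<sup>3.06</sup> = d(1 − T/T<sub>c</sub>), where ρ<sub>h</sub> and ρ<sub>l</sub> are the coexisting dense- and dilute-phase densities. Under that assumption the left-hand side is linear in T, so T<sub>c</sub> follows from a straight-line fit. The exact equation used in the manuscript is not reproduced in this response, and the function names and synthetic values here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def critical_temperature(temps, rho_high, rho_low, exponent=3.06):
    """Estimate Tc assuming (rho_h - rho_l)**exponent = d * (1 - T / Tc).

    The transformed density gap equals d - (d / Tc) * T, so Tc is the
    temperature at which the fitted line crosses zero.
    """
    gap = [(h - l) ** exponent for h, l in zip(rho_high, rho_low)]
    slope, intercept = fit_line(temps, gap)
    return -intercept / slope


# Synthetic coexistence densities generated from a known Tc (illustration only).
true_tc, d = 300.0, 2.0
temps = [250.0, 260.0, 270.0, 280.0]
rho_low = [0.01] * len(temps)
rho_high = [
    lo + (d * (1.0 - t / true_tc)) ** (1.0 / 3.06)
    for lo, t in zip(rho_low, temps)
]
estimated_tc = critical_temperature(temps, rho_high, rho_low)
```

      Because the synthetic densities are generated from the assumed relation, the fit recovers the input critical temperature; real simulation data would scatter around the line.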

      The NCPR scale in Figure 5 being viridis is not super intuitive and may benefit from being seismic or some other r-w-b colormap just to make it easier for a reader to map the color to meaning.

      We thank the reviewer for this suggestion and have replaced the scale with a r-w-b colormap.

      The “sticker and spacer” framework has received critiques recently given its perceived simplicity. However, this work seems to clearly illustrate that certain types of residues have a large effect on Tc when mutated, whereas others have a smaller effect. It may be worth re-phrasing the sticker-spacer introduction not as “everyone knows aromatic/arginine residues are stickers” but as “aromatic and arginine residues have been proposed to be stickers, yet other groups have argued all residues matter equally” and then go on to make the point that while a black-and-white delineation is probably not appropriate, based on the data, certain residues ARE demonstrably more impactful on Tc than others, which is the definition of stickers. With this in mind, it may be useful to separate out a sticker and a spacer distribution in Figure 1D, because the different distribution between the two residues types is not particularly obvious from the overlapping points.

      We have revised the introduction of the sticker–spacer model in the manuscript for clarity. As the reviewer suggests, we have also separated the sticker and spacer distribution, which is now summarized in new Appendix 0–figure 8.

      Reviewer #3 (Recommendations For The Authors):

      Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable. The following sentence may be revised to reflect this: “Our extended validation set confirms that the Mpipi potential can ...”

      Based on the reviewer’s suggestion, we have revised the text: “Our validation set, which expands the range of proteins variants originally tested [32], highlights that the Mpipi potential can effectively capture the thermodynamic behavior of a wide range of hnRNPA1-PLD variants, and suggests that Mpipi is adequate for proteins with similar sequence compositions, as in the set of proteins analyzed in this study. In recent work by others [66], Mpipi was tested against experimental radius of gyration data for 137 disordered proteins and the model produced highly accurate results, which further suggests the applicability of the approach to a broad range of sequences.”

    1. eLife Assessment

      The authors propose that positive biodiversity-ecosystem functioning relationships found in experiments have been exaggerated because commonly used statistical analyses are flawed. To remedy this, a new type of analysis based on a concept of "partial density monoculture yield" is proposed. However, the presented concept and analysis methods are not reproducibly described (how can partial density monoculture yield experimentally be assessed?), do not appear to be complete, and are inadequate for hypothesis testing. The reviewers found that the authors misinterpret current research in the field and made limited efforts to understand or address the reviewer comments about this study.

    2. Joint Public Review:

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability.

      Readers are encouraged to consider the original reviews in full, which outline the strengths and weaknesses of the work:

      First version: https://elifesciences.org/reviewed-preprints/98073v1/reviews

      Second version: https://elifesciences.org/reviewed-preprints/98073v2/reviews

      There are no further reviews for this version because the authors declined to make further improvements to their manuscript.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE, whether the "relative advantage" species have in the mixture, is related to their productivity. In extreme cases, when all species benefit, CE becomes positive.

      This is not true; positive CE does not require positive RY deviations of all species. CE is positive as long as the average RY deviation is greater than 0. In a 2-species mixture, for example, if the RY deviation of one species is -0.2 and that of the other species is +0.3, CE would still be positive. Positive CE can be associated with negative NE (net biodiversity effects) when more productive species have negative RY deviations that are smaller in magnitude than the positive RY deviations of less productive species. Therefore, the suggestion by the reviewer “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0)” is not correct.
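
      The arithmetic behind this point can be checked with the additive partitioning identity of Loreau & Hector, NE = CE + SE, with CE = N × mean(ΔRY) × mean(M) and SE = N × cov(ΔRY, M). In the sketch below the RY deviations are the -0.2 and +0.3 from the example above; the monoculture yields (10 and 1) and the function name are hypothetical values chosen only for illustration.

```python
def additive_partition(delta_ry, monoculture_yield):
    """Return (CE, SE, NE) for one mixture.

    delta_ry: deviation of each species' relative yield from expectation.
    monoculture_yield: monoculture yield M of each species.
    Uses the population covariance so that CE + SE equals sum(dRY_i * M_i).
    """
    n = len(delta_ry)
    mean_dry = sum(delta_ry) / n
    mean_m = sum(monoculture_yield) / n
    cov = sum(
        (d - mean_dry) * (m - mean_m)
        for d, m in zip(delta_ry, monoculture_yield)
    ) / n
    ce = n * mean_dry * mean_m
    se = n * cov
    return ce, se, ce + se


# The more productive species (M = 10) loses 0.2 in relative yield,
# while the less productive species (M = 1) gains 0.3.
ce, se, ne = additive_partition([-0.2, 0.3], [10.0, 1.0])
```

      With these numbers CE is positive (about 0.55) while SE (about -2.25) and NE (about -1.7) are negative, which is the combination described in the response.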

      When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).

      The use of the word “mitigate” indicates that the effects of niche complementarity and competition are in opposite directions, which is not true for biodiversity experiments based on a replacement design. We have explained this in detail in our first responses to reviewers.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      Agree. However, if CE and SE are not meant to be biological mechanisms, as suggested by the reviewer, the argument “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0)” would be invalid.

      Lines 108-123 do not describe our method.

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      True. Research findings indicate that the biodiversity effect detected with AP is not constant.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.
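
      The two quantities, as paraphrased in this review, can be written out as below. This is a sketch of the reviewer's description only; the manuscript's exact definitions may differ, and the function names, the reading of planting density as a fraction of the full monoculture density, and the example yields are assumptions.

```python
def competitive_growth_response(partial_yield, full_yield, density_fraction):
    """MG as paraphrased: (partial-density yield / full-density yield),
    divided by the planting density expressed as a fraction of full density."""
    return (partial_yield / full_yield) / density_fraction


def relative_competitive_ability(monoculture_yields):
    """RC as paraphrased: each species' monoculture yield minus the mean
    yield of the other species, divided by the largest monoculture yield."""
    n = len(monoculture_yields)
    total = sum(monoculture_yields)
    top = max(monoculture_yields)
    return [(m - (total - m) / (n - 1)) / top for m in monoculture_yields]


# Hypothetical yields for three species (illustration only).
rc = relative_competitive_ability([8.0, 4.0, 2.0])
mg = competitive_growth_response(
    partial_yield=6.0, full_yield=8.0, density_fraction=0.5
)
```

      Every RC value shares the same denominator, the single largest monoculture yield, which is the dependence on one value that the reviewer questions in the comments that follow.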

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche.

      Competitive ability is not necessarily associated with species niche space. Both generalist and specialist species can be more productive at a particular study site, as long as they are more capable of obtaining resources from a local pool. Remember, biodiversity experiments are conducted at a site of particular conditions, not across a range of species niche space at landscape level.

      Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      As explained in lines 370-376, the mathematical form is a linear approximation, because the relationship between competitive growth responses and species relative competitive ability is generally unknown but is likely nonlinear. Once the relationship is determined in future research, the scaling factor will not be needed.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      Comments on revised version:

      Only minimal changes were made to the manuscript, and they do not address the main points that were raised.

      Reviewer #2 (Public review):

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      No responses.

      Comments on revised version:

      The authors changed only one minor detail in response to the last round of reviews.

      Reviewer #3 (Public review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript's null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      Our approach adopts two hypotheses: the null hypothesis, which is shared with the additive partitioning model, and the competitive hypothesis, which is new. The null hypothesis assumes that inter- and intra-species interactions are the same, while the competitive hypothesis assumes that species differ in competitive ability and growth rate. Our approach is therefore an extension of the current approach. It separates the effects of competitive interactions from those of other species interactions, whereas the current approach does not.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning.

      We have explained in our first responses that competition and biodiversity effects are studied with different experimental approaches, i.e., additive and replacement designs. Results from one approach are not compatible with those from the other. For example, the competition effect is negative with an additive design but generally positive with the replacement design that is used extensively in biodiversity experiments. We have considered species competitive ability, the density-growth relationship, and the different effects of competitive interactions in additive and replacement designs, while the current method does not reflect any of those.

      The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them.

      We used simulation data, as partial density monocultures are generally not available in previous biodiversity experiments.

      Finally, it is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others.

      Our null hypothesis is the same as that of additive partitioning, namely that inter- and intra-species interactions are the same, while our competitive hypothesis assumes that species differ in competitive ability and growth rate. Rejecting the null hypothesis means that inter- and intra-species interactions differ, whereas rejecting the competitive hypothesis indicates the existence of positive/negative species interactions. This would be interesting to everyone.

      Comments on revised version:

      Please see review comments on the previous version of this manuscript. The authors have not revised their manuscript to address most of the issues previously raised by reviewers.

      No responses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Do take reviews seriously. Even if you think the reviewers all are wrong and did not understand your work, then this seems to indicate that it was not clearly presented.

      Reviewer #2 (Recommendations for the authors):

      I can understand that the authors are perhaps frustrated with what they perceive as a basic misunderstanding of their goals and approach. This misunderstanding, however, provides an opportunity to clarify. I believe that the authors have tried to clarify in rebutting our statements but would do better to clarify in the manuscript itself. If we reviewers, who are deeply invested in this field, don't understand the approach and its value, then it is likely that many readers will not either.

      The additive partitioning method has been publicly questioned at least several times since its conception in 2001. Our work provides an alternative.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      In order to assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community, measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls, but hardly ever individuals with a different, e.g. late blindness history. In order to improve the specificity of our conclusions, we used a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note, there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase, and the existence of nystagmus is a recruitment criterion to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” of congenital visual deprivation conditions and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanently congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any result of such future work would need to refer to our study, and results in these additional groups would not invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in response to review 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.
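
      For readers unfamiliar with the two EEG measures named above, the sketch below shows one simple way an aperiodic intercept and slope can be obtained: a straight-line fit to the power spectrum in log-log coordinates. This is not necessarily the procedure used in the study, which may rely on a dedicated spectral parameterization that also models oscillatory peaks. The function name and the synthetic spectrum are assumptions for illustration.

```python
import math


def aperiodic_fit(freqs, power):
    """Fit log10(power) = intercept + slope * log10(frequency).

    Returns (intercept, slope). For a 1/f-like spectrum the slope is
    negative; the intercept is the log10 power extrapolated to 1 Hz.
    """
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic spectrum without oscillatory peaks: power = 10**1.0 / f**1.5.
freqs = [float(f) for f in range(1, 41)]
power = [10.0 ** 1.0 / f ** 1.5 for f in freqs]
intercept, slope = aperiodic_fit(freqs, power)
```

      A real spectrum contains peaks such as alpha activity, which would bias a plain linear fit; that is one reason dedicated parameterization tools are normally preferred.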

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, patients with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains a key reference from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to correlate negatively with age.

      This caveat has been addressed in the revised manuscript.
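
      One common way to control for age in a correlation between two measures is a first-order partial correlation. The sketch below is only an illustration of that idea, not the analysis used in the manuscript; the function names `pearson_r` and `partial_r` and the toy values are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy = pearson_r(x, y)
    rxz = pearson_r(x, z)
    ryz = pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy values (not study data): both measures depend on "age" and on
# independent noise, so their raw correlation is driven by age alone.
age = [1, 1, -1, -1]
measure_a = [2, 0, 0, -2]   # age + noise_a
measure_b = [2, 0, -2, 0]   # age + noise_b

print("raw r:    ", pearson_r(measure_a, measure_b))
print("partial r:", partial_r(measure_a, measure_b, age))
```

      In this constructed case the raw correlation is clearly positive while the partial correlation is essentially zero, which is the pattern a reviewer would want to rule out when two variables both change with age.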

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer 2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that the Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that the E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in congenitally permanent blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous efforts (and time for the patients who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      It has to be noted that in the present study significance testing for correlations was corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of an E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenitally blind humans with those of congenitally permanently blind humans (from previous studies) allowed us to identify changes of E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer 3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing cortical E/I balance measures, their interrelationship, and their relation to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1) 3.1 Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Reviewer 2 and Reviewer 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting such correlations between visual deprivation history and measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. The existence of sensitive periods, which are typically assumed to not follow a linear gradual decline of neuroplasticity, does not necessarily allow predicting a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects of early deprivation on the brain and behavior which are observed only later in life when the function/neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2) 3.2 Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3) 3.3 Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We will add the confidence intervals to the second revision of our manuscript.
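
      For context, a confidence interval for a Pearson correlation is often obtained with the Fisher z-transformation. The sketch below is a generic illustration under that assumption; it is not taken from the manuscript, and `pearson_ci` and the example values r = 0.6, n = 10 are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("Fisher's z interval needs n > 3")
    z = math.atanh(r)                       # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical example: a correlation of 0.6 observed in 10 participants.
low, high = pearson_ci(0.6, 10)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

      With only ten observations the interval for r = 0.6 runs from roughly -0.05 to 0.89, so it includes zero. This illustrates why the reviewer asks for intervals alongside the point estimates.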

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than make strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9), and reported our findings with effect sizes, appropriate caution and context.
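
      To make the reviewer's suggestion concrete, a permutation test for a correlation can be sketched as follows. This is a Monte Carlo approximation (random shuffles rather than full enumeration of all orderings), written only as an illustration; the function names and the data are made up and do not come from the study.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        # small tolerance so ties with the observed value are counted
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # add-one correction: the observed ordering counts as one permutation
    return (hits + 1) / (n_perm + 1)

# Made-up values for ten hypothetical participants.
acuity = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, 1.4, 1.6]
ratio = [2.1, 2.4, 2.2, 2.9, 2.7, 3.1, 3.0, 3.4, 3.2, 3.8]
print("permutation p:", permutation_p(acuity, ratio))
```

      With n = 10 the full set of 10! orderings could also be enumerated for an exact p-value; random shuffling is used here only to keep the sketch short.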

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we will change Figure 4 to say ‘adjusted p,’ which we indeed reported.
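
      As a reminder of why adjusted values such as p > 0.999 can appear, a Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1. The helper name `bonferroni` and the raw p-values below are illustrative assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number_of_tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values from three tests.
raw = [0.01, 0.04, 0.5]
for p, p_adj in zip(raw, bonferroni(raw)):
    print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}")
```

      Labeling the reported value as adjusted tells the reader that the cap at 1, and not a raw test result, is what produces values at the ceiling.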

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were only performed in the congenital cataract reversal group since the sighted control group comprised individuals with vision in the normal range; thus these analyses would not make sense. Table 1 with the individual visual acuities for all participants, including the normally sighted controls, shows the low variance in the latter group.

      For variables in which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We will highlight these motivations more clearly in the Methods of the revised manuscript.

      (9) 3.4 Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been the subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurements (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.

      Quote:

      “3.4 Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Response: Apart from the modeling work from Gao et al., multiple papers, which have also been cited, used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Response: Note that Muthukumaraswamy et al. (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG (Muthukumaraswamy & Liley, 2018). Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity needs to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first which assessed MRS measured neurotransmitter levels and EEG aperiodic activity.“

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12) 3.5 Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice conflates subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern compared with analyzing the aperiodic signal. Furthermore, in contrast, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As mentioned in the Methods (Page 15 Line 376) and the previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency (as per the Nyquist theorem https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
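
      The aliasing argument can be illustrated with a small simulation. The sketch below is not EEGLAB's implementation: the 600 Hz original sampling rate, the 25 Hz cutoff, the 121-tap Hamming-windowed sinc filter and all function names are assumptions chosen for the example. It shows that a 50 Hz component decimated to 60 Hz without filtering reappears at 10 Hz, whereas low-pass filtering before decimation suppresses it.

```python
import cmath
import math

FS_IN = 600                  # hypothetical original sampling rate (Hz)
FACTOR = 10                  # decimation factor: 600 Hz -> 60 Hz
FS_OUT = FS_IN // FACTOR     # new Nyquist limit is 30 Hz
LINE_HZ = 50                 # line-noise frequency above the new Nyquist limit

def lowpass_taps(cutoff_hz, fs, numtaps=121):
    """Hamming-windowed sinc low-pass FIR, normalised to unit DC gain."""
    mid = (numtaps - 1) / 2
    fc = cutoff_hz / fs      # cutoff in cycles per sample
    taps = []
    for n in range(numtaps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, factor, taps=None):
    """Keep every `factor`-th sample, optionally low-pass filtering first."""
    if taps is None:
        return signal[::factor]
    half = len(taps) // 2
    out = []
    for centre in range(0, len(signal), factor):
        acc = 0.0
        for j, tap in enumerate(taps):
            idx = centre + j - half
            if 0 <= idx < len(signal):   # zero padding at the edges
                acc += tap * signal[idx]
        out.append(acc)
    return out

def amplitude_at(signal, freq_hz, fs):
    """Amplitude of one frequency component via a single-bin DFT."""
    total = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs)
                for i, v in enumerate(signal))
    return 2 * abs(total) / len(signal)

# Four seconds of pure 50 Hz "line noise" with unit amplitude.
line_noise = [math.sin(2 * math.pi * LINE_HZ * i / FS_IN) for i in range(4 * FS_IN)]

naive = decimate(line_noise, FACTOR)
filtered = decimate(line_noise, FACTOR, lowpass_taps(25, FS_IN))

# Use the middle two seconds to stay clear of filter edge effects.
middle = slice(FS_OUT, 3 * FS_OUT)
alias_hz = FS_OUT - LINE_HZ  # 50 Hz folds down to 10 Hz at a 60 Hz rate
alias_naive = amplitude_at(naive[middle], alias_hz, FS_OUT)
alias_filtered = amplitude_at(filtered[middle], alias_hz, FS_OUT)
print("alias amplitude without filter:", round(alias_naive, 4))
print("alias amplitude with filter:   ", round(alias_filtered, 4))
```

      The same check could be run on real data by comparing spectra before and after resampling; a residual alias would show up as a narrow peak inside the analyzed band.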

      Quote:

      “- The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method was developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We will make this clearer in the Methods of the revised manuscript.

      Our consistent results of group differences across all three EEG conditions, thus, exclude any possibility that they were driven by aliasing artifacts.

      The expected effects of this anti-aliasing filter can be seen in the attached Figure R1, showing an example participant’s spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz plotted in the manuscript), clearly showing a 30-40 dB drop at 30 Hz. Any aliasing due to, for example, remaining line noise, would additionally be visible in this figure (as well as Figure 3) as a peak.

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript, and in previous reviews, so far there has been no consensus on the exact range of measuring aperiodic activity. We made a principled decision based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz) (Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies) and the purpose of the visual stimulation experiment (to look at the lower frequency range by stimulating up to 60 Hz, thereby limiting us to quantifying below 30 Hz), that 1-20 Hz would be the fit range in this dataset.

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40Hz and 1-19Hz)? It is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      Response: The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).“

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal and/or a request to provide reasons for why it might not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work in the time (not frequency) domain on event-related potential (ERP) analysis has suggested that the baselining step might cause spurious effects (Delorme, 2023) (although see (Tanner et al., 2016)). We did not perform ERP analysis at any stage. One recent study suggests spurious group differences in the 1/f signal might be driven by an inappropriate dB division baselining method (Gyurkovics et al., 2021), which we did not perform.

      Any effect of our baselining procedure on the FFT spectrum would be below the 1 Hz range, which we did not analyze.  
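
      The claim can be checked numerically for the simplest case, an unwindowed discrete Fourier transform of a single epoch: subtracting the epoch mean changes only the 0 Hz (DC) bin. The sketch below uses a synthetic 64-sample epoch and a hand-written `dft` helper, both of which are assumptions for illustration; it does not reproduce the authors' pipeline, and spectral estimates that apply a taper would spread the DC component into neighbouring bins.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform (O(n^2), fine for short epochs)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]

# Synthetic epoch: a constant offset plus two oscillations.
N = 64
epoch = [5.0
         + math.sin(2 * math.pi * 3 * i / N)
         + 0.5 * math.cos(2 * math.pi * 10 * i / N)
         for i in range(N)]

mean = sum(epoch) / N
demeaned = [v - mean for v in epoch]   # "baseline removal" over the whole epoch

before = dft(epoch)
after = dft(demeaned)

dc_before = abs(before[0])
dc_after = abs(after[0])
max_change_elsewhere = max(abs(a - b) for a, b in zip(before[1:], after[1:]))

print("DC magnitude before:", dc_before)
print("DC magnitude after: ", dc_after)
print("largest change in any non-DC bin:", max_change_elsewhere)
```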

      Each of the preprocessing steps in the manuscript matches pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19), reports of other experimenters, groups and locations, validating that our results are robust.

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “3.5 Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      Response: As pointed out in the methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which were rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~ 50 Hz) as an account for the observed group effects in the 1-20 Hz frequency range highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters would be contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criterion? This should be reported specifically for EO & EC and controls and patients.

      Response: The mean percentage of 1 second segments rejected for each resting state condition and the percentage of 6.25 second long segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10), and referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method was developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      Response: We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      Response: In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).
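      For readers unfamiliar with the fit quality metric, R² compares the residual variance around the fitted line with the total variance of the data. The sketch below uses the standard definition; the helper name `r_squared` and the toy values are assumptions made for illustration and do not reproduce any value reported in the manuscript.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-power values and a fitted line through them.
log_freq = np.log10(np.array([1.0, 2.0, 4.0, 16.0, 20.0]))
log_power = np.array([1.02, 0.55, 0.08, -0.79, -0.97])
slope, intercept = np.polyfit(log_freq, log_power, 1)
print(r_squared(log_power, intercept + slope * log_freq))
```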

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.

      The recording of EEG resting state data started in 2013, while MRS testing could only be set up by the end of 2019. Moreover, not all subjects who qualify for EEG recording qualify for being scanned (e.g. due to MRI safety, claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; on the contrary, they are essential for interpreting future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far-reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies that relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 2.5 years.

      We sought to ensure the validity of our results in two ways. First, we assessed not just MRS, but additionally, EEG measures of E/I ratio. The latter allowed us to link results to a larger population of CC individuals, that is, we replicated the results of a larger group of 28 additional individuals (Ossandón et al., 2023) in our sub-group.

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Thus, accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (for a recent example, see Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022). In the revised manuscript, we more explicitly inform the reader about this data quality difference between regions in the Methods (Pages 11-12, MRS Data Quality/Table 2) and Discussion (Page 25, Lines 644-647).

      Importantly, while in the present study data quality differed between the frontal and visual cortex voxel, it did not differ between groups (Supplementary Material S6).  

      Further, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we added the recently published MRS quality assessment form to the supplementary materials (Supplementary Excel File S1). Additionally, we would like to point to our a priori prediction of group differences for the visual cortex, but not for the frontal cortex voxel. Finally, EEG data quality did not differ between frontal and occipital electrodes; therefore, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we have more clearly indicated that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject (Page 23, Lines 609-613).

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we discussed the inclusion criteria in more detail, and the aforementioned reasons why our data remain interpretable (Page 5, Lines 143-145, Lines 147-149).

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we have clearly indicated that the exploratory correlation analyses are reported to put forth hypotheses for future studies (Page 4, Lines 118-128; Page 5, Lines 132-134; Page 25, Lines 644-647).
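      As context for the reviewer's point about uncorrected multiple comparisons, the sketch below shows one common correction, the Holm-Bonferroni step-down procedure. It was not applied in the manuscript, where the correlations are reported as exploratory; the function name `holm_reject` and the p-values are hypothetical.

```python
import numpy as np


def holm_reject(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: boolean mask of hypotheses that are rejected."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    # Compare the k-th smallest p-value (k = 0, 1, ...) with alpha / (m - k)
    # and stop at the first one that fails the comparison.
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values from four exploratory correlations.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

      With a family of correlations, a nominal p-value just under 0.05 would typically not survive such a correction, which is why these results are best read as hypothesis-generating.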

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we added the linear regressions with age as a covariate (Supplementary Material S16, referred to in the main Results, Page 21, Lines 534-537), demonstrating the significant relationship between aperiodic intercept and Glx concentration in the CC group. 
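      A regression with age as a covariate, as described in this response, can be sketched with ordinary least squares. This is an illustrative example under stated assumptions rather than the analysis in Supplementary Material S16: the function name `fit_with_covariate`, the simulated values, and the coefficients are all hypothetical.

```python
import numpy as np


def fit_with_covariate(outcome, predictor, covariate):
    """Least-squares fit of outcome = b0 + b1 * predictor + b2 * covariate.

    Returns the coefficients (b0, b1, b2); b1 is the predictor effect
    adjusted for the covariate.
    """
    design = np.column_stack([np.ones(len(predictor)), predictor, covariate])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


# Hypothetical data for 10 participants: the aperiodic intercept depends on
# Glx concentration and on age (simulated without noise for clarity).
rng = np.random.default_rng(0)
glx = rng.uniform(8, 12, size=10)
age = rng.uniform(10, 40, size=10)
aperiodic_intercept = 2.0 + 0.5 * glx - 0.03 * age
print(fit_with_covariate(aperiodic_intercept, glx, age))
```

      If the Glx coefficient remains clearly non-zero once age is in the model, the Glx relationship is not merely a by-product of both measures changing with age.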

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we improved the phrasing (Page 5, Lines 130-132) and consistently reported the correlations as exploratory in the Methods and Discussion. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandón et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we added this analysis to the Supplementary Material (Supplementary Material S14) and referred to it in our Results (Page 20, Lines 513-514).

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes have been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration.

      We have now explicitly stated this in the Limitations section (Page 25, Lines 654-655).

      However, longitudinal studies involving neuroimaging are an effortful challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature (Page 23, Lines 609-611).

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz). 

      As stated in the Discussion section and Response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that patients with the highest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients who varied widely in visual outcomes and was thus ideal for investigating such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we have more clearly indicated in the Discussion that these are possible post-hoc interpretations (Page 23, Lines 584-586; Page 24, Lines 609-620; Page 24, Lines 644-647; Page 25, Lines 650-657). We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such.

      We have now made this clear in all the relevant parts of the manuscript (Introduction, Page 5, Lines 132-135; Methods, Page 16, Line 415; Results, Page 21, Figure 4; Discussion, Page 22, Line 568, Page 25, Lines 644-645, Page 25, Lines 650-657).

      The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we have detailed the advantages (Methods, Page 5, Lines 143 – 145, Lines 147-149; Discussion, Page 26, Lines 677-678) and disadvantages (Discussion, Page 25, Lines 650-657) of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our Discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. In the revised manuscript, we added the sample sizes of previous studies using MRS in permanently blind individuals (Page 4, Lines 108 - 109). It is worth noting that our EEG results fully align with those of larger samples of congenital cataract reversal individuals (Page 25, Lines 666-676, Supplementary Material S18, S19) (Ossandón et al., 2023), providing us confidence about their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      In the revised manuscript, we have more clearly marked the correlation analyses as exploratory (Introduction, Page 4, Lines 118-128 and Page 5, Lines 132-134; Methods Page 16, Line 415; Discussion Page 22, Line 568, Page 24, Lines 644-645, Page 25, Lines 650-657); note that we do not base most of our discussion on the results of these analyses.

      As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights. 

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      We have now clarified the motivation for these conditions in the Introduction (Page 4, Lines 122-125) and the Methods (Page 9, Lines 219-224).

      In the revised manuscript, we added the rationale for parametric analyses for our outcomes (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9). Note that in the Supplementary Materials (S12, S14), we have reported the correlations between visual history metrics and MRS/EEG outcomes, thereby investigating whether the variance in visual history might have driven these results. Specifically, we found a (negative) correlation between visual cortex Glx/GABA+ concentration during eye closure and the visual acuity in the CC group (Figure 2c). None of the other exploratory correlations between MRS/EEG outcomes vs time since surgery, duration of blindness or visual acuity were significant in the CC group (Supplementary Material S12, S15).  

      The alpha level used for the ANOVA models specified in the Methods section was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after correcting for (6) multiple comparisons using the Bonferroni correction, also specified in the Methods. Note that the p-values following correction are expressed as multiplied by 6, due to most readers assuming an alpha level of 0.05 (see response regarding large p-values).

      We used a control group matched for age, recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open/eyes closed condition. The ANOVA conducted on the EEG metrics was 2x3 because it had two groups (CC, SC) and three conditions (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the Methods and Figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
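
      The arithmetic behind this exchange can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a two-tailed t-test on a Pearson correlation with n - 2 degrees of freedom and a Bonferroni adjustment that multiplies each p-value by the number of comparisons and caps the result at 1. The function names are hypothetical; the values r = -0.42, n = 10 and six comparisons are taken from the text above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p_from_r(r, n, steps=20000):
    """Two-tailed p-value for a Pearson correlation r from n paired observations.

    Assumes |r| < 1 and an even number of integration steps.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the probability mass between 0 and |t|.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return 1 - 2 * central_mass

def bonferroni_adjust(p, n_comparisons):
    """Multiply p by the number of comparisons and cap the result at 1."""
    return min(1.0, p * n_comparisons)

p_raw = two_tailed_p_from_r(-0.42, 10)   # uncorrected p for the Figure 4 example
p_adj = bonferroni_adjust(p_raw, 6)      # six exploratory comparisons
alpha_per_test = 0.05 / 6                # equivalent per-test threshold, about 0.008
print(f"uncorrected p = {p_raw:.3f}, adjusted p = {p_adj:.3f}, per-test alpha = {alpha_per_test:.4f}")
```

      Under these assumptions the uncorrected p-value is close to the reviewer's estimate, and multiplying it by six exceeds 1, so the adjusted value is capped. Comparing the adjusted p-value against 0.05 is equivalent to comparing the uncorrected p-value against 0.05/6.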

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and have uploaded the consistency report with the revised Supplementary Material (Supplementary Report 1).

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we have cited those studies not already included in the Introduction (Page 3, Lines 92-94).

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age/sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects.

      This is now mentioned in the Methods, Page 13, Line 344.

      There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range.

      In the revised Discussion, we removed this section. We primarily interpret the increased offset and prior findings from fMRI-BOLD data (Raczy et al., 2023) as an increase in broadband neuronal firing.

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These studies were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, they allow only an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy and Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG. Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity is still lacking, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess MRS-measured neurotransmitter levels together with EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the Methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz, in order to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contamination (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~50 Hz) highly unlikely as an account for the observed group effects in the 1-20 Hz frequency range. Finally, the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting state condition and the percentage of 6.25-second segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10), and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
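
      As a rough illustration of the sampling constraint raised by the reviewer, and not a description of the EEGLAB routine used in the study, the sketch below computes the Nyquist limit for a 60 Hz sampling rate and the frequency at which a sinusoidal component would appear if it were sampled without an anti-aliasing filter. The function names are hypothetical.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sampling_rate_hz / 2

def aliased_frequency(signal_hz, sampling_rate_hz):
    """Apparent frequency of a sinusoid sampled without anti-alias filtering."""
    nearest_multiple = round(signal_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(signal_hz - nearest_multiple)

fs = 60.0
print(nyquist_frequency(fs))        # 30.0
print(aliased_frequency(50.0, fs))  # 10.0 (a 50 Hz component would fold back to 10 Hz)
print(aliased_frequency(20.0, fs))  # 20.0 (components below the Nyquist limit are unchanged)
```

      The example shows why the anti-aliasing filter matters: without it, energy above 30 Hz would fold back into the analyzed band rather than disappear.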

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).
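
      The baseline removal described here amounts to demeaning each epoch. A minimal sketch, assuming an epoch is a plain list of voltage samples from one channel (the function name and the sample values are hypothetical):

```python
def remove_epoch_mean(epoch):
    """Subtract the mean across the whole epoch from every data point."""
    mean = sum(epoch) / len(epoch)
    return [sample - mean for sample in epoch]

# Toy epoch with three samples; real epochs would span 1 s or 6.25 s of data.
demeaned = remove_epoch_mean([1.0, 2.0, 3.0])
print(demeaned)  # [-1.0, 0.0, 1.0]
```

      Because the mean is taken over the full epoch, no separate pre-stimulus baseline window is involved.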

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.
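
      The general idea of such a fit can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: it assumes an ordinary least-squares line through log10(power) versus log10(frequency), fitted on 1-20 Hz bins with the 8-14 Hz bins left out (bounds treated as inclusive), and a subtraction performed in log space. The function names and the synthetic spectrum are hypothetical.

```python
import math

def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Least-squares line through log10(power) vs log10(frequency).

    Uses bins inside fit_range but outside the excluded (alpha) band.
    Returns (intercept, slope, r_squared).
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_range = fit_range[0] <= f <= fit_range[1]
        in_excluded = exclude[0] <= f <= exclude[1]
        if in_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def subtract_fit(freqs, power, intercept, slope):
    """Log-power residual after removing the fitted 1/f line."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]

# Synthetic spectrum: a 1/f^1.5 background plus an alpha bump at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 * f ** -1.5 + (5.0 if f == 10.0 else 0.0) for f in freqs]
intercept, slope, r_squared = fit_aperiodic(freqs, power)
corrected = subtract_fit(freqs, power, intercept, slope)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

      In this toy case the alpha bump does not affect the fitted slope because its bin is excluded from the fit, and it reappears as a positive residual in the corrected spectrum. The R² value returned here is one simple way to express fit quality.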

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Thus, did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning (Page 9, Lines 229-237). We now address this explicitly in the Methods in the “MRS Data Quality” section. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey (Oeltzschner et al., 2020), which was released in 2020 and uses linear combination modeling to fit the peak, as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited-spectrum analysis toolbox at the time, and still is widely used.

      In the revised manuscript, we re-analyzed the data using linear combination modeling with Osprey (Oeltzschner et al., 2020), and reported that the main findings remained the same, i.e. the Glx/GABA+ concentration ratio was lower in the visual cortex of congenital cataract reversal individuals compared to normally sighted controls, regardless of whether participants were scanned with eyes open or with eyes closed. Further, NAA concentration did not differ between groups (Supplementary Material S3). Thus, we demonstrate that our findings were robust to analysis pipelines, and state this in the Methods (Page 9, Lines 242-246) and Results (Page 19, Lines 464-467).

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at the supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we have removed the statement regarding stability and the associated section.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we have rewritten the Discussion and removed this section.   

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and Reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We have indicated clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the Discussion as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revised manuscript, we have checked that speculations are clearly marked, and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and allude to convergent evidence of both MRS and neurophysiological measures. The latter link to corresponding results observed in a larger sample of CC individuals (Ossandón et al., 2023). In the revised manuscript, we have rephrased the statement as “to provide initial evidence” (Page 22, Line 676).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study which might justify future longitudinal work. In order to advance science, new testable hypotheses were put forward at the end of a manuscript.

      In the revised manuscript, we rephrased the sentence and added “might imply” to better indicate the hypothetical character of this idea (Page 22, Lines 586-587).

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we added a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al (Supplementary Material S18). We adapted the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicate those of the 28 additional subjects of Ossandón et al. (2023) (Page 25, Lines 671-672).

      References (Public Review)

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for The Authors):

      Thank you for the interesting submission. I have inserted my comments to the authors here. Some of them will be more granular comments related to the concerns raised in the public review.

      (1) Introduction:

      Could you please justify the rationale for using eyes open and eyes closed in the MRS condition, and the use of the three different conditions in the EEG experiment? If these resulted in negative findings, then the implications should be discussed.

      Previous work with MRS in sighted individuals has suggested that eye opening in darkness results in a decrease of visual cortex GABA+ concentration, while visual stimulation results in an increase of Glx concentration, compared to a baseline concentration at eye closure (Kurcyus et al., 2018). Moreover, visual stimulation/eye opening is known to result in an alpha desynchronization (Adrian & Matthews, 1934).

      While previous work of our group has shown significantly reduced alpha oscillatory activity in congenital cataract-reversal individuals, desynchronization following eye opening was indistinguishable when compared to normally sighted controls (Ossandón et al., 2023; Pant et al., 2023).

      Thus, we decided to include both conditions to test whether a similar pattern of results would emerge for GABA+/Glx concentration.

      We added our motivation to the Introduction of the revised manuscript (Page 4, Lines 122-125) along with the Methods (Page 9, Lines 219-223).

      It does not become clear from the introduction why a higher intercept is predicted in the EEG measure. The rationale for this hypothesis needs to be explained better.

      Given the prior findings suggesting an increased E/I ratio in CC individuals and the proposed link between neuronal firing (Manning et al., 2009) and the aperiodic intercept, we expected a higher intercept for the CC compared to the SC group.

      We have now added this explanation to the Introduction (Page 4, Lines 126-128).

      (2) Participants

      Were participants screened for common MRS exclusion criteria such as history of psychiatric conditions or antidepressant medication, which could alter neurochemistry? If not, then this needs to be pointed out.

      All participants were clinically screened at the LV Prasad Eye Institute, and additionally self-reported no neurological or psychiatric conditions or medications. Moreover, all subjects were screened based on the exclusion criteria for scanning, using the standard questionnaire of the radiology center.

      We have now made this clear in the Methods (Page 7, Lines 168-171).

      Table 1 needs to show the age of the participant, which can only be derived by adding the columns 'duration of deprivation' and 'time since surgery'. Table 1 also needs to include the controls.

      We have accordingly modified Table 1 in the revised manuscript and added age for the patients as well as the controls (Table 1, Pages 6-7).

      The control cohort is not specific enough to exclude reduced visual acuity, or co-morbidities, as the primary driver of the differences between groups. Ideally, a cohort with developmental cataracts is recruited. Normally sighted participants as a control cohort cannot distinguish between different types of sight loss, or stages of plasticity.

      The goal of this study was not to distinguish between different types of sight loss or stages of plasticity. We aimed to assess whether the most extreme forms of visual deprivation (i.e. congenital and total patterned vision loss) affected the E/I ratio. Low visual acuity and nystagmus are genuine diagnostic criteria (Methods, Page 5, Lines 142-145). Visual acuity cannot solely explain the current findings, since the MRS data were acquired both with eyes closed or diffuse visual stimulation in a dimly lit room, without any visual task.

      With the awareness of the present results, we consider it worthwhile for the future to investigate additional groups such as developmental cataract-reversal individuals, to narrow down the contribution of the age of onset and degree of visual deprivation to the observed group differences.

      (3) Data collection and analysis

      - More detail is needed: how long were the sessions, how long was each part?

      We have added this information on Page 7, Lines 178-181 of the Methods. MRS scanning took between 45 and 60 minutes, EEG testing took 20 minutes excluding the time for capping, and visual acuity testing took 3-5 minutes.

      - It should be mentioned here that the EEG data is a reanalysis of a subset of legacy data, published previously in Ossandón et al., 2023; Pant et al., 2023.

      In the revised manuscript, we explicitly state at the beginning of the “Electrophysiology recordings” section of the Methods (Page 13, Lines 331-334) that the EEG datasets were a subset of previously published data.

      (4) MRS Spectroscopy

      - Please fill out the minimum reporting standards form (Lin et al., 2021), or report all the requested measures in the main document https://pubmed.ncbi.nlm.nih.gov/33559967/

      We have now filled out this form and added it as Supplementary Material (Supplementary Excel File 1). Additionally, all the requested information has been moved to the Methods section of the main document (MRS Data Quality, Pages 10-12).

      - Information on how the voxels were placed is missing. The visual cortex voxel is not angled parallel to the calcarine, as is a common way to capture processing in the early visual cortex. Describe in the paper what the criteria for successful placement were, and how was it ensured that non-brain tissue was avoided in a voxel of this size.

      Voxel placement was optimized in each subject to avoid the meninges, ventricles, skull and subcortical structures; this was ensured by examining the voxel region across slices in the acquired T1 volume for each subject. Saturation bands were placed to nullify the skull signal during MRS acquisition, at the anterior (frontal) and posterior (visual) edge of the voxel for every subject. Due to limitations in the clinical scanner, rotated/skewed voxels were not possible, and thus voxels were not always located precisely parallel to the calcarine.

      We have added this information to Page 9 (Lines 229-237) of the revised manuscript.

      - Figure 1. shows voxels that are very close to the edge of the brain (frontal cortex) or to the tentorium (visual cortex). Could the authors please calculate the percentage overlap between the visual cortex MRS voxel and the visual cortex, and compare them across groups to ensure that there is no between-group bias from voxel placement?

      We have now added the requested analysis to Supplementary Material S2 and referred to it in the main manuscript on Page 9, Lines 236-237.

      Briefly, the percentage overlap with areas V1-V6 in every individual subject’s visual cortex voxel was 60% or more; the mean overlap in the CC group was 67% and the SC group 70%. The percentage overlap did not differ between groups (t-test: t(18) = -1.14, p = 0.269).
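
      For readers who wish to reproduce this kind of group comparison, the following is a minimal sketch of an independent-samples t-test with pooled variance. It assumes a Student's (equal-variance) test, which would be consistent with 18 degrees of freedom for two groups of ten; the helper name and the overlap values are hypothetical placeholders, not the study data.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical percentage-overlap values (placeholders, NOT the study data).
cc_overlap = [61, 64, 66, 67, 68, 68, 69, 70, 71, 72]
sc_overlap = [63, 66, 68, 70, 71, 71, 72, 73, 74, 75]
t_overlap, df_overlap = student_t(cc_overlap, sc_overlap)
```

      The p-value would then be obtained from the t distribution with the returned degrees of freedom, for example with a statistics package.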

      - Figure 1. I would recommend displaying data on a skull-stripped image to avoid identifying information from the participant's T1 profile.

      We have now replaced the images in Figure 1 with skull-stripped images. Note that images from SPM12 were used instead of GannetCoregister, as GannetCoregister only displays images with the skull.

      - Please show more rigor with the MRS quality measures. Several examples of inconsistency and omissions are below.

      • SNR was quantified and shows a difference in SNR between voxel positions, with lower SNR in the frontal cortex. No explanation or discussion of the difference was provided.

      • Looking at S1, the linewidth of NAA seems to be a lot broader in the frontal cortex than in the visual cortex. The figures suggest that acquisition quality was very different between voxel locations, making the comparison difficult.

      • Linewidth of NAA is a generally agreed measure of shim quality in megapress acquisitions (Craven et al., 2022).

      The data quality difference between the frontal and visual cortices has been observed in the literature (Juchem & Graaf, 2017; Rideaux et al., 2022). We nevertheless chose a frontal cortex voxel as control site instead of the often-chosen sensorimotor cortex. The main motivation was to avoid any cortical region linked to sensory processing since crossmodal compensation as a consequence of visual deprivation is a well-documented phenomenon.

      We now make this clearer in the Methods (Page 11, Lines 284 – 299) and in the Discussion/Limitations (Page 25, Lines 662 - 665).

      - To get a handle on the data quality, I would recommend that the authors display their MRS quality measures in a separate section 'MRS quality measure', including NAA linewidth, NAA SNR, GABA+ CRLB, Glx CRLB, and test for the main effects and interaction of voxel location (VC, FC) and group (SC, CC) and discuss any discrepancies.

      We have moved all the quality metric values for GABA+, Glx and NAA from the supplement to the Methods section (see Table 2), and added the requested section titled “MRS Data quality.”

      We have conducted the requested analyses and reported them in Supplementary Material S6: there was a strong effect of region confirming that data quality was better in the visual than frontal region. We have referred to this in the main manuscript on Page 11, Line 299.

      In the revised manuscript, we discuss the data quality in the frontal cortex, and how we ensured it was comparable to prior work. Moreover, there were no significant group effects, or group-by-region interactions, suggesting that group differences observed for the visual cortex voxel cannot be accounted for by differences in data quality. We now included a section on data quality, both in the Methods (Page 11, Lines 284 – 299), and the limitations section of the Discussion (Page 25, Lines 662 - 665).

      Please clarify the MRS acquisition: "Each MEGA-PRESS scan lasted for 8 minutes and was acquired with the following specifications: TR = 2000 ms, TE = 68 ms, Voxel size = 40 mm x 30 mm x 25 mm, 192 averages (each consists of two TRs)." 192 averages x 2 TRs x 2 s TR = 12.8 min, not 8 min; apologies if I have misunderstood these details.

      We have corrected this error in the revised manuscript and stated the parameters more clearly: there were a total of 256 averages, each with 1 TR of 2 s, resulting in an 8.5-minute scan (256 × 2 s / 60 ≈ 8.5 min) (Page 8, Lines 212-213).
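
      The arithmetic behind both figures can be checked with a short sketch; the function name is hypothetical and the inputs are the acquisition parameters quoted above.

```python
def scan_minutes(n_averages, trs_per_average, tr_seconds):
    """Total acquisition time in minutes for an MRS scan."""
    return n_averages * trs_per_average * tr_seconds / 60


# Reviewer's reading of the original text: 192 averages, 2 TRs each, TR = 2 s.
reviewer_estimate = scan_minutes(192, 2, 2.0)

# Corrected parameters: 256 averages, 1 TR each, TR = 2 s.
corrected = scan_minutes(256, 1, 2.0)
```

      The first call gives the 12.8 minutes noted by the reviewer; the second gives approximately 8.5 minutes, matching the corrected scan duration.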

      - What was presented to participants in the eyes open MRS? Was it just normal room illumination or was it completely dark? Please add details to your methods.

      The scans were conducted in regular room illumination, with no visual stimulation.

      We have now clarified this on Page 9 (Lines 223-224) of the Methods.

      (5) MRS analysis

      How was the tissue fraction correction performed? Please add or refer to the exact equation from Harris et al., 2015.

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      (6) Statistical approach

      How was the sample size determined? Please add your justification for the sample size

      We collected as many qualifying patients as we were able to recruit for this study within 2.5 years of data collection (commencing August 2019, ending February 2022), given the constraints of the patient population and the pandemic. We have now made this clear in the Discussion (Page 25, Lines 650-652).

      Please report the tests for normality.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Calculate the Bayes Factor where possible.

      As our analyses are all frequentist, instead of re-analyzing the data within a Bayesian framework, we added partial eta squared values (ηp²) for all the reported ANOVAs so that readers can gauge the effect sizes (Results).
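
      For reference, partial eta squared can be recovered from a reported F statistic and its degrees of freedom through the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The sketch below is illustrative only: the function name and the example F value are hypothetical and are not taken from the manuscript.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an ANOVA F statistic."""
    # Equivalent to SS_effect / (SS_effect + SS_error).
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical example (NOT a result from the study): F(1, 18) = 4.0.
example_eta = partial_eta_squared(4.0, 1, 18)
```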

      I recommend partial correlations to control for the influence of age, duration, and time of surgery, rather than separate correlations.

      Given the combination of small sample size and the expected multicollinearity in our variables (duration of blindness, for example, would be expected to correlate with age, as well as visual acuity post-surgery), partial correlations could not be calculated on this data.

      We are aware of the limits of correlational analyses. Given the unique data set of a rare population we had exploratorily planned to relate behavioral, EEG and MRS parameters by calculating correlations. Since no similar data existed when we started (and to the best of our knowledge our data set is still unique), these correlation analyses were explorative, but the most transparent to run.

      We have now clearly outlined these limitations in our Introduction (Page 5, Lines 133-135), Methods (Page 15, Lines 408-410) and Discussion section (Page 24, Line 634, Page 25, Lines 652-65) to ensure that the results are interpreted with appropriate caution.

      (7) Visual acuity

      Is the VA monocular average, from the dominant eye, or bilateral?

      We have now clarified that the VA reported here is bilateral (Methods, Page 7 Line 165 and Page 15, Line 405). Bilateral visual acuity in congenital cataract-reversal individuals typically corresponds to the visual acuity of the best eye.

      It is mentioned here that correlations with VA are exploratory, please be consistent as the introduction mentions that there was a hypothesis that you sought to test.

      We have now accordingly modified the Introduction (Page 5, Lines 133-135) and added the appropriate caveats in the discussion with regards to interpretations (Page 25, Lines 652-665).

      (8) Correlation analyses between MRS and EEG

      It is mentioned here that correlations between EEG and MRS are exploratory, please consistently point out the exploratory nature, as these results are preliminary and should not be overinterpreted ("We did not have prior hypotheses as to the best of our knowledge no extant literature has tested the correlation between aperiodic EEG activity and MRS measures of GABA+,Glx and Glx/GABA+." ).

      In the revised manuscript, we explicitly state that the reported associations between EEG (aperiodic component) and MRS parameters allow for putting forward directed / more specific hypotheses for future studies (Introduction, Page 5, Lines 133-135; Methods, Page 15, Line 415; Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (9) Results

      Figure 2 uses the same y-axis for the visual cortex and frontal cortex to facilitate a comparison between the two locations. Comparing Figure 2 a with b demonstrates poorer spectral peaks and reduced amplitudes. Lower spectral quality in the frontal cortex voxel could contribute to the absence of a group effect in the control voxel location. The major caveat that spectral quality differs between voxels needs to be pointed out and the limitations thereof discussed.

      We have now explicitly pointed out this issue in the Methods (MRS Data Quality, Supplementary Material S6) and Discussion in the Limitations section (Page 25, Lines 662-665). While data quality was lower for the frontal compared to the visual cortex voxels, as has been observed previously (Juchem & Graaf, 2017; Rideaux et al., 2022), this was not an issue for the EEG recordings. Thus, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures. Crucially, data quality did not differ between groups.

      The results in 2c are the result of multiple correlations with metabolite values ("As in previous studies, we ran a number of exploratory correlation analyses between GABA+, Glx, and Glx/GABA+ concentrations, and visual acuity at the date of testing, duration of visual deprivation, and time since surgery respectively in the CC group"), it seems at least six for the visual acuity measure (VA vs Glx, VA vs GABA+, VA vs Glx/GABA+ x 2 conditions). While the trends are interesting, they should be interpreted with caution because of the exploratory nature, small sample size, the lack of multiple comparison correction, and the influence of two extreme data points. The authors should not overinterpret these results and should point out the need for replication.

      See response to (6) last section, which we copy here for convenience:

      We are aware of the limits of correlational analyses. Given the unique data set from a rare population, we related behavioral, EEG and MRS parameters in an exploratory manner by calculating correlations. Since no similar data existed when we started (and, to the best of our knowledge, our data set is still unique), these correlation analyses were exploratory, but the most transparent to run.

      We have now clearly outlined these limitations in our Discussion section to ensure that the results are interpreted with appropriate caution (Discussion, Page 25, Lines 644-645 and Lines 652-665).
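
      The multiple-comparison concern raised in this point can be illustrated with a minimal sketch of the Holm-Bonferroni step-down procedure. This is not the correction used in the manuscript (which is not specified here); the function name `holm_bonferroni` and the six p-values are hypothetical and chosen only to mirror the "at least six" correlations the reviewer counts.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, starting at 0) is compared
        # against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value fails as well
    return reject


# Hypothetical p-values for six exploratory correlations (NOT study data).
example_p = [0.004, 0.03, 0.20, 0.04, 0.65, 0.009]
decisions = holm_bonferroni(example_p)
print(decisions)
```

      In this made-up example, four of the six p-values fall below 0.05 before correction, but only two survive it, which is why uncorrected exploratory correlations call for cautious interpretation.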

      (10) Discussion:

      Please explain the decrease in E/I balance from MRS in view of recent findings on an increase in E/I balance in CC using RSN-fMRI (Raczy et al., 2022) and EEG (Ossandon et al. 2023).

      We have edited our Abstract (Page 1-2, Lines 31-35) and Discussion (Page 23, Lines 584-590; Page 24, Lines 613-620). In brief, we think our results reflect a homeostatic regulation of E/I balance, that is, an increase in inhibition due to an increase in stimulus driven excitation following sight restoration.

      Names limitations but does nothing to mitigate concerns about spatial specificity. The limitations need to be rewritten to include differences in SNR between the visual cortex and frontal lobe. Needs to include caveats of small samples, including effect inflation.

      We have now discussed the data quality differences between the visual and frontal cortex voxel in MRS data quality, which we find irrespective of group (MRS Data Quality, Supplementary Material S6). We also reiterate why this might not explain our results; data quality was comparable to prior studies which have found group differences in frontal cortex (Methods Page 11, Lines 284 – 299), and data quality did not differ between groups. Further, EEG data quality did not differ across frontal and occipital regions, but group differences in EEG datasets were localized to the occipital cortex.

      Reviewer #2 (Recommendations for The Authors):

      To address the main weakness, the authors could consider including data from a third group, of congenitally blind individuals. Including this would go a very long way towards making the findings interpretable and relating them to the rest of the literature.

      Unfortunately, recruitment of these groups was not possible due to the pandemic. Indeed, we would consider a pre- vs post- surgery approach the most suitable design in the future, which, however, will require several years to be completed. Such time and resource intensive longitudinal studies are justified by the present cross-sectional results.

      We have explicitly stated our contribution and need for future studies in the Limitations section of the Discussion (Page 25, Lines 650-657).

      Analysing the amplitude of alpha rhythms, as well as the other "aperiodic" components, would be useful to relate the profile of the tested patients with previous studies. Visual inspection of Figure 3 suggests that alpha power with eyes closed is not reduced in the patients' group compared to the controls. This would be inconsistent with previous studies (including research from the same group) and it could suggest that the small selected sample is not really representative of the sight-recovery population - certainly one of the most heterogeneous study populations. This further highlights the difficulty of drawing conclusions on the effects of visual experience merely based on this N=10 set of patients.

      Alpha power was indeed reduced in the present subsample of 10 CC individuals (Supplementary Material S19). The graphs of the CC and SC groups look similar for the EC condition in Figure 3 most likely because the spectra are shown with the aperiodic components not yet removed, and because the scales accommodate very different alpha power values. As documented in Supplementary Material S18 and S19, the alpha power and aperiodic intercept/slope results of the resting-state data in the present 10 CC individuals correspond to the results from a larger sample of CC individuals (n = 28) in Ossandón et al., 2023. We explicitly highlight this “replication” in the main manuscript (Page 25-26, Lines 671-676). Thus, the present sub-sample of CC individuals is representative of its population.

      To further characterise the MRS results, the authors may consider an alternative normalisation scheme. It is not clear whether the lack of significant GABA and GLX differences in the face of a significant group difference in the GLX/GABA ratio is due to the former measures being noisier since taking the ratio between two metabolites often helps reduce inter-individual variability and thereby helps revealing group differences. It remains an open question whether the GABA or GLX concentrations would show significant group differences after appropriate normalisation (e.g. NAA?).

      We repeated the analysis with Creatine-normalized values of GABA+ and Glx, and the main results i.e. reduced Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no such difference in the frontal cortex, remained the same (Supplementary Material S5).

      Further, we re-analyzed the data using Osprey, an open-source toolbox that uses linear combination modeling, and found once more that our results did not change (Supplementary Material S3). We refer to these findings in the Methods (Page 10, Lines 272-275) and Results (Page 10, Lines 467-471) of the main manuscript.

      In fact, the Glx concentration in the visual cortex of CC vs SC individuals was significantly decreased when Cr-normalized values were used (which was not significant in the original analysis). However, we do not interpret this result as it was not replicated with the water-normalized values from Gannet or Osprey.

      I suggest revising the discussion to present a more balanced picture of the existent evidence of the relation between E/I and EEG indices. Although there is evidence that the 1/f slope changes across development, in a way that could be consistent with a higher slope reflecting more immature and excitable tissue, the link with cortical E/I is far from established, especially when referring to specific EEG indices (intercept vs. slope, measured in lower vs. higher frequency ranges).

      We have revised the Introduction (Page 4, Line 91, Lines 101-102) and Discussion (Page 22, Lines 568-569, Page 24, Lines 645-647 and Lines 654-657) in the manuscript accordingly; we allude to the fact that the links between cortical E/I and aperiodic EEG indices have not yet been unequivocally established in the literature.

      Minor:

      - The authors estimated NAA concentration with different software than the one used to estimate GLX and GABA; this examined the OFF spectra only; I suggest that the authors consider running their analysis with LCModel, which would allow a straightforward approach to estimate concentrations of all three metabolites from the same edited spectrum and automatically return normalised concentrations as well as water-related ones.

      We re-analyzed all of the MRS datasets using Osprey, which uses linear combination modelling and has shown quantification results similar to LCModel for NAA (Oeltzschner et al., 2020). The results of a lower Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no difference in NAA concentration, were replicated using this pipeline.

      We have now added these analyses to the Supplementary Material S3 and referred to them in the Methods (Page 9, Lines 242-246) and Results (Page 18, Lines 464-467).

      - Of course the normalisation used to estimate GABA and GLX values is completely irrelevant when the two values are expressed as ratio GLX/GABA - this may be reflected in the text ("water normalised GLX/GABA concentration" should read "GLX/GABA concentration" instead).

      We have adapted the text on Page 16 (Line 431) and have ensured that throughout the manuscript the use of “water-normalized” is in reference to Glx or GABA+ concentration, and not the ratio.
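
      The reviewer's point that normalisation cancels in the ratio can be checked with a short sketch. It assumes that the same reference value is applied to both metabolites; metabolite-specific correction factors would not cancel in this way. The function name and all concentration values are hypothetical.

```python
import math


def glx_gaba_ratio(glx, gaba, reference=1.0):
    """Ratio of two metabolite estimates after dividing both by one reference."""
    return (glx / reference) / (gaba / reference)


# Hypothetical raw estimates in arbitrary units (NOT study data).
glx_raw, gaba_raw = 9.0, 3.0

raw_ratio = glx_gaba_ratio(glx_raw, gaba_raw)
water_ratio = glx_gaba_ratio(glx_raw, gaba_raw, reference=40.0)    # "water"
creatine_ratio = glx_gaba_ratio(glx_raw, gaba_raw, reference=7.5)  # "creatine"

# The common reference cancels, so all three ratios agree.
print(raw_ratio, water_ratio, creatine_ratio)
```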

      - Please specify which equation was used for tissue correction - is it alpha-correction?

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      - Since ANOVA was used, the assumption is that values are normally distributed. Please report evidence supporting this assumption.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the Methods (Page 13, Lines 326-329 and Page 15, Lines 400-402).
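
      For readers unfamiliar with these checks, the following is a minimal sketch of how such assumption tests could be run with SciPy. It is not the authors' analysis script: the group values are randomly generated placeholders, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data for two groups of ten (NOT study data).
rng = np.random.default_rng(42)
cc_values = rng.normal(loc=3.0, scale=0.4, size=10)
sc_values = rng.normal(loc=3.4, scale=0.4, size=10)

# Shapiro-Wilk test of normality, run separately for each group.
shapiro_results = {
    name: stats.shapiro(values)
    for name, values in (("CC", cc_values), ("SC", sc_values))
}

# Levene's test for homogeneity of variance between the groups.
# center="median" is SciPy's default (the Brown-Forsythe variant).
levene_stat, levene_p = stats.levene(cc_values, sc_values, center="median")

for name, (w_stat, p_value) in shapiro_results.items():
    print(f"Shapiro-Wilk {name}: W = {w_stat:.3f}, p = {p_value:.3f}")
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

      With samples this small, both tests have limited power, so a non-significant result is weak evidence that the assumption holds.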

      Reviewer #3 (Recommendations for The Authors):

      In addition to addressing major comments listed in my Public Review, I have the following, more granular comments, which should also be addressed:

      (1) The paper's structure could be improved by presenting visual acuity data before diving into MRS and EEG results to better contextualize the findings.

      We now explicitly state in the Methods (Page 5, Line 155) that lower visual acuity is expected in a cohort of CC individuals with long lasting congenital visual deprivation.

      We have additionally included a plot of visual acuities of the two groups (Supplementary Material S1).

      (2) The paper should better explain the differences between CC individuals whose sight is restored and congenitally blind patients. The authors write in the introduction that there are sensitive periods/epochs during the lifespan for the development of local inhibitory neural circuits, and "Human neuroimaging studies have similarly demonstrated that visual experience during the first weeks and months of life is crucial for the development of visual circuits. If human infants born with dense bilateral cataracts are treated later than a few weeks from birth, they suffer from a permanent reduction of not only visual acuity (Birch et al., 1998; Khanna et al., 2013) and stereovision (Birch et al., 1993; Tytla et al., 1993) but additionally from impairments in higher-level visual functions, such as face perception (Le Grand et al., 2001; Putzar et al., 2010; Röder et al., 2013)...".

      Thus it seems that the current participants (sight restored after a sensitive period) seem to be similarly affected by the development of the local inhibitory circuits as congenitally blind. To assess the effect of plasticity and sight restoration longitudinal data would be necessary.

      In the Introduction (Page 2, Lines 59-64; Page 3, Lines 111-114) we added that in order to identify sensitive periods e.g. for the elaboration of visual neural circuits, sight recovery individuals need to be investigated. The study of permanently blind individuals allows for investigating the role of experience (whether sight is necessary to introduce the maturation of visual neural circuits), but not whether visual input needs to be available at early epochs in life (i.e. whether sight restoration following congenital blindness could nevertheless lead to the development of visual circuits).

      This is indeed the conclusion we make in the Discussion section. We have now highlighted the need for longitudinal assessments in the Discussion (Page 25, Lines 654-656).

      (3) What is the underlying idea of analyzing two separate aperiodic slopes (20-40 Hz and 1-19 Hz)? It is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).
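
      The idea of fitting separate aperiodic slopes below and above a knee can be sketched as follows. This is a simplified illustration, not the pipeline used by the authors or by Ossandón et al.: it fits a straight line in log-log space to a noise-free synthetic spectrum with no oscillatory peaks, and the exponents, knee position and function name are assumptions made for the example.

```python
import numpy as np


def fit_aperiodic(freqs, power, f_lo, f_hi):
    """Least-squares line fit of log10(power) on log10(freq) within a band."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    slope, intercept = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return slope, intercept


# Synthetic spectrum with a knee at 20 Hz (hypothetical exponents):
# steeper decay (exponent 1.5) below the knee, flatter (0.8) above it.
freqs = np.arange(1.0, 40.5, 0.5)
knee, low_exp, high_exp = 20.0, 1.5, 0.8
power = np.where(
    freqs < knee,
    freqs ** -low_exp,
    knee ** -low_exp * (freqs / knee) ** -high_exp,
)

low_slope, low_intercept = fit_aperiodic(freqs, power, 1, 19)
high_slope, high_intercept = fit_aperiodic(freqs, power, 20, 40)
print(f"1-19 Hz slope: {low_slope:.2f}, 20-40 Hz slope: {high_slope:.2f}")
```

      A single line fitted across the whole 1-40 Hz range would average the two regimes, which is the motivation for comparing single and dual fits. Real EEG spectra would additionally require handling of oscillatory peaks such as alpha before or during fitting.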

      (4) "For this scan, participants were instructed to keep their eyes closed and stay as still as possible." Why should it be important to have the eyes closed during a T1w data acquisition? This statement at this location does not make sense.

      To avoid misunderstandings, we removed this statement in this context.

      (5) "Two SC subjects did not complete the frontal cortex scan for the EO condition and were excluded from the statistical comparisons of frontal cortex neurotransmitter concentrations." Why did the authors not conduct whole-brain MRS, which seems to have been on the market for quite some time (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590062/)?

      Similar to previous work (Coullon et al., 2015; Weaver et al., 2013) our hypothesis was related to the visual cortex, and we chose the frontal cortex voxel as a control. This has now been clarified in the Introduction (Page 4, Lines 103-114), Methods (Page 9, Lines 225-227) and Discussion (Page 25, Lines 662-665).

      (6) In "....during visual stimulation with stimuli that changed in luminance (LU) (Pant et al., 2023)." the authors should provide a link on the visual stimulation, which is provided further below

      In the revised manuscript, we have moved up the description of the visual stimulation (Page 13, Line 336).

      (7) "During the EO condition, participants were asked to fixate on a blank screen." This is not really possible. Typically, resting state EO conditions include a fixation cross, as the participants would not be able to fixate on a blank screen and move their eyes, which would impact the recordings.

      We have now rephrased this as “look towards” with the goal of avoiding eye movements (Page 14, Line 347).

      (8) "Components corresponding to horizontal or vertical eye movements were identified via visual inspection and removed (Plöchl et al., 2012)." It is unclear what the Plöchl reference should serve for. Is the intention of the authors to state that manual (and subjective) visual inspection of the ICA components is adequate? I would recommend removing this reference.

      The intention was to provide the basis for classification during the visual inspection, as opposed to an automated method such as ICLabel.

      We stated this clearly in the revised manuscript (Page 14 Lines 368-370).

      (9) "The datasets were divided into 6.25 s long epochs corresponding to each trial." This is a bit inaccurate, as the trial also included some motor response task. Thus, I assume the 6.25 s are related to the visual stimulation.

      We have modified the sentence accordingly (Page 15, Line 378).

      (10) Figure 2. a & b. Just an esthetic suggestion: I would recommend removing the lines between the EC and EO conditions, as they suggest some longitudinal changes. Unless it is important to highlight the changes between EC and EO within each subject.

      In fact, EC vs. EO was a within-subject factor with expected changes for the EEG and possible changes in the MRS parameters. To allow the reader to track changes due to EC vs. EO for individual subjects (rather than just comparing the change in the mean scores), we use lines.  

      (11) Figure 3A: I would plot the same y-axis range for both groups to make it more comparable.

      We have changed Figure 3A accordingly.

      (12) "flattening of the intercept": please replace "flattening", as it is too closely related to slope.

      We have replaced “flattening” with “reduction” (Page 20, Line 517).

      (13) The plotting of only the significant correlation between MRS measures and EEG measures seems to be rather selective reporting. For this type of exploratory analysis, I would recommend plotting all of the scatter plots and moving the entire exploratory analysis to the supplementary (as this provides the smallest evidence of the results).

      We have made clear in the Methods (Page 16, Lines 415-426), Results and Discussion (page 24, Lines 644-645), as well as in the Supplementary material, that the reason for only reporting the significant correlation was that this correlation survived correction for multiple comparisons, while all other correlations did not. We additionally explicitly allude to the Supplementary Material where the plots for all correlations are shown (Results, Page 21, Lines 546-552).

      (14) "Here, we speculate that due to limited structural plasticity after a phase of congenital blindness, the neural circuits of CC individuals, which had adapted to blindness after birth, employ available, likely predominantly physiological plasticity mechanisms (Knudsen, 1998; Mower et al., 1985; Röder et al., 2021), in order to re-adapt to the newly available visual excitation following sight restoration."

      I don't understand the logic here. The CC individuals are congenitally blind, thus why should there be any physiological plasticity mechanism to adapt to blindness, if they were blind at birth?

      With “adapt to blindness” we mean adaptation of a brain to an atypical or unexpected condition when taking an evolutionary perspective (i.e. the lack of vision). We have made this clear in the revised manuscript (Introduction, Page 4, Lines 111-114; Discussion, Page 23, Lines 584-591).

      (15) "An overall reduction in Glx/GABA ratio would counteract the aforementioned adaptations to congenital blindness, e.g. a lower threshold for excitation, which might come with the risk of runaway excitation in the presence of restored visually-elicited excitation."

      This could be tested by actually investigating the visual excitation by visual stimulation studies.

      The visual stimulation condition in the EEG experiment of the present study found a higher aperiodic intercept in CC compared to SC individuals. Given the proposed link between the intercept and spontaneous neural firing (Manning et al., 2009), we interpreted the higher intercept in CC individuals as increased broadband neural firing during visual stimulation (Results Figure 3; Discussion Page 24, Lines 635-640). This idea is compatible with enhanced BOLD responses during an EO condition in CC individuals (Raczy et al., 2022). Future work should systematically manipulate visual stimulation to test this idea.

      (16) As the authors also collected T1w images, was the hypothesis of increased visual cortex thickness in CC investigated?

      This hypothesis was investigated in a separate publication which included this subset of participants (Hölig et al., 2023), and found increased visual cortical thickness in the CC group. We refer to this publication, and related work (Feng et al., 2021) in the present manuscript.

      (17) The entire discussion of age should be omitted, as the current data set is too small to assess age effects.

      We have removed this section and just allude to the fact that we replicated typical age trends to underline the validity of the present data (Page 26, Lines 675-676).

      (18) Table1: should include the age and the age at the time point of surgery.

      We added age to the revised Table 1. We clarified that in CC individuals, duration of blindness is the same as age at the time point of surgery (Page 6, Line 163).

      (19) Why are no group comparisons of visual acuity reported?

      Lower visual acuity in CC than SC individuals is a well-documented fact.

      We have now added the visual acuity plots for readers (Supplementary Material S1, referred to in the Methods, Page 5, Line 155) which highlight this common finding.

      References (Recommendations to the Authors)

      Adrian, E. D., & Matthews, B. H. C. (1934). The berger rhythm: Potential changes from the occipital lobes in man. Brain. https://doi.org/10.1093/brain/57.4.355

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Feng, Y., Collignon, O., Maurer, D., Yao, K., & Gao, X. (2021). Brief postnatal visual deprivation triggers long-lasting interactive structural and functional reorganization of the human cortex. Frontiers in Medicine, 8, 752021. https://doi.org/10.3389/FMED.2021.752021

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2023). Sight restoration in congenitally blind humans does not restore visual brain structure. Cerebral Cortex, 33(5), 2152–2161. https://doi.org/10.1093/CERCOR/BHAC197

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Raczy, K., Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2022). Typical resting-state activity of the brain requires visual input during an early sensitive period. Brain Communications, 4(4). https://doi.org/10.1093/BRAINCOMMS/FCAC146

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

    2. eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

    3. Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise the effects of a period of visual deprivation during the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry across conditions ('eyes open', 'eyes closed'), as well as resting-state and visually evoked EEG activity, between ten congenital cataract patients with recovered sight (CC) and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, the absence of a critical control cohort (for example, CC patients without recovered sight, who could provide evidence for a higher E/I ratio), and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.
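
      The effect-inflation concern can be made concrete with a small simulation. This is an illustrative sketch only: the true effect size (Cohen's d = 0.5), the number of simulated studies and the seed are arbitrary assumptions, and none of the numbers relate to the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.5       # assumed true standardized group difference
n_per_group = 10   # matches the group size discussed above
n_sims = 2000      # number of simulated studies (arbitrary)

significant_d = []
for _ in range(n_sims):
    group_a = rng.normal(true_d, 1.0, n_per_group)
    group_b = rng.normal(0.0, 1.0, n_per_group)
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:
        # Observed Cohen's d using the pooled standard deviation.
        pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
        significant_d.append((group_a.mean() - group_b.mean()) / pooled_sd)

mean_significant_d = float(np.mean(significant_d))
power_estimate = len(significant_d) / n_sims
print(f"Share of significant studies: {power_estimate:.2f}")
print(f"Mean observed d among significant studies: {mean_significant_d:.2f}")
```

      Because only large observed differences reach significance at this sample size, the average effect among the significant simulated studies exceeds the true effect, which is the inflation the reviewer refers to.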

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, patients with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows a far broader linewidth than the visual cortex spectrum (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks, lower GABA+ and Glx concentrations, lower NAA SNR values, and lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support, this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced varying durations of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones are shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and the E/I metric is weak and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be drawn from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to correlate negatively with age.

      This caveat has been addressed in the revised manuscript.
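
      To illustrate what controlling for age means in this context, the following is a minimal sketch of a first-order partial correlation. It is not the authors' analysis: the function names and the synthetic values are assumptions chosen for illustration, and the example is constructed so that two measures correlate only because both depend on a third variable standing in for age.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic, hypothetical values (not study data): 'age' drives both measures,
# and the two noise terms are orthogonal to 'age' and to each other.
age = [1, 1, 1, 1, -1, -1, -1, -1]
noise_a = [1, 1, -1, -1, 1, 1, -1, -1]
noise_b = [1, -1, 1, -1, 1, -1, 1, -1]
measure_a = [2 * z + e for z, e in zip(age, noise_a)]  # stand-in for an MRS measure
measure_b = [2 * z + e for z, e in zip(age, noise_b)]  # stand-in for an EEG measure

raw_r = pearson(measure_a, measure_b)                 # 0.8 by construction
adjusted_r = partial_corr(measure_a, measure_b, age)  # approximately 0
```

      In this constructed case the raw correlation is entirely explained by the shared dependence on the third variable, which is the scenario the comment above asks the authors to rule out.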

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones are shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction states a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandón et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

    4. Reviewer #2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that the excitation/inhibition (E/I) ratio in the visual cortex is increased in congenitally blind patients; the present study reports that the E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as is common in the field).

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A weak correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patient group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls, the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, and only in the patient group.
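
      For a correlation estimated in a group of ten, one way to convey how uncertain the estimate is would be a permutation p-value together with a bootstrap confidence interval. The sketch below is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the permutation test is a Monte Carlo approximation (random shuffles) rather than an exhaustive enumeration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with zero variance
        stats.append(pearson(xs, ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic, hypothetical values for N = 10 (not study data).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2, 7.9, 9.1]

p_value = permutation_pvalue(x, y)
ci_low, ci_high = bootstrap_ci(x, y)
```

      With real data from ten participants the interval would typically be wide, which is the point: reporting it makes the limited precision of the estimate visible to the reader.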

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

    5. Reviewer #3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its relationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex, as well as the intercept and aperiodic signal, are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1 3.1) Response to Variability in Visual Deprivation<br /> Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      (2 3.2) Small Sample Size<br /> The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      (3 3.3) Statistical Concerns<br /> While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      Several points require clarification or improvement:<br /> (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.<br /> (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.<br /> (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.<br /> (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.<br /> (8) Figure 2C<br /> Figure 2C still lacks the crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.<br /> (9 3.4) Interpretation of Aperiodic Signal<br /> Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.<br /> (10) Additionally, the authors state:<br /> "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."<br /> (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.<br /> (12 3.5) Problems with EEG Preprocessing and Analysis<br /> Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice confounds subsequent spectral analyses because of aliasing, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern than when analyzing the aperiodic signal. In contrast, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).<br /> (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.<br /> (14) The authors mention:<br /> "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.<br /> The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.
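
      To make the aliasing concern raised under downsampling concrete: at a 60 Hz sampling rate the Nyquist frequency is 30 Hz, so any component above 30 Hz that is not removed by a low-pass filter beforehand folds back into the analyzed band. The sketch below is a generic illustration with a hypothetical 50 Hz component; it does not reproduce the authors' pipeline, and whether an adequate anti-aliasing filter was applied before downsampling is not something this example can determine.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency of a sinusoid of frequency f when sampled at rate fs."""
    return abs(f - fs * round(f / fs))

fs = 60.0  # sampling rate after downsampling, in Hz (Nyquist frequency: 30 Hz)

# A hypothetical 50 Hz component sampled at 60 Hz, next to a genuine 10 Hz component.
samples_50hz = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(120)]
samples_10hz = [math.sin(2 * math.pi * 10.0 * k / fs) for k in range(120)]

# The two sampled sequences are identical up to a sign flip, so after sampling
# the 50 Hz component cannot be told apart from 10 Hz activity.
max_mismatch = max(abs(a + b) for a, b in zip(samples_50hz, samples_10hz))

folded = alias_frequency(50.0, fs)  # 10.0
```

      Because the folded energy lands inside the 1-20 Hz range used for the aperiodic fit, it could bias the estimated slope and intercept unless it is filtered out before the sampling rate is reduced.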

    1. eLife Assessment

      This valuable study provides insights into the structure and function of bacterial contractile injection systems that are present in the cytoplasm of many Streptomyces strains. A convincing high-resolution model of the structure of extended forms of the cytoplasmic contractile injection system assembly from Streptomyces coelicolor is presented, with some investigation of the membrane protein CisA in attachment of the extended assembly to the inner face of the cytoplasmic membrane and the firing of the system. The work expands the current understanding of these diverse bacterial nanomachines.

    2. Reviewer #1 (Public review):

      Contractile Injection Systems (CIS) are versatile machines that can form pores in membranes or deliver effectors. They can act extra or intracellularly. When intracellular they are positioned to face the exterior of the cell and hence should be anchored to the cell envelope. The authors previously reported the characterization of a CIS in Streptomyces coelicolor, including significant information on the architecture of the apparatus. However, how the tubular structure is attached to the envelope was not investigated. Here they provide a wealth of evidence to demonstrate that a specific gene within the CIS gene cluster, cisA, encodes a membrane protein that anchors the CIS to the envelope. More specifically, they show that:

      - CisA is not required for assembly of the structure but is important for proper contraction and CIS-mediated cell death<br /> - CisA is associated with the membrane (fluorescence microscopy, cell fractionation) through a transmembrane segment (lacZ-phoA topology fusions in E. coli)<br /> - Structural prediction suggests an interaction between CisA and a CIS baseplate component<br /> - In addition, they provide a high-resolution model structure of the >750-polypeptide Streptomyces CIS in its extended conformation, revealing new details of this fascinating machine, notably in the baseplate and cap complexes.

      All the experiments are well controlled, including trans-complementation of all tested phenotypes.

      One important piece of information that is missing is the oligomeric state of CisA.

      While it would have been great to test the interaction between CisA and Cis11, or to perform cryo-electron microscopy assays of detergent-extracted CIS structures to maintain the interaction with CisA, I believe that the toxicity of CisA upon overexpression or upon expression in E. coli renders these studies difficult, and they will require a significant amount of time and optimization to be performed. It is worth mentioning that this study is of significant novelty in the CIS field because, except for Type VI secretion systems, very few membrane proteins or complexes responsible for CIS attachment have been identified and studied.

    3. Reviewer #2 (Public review):

      Summary:

      The overall question that is addressed in this study is how the S. coelicolor contractile injection system (CISSc) works and affects both cell viability and differentiation, which it has been implicated to do in previous work from this group and others. The CISSc system has been enigmatic in the sense that it is free-floating in the cytoplasm in an extended form and is seen in contracted conformation (i.e. after having been triggered) mainly in dead and partially lysed cells, suggesting involvement in some kind of regulated cell death. So, how do the structure and function of the CISSc system compare to those of related CIS from other bacteria, does it interact with the cytoplasmic membrane, how does it do that, and is the membrane interaction involved in the suggested role in stress-induced, regulated cell death? The authors address these questions by investigating the role of a membrane protein, CisA, that is encoded by a gene in the CIS gene cluster in S. coelicolor. Further, they analyse the structure of the assembled CISSc, purified from the cytoplasm of S. coelicolor, using single-particle cryo-electron microscopy.

      Strengths:

      The beautiful visualisation of the CIS system both by cryo-electron tomography of intact bacterial cells and by single-particle electron microscopy of purified CIS assemblies are clearly the strengths of the paper, both in terms of methods and results. Further, the paper provides genetic evidence that the membrane protein CisA is required for the contraction of the CISSc assemblies that are seen in partially lysed or ghost cells of the wild type. The conclusion that CisA is a transmembrane protein and the inferred membrane topology are well supported by experimental data. The cryo-EM data suggest that CisA is not a stable part of the extended form of the CISSc assemblies. These findings raise the question of what CisA does.

      Weaknesses:

      The investigations of the role of CisA in function, membrane interaction, and triggering of contraction of CIS assemblies, are important parts of the paper and are highlighted in the title. However, the experimental data provided to answer these questions appear partially incomplete and not as conclusive as one would expect.

      The stress-induced loss of viability is only monitored with one method: an in vivo assay where cytoplasmic sfGFP signal is compared to FM5-95 membrane stain. Addition of a sublethal level of nisin led to loss of sfGFP signal in individual hyphae in the WT, but not in the cisA mutant (similarly to what was previously reported for a CIS-negative mutant). Technically, this experiment and the example images that are shown give rise to some concern. Only individual hyphal fragments are shown that do not look like healthy and growing S. coelicolor hyphae. Under the stated growth conditions, S. coelicolor strains would normally have grown as dense hyphal pellets. It is therefore surprising that only these unbranched hyphal fragments are shown in Fig. 4ab. Further, S. coelicolor would likely be in a stationary phase when grown 48 h in the rich medium that is stated, giving rise to concern about the physiological state of the hyphae that were used for the viability assay. It would be valuable to know whether actively growing mycelium is affected in the same way by the nisin treatment, and also whether the cell death effect could be detected by other methods.

      The model presented in Fig. 5 suggests that stress leads to a CisA-dependent attachment of CIS assemblies to the cytoplasmic membrane, and then triggering of contraction, leading to cell death. This model makes testable predictions that have not been challenged experimentally. Given that sublethal doses of nisin seem to trigger cell death, there appear to be possibilities to monitor whether activation of the system (via CisA?) indeed leads to at least temporarily increased interaction of CIS with the membrane. Further, would not the model predict that stress leads to an increased number of contracted CIS assemblies in the cytoplasm? No clear difference in length of the isolated assemblies in Fig. S7 is seen between untreated and nisin-exposed cells, and also no difference between assemblies from WT and cisA mutant hyphae.

      The interaction of CisA with the CIS assembly is critical for the model but is only supported by AlphaFold modelling, which predicts an interaction between cytoplasmic parts of CisA and the Cis11 protein in the baseplate wedge. An experimental demonstration of this interaction would have strengthened the conclusions.

      The cisA mutant showed a similarly accelerated sporulation as was previously reported for CIS-negative strains, which supports the conclusion that CisA is required for function of CISSc. But the results do not add any new insights into how CIS/CisA affects the progression of the developmental life cycle and whether this effect has anything to do with the regulated cell death that is caused by CIS. The same applies to the effect on secondary metabolite production, with no further mechanistic insights added, except reporting similar effects of CIS and CisA inactivations.

      Concluding remarks:<br /> The work will be of interest to anyone interested in contractile injection systems, T6SS, or similar machineries, as well as for people working on the biology of streptomycetes. There is also a potential impact of the work in the understanding of how such molecular machineries could have been co-opted during evolution to become a mechanism for regulated cell death. However, this latter aspect remains poorly understood. Even though this paper adds excellent new structural insights and identifies a putative membrane anchor, it remains elusive how the Streptomyces CIS may lead to cell death. It is also unclear what the advantage would be to trigger death of hyphal compartments in response to stress, as well as how such cell death may impact (or accelerate) the developmental progression. Finally, one cannot help but wonder whether the Streptomyces CIS could have any role in protection against phage infection.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Casu et al. report the characterization of a previously uncharacterized membrane protein, CisA, encoded in a non-canonical contractile injection system of Streptomyces coelicolor, CISSc, which is a cytosolic CIS significantly distinct from both intracellular membrane-anchored T6SSs and extracellular CISs. The authors present the first high-resolution structure of the extended CISSc, which reveals important structural insights into this conformational state. To explore how CISSc interacts with the cytoplasmic membrane, they set out to investigate CisA, which was previously hypothesized to be the membrane adaptor. However, the structure revealed that it was not associated with CISSc. Using fluorescence microscopy and cell fractionation assays, the authors verified that CisA is indeed a membrane-associated protein. They further determined experimentally that CisA has a cytosolic N-terminal domain and a periplasmic C-terminus. Functional analysis of the cisA mutant revealed that CisA is not required for CISSc assembly but is essential for contraction; as a result, the deletion significantly affects CISSc-mediated cell death upon stress, timely differentiation, and secondary metabolite production. Although the work did not resolve the mechanistic detail of how CisA interacts with the CISSc structure, it provides solid data and a strong foundation for future investigation toward understanding the mechanism of CISSc contraction and, potentially, the relation between the membrane association of CISSc, sheath contraction, and cell death.

      Strengths:

      The paper is well-structured, and the conclusions of the study are supported by solid data and careful data interpretation. The authors provided strong evidence on (1) the high-resolution structure of extended CISSc determined by cryo-EM, and the subsequent comparison with known eCIS structures, which sheds light in detail on both its similarities to and differences from other subtypes of eCISs; (2) the topological features of CisA using fluorescence microscopic analysis, cell fractionation and PhoA-LacZα reporter assays; and (3) functions of CisA in CISSc-mediated cell death and secondary metabolite production, likely via the regulation of sheath contraction.

      Weaknesses:

      The data presented are not sufficient to provide mechanistic details of CisA-mediated CISSc contraction, as the authors were not able to experimentally demonstrate the direct interaction of CisA with the baseplate complex of CISSc (hypothesized to be via Cis11 by structural modeling), since they could not express cisA in E. coli due to its potential toxicity. Therefore, there is a lack of biochemical analysis of direct interaction between CisA and the baseplate wedge. In addition, there is no direct evidence showing that CisA is responsible for tethering CISSc to the membrane upon stress, and the spatial and temporal relation between membrane association and contraction remains unclear. Further investigation will be needed to address these questions in the future.

      Discussion:

      Overall, the work provides a valuable contribution to our understanding of the structure of a much less understood subtype of CISs, which is unique compared to both membrane-anchored T6SSs and host-membrane targeting eCISs. Importantly, the work serves as a good foundation to further investigate how the sheath contraction works here. The work contributes to expanding our understanding of the diverse CIS superfamilies.

    5. Author response:

      We thank the editor and the three reviewers for the positive assessment and constructive feedback on how to improve our manuscript. We greatly appreciate that our work is considered valuable to the field, the recognition of the high-resolution model we presented, and the comments on our investigation of CisA’s role in the attachment and firing mechanism of the extended assembly. It is truly gratifying to know that our study contributes to expanding the current understanding of the biology of Streptomyces and the role of these functionally diverse and fascinating bacterial nanomachines.

      We have provided specific responses to each reviewer's comments below. In summary, we intend to address the following requested revisions:

      We will expand our bioinformatic analysis of CisA and provide additional information on the oligomeric state of CisA. We will also modify the text, figures, and figure legends to improve the clarity of our work and experimental procedures.

      Some reviewer comments would require additional experimental work, some of which would involve extensive optimization of experimental conditions. Because both lead postdoctoral researchers involved in this work have now left the lab, we currently do not have the capability to perform additional experimental work.

      Reviewer #1 (Public review):

      Contractile Injection Systems (CIS) are versatile machines that can form pores in membranes or deliver effectors. They can act extra or intracellularly. When intracellular they are positioned to face the exterior of the cell and hence should be anchored to the cell envelope. The authors previously reported the characterization of a CIS in Streptomyces coelicolor, including significant information on the architecture of the apparatus. However, how the tubular structure is attached to the envelope was not investigated. Here they provide a wealth of evidence to demonstrate that a specific gene within the CIS gene cluster, cisA, encodes a membrane protein that anchors the CIS to the envelope. More specifically, they show that:

      - CisA is not required for assembly of the structure but is important for proper contraction and CIS-mediated cell death

      - CisA is associated with the membrane (fluorescence microscopy, cell fractionation) through a transmembrane segment (lacZ-phoA topology fusions in E. coli)

      - Structural prediction suggests an interaction between CisA and a CIS baseplate component<br /> - In addition, they provide a high-resolution model structure of the >750-polypeptide Streptomyces CIS in its extended conformation, revealing new details of this fascinating machine, notably in the baseplate and cap complexes.

      All the experiments are well controlled, including trans-complementation of all tested phenotypes.

      One important piece of information that is missing is the oligomeric state of CisA.

      While it would have been great to test the interaction between CisA and Cis11, or to perform cryo-electron microscopy assays of detergent-extracted CIS structures to maintain the interaction with CisA, I believe that the toxicity of CisA upon overexpression or upon expression in E. coli renders these studies difficult, and they will require a significant amount of time and optimization to be performed. It is worth mentioning that this study is of significant novelty in the CIS field because, except for Type VI secretion systems, very few membrane proteins or complexes responsible for CIS attachment have been identified and studied.

      We thank this reviewer for their highly supportive and positive comments on our manuscript. We are grateful for this reviewer’s recognition of the novelty of our study, particularly in the context of membrane proteins and complexes involved in CIS attachment.

      We agree that further experimental evidence on the direct interaction between CisA and Cis11 would have strengthened our model of CisA function. However, as noted by this reviewer, this additional work is technically challenging and currently beyond the scope of this study.

      We thank Reviewer #1 for suggesting discussing the potential oligomeric state of CisA. We will perform additional AlphaFold modelling of CisA and discuss the result of this analysis in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The overall question that is addressed in this study is how the S. coelicolor contractile injection system (CISSc) works and affects both cell viability and differentiation, which it has been implicated to do in previous work from this group and others. The CISSc system has been enigmatic in the sense that it is free-floating in the cytoplasm in an extended form and is seen in contracted conformation (i.e. after having been triggered) mainly in dead and partially lysed cells, suggesting involvement in some kind of regulated cell death. So, how do the structure and function of the CISSc system compare to those of related CIS from other bacteria, does it interact with the cytoplasmic membrane, how does it do that, and is the membrane interaction involved in the suggested role in stress-induced, regulated cell death? The authors address these questions by investigating the role of a membrane protein, CisA, that is encoded by a gene in the CIS gene cluster in S. coelicolor. Further, they analyse the structure of the assembled CISSc, purified from the cytoplasm of S. coelicolor, using single-particle cryo-electron microscopy.

      Strengths:

      The beautiful visualisation of the CIS system both by cryo-electron tomography of intact bacterial cells and by single-particle electron microscopy of purified CIS assemblies are clearly the strengths of the paper, both in terms of methods and results. Further, the paper provides genetic evidence that the membrane protein CisA is required for the contraction of the CISSc assemblies that are seen in partially lysed or ghost cells of the wild type. The conclusion that CisA is a transmembrane protein and the inferred membrane topology are well supported by experimental data. The cryo-EM data suggest that CisA is not a stable part of the extended form of the CISSc assemblies. These findings raise the question of what CisA does.

      We thank Reviewer #2 for the overall positive evaluation of our manuscript and the constructive criticism. 

      Weaknesses:

      The investigations of the role of CisA in function, membrane interaction, and triggering of contraction of CIS assemblies, are important parts of the paper and are highlighted in the title. However, the experimental data provided to answer these questions appear partially incomplete and not as conclusive as one would expect.

      We acknowledge that some aspects of our work have not been fully answered. We believe that providing additional experimental data is currently beyond the scope of this study. To improve this study, we will modify the text and clarify experimental procedures and figures where possible in the revised version of our manuscript.

      The stress-induced loss of viability is only monitored with one method: an in vivo assay where cytoplasmic sfGFP signal is compared to FM5-95 membrane stain. Addition of a sublethal level of nisin led to loss of sfGFP signal in individual hyphae in the WT, but not in the cisA mutant (similarly to what was previously reported for a CIS-negative mutant). Technically, this experiment and the example images that are shown give rise to some concern. Only individual hyphal fragments are shown that do not look like healthy and growing S. coelicolor hyphae. Under the stated growth conditions, S. coelicolor strains would normally have grown as dense hyphal pellets. It is therefore surprising that only these unbranched hyphal fragments are shown in Fig. 4ab.

      We thank Reviewer #2 for their thoughtful criticism regarding our stress-induced viability assay and the data presented in Figure 4. We acknowledge the importance of ensuring that the presented images reflect the physiological state of S. coelicolor under the stated growth conditions and recognize that the hyphal fragments shown in Figure 4 do not fully capture the typical morphology of S. coelicolor. As pointed out by this reviewer, S. coelicolor grows in large hyphal clumps when cultured in liquid media, making the quantification of fluorescence intensities in hyphae expressing cytoplasmic GFP and stained with the membrane dye FM5-95 particularly challenging. To improve the image analysis and quantification of GFP and FM5-95 fluorescence intensities across the three S. coelicolor strains (wildtype, cisA deletion mutant and the complemented cisA mutant), we vortexed the cell samples briefly before imaging to break up hyphal clumps, increasing the number of hyphal fragments. The hyphae shown in our images were selected as representative examples across three biological replicates.

      Further, S. coelicolor would likely be in a stationary phase when grown 48 h in the rich medium that is stated, giving rise to concern about the physiological state of the hyphae that were used for the viability assay. It would be valuable to know whether actively growing mycelium is affected in the same way by the nisin treatment, and also whether the cell death effect could be detected by other methods.

      The reasoning behind growing S. coelicolor for 48 h before performing the fluorescence-based viability assay was that we (DOI: 10.1038/s41564-023-01341-x) and others (e.g. DOI: 10.1038/s41467-023-37087-7) previously showed that the levels of CIS particles peak at the transition from vegetative to reproductive/stationary growth, thus indicating that CIS activity is highest during this growth stage. The results obtained in this manuscript are in agreement with our previous study, in which we showed a similar effect on the viability of wildtype versus cis-deficient S. coelicolor strains (DOI: 10.1038/s41564-023-01341-x) using nisin, the protonophore CCCP and UV light, supported by biological replicate experiments and appropriate controls. Furthermore, our results are in agreement with the findings reported in a complementary study by Vladimirov et al. (DOI: 10.1038/s41467-023-37087-7) that used a different approach (SYTO9/PI staining of hyphal pellets) to demonstrate that CIS-deficient mutants exhibit decreased hyphal death. We agree that it would be interesting to test if actively growing hyphae respond differently to nisin treatment, and such experiments will be considered in future work.

      Taken together, we believe that the results obtained from our fluorescence-based viability assay are consistent with data reported by others and provide strong experimental evidence that functional CIS mediate hyphal cell death. 

      The model presented in Fig. 5 suggests that stress leads to a CisA-dependent attachment of CIS assemblies to the cytoplasmic membrane, and then triggering of contraction, leading to cell death. This model makes testable predictions that have not been challenged experimentally. Given that sublethal doses of nisin seem to trigger cell death, there appear to be possibilities to monitor whether activation of the system (via CisA?) indeed leads to at least temporally increased interaction of CIS with the membrane.

      We thank this reviewer for their suggestions on how to test our model further. In the meantime, we have performed co-immunoprecipitation experiments using S. coelicolor cells that produced CisA-FLAG as bait and were treated with a sub-lethal nisin concentration for 0, 15, or 45 min. Mass spectrometry analysis of co-eluted peptides did not show the presence of CIS-associated peptides. While we cannot exclude the possibility that our experimental assay requires further optimization to successfully demonstrate a CisA-CIS interaction (e.g. optimization of the use of detergents to improve the solubilization of CisA from the Streptomyces membrane, which is currently not an established method), an alternative and equally valid hypothesis is that the interaction between CIS particles and CisA is transient and therefore difficult to capture. We would like to mention that we did detect CisA peptides in crude purifications of CIS particles from nisin-stressed cells (Supplementary Table 2, manuscript: line 265/266), supporting our model that CisA associates with CIS particles in vivo.

      Further, would not the model predict that stress leads to an increased number of contracted CIS assemblies in the cytoplasm? No clear difference in length of the isolated assemblies in Fig. S7 is seen between untreated and nisin-exposed cells, and also no difference between assemblies from WT and cisA mutant hyphae.

      The reviewer is correct that there is no clear difference in length in the isolated CIS particles shown in Figure S7. This is in line with our results, which show that CisA is not required for the correct assembly of CIS particles and their ability to contract in the presence and absence of nisin treatment. The purpose of Figure S7 was to support this statement. We would like to note that the particles shown in Figure S7 were purified from cell lysates using a crude sheath preparation protocol, during which CIS particles generally contract irrespective of the presence or absence of CisA. Thus, we cannot comment on whether there is an increased number of contracted CIS assemblies in the cytoplasm of nisin-exposed cells. To answer this point, we would need to acquire additional cryo-electron tomograms (cryoET) of the different strains treated with nisin. We appreciate this reviewer's suggestions. However, cryoET is an extremely time- and labour-intensive task, and given that we currently don’t know the exact dynamics of the CIS-CisA interaction following exogenous stress, we believe this experiment is beyond the scope of this work.

      The interaction of CisA with the CIS assembly is critical for the model but is only supported by Alphafold modelling, predicting interaction between cytoplasmic parts of CisA and Cis11 protein in the baseplate wedge. An experimental demonstration of this interaction would have strengthened the conclusions.

      We agree that direct experimental evidence of this interaction would have further strengthened the conclusions of our study, and we have extensively tried to provide additional experimental evidence. Unfortunately, due to the toxicity of CisA expression in E. coli and the transient nature of the interaction under our experimental conditions, we were unable to pursue direct biochemical or biophysical validation methods, such as co-purification or bacterial two-hybrid assays. While these challenges limited our ability to experimentally confirm the interaction, the AlphaFold predictions provided a valuable hypothesis and mechanistic insight into the role of CisA.

      The cisA mutant showed a similarly accelerated sporulation as was previously reported for CIS-negative strains, which supports the conclusion that CisA is required for function of CISSc. But the results do not add any new insights into how CIS/CisA affects the progression of the developmental life cycle and whether this effect has anything to do with the regulated cell death that is caused by CIS. The same applies to the effect on secondary metabolite production, with no further mechanistic insights added, except reporting similar effects of CIS and CisA inactivations.

      We thank this reviewer for their thoughtful feedback and for highlighting the connections between CisA, CIS function, and their effects on the developmental life cycle and secondary metabolite production in S. coelicolor. The main focus of this study was to provide further insight into how CIS contraction and firing are mediated in Streptomyces, and we used the analysis of accelerated sporulation and secondary metabolite production to assess the functionality of CIS in the presence or absence of CisA.

      We agree that we still don’t fully understand the nature of the signals that trigger CIS contraction, but we do know that the production of CIS assemblies seems to be an integral part of the Streptomyces multicellular life cycle as demonstrated in two independent previous studies (DOI: 10.1038/s41564-023-01341-x and DOI: 10.1038/s41467-023-37087-7 ). We propose that the assembly and firing of Streptomyces CIS particles could present a molecular mechanism to sacrifice only a part of the mycelium to either prevent the spread of local cellular damage or to provide additional nutrients for the rest of the mycelium and delay the terminal differentiation into spores and affect the production of secondary metabolites.

      We recognize the importance of understanding the regulation and mechanistic details underpinning the proposed CIS-mediated regulated cell death model. This will be further explored in future studies.

      Concluding remarks:

      The work will be of interest to anyone interested in contractile injection systems, T6SS, or similar machineries, as well as to people working on the biology of streptomycetes. There is also a potential impact of the work in the understanding of how such molecular machineries could have been co-opted during evolution to become a mechanism for regulated cell death. However, this latter aspect still remains poorly understood. Even though this paper adds excellent new structural insights and identifies a putative membrane anchor, it remains elusive how the Streptomyces CIS may lead to cell death. It is also unclear what the advantage would be to trigger death of hyphal compartments in response to stress, as well as how such cell death may impact (or accelerate) the developmental progression. Finally, it is inescapable to wonder whether the Streptomyces CIS could have any role in protection against phage infection.

      We thank Reviewer #2 for their supportive assessment of our work. In the revised manuscript, we will briefly discuss the impact of functional CIS assemblies on Streptomyces development. We previously tested if Streptomyces could defend against phages but have not found any experimental evidence to support this idea. The analysis of phage defense mechanisms is an underdeveloped area in Streptomyces research, partly due to the currently limited availability of a diverse phage panel.

      Reviewer #3 (Public review):

      Summary:

      In this work, Casu et al. have reported the characterization of a previously uncharacterized membrane protein, CisA, encoded in a non-canonical contractile injection system of Streptomyces coelicolor, CISSc, a cytosolic CIS that is significantly distinct from both intracellular membrane-anchored T6SSs and extracellular CISs. The authors have presented the first high-resolution structure of the extended CISSc structure. It revealed important structural insights in this conformational state. To further explore how CISSc interacts with the cytoplasmic membrane, they set out to investigate CisA, which was previously hypothesized to be the membrane adaptor. However, the structure revealed that it was not associated with CISSc. Using fluorescence microscopy and a cell fractionation assay, the authors verified that CisA is indeed a membrane-associated protein. They further determined experimentally that CisA has a cytosolic N-terminal domain and a periplasmic C-terminus. The functional analysis of the cisA mutant revealed that it is not required for CISSc assembly but is essential for contraction; as a result, the deletion significantly affects CISSc-mediated cell death upon stress, timely differentiation, as well as secondary metabolite production. Although the work did not resolve the mechanistic detail of how CisA interacts with the CISSc structure, it provides solid data and a strong foundation for future investigation toward understanding the mechanism of CISSc contraction and, potentially, the relation between the membrane association of CISSc, the sheath contraction and the cell death.

      Strengths:

      The paper is well-structured, and the conclusions of the study are supported by solid data and careful data interpretation. The authors provided strong evidence on (1) the high-resolution structure of extended CISSc determined by cryo-EM, and the subsequent comparison with known eCIS structures, which sheds light in detail on both its similarities to and differences from other subtypes of eCISs; (2) the topological features of CisA using fluorescence microscopic analysis, cell fractionation and PhoA-LacZα reporter assays; (3) functions of CisA in CISSc-mediated cell death and secondary metabolite production, likely via the regulation of sheath contraction.

      Weaknesses:

      The data presented are not sufficient to provide mechanistic details of CisA-mediated CISSc contraction, as the authors are not able to experimentally demonstrate the direct interaction between CisA and the baseplate complex of CISSc (hypothesized to be via Cis11 by structural modeling), since they could not express cisA in E. coli due to its potential toxicity. Therefore, there is a lack of biochemical analysis of direct interaction between CisA and the baseplate wedge. In addition, there is no direct evidence showing that CisA is responsible for tethering CISSc to the membrane upon stress, and the spatial and temporal relation between membrane association and contraction remains unclear. Further investigation will be needed to address these questions in the future.

      We thank Reviewer #3 for the supportive evaluation and constructive criticism of our study in the public and non-public review. We appreciate your recognition of the technical limitations of experimentally demonstrating a direct interaction between CisA and the CIS baseplate complex, and we agree that further investigations in the future will hopefully provide a full mechanistic understanding of the spatiotemporal interaction of CisA and CIS particles and the subsequent CIS firing.

      To further improve the manuscript, we will revise the text and clarify figures and figure legends as suggested in the non-public review.

      Discussion:

      Overall, the work provides a valuable contribution to our understanding of the structure of a much less understood subtype of CISs, which is unique compared to both membrane-anchored T6SSs and host-membrane targeting eCISs. Importantly, the work serves as a good foundation to further investigate how the sheath contraction works here. The work contributes to expanding our understanding of the diverse CIS superfamilies.

      Thank you.

    1. eLife Assessment

      This is a valuable study and a promising development for the field of open-source microscopy for educational purposes. The strengths include the low cost of constructing the microscope, impressive performance and detailed resources including a dedicated website and YouTube channel. The claims are generally supported by solid evidence; however, the manuscript would be strengthened by inclusion of further details on standard performance metrics (e.g. signal-to-noise ratio) compared to existing systems and further details and clarification on the microscope, construction and operation.

    2. Reviewer #1 (Public review):

      Summary:

      Carter et al. present the eduWOSM imaging platform, a promising development in open-source microscopy for educational purposes. The paper outlines the construction and setup of this versatile microscope, demonstrating its capabilities through three key examples: single fluorophore tracking of tubulin heterodimers in gliding microtubules, 4D deconvolution imaging and tracking of chromosome movements in dividing human cells, and automated single-particle tracking in vitro and in live cells, with motion classified into sub-diffusive, diffusive, and super-diffusive behaviors.
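      For readers unfamiliar with the sub-diffusive, diffusive, and super-diffusive labels mentioned above, the following is a minimal sketch of one common way to assign them: fit the slope (the anomalous exponent, alpha) of the mean squared displacement against lag time on a log-log scale. This is an assumption about a typical approach, not a description of the eduWOSM authors' actual pipeline; the function names and the tolerance of 0.2 are hypothetical choices made for illustration.

```python
import math


def mean_squared_displacement(xs, ys, lag):
    """Average squared displacement over all pairs of frames `lag` apart."""
    n = len(xs) - lag
    total = sum(
        (xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2
        for i in range(n)
    )
    return total / n


def anomalous_exponent(xs, ys, max_lag):
    """Least-squares slope of log(MSD) versus log(lag).

    Requires max_lag >= 2 and a track that actually moves (non-zero MSD).
    """
    lags = range(1, max_lag + 1)
    log_lag = [math.log(lag) for lag in lags]
    log_msd = [math.log(mean_squared_displacement(xs, ys, lag)) for lag in lags]
    mean_x = sum(log_lag) / len(log_lag)
    mean_y = sum(log_msd) / len(log_msd)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lag, log_msd))
    denominator = sum((x - mean_x) ** 2 for x in log_lag)
    return numerator / denominator


def classify_motion(alpha, tolerance=0.2):
    """Label a track by its exponent; the tolerance is an illustrative choice."""
    if alpha < 1 - tolerance:
        return "sub-diffusive"
    if alpha > 1 + tolerance:
        return "super-diffusive"
    return "diffusive"


# Example: a particle moving at constant velocity along x (directed motion).
track_x = [float(t) for t in range(50)]
track_y = [0.0] * 50
alpha = anomalous_exponent(track_x, track_y, max_lag=10)
label = classify_motion(alpha)
```

      In this sketch, a track with constant velocity has a mean squared displacement that grows with the square of the lag, so it is labelled super-diffusive; real tracks are noisy, and the choice of maximum lag and tolerance would need to be justified against the data.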

      The paper is well-written and could be strengthened by providing more empirical data on its performance, addressing potential limitations, and offering detailed insights into its educational impact. The project holds great potential and more discussion on long-term support and broader applications would provide a more comprehensive view of its relevance in different contexts.

      Strengths:

      (1) The eduWOSM addresses a crucial need in education, providing research-quality imaging at a lower cost (<$10k). The fact that it is open-source adds significant value, enabling broad accessibility even in under-resourced areas.
      (2) Extensive resources are available, including a dedicated website, YouTube channel, and comprehensive tutorial guides to help users replicate the microscope.
      (3) The compact, portable, and stable design makes it easy to build multiple systems for use in diverse environments, including crowded labs and classrooms. This is further enhanced by the fact that multiple kinds of imaging experiments can be run on the system, from live imaging to super-resolution imaging.
      (4) The paper highlights the user-friendly nature of the platform, with the imaging examples in the paper being acquired by undergraduate students.

      Weaknesses:

      (1) The paper mentions the microscope is suitable not just for education but even for research purposes. This claim needs validation through quantitative comparison to existing research-grade microscopes in terms of resolution, signal-to-noise ratio, and other key metrics. Adding more rigorous comparisons would solidify its credibility for research use, which would immensely increase the potential of the microscope.
      (2) The open-source microscope field is crowded with various options catering to hobby, educational, and research purposes (e.g., OpenFlexure, Flamingo, Octopi, etc.). The paper would benefit from discussing whether any aspects set the eduWOSM platform apart or fulfill specific roles that other microscopes do not.
      (3) While the eduWOSM platform is designed to be user-friendly, the paper would benefit from discussing whether the microscope can be successfully built and operated by users without direct help from the authors. It's important to know if someone with basic technical knowledge, relying solely on the provided resources (website, YouTube tutorials, and documentation), can independently assemble, calibrate, and operate the eduWOSM.
      (4) Ensuring long-term support and maintenance of the platform is crucial. The paper would benefit from addressing how the eduWOSM developers plan to support updates, improvements, or troubleshooting.

    3. Reviewer #2 (Public review):

      The main strength of this work is the impressive performance of a microscope assembled for a fraction of the cost of a commercial, turnkey system. The authors have created a very clever design that removes everything that is not essential. They show compelling time-lapse data looking at single molecules, tracking particles visible in brightfield mode, and looking at cell division with multiple labels in a live cell preparation.

      The weaknesses of the paper include:
      (1) the lack of more comprehensive explanations of the microscope and what it takes to build and operate it. For example, the dimensions of the microscope, how samples are mounted, which lenses are compatible, and whether eduWOSMs have been built by groups other than the authors would be useful information.
      (2) the absence of more detailed descriptions of some of the experiments, such as frame rates and Z-stack information.
      (3) the lack of standardized measures of performance. For example, images of subresolution TetraSpeck beads and measurements of PSF would provide estimates on resolution in XY, resolution in Z, axial chromatic aberrations and lateral chromatic aberrations. Repeating these measurements on different eduWOSMs will provide an idea of how reliably the performance can be achieved.

      If these issues were addressed, it would make it more likely that other groups could build and operate this system successfully.

      Overall, the authors have designed and built an impressive system at low cost. Providing a bit more information in the manuscript would make it much more likely that other laboratories could replicate this design in their own environments.

    4. Author response:

      Both reviewers made thoughtful and constructive comments, suggesting improvements that we are keen to provide. The comments fall under three headings: (1) further validation of the design, regarding both optical performance and utility, for both education and research; (2) further description and facilitation of the build process; and (3) further description of future plans, in particular plans for dissemination and long-term support. We think these requirements will be best served by adding new content to our GitHub site and our YouTube channel. We will create this new content and provide a revised manuscript in which these materials are linked from our existing narrative.

    1. eLife Assessment

      This study provides valuable insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The authors use a detailed mathematical model to show how the pump can shift a cell's normal firing patterns and disrupt the coordination of signals when inputs change quickly. The computational methods used to establish the claims in this work are solid and can be used as a starting point for further studies, yet the conclusions would be more convincing with experimental evidence or testable predictions regarding some of the proposed mechanisms across different cell types.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.
      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.
      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.
      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

    3. Reviewer #2 (Public review):

      Summary:

      The paper 'The electrogenicity of the Na+/K+-ATPase poses challenges for computation in highly active spiking cells' by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions these cells could employ to circumvent these constraints.

      Electrocytes are highly active, spiking at greater than 300 Hz for sustained periods (minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions out of, these cells for each spike. This ion imbalance must be restored after each spike, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular volume. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. This does not pose an issue in most cells since the firing rate is much slower, and other compensatory mechanisms and other pumps can effectively restore the ion imbalances. In electrocytes of weakly electric fish, however, which operate under very different circumstances, the firing rate is exceptionally high. On top of this, these cells are also involved in critical communication and survival behaviors, emphasizing the need for their reliable functioning.
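      The 3:2 stoichiometry described above can be made concrete with a short arithmetic sketch. It only restates the charge bookkeeping from the paragraph (three Na+ out, two K+ in, one net positive charge exported per ATP); the function names and any turnover rate passed to them are hypothetical illustration values, not parameters taken from the manuscript's model.

```python
# Charge bookkeeping for one Na+/K+-ATPase cycle (one ATP hydrolysed).
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per monovalent ion

NA_OUT_PER_CYCLE = 3  # sodium ions moved out of the cell
K_IN_PER_CYCLE = 2    # potassium ions moved into the cell


def net_charge_per_cycle():
    """Net positive charge exported per pump cycle, in coulombs."""
    return (NA_OUT_PER_CYCLE - K_IN_PER_CYCLE) * ELEMENTARY_CHARGE


def pump_current(cycles_per_second):
    """Net outward (hyperpolarizing) current in amperes.

    `cycles_per_second` is the total pump turnover across the membrane;
    any value supplied here is an illustrative assumption.
    """
    return net_charge_per_cycle() * cycles_per_second


def cycles_to_clear_sodium(na_ions_entered):
    """Pump cycles (and ATP molecules) needed to extrude a given Na+ load."""
    return na_ions_entered / NA_OUT_PER_CYCLE


# Hypothetical example: 3 million Na+ ions entering during a burst of spikes.
atp_needed = cycles_to_clear_sodium(3_000_000)
net_charge_exported = atp_needed * net_charge_per_cycle()
```

      The point of the sketch is that the net outward charge scales with the sodium load: one third of the charge carried in by sodium is exported without a matching inward charge, so a higher firing rate implies a larger sustained hyperpolarizing pump current.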

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Additionally, their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implication of this cell in the context of chirps, a means of communication between individual fishes. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases in order to produce a so-called chirp, a brief pause in the sustained activity of the electrocytes. In their model, the authors show that it is necessary to include the extracellular potassium buffer to have a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte followed by a decay to the baseline. For reliable occurrence of this, they emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is warranted. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of Na and K currents to include the dynamics of the NaK pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for exploring and testing in in vivo experiments which of these proposed solutions the fish use and their relative importance.

      Weaknesses:

      The modeling work makes assumptions and simplifications that should be listed explicitly. For example, it assumes only potassium ions constitute the leak current, which may not be true as other ions (chloride and calcium) may also cross the cell membrane. This implies that the leak channels' reversal potential may differ from that of potassium. Additionally, the spikes are composed of sodium and potassium currents only and no other ion type (no calcium). Further, these ion channels are static and do not undergo any post-translational modifications. For instance, a sodium-dependent potassium pump could fine-tune the potassium leak currents and modulate the spike amplitude (Markham et al., 2013).

      This model considers only NaK pumps. In many cell types, several other ion pumps/exchangers/symporters are simultaneously present and actively participate in restoring the ion gradients. It may be true that only NaK pumps are expressed in the weakly electric fish Eigenmannia virescens. This limits the generalizability of the results to other cell types. While this does not invalidate the results of the present study, biological processes may find many other solutions to address the non-electroneutral nature of the NaK pump. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      Finally, including testable hypotheses for these computational models would strengthen this work.

    1. eLife Assessment

      The work presented in this paper provides an important insight into how early life experience shapes adult behavior in fruit bats. The authors raised juvenile bats either in an impoverished or enriched environment and studied their foraging behaviors. The evidence is convincing that bats raised in enriched environments are more active, bold, and exploratory, although further exploration of the data and clarification of the analysis would strengthen the evidence. The work will be of interest to ethologists and developmental psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that the early life experience of juvenile bats shapes their outdoor foraging behaviors. They achieve this by raising juvenile bats either in an impoverished or enriched environment. They subsequently test the behavior of bats indoors and outdoors. The authors show that behavioral measures outdoors were more reliable in delineating the effect of early life experiences, as the bats raised in enriched environments were bolder, more active, and exhibited higher exploratory tendencies.

      Strengths:

      The major strength of the study is providing a quantitative study of animal "personality" and how it is likely shaped by innate and environmental conditions. The other major strength is the ability to do reliable long-term recording of bats outdoors, giving researchers the opportunity to study bats in their natural habitat. To this point, the study also shows that the behavioral variables measured indoors do not correlate with those measured outdoors, thus providing a key insight into the importance of testing animal behaviors in their natural habitat.

      Weaknesses:

      It is not clear from the analysis presented in the paper how persistent those environmentally induced changes are: do they remain with the bats until the end of their lives?

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a paper that attempts to tackle an important question, with potential impact far beyond the field of animal behavior research: what are the relative contributions of innate personality traits versus early life experience on individual behavior in the wild? The study, performed on Egyptian fruit bats that are caught in the wild and later housed in an outdoor colony, is solidly executed, and benefits greatly from a unique setup in which controlled laboratory experiments are combined with monitoring of individuals as they undertake undirected, free exploration of their natural environment.

      The primary finding of the paper is that there is a strong effect of early life experience on behavior in the wild, where individual bats that were exposed to an enriched environment as juveniles later travelled farther and over greater distances when permitted to explore and forage ad libitum, as compared with individual bats who were subjected to a more impoverished environment. Meanwhile, no prominent effect of innate "personality", as assessed by indices of indoor foraging behavior early on, before the bats were exposed to the controlled environmental treatment, was observed on three metrics of outdoor foraging behavior. The authors conclude that the early environment plays a larger role than innate personality on the behavior of adult bats.

      Strengths:

      (1) Elegant design of experiments and impressive combination of methods
      Bats used in the experiment were taken from wild colonies in different geographical areas, but housed during the juvenile stage in a controlled indoor environment. Bats are tested on the same behavioral paradigm at multiple points in their development. Finally, the bats are monitored with GPS as they freely explore the area beyond the outdoor colony.

      (2) Development of a behavioral test that yields consistent results across time
      The multiple-foraging box paradigm, in which behavioral traits such as overall activity, levels of risk-taking, and exploratoriness can be evaluated, is creative and suggestive of behavioral paradigms that other animal behavior researchers might be able to use. It is especially useful given that it can be used to evaluate the activity of animals seemingly at most stages of life, and not just in adulthood.

      Weaknesses:

      (1) Robustness and validity of personality measures
      Coming up with robust measures of "personality" in non-human animals is tricky. While this paper represents an important attempt at a solution, some of the results obtained from the indoor foraging paradigm raise questions as to the reliability of this task for assessing "personality".

      (2) Insufficient exploitation of data
      Between the behavioral measures and the very multidimensional GPS data, the authors are in possession of a rich data set. However, I don't feel that this data has been adequately exploited for underlying patterns and relationships. For example, many more metrics could be extracted from the GPS data, which may then reveal correlations with early measures of personality or further underscore the role of the early environment. In addition, the possibility that these personality measures might in combination affect outdoor foraging is not explored.

      (3) Interpretation of statistical results and definition of statistical models
      Some statistical interpretations may not be entirely accurate, particularly in the case of multiple regression with generalized linear models. In addition, some effects which may be present in the data are dismissed as not significant on the basis of null hypothesis testing.

      Below I have organized the main points of critique by theme, and ordered subordinate points by order of importance:

      (1) Assessing personality metrics and the indoor paradigm: While I applaud this effort and think the metrics used are justified, I see a few issues in the results as they are currently presented:
      (a) [Major] I am somewhat concerned that here, the foraging box paradigm is being used for two somewhat conflicting purposes: (1) assessing innate personality and (2) measuring changes in personality as a result of experience. If the indoor foraging task is indeed meant to measure and reflect both at the same time, then perhaps this can be made more explicit throughout the manuscript. In this circumstance, I think the authors could place more emphasis on the fact that the task, at later trials/measurements, begins to take on the character of a "composite" measure of personality and experience.

      (b) [Major] Although you only refer to results obtained in trials 1 and 2 when trying to estimate "innate personality" effects, I am a little worried that the paradigm used to measure personality, i.e. the stable components of behavior, is itself affected by other factors such as age (in the case of activity, Fig. 1C3, S1C1-2), the environment (see data re trial 3), and experience outdoors (see data re trials 4/5).

      Ideally, a study that aims to disentangle the role of predisposition from early-life experience would have a metric for predisposition that is relatively unchanging for individuals, which can stand as a baseline against a separate metric that reflects behavioral differences accumulated as a result of experience.

      I would find it more convincing that the foraging box paradigm can be used to measure personality if it could be shown that young bats' behavior was consistent across retests in the box paradigm prior to any environmental exposure across many baseline trials (i.e. more than 2), and that these "initial settings" were constant for individuals. I think it would be important to show that personality is consistent across baseline trials 1 and 2. This could be done, for example, by reproducing the plots in Fig. 1C1-3 while plotting trial 1 against trial 2. (I would note here that if a significant, positive correlation were to be found (as I would expect) between the measures across trial 1 and 2, it is likely that we would see the "habituation effect" the authors refer to expressed as a steep positive slope on the correlation line (indicating that bold individuals on trial 1 are much bolder on trial 2).)

      (c) Related to the previous point, it was not clear to me why the data from trial 2 (the second baseline trial) was not presented in the main body of the paper, and only data from trial 1 was used as a baseline.

      In the supplementary figure and table, you show that the bats tended to exhibit more boldness and exploratory behavior, but fewer actions, in trial 2 as compared with trial 1. You explain that this may be due to habituation to the experimental setup; however, the precise motivation for excluding data from trial 2 from the primary analyses is not stated. I would strongly encourage the authors to include a comparison of the data between the baseline trials in their primary analysis (see above), combine the information from these trials to form a composite baseline against which further analyses are performed, or further justify the exclusion of data as a baseline.

      (2) Comparison of indoor behavioral measures and outdoor behavioral measures
      Regarding the final point in the results, correlation between indoor personality on Trial 4 and outdoor foraging behavior: It is not entirely clear to me what is being tested (neither the details of the tests nor the data or a figure are plotted). Given some of the strong trends in the data - namely, (1) how strongly early environment seems to affect outdoor behavior, (2) how strongly outdoor experience affects boldness, measured on indoor behavior (Fig. 1D) - I am not convinced that there is no relationship, as is stated here, between indoor and outdoor behavior. If this conclusion is made purely on the basis of a p-value, I would suggest revisiting this analysis.

      (3) Use of statistics/points regarding the generalized linear models
      While I think the implementation of the GLMM models is correct, I am not certain that the interpretation of the GLMM results is entirely correct for cases where multivariate regression has been performed (Tables 4s and S1, and possibly Table 3). (You do not present the exact equation you used for each model (this would be a helpful addition to the methods), so it is somewhat difficult to evaluate whether the following critique properly applies; however...)

      The "estimate" for a fixed effect in a regression table gives the difference in the outcome variable for a 1 unit increase in the predictor variable (in the case of numeric predictors) or for each successive "level" or treatment (in the case of categorical variables), compared to the baseline, the intercept, which reflects the value of the outcome variable given by the combination of the first value/level of all predictors. Therefore, for example, in Table 4a - Time spend outside: the estimate for Bat sex: male indicates (I believe) the difference in time spent outside for an enriched male vs. an enriched female, not, as the authors seem to aim to explain, the effect of sex overall. Note that the interpretation of the first entry, Environmental condition: impoverished, is correct. I refer the authors to the section "Multiple treatments and interactions" on p. 11 of this guide to evaluating contrasts in G/LMMS: https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf
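
      The contrast interpretation described above can be checked on a toy example. The sketch below uses treatment (dummy) coding for a 2x2 design and assumes the model includes an environment-by-sex interaction, which the manuscript's unstated model equations may or may not contain. The cell means are hypothetical and chosen only to make the contrast visible; the variable names are illustrative.

```python
# Treatment (dummy) coding for a 2x2 design with an interaction term.
# Cell means are hypothetical values, not data from the study.
cells = [
    ("enriched", "female", 5.0),      # reference cell -> intercept
    ("impoverished", "female", 3.0),
    ("enriched", "male", 6.0),
    ("impoverished", "male", 2.0),
]


def design_row(env, sex):
    """Columns: intercept, impoverished, male, impoverished:male."""
    imp = 1.0 if env == "impoverished" else 0.0
    male = 1.0 if sex == "male" else 0.0
    return [1.0, imp, male, imp * male]


def forward_substitute(lower, rhs):
    """Solve lower @ x = rhs for a lower-triangular matrix."""
    x = []
    for i, row in enumerate(lower):
        partial = sum(row[j] * x[j] for j in range(i))
        x.append((rhs[i] - partial) / row[i])
    return x


# With the cells in this order the design matrix is lower triangular,
# so the saturated model can be solved exactly without a stats library.
X = [design_row(env, sex) for env, sex, _ in cells]
y = [mean for _, _, mean in cells]
intercept, b_imp, b_male, b_inter = forward_substitute(X, y)

male_effect_enriched = b_male                 # what the table row reports
male_effect_impoverished = b_male + b_inter   # needs the interaction term
average_male_effect = (male_effect_enriched + male_effect_impoverished) / 2

print("Bat sex: male estimate:", b_male)
print("Male vs. female, impoverished:", male_effect_impoverished)
print("Sex effect averaged over environments:", average_male_effect)
```

      In this toy case the "Bat sex: male" estimate is positive even though the sex effect averaged over both environments is zero, which is why the estimate should be read as a contrast within the reference level. In a purely additive model without the interaction column, the sex estimate would apply equally at both environment levels.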

    1. eLife Assessment

      This valuable study presents findings linking prophage carriage to lifestyle regulation in the marine bacterium Shewanella fidelis, with potential implications for niche occupation within a host (Ciona robusta) and mediation of host immune responses. The study leverages a unique animal model system that offers distinct advantages in identifying select phenotypes to present generally solid evidence that supports findings relating to the impact of a prophage on host-microbe interaction. Understanding the role of integrated lysogenic phages in bacterial fitness, both within a host and in the environment, is a significant concept in bacterial eco-physiology, potentially contributing to the success of certain strains.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript aims to elucidate the impact of a prophage within the genome of Shewanella fidelis on its interaction with the marine tunicate Ciona robusta. The authors made a deletion mutant of S. fidelis that lacks one of its two prophages. This mutant exhibited an enhanced biofilm phenotype, as assessed through crystal violet staining, and showed reduced motility. The authors examined the effect of prophage deletion on several genes that could modulate cyclic-diGMP levels. While no significant changes were observed under in vitro conditions, the gene for one protein potentially involved in cyclic-diGMP hydrolysis was overexpressed during microbe-host interactions. The mutant was retained more effectively within a one-hour timeframe, whereas the wild-type (WT) strain became more abundant after 24 hours. Fluorescence microscopy was used to visualize the localization patterns of the two strains, which appeared to differ. Additionally, a significant difference in the expression of one immune protein was noted after one hour, but this difference was not evident after 23 hours. An effect of VCBP-C addition on the expression of one prophage gene was also observed.

      Strengths:

      I appreciate how the authors integrate diverse expertise and methods to address questions regarding the impact of prophages on gut microbiome-host interactions. The chosen model system is appropriate, as it allows for high-throughput experimentation and the application of simple imaging techniques.

      Weaknesses:

      My primary concern is that the manuscript primarily describes observations without providing insight into the molecular mechanisms underlying the observed differences. It is particularly unclear how the presence of the prophage leads to the phenotypic changes related to bacterial physiology and host-microbe interactions. Which specific prophage genes are critical, or is the insertion at a specific site in the bacterial genome the key factor? While significant effects on bacterial physiology are reported under in vitro conditions, there is no clear attribution to particular enzymes or proteins. In contrast, when the system is expanded to include the tunicate, differences in the expression of a cyclic-diGMP hydrolase become apparent. Why do we not observe such differences under in vitro conditions, despite noting variations in biofilm formation and motility? Furthermore, given that the bacterial strain possesses two prophages, I am curious as to why the authors chose to target only one and not both.

      Regarding the microbe-host interaction, it is not clear why the increased retention ability of the prophage deletion strain did not lead to greater cell retention after 24 hours, especially since no differences in the immune response were observed at that time point.

      Concerning the methodological approach, I am puzzled as to why the authors opted for qPCR instead of transcriptomics or proteomics. The latter approaches could have provided a broader understanding of the prophage's impact on both the microbe and the host.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, "Prophage regulation of Shewanella fidelis 3313 motility and biofilm formation: implications for gut colonization dynamics in Ciona robusta", the authors are experimentally investigating the idea that integrated viruses (prophages) within a bacterial colonizer of the host Ciona robusta affect both the colonizer and the host. They found a prophage within the Ciona robusta colonizing bacterium Shewanella fidelis 3313, which affected both the bacteria and host. This prophage does so by regulating the phosphodiesterase gene pdeB in the bacterium when the bacterium has colonized the host. The prophage also regulates the activity of the host immune gene VCBP-C during early bacterial colonization. Prophage effects on both these genes affect the precise localization of the colonizing bacterium, motility of the bacterium, and bacterial biofilm formation on the host. Interestingly, VCBP-C expression also suppressed a prophage structural protein, creating a tripartite feedback loop in this symbiosis. This is exciting research that adds to the emerging body of evidence that prophages can have beneficial effects not only on their host bacteria but also on how that bacteria interacts in its environment. This study establishes the evolutionary conservation of this concept with intriguing implications of prophage effects on tripartite interactions.

      Strengths:

      This research effectively shows that a prophage within a bacterium colonizing a model ascidian affects both the bacterium and the host in vivo. These data establish the prophage effects on bacterial activity and expand these effects to the natural interactions within the host animal. The effects of the prophage through deletion on a suite of host genes are a strength, as shown by striking microscopy.

      Weaknesses:

      Unfortunately, there are abundant negative data that cast some limitations on the interpretation of the data. That is, examining specific gene expression has its limitations, which could be avoided by global transcriptomics of the bacteria and the host during colonization by the prophage-containing and prophage-deleted bacteria (1 hour and 24 hours). In this way, the tripartite interactions leading to mechanism could be better established.

      Impact:

      The authors are correct to speculate that this research can have a significant impact on many animal microbiome studies, since bacterial lysogens are prevalent in most microbiomes. Screening for prophages, determining whether they are active, and "curing" the host bacteria of active prophages are effective tools for understanding the effects these mobile elements have on microbiomes. There are many potential effects of these elements in vivo, both positive and negative, and this study is a good example of why they should be explored.

      Context:

      The research area of prophage effects on host bacteria in vitro has been studied for decades, while these interactions in combination with animal hosts in vivo have been recent. The significance of this research shows that there could be divergent effects based on whether the study is conducted in vitro or in vivo. The in vivo results were striking. This is particularly so with the microscopy images. The benefit of using Ciona is that it has a translucent body which allows for following microbial localization. This is in contrast to mammalian studies where following microbial localization would either be difficult or near impossible.

    4. Reviewer #3 (Public review):

      In this manuscript, Natarajan and colleagues report on the role of a prophage, termed SfPat, in the regulation of motility and biofilm formation by the marine bacterium Shewanella fidelis. The authors investigate the in vivo relevance of prophage carriage by studying the gut occupation patterns of Shewanella fidelis wild-type and an isogenic SfPat- mutant derivative in a model organism, juveniles of the marine tunicate Ciona robusta. The role of bacterial prophages in regulating bacterial lifestyle adaptation and niche occupation is a relatively underexplored field, and efforts in this direction are appreciated.

      While the research question is interesting, the work presented lacks clarity in its support for several major claims, and, at times, the authors do not adequately explain their data.

      Major concerns:

      (1) Prophage deletion renders the SfPat- mutant derivative substantially less motile and with a higher biofilm formation capacity than the WT (Fig. 2a-b). The authors claim the mutant is otherwise isogenic to the WT strain upon sequence comparison of draft genome sequences (I'll take the opportunity to comment here that GenBank accessions are preferable to BioSample accessions in Table 1). Even in the absence of secondary mutations, complementation is needed to validate functional associations (i.e., phenotype restoration). A strategy for this could be phage reintegration into the mutant strain (PMID: 19005496).

      (2) The authors claim that the downshift in motility (concomitant with an upshift in biofilm formation) is likely mediated by the activity of c-di-GMP turnover proteins. Specifically, the authors point to the c-di-GMP-specific phosphodiesterase PdeB as a key mediator, after finding lower transcript levels for its coding gene in vivo (lines 148-151, Fig. 2c), and suggesting higher activity of this protein in live animals (!)(line 229). I have several concerns here:
      (2.1) Findings shown in Fig. 2a-b are in vitro, yet no altered transcript levels for pdeB were recorded (Fig. 2c). Why do the authors base their inferences only on in vivo data?
      (2.2) Somewhat altered transcript levels alone are insufficient for making associations, let alone solid statements. Often, the activity of c-di-GMP turnover proteins is local and/or depends on the activation of specific sensory modules - in the case of PdeB, a PAS domain and a periplasmic sensor domain (PMID: 35501424). This has not been explored in the manuscript, i.e., specific activation vs. global alterations of cellular c-di-GMP pools (or involvement of other proteins, please see below). Additional experiments are needed to confirm the involvement of PdeB. Gaining such mechanistic insights would greatly enhance the impact of this study.
      (2.3) What is the rationale behind selecting only four genes to probe the influence of the prophage on Ciona gut colonization by determining their transcript levels in vitro and in vivo? If the authors attribute the distinct behavior of the mutant to altered c-di-GMP homeostasis, as may be plausible, why did the authors choose those four genes specifically and not, for example, the many other c-di-GMP turnover protein-coding genes or c-di-GMP effectors present in the S. fidelis genome? This methodological approach seems inadequate to me, and the conclusions on the potential implication of PdeB are premature.

      (3) The behavior of the WT strain and the prophage deletion mutant is insufficiently characterized. For instance, how do the authors know that the higher retention capacity reported for the WT strain with respect to the mutant (Fig. 3b) is not merely a consequence of, e.g., a higher growth rate? It would be worth investigating this further, ideally under conditions reflecting the host environment.

      (4) Related to the above, sometimes the authors refer to "retention" (e.g., line 162) and at other instances to "colonization" (e.g., line 161), or even adhesion (line 225). These are distinct processes. The authors have only tracked the presence of bacteria by fluorescence labeling; adhesion or colonization has not been assessed or demonstrated in vivo. Please revise.

      (5) The higher CFU numbers for the WT after 24 h (line 161) might also indicate a role of motility for successful niche occupation or dissemination in vivo. The authors could test this hypothesis by examining the behavior of, e.g., flagellar mutants in their in vivo model.

      (6) The endpoint of experiments with a mixed WT-mutant inoculum (assumedly 1:1? Please specify) was set to 1 h, I assume because of the differences observed in CFU counts after 24 h. In vivo findings shown in Fig. 3c-e are, prima facie, somewhat contradictory. The authors report preferential occupation of the esophagus by the WT (line 223), which seems proficient from evidence shown in Fig. S3. Yet, there is marginal presence of the WT in the esophagus in experiments with a mixed inoculum (Fig. 3d) or none at all (Fig. 3e). Likewise, the authors claim preferential "adhesion to stomach folds" by the mutant strain (line 225), but this is not evident from Fig. 3e. In fact, the occupation patterns by the WT and mutant strain in the stomach in panel 3e appear to differ from what is shown in panel 3d. The same holds true for the claimed "preferential localization of the WT in the pyloric cecum," with Fig. 3d showing a yellow signal that indicates the coexistence of WT and mutant.

      (7) In general, and especially for in vivo data, there is considerable variability that precludes drawing conclusions beyond mere trends. One could attribute such variability in vivo to the employed model organism (which is not germ-free), differences between individuals, and other factors. This should be discussed more openly in the main text and presented as a limitation of the study. Even with such intrinsic factors affecting in vivo measurements, certain in vitro experiments, which are expected, in principle, to yield more reproducible results, also show high variability (e.g., Fig. 5). What do the authors attribute this variability to?

      (8) Line 198-199: Why not look for potential prophage excision directly rather than relying on indirect, presumptive evidence based on qPCR?

    1. eLife Assessment

      This manuscript describes the generation of a fused dorsal-ventral organoid system to model interactions between the cortex and striatum to study the onset and progression of Huntington's disease (HD) and other neurodegenerative disorders. While this approach is valuable, further methodological and analytical work is needed to fully support the interpretations and claims of the authors. Incomplete evidence suggests choroid plexus (ChP) abnormalities are an important component of HD pathogenesis.

    2. Reviewer #1 (Public review):

      In the manuscript "Identification of neurodevelopmental organization of the cell populations of Juvenile Huntington's disease using dorso-ventral HD organoids and HD mouse embryos," the authors establish a fused dorso-ventral system that mimics cortex-striatum interactions within a single organoid and use this system to investigate neurodevelopmental impairments caused by HD. Specifically, they describe certain phenotypes in 60-day HD organoids and the brains of humanized mouse embryos, utilizing both wet-lab and single-cell sequencing techniques. The authors also develop dorsal/ventral and ventral/dorsal mosaic control/HD organoids, showing a capacity to rescue some HD phenotypes.

      The manuscript could be a valuable contribution to the field; however, it has relevant drawbacks, the most significant being a lack of clarity regarding the replicates used for each genotype in the sequencing analyses. The lack of information on replicates raises the possibility that only a single replicate was analyzed for each organoid and brain sample. This approach may lead to concerns regarding the reproducibility of the findings, and it may be necessary for the authors to generate additional data to strengthen their conclusions. In addition, the analysis of the HD samples was conducted by pooling distinct cell populations from different brain regions (CTX, HIP, ChP for the dorsal brain, and STR, HYP, TH for the ventral brain). It is unclear why scRNA seq was used on pooled brain regions, which could obscure region-specific insights.

      Another issue pertains to their proposed outcome: "Finally, we found that TTR protein, a choroid plexus marker, is elevated in the adult HD mouse serum, indicating that TTR may be a promising marker for detecting HD". This statement appears to lack statistical support, which makes this set of data potentially misleading and inconclusive.

      The authors are encouraged to provide evidence of biological replicates, remove outcomes that lack statistical support, and address a series of points as detailed elsewhere.

    3. Reviewer #2 (Public review):

      The article titled "Identification of neurodevelopmental organization of the cell populations of juvenile Huntington's disease using dorso-ventral HD organoids and HD mouse embryos" analyses an in vitro human brain organoid model containing dorsal and ventral telencephalon structures derived from human iPSC from Huntington's disease patients or control subjects.

      The authors describe differences in the pattern of expression of genes related to proliferation and neuronal maturation, with a slower pattern of differentiation present in HD cells. Moreover, the authors described a higher differentiation capacity of HD cells to generate choroid plexus identity following dorsal telencephalon prime protocol differentiation when compared to control cells. Whereas the claims related to choroid plexus identity are intriguing, most of the claims made throughout the manuscript are not supported by quantitative data or consistent results across the different conditions analysed, and many experiments seem to be missing to reach final conclusions.

      In addition, the quality of the organoids used for experiments does not seem to have been assessed or satisfactorily presented in the figures of this paper. Many important details related to the experimental execution are missing in the current version of this manuscript.

    1. eLife Assessment

      This fundamental study examines infection of the liver and hepatocytes during Mycobacterium tuberculosis infection using different systems including aerosol infection of mice and guinea pigs to demonstrate appreciable infection of the liver as well as the lung. The authors present convincing evidence that hepatocyte infection leads to metabolic dysfunction that promotes M. tuberculosis growth, in part potentially mediated by a nuclear receptor called PPARg. Overall, this is an interesting paper on an area of tuberculosis research which has been understudied, representing a significant advancement in the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. The authors showed that Mtb replicates in hepatocytes in a lipid-rich environment created by upregulating the transcription factor PPARγ. The authors also reported that Mtb protects itself from anti-TB drugs by inducing drug-metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infects macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patients. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. The authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The authors' comparison of gene expression data between macrophages and hepatocytes is important: it indicates that Mtb modulates different pathways in different cell types, relating to the immune response in macrophages and to metabolic pathways in hepatocytes.

      The authors also reported that Mtb residing in hepatocytes showed a drug-tolerance phenotype due to upregulation of enzymes involved in drug metabolism, and showed that the cytochrome P450 monooxygenase that metabolizes rifampicin and the NAT2 gene responsible for N-acetylation of isoniazid were upregulated in Mtb-infected cells.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients, especially in immunocompromised patients; therefore, finding granulomas in human liver biopsy samples is not surprising.
      Mtb-infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to drugs is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug-tolerant phenotype section were rewritten to clarify the facts.

    3. Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types, including both liver cells and macrophages. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study follows a similar line for Mtb infection. As the liver is the main site of lipid regulation, the availability of lipid resources is greater and the replication rate is higher. In short, the observations from the study confirm earlier studies and extend them to these additional cell types. It is known that the higher the lipid content, the greater the number of lipid droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

    4. Reviewer #3 (Public review):

      This manuscript by Sarkar et al. examines the infection of the liver and hepatocytes during M. tuberculosis infection. They demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. Transcriptomic analysis of HepG2 cells showed differential regulation of metabolic pathways including fatty acid metabolic processing. Hepatocyte infection is assisted by fatty acid synthesis in the liver and inhibiting this caused reduced Mtb growth. The nuclear receptor PPARg was upregulated by Mtb infection and inhibition or agonism of its activity caused a reduction or increase in Mtb growth, respectively, supporting data published elsewhere about the role of PPARg in lung macrophage Mtb infection. Finally, the authors show that Mtb infection of hepatocytes can cause upregulation of enzymes that metabolize antibiotics, resulting in increased tolerance of these drugs by Mtb in the liver.

      Overall, this is an interesting paper on an area of TB research where we lack understanding. However, some additions to the experiments and figures are needed to improve the rigor of the paper and further support the findings. Most importantly, although the authors show that Mtb can infect hepatocytes in vitro, they fail to describe how bacteria get from the lungs to the liver in an aerosolized infection. They also claim that "PPARg activation resulting in lipid droplets formation by Mtb might be a mechanism of prolonging survival within hepatocytes" but do not show a direct interaction between PPARg activation and lipid droplet formation and lipid metabolism, only that PPARg promotes Mtb growth. Thus, the correlations with PPARg appear to be there but causation, implied in the abstract and discussion, is not proven.

      The human photomicrographs are important and overall well done (lung and liver from the same individuals is excellent). However, in lines 120-121, the authors comment on the absence of studies on the precise involvement of different cells in the liver. In this study there is no attempt to immunophenotype the nature of the cells harboring Mtb in these samples (esp. hepatocytes). Proving that hepatocytes specifically harbor the bacteria in these human samples would add significant rigor to the conclusions made.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. Authors showed that Mtb replicates in hepatocytes in a lipid rich environment created by up regulating transcription factor PPARγ. Authors also reported that Mtb protects itself from anti-TB drugs by inducing drug metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infects macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patients. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. The authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      The authors also reported that Mtb residing in hepatocytes showed a drug-tolerance phenotype due to upregulation of enzymes involved in drug metabolism, and showed that the cytochrome P450 monooxygenase that metabolizes rifampicin and the NAT2 gene responsible for N-acetylation of isoniazid were upregulated in Mtb-infected cells.

      We thank the reviewer for the positive feedback and for highlighting the strengths of our study.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients, especially in immunocompromised patients; therefore, finding granulomas in human liver biopsy samples is not surprising.

      Mtb-infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to drugs is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host-intrinsic factors rather than Mtb efflux pumps. It may be better if the drug-tolerant phenotype section is rewritten to clarify the facts.

      We agree that several case studies regarding liver infection in pulmonary TB patients have been reported in the literature; however, this report is the first comprehensive study that establishes hepatocytes as a favourable niche for Mtb survival and growth.

      Drug tolerance is a phenomenon exhibited by the bacteria that, in the course of host-pathogen interactions, can be influenced by both intrinsic (bacterial) and extrinsic (host-mediated) factors. Multiple examples of tolerance being attributed to host-driven factors can be found in the literature (PMID: 32546788, PMID: 28659799, PMID: 32846197). Our studies demonstrate that Mtb-infected hepatocytes create a drug-tolerant environment by modulating the expression of drug-modifying enzymes (DMEs) in the hepatocytes.

      As suggested by the reviewer we will rewrite the drug tolerant phenotype section.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types, including both liver cells and macrophages. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study follows a similar line for Mtb infection. As the liver is the main site of lipid regulation, the availability of lipid resources is greater and the replication rate is higher. In short, the observations from the study confirm earlier studies and extend them to these additional cell types. It is known that the higher the lipid content, the greater the number of lipid droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      We thank the reviewer for emphasizing the strengths of our study and how it can lead to further investigations in the field.

      Reviewer #3 (Public review):

      This manuscript by Sarkar et al. examines the infection of the liver and hepatocytes during M. tuberculosis infection. They demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. Transcriptomic analysis of HepG2 cells showed differential regulation of metabolic pathways including fatty acid metabolic processing. Hepatocyte infection is assisted by fatty acid synthesis in the liver and inhibiting this caused reduced Mtb growth. The nuclear receptor PPARg was upregulated by Mtb infection and inhibition or agonism of its activity caused a reduction or increase in Mtb growth, respectively, supporting data published elsewhere about the role of PPARg in lung macrophage Mtb infection. Finally, the authors show that Mtb infection of hepatocytes can cause upregulation of enzymes that metabolize antibiotics, resulting in increased tolerance of these drugs by Mtb in the liver.

      Overall, this is an interesting paper on an area of TB research where we lack understanding. However, some additions to the experiments and figures are needed to improve the rigor of the paper and further support the findings. Most importantly, although the authors show that Mtb can infect hepatocytes in vitro, they fail to describe how bacteria get from the lungs to the liver in an aerosolized infection. They also claim that "PPARg activation resulting in lipid droplets formation by Mtb might be a mechanism of prolonging survival within hepatocytes" but do not show a direct interaction between PPARg activation and lipid droplet formation and lipid metabolism, only that PPARg promotes Mtb growth. Thus, the correlations with PPARg appear to be there but causation, implied in the abstract and discussion, is not proven.

      The human photomicrographs are important and overall, well done (lung and liver from the same individuals is excellent). However, in lines 120-121, the authors comment on the absence of studies on the precise involvement of different cells in the liver. In this study there is no attempt to immunophenotype the nature of the cells harboring Mtb in these samples (esp. hepatocytes). Proving that hepatocytes specifically harbor the bacteria in these human samples would add significant rigor to the conclusions made.

      We thank the reviewer for nicely summarizing our manuscript.

      Our study establishes the involvement of liver and hepatocytes in pulmonary TB infection in mice. Understanding the mechanism of bacterial dissemination from the lung to the liver in aerosol infections demands a detailed separate study.

      Figures 6E and 6F show how the PPARγ agonist and antagonist modulate (increase and decrease, respectively) bacterial growth in hepatocytes (further supported by the CFU data in Supplementary Figure 9B). Likewise, the number of lipid droplets in hepatocytes increases and decreases with the application of the PPARγ agonist and antagonist, respectively, as shown in Figures 6G and 6H. Collectively, these studies provide strong evidence that PPARγ activation leads to more lipid droplets that support better Mtb growth.

      We thank the reviewer for finding our human photomicrographs convincing. In the manuscript, we provide evidence for the direct involvement of the hepatocytes (and liver) in Mtb infection. We perform detailed immunophenotyping of hepatocytes in the mouse model with ASPGR1 (asialoglycoprotein receptor 1), and in the revised version of record we will further stain the infected hepatocytes with an anti-albumin antibody.

    1. eLife Assessment

      In this valuable study, the authors provide solid evidence that the likelihood of looking behaviour is predicted by the expected information gain, hence constituting a valuable formal model and explanation of habituation. Such modelling can represent crucial advances in explanation, over and above less specified models that can be fitted post hoc to any empirical pattern, although contrast testing with other accounts is desired. The findings would be of interest to researchers studying cognitive development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper proposes a new model of perceptual habituation and tests it over two experiments with both infants and adults. The model combines a neural network for visual processing with a Bayesian rational model for attention (i.e., looking time) allocation. This Bayesian framework allows the authors to measure elegantly diverse factors that might drive attention, such as expected information gain, current information gain, and surprise. The model is then fitted to infant and adult participants' data over two experiments, which systematically vary the amount of habituation trials (Experiment 1) and the type of dishabituation stimulus (familiarity, pose, number, identity, and animacy). Results show that a model based on (expected) information gain performs better than a model based on surprise. Additionally, while novelty preference is observed when exposure to familiar stimuli is elevated, no familiarity preference is observed when exposure to familiar stimuli is low or intermediate, which is in contrast with past work.

      Strengths:

      There are three key strengths of this work:

      (1) It integrates a neural network model with a Bayesian rational learner, thus bridging the gap between two fields that have often been disconnected. This is rarely seen in the cognitive science field, but the advantages are very clear from this paper: It is possible to have computational models that not only process visual information, but also actively explore the environment based on overarching attentional processes.

      (2) By varying parametrically the amount of stimulus exposure and by testing the effects of multiple novel stimulus types, this work allowed the authors to put classical theories of habituation to the test on much finer scales than previous research has done.

      (3) The Bayesian model allows the authors to test what specific aspects are different in infants and adults, showing that infants display greater values for the noise parameter.

      Weaknesses:

      Although a familiarity preference is not found, it is possible that this is related to the nature of the stimuli and the amount of learning that they offer. While infants here are exposed to the same perceptual stimulus repeatedly, infants can also be familiarised to more complex stimuli or scenarios. Classical statistical learning studies for example expose infants to specific pseudo-words during habituation/familiarisation, and then test their preference for familiar vs novel streams of pseudo-words. The amount of learning progress in these probabilistic learning studies is greater than in perceptual studies, and familiarity preferences may thus be more likely to emerge there. For these reasons, I think it is important to frame this as a model of perceptual habituation. This would also fit well with the neural net that was used, which is processing visual stimuli rather than probabilistic structures. If statements in the discussion are limited to perceptual paradigms, they would make the arguments more compelling.

    3. Reviewer #2 (Public review):

      Summary:

      This paper extends a Bayesian perception/action model of habituation behavior (RANCH) to infant looking behavior. The authors test the model predictions against data from several groups of infants and adults tested in habituation paradigms that vary the number of familiarisation stimuli and the nature of the test stimuli. Model sampling was taken as a proxy for looking times. The predictions of the model generally resemble the empirical data collected, though there are some potentially important differences.

      Strengths:

      This study addresses an important question, given the fundamental nature of habituation to learning and memory. Previous explanations of infant habituation have typically not been in the form of formal models, making falsification difficult. This Bayesian model is relatively simple but also incorporates a CNN to which the actual stimulus image can be presented, which enables principled predictions about image similarity to be derived.

      The paper contains data from a relatively large number of adults and infants, allowing parameter differences across age to be probed.

      The data suggests that the noise prior parameter is higher in infants, suggesting one mechanism through which infant and adult habituation is different, though of course, this depends on whether there is sufficient empirical evidence that other explanations can be ruled out, which isn't clear in the manuscript currently.

      Weaknesses:

      There are no formal tests of the predictions of RANCH against other leading hypotheses or models of habituation. This makes it difficult to evaluate the degree to which RANCH provides an alternative account that makes distinct predictions from other accounts. I appreciate that because other theoretical descriptions haven't been instantiated in formal models this might be difficult, but some way of formalising them to enable comparison would be useful.

      The justification for using the RMSEA fitting approach could also be stronger - why is this the best way to compare the predictions of the formal model to the empirical data? Are there others? As always, the main issue with formal models is determining the degree to which they just match surface features of empirical data versus providing mechanistic insights, so some discussion of the level of fit necessary for strong inference would be useful.

      The difference in model predictions for identity vs number relative to the empirical data seems important but isn't given sufficient weight in terms of evaluating whether the model is or is not providing a good explanation of infant behavior. What would falsification look like in this context?

      For the novel image similarity analysis, it is difficult to determine whether any differences are due to differences in the way the CNN encodes images vs in the habituation model itself - there are perhaps too many free parameters to pinpoint the nature of any disparities. Would there be another way to test the model without the CNN introducing additional unknowns?

      Related to that, the model contains lots of parts - the CNN, the EIG approach, and the parameters, all of which may or may not match how the infant's brain operates. EIG is systematically compared to two other algorithms, with KL working similarly - does this then imply we can't tell the difference between an explanation based on those two mechanisms? Are there situations in which they would make distinct predictions where they could be pulled apart? Also in this section, there doesn't appear to be any formal testing of the fits, so it is hard to determine whether this is a meaningful difference. However, other parts of the model don't seem to be systematically varied, so it isn't always clear what the precise question addressed in the manuscript is (e.g. is it about the algorithm controlling learning? or just that this model in general when fitted in a certain way resembles the empirical data?)
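
      To illustrate the question above about whether surprise, KL divergence, and EIG can be pulled apart, here is a minimal sketch in Python. It does not reproduce RANCH or the authors' code: the Bernoulli learner, the discretized belief grid, and all function names (`update`, `surprise`, `info_gain`, `expected_info_gain`, `run`) are assumptions chosen for illustration only.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def update(belief, grid, obs):
    """Bayes update of a discretized belief over theta after one 0/1 observation."""
    lik = [t if obs == 1 else 1.0 - t for t in grid]
    return normalize([b * l for b, l in zip(belief, lik)])


def predictive(belief, grid, obs):
    """Posterior predictive probability that the next observation equals `obs`."""
    p1 = sum(b * t for b, t in zip(belief, grid))
    return p1 if obs == 1 else 1.0 - p1


def kl(p, q):
    """KL(p || q) in nats for two discrete distributions on the same grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surprise(belief, grid, obs):
    """Negative log predictive probability of the observation that arrived."""
    return -math.log(predictive(belief, grid, obs))


def info_gain(belief, grid, obs):
    """KL divergence from the belief before the observation to the belief after it."""
    return kl(update(belief, grid, obs), belief)


def expected_info_gain(belief, grid):
    """Information gain averaged over the predicted next observation (EIG)."""
    return sum(
        predictive(belief, grid, o) * info_gain(belief, grid, o) for o in (0, 1)
    )


def run(observations, n_grid=99):
    """Return (surprise, KL, EIG) per trial; EIG is computed before the trial's observation."""
    # Hypothetical setup: interior grid points only, uniform prior over theta.
    grid = [(i + 1) / (n_grid + 1) for i in range(n_grid)]
    belief = normalize([1.0] * n_grid)
    rows = []
    for obs in observations:
        rows.append(
            (
                surprise(belief, grid, obs),
                info_gain(belief, grid, obs),
                expected_info_gain(belief, grid),
            )
        )
        belief = update(belief, grid, obs)
    return rows


if __name__ == "__main__":
    # Eight identical "familiar" observations followed by one deviant.
    for trial, (s, k, e) in enumerate(run([1] * 8 + [0]), start=1):
        print(f"trial {trial}: surprise={s:.3f} KL={k:.3f} EIG={e:.3f}")
```

      In this toy setting, surprise and KL both depend on the observation that actually arrives, whereas EIG is computed before the observation, so a deviant after a run of identical stimuli raises the first two on that trial but affects EIG only afterwards. Whether such a dissociation holds in the full model is a question the sketch cannot answer.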

    1. eLife Assessment

      This important theoretical study examines the possibility of encoding genomic information in a collective of short overlapping strands (e.g., the Virtual Circular Genome (VCG) model). The study presents solid theoretical arguments, simulations and comparisons to experimental data to point at potential features and limitations of such distributed collective encoding of information. The work should be of relevance to colleagues interested in molecular information processing and to those interested in pre-Central Dogma or prebiotic models of self-replication.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that the VCG is well replicated when the oligomers are elongated by sufficiently short chains from the "feedstock" pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling.

      (2) The authors introduce two important lengths, LS1 and LS2, only in the conclusions and do not sufficiently explain why each of them is important. It would make sense to discuss this early in the manuscript.

      (3) It is not entirely clear why a specific length distribution for VCG oligomers has to be assumed rather than emerging from the simulations.

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (~10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L0

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

      Weaknesses:

      The conclusions are somewhat convoluted and could be presented better.

    3. Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but was unexplained rigorously until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      - The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.

      - Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.

      - Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

      Weaknesses:

      - Most of the experimental work on the VCG has been performed with the bridged 2-aminoimidazolium dinucleotides, which are not featured in the kinetic model of this work. Other studies by Szostak and colleagues have demonstrated that non-enzymatic RNA extension with bridged dinucleotides has superior kinetics (Walton et al. JACS 2016, Li et al. JACS 2017), fidelity (Duzdevich et al. NAR 2021), and regioselectivity (Giurgiu et al. JACS 2017) compared to activated monomers, establishing the bridged dinucleotides as important for non-enzymatic RNA replication. Therefore, the omission of these species in the kinetic model presented here can be perceived as problematic. The major claim that avoidance of oligo ligations is beneficial for VCGs may be irrelevant if bridged dinucleotides are used as the extending species, because oligo ligations (V + V in this work) are kinetically orders of magnitude slower than monomer extensions (F + V in this work) (Ding et al. NAR 2022). Formally adding the bridged dinucleotides to the kinetic model is likely outside of the scope of this work, but perhaps the authors could test if this should be done in the future by simply increasing the rate of monomer extension (F + V) to match the bridged dinucleotide rate without changing the rate of V + V ligation?

      - The kinetic and thermodynamic parameters for oligo binding appear to be missing two potentially important components. First, base-paired RNA strands that contain gaps where an activated monomer or oligo can bind have been shown to display significantly different kinetics of ligation and binding/unbinding than complexes that do not contain such gaps (see Prywes et al. eLife 2016, Banerjee et al. Nature Nanotechnology 2023, and Todisco et al. JACS 2024). Would inclusion of such parameters alter the overall kinetic model? Second, it has been shown that long base-paired RNA can tolerate mismatches to an extent that can result in monomer ligation to such mismatched duplexes (see Todisco et al. NAR 2024). Would inclusion of the parameters published in Todisco et al. NAR 2024 alter the kinetic model significantly?

    1. eLife Assessment

      This manuscript describes an important finding of the transcriptional control of a chimeric gene transfer agents (GTA) cluster in Bartonella by a processive anti-termination factor (BrrG). The evidence provided is solid. This manuscript will interest researchers working on transcriptional regulation, horizontal gene transfer, and phages.

    2. Reviewer #1 (Public review):

      Summary:

      Gene transfer agent (GTA) from Bartonella is a fascinating chimeric GTA that evolved from the domestication of two phages. Not much is known about how the expression of the BaGTA is regulated. In this manuscript, Korotaev et al noted the structural similarity between BrrG (a protein encoded by the ror locus of BaGTA) to a well-known transcriptional anti-termination factor, 21Q, from phage P21. This sparked the investigation into the possibility that BaGTA cluster is also regulated by anti-termination. Using a suite of cell biology, genetics, and genome-wide techniques (ChIP-seq), Korotaev et al convincingly showed that this is most likely the case. The findings offer the first insight into the regulation of GTA cluster (and GTA-mediated gene transfer) particularly in this pathogen Bartonella. Note that anti-termination is a well-known/studied mechanism of transcriptional control. Anti-termination is a very common mechanism for gene expression control of prophages, phages, bacterial gene clusters, and other GTAs, so in this sense, the impact of the findings in this study here is limited to Bartonella.

      Strengths:

      Convincing results that overall support the main claim of the manuscript.

      Weaknesses:

      A few important controls are missing.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identified and characterized a regulatory mechanism based on transcriptional anti-termination that connects the two gene clusters, capsid and run-off replication (ROR) locus, of the bipartite Bartonella gene transfer agent (GTA). Among genes essential for GTA functionality identified in a previous transposon sequencing project, they found a potential antiterminator of phage origin within the ROR locus. They employed fluorescence reporter and gene transfer assays of overexpression and knockout strains in combination with ChIP-seq and promoter fusions to convincingly show that this protein indeed acts as an antiterminator counteracting attenuation of the capsid gene cluster expression.

      Impact on the field:

      The results provide valuable insights into the evolution of the chimeric BaGTA, a unique example of phage co-domestication by bacteria. A similar system found in the other broadly studied Rhodobacterales/Caulobacterales GTA family suggests that antitermination could be a general mechanism for GTA control.

      Strengths:

      Results of the selected and carefully designed experiments support the main conclusions.

      Weaknesses:

      It remains open why overexpression of the antiterminator does not increase the gene transfer frequency.

    1. eLife Assessment

      Using a Tn-seq-based approach, the authors identified the genetic determinants of drug tolerance in M. abscessus. Since M. abscessus is resistant to multiple antibiotics, the study is valuable in generating new knowledge linking antibiotic tolerance with ROS in this non-tuberculous mycobacterial (NTM) species. However, the study is incomplete due to the need for more validation of the Tn-seq data, inconsistency with the clinical strains, and insufficient experiments confirming the role of ROS detoxification in drug tolerance.

    2. Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, the authors used a transposon mutagenesis-based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters and facilitate genetic screening for this phenotype, they employed nutrient starvation, a condition previously reported to increase persister frequency. An M. abscessus transposon library was grown in nutrient-rich or nutrient-depleted conditions and exposed to TIG/LZD for 6 days, after which Tn-seq was carried out to identify genes involved in spontaneous (nutrient-rich) or starvation-induced persistence. About 60% of the persistence hits were required in both conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage, and proteostasis stress. The authors then validated the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and testing the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a persistence defect compared to wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when the role of katG was tested in other clinical strains of M. abscessus, the phenotype was observed in only one of them, suggesting that alternative anti-oxidative stress defense mechanisms might operate in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic-mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings with regard to genetic analysis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. Also, the attempts to validate the observations in clinical isolates are appreciated.

      Weaknesses:

      - Fig. 3: Five of the hits from the transposon screen were reconstructed as clean deletion strains and tested for persistence. However, only one (katG) gave a strong defect and one (Mab_1456c) exhibited a minor defect. Two of the clones (blaR and recR) did not show any persistence phenotype, and one (pafA) showed a minor phenotype; however, it was not clear whether this difference was really relevant, as the mutant exhibited differences at Day 0, prior to the addition of antibiotics. Considering these validation results, the conclusion would be that the Tn-seq approach to screening for persistence defects is not reliable and is more likely to result in misses than hits.

      - Fig. 3: Why is there such a huge difference in the extent of killing of the control strain in media when exposed to TIG/LZD, compared to Fig. 1C and Fig. 4? In Fig. 1C, M. abscessus grown in media decreases by >1 log by Day 3 and >4 log by Day 6, whereas in Fig. 3 the bacterial load decreases by <1 log by Day 3 and <2 log by Day 6. The authors should clarify whether the experimental conditions were different, because compared with the Fig. 1C data, the katG mutant phenotype is not very different.

    3. Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved in two major ways. First, word-choice decisions could better conform to the published literature. Alternatively, novel definitions could be included. In particular, the data support the concept of phenotypic tolerance, not persistence. Second, two of the novel observations could be explored more extensively to provide mechanistic explanations for the phenomena.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus. The authors also utilized Tn-seq to identify genes involved in persistence. They identified a role for the catalase-peroxidase KatG in preventing death from the translation inhibitors tigecycline and linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

    1. eLife Assessment

      In this innovative study, Carpenet C et al explore the use of nanobody-based PET imaging to track proliferative cells after in vivo transplantation in mice, in a fully immunocompetent setting. The development of a unique set of PET tracers and mouse strains to track genetically-unmodified transplanted cells in vivo is an important novel asset that could potentially facilitate cell tracking. The evidence provided is compelling as the new method proposed might facilitate overcoming certain limitations of alternative approaches, such as full sized immunoglobulins and small molecules, while the specific claims would gain further support by additional experimentation and methodological details.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of nanobody-based PET imaging is important and holds great potential for real-world applications since nanobodies have many advantages over full sized immunoglobulins and small molecules.

      Strengths:

      The submitted manuscript contains quite a bit of interesting data from a collaborative team of well-respected researchers. The authors are to be congratulated for presenting results that may not have turned out the way they had hoped, and doing so in a transparent fashion.

      Weaknesses:

      However, the manuscript could be considered to be a collection of exploratory findings rather than a complete and mature scientific exposition. Most of the sample sizes were 3 per group, which is fine for exploratory work, but insufficient to draw strong statistically robust conclusions for definitive results.

    3. Reviewer #2 (Public review):

      Summary:

      This is a strong and well-described study showing, for the first time, the use of a specific PET tracer, together with publicly available resources, to track proliferating transplanted cells in vivo in a fully immunocompetent murine environment.

      In this study, the authors describe the use of a previously developed set of VHH-based PET tracers to track transplants (cancer cells, embryos) in a murine immunocompetent environment.

      Strengths:

      A unique set of PET tracers and mouse strains to track transplanted cells in vivo without genetic modification of the transplanted cells. This is a unique asset and a first of its kind.

      Weaknesses:

      - Some methodological aspects and controls are missing.

      - No clinical relevance?

    1. eLife Assessment

      This important work presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. The evidence supporting the claims is generally convincing but could be improved if the authors expanded the phenotypic characterization of the compound and its potential effects on both viral and host targets.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomain of the nsp3 protein of SARS-CoV-2 (Mac1). This involves medicinal chemistry synthesis, structural work, and biochemical characterisation. Subsequently, the authors present their findings on the efficacy of the inhibitor in both cell culture and animal models of SARS-CoV-2 infection. They find that, despite high affinity for Mac1 and the known replication defects of catalytically inactive Mac1, only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assays to study the affinity, selectivity, and potency of the novel inhibitor, and thus the in vitro data are very compelling. Similarly, the authors use several cell culture and in vivo models to strengthen their findings.

      Weaknesses:

      (a) The selection of Targ1 and MacroD2 as off-target human macrodomains is poor, as several studies have shown that the first macrodomains of PARP9 and PARP14 are much more closely related to coronaviral macrodomains, and both macrodomains are implicated in antiviral defence and immunity.

      (b) The authors utilize only replication efficiency and general infection markers as readouts for their Mac1 inhibitor. It would be good if they could show an impact on the ADP-ribosylation of a known Mac1 target such as PARP14.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure-based optimization is exemplary. Macrodomains and ADP-ribosyl hydrolases have a reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure-based approaches and fragment linking to make a 20 nM inhibitor as a tool compound to validate the target. In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound, AVI-4206, while very potent and selective, is not appreciably orally bioavailable. The fact that high doses of the compound must be given IP to see in vivo effects may lead to questions regarding off-target effects.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo, knowing that cellular models might not accurately represent a living system with a fully functional immune system, all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

    1. eLife Assessment

      This work describes a valuable method, SICKO, for real-time longitudinal quantification of bacterial colonization in the gut of individual C. elegans. The authors present convincing evidence to support the validity of the approach. SICKO provides an experimental framework that will enable progress in our understanding of host-microbe interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The imaging pipeline presented in this paper is a useful tool for visualizing and dynamically tracking bacterial colony formation at the individual worm level, enabling the study in real time of how microbiome colonization is associated with host physiology, including lifespan, infection severity, and genetic mutations. This technique allows certain biological information to be obtained that was previously missed, such as pmk-1 mutants exhibiting a higher rate of colonization by E. coli OP50 than wild-type animals. Overall, this platform could be of interest to many labs studying C. elegans interactions with their microbiome and with bacterial pathogens.

      Strengths:

      This platform allows for unbiased quantification of bacterial colonization at scale. This is particularly important in a field studying dynamic responses or potentially more subtle or variable phenotypes.

      The platform could be adapted for multiple uses or potentially for other species of nematodes for evolutionary comparisons.

      The platform allows researchers to correlate bacterial colonization with predicted lifespan.

      Weaknesses:

      The platform will require optimization for any given bacterial species, which restricts its ease of use for researchers who won't regularly be studying the same bacteria.

      It requires the bacteria to be genetically tractable, so it cannot be easily adapted to microbes that do not have established ways of expressing GFP or other reporters.

      This platform requires the use of relatively older adult animals that are more prone to larger gut colonies of bacteria. Thus, studies using this platform are restricted to studying older populations.

      The relationship between bacterial colonization and host lifespan requires further investigation. The current SICKO platform and experimentation cannot fully address whether animals in poorer health are more susceptible to colonization, or whether colonization causally contributes to a decline in health. Furthermore, while such effects are statistically significant, their effect size in some cases is modest.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Espejo et al describe a method, SICKO, that allows for long-term longitudinal examination of bacterial colonization in the gut of C. elegans. SICKO utilizes a well-plate format where single worms are housed in each well with a small NGM pad surrounded by an aversive palmitic acid barrier to prevent worms from fleeing the well. The main benefit of this method is that it captures longitudinal data across individual worms with the ability to capture tens to hundreds of worms at once. The output data of SICKO in the heatmap is also very clear and robustly shows bacterial colonization in the gut across a large sample size, which is far superior to the current gold standard of imaging 10-20 worms in a cross-sectional manner at various timepoints of aging. They then provide a few examples of how this method can be applied to understand how colonization correlates with animal health.

      Strengths:

      - The method presented in this manuscript is sure to be of great utility to the host-pathogen field of C. elegans. The method also allows for utilization of large sample sizes and a way to present highly transparent data, both of which are excellent for promoting rigor and reproducibility of science.
      - The manuscript also does a great job in describing the limitations of the system, which is always appreciated.
      - The methods section for the SICKO data analysis pipeline and the availability of the code on GitHub are strong pluses.

      Weaknesses:

      - There are minor weaknesses in the methods that could be addressed relatively easily by expanding the explanation of how to set up the individual worm chambers (see comment 1 below).

      I am making all my comments and suggestions to the reviewers public, as I believe these comments can be useful to the general readership as well. Comment 1 is important to make the methods more accessible and comment 2 is important to make the data presentation more accessible to a broader audience. However, comments 3-4 are things/suggestions that should be considered by the authors and future users of SICKO for interpretation of all the data presented in the manuscript.

      (1) The methods section needs to be described in more detail. Considering that this is a methods development paper, more detailed explanation is required to ensure that readers can actually adapt these experiments into their labs.
      (a) What is the volume of lmNGM in each well?
      (b) Recommended volume of bacteria to seed in each well?
      (c) A file for the model for the custom printed 3D adaptor should be provided.
      (d) There should be a bit more detail on how the chambers should be assembled with all the components. After reading this, I am not sure I would be able to put the chamber together myself.
      (e) What is the recommended method to move worms into individual wells? Manual picking? Pipetting in a liquid?
      (f) Considering that a user-defined threshold is required (challenging for non-experienced users), example images should be provided on what an acceptable vs. nonacceptable threshold would look like.

      (2) The output data in 1e is very nice - it is a very nice and transparent plot, which I like a lot. However, since the data is complex, a supplemental figure to explain the data better would be useful to make it accessible for a broader audience. For example, highlighting a few rows (i.e., individual worms) and showing the raw image data for each row would be useful. What I mean is that it would be useful to show what does the worm actually look like for a "large colony size" or "small colony size"? What is the actual image of the worm that represents the yellow (large), versus dark blue (small), versus teal (in the middle)? And also the transition from dark blue to yellow would also be nice to be shown. This can probably also just be incorporated into Fig. 1d by just showing what color each of those worm images from day 1 to day 8 would represent in the heat map (although I still think a dedicated supplemental figure where you highlight a few rows and show matching pictures for each row in image files would be better).

      (3) I am not sure that using single-timepoint cross-sectional data is a fair comparison, since several studies do multi-timepoint cross-sectional studies (e.g., day 1, day 5, day 9). This is especially true for using only day 1 data: most people do gut colonization assays at later timepoints, since the gut barrier has been shown to break down at older ages, not at day 1. The data collected by SICKO is acquired every day across many individual worms and is clearly superior to this type of cross-sectional data (even with multiple timepoints), and I think this message would be further strengthened by comparing it directly to cross-sectional data collected across more than 1 timepoint of aging.

      (4) The authors show that SICKO can detect differences in wild-type vs. pmk-1 loss of function and between OP50 and PA14. However, these are very dramatic conditions that conventional methods can easily detect. I would think that the major benefit of SICKO over conventional methods is that it can detect subtle differences that cross-sectional methods would fail to visualize. It might be useful to see how well SICKO performs for these more subtle effects (e.g., OP50 on NGM vs. bacteria-promoting media; OP50 vs. HT115; etc.).
      (a) Similar to the above comment, the authors discuss how pmk-1 has colonization-independent effects on host-pathogen interactions. Maybe using a more direct approach to affect colonization (e.g., perturbing gut actin function like act-5) would be better.

    1. eLife Assessment

      This useful paper systematically evaluates B-cell receptor (BCR) repertoires across tumors, tumor-draining lymph nodes, and peripheral blood in patients with melanoma, lung adenocarcinoma, and colorectal cancer. It investigates the interplay between the tumor microenvironment and immune responses, revealing differences in BCR clonotype maturity, hypermutation, and spatial distribution. The study highlights the heterogeneity in immune responses and provides solid insights into the potential of tumor-infiltrating B cells for therapeutic applications, despite limitations in patient cohort size and sequencing methodology.

    2. Reviewer #3 (Public Review):

      In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumor, lymph node and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete.

      Major strengths:

      (1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.

      (2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.

      Comments on revisions:

      Your efforts in addressing concerns related to methodological details, narrative clarity, and data representation are commendable. The expanded descriptions of Fig. 1A and the experimental design, as well as the restructuring of the discussion, have greatly enhanced the manuscript's clarity and coherence.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      Concerns and comments on current version:

      The revision has improved the manuscript but, in my opinion, remains inadequate. While most of my requested changes have been made, I do not see an expansion of the Fig. 1A legend to incorporate more details about the analysis. Lacking details of methodology was a concern from all reviewers.

      To address this concern, we expanded the Fig. 1A legend and also significantly expanded the text describing the experimental design to include a description of the data analysis approach.

      “BCR repertoire libraries were obtained using the 5’-RACE (Rapid Amplification of cDNA Ends) protocol as previously described [21] and sequenced with 150+150 bp read length. This approach allowed us to achieve high coverage for the obtained libraries (Table S1) to reveal information on clonal composition, CDR-H3 properties, IgM/IgG/IgA isotypes and somatic hypermutation load within CDR-H3. For B cell clonal lineage reconstruction and phylogenetic analysis, however, 150+150 bp read length is suboptimal because it does not cover the V-gene region outside CDR-H3, where hypermutations also occur. Therefore, to verify our conclusions based on the data obtained by 150+150 bp sequencing (“short repertoires”), for some of our samples we also generated BCR libraries by the IG RNA Multiplex protocol (see Materials and Methods) and sequenced them at 250+250 bp read length (“long repertoires”). Libraries obtained by this protocol cover the V gene sequence starting from CDR-H1 and capture most of the hypermutations in the V gene. Conclusions about clonal lineage phylogeny were drawn only when they were corroborated by “long repertoire” analysis.

      For BCR repertoire reconstruction from sequencing data, we first performed unique molecular identifier (UMI) extraction and error correction (reads/UMI threshold = 3 for 5’-RACE and 4 for IG Multiplex libraries). Then, we used MiXCR [58] software to assemble reads into clonotypes, determine germline V, D, and J genes, isotypes, and find the boundaries of target regions, such as CDR-H3. Only UMI counts, and not read counts, were used for quantitative analysis. Clonotypes derived from only one UMI were excluded from the analysis of individual clonotype features but were used to analyze clonal lineages and hypermutation phylogeny, where sample size was crucial. Samples with 50 or fewer clonotypes left after preprocessing were excluded from the analysis.”
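      The filtering rules in the quoted passage can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, the function names, and the protocol keys are hypothetical. It also assumes that a UMI is kept when it is supported by at least the stated number of reads, and that the 50-clonotype cutoff is applied to the clonotypes remaining after UMI thresholding. Neither detail is specified in the text.

```python
from collections import Counter

# Thresholds taken from the quoted text; the dictionary keys are hypothetical labels.
READS_PER_UMI_THRESHOLD = {"5RACE": 3, "IG_MULTIPLEX": 4}
MAX_EXCLUDED_CLONOTYPES = 50  # samples with 50 or fewer clonotypes are dropped


def umi_counts(records, protocol):
    """Count well-supported UMIs per clonotype.

    records: iterable of (clonotype_id, umi, n_reads) tuples (assumed layout).
    Assumption: a UMI passes if n_reads >= threshold for the protocol.
    """
    threshold = READS_PER_UMI_THRESHOLD[protocol]
    counts = Counter()
    for clonotype_id, _umi, n_reads in records:
        if n_reads >= threshold:
            counts[clonotype_id] += 1  # quantify by UMIs, never by reads
    return dict(counts)


def without_singletons(counts):
    """Clonotypes used for per-clonotype feature analysis (more than one UMI)."""
    return {clonotype: n for clonotype, n in counts.items() if n > 1}


def sample_is_usable(counts):
    """Assumption: the cutoff applies to clonotypes left after UMI thresholding."""
    return len(counts) > MAX_EXCLUDED_CLONOTYPES


if __name__ == "__main__":
    toy = [("c1", "AAA", 5), ("c1", "AAC", 3), ("c1", "AAG", 2), ("c2", "TTT", 4)]
    counts = umi_counts(toy, "5RACE")
    print(counts, without_singletons(counts), sample_is_usable(counts))
```

      In this sketch, single-UMI clonotypes stay in `counts` (as they would for lineage and hypermutation phylogeny analysis) and are removed only by `without_singletons` for feature-level analysis.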

      Similarly, the 'fragmented' narrative was a concern of all reviewers. These matters have not been dealt with adequately enough - there are parts of the manuscript which remain fragmented and confusing.

      Unfortunately, the reviewers do not give us a hint as to which parts of the text are the most problematic in their opinion. We identified the parts describing physicochemical properties of CDR3s, intratumoral heterogeneity, and intra-LN heterogeneity as the most problematic, and edited these parts significantly. We also significantly edited the Discussion section (please see the Comparison file for details). Other sections were also edited to improve readability and clarity.

      The narrative and analysis do not adequately explain how the plasma cell bias has been dealt with and are, in fact, simply confusing. There is a paragraph at the beginning of the discussion regarding the plasma cell bias, which should be rewritten to be clearer and moved to a prominent place early in the results. Why are these results not properly presented? They are key for interpretation of the manuscript. Furthermore, the sorted plasma cell sequencing analysis has been performed on only two patients.

      In response to this concern, we moved the section describing plasma cell bias in the bulk BCR repertoires to the main text.

      Another issue is that some disease cohorts are entirely composed of patients with metastasis and some of patients without, but metastasis is not mentioned. Metastasis has been shown to impact the immune landscape.

      Intrinsic heterogeneity of the cohort is indeed one of the weaknesses of our work, which could negatively impact the statistical significance of our results and, as a consequence, mask certain observations or make them less statistically significant. We mention this in the discussion section. It should not, in our understanding, lead to any false conclusions. We did not, however, pool data from primary and metastatic tumor samples, and all tumor samples that we mention are primary tumors.

      The following part of a sentence was added to the discussion:

      “...which could negatively impact the statistical significance of our results and, as a consequence, mask certain observations or make them less statistically significant.”

      A reviewer brought up a concern about the overlap analysis and I also asked for an explanation on why this F2 metric was chosen. Part of the rebuttal argues that another metric was explored showing similar results, thus the conclusion reached is reasonable. Remarkably, these data are not only omitted from the manuscript, but are not even provided for the reviewers.

      We did not intend to conceal any data from the reviewers, and we have now added the panel for the D metric to the S1 figure. We would also like to point out that the panel describing the R metric for repertoire overlaps (a measure of similarity of overlapping clonotype frequencies) was included in the first version of the S2 Figure (now S1 Figure), and it also showed a similar trend. We hope that the data are now fully conclusive.

      This manuscript certainly includes some interesting and useful work. Unfortunately, a comprehensive re-write was required to make the work much clearer and easier to understand and this has not been realized.

      Again, we thank the reviewers for their thorough evaluation, and hopefully we could make the text clearer in the second reviewed version.

    1. eLife Assessment

      This important study presents a finding on the role of the Inferior Colliculus in sensory prediction, cognitive decision-making, and reward prediction. The evidence supporting the claims of the authors is compelling and convincing. The work will be of broad interest to sensory neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      This work makes a considerable effort to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their roles in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and are capable of adjusting responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      Comments on revised version:

      The authors have adequately addressed all my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play roles beyond simple sensory transmission. The authors present two major findings based on their experiments. The first is the climbing effect: neuronal activity continues to increase over time. The second is the reward effect: a sudden increase in IC neuronal activity when the reward is given. The climbing effect is a surprising finding, but the reward effect has not been explored clearly here.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchical structures no less complex than those of areas in charge of complex functions. Meanwhile, the IC receives projections from higher areas, such as the auditory cortex, which implies that the IC is involved in complex behaviors. Experiments in behaving monkeys are always time-consuming and demanding, and this work will offer knowledge that more closely approximates how the human brain works.

      Weaknesses:

      These findings concern correlation rather than causality of IC function in behaviors.

      Regarding the 'reward effect', it is still unknown whether it is simply a response to the sound produced by the electromagnetic valve of the reward system. The authors stated that the testing space is sound-proofed and believed this is enough to support their interpretation. Since the electromagnetic valve was connected to the water tube, and the water tube was attached to the monkey chair or even placed in the monkey's mouth, the click sound may be transmitted to the monkey independently of the air. There are simple ways to test this. One is to add a few trials without reward and see what happens; another is to vary the latency between the sound sequence and the reward.

      Only one of the major findings is convincing, which definitely reduces the credibility of the authors' statements.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work made considerable efforts to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed to sounds or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating a role in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and can adjust their responses based on reward predictability, and 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.

      We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.

      Weaknesses:

      (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all of the cited work was done in rodents. Whether the monkey IC shares similar inputs is not clear.

      We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the Inferior Colliculus (IC). While it is true that most studies on dopaminergic inputs to the IC have been conducted in rodents, to our knowledge, no studies have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:

      • Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored." (P.5, Line. 16-17)

      • Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates." (P.21, Line. 7-9)

      (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli had been presented to the monkeys during training. Therefore, there are actually no novel stimuli, only deviant stimuli. This suggests that the authors have misunderstood the basic concept.

      We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, given the nature of our experimental design where all stimuli were familiar to the monkeys during training, the term "deviation" more accurately describes the stimuli used in our study rather than "novelty."

      To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.

      We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.

      (3) Most of the conclusions were based on correlational analysis or speculation, without causal evidence.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is common in many electrophysiological studies, particularly those conducted in behaving primates, where directly manipulating specific neural circuits to establish causality presents significant challenges, especially in comparison to research in mice.

      This complexity is further compounded when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could have a widespread impact on auditory responses in downstream pathways, potentially influencing sensory prediction and decision-making processes.

      Despite this limitation, our study provides novel evidence suggesting that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research aimed at exploring the underlying mechanisms and broader functional implications of these signals.

      To address the reviewer's concerns, we have made the following adjustments to the manuscript:

      (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we have referred to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships.

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.

      (4) Results are presented in a very "straightforward" manner, with too many detailed descriptions of phenomena but little summary or synthesis of information. For example, the first section of Results is very long but does not convey clear information.

      We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.

      To address this concern, we have made the following revisions to the manuscript:

      (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.

      “The accumulation of the climbing effect alongside repetitive sound presentations suggests a potential linkage to reward prediction or sensory prediction, reflecting an increased probability of receiving a reward and the strengthening of sound prediction as the sound sequence progresses.” (P.10, Line. 17-20)

      “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)

      (2) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.

      “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)

      “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)

      We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.

      (5) The logic between different sections of Results is not clear.

      We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.

      To address this concern, we have made the following revisions:

      (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the Inferior Colliculus’s (IC) role in sensory prediction, decision-making, and reward processing.

      “These results demonstrate that reward anticipation does not drive the climbing effect, thereby reinforcing the idea that sensory prediction is the primary factor influencing the accumulation of the climbing effect in the IC.” (P.12, Line. 4-7)

      “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)

      (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.

      (6) In the Discussion, there is excessive repetition of results, and comparison with and discussion of potentially related work are insufficient. For example, Metzger, R.R., et al. (J Neurosci, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.

      We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.

      To address these concerns, we have made the following revisions:

      (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.

      “We demonstrated that the climbing effect is dynamically modulated (Figure 2D-G), and this modulation is driven primarily by sensory prediction rather than reward anticipation, as controlling for reward effects showed minimal impact on the response profile (Figure 3D, E). This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 1-5)

      (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.

      “Metzger and colleagues reported a gradual increase in neural activity—termed late-trial ramping—in the IC during an auditory saccade task. Similar to our results, they observed no climbing effect in the absence of a behavioral task. Both studies support the idea that the climbing effect depends on both behavioral engagement and reward. While both pieces of research emphasize the IC's complex role in integrating auditory processing with cognitive functions related to reward and behavior, our findings provide further insight by distinguishing between the effects of sensory prediction and reward anticipation on IC neuronal activity.” (P.16, Line. 16-24)

      We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play roles that go beyond simple sensory transmission. The authors revealed the climbing effect of neurons in the IC during decision-making tasks and tried to explore the reward effect in this condition.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchical structures no less complex than those of areas in charge of complex functions. Meanwhile, the IC receives projections from higher areas, such as the auditory cortex, which implies that the IC is involved in complex behaviors. Experiments in behaving monkeys are always time-consuming and demanding, and this work will offer knowledge that more closely approximates how the human brain works.

      We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.

      We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.

      We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      These findings concern correlation rather than causality of IC function in behaviors. I also have a few major concerns.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We fully acknowledge the importance of distinguishing between correlation and causality. As outlined in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the inherent challenges in establishing direct causal links, particularly in electrophysiological studies involving behaving primates, and given the lower-level role of the IC in the auditory pathway.

      We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      Comparing neurons' spike activity across different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but more exploration is required to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so the monkeys might have remembered the sounds by their rhythm, which is a form of learned auditory response. It would be interesting to know the monkeys' responses and the neurons' activity if the inter-stimulus interval were variable. Second, the task only asked monkeys to press one button, and the reward ratio (the ratio of correct-response trials) was around 78% (based on the number from Line 302). Thus, in the sessions with reward, the monkeys had a high expectation of reward. Does this expectation cause the climbing effect?

      We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:

      (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective and recognize its relevance. However, we believe that rhythm is unlikely to be a significant contributor to the 'climbing effect' for two key reasons:

      a) The 'climbing effect' begins as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), before any rhythm or pattern could be fully established, since rhythm generally requires at least three repetitions to form.

      b) In our reward experiment (Figs. 4-5), the sounds were also presented at regular ISIs, which could have facilitated rhythmic learning, yet the observed climbing effect was comparatively small in those conditions.

      Unfortunately, we did not explore variable ISIs in this current study, so we cannot directly address this concern with the available data.

      (2) Reward Expectation and Climbing Effect:

      The reviewer raises a valid concern regarding whether the 'climbing effect' might be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise.

      In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100% reward delivery), we did not observe a significant climbing effect in the auditory response. Additionally, the presence of reward prediction error (Fig. 4D) further supports the idea that while the monkeys may indeed form reward expectations, these expectations do not directly drive the climbing effect in the IC.

      To make this distinction clearer, we have added sentences in the revised manuscript explicitly discussing the relationship between reward expectation and the climbing effect.

      “Within the oddball paradigm, both sensory and reward predictions intensify alongside the recurrence of standard sounds, suggesting that the strength of these predictions could significantly influence neuronal responses. Our experimentation with rewards has effectively dismissed the role of reward prediction (Figures 3 and 4), highlighting the potential significance of sensory prediction in molding the climbing effect.” (P.17, Line. 14-19)

      We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and effectively address the reviewer's concerns. We sincerely thank the reviewer for these valuable suggestions, which have allowed us to improve the clarity and depth of our manuscript.

      The "reward effect" on IC neurons' responses was shown in Fig. 4. Is this response caused by the physical action of reward delivery or not? In reward sessions, IC neurons have an obvious response related to the onset of the water reward. An electromagnetic valve is often used in water-reward systems and gives out a loud click every time the reward is triggered. The IC neurons' responses may simply be caused by the click sound if an electromagnetic valve is used. It is important to find a way to rule out this simple possibility.

      We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.

      To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.

      “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.

      Strengths:

      This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.

      We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.

      Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.

      We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      Major Comments:

      (1) Structural Clarity and Logic Flow:

      The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be well recognized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."

      We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.

      To address the reviewer's concerns, we have made the following revisions:

      (1) Reorganization of Figures and Results:

      We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.

      We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.

      “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)

      “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)

      (2) Revised Title:

      In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.

      (3) Improved Logic Flow:

      We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.

      (2) Clarification of Data Analysis:

      Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.

      We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:

      (1) Detailed Explanation of Experimental Design:

      We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.

      (2) Cohesive Presentation of Data Analysis:

      Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.

      We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.

      (3) Enhanced Statistical Analysis Details:

      We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. This relevant information is highlighted in the Results section or figure legends to facilitate understanding.

      We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.

      (3) Reward Prediction Analysis:

      The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.

      We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.

      In response to the reviewers' concerns, we have made the following revisions to strengthen our manuscript:

      (1) Inclusion of Population-Level Data for IC Neurons:

      In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses to isolate reward-related signals. However, most IC neurons exhibit motor responses during key presses (as indicated in Fig.6), which can complicate distinguishing between reward-related activity and motor responses. This complexity is why we initially presented neurons without motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.

      “This phenomenon was further supported by examining the responses in the duration deviation detection task. Since most IC neurons exhibit motor responses during key presses (Supplementary Figure 6), which can complicate distinguishing between reward-related activity and motor responses, we specifically selected two neurons without motor responses during key presses (Figure 5).” (P.13, Line. 10-15)

      (2) Addition of Data on Key Press Errors and No-Response Trials:

      In response to the reviewer’s suggestion, we show below peri-stimulus time histograms (PSTHs) for two example neurons during error trials, including incorrect key presses and no-response trials. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure below). While we remain cautious in drawing definitive conclusions from this limited number of trials, we observed no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.

      When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons.

      Author response image 1.

      (A) PSTH of the neuron from Figure 5A during a key press trial under control condition. The number in the parentheses in the legend represents the number of trials for control condition. (B) PSTHs of the neuron from Figure 5A during non-key press trials under experimental conditions. The numbers in the parentheses in the legend represent the number of trials for experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 5B.

      We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly enhance our manuscript's conclusions regarding the complex functionalities of the IC.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One of the major issues of this work is that its writing fails to convey the focus and significance of the work. Sentences are too long and multiple pieces of information are often integrated in one sentence, causing great confusion.

      We appreciate the reviewer's feedback regarding the clarity and structure of the manuscript. We agree that scientific writing should be clear and concise to effectively communicate the significance of the work. In response to this comment, we have undertaken the following revisions to improve the readability and focus of the manuscript:

      (1) Simplified Sentence Structure:

      We have revisited the manuscript and revised sentences that were overly complex or contained multiple pieces of information. Long sentences have been broken into shorter, more digestible statements to improve clarity and readability. Each sentence now conveys a single, focused idea.

      (2) Improved Flow and Focus:

      We have restructured certain paragraphs to ensure that the narrative flows logically and highlights the key findings. This restructuring includes placing the most significant results in prominent positions within paragraphs and ensuring that each section begins with a clear statement of purpose.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (3) Refined Significance of the Work:

      In response to the reviewer's concern that the manuscript fails to clearly convey the significance of the work, we have revised the Introduction and Discussion sections to better emphasize the focus and impact of our findings. We now explicitly highlight the novel contributions of this research to the understanding of the multifaceted role of the IC in sensory prediction, decision-making, and reward processing.

      “In this research, we embarked on a deviation detection task centered around sound duration with trained monkeys, performing extracellular recordings in the IC. Our observations unveiled a 'climbing effect'—a progressive increase in firing rate after sound onset, not attributable to reward but seemingly linked to sensory experience such as sensory prediction. Moreover, we identified signals of reward prediction error and decision-making. These findings propose that the IC's role in auditory processing extends into the realm of complex perceptual and cognitive tasks, challenging previous assumptions about its functionality.” (P.6, Line. 1-8)

      “Overall, our results strongly suggest that the inferior colliculus is actively engaged in sensory experience, reward prediction and decision making, shedding light on its intricate functions in these processes.” (P.16, Line. 10-12)

      We believe these revisions address the reviewer's concern and will make the manuscript more accessible to readers. Thank you for the valuable suggestion, which has led to a more precise and effective presentation of our work.

      Reviewer #2 (Recommendations for the authors):

      (1) In the oddball paradigm, an inter-stimulus interval of 0.6 seconds was used. Varying the inter-stimulus interval should show whether this effect reflects rhythm learning. It would be better to choose random inter-stimulus and inter-trial intervals throughout the experiment in case the monkeys try to remember the rhythm.

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds may lead to a rhythmic auditory response, allowing monkeys to anticipate sounds. This is a valuable suggestion, and we appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in driving the 'climbing effect.' The 'climbing effect' starts as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), which is before any rhythm or pattern could be fully established. Typically, rhythm learning requires at least three repetitions to form a predictable sequence.

      Unfortunately, we did not vary the inter-stimulus interval in the current study, so we cannot directly test this hypothesis with the current dataset. However, we agree with the reviewer that using random ISIs would be an effective way to directly rule out any potential contribution of rhythm learning to the climbing effect.

      (2) Regarding the "reward effect" on IC neurons' responses, we should rule out the possibility of a simple auditory response to the switching of the electromagnetic valve.

      We appreciate the reviewer’s concern about the potential confounding factor of the electromagnetic valve's click sound during water reward delivery, which could be interpreted as an auditory response rather than a true reward-related response. Anticipating this issue, we took measures to eliminate this possibility by placing the electromagnetic valve outside the soundproof room where neuronal recordings were conducted. This setup ensured that any potential auditory noise from the valve was minimized and unlikely to influence the IC neuronal activity.

      To address this concern more explicitly, we have added a description in the Methods section detailing this setup. This revision clarifies the steps we took to rule out this potential confound, strengthening the validity of our claim that the observed IC activity is genuinely related to reward processing and not a simple auditory response to the valve's operation.

      We thank the reviewer for bringing attention to this critical aspect of our experimental design, and we hope this clarification enhances the interpretation of our findings.

      “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)

      (3) Since monkeys are smart, a simple Go/NoGo design is not a good strategy. A task with more buttons to press, such as a 2-AFC or 4-AFC task, may prevent artifactual effects of unwanted behaviors and offer more reliable and useful data.

      We appreciate the reviewer’s suggestion to implement a more complex behavioral task, such as a 2-Alternative Forced Choice (2-AFC) or 4-AFC design, to reduce the possibility of unwanted behaviors and to gather more reliable data. We agree that such paradigms could offer additional insights and help control the monkeys’ decision-making processes by reducing potential confounding factors related to the simplicity of Go/NoGo responses.

      In our current study, we chose the Go/NoGo task because it aligns with our primary experimental goal: investigating the relationship between IC activity and sensory prediction, decision-making, and reward processing in a simplified manner. This task allowed us to focus on reward prediction and sensory responses without introducing additional complexity that could increase the cognitive load on the monkeys and affect their performance. It is worth noting that training monkeys to perform auditory tasks is generally more challenging compared to visual tasks, though they are indeed capable of complex learning.

      Moreover, this novelty detection task was initially designed as an oddball paradigm to explore predictive coding along the auditory pathway. Our lab has concentrated on this topic for several years, with the majority of current research focusing on non-behaving subjects such as rodents. Implementing a more advanced paradigm like 2-AFC would have increased training time and required an approach that diverged from our core objective.

      That said, we agree that future studies would benefit from using more sophisticated tasks, such as 2-AFC or 4-AFC paradigms, as they could offer a more refined understanding of decision-making processes while enhancing the quality of data by minimizing unwanted behaviors. We believe that incorporating more advanced behavioral paradigms in future work will further enhance the rigor and reliability of our findings.

      (4) Line 52, "challenges...", sounds a little too strong. The authors try to sell the idea that the IC is more than a simple sensory relay point. I agree with that, and I know that comprehensive data are not easy to obtain from experiments on monkeys. But to support the authors' bolder claims, more analysis needs to be done.

      We appreciate the reviewer’s feedback on the tone of the statement in Line 52, where we describe the findings as “challenging” conventional views of the IC as a simple sensory relay point. We agree that, although our data provide intriguing insights into the multifunctionality of the IC, especially in sensory prediction, decision-making, and reward processing, the original phrasing was too strong.

      To address this, we have toned down the language in the revised manuscript to better reflect the current state of our findings. Rather than presenting the results as a direct challenge to existing knowledge, we now describe them as contributing to a growing body of evidence that suggests the IC plays a more integrative role in auditory processing and cognitive functions.

      “This research highlights a more complex role for the IC than traditionally understood, showcasing its integral role in cognitive and sensory processing and emphasizing its importance in integrated brain functions.” (Abstract, P.3, Line. 12-15)

      “This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 3-5)

      (5) Line 143, "peak response": it is better not to refer to this transient response as a "peak response". How about "transient response" or "transient peak response"?

      Thank you for your suggestion regarding the terminology used in Line 143. We agree with the reviewer that referring to this as simply a "peak response" could be misleading. To improve clarity and precision, we have revised the term to "transient peak response" as recommended.

      We believe this adjustment better captures the nature of the neuronal activity observed and avoids confusion. The manuscript has been updated accordingly, and we appreciate the reviewer’s valuable input.

      (6) Is it possible to manipulate the IC and check the effect on the behavioral task?

      We appreciate the reviewer’s suggestion to manipulate the IC area and observe its effect on behavior during the task. Indeed, this would provide valuable causal evidence regarding the role of the IC in sensory prediction, decision-making, and reward processing, which would complement the correlational findings we have presented.

      However, in this particular study, we focused on electrophysiological recordings to observe naturally occurring neuronal activity in behaving monkeys. While it is certainly feasible to manipulate IC activity, such as through pharmacological inactivation, optogenetics, or electrical stimulation, these techniques pose technical challenges in primates. Moreover, manipulating the IC, given its role as a lower-level relay station in the auditory pathway, could potentially disrupt auditory processing more broadly, complicating the interpretation of behavioral outcomes.

      That said, we agree that introducing such manipulations in future studies would significantly enhance our understanding of the causal role of the IC in cognitive and sensory functions. We have now emphasized this as a key future research direction in the revised manuscript’s discussion section. Thank you for this insightful suggestion.

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      Reviewer #3 (Recommendations for the authors):

      Minor Comments:

      (1) Figure Labeling:

      The figures require more precise labeling, particularly concerning the analysis time windows, to facilitate reader understanding of the results.

      We thank the reviewer for highlighting the importance of precise figure labeling, particularly regarding the analysis time windows. We understand that clear labeling is critical for conveying our findings effectively.

      In response to your suggestion, we have revised the figures to include more precise and detailed labels, especially for the analysis time windows. These changes will help guide readers through the experimental design and clarify the interpretation of the results. We hope these improvements enhance the overall clarity and accessibility of the figures.

      (2) Discrepancies in Figures and Text:

      There are discrepancies in the manuscript that could confuse readers. For example, on line 154, what was referred to as Supplementary Figure 1 seemed to actually be Supplementary Figure 2. Similar issues were noted on lines 480 and 606.

      We appreciate the reviewer bringing this issue to our attention. We apologize for the discrepancies between the figures referenced in the text and their actual labels in the manuscript, as this could indeed confuse readers.

      We have carefully reviewed the entire manuscript and corrected all discrepancies between the figures and their corresponding references in the text, including the issues noted on lines 154, 480, and 606. We have ensured that the figure and supplementary figure references are now consistent and accurate throughout the manuscript.

      (3) Inconsistent Formatting in Figure Legends:

      Ensuring a more professional and uniform presentation throughout the manuscript would be appreciated. There was inconsistent use of uppercase and lowercase letters in legends.

      We appreciate the reviewer’s attention to detail regarding the formatting of figure legends. Ensuring a professional and consistent presentation is crucial for enhancing the readability and overall quality of the manuscript.

      We have carefully reviewed all figure legends and made the necessary corrections to ensure consistent use of uppercase and lowercase letters, as well as uniform formatting throughout the manuscript. This includes ensuring that all abbreviations and terminology are used consistently across the text and legends.

    1. eLife Assessment

      The paper presents a streamlined new approach for functional validation of genes known to underlie fragile bone disorders in a relatively high throughput, using CRISPR-mediated knockouts and a number of phenotypic assessments in zebrafish. Convincing data demonstrate the feasibility and validity of this approach, which presents an important tool for rapid functional validation of candidate gene(s) associated with heritable bone diseases identified from genetic studies.