  1. Dec 2024
    1. Reviewer #1 (Public review):

      Summary

      In this beautiful paper, the authors examined the role and function of NR2F2 in testis development, and more specifically in fetal Leydig cell (FLC) development. It is well known by now that FLCs develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism.

      The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two cKO systems of NR2F2, namely Wt1-creERT2 and Nr5a1-cre, to explore the phenotype caused by loss of NR2F2. In both cases, they found that mice present with undescended testes and a major reduction in FLC numbers. They show that NR2F2 has no effect on the number and expression of the progenitor cells, but in its absence there are fewer FLCs and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implicated in Leydig cell development, such as the DHH, PDGFRA and NOTCH pathways.

      Overall, this paper is excellent, very well written, fluent and clear. The data are well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow-up studies, as stated in the discussion.

    2. Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and throughout the Results section. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the defect in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all the authors' claims and conclusions are justified by their high-quality results.

    1. eLife Assessment

      This useful study provides the first assessment of potentially interactive effects of seasonality and blood source on mosquito fitness, together in one study. During revision, the manuscript has been improved, providing additional solid data to support the robustness of observations. However, the discussion still requires further refinement to present the conclusions in a manner that is consistent with the data presented. Overall, this interesting study will advance our current understanding of mosquito biology.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

    3. Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      (4) According to eLife policy, data must be made freely available (not just upon request).

    4. Author response:

      The following is the authors’ response to the previous reviews.

      We have carefully addressed all the reviewers' suggestions, and detailed responses are provided at the end of this letter. In summary:

      • We conducted two additional replicates of the study to obtain more robust and reliable data.

      • The Introduction has been revised for greater clarity and conciseness.

      • The Results section was shortened and reorganized to highlight the key findings more effectively.

      • The Discussion was modified according to the reviewers' suggestions, with a focus on reorganization and conciseness.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision: 

      Overall, I am not quite convinced about the possible shift in host use in the Argentinian populations of Cx. quinquefasciatus. The evidence from the papers that the authors cite is not strong enough to derive this conclusion. Therefore, I think that the introduction and discussion parts where they talk about host shift in Cx. quinquefasciatus should be removed completely as they mislead the readers. I suggest limiting the manuscript to talking only about the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quinquefasciatus.

      As mentioned in the previous revision, we agree with the reviewer's observation about the lack of evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from Argentina. We have included this topic in the discussion.

      Additionally, we added a paragraph to the discussion section describing the limitations of our study and conclusions. One of them is the fact that our results are based on experiments under controlled conditions. Future studies are needed to elucidate whether the same trend is found in the field.

      Reviewer #1 (Recommendations for the authors): 

      Abstract

      Line 73: shift in feeding behavior

      Accepted as suggested. 

      Discussion

      Line 258: addressed that

      Accepted as suggested.

      Line 263: blood is nutritionally richer

      Accepted as suggested.

      Reviewer #2 (Public Review): 

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer concerns, with several exceptions that continue to cause concern about the conclusions of the study.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field. 

      (3) The manuscript has become a lot clearer and easier to read with the revisions - thank you to the authors for working hard to make many of the suggested changes. 

      Weaknesses:

      (1) The authors have decided not to follow the suggestion of conducting experimental replicates of the study. This is understandable given the significant investment of resources and time necessary, however, it leaves the study lacking support. Experimental replication is an important feature of a strong study and helps to provide confidence that the observed patterns are real and replicable. Without replication, I continue to lack confidence in the conclusions of the study. 

      We included replicates as suggested.  

      (2) The authors have included some additional discussion about the counterintuitive nature of their results, but the paragraph discussing this in the discussion was confusing. I believe that this should be revised. This is a key point of the paper and needs to be clear to the reader.

      Revised as suggested. 

      (3) There should be more discussion of the host switching observed in the two studies conducted in Argentina referenced by the authors. Since host switching is the foundation for the hypothesis tested in this paper, it is important to fully explain what is currently known in Argentina. 

      Accepted as suggested.

      (4) In some cases, the explanations of referenced papers are not entirely accurate. For example, when referencing Erram et al 2022, I think the authors misrepresented the paper's discussion regarding pre-diuresis- Erram et al. are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility (rather than leading to higher fecundity on birds, as stated in this manuscript). The study performed by Erram et al. also didn't prove this phenomenon, they just suggest it as a possible mechanism to explain their results, so that should be made clear when referencing the paper. 

      Changed as suggested.

      (5) In some cases, the conclusions continue to be too strongly worded for the evidence available. For example, lines 322-324: I don't think the data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. 

      The wording was modified as suggested to tie our discussion more closely to the results.

      (6) There is limited mention of the caveat that this experiment performed with simulated seasonality that does not perfectly replicate seasonality in the field. I think this caveat should be discussed in the discussion (e.g. that humidity is held constant).

      This topic is now included in the discussion as suggested. 

      Reviewer #2 (Recommendations for the authors): 

      59-60: These terms should end with -phagic instead of -philic. These papers study blood feeding patterns, not preference. I understand that the Janssen papers calls it "mammalophilic" in their title, but this was an incorrect use of the term in their paper. There are some review papers that explain the difference in this terminology if it's helpful.

      Accepted as suggested. 

      73: edit to "in" feeding behavior 

      Accepted as suggested.

      77-78: Given that the premise of your study is based on the phenomenon of host switching, I suggest that you expand your discussion of these two papers. What did they observe? Which hosts did they switch from / to and how dramatic was the shift?

      Accepted as suggested. 

      79: replace acknowledged with experienced 

      Accepted as suggested.

      79-80: the way that this is written is misleading. It suggests that Spinsanti showed that seasonal variation in SLEV could be attributed to a host shift, which isn't true. This citation should come before the comma and then you should use more cautious language in the second half. E.g which MIGHT be possible to attribute to .... 

      Accepted as suggested.

      80-82: this is not convincing. Even if the Robin isn't in Argentina, Argentina does have migrating birds, so couldn't this be the case for other species of birds? Do any of the birds observed in previous blood meal analyses in Argentina migrate? If so, couldn't this hypothesis indeed play a role? 

      A paragraph about this topic was added to the discussion as suggested.

      90: hypotheses for what? The fall peak in cases? Or host switching? 

      Changed to be clearer.

      98: where was this mentioned before? I think "as mentioned before" can be removed. 

      Accepted as suggested.

      101: edit to "whether an interaction effect exists" 

      Accepted as suggested.

      104: edit to "We hypothesize that..." 

      Accepted as suggested.

      106: reported host USE changes, not host PREFERENCE changes, right? 

      All the terminology was changed to "host pattern" rather than "preference" to avoid confusion.

      200: Briefly reading Carsey and Harden, it looks like the methodology was developed for social science. Is there anything you can cite to show this applied to other types of data? If not, I think this requires more explanation in your MS. 

      This was removed as replicates were included.

      237-239: I think it is best not to make a definitive statement about greater/higher if it isn't statistically significant; I suggest modifying the sentences to state that the differences you are listing were not significantly different up front rather than at the end, otherwise if people aren't reading carefully, they may get the wrong impression. 

      Accepted as suggested.

      245: you only use the term MS-I once before and I forgot what it meant since it wasn't repeated, so I had to search back through with command-F. I suggest writing this out rather than using the acronym. 

      Accepted as suggested.

      249: edit to: "an interaction exists between the effect of..." 

      Accepted as suggested.

      253-254: greater compared to what? 

      Changed for clarity.

      258-260: edit for grammar

      Accepted as suggested.

      260-262: edit for grammar; e.g. "However, this assumption lacks solid evidence; there is a scarcity of studies regarding nutritional quality of avian blood and its impact on mosquito fitness." 

      Accepted as suggested.

      263: edit: blood is nutritionally... 

      Accepted as suggested.

      264-267: This doesn't sound like an accurate interpretation of what the paper suggests regarding pre-diuresis in their discussion - they are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility. They also don't show this, they just suggest it as a possible mechanism to explain their results. 

      This topic was removed given the restructuring of discussion.

      253-269: You should tie this paragraph back to your results to explicitly compare/contrast your findings with the previous literature. 

      Accepted as suggested.

      270-282: This paragraph would be a good place to explain the caveat of working in the laboratory - for example, humidity was the same across the two seasons which I'm guessing isn't the case in the field in Argentina. You can discuss what aspects of laboratory season simulation do not accurately replicate field conditions and how this can impact your findings. You said in your response to the reviewers that you weren't interested in measuring other variables (which is fair, and not expected!), but the beauty of the discussion section is to be able to think about how your experimental design might impact your results - one possibility is that your season simulation may not have produced the results produced by true seasonal shifts. 

      Accepted as suggested.

      279-281: You say your experiment was conducted within the optimal range, which would suggest that both summer and autumn were within that range, but then you only talk about summer as optimal in the following sentence. 

      Changed for clarity.

      281-282: You should clarify this sentence - state what the interaction has an effect on. 

      Accepted as suggested.

      283-291: I appreciate that your discussion now acknowledges the small sample size and the questions that remain unanswered due to the results being opposite to that of the hypothesis, but this paragraph lacks some details and in places doesn't make sense. 

      I think you need to emphasize which groups had small sample size and which conclusions that might impact. I also think you need to explain why the sample size was substantially smaller for some groups (e.g. did they refuse to feed on the mouse in the autumn?). I appreciate that sample sizes are hard to keep high across many groups and two gonotrophic periods, but unfortunately, that is why fitness experiments are so hard to do and by their nature, take a long time. I understand that other papers have even lower sample size, but I was not asked to review those papers and would have had the same critique of them. I don't believe that creating simulated data via a Monte Carlo approach can make up for generating real data. As I understand it from your explanation, you are parametrizing the Monte Carlo simulations with your original data, which was small to begin with for autumn mouse. Using this simulation doesn't seem like a satisfactory replacement for an experimental replicate in my opinion. I maintain that at least a second replicate is necessary to see whether the patterns that you have observed hold. 

      We performed a power analysis and added more replicates to address the issue of sample size. More on this criticism has been added to the discussion. The simulation approach was removed entirely.

      Regarding the directionality of the interaction effect, I think this warrants more discussion. Lines 287-291 don't make sense to me. You suggest that feeding on birds in the autumn may confer a reproductive advantage when conditions are more challenging. But then why wouldn't they preferentially feed on birds in the autumn, rather than mammals? I suggest rewriting this paragraph to make it clearer. 

      Accepted as suggested.

      297: earlier mentioned treatments? Do you mean compared to the first gonotrophic cycle? This isn't clear. 

      Changed for clarity.

      302-303: Did you clarify whether you are allowed to reference unpublished data in eLife? 

      This was removed to follow the guidelines of eLife.

      316-317: "it becomes apparent" sounds awkward, I suggest rewording and also explaining how this conclusion was made. 

      Accepted as suggested.

      322-324: I think that this statement is too strongly worded. I don't think your data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. Please modify this and make your conclusions more cautious and closely linked to what you actually demonstrated. 

      Accepted as suggested.

      325: change will perform to would have 

      Accepted as suggested.

      326: add to the sentence: "and vice versa in the summer" 

      Accepted as suggested.

      330: possible explanations, not explaining scenarios. 

      Accepted as suggested.

      517: I think you should repeat the abbreviation definitions in the caption to make it easier for readers, otherwise they have to flip back and forth which can be difficult depending on formatting.

      Accepted as suggested. 

      In general, I think that your captions need more information. I think the best captions explain the figure relatively thoroughly such that the reader can look at the figure and caption and understand without reading the paper in depth. (e.g. the statistical test used).

      Data availability: The eLife author instructions do say that data must be made available, so there should be a statement on data availability in your MS. I also suggest you make the code available.

      Accepted as suggested.

    1. eLife Assessment

      This useful study presents a genetically encoded barcoding system that could advance transcriptomic studies and that has the potential for further applications, such as in high-throughput population-scale behavioral measurements. The evidence supporting the claims of the authors is solid and highlights both the usefulness and the limitations of the approach.

    2. Reviewer #1 (Public review):

      The aim of this paper is to describe a novel method for genetic labelling of animals or cell populations, using a system of DNA/RNA barcodes.

      Strengths:

      • The authors' attempt at providing a straightforward method for multiplexing Drosophila samples prior to scRNA-seq is commendable. The perspective of being able to load multiple samples on a 10X Chromium without antibody labelling is appealing.

      • The authors are generally honest about potential issues in their method, and areas that would benefit from future improvement.

      • The article reads well. Graphs and figures are clear and easy to understand.

      Weaknesses:

      • The usefulness of TaG-EM for phototaxis, egg laying or fecundity experiments is questionable. The behaviours presented here are all easily quantifiable, either manually or using automated image-based quantification, even when they include a relatively large number of groups and replicates. Despite their claims (e.g., L311-313), the authors do not present any real evidence about the cost- or time-effectiveness of their method in comparison to existing quantification methods.

      • Behavioural assays presented in this article have clear outcomes, with large effect sizes, and therefore do not really challenge the efficiency of TaG-EM. By showing a T-maze in Fig 1B, the authors suggest that their method could be used to quantify more complex behaviours. Not exploring this possibility in this manuscript seems like a missed opportunity.

      • Experiments in Figs S3 and S6 suggest that some tags have a detrimental effect on certain behaviours or on GFP expression. Whereas the authors rightly acknowledge these issues, they do not investigate their causes. Unfortunately, this calls into question the overall suitability of TaG-EM, as other barcodes may also affect certain aspects of the animal's physiology or behaviour. Revising barcode design will be crucial to make sure that sequences with potential regulatory function are excluded.

      • For their single-cell experiments, the authors have used the 10X Genomics method, which relies on sequencing just a short segment of each transcript (usually 50-250bp - unknown for this study as read length information was not provided) to enable its identification, with the matching paired-end read providing cell barcode and UMI information (Macosko et al., 2015). With average fragment length after tagmentation usually ranging from 300-700bp, a large number of GFP reads will likely not include the 14bp TaG-EM barcode. When a given cell barcode is not associated with any TaG-EM barcode, then demultiplexing is impossible. This is a major problem, which is particularly visible in Figs 5 and S13. In 5F, BC4 is only detected in a couple of dozen cells, even though the Jon99Ciii marker of enterocytes is present in a much larger population (Fig 5C). Therefore, in this particular case, TaG-EM fails to detect most of the GFP-expressing cells. Similarly, in S13, most cells should express one of the four barcodes, however many of them (maybe up to half - this should be quantified) do not. Therefore, the claim (L277-278) that "the pan-midgut driver were broadly distributed across the cell clusters" is misleading. Moreover, the hypothesis that "low expressing driver lines may result in particularly sparse labelling" (L331-333) is at least partially wrong, as Fig S13 shows that the same Gal4 driver can lead to very different levels of barcode coverage.

      • Comparisons between TaG-EM and other, simpler methods for labelling individual cell populations are missing. For example, how would TaG-EM compare with expression of different fluorescent reporters, or a strategy based on the brainbow/flybow principle?

      • FACS data is missing throughout the paper. The authors should include data from their comparative flow cytometry experiment of TaG-EM cells with or without additional hexameric GFP, as well as FSC/SSC and fluorescence scatter plots for the FACS steps that they performed prior to scRNA-seq, at least in supplementary figures.

      • The authors should show all of the data described in L229, including the cluster that they chose to delete. At least, they should provide more information about how many cells were removed. In any case, the fact that their data still contains a large amount of debris and dead cells despite sorting out PI negative cells with FACS and filtering low abundance barcodes with Cellranger is concerning.
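
      The read-length concern raised in the weaknesses above can be illustrated with a rough sketch. It is not the authors' analysis and not part of the review. It assumes that the cDNA read starts at the fragmented end of each library fragment and runs toward the poly(A) tail, that fragment lengths are uniformly distributed between 300 and 700 bp, and that the 14 bp barcode sits at a hypothetical distance `barcode_offset` upstream of the poly(A) junction. The function names, the offset and the read lengths are illustrative choices only.

```python
def barcode_covered(fragment_len, read_len, barcode_offset, barcode_len=14):
    """Return True if the read spans the whole barcode.

    Positions are distances in bp upstream of the poly(A) junction (0-based).
    Assumption: the read covers the read_len bases at the fragmented
    (poly(A)-distal) end of the fragment.
    """
    read_start = max(fragment_len - read_len, 0)
    read_end = fragment_len - 1
    bc_start = barcode_offset
    bc_end = barcode_offset + barcode_len - 1
    return read_start <= bc_start and bc_end <= read_end


def covered_fraction(read_len, barcode_offset, min_frag=300, max_frag=700,
                     barcode_len=14):
    """Fraction of fragment lengths (assumed uniform) whose read spans the barcode."""
    lengths = range(min_frag, max_frag + 1)
    hits = sum(
        barcode_covered(f, read_len, barcode_offset, barcode_len)
        for f in lengths
    )
    return hits / len(lengths)


if __name__ == "__main__":
    # Hypothetical read lengths and barcode offsets, for illustration only.
    for read_len in (50, 90, 250):
        for offset in (100, 300):
            frac = covered_fraction(read_len, offset)
            print(f"read={read_len} bp, offset={offset} bp: {frac:.2f}")
```

      Under these assumptions, only fragments whose distal end falls within one read length of the barcode can report it, so short reads or a barcode close to the poly(A) tail would leave many GFP reads without barcode sequence. Real fragment-length distributions are not uniform, so the printed fractions are indicative at best.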

      Overall, although a method for genetically tagging cell populations prior to multiplexing in single-cell experiments would be extremely useful, the method presented here is inadequate. However, despite all the weaknesses listed above, the idea of barcodes expressed specifically in cells of interest deserves more consideration. If the authors manage to improve their design to resolve the major issues and demonstrate the benefits of their method more clearly, then TaG-EM could become an interesting option for certain applications.

      Comments on revisions:

      The authors have addressed many important points, providing reassurances about the initial weaknesses of their work. Although TaG-EM is unlikely to have a significant influence on the field due to its limited benefits, the results are now sound and provide the reader with an unbiased view of the possibilities and limitations of the method.

    3. Reviewer #2 (Public review):

      The authors developed the TaG-EM system to address challenges in multiplexing Drosophila samples for behavioral and transcriptomic studies. This system integrates DNA barcodes upstream of the polyadenylation site in a UAS-GFP construct, enabling pooled behavioral measurements and cell type tracking in scRNA-seq experiments. The revised manuscript expands on the utility of TaG-EM by demonstrating its application to complex assays, such as larval gut motility, and provides a refined analysis of its limitations and cost-effectiveness.

      Strengths

      (1) Novelty and Scope: The study demonstrates the potential for TaG-EM to streamline multiplexing in both behavioral and transcriptomic contexts. The additional application to labor-intensive larval gut motility assays highlights its scalability and practical utility.

      (2) Data Quality and Clarity: Figures and supplemental data are mostly clear and significantly enhanced in the revised manuscript. The addition of Supplemental Figures 18-21 addresses initial concerns about scRNA-seq data and driver characterization.

      (3) Cost-Effectiveness Analysis: New analyses of labor and cost savings (e.g., Supplemental Figure 8) provide a practical perspective.

      (4) Improvements in Barcode Detection and Analysis: Enhanced enrichment protocols (Supplemental Figures 18-19) demonstrate progress in addressing limitations of barcode detection and increase the detection rate of labeled cells.

      Weaknesses

      (1) Barcode Detection Efficiency: While improvements are noted, the low barcode detection rate (~37% in optimized conditions) limits the method's scalability in some applications, such as single-cell sequencing experiments with complex cell populations.

      (2) Sparse Labeling: Sparse labeling of cell populations, particularly in scRNA-seq assays, remains a concern. Variability in driver strength and regional expression introduces inconsistencies in labeling density.

      (3) Behavioral Applications: The utility of TaG-EM in quantifying more complex behaviors remains underexplored, limiting the generalizability of the method beyond simpler assays like phototaxis and oviposition.

      (4) Driver Line Characterization: While improvements in driver line characterization were made, variability in expression patterns and sparse labeling emphasize the need for further refinement of constructs and systematic backcrossing to standardize the genetic background.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this paper is to describe a novel method for genetic labelling of animals or cell populations, using a system of DNA/RNA barcodes.

      Strengths:

      • The authors' attempt at providing a straightforward method for multiplexing Drosophila samples prior to scRNA-seq is commendable. The perspective of being able to load multiple samples on a 10X Chromium without antibody labelling is appealing.

      • The authors are generally honest about potential issues in their method, and areas that would benefit from future improvement.

      • The article reads well. Graphs and figures are clear and easy to understand.

      We thank the reviewer for these positive comments.

      Weaknesses:

      • The usefulness of TaG-EM for phototaxis, egg laying or fecundity experiments is questionable. The behaviours presented here are all easily quantifiable, either manually or using automated image-based quantification, even when they include a relatively large number of groups and replicates. Despite their claims (e.g., L311-313), the authors do not present any real evidence about the cost- or time-effectiveness of their method in comparison to existing quantification methods.

      While the behaviors that were quantified in the original manuscript were indeed relatively easy to quantify through other methods, they nonetheless demonstrated that sequencing-based TaG-EM measurements faithfully recapitulated manual behavioral measurements. In response to the reviewer’s comment, we have added additional experiments that demonstrate the utility of TaG-EM-based behavioral quantification in the context of a more labor-intensive phenotypic assay (measuring gut motility via food transit times in Drosophila larvae, Figure 4, Supplemental Figure 7). We found that food transit times in the presence and absence of caffeine are subtly different and that, as with larger effect size behaviors, TaG-EM data recapitulate the results of the manual assay. This experiment demonstrates both that TaG-EM can be used to streamline labor-intensive behavioral assays (we have included an estimate of the savings in hands-on labor for this assay by using a multiplexed sequencing approach, Supplemental Figure 8) and that TaG-EM can quantify small differences between experimental groups. We also note in the discussion that an additional benefit of TaG-EM-based behavioral assays is that the observer is blinded to the experimental conditions, as they are intermingled in a single multiplexed assay. We have added the following text to the paper describing these experiments.

      Results:

      “Quantifying food transit time in the larval gut using TaG-EM

      Gut motility defects underlie a number of functional gastrointestinal disorders in humans (Keller et al., 2018). To study gut motility in Drosophila, we have developed an assay based on the time it takes a food bolus to transit the larval gut (Figure 4A), similar to approaches that have been employed for studying the role of the microbiome in human gut motility (Asnicar et al., 2021). Third instar larvae were starved for 90 minutes and then fed food containing a blue dye. After 60 minutes, larvae in which a blue bolus of food was visible were transferred to plates containing non-dyed food, and food transit (indicated by loss of the blue food bolus) was scored every 30 minutes for five hours (Supplemental Figure 7). 

      Because this assay is highly labor-intensive and requires hands-on effort for the entire five-hour observation period, there is a limit on how many conditions or replicates can be scored in one session (~8 plates maximum). Thus, we decided to test whether food transit could be quantified in a more streamlined and scalable fashion by using TaG-EM (Figure 4B). Using the manual assay, we observed that while caffeine-containing food is aversive to larvae, the presence of caffeine reduces transit time through the gut (Figure 4C, Supplemental Figure 7). This is consistent with previous observations in adult flies that bitter compounds (including caffeine) activate enteric neurons via serotonin-mediated signaling and promote gut motility (Yao and Scott, 2022). We tested whether TaG-EM could be used to measure the effect of caffeine on food transit time in larvae. As with prior behavioral tests, the TaG-EM data recapitulated the results seen in the manual assay (Figure 4D). Conducting the transit assay via TaG-EM enables several labor-saving steps. First, rather than counting the number of larvae with and without a food bolus at each time point, one simply needs to transfer non-bolus-containing larvae to a collection tube. Second, because the TaG-EM lines are genetically barcoded, all the conditions can be tested at once on a single plate, removing the need to separately count each replicate of each experimental condition. This reduces the hands-on time for the assay to just a few minutes per hour. A summary of the anticipated cost and labor savings for the TaG-EM-based food transit assay is shown in Supplemental Figure 8.”

      Discussion:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8). An additional benefit of multiplexed TaG-EM behavioral measurements is that the experimental conditions are effectively blinded as the multiplexed conditions are intermingled in a single assay.”

      Methods:

      “Larval gut motility experiments

      Preparing Yeast Food Plates

      Yeast agar plates were prepared by making a solution containing 20% Red Star Active Dry Yeast 32oz (Red Star Yeast) and 2.4% Agar Powder/Flakes (Fisher) and a separate solution containing 20% Glucose (Sigma-Aldrich). Both mixtures were autoclaved with a 45-minute liquid cycle and then transferred to a water bath at 55ºC. After cooling to 55ºC, the solutions were combined and mixed, and approximately 5 mL of the combined solution was transferred into 100 x 15 mm petri dishes (VWR) in a PCR hood or contamination-free area. For blue-dyed yeast food plates, 0.4% Blue Food Color (McCormick) was added to the yeast solution. For the caffeine assays, 300 µL of a solution of 100 mM 99% pure caffeine (Sigma-Aldrich) was pipetted onto the blue-dyed yeast plate and allowed to absorb into the food during the 90-minute starvation period.

      Manual Gut Motility Assay

      Third instar Drosophila larvae were transferred to empty conical tubes that had been misted with water to prevent the larvae from drying out. After a 90-minute starvation period the larvae were moved from the conical tube to a blue-dyed yeast plate with or without caffeine and allowed to feed for 60 minutes. Following the feeding period, the larvae were transferred to an undyed yeast plate. Larvae were scored for the presence or absence of a food bolus every 30 minutes over a 5-hour period. Up to 8 experimental replicates/conditions were scored simultaneously.

      TaG-EM Gut Motility Assay

      Third instar larvae were starved and fed blue dye-containing food with or without caffeine as described above. An equal number of larvae from each experimental condition/replicate were transferred to an undyed yeast plate. During the 5-hour observation period, larvae were examined every 30 minutes and larvae lacking a food bolus were transferred to a microcentrifuge tube labeled for the timepoint. Any larvae that died during the experiment were placed in a separate microcentrifuge tube and any larvae that failed to pass the food bolus were transferred to a microcentrifuge tube at the end of the experiment. DNA was extracted from the larvae in each tube and TaG-EM barcode libraries were prepared and sequenced as described above.”
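
      The per-timepoint tubes described above lend themselves to a simple cumulative analysis. The following is a minimal sketch, not the authors' pipeline: it assumes that barcode read counts in each tube are roughly proportional to the number of larvae of that barcode in the tube, that dead larvae are excluded, and that larvae which never passed the bolus count toward the total only. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def cumulative_transit(counts_by_timepoint, unpassed=None):
    """Return {barcode: [(minutes, cumulative_fraction), ...]}.

    counts_by_timepoint: {minutes: {barcode: reads}} for the timepoint tubes.
    unpassed: optional {barcode: reads} for larvae that never passed the bolus
              (assumption: these add to the denominator only).
    """
    totals = defaultdict(float)
    for tube in counts_by_timepoint.values():
        for barcode, reads in tube.items():
            totals[barcode] += reads
    for barcode, reads in (unpassed or {}).items():
        totals[barcode] += reads

    curves = {}
    for barcode, total in totals.items():
        if total == 0:
            continue  # barcode never observed; no curve can be computed
        running = 0.0
        curve = []
        for minutes in sorted(counts_by_timepoint):
            running += counts_by_timepoint[minutes].get(barcode, 0)
            curve.append((minutes, running / total))
        curves[barcode] = curve
    return curves


def median_transit_time(curve):
    """First timepoint at which at least half of the signal has passed, else None."""
    for minutes, fraction in curve:
        if fraction >= 0.5:
            return minutes
    return None


# Made-up example counts for two barcoded conditions.
example_counts = {
    30: {"BC1": 10, "BC2": 0},
    60: {"BC1": 30, "BC2": 20},
    90: {"BC1": 10, "BC2": 30},
}
example_unpassed = {"BC2": 50}
example_curves = cumulative_transit(example_counts, example_unpassed)
```

      In this toy input, the condition tagged BC1 reaches the halfway point one timepoint earlier than BC2, which is the kind of shift the caffeine comparison looks for.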

      • Behavioural assays presented in this article have clear outcomes, with large effect sizes, and therefore do not really challenge the efficiency of TaG-EM. By showing a T-maze in Fig 1B, the authors suggest that their method could be used to quantify more complex behaviours. Not exploring this possibility in this manuscript seems like a missed opportunity.

      See the response to the previous point.

      • Experiments in Figs S3 and S6 suggest that some tags have a detrimental effect on certain behaviours or on GFP expression. Whereas the authors rightly acknowledge these issues, they do not investigate their causes. Unfortunately, this questions the overall suitability of TaG-EM, as other barcodes may also affect certain aspects of the animal's physiology or behaviour. Revising barcode design will be crucial to make sure that sequences with potential regulatory function are excluded.

      We have determined that the barcode (BC#8) that had no detectable Gal4-induced gene expression in Figure S6 (now Supplemental Figure 9) has a deletion in the GFP coding region that ablates GFP function. Interestingly, the expressed TaG-EM barcode transcript is still detectable in single cell sequencing experiments, but obviously this line cannot be used for cell enrichment (at least based solely on GFP expression from the TaG-EM construct). While it is unclear how this line came to have a lesion in the GFP gene, we have subsequently generated >150 additional TaG-EM stocks and we have tested the GFP expression of these newly established stocks by crossing them to Mhc-Gal4. All of the additional stocks had GFP expression in the expected pattern, indicating that the BC#8 construct is an outlier with respect to inducibility of GFP. We have added the following text to the results section to address this point:

      “No GFP expression was visible for TaG-EM barcode number 8, which upon molecular characterization had an 853 bp deletion within the GFP coding region (data not shown). We generated and tested GFP expression of an additional 156 TaG-EM barcode lines (Alegria et al., 2024), by crossing them to Mhc-Gal4 and observing expression in the adult thorax. All 156 additional TaG-EM lines had robust GFP expression (data not shown).”

      It is certainly the case that future improvements to the construct design may be necessary or desirable and that back-crossing could likely be used to alleviate line-to-line differences for specific phenotypes. We also address this point in the discussion with the following text:

      “We excluded this poor performing barcode line from the fecundity tests, however, backcrossing is often used to bring reagents into a consistent genetic background for behavioral experiments and could also potentially be used to address behavior-specific issues with specific TaG-EM lines. In addition, other strategies such as averaging across multiple barcode lines or permutation of barcode assignment across replicates could also mitigate such deficiencies.”

      • For their single-cell experiments, the authors have used the 10X Genomics method, which relies on sequencing just a short segment of each transcript (usually 50-250bp - unknown for this study as read length information was not provided) to enable its identification, with the matching paired-end read providing cell barcode and UMI information (Macosko et al., 2015). With average fragment length after tagmentation usually ranging from 300-700bp, a large number of GFP reads will likely not include the 14bp TaG-EM barcode. 

      The 10x Genomics 3’ workflows that were used for sequencing TaG-EM samples read the cell barcode and UMI in read one and the expressed RNA sequence in read two. We sequenced the samples shown in Figure 5 in the initial manuscript using a run configuration that generated 150 bp for read two. The TaG-EM barcodes are located just upstream of the poly-adenylation sites (based on the sequencing data, we observe two different poly-A sites and the TaG-EM barcode is located 35 and 60 bp upstream of these sites). Based on the location of the TaG-EM barcodes, 150 bp reads are sufficient to see the barcode in any GFP-associated read (when using the 3’ gene expression workflow). In addition to detecting the expression of the TaG-EM barcodes in the 10x Genomics gene expression library, it is possible to make a separate library that enriches the barcode sequence (similar to hashtag or CITE-Seq feature barcode libraries). We have added experimental data where we successfully performed an enrichment of the TaG-EM barcodes and sequenced this as a separate hashtag library (Supplemental Figure 18). We have added text to the results describing this work and also included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep.

      Results:

      “In antibody-conjugated oligo cell hashing approaches, sparsity of barcode representation is overcome by spiking in an additional primer at the cDNA amplification step and amplifying the hashtag oligo by PCR. We employed a similar approach to attempt to enrich for TaG-EM barcodes in an additional library sequenced separately from the 10x Genomics gene expression library. Our initial attempts at barcode enrichment using spike-in and enrichment primers corresponding to the TaG-EM PCR handle were unsuccessful (Supplemental Figure 18). However, we subsequently optimized the TaG-EM barcode enrichment by 1) using a longer spike-in primer that more closely matches the annealing temperature used during the 10x Genomics cDNA creation step, and 2) using a nested PCR approach to amplify the cell-barcode and unique molecular identifier (UMI)-labeled TaG-EM barcodes (Supplemental Figure 18). Using the enriched library, TaG-EM barcodes were detected in nearly 100% of the cells at high sequencing depths (Supplemental Figure 19). However, although we used a polymerase that has been engineered to have high processivity and that has been shown to reduce the formation of chimeric reads in other contexts (Gohl et al., 2016), it is possible that PCR chimeras could lead to unreliable detection events for some cells. Indeed, many cells had a mixture of barcodes detected with low counts and single or low numbers of associated UMIs. To assess the reliability of detection, we analyzed the correlation between barcodes detected in the gene expression library and the enriched TaG-EM barcode library as a function of the purity of TaG-EM barcode detection for each cell (the percentage of the most abundant detected TaG-EM barcode, Supplemental Figure 19). For TaG-EM barcode detections where the most abundant barcode was a high percentage of the total barcode reads detected (~75%-99.99%), there was a high correlation between the barcode detected in the gene expression library and the enriched TaG-EM barcode library. Below this threshold, the correlation was substantially reduced.

      In the enriched library, we identified 26.8% of cells with a TaG-EM barcode reliably detected, a very modest improvement over the gene expression library alone (23.96%), indicating that at least for this experiment, the main constraint is sufficient expression of the TaG-EM barcode and not detection. To identify TaG-EM barcodes in the combined data set, we counted a positive detection as any barcode either identified in the gene expression library or any barcode identified in the enriched library with a purity of >75%. In the case of conflicting barcode calls, we assigned the barcode that was detected directly in the gene expression library. This increased the total fraction of cells where a barcode was identified to approximately 37% (Figure 6B).”
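
      The combined calling rule described in this passage can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the authors' repository: it assumes that each cell has at most one barcode call from the gene expression library, that enriched-library data are available as per-cell read counts per barcode, and that purity is the top barcode's share of those reads. The function names and data layout are hypothetical.

```python
def purity(counts):
    """Return (top_barcode, fraction of reads assigned to it) for one cell."""
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    top = max(counts, key=counts.get)
    return top, counts[top] / total


def call_barcodes(gex_calls, enriched_counts, min_purity=0.75):
    """Combine barcode calls from the two libraries.

    gex_calls: {cell_barcode: TaG-EM barcode} from the gene expression library.
    enriched_counts: {cell_barcode: {TaG-EM barcode: reads}} from the enriched library.
    A gene expression call always wins; otherwise the enriched call is accepted
    only when its purity is strictly above min_purity.
    """
    calls = {}
    for cell in set(gex_calls) | set(enriched_counts):
        if cell in gex_calls:
            calls[cell] = gex_calls[cell]  # precedence in case of conflict
            continue
        top, fraction = purity(enriched_counts[cell])
        if top is not None and fraction > min_purity:
            calls[cell] = top
    return calls


# Made-up example: c1 conflicts between libraries, c2 is pure, c3 is mixed.
example_gex = {"c1": "BC4"}
example_enriched = {
    "c1": {"BC7": 90, "BC4": 10},
    "c2": {"BC2": 80, "BC3": 20},
    "c3": {"BC1": 6, "BC5": 4},
}
example_calls = call_barcodes(example_gex, example_enriched)
```

      Cells such as c3, where no barcode dominates, stay unassigned, which is why the fraction of called cells remains well below the near-100% raw detection seen in the enriched library.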

      Methods:

      “The resulting pool was prepared for sequencing following the 10x Genomics Single Cell 3’ protocol (version CG000315 Rev C). At step 2.2 of the protocol, cDNA amplification, 1 µl of TaG-EM spike-in primer (10 µM) was added to the reaction to amplify cDNA with the TaG-EM barcode. Gene expression cDNA and TaG-EM cDNA were separated using a double-sided SPRIselect (Beckman Coulter) bead clean up following 10x Genomics Single Cell 3’ Feature Barcode protocol, step 2.3 (version CG000317 Rev E). The gene expression cDNA was made into a library following the CG000315 Rev C protocol starting at section 3. Custom nested primers were used for enrichment of TaG-EM barcodes after cDNA creation using PCR. The following primers were tested (see Supplemental Figure 18):

      UMGC_IL_TaGEM_SpikeIn_v1:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v2:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v3:

      TGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      D701_TaGEM:

      CAAGCAGAAGACGGCATACGAGATCGAGTAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGC*T*T

      SI PCR Primer:

      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGC*T*C

      UMGC_IL_DoubleNest:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGG*A*A

      P5: AATGATACGGCGACCACCGA

      D701:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATGCCGTCTTCTGCTTG

      D702:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAATCTCGTATGCCGTCTTCTGCTTG

      After multiple optimization trials, the following steps yielded ~96% on-target reads for the TaG-EM library (Supplemental Figure 18; note that for the enriched barcode data shown in Figure 6 and Supplemental Figure 19, a similar amplification protocol was used, but TaG-EM barcodes were amplified from the gene expression library cDNA and not the SPRI-selected barcode pool). TaG-EM cDNA was amplified with the following PCR reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl UMGC_IL_DoubleNest primer (10 µM), 2.5 µl SI_PCR primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 15 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the first PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µl of nuclease-free water. A second round of PCR was run with the following reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl D702 primer (10 µM), 2.5 µl P5 primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 10 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the second PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µl of nuclease-free water. The resulting 3’ gene expression library and TaG-EM enrichment library were sequenced together following Scenario 1 of the BioLegend “Total-Seq-A Antibodies and Cell Hashing with 10x Single Cell 3’ Reagents Kit v3 or v3.1” protocol. Additional sequencing of the enriched TaG-EM library was also done following Scenario 2 from the same protocol.”
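
      As a quick sanity check on the first-round PCR recipe above, the stated volumes can be summed and the final concentrations derived. This is plain dilution arithmetic on the numbers given in the text; the variable and function names are hypothetical.

```python
# Volumes in µl, as listed for the first-round PCR.
volumes_ul = {
    "purified TaG-EM cDNA": 5.0,
    "2x KAPA HiFi ReadyMix": 50.0,
    "UMGC_IL_DoubleNest primer (10 µM)": 2.5,
    "SI_PCR primer (10 µM)": 2.5,
    "nuclease-free water": 40.0,
}

total_ul = sum(volumes_ul.values())


def final_concentration(stock_conc, volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * volume_ul / total_volume_ul


# Each primer: 10 µM stock, 2.5 µl in the full reaction.
primer_final_um = final_concentration(10.0, 2.5, total_ul)
# Master mix: 2x stock, 50 µl in the full reaction.
readymix_final_x = final_concentration(2.0, 50.0, total_ul)
```

      The components add up to a 100 µl reaction, so the 2x master mix ends at 1x and each primer at 0.25 µM. The second-round reaction uses the same volumes with different primers.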

      When a given cell barcode is not associated with any TaG-EM barcode, then demultiplexing is impossible. This is a major problem, which is particularly visible in Figs 5 and S13. In 5F, BC4 is only detected in a couple of dozen cells, even though the Jon99Ciii marker of enterocytes is present in a much larger population (Fig 5C). Therefore, in this particular case, TaG-EM fails to detect most of the GFP-expressing cells. 

      Figure 5 in the original manuscript represented data from an experiment in which there were eight different TaG-EM barcoded samples present, including four replicates of the pan-midgut driver (each of which included enterocyte populations). One would not expect the BC4 enterocyte driver expression to be observed in all of the Jon99Ciii cells, since the majority of the GFP+ cells shown in the UMAP plot were likely derived from and are labeled by the pan-midgut driver-associated barcodes. Thus, the design and presentation of this particular experiment (in particular, the presence of eight distinct samples in the data set) is making the detection of the TaG-EM barcodes look sparser than it actually is. We have added a panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, for this experiment.

      However, the reviewer’s overall point regarding barcode detection is still valid in that if we consider all eight barcodes, we only see TaG-EM barcode labeling associated with about a quarter of all the cells in this gene expression library, or about 37% of cells when we include the enriched TaG-EM barcode library. While improving barcode detection will improve the yield and is necessary for some applications (such as robust detection of multiplets), we would argue that even at the current level of success this approach has significant utility. First, if one’s goal is to unambiguously label a cell cluster and trace it to a defined cell population in vivo, sparse labeling may be sufficient. Second, demultiplexing is still possible (as we demonstrate) but involves a trade-off in yield (not every cell is recovered and there is some extra sequencing cost as some sequenced cells cannot be assigned to a barcode).

      Similarly, in S13, most cells should express one of the four barcodes, however many of them (maybe up to half - this should be quantified) do not. Therefore, the claim (L277-278) that "the pan-midgut driver were broadly distributed across the cell clusters" is misleading. Moreover, the hypothesis that "low expressing driver lines may result in particularly sparse labelling" (L331-333) is at least partially wrong, as Fig S13 shows that the same Gal4 driver can lead to very different levels of barcode coverage.

      As described above, since this experiment included eight different TaG-EM barcodes expressed by five different drivers, the expectation is that only about half of the cells in Figure S13 (now Figure S20) should express a TaG-EM barcode. It is not clear why BC2 is underrepresented in terms of the number of cells labeled and BC7 is overrepresented. We agree with the reviewer that this should be described more accurately in the paper and that it does impact our interpretation related to driver strength and barcode detection. We have revised this sentence in the discussion and also added additional text in the results describing the within-driver variability seen in this experiment.

      Results text:

      “As expected, the barcodes expressed by the pan-midgut driver were broadly distributed across the cell clusters (Supplemental Figure 20). However, the number of cells recovered varied significantly among the four pan-midgut driver associated barcodes.”

      Discussion text:

      “It is likely that the strength of the Gal4 driver contributes to the labeling density. However, we also observed variable recovery of TaG-EM barcodes that were all driven by the same pan-midgut Gal4 driver (Supplemental Figure 20).”

      • Comparisons between TaG-EM and other, simpler methods for labelling individual cell populations are missing. For example, how would TaG-EM compare with expression of different fluorescent reporters, or a strategy based on the brainbow/flybow principle?

      The advantage of TaG-EM is that an arbitrarily large number of DNA barcodes can be used (contingent upon the availability of transgenic lines – we described 20 barcoded lines in our initial manuscript and we have now extended this collection to over 170 lines), while the number of distinguishable FPs is much lower. Brainbow/Flybow uses combinatorial expression of different FPs, but because this combinatorial expression is stochastic, tracing a single cell transcriptome to a defined cell population in vivo based on the FP signature of a Brainbow animal would likely not be possible (and would almost certainly be impossible at scale).

      • FACS data is missing throughout the paper. The authors should include data from their comparative flow cytometry experiment of TaG-EM cells with or without additional hexameric GFP, as well as FSC/SSC and fluorescence scatter plots for the FACS steps that they performed prior to scRNA-seq, at least in supplementary figures.

      We have added Supplemental Figures with the FACS data for all of the single cell sequencing data presented in the manuscript (Supplemental Figures 12 and 14).

      • The authors should show the whole data described in L229, including the cluster that they chose to delete. At least, they should provide more information about how many cells were removed. In any case, the fact that their data still contains a large number of debris and dead cells despite sorting out PI negative cells with FACS and filtering low abundance barcodes with Cellranger is concerning.

      This description was referring to the unprocessed Cellranger output (not filtered for low abundance barcodes). Prior to filtering for cell barcodes with high mitochondrial or rRNA content (or other processing in Seurat/Scanpy), we saw two clusters, one with low UMI counts and enrichment of mitochondrial genes (see Cellranger report below).

      Author response image 1.

      These cell barcodes were removed by downstream quality filtering and the remaining cells showed expression of expected intestinal stem cell and enteroblast marker genes.

      Overall, although a method for genetic tagging cell populations prior to multiplexing in single-cell experiments would be extremely useful, the method presented here is inadequate. However, despite all the weaknesses listed above, the idea of barcodes expressed specifically in cells of interest deserves more consideration. If the authors manage to improve their design to resolve the major issues and demonstrate the benefits of their method more clearly, then TaG-EM could become an interesting option for certain applications.

      We thank the reviewer for this comment and hope that the above responses and additional experiments and data that we have added have helped to alleviate the noted weaknesses.

      Reviewer #2 (Public Review):

      In this manuscript, Mendana et al developed a multiplexing method - Targeted Genetically-Encoded Multiplexing or TaG-EM - by inserting a DNA barcode upstream of the polyadenylation site in a Gal4-inducible UAS-GFP construct. This Multiplexing method can be used for population-scale behavioral measurements or can potentially be used in single-cell sequencing experiments to pool flies from different populations. The authors created 20 distinctly barcoded fly lines. First, TaG-EM was used to measure phototaxis and oviposition behaviors. Then, TaG-EM was applied to the fly gut cell types to demonstrate its applications in single-cell RNA-seq for cell type annotation and cell origin retrieving.

      This TaG-EM system can be useful for multiplexed behavioral studies from next-generation sequencing (NGS) of pooled samples and for transcriptomic studies. I don't have major concerns for the first application, but I think the scRNA-seq part has several major issues and needs to be further optimized.

      Major concerns:

      (1) It seems the barcode detection rate is low according to Fig S9 and Fig 5F, J and N. Could the authors evaluate the detection rate? If the detection rate is too low, it can cause problems when it is used to decode cell types.

      See responses to Reviewer #1 on this topic above.  

      (2) Unsuccessful amplification of TaG-EM barcodes: The authors attempted to amplify the TaG-EM barcodes in parallel to the gene expression library preparation but encountered difficulties, as the resulting sequencing reads were predominantly off-target. This unsuccessful amplification raises concerns about the reliability and feasibility of this amplification approach, which could affect the detection and analysis of the TaG-EM barcodes in future experiments.

      As noted above, we have now established a successful amplification protocol for the TaG-EM barcodes. These data are shown in Figure 6 and Supplemental Figures 18-19, and we have included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep. We have also included code in the paper’s GitHub repository for assigning TaG-EM barcodes from the enriched library to the associated 10x Genomics cell barcodes.

      (3) For Fig 5, the single-cell clusters are not annotated. It is not clear which cell types correspond to which clusters. So, it is difficult to evaluate the accuracy of the assignment of barcodes.

      We have added annotation information for the cell clusters based on expression of cell-type-specific marker genes (Figure 6A, Supplemental Figures 16-17).

      (4) The scRNA-seq UMAP in Fig 5 is a bit strange to me. The fly gut epithelium contains only a few major cell types, including ISC, EB, EC, and EE. However, the authors showed 38 clusters in Fig 5B. It is true that some cell types, like EE (Guo et al., 2019, Cell Reports), have sub-populations, but I don't expect they will form this many subtypes. There are many peripheral small clusters that are not shown in other gut scRNA-seq studies (Hung et al., 2020; Li et al., 2022 Fly Cell Atlas; Lu et al., 2023 Aging Fly Cell Atlas). I suggest the authors try different data-processing methods to validate their clustering result.

      For all of the single cell experiments, after doublet and ambient RNA removal (as suggested below), we have reclustered the datasets and evaluated different resolutions using Clustree. As the Reviewer points out, there are different EE subtypes, as well as regionalized expression differences in EC and other cell populations, so more than four clusters are expected (an analysis of the adult midgut identified 22 distinct cell types). With this revised analysis our results more closely match the cell populations observed in other studies (though it should be noted that the referenced studies largely focus on the adult and not the larval stage).  

      (5) Different gut drivers, PMC-, PC-, EB-, EC-, and EE-GAL4, were used. The authors should carefully characterize these GAL4 expression in larval guts and validate sequencing data. For example, does the ratio of each cell type in Fig 5B reflect the in vivo cell type ratio? The authors used cell-type markers mostly based on the knowledge from adult guts, but there are significant morphological and cell ratio differences between larval and adult guts (e.g., Mathur...Ohlstein, 2010 Science).

      We have characterized in detail, in larval guts, the PC driver (highlighted in Supplemental Figure 13) and the EC and EE drivers (highlighted in Figure 6G-N), and have added these data to the paper (Supplemental Figure 21). The EB driver was not characterized histologically as EB-specific antibodies are not currently available. The PMG-Gal4 line exhibits strong expression throughout the larval gut (Figure 5B), and barcodes are recovered from essentially all of the larval gut cell clusters using this driver (Supplemental Figure 20). We don’t necessarily expect the ratios of cells observed in the scRNA-Seq data to reflect the ratios typically observed in the gut, as we performed pooled flow sorting on a multiplexed set of eight genotypes, and driver expression levels, flow sorting, and possibly other processing steps could all influence the relative abundance of different cell types. However, detailed characterization of these driver lines did reveal spatial expression patterns that help explain aspects of the scRNA-Seq data. We have also added the following text to the paper to further describe the characterization of the drivers:

      Results:

      “Detailed characterization of the EC-Gal4 line indicated that although this line labeled a high percentage of enterocytes, expression was restricted to an area at the anterior and middle of the midgut, with gaps between these regions and at the posterior (Supplemental Figure 21). This could explain the absence of subsets of enterocytes, such as those labeled by betaTry, which exhibits regional expression in R2 of the adult midgut (Buchon et al., 2013).”

      “Detailed characterization of the EE-Gal4 driver line indicated that ~80-85% of Prospero-positive enteroendocrine cells are labeled in the anterior and middle of the larval midgut, with a lower percentage (~65%) of Prospero-positive cells labeled in the posterior midgut (Supplemental Figure 21). As with the enterocyte labeling, and consistent with the Gal4 driver expression pattern, the EE-Gal4 expressed TaG-EM barcode 9 did not label all classes of enteroendocrine cells and other clusters of presumptive enteroendocrine cells expressing other neuropeptides such as Orcokinin, AstA, and AstC, or neuropeptide receptors such as CCHa2 (not shown) were also observed.”

      Methods:

      “Dissection and immunostaining

      Midguts from third instar larvae of driver lines crossed to UAS-GFP.nls or UAS-mCherry were dissected in 1xPBS and fixed with 4% paraformaldehyde (PFA) overnight at 4ºC. Fixed samples were washed with 0.1% PBTx (1xPBS + 0.1% Triton X-100) three times for 10 minutes each and blocked in PBTxGS (0.1% PBTx + 3% Normal Goat Serum) for 2–4 hours at RT. After blocking, midguts were incubated in primary antibody solution overnight at 4ºC. The next day samples were washed with 0.1% PBTx three times for 20 minutes each and were incubated in secondary antibody solution for 2–3 hours at RT (protected from light) followed by three washes with 0.1% PBTx for 20 minutes each. One µg/ml DAPI solution prepared in 0.1% PBTx was added to the sample and incubated for 10 minutes followed by washing with 0.1% PBTx three times for 10 minutes each. Finally, samples were mounted on a slide glass with 70% glycerol and imaged using a Nikon AX R confocal microscope. Confocal images were processed using Fiji software. 

      The primary antibodies used were rabbit anti-GFP (A6455, 1:1000, Invitrogen), mouse anti-mCherry (3A11, 1:20, DSHB), mouse anti-Prospero (MR1A, 1:50, DSHB) and mouse anti-Pdm1 (Nub 2D4, 1:30, DSHB). The secondary antibodies used were goat anti-mouse and goat anti-rabbit IgG conjugated to Alexa 647 and Alexa 488 (1:200) (Invitrogen), respectively. Five larval gut specimens per Gal4 line were dissected and examined.”

      (6) Doublets are removed based on the co-expression of two barcodes in Fig 5A. However, there are also other possible doublets, for example, from cells carrying the same barcode or when one cell doesn't have a detectable barcode. Did the authors try other computational approaches to remove doublets, like DoubletFinder (McGinnis et al., 2019) and Scrublet (Wolock et al., 2019)?

      We have included DoubletFinder-based doublet removal in our data analysis pipeline. This is now described in the methods (see below).

      (7) Did the authors remove ambient RNA which is a common issue for scRNA-seq experiments?

      We have also used DecontX to remove ambient RNA. This is now described in the methods:

      “Datasets were first mapped and analyzed using the Cell Ranger analysis pipeline (10x Genomics). A custom Drosophila genome reference was made by combining the BDGP.28 reference genome assembly and Ensembl gene annotations. Custom gene definitions for each of the TaG-EM barcodes were added to the fasta genome file and .gtf gene annotation file. A Cell Ranger reference package was generated with the Cell Ranger mkref command. Subsequent single-cell data analysis was performed using the R package Seurat (Satija et al., 2015). Cells expressing fewer than 200 genes and genes expressed in fewer than three cells were filtered from the expression matrix. Next, percent mitochondrial reads, percent ribosomal reads, cell counts, and cell features were graphed to determine optimal filtering parameters. DecontX (Yang et al., 2020) was used to identify empty droplets, to evaluate ambient RNA contamination, and to remove empty cells and cells with high ambient RNA expression. DoubletFinder (McGinnis et al., 2019) was used to identify droplet multiplets and remove cells classified as multiplets. Clustree (Zappia and Oshlack, 2018) was used to visualize different clustering resolutions and to determine the optimal clustering resolution for downstream analysis. Finally, SingleR (Aran et al., 2019) was used for automated cell annotation with a gut single-cell reference from the Fly Cell Atlas (Li et al., 2022). The dataset was manually annotated using the expression patterns of marker genes known to be associated with cell types of interest. To correlate TaG-EM barcodes with cell IDs in the enriched TaG-EM barcode library, a custom Python script was used (TaGEM_barcode_Cell_barcode_correlation.py), which is available via Github: https://github.com/darylgohl/TaG-EM.”
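
      The last step of the quoted methods, linking TaG-EM barcodes to 10x cell barcodes, can be illustrated with a minimal sketch. This is not the authors' script: the function name, the read-pair input format, and the `min_reads` and `min_fraction` thresholds are all assumptions chosen for illustration. It also shows how cells whose reads are split between two barcodes could be flagged as candidate multiplets.

```python
from collections import Counter, defaultdict

def assign_tag_barcodes(read_pairs, min_reads=2, min_fraction=0.8):
    """Assign one TaG-EM barcode per cell barcode from (cell, tag) read pairs.

    Returns a dict mapping each cell barcode to a TaG-EM barcode label,
    "multiplet" (reads split between barcodes), or "unassigned" (too few reads).
    Thresholds are illustrative assumptions, not values from the paper.
    """
    counts = defaultdict(Counter)
    for cell_barcode, tag_barcode in read_pairs:
        counts[cell_barcode][tag_barcode] += 1

    calls = {}
    for cell_barcode, tag_counts in counts.items():
        total = sum(tag_counts.values())
        top_tag, top_count = tag_counts.most_common(1)[0]
        if total < min_reads:
            calls[cell_barcode] = "unassigned"
        elif top_count / total >= min_fraction:
            calls[cell_barcode] = top_tag
        else:
            calls[cell_barcode] = "multiplet"
    return calls

# Hypothetical toy input: one (10x cell barcode, TaG-EM barcode) pair per read.
reads = [
    ("AAAC", "BC4"), ("AAAC", "BC4"), ("AAAC", "BC4"),
    ("CCGT", "BC9"), ("CCGT", "BC4"), ("CCGT", "BC9"), ("CCGT", "BC4"),
    ("GGTA", "BC1"),
]
calls = assign_tag_barcodes(reads)
```

      A barcode-based multiplet call of this kind cannot see doublets formed by two cells carrying the same barcode, which is why it complements rather than replaces expression-based tools such as DoubletFinder.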

      (8) Why does TaG-EM barcode #4, driven by EC-GAL4, not label other classes of enterocyte cells such as betaTry+ ECs (Figures 5D-E)? Similarly, why does TaG-EM barcode #9, driven by EE-GAL4, not label all EEs? Again, it is difficult to evaluate this part without proper data processing and accurate cell type annotation.

      As noted in the response to a comment by Reviewer #1 above, part of this apparent sparsity of labeling is due to the way that this experiment was designed and visualized. We have added a new Figure panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, to better illustrate the efficacy of barcode detection. See also the response to point 5 above. Both the lack of labelling of betaTry+ ECs and the lack of labelling of subsets of EEs are consistent with the expression patterns of the EC-Gal4 and EE-Gal4 drivers.

      (9) For Figure 2, when the authors tested different combinations of groups with various numbers of barcodes, they found remarkable consistency for the even groups. Once the numbers start to increase to 64, barcode abundance becomes highly variable (range of 12-18% for both male and female). I think this would be problematic because the differences seen in two groups, for example, may be due to the barcode selection rather than an actual biologically meaningful difference.

      While there is some barcode-to-barcode variability for different amplification conditions, the magnitude of this variation is relatively consistent across the conditions tested. We looked at the coefficient of variation for the evenly pooled barcodes or for the staggered barcodes pooled at different relative levels. While the absolute magnitude of the variation is higher for the highly abundant barcodes in the staggered conditions, the CVs for these conditions (0.186 for female flies and 0.163 for male flies) were only slightly above the mean CV (0.125) for all conditions (see Supplemental Figure 3).

      We have added this analysis as Supplemental Figure 3 and added the following text to the paper:

      “The coefficients of variation were largely consistent for groups of TaG-EM barcodes pooled evenly or at different levels within the staggered pools (Supplemental Figure 3).”
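
      For readers who want to reproduce this kind of comparison on their own pools, the coefficient of variation is the standard deviation divided by the mean. The sketch below uses invented read fractions and the sample standard deviation; whether the authors used the sample or population form is not stated here, so that choice is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical read fractions for four barcodes pooled at equal input.
even_pool = [0.24, 0.26, 0.25, 0.25]
cv_even = coefficient_of_variation(even_pool)
```

      Because the CV is scaled by the mean, it allows abundant and rare barcodes in a staggered pool to be compared on the same footing, which is the point of the authors' response.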

      (10) Barcode #14 cannot be reliably detected in the oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affect this behavior. Perhaps other barcode lines also have unknown mutations that would cause issues for other untested behaviors. One possible solution is to backcross all 20 lines with wild-type flies of the same genetic background for >7 generations so that all these lines have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 above on this topic.

      Reviewer #3 (Public Review):

      The work addresses challenges in linking anatomical information to transcriptomic data in single-cell sequencing. It proposes a method called Targeted Genetically-Encoded Multiplexing (TaG-EM), which uses genetic barcoding in Drosophila to label specific cell populations in vivo. By inserting a DNA barcode near the polyadenylation site in a UAS-GFP construct, cells of interest can be identified during single-cell sequencing. TaG-EM enables various applications, including cell type identification, multiplet droplet detection, and barcoding experimental parameters. The study demonstrates that TaG-EM barcodes can be decoded using next-generation sequencing for large-scale behavioral measurements. Overall, the results are solid in supporting the claims and will be useful for a broader fly community. I have only a few comments below:

      We thank the reviewer for these positive comments.

      Specific comments:

      (1) The authors mentioned that the results of structured pool tests in Fig. 2 showed a high level of quantitative accuracy in detecting the TaG-EM barcode abundance. Although the data were generally consistent with the input values in most cases, there were some obvious exceptions such as barcode 1 (under-represented) and barcodes 15 and 20 (over-represented). It would be great if the authors could comment on these and provide a guideline for choosing the appropriate barcode lines when implementing this TaG-EM method.

      See the response to point 9 from Reviewer 2. Although there seem to be some systematic differences in barcode amplification, the coefficient of variation was relatively consistent across all of the barcode combinations and relative input levels that we examined. Our recommendation (described in the text) is to average across 3-4 independent barcodes (which yielded R² values of >0.99 with expected abundance in the structured pooled tests).
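
      The recommendation to average across several barcodes per condition can be sketched as follows. The numbers are invented, and R² is computed here as the squared Pearson correlation, which is an assumption about how the reported value was obtained.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: expected abundance per condition, and the observed
# fractions for the three barcodes assigned to each condition.
expected = [0.05, 0.15, 0.30, 0.50]
observed = [
    [0.04, 0.06, 0.05],
    [0.13, 0.16, 0.16],
    [0.33, 0.28, 0.29],
    [0.47, 0.52, 0.51],
]
averaged = [sum(group) / len(group) for group in observed]
single = [group[0] for group in observed]  # one barcode per condition

r2_averaged = r_squared(expected, averaged)
r2_single = r_squared(expected, single)
```

      In this toy example the per-barcode deviations cancel on averaging, so `r2_averaged` is at least as high as `r2_single`; real data will show the same tendency only to the extent that barcode-specific biases are independent.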

      (2) In Supplemental Figure 6, the authors showed GFP antibody staining data with 20 different TaG-EM barcode lines. The variability in GFP antibody staining results among these different TaG-EM barcode lines concerns the use of these TaG-EM barcode lines for sequencing followed by FACS sorting of native GFP. I expected the native GFP expression would be weaker and much more variable than the GFP antibody staining results shown in Supplemental Figure 6. If this is the case, variation of tissue-specific expression of TaG-EM barcode lines will likely be a confounding factor.

      Aside from barcode 8, which had a mutation in the GFP coding sequence, we did not see significant variability in expression levels in the wing disc. Subtle differences seen in this figure most likely result from differences in larval staging. Similarly consistent native (unstained) GFP expression of the TaG-EM constructs was seen in crosses with Mhc-Gal4 (described above).

      (3) As the authors mentioned in the manuscript, multiple barcodes for one experimental condition would be a better experimental design. Could the authors suggest a recommended number of barcodes for each experimental condition? 3? 4? Or more?

      See response to Reviewer #3, point number 1 above.

      (3b) Also, it would be great if the authors could provide a short discussion on the cost of such TaG-EM method. For example, for the phototaxis assay, if it is much more expensive to perform TaG-EM as compared to manually scoring the preference index by videotaping, what would be the practical considerations or benefits of doing TaG-EM over manual scoring?

      While this will vary depending on the assay and the scale at which one is conducting experiments, we have added an analysis of labor savings for the larval gut motility assay (Supplemental Figure 8). We have also added the following text to the Discussion describing some of the trade-offs to consider in assessing the potential benefit of incorporating TaG-EM into behavioral measurements:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8).”

      Recommendations for the authors:  

      While recognising the potential of the TaG-EM methodology, we had a few major concerns that the authors might want to consider addressing:

      As stated above, we are grateful to the reviewers and editor for their thoughtful comments. We have addressed many of the points below in our responses above, so we will briefly respond to these points and where relevant direct the reader to comments above.

      (1) We were concerned about the efficacy of TaG-EM in assessing more complex behaviours than oviposition and phototaxis. We note that Barcode #14 cannot be reliably detected in the oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affect this behavior. Perhaps other barcode lines also have unknown mutations that would cause issues for other untested behaviors. One possible solution is to back-cross all 20 lines with wild-type flies of the same genetic background for >7 generations so that all these lines have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 and Reviewer #2, item 10, above.

      (2) We were unable to assess the drop-out rates of the TaG-EM barcode from the sequencing. The barcode detection rate is low (Fig S9 and Fig 5F, J and N). This would be a considerable drawback (relating to both experimental design and cost), if a large proportion of the cells could not be assigned an identity.

      See comments above addressing this point.

      (3) TaG-EM scRNA-seq on the larval gut is not very effective - the cells are not well annotated, the barcodes seem not to have labelled expected cell types (ECs and EEs), and there is no validation of the Gal4 drivers in vivo.

      See previous comments. We have addressed specific comments above on data processing and annotation, included a visualization of the overall effectiveness of labeling, added a protocol and data on enriched TaG-EM barcode libraries, and have added detailed characterization of the Gal4 drivers in the larval gut (Figure 6, Supplemental Figures 17-21).

      (4) A formal assessment of the cost-effectiveness would be an important consideration in broad uptake of the methodology.

      While this is difficult to do in a comprehensive manner given the breadth of potential applications, we have included estimates of labor savings for one of the behavioral assays that we tested (Supplemental Figure 8). We have also included a discussion of some of the factors that would make TaG-EM useful or cost-effective to apply for behavioral assays (see response to Reviewer #3, comment 3b, above). We have also added the following text to the discussion to address the cost considerations in applying TaG-EM for scRNA-Seq:

      “For single cell RNA-Seq experiments, the cost savings of multiplexing is roughly the cost of a run divided by the number of independent lines multiplexed, plus labor savings by also being able to multiplex upstream flow cytometry, minus loss of unbarcoded cells. Our experiments indicated that for the specific drivers we tested TaG-EM barcodes are detected in around one quarter of the cells if relying on endogenous expression in the gene expression library, though this fraction was higher (~37%) if sequencing an enriched TaG-EM barcode library in parallel (Figure 6, Supplemental Figures 18-19).”
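
      The trade-off described in the quoted text can be made concrete with a back-of-the-envelope sketch. The run cost and cell count below are hypothetical placeholders; the eight pooled genotypes and the detection fractions (about one quarter, or about 37% with an enriched library) are taken from the responses above. Labor savings are not modeled.

```python
def per_line_run_cost(run_cost, n_lines):
    """Sequencing-run cost attributed to each multiplexed line."""
    return run_cost / n_lines

def usable_cells(cells_recovered, detection_fraction):
    """Cells that can be assigned to a line because a barcode was detected."""
    return cells_recovered * detection_fraction

run_cost = 2000.0   # hypothetical cost of one run, arbitrary units
n_lines = 8         # eight genotypes were pooled in the gut experiment
cells = 10000       # hypothetical number of cells recovered

cost_each = per_line_run_cost(run_cost, n_lines)
cells_expression_only = usable_cells(cells, 0.25)  # ~one quarter detected
cells_with_enrichment = usable_cells(cells, 0.37)  # ~37% with enriched library
```

      Under these assumptions, the per-line cost falls with the number of lines pooled, while the fraction of barcoded cells sets how many of the recovered cells can actually be attributed to a line.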

      (5) Similarly, a formal assessment of the effect of the insertion on the variability in GFP expression and the behaviour needs to be documented.

      See responses to Reviewer #1, Reviewer #2, item 9, and Reviewer #3, item 2 above.

      Reviewer #1 (Recommendations For The Authors):

      (in no particular order of importance)

      • L84-85: the authors should either expand, or remove this statement. Indeed, lack of replicates is only true if one ignores that each cell in an atlas is indeed a replicate. Therefore, depending on the approach or question, this statement is inaccurate.

      This sentence was meant to refer to experiments where different experimental conditions are being compared and not to more descriptive studies such as cell atlases. We have revised this sentence to clarify.

      “Outside of descriptive studies, these costs are also a barrier to including replicates to assess biological variability; consequently, a lack of biological replicates derived from independent samples is a common shortcoming of single-cell sequencing experiments.”

      • L103-104: this sentence is unclear.

      We have revised this sentence as follows:

      “Genetically barcoded fly lines can also be used to enable highly multiplexed behavioral assays which can be read out using high throughput sequencing.”

      • In Fig S1 it is unclear why there are more than 20 different sequences in panel B where the text and panel A only mention the generation of 20 distinct constructs. This should be better explained.

      The following text was added to the Figure legend to explain this discrepancy:

      “Because the TaG-EM barcode constructs were injected as a pool of 29 purified plasmids, some of the transgenic lines had inserts of the same construct. In total 20 unique lines were recovered from this round of injection.”

      • It would be interesting to compare the efficiency of TaG-EM driven doublet removal (Fig 5A) with standard doublet-removing software (e.g., DoubletFinder, McGinnis et al., 2019).

      We have done this comparison, which is now shown in Supplemental Figure 15.

      • I would encourage the authors to check whether barcode representation in Fig S13  can be correlated to average library size, as one would expect libraries with shorter reads to be more likely to include the 14-bp barcode and therefore more accurately recapitulate TaG-EM barcode expression.

      These are not independent sequencing libraries, but rather data from barcodes that were multiplexed in a single flow sort, 10x droplet capture, and sequencing library. Thus, there must be some other variable that explains the differential recovery of these barcodes.

      • Fig 4A should appear earlier in the paper.

      We have moved Figure 4A from the previous manuscript (a schematic showing the detailed design of the TaG-EM construct) to Figure 1A in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) There is a typo for Fig S13 figure legends: BC1, BC1, BC3... should be BC1, BC2, BC3.

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      Comments to authors:

      (1) It would be great if the authors could provide an additional explanation on how these 29 barcode sequences were determined.

      Response: This information is in the Methods section. For the original cloned plasmids:

      “Expected construct size was verified by diagnostic digest with EcoRI and ApaLI. DNA concentration was determined using a Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific) and the randomer barcode for each of the constructs was determined by Sanger sequencing using the following primers:

      SV40_post_R: GCCAGATCGATCCAGACATGA

      SV40_5F: CTCCCCCTGAACCTGAAACA”

      For transgenic flies, after DNA extraction and PCR enrichment (details also in the Methods section):

      “The barcode sequence for each of the independent transgenic lines was determined by Sanger sequencing using the SV40_5F and SV40_PostR primers.”

      (2) Why did the authors choose myr-GFP as the backbone instead of nls-GFP if the downstream application is to perform sequencing?

      We initially chose myr::GFP as we planned to conduct single cell and not single nucleus sequencing, and myr::GFP has the advantage of labeling cell membranes, which could facilitate the characterization or confirmation of cell type-specific expression, particularly in the nervous system. However, we have considered making a version of the TaG-EM construct with a nuclear targeted GFP (thereby enabling “NucEM”). In the Discussion, we mention this possibility as well as the possibility of using a second nuclear-GFP construct in conjunction with TaG-EM lines if nuclear enrichment is desired:

      “In addition, while the original TaG-EM lines were made using a membrane-localized myr::GFP construct, variants that express GFP in other cell compartments such as the cytoplasm or nucleus could be constructed to enable increased expression levels or purification of nuclei. Nuclear labeling could also be achieved by co-expressing a nuclear GFP construct with existing TaG-EM lines in analogy to the use of hexameric GFP described above.”

      Minor comments:

      (1) Line 193, Supplemental Figure 4 should be Supplemental Figure 5

      Fixed.

      (2) Scale bars should be added in Figure 4, Supplemental Figures 6, 7, and 8A.

      We have added scale bars to these figures and also included scale bars in additional Supplemental Figures detailing characterization of the gut driver lines.

      (3) Were Figure 4C and Supplemental Figure 7 data stained with a GFP antibody?

      No, this is endogenous GFP signal. This is now noted in the Figure legends.

      (4) Line 220, specify the three barcode lines (lines #7, 8, 9) in the text. 

      Added this information.

      Same for Lines 251-254. Line 258, which 8 barcode Gal4 line combinations?

      (5) Line 994, typo: (BC1, BC1, BC3, and BC7)-> (BC1, BC2, BC3, and BC7)

      Fixed.

      (6) Figure 5 F, J and N, add EC-Gal4, EB-Gal4, and EE-Gal4 above each panel to improve readability.

      We have added labels of the cell type being targeted (leftmost panels), the barcode, and the marker gene name to Figure 6 C-N.

    1. eLife Assessment

      Modulation of BMP signalling affects body size in the nematode Caenorhabditis elegans, and this paper examines how that effect arises. The study provides valuable analyses of ChIP-seq and RNA-Seq data to understand the function of SMA-3 (Smad) and SMA-9 (Schnurri) in this model. The authors provide compelling evidence that the BMP-dependent body size effect could be due to defects in cuticle collagen secretion, a finding of interest to those studying organismal growth and epidermal function.

    2. Reviewer #1 (Public review):

      Summary:

      BMP signaling is, arguably, best known for its role in the dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The Authors use SMA-3 and SMA-9 ChIP-Seq data and RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up- or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. In case of such opposing effects, SMA-9 appears to act epistatically to SMA-3. Since genes responsible for cuticle collagen production were eminent among the SMA-3 targets, the Authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      Strengths:

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans.

      Weaknesses:

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure.

      (2) The Authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs likely act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which still has to be experimentally substantiated. In the revised version of the manuscript, the authors acknowledge this.

      (3) The second part of the results (the collagen story) is loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes with the opposing mode of regulation by SMA-3 and SMA-9.

      Comments on revisions:

      In comparison to the first version of the manuscript, the authors have significantly improved the "readability" of the paper, made the Discussion much better, and toned down some of the less supported arguments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      BMP signaling is, arguably, best known for its role in the dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, their own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up- or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were eminent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP tagged SMA-3 and SMA‑9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts.  Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have updated the text to clarify this point.

      Strengths:

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans.

      Weaknesses:

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect.

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We have revised the text to highlight what this study and others demonstrate about SMA-9’s role in body size. Simply stated, SMA-9 is needed together with SMA-3 to promote the expression of genes involved in one-carbon metabolism, collagens, and chaperones, all of which are required for body size. SMA-3 has additional, SMA-9-independent transcriptional targets, including chaperones and ER secretion factors, that also contribute to body size. Finally, SMA-9 regulates additional targets independent of SMA-3 that likely have a minimal role in body size. We have adjusted Figure 5 with new graphs of the original data to make these points clearer.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down.

      We acknowledge that we have not demonstrated a physical interaction between SMA-3 and SMA-9 through a co-immunoprecipitation, and we have indicated in the text that a formal biochemical demonstration would be required to make this point. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. As the reviewer has indicated, a physical interaction between Smads and Schnurris has been amply demonstrated in other systems. A limitation in these previous studies is that only a small number of target genes were analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale. Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We have revised the text to clarify this point.

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes with opposing modes of regulation by SMA-3 and SMA-9.

      We appreciate this suggestion and have clarified in the text how SMA-9 contributes to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion.

      We thank the reviewer for this suggestion. We have added more context to the Discussion.

      Reviewer #2 (Public Review):

      In the present study, Vora et al. elucidated the transcriptional targets downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smad SMA-3 and the Schnurri SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of many of these targets on the regulation of body size in vivo, as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers.

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data.

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses?

      We agree that these are intriguing questions, and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect.

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which.

      We thank the reviewer for this suggestion. col-94 and col-153 were identified as direct targets of both SMA-3 and SMA-9. We noted this in the Discussion.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated.

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2. However, RNA-seq analysis did not find a statistically significant enough change in rol-6 to qualify it as a transcriptional target, and total levels of protein are also not significantly reduced in mutants. We added this information in the text.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.).

      We have added more information to the Discussion.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 - Figure 3: The authors might want to think about condensing this into two figures.

      To avoid confusion with the different workflows, we prefer to keep these as three separate figures.

      Figure 1a-b: Measurement unit missing on X.

      We added the unit “bps” to these graphs.

      Lines 244-246: The authors should stress in the Results that they analyzed publicly available ChIP-Seq data, which were not generated by them, rather than only providing a reference to Kudron et al., 2018. As far as I understood, ChIP was performed with an anti-GFP antibody. Please mention this, and specify the information about the vendor and the catalog number in the Methods.

      We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP-tagged SMA-3 and SMA-9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have clarified these issues in the text. We have also added information regarding the anti-GFP antibody to the Methods.

      Line 267-270: The authors should either provide experimental evidence that SMA-3 and SMA-9 form complexes or write something like "significant overlap between SMA-3 and SMA-9 peaks may indicate complex formation between these two transcription factors as shown in Drosophila" - but in the absence of proof, this must be a point for the Discussion, not for the Results. Moreover, similar behavior of fat-6 (overlapping ChIP peaks) and nhr-114 (non-overlapping ChIP peaks) in SMA-3 and SMA-9 mutants may be interpreted as a circumstantial argument against SMA-3/SMA-9 complex formation (see Lines 342-348). Importantly, since ChIP-Seq data are available for a wide array of C. elegans TFs, it would be very useful to have an estimate of whether SMA-3/SMA-9 peak overlap is significantly higher than the peak overlap between SMA-3 and several other TFs expressed at the same L2 stage.

      We have clarified our goals regarding SMA-3 and SMA-9 interactions and softened our conclusions by indicating in the text that a formal biochemical demonstration would be required to demonstrate a physical interaction. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. We have added an analysis of HOT sites to address overlap of binding with other transcription factors. We disagree with the interpretation that transcription factors with non-overlapping sites cannot act together to regulate gene expression; however, nhr-114 also has an overlapping SMA-3 and SMA-9 site, so this point becomes less relevant. We have clarified the categorization of nhr-114 in the text.

      Lines 272-292: The authors do not comment on the seemingly quite small overlap between the RNA-Seq and the ChIP-Seq datasets, but I think they should. They have 3205 SMA-3 ChIP peaks and 1867 SMA-3 DEGs, but the number of directly regulated targets is 367. It is important that the authors provide information on the number of genes to which their peaks have been assigned. Clearly, this will not be one gene per peak, but if it were, this would mean that just 11.5% of bound targets are really affected by the binding. The same number would be 4.7% for the SMA-9 peaks.

      We have added a discussion of the discrepancy between binding sites and DEGs. The high number of additional sites classified as non-functional could represent the detection of weak affinity targets that do not have an actual biological purpose. Alternatively, these sites could have an additional role in DBL-1 signaling besides transcriptional regulation of nearby genes, or they could be regulating the expression of target genes at a far enough distance to not be detected by our BETA analysis as per the constraints chosen for the analysis. The difference between total binding sites and those associated with changes in gene expression underscores the importance of combining RNA-seq with ChIP-seq to identify the most biologically relevant targets. And as the reviewer indicated, more than one gene can be assigned to a single neighboring peak.
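
      The reviewer's back-of-the-envelope estimate for SMA-3 can be reproduced with a short calculation. The sketch below is only illustrative: the function name is hypothetical, and it adopts the reviewer's simplifying assumption of one gene per peak, which the reviewer notes will not hold exactly. Only the SMA-3 counts (367 direct targets, 3205 peaks) are stated in the comment above, so only those are used.

```python
def percent_peaks_with_effect(direct_targets: int, chip_peaks: int) -> float:
    """Percentage of ChIP peaks linked to a differentially expressed gene.

    Assumes exactly one gene per peak (the reviewer's simplification).
    """
    if chip_peaks <= 0:
        raise ValueError("chip_peaks must be positive")
    return 100.0 * direct_targets / chip_peaks


# SMA-3 counts quoted in the reviewer's comment.
sma3_percent = percent_peaks_with_effect(direct_targets=367, chip_peaks=3205)
print(f"SMA-3: {sma3_percent:.1f}% of peaks")  # SMA-3: 11.5% of peaks
```

      If several genes are assigned to a single peak, as the response notes can happen, this percentage no longer has a simple per-peak interpretation.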

      Lines 294-323: I feel like there is a terminology problem, which makes reading very difficult. The authors use "direct targets" as bound genes with significant expression change, but then run into a problem when the gene is bound by SMA-9 and SMA-3, but significant expression change is only associated with one of the two factors. I am not sure this is consistent with the idea of the SMA-3/SMA-9 complex. Also, different modalities of the SMA-3 and SMA-9 effect in 15 cases can be explained by co-factors. Reading would also be simplified if the order of the panels in Figure 3 were different. Currently, the authors start their explanation by referring to the shared SMA-3/SMA-9 targets (Figures 3c-d), and only later come to Figure 3b. In general, the authors should start with a clear explanation of what is in the figure (currently starting on Line 313); otherwise, it is unclear why, if the authors only discuss common targets, it is not just 114+15=129 targets, but more.

      We have re-ordered the columns in Figure 3 to match the order discussed in the text. We also incorporated more precise language about regulation by SMA-3 and/or SMA-9 in the text.

      Lines 325-355: The chapter has a rather unfortunate name "Mechanisms of integration of SMA-3 and SMA-9 function", although the authors do not provide any mechanism. Using 3 target genes, they show that if the regulatory modality of SMA-3 and SMA-9 is the same (2 examples), there is no difference in the expression of the targets, but if the modalities are opposing (1 example), SMA-9 repressive action is epistatic to the SMA-3 activating action. Can this be generalized? The authors should test all their 15 targets with opposite regulations. Moreover, it seems obvious to ask whether the intermediate phenotype of the double-mutants can be attributed to the action of these 15 genes activated by SMA-3 and repressed by SMA-9. I would suggest testing this by RNAi. I would also suggest renaming the chapter to something better reflecting its content.

      We have removed the word “mechanism” from the title of this section. We also performed additional RT-PCR experiments on another 5 targets with opposing directions of regulation. The results from these genes are consistent with the result from C54E4.5, demonstrating that the epistasis of sma-9 is generalizable.

      Figure 4b: Why was a two-way ANOVA performed here? With the small number of measurements, I would consider using a non-parametric test.

      These data are parametric and the distribution of the data is normal, so we chose to use a parametric test (ANOVA).

      Lines 354-355. The authors offer two suggestions for the mechanism of the epistatic action of SMA-9 on SMA-3 in the case of C54E4.5, but this is something for the Discussion. If they want to keep it in the Results they should address this experimentally by performing SMA-3 ChIP-seq in the SMA-9 mutants and SMA-9 ChIP-Seq in the SMA-3 mutants.

      We moved these models to the discussion as suggested.

      Lines 365-367: "We expect that clusters of genes involved in fatty acid metabolism and innate immunity mediate the physiological functions of BMP signaling in fat storage and pathogen resistance, respectively." - This is pretty confusing since the Authors claim in the previous sentence that regulation of immunity by SMA-9 is TGF-beta independent.

      Co-regulation of immunity by BMP signaling and SMA-9 is already known. The novel insight is that SMA-9 may have an additional independent role in immunity. We have clarified the language to address this confusion.

      Lines 377, and 380: Please explain in non-C. elegans-specific terminology, what rrf-3 and LON-2 are (e.g. write "glypican LON-2" instead of just "LON-2") and add relevant references.

      We added information on the proteins encoded by these genes.

      Lines 382-384: I am not sure what the Authors mean here by "more limiting".

      We substituted the phrase “might have a more prominent requirement in mediating the exaggerated growth defect of a lon-2 mutant”.

      Lines 388-392: I found this very confusing. What were these 36 genes? Were these direct targets of SMA-3, SMA-9, or both? Top 36 targets? 36 targets for which mutants are available?

      The new Figure 5 clarifies whether target genes are SMA-3-exclusive, SMA-9-exclusive, or co-regulated. The text was also updated for clarity.

      Line 397: This is the first time the authors mention dpy-11 but they do not say what it is until later, and they do not say whether it is a target of SMA-3/SMA-9. Checking Figure 3, I found that it is among the 238 genes bound by both but upregulated only by SMA-3. The authors need to explicitly state this - from this point on, they have a section for which SMA-9 appears to be irrelevant.

      We added the molecular function of dpy-11 at its first mention. Furthermore, we included the hypothesis that SMA-3 may regulate collagen secretion independently of SMA-9. Our subsequent results with sma-9 mutants disprove this hypothesis.

      Line 402: Is ROL-6 a SMA-3/SMA-9 target or just a marker gene?

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2. However, RNA-seq analysis did not find a statistically significant enough change in rol-6 to qualify it as a transcriptional target, and total levels of protein are also not significantly reduced in mutants. We added this information in the text.

      Line 421: I am not sure what "more skeletonized" means.

      Replaced with “thinner and skeletonized”

      Figure 2b and 2d legends: "Non-target genes nevertheless showing differential expression are indicated with green squares." (l. 581-582 and again l. 588-589) I think should be "Non-direct target genes...".

      Changed to “non-direct target genes”

      Figure 7 legend: Please indicate the scale bar size in the legend.

      Indicated the scale bar size in the legend.

      Figure 7: The ER marker is referred to as "ssGFP::KDEL" (in the image and Line 700), however in the text it is called "KDEL::oxGFP" (Line 419). Please use consistent naming.

      We fixed the inconsistent naming.

      All the experiment suggestions made are optional and can, in principle, be ignored if the authors tone down their claims (for example, the SMA-3/SMA-9 complex formation).

      Reviewer #2 (Recommendations For The Authors):

      (1) As a control: Have the authors found the known regulated genes among the differentially regulated ones?

      Previously known target genes such as fat-6 and zip-10 were identified here. We have added this information in the text.

      (2) How many repetitions were performed in Figure 4b? I am wondering as the deviation for C54E4.5 is quite large and that makes me worry that the significant differences stated are not robust.

      There were two biologically independent collections from which three cDNA syntheses were analyzed using two technical replicates per point.

      (3) Lines 333-336: Can you really make this claim that the antagonistic effects seen in the regulation of body size can be correlated with some targets being regulated in the opposite direction? I would assume that the situation is far more complex as SMADs also regulate other processes.

      We agree with the reviewer that multiple models could explain this antagonism, and we have added distinct alternatives in the text.

      (4) Lines 367-369: Add the respective reference please.

      We have added the relevant references.

    1. eLife Assessment

      This valuable paper describes a comprehensive quantitative phospho-proteomic analysis of Xenopus oocytes during meiosis. Using time-resolved proteomic analyses, the authors provide insights into changes in protein levels and phosphorylation states to an unprecedented depth, quality, and quantitative detail. The key findings are solid and offer a helpful resource for the scientific community.

    2. Reviewer #1 (Public review):

      Summary:

      The study aims to create a comprehensive repository about the changes in protein abundance and their modification during oocyte maturation in Xenopus laevis.

      Strengths:

      The results contribute meaningfully to the field.

      Weaknesses:

      The manuscript could have benefitted from more comprehensive analyses and clearer writing. Nonetheless, the key findings are robust and offer a valuable resource for the scientific community.

    3. Reviewer #2 (Public review):

      Summary:

      The authors analyzed Xenopus oocytes at different stages of meiosis using quantitative phosphoproteomics. Their advanced methods and analyses revealed changes in protein abundances and phosphorylation states to an unprecedented depth and quantitative detail. In the manuscript they provide an excellent interpretation of these findings putting them in the context of past literature in Xenopus as well as in other model systems.

      Strengths:

      High quality data, careful and detailed analysis, outstanding interpretation in the context of the large body of the literature.

      Weaknesses:

      Merely a resource, none of the findings are tested in functional experiments.

      I am very impressed by the quality of the data and the careful and detailed interpretation of the findings. In this form the manuscript will be an excellent resource to the cell division community in general, and it presents a very large number of hypotheses that can be tested in future experiments.

      Xenopus has been and still is a popular and powerful model system that led to critical discoveries around countless cellular processes, including the spindle, nuclear envelope, translational regulation, just to name a few. This also includes a huge body of literature on the cell cycle describing its phosphoregulation. It is indeed somewhat frustrating to see that these earlier studies using phospho-mutants and phospho-antibodies were just scratching the surface. The phosphoproteomics analysis presented here reveals much more extensive and much more dynamic changes in phosphorylation states. Thereby, in my opinion, this manuscript opens a completely new chapter in this line of research, setting the stage for more systematic future studies.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed time-resolved proteomics and phospho-proteomics in Xenopus oocytes from prophase I through the MII arrest of the unfertilized egg. The data contain protein abundances and phosphorylation sites for a large set of proteins at different stages of oocyte maturation. These large datasets are of high quality. In addition, the authors discussed several key pathways critical for maturation. The data are very useful not only for researchers working on Xenopus oocytes but also for those studying oocyte biology in other organisms.

      Strengths:

      The data of proteomics and phospho-proteomics in Xenopus oocyte maturation is very useful for future studies to understand molecular networks in oocyte maturation.

      Weaknesses:

      Although the authors proposed molecular pathways of phosphorylation in translation, protein degradation, cell cycle regulation, and chromosome segregation, they did not experimentally test the validity of these pathways, which are based on their proteomic data.

    5. Author response:

      We are both honored and humbled by the high praise our work received from all three reviewers. Below, we address the common comments made by the reviewers:

      (1) Value and Impact of the Resource: We are grateful for the recognition of our dataset as a valuable and high-quality resource. Our primary goal was to generate a comprehensive dataset on protein abundance and phosphorylation dynamics during Xenopus oocyte maturation. We are pleased that this work has been seen as a solid foundation for future studies in Xenopus research and beyond, with broader implications for oocyte and cell cycle biology.

      (2) Focus on Functional Validation: The manuscript was submitted as a Tools and Resources article, a format that emphasizes the creation and presentation of datasets, tools, and methodological advances to facilitate future discoveries. In alignment with this format, we ensured that the information is accessible and deployable for the broader scientific community. While we did not include functional validation of specific pathways, the dataset provides a robust framework for generating numerous testable hypotheses. We plan to pursue some of these follow-up studies in our labs and encourage the community to explore these further.

      (3) Contextualization with Prior Studies: We appreciate the recognition of our efforts to integrate our findings with the existing body of literature. In conclusion, we would like to thank the reviewers for their evaluation and thoughtful suggestions. We look forward to seeing how this dataset contributes to future discoveries in the field.

    1. eLife Assessment

      In this important study, the authors combine innovative experimental approaches, including direct compressibility measurements and traction force analyses, with theoretical modeling to propose that wild-type cells exert compressive forces on softer HRasV12-transformed cells, influencing competition outcomes. The data generally provide solid evidence that transformed epithelial cells exhibit higher compressibility than wild-type cells, a property linked to their compaction during mechanical cell competition. However, the study would benefit from further characterization of how compression affects the behavior of HRasV12 cells and clearer causal links between compressibility and competition outcomes.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Gupta and colleagues explore the parameters that could promote the elimination of active Ras cells when surrounded by WT cells. The elimination of active Ras cells by surrounding WT cells was previously described extensively and associated with a process named cell competition, a context-dependent elimination of cells. Several mechanisms have been associated with competition, including, more recently, elimination processes based on mechanical stress. This was explored theoretically and experimentally and was either associated with differential growth and sensitivity to pressure and/or differences in homeostatic density/pressure. This was extensively validated for the case of Scribble mutant cells, which are eliminated by WT MDCK cells due to their higher homeostatic density. However, there has been so far very little systematic characterisation of the mechanical parameters and properties of these different cell types and how this could contribute to mechanical competition.

      Here, the authors used the context of active Ras cells in MDCK cells (with some observations in vivo in the mouse gut, which are a bit more anecdotal) to explore the parameters causal to Ras cell elimination. Using, for the first time, traction force microscopy and stress microscopy combined with Bayesian inference, they first show that clusters of active Ras cells experience higher pressure compared to WT. Interestingly, this occurs in the absence of differences in growth rate, and while Ras cells seem to have lower homeostatic density, in contradiction with the previous models associated with mechanical cell competition. Using a self-propelled Voronoi model, they explored more systematically the conditions that promote the compression of transformed cells, showing globally that higher area compressibility and/or lower junctional tension are associated with higher compressibility. Then, using an original and novel experimental method to measure bulk compressibility of cell populations, they confirmed that active Ras cells are globally twice as compressible as WT cells. This compressibility correlates with a disruption of adherens junctions. Accordingly, the higher pressure near transformed Ras cells can be completely rescued by increasing cell-cell adhesion through E-cad overexpression, which also reduces the compressibility of the transformed cells. Altogether, these results go along the lines of a previous theoretical work (Gradeci et al. eLife 2021), which suggested that reduced stiffness/higher compressibility was essential to promote loser cell elimination. Here, the authors provide for the first time a very convincing experimental measurement and validation of this prediction. Moreover, their modelling approach goes far beyond what was performed before in terms of exploration of conditions promoting compressibility, and their experimental data point to alternative mechanisms that may contribute to mechanical competition.

      Strengths:

      - Original methodologies to perform systematic characterisation of mechanical properties of Ras cells during cell competition, which include a novel method to measure bulk compressibility.
      - A very extensive theoretical exploration of the parameters promoting cell compaction in the context of competition.

      Weaknesses:

      - Most of the theoretical focus is centred on the bulk compressibility, but so far does not really explain the final fate of the transformed cells. Classic cell competition scenarios (including the one involving active Ras cells) lead to the elimination of one cell population either by cell extrusion/cell death or global delamination. This aspect is not explored at all in this article, experimentally or theoretically, and as such it is difficult to connect all the observables with the final outcome of cell competition. For instance, higher compressibility may not lead to loser status if the cells can withstand high density without extruding compared to the WT cells (and could even completely invert the final outcome of the competition). Down the line, and as suggested in most of the previous models/experiments, the relationship between pressure/density and extrusion/death will be the key factor that determines the final outcome of competition. However, there is no characterisation of cell death/cell extrusion in the article so far.

      - While the compressibility measurements are very original and interesting, this bulk measurement could be explained by very different cellular processes, from modulation of cell shape to cell extrusion and tissue multilayering (which, by the way, was already observed for active Ras cells, see for instance https://pubmed.ncbi.nlm.nih.gov/34644109/). This could considerably change the interpretation of this measurement and the extent to which it can explain the compression observed in mixed culture. This compressibility measurement could be much more informative if coupled with an estimation of the change of cell aspect ratio and a rough evaluation of the contribution of cell shape changes versus alternative mechanisms.

      - So far, there is no clear explanation of why transformed Ras cells get more compacted in the context of mixed culture compared to pure Ras culture. Previously, the compaction of mutant Scribble cells could be explained by the higher homeostatic density of WT cells, which impose their preferred higher density on the Scribble mutant (see Wagstaff et al. 2016 or Gradeci et al. 2021); however, that is not the case for the Ras cells (which have even slightly higher density at confluency). If I understood properly, the Voronoi model assumes some directional movement of WT cells toward transformed cells, which will actively compact the Ras cells through self-propelled forces (see supplementary methods), but this is never clearly discussed/described in the results section, while potentially being one essential ingredient for observing compaction of transformed cells. In fact, this was already described experimentally in the case of Scribble competition and associated with chemoattractant secretion from the mutant cells promoting directed migration of the WT (https://pubmed.ncbi.nlm.nih.gov/33357449/). It would be essential to show what happens in the absence of directional propelled movement in the model and to validate experimentally whether there is indeed directional movement of the WT toward the transformed cells. Without this, the current data do not really explain the competition process.

      - Some of the data lack information on statistics, especially the stress microscopy and traction force data, where we do not really know how representative the stress patterns are (how many experiments? Are they averages of several movies? Integrated over which temporal window?).

    3. Reviewer #2 (Public review):

      The work by Gupta et al. addresses the role of tissue compressibility as a driver of cell competition. The authors use a planar epithelial monolayer system to study cell competition between wild type and transformed epithelial cells expressing HRasV12. They combine imaging and traction force measurements, from which they propose that wild type cells generate compressive forces on transformed epithelial cells. The authors further present a novel setup to directly measure the compressibility of adherent epithelial tissues. These measurements suggest a higher compressibility of transformed epithelial cells, which is causally linked to a reduction in cell-cell adhesion in transformed cells. The authors support their conclusions by theoretical modelling using a self-propelled Voronoi model, which indicates that differences in tissue compressibility can lead to compression of the softer tissue type.

      The experimental framework to measure tissue compressibility of adherent epithelial monolayers establishes a novel tool; however, additional controls for this measurement appear to be required. Moreover, the experimental support for this study is mostly based on single representative images and would greatly benefit from additional data and quantitative analysis to support the authors' conclusions. Specific comments are listed in the following:

      Major points:

      It is not evident in Fig. 2A that traction forces increase along the interface between wild type and transformed populations, and stresses in Fig. 2C also seem to be similar at the interface and in the surrounding cell layer. Only representative examples are provided; a quantification of sigma_m needs to be provided.

      In Figures 1-3, only panels 2G and 2H provide a quantitative analysis, but it is not clear how many regions of interest and clusters of transformed cells were quantified.

      Several statements appear to be insufficiently justified and supported by data.<br /> For example, the statement on pg. 3, line 38 seems to lack supporting data: 'This comparison revealed that the thickness of HRasV12-expressing cells was reduced by more than 1.7-fold when they were surrounded by wild type cells. These observations pointed towards a selective, competition-dependent compaction of HRasV12-expressing transformed cells but not control cells, in the intestinal villi of mice.'<br /> Similarly, the statement about a cell area change of 2.7-fold (pg. 3, line 47) is not supported by measurements.

      What is the rationale for setting K<sub>p</sub> = 1 in the model assumptions if clear differences in the junctional membranes of transformed versus wild type cells occur, including dynamic ruffling? This assumption does not seem to be in line with the biological observations.

      The novel approach to measure tissue compressibility is based on pH-dependent hydrogels. As the pH-responsive hydrogel pillar is placed into a culture medium with different conditions, an important control would be to test whether the insertion of this hydrogel itself changes the pH or conditions of the culture assays and whether this alters tissue compressibility or cell adhesion. The authors could, for example, insert a hydrogel pillar of a smaller diameter that would not lead to compression, or culture cells in a larger ring, to assess the influence of the pillar itself.

      The authors focus on the study of cell compaction of the transformed cells, but how does this ultimately lead to a competitive benefit of wild type cells? Is a higher rate of extrusion observed and associated with the compaction of transformed cells or is their cell death rate increased? While transformed cells seem to maintain a proliferative advantage it is not clear which consequences of tissue compression ultimately drive cell competition between wild type and transformed cells.

      The argument that softer tissues would be more easily compressed is plausible. However, which mechanism do the authors suggest generates the actual compressive stress that drives the compaction of transformed cells? They exclude a proliferative advantage of wild type cells, so which other mechanisms generate the compressive forces exerted by wild type cells?

    4. Author response:

      eLife Assessment:

      In this important study, the authors combine innovative experimental approaches, including direct compressibility measurements and traction force analyses, with theoretical modeling to propose that wild-type cells exert compressive forces on softer HRasV12-transformed cells, influencing competition outcomes. The data generally provide solid evidence that transformed epithelial cells exhibit higher compressibility than wild-type cells, a property linked to their compaction during mechanical cell competition. However, the study would benefit from further characterization of how compression affects the behavior of HRasV12 cells and clearer causal links between compressibility and competition outcomes.

      We thank the reviewers and the editor for their thoughtful and encouraging feedback on our study and for appreciating the innovation in our experimental and theoretical approaches. We acknowledge the importance of further clarifying the mechanistic links between the compressibility of HRas<sup>V12</sup>-transformed cells, their compaction, and the outcomes of mechanical cell competition. In the revised manuscript, we will include additional experiments and analyses to assess how compression influences the cellular behavior and fate of HRas<sup>V12</sup>-transformed cells during competition. In addition, to strengthen the connection between collective compressibility and competition outcomes, we will integrate quantitative analyses of cell dynamics and additional modeling to explicitly correlate the mechanical properties with the spatial and temporal aspects of cell elimination. These additions will address the reviewer’s concerns comprehensively, further enriching the mechanistic understanding presented in the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, Gupta and colleagues explore the parameters that could promote the elimination of active Ras cells when surrounded by WT cells. The elimination of active Ras cells by surrounding WT cells was previously described extensively and associated with a process named cell competition, a context-dependent elimination of cells. Several mechanisms have been associated with competition, including, more recently, elimination processes based on mechanical stress. This was explored theoretically and experimentally and was associated with differential growth and sensitivity to pressure and/or differences in homeostatic density/pressure. This was extensively validated for the case of Scribble mutant cells, which are eliminated by WT MDCK cells due to their higher homeostatic density. However, there has so far been very little systematic characterisation of the mechanical parameters and properties of these different cell types and how they could contribute to mechanical competition.

      Here, the authors used the context of active Ras cells in MDCK cells (with some observations in vivo in the mouse gut, which are a bit more anecdotal) to explore the parameters causal to Ras cell elimination. Using, for the first time, traction force microscopy and stress microscopy combined with Bayesian inference, they first show that clusters of active Ras cells experience higher pressure compared to WT. Interestingly, this occurs in the absence of differences in growth rate, and while Ras cells seem to have lower homeostatic density, in contradiction with the previous models associated with mechanical cell competition. Using a self-propelled Voronoi model, they explored more systematically the conditions that promote the compression of transformed cells, showing globally that higher area compressibility and/or lower junctional tension are associated with higher compressibility. Then, using an original and novel experimental method to measure the bulk compressibility of cell populations, they confirmed that active Ras cells are globally twice as compressible as WT cells. This compressibility correlates with a disruption of adherens junctions. Accordingly, the higher pressure near transformed Ras cells can be completely rescued by increasing cell-cell adhesion through E-cad overexpression, which also reduces the compressibility of the transformed cells. Altogether, these results are in line with previous theoretical work (Gradeci et al. eLife 2021), which suggested that reduced stiffness/higher compressibility was essential to promote loser cell elimination. Here, the authors provide for the first time a very convincing experimental measurement and validation of this prediction. Moreover, their modelling approach goes far beyond what was performed before in terms of exploration of conditions promoting compressibility, and their experimental data point to alternative mechanisms that may contribute to mechanical competition.

      Strengths:

      - Original methodologies to perform systematic characterisation of mechanical properties of Ras cells during cell competition, which include a novel method to measure bulk compressibility.<br /> - A very extensive theoretical exploration of the parameters promoting cell compaction in the context of competition.

      We thank the reviewer for their detailed and thoughtful assessment of our study and for recognizing the originality of our methodologies, including the novel bulk compressibility measurement technique and the extensive theoretical exploration of parameters influencing mechanical competition. We are pleased that the reviewer finds our experimental validation and modeling approach convincing and acknowledges the relevance of our findings in advancing the understanding of mechanical cell competition. We will carefully address all the points raised to further clarify and strengthen the manuscript.

      Weaknesses:

      - Most of the theoretical focus is centred on the bulk compressibility, but so far it does not really explain the final fate of the transformed cells. Classic cell competition scenarios (including the one involving active Ras cells) lead to the elimination of one cell population by cell extrusion/cell death or global delamination. This aspect is absolutely not explored in this article, experimentally or theoretically, and as such it is difficult to connect all the observables with the final outcome of cell competition. For instance, higher compressibility may not lead to loser status if the cells can withstand high density without extruding, compared to the WT cells (and this could even completely invert the final outcome of the competition). Down the line, and as suggested in most of the previous models/experiments, the relationship between pressure/density and extrusion/death will be the key factor that determines the final outcome of competition. However, there is absolutely no characterisation of cell death/cell extrusion in the article so far.

      We thank the reviewer for highlighting this important point. We agree that understanding the relationship between pressure, density, and the final outcomes of cell competition, such as extrusion and cell death, is crucial to connecting the mechanical properties to competition outcomes. While extrusion and cell death have been extensively characterized in previous works (e.g., https://www.nature.com/articles/s41467-021-27896-z; https://www.nature.com/articles/ncb1853), we nevertheless recognize the need to address this aspect more explicitly in our study. To this end, we have performed experiments to characterize cell extrusion and cell death under varying conditions of pressure and density. We will incorporate these data into the revised manuscript. These additions will provide a more comprehensive understanding of how mechanical imbalance drives cell competition and determines the final fate of transformed cells.

      - While the compressibility measurements are very original and interesting, this bulk measurement could be explained by very different cellular processes, from modulation of cell shape to cell extrusion and tissue multilayering (which, incidentally, was already observed for active Ras cells; see for instance https://pubmed.ncbi.nlm.nih.gov/34644109/). This could substantially change the interpretation of this measurement and the extent to which it can explain the compression observed in mixed culture. The compressibility measurement would be much more informative if coupled with an estimate of the change in cell aspect ratio and a rough evaluation of the contribution of cell shape changes versus alternative mechanisms.

      We thank the reviewer for raising this important concern. In our model system and within the experimental timescale of our gel compression microscopy (GCM) experiments, we do not observe tissue multilayering or cell extrusion, as these measurements are performed on homogeneous populations (pure wild-type or pure transformed cell monolayers). However, to address the reviewer’s suggestion, we will include measurements of cell aspect ratio as well as images excluding the possibility of multilayering/extrusion in the revised manuscript. These results will provide additional insights into the plausible contributions of cell shape changes. Furthermore, our newer results indicate that the compressibility differences arise from variations in intracellular organization (changes in nuclear and cytoskeletal organization) between wild-type and transformed cells. While a detailed molecular characterization of these underlying mechanisms is beyond the scope of the current manuscript, we acknowledge its importance and plan to explore it in a future study. These revisions will clarify and strengthen the interpretation of our findings.

      - So far, there is no clear explanation of why transformed Ras cells become more compacted in mixed culture than in pure Ras culture. Previously, the compaction of Scribble mutant cells could be explained by the higher homeostatic density of WT cells, which impose their preferred higher density on the Scribble mutants (see Wagstaff et al. 2016 or Gradeci et al. 2021); however, that is not the case for the Ras cells (which have even slightly higher density at confluency). If I understood properly, the Voronoi model assumes some directional movement of WT cells toward transformed cells, which actively compacts the Ras cells through self-propelled forces (see supplementary methods), but this is never clearly discussed or described in the results section, although it is potentially one essential ingredient for observing compaction of transformed cells. In fact, this was already described experimentally in the case of Scribble competition and associated with chemoattractant secretion from the mutant cells promoting directed migration of the WT cells (https://pubmed.ncbi.nlm.nih.gov/33357449/). It would be essential to show what happens in the absence of directional propelled movement in the model and to validate experimentally whether there is indeed directional movement of the WT cells toward the transformed cells. Without this, the current data do not really explain the competition process.

      We introduced directional movement of wild-type cells towards neighbouring transformed cells (and a form of active force to be exerted by them), motivated by the tissue compressibility measurements from the Gel Compression Microscopy experiments (Fig. 4E-L). This allowed us to devise an equivalent method of measuring the material response to isotropic compression within the SPV model framework. The role of directional propelled movement is an area of ongoing investigation and has not been explored extensively within the current study. We emphasize, however, that even without directional propulsion in the model, our results show compressive stress (elevated pressure) and increased compaction within the transformed population under the conditions reported in this work (when k<1). The transformed cells thus exhibit greater tissue-level compressibility than WT cells (Figs. 4C-D), laying the ground for competition. To address these concerns, we will provide additional results as well as detailed discussions on the effect of cell movements in compression.

      - Some of the data lack information on statistics, especially the stress microscopy and traction force data, where we do not really know how representative the stress patterns are (how many experiments? Are they averages of several movies? Integrated over which temporal window?).

      We thank the reviewer for highlighting the need for additional details regarding the statistical representation of our stress microscopy and traction force data. We will address these concerns in the revised manuscript by providing clear descriptions of the number of experiments, the averaging methodology, and the temporal windows used for analysis. Currently, Figs. 2A and 2C represent data from single time points, as the traction and stress landscapes evolve dynamically as transformed cells begin extruding (as shown in Supplementary movie 1). In contrast, Fig. 2H represents data collected from several samples across three independent experiments, all measured at the 3-hour time point following doxycycline induction. This specific time point is critical because it captures the emergence of compressive stresses before extrusion begins, simplifying the analysis and ensuring consistency. We will ensure these details are clearly articulated in the revised text and figure legends.

      Reviewer #2 (Public review):

      The work by Gupta et al. addresses the role of tissue compressibility as a driver of cell competition. The authors use a planar epithelial monolayer system to study cell competition between wild type and transformed epithelial cells expressing HRasV12. They combine imaging and traction force measurements, from which they propose that wild type cells generate compressive forces on transformed epithelial cells. The authors further present a novel setup to directly measure the compressibility of adherent epithelial tissues. These measurements suggest a higher compressibility of transformed epithelial cells, which is causally linked to a reduction in cell-cell adhesion in transformed cells. The authors support their conclusions by theoretical modelling using a self-propelled Voronoi model, which indicates that differences in tissue compressibility can lead to compression of the softer tissue type.

      The experimental framework to measure tissue compressibility of adherent epithelial monolayers establishes a novel tool; however, additional controls for this measurement appear to be required. Moreover, the experimental support for this study is mostly based on single representative images and would greatly benefit from additional data and quantitative analysis to support the authors' conclusions. Specific comments are listed in the following:

      Major points:

      It is not evident in Fig. 2A that traction forces increase along the interface between wild type and transformed populations, and stresses in Fig. 2C also seem to be similar at the interface and in the surrounding cell layer. Only representative examples are provided; a quantification of sigma_m needs to be provided.

      In Figures 1-3, only panels 2G and 2H provide a quantitative analysis, but it is not clear how many regions of interest and clusters of transformed cells were quantified.

      We thank the reviewer for their detailed comments and for highlighting the importance of additional quantitative analyses to support our conclusions. We appreciate their recognition of our novel experimental framework to measure tissue compressibility and the overall approach of our study. Regarding Fig. 2A and Fig. 2C, we acknowledge the need for further clarity. While the traction forces and stress patterns may not appear uniformly distinct at the interface in the representative images, these differences are more evident at specific time points before extrusion begins. Please note that the traction and stress landscapes evolve dynamically as transformed cells begin extruding (as shown in Supplementary movie 1). We will include a quantification of σ<sub>m</sub> and additional data from multiple experiments to substantiate the observations and address this concern in the revised manuscript. Currently, the data in Fig. 2G and Fig. 2H represent several regions of interest and transformed cell clusters collected from three independent experiments, all analyzed at the 3-hour time point after doxycycline induction. This time point was chosen because it captures the compressive stress emergence without interference from extrusion processes, simplifying the analysis. We will expand these sections with detailed descriptions of the sample sizes and statistical analyses to ensure greater transparency and reproducibility. These revisions will provide a stronger quantitative foundation for our findings and address the reviewer's concerns.

      Several statements appear to be insufficiently justified and supported by data.<br /> For example, the statement on pg. 3, line 38 seems to lack supporting data: 'This comparison revealed that the thickness of HRasV12-expressing cells was reduced by more than 1.7-fold when they were surrounded by wild type cells. These observations pointed towards a selective, competition-dependent compaction of HRasV12-expressing transformed cells but not control cells, in the intestinal villi of mice.'  Similarly, the statement about a cell area change of 2.7-fold (pg. 3, line 47) is not supported by measurements.

      We thank the reviewer for pointing out the need for more supportive data to justify several statements in the manuscript. Specifically, the observation regarding the reduction in the thickness of HRas<sup>V12</sup>-expressing cells by more than 1.7-fold when surrounded by wild-type cells, and the statement about a 2.7-fold change in cell area, will be supported by detailed measurements. In the revised manuscript, we will include quantitative analyses with additional figures that clearly document these changes. These figures will provide representative images, statistical summaries, and detailed descriptions of the measurements to substantiate these claims. We appreciate the reviewer highlighting these areas and will ensure that all statements are robustly backed by data.

      What is the rationale for setting K<sub>p</sub> = 1 in the model assumptions if clear differences in the junctional membranes of transformed versus wild type cells occur, including dynamic ruffling? This assumption does not seem to be in line with the biological observations.

      While the specific role of K<sub>p</sub> in the differences observed in the junctional membranes of transformed versus WT cells, including dynamical ruffling, is not directly studied in this work, our findings indicate that the lower junctional tension (weaker and less stable cellular junctions) in mutant cells is influenced primarily by competition in the dimensionless cell shape index within the model. This also suggests a larger preferred cell perimeter (P<sub>0</sub>) for mutant cells, corresponding to their softer, unjammed state. Huang et al. (https://doi.org/10.1039/d3sm00327b) have previously argued that a high P<sub>0</sub> may, in some cases, result from elevated cortical tension along cell edges, or reflect weak membrane elasticity, implying a smaller K<sub>p</sub>. While this connection could be an intriguing avenue for future exploration, we emphasize that K<sub>p</sub> is not expected to alter any of the key findings or conclusions reported in this work. We will include any required analysis and corresponding discussions in the revised manuscript.
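
      For orientation, the roles of K<sub>p</sub>, P<sub>0</sub> and the shape index discussed above can be read against the energy functional commonly written for vertex and self-propelled Voronoi models. The form below is a sketch of that generic functional and is an assumption here: the manuscript's supplementary methods may use different prefactors, a non-dimensionalised form, or additional terms.

```latex
% Generic SPV / vertex-model energy (assumed form, not taken from the manuscript)
E = \sum_{i=1}^{N} \left[ K_A \,(A_i - A_0)^2 + K_p \,(P_i - P_0)^2 \right],
\qquad
p_0 = \frac{P_0}{\sqrt{A_0}}
```

      Here A<sub>i</sub> and P<sub>i</sub> are the area and perimeter of cell i, A<sub>0</sub> and P<sub>0</sub> are the preferred area and perimeter, K<sub>A</sub> and K<sub>p</sub> are the area and perimeter elastic moduli, and p<sub>0</sub> is the dimensionless shape index. In this form, a larger P<sub>0</sub> at fixed A<sub>0</sub> raises p<sub>0</sub>, whereas a smaller K<sub>p</sub> weakens the penalty on deviations of the perimeter from P<sub>0</sub>.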

      The novel approach to measure tissue compressibility is based on pH-dependent hydrogels. As the pH-responsive hydrogel pillar is placed into a culture medium with different conditions, an important control would be to test whether the insertion of this hydrogel itself changes the pH or conditions of the culture assays and whether this alters tissue compressibility or cell adhesion. The authors could, for example, insert a hydrogel pillar of a smaller diameter that would not lead to compression, or culture cells in a larger ring, to assess the influence of the pillar itself.

      We appreciate the reviewer’s insightful comment regarding the potential effects of the pH-responsive hydrogel pillar on the culture conditions and tissue compressibility. In our experiments, the expandable hydrogels are kept separate from the cells until the pH of the hydrogel is elevated to 7.4, ensuring that the hydrogel does not impact the culture environment. However, we acknowledge the concern and will include additional controls in the revised manuscript. Specifically, we will insert a hydrogel pillar with a smaller diameter that would not induce compression on culture cells in a larger ring to assess any potential influence of the hydrogel pillar itself. This will help to further validate our experimental setup.

      The authors focus on the study of cell compaction of the transformed cells, but how does this ultimately lead to a competitive benefit of wild type cells? Is a higher rate of extrusion observed and associated with the compaction of transformed cells or is their cell death rate increased? While transformed cells seem to maintain a proliferative advantage it is not clear which consequences of tissue compression ultimately drive cell competition between wild type and transformed cells.

      We thank the reviewer for highlighting this important point. We agree that understanding how tissue compression leads to a competitive advantage for wild type cells is crucial. While our current study focuses on the mechanical properties of transformed cells leading to the compaction and subsequent extrusion of the transformed cells, we recognize the need to explicitly connect these properties to the final outcomes of cell competition, such as extrusion or cell death. Although extrusion and cell death have been extensively characterized in previous studies (e.g., https://www.nature.com/articles/s41467-021-27896-z; https://www.nature.com/articles/ncb1853), we have indeed performed additional experiments to investigate the relationship between pressure, density, and these processes in our system. In the revised manuscript, we will include these new data, which will help to clarify how mechanical stress, driven by tissue compression, contributes to the competition between wild type and transformed cells and influences their eventual fate.

      The argument that softer tissues would be more easily compressed is plausible. However, which mechanism do the authors suggest generates the actual compressive stress that drives the compaction of transformed cells? They exclude a proliferative advantage of wild type cells, so which other mechanisms generate the compressive forces exerted by wild type cells?

      We thank the reviewer for raising this important question. As the reviewer rightly points out, we do not observe a proliferative advantage for the wild-type cells in our model system; the compressive forces exerted by the wild-type cells are due to their intrinsic mechanical properties, such as their lower compressibility compared to the transformed cells. This difference in compressibility results in wild-type cells generating compressive stress at the interface with the transformed cells. Regarding the mechanism underlying the increased compressibility of the transformed cells, our newer findings indicate that the differences in compressibility arise from variations in intracellular organization, specifically changes in nuclear and cytoskeletal organization between wild-type and transformed cells. While a detailed molecular characterization of these mechanisms is beyond the scope of the current manuscript, we acknowledge its significance and plan to investigate it in future work. We will, nevertheless, include a detailed discussion of the mechanism underlying the differential compressibility of wild-type and transformed cells in the revised manuscript.

    5. eLife Assessment

      In this important study, the authors combine innovative experimental approaches, including direct compressibility measurements and traction force analyses, with theoretical modeling to propose that wild-type cells exert compressive forces on softer HRasV12-transformed cells, influencing competition outcomes. The data generally provide solid evidence that transformed epithelial cells exhibit higher compressibility than wild-type cells, a property linked to their compaction during mechanical cell competition. However, the study would benefit from further characterization of how compression affects the behavior of HRasV12 cells and clearer causal links between compressibility and competition outcomes.

    6. Reviewer #1 (Public review):

      Summary:

      In this article, Gupta and colleagues explore the parameters that could promote the elimination of active Ras cells when surrounded by WT cells. The elimination of active Ras cells by surrounding WT cells was previously described extensively and associated with a process named cell competition, a context-dependent elimination of cells. Several mechanisms have been associated with competition, including, more recently, elimination processes based on mechanical stress. This was explored theoretically and experimentally and was associated with differential growth and sensitivity to pressure and/or differences in homeostatic density/pressure. This was extensively validated for the case of Scribble mutant cells, which are eliminated by WT MDCK cells due to their higher homeostatic density. However, there has so far been very little systematic characterisation of the mechanical parameters and properties of these different cell types and how they could contribute to mechanical competition.

      Here, the authors used the context of active Ras cells in MDCK cells (with some observations in vivo in the mouse gut, which are a bit more anecdotal) to explore the parameters causal to Ras cell elimination. Using, for the first time, traction force microscopy and stress microscopy combined with Bayesian inference, they first show that clusters of active Ras cells experience higher pressure compared to WT. Interestingly, this occurs in the absence of differences in growth rate, and while Ras cells seem to have lower homeostatic density, in contradiction with the previous models associated with mechanical cell competition. Using a self-propelled Voronoi model, they explored more systematically the conditions that promote the compression of transformed cells, showing globally that higher area compressibility and/or lower junctional tension are associated with higher compressibility. Then, using an original and novel experimental method to measure the bulk compressibility of cell populations, they confirmed that active Ras cells are globally twice as compressible as WT cells. This compressibility correlates with a disruption of adherens junctions. Accordingly, the higher pressure near transformed Ras cells can be completely rescued by increasing cell-cell adhesion through E-cad overexpression, which also reduces the compressibility of the transformed cells. Altogether, these results are in line with previous theoretical work (Gradeci et al. eLife 2021), which suggested that reduced stiffness/higher compressibility was essential to promote loser cell elimination. Here, the authors provide for the first time a very convincing experimental measurement and validation of this prediction. Moreover, their modelling approach goes far beyond what was performed before in terms of exploration of conditions promoting compressibility, and their experimental data point to alternative mechanisms that may contribute to mechanical competition.

      Strengths:

      - Original methodologies to perform systematic characterisation of mechanical properties of Ras cells during cell competition, which include a novel method to measure bulk compressibility.<br /> - A very extensive theoretical exploration of the parameters promoting cell compaction in the context of competition.

      Weaknesses:

      - Most of the theoretical focus is centred on the bulk compressibility, but so far this does not really explain the final fate of the transformed cells. Classic cell competition scenarios (including the one involving active Ras cells) lead to the elimination of one cell population either by cell extrusion/cell death or global delamination. This aspect is not explored at all in this article, experimentally or theoretically, and as such it is difficult to connect all the observables with the final outcome of cell competition. For instance, higher compressibility may not lead to loser status if the cells can withstand high density without extruding compared to the WT cells (and could even completely invert the final outcome of the competition). Down the line, and as suggested in most of the previous models/experiments, the relationship between pressure/density and extrusion/death will be the key factor that determines the final outcome of competition. However, there is no characterisation of cell death/cell extrusion in the article so far.

      - While the compressibility measurements are very original and interesting, this bulk measurement could be explained by very different cellular processes, from modulation of cell shape to cell extrusion and tissue multilayering (which, incidentally, was already observed for active Ras cells, see for instance https://pubmed.ncbi.nlm.nih.gov/34644109/). This could substantially change the interpretation of this measurement and the extent to which it can explain the compression observed in mixed culture. This compressibility measurement could be much more informative if coupled with an estimation of the change of cell aspect ratio and a rough evaluation of the contribution of cell shape changes versus alternative mechanisms.

      - So far, there is no clear explanation of why transformed Ras cells get more compacted in the context of mixed culture compared to pure Ras culture. Previously, the compaction of mutant Scribble cells could be explained by the higher homeostatic density of WT cells, which impose their preferred higher density on the Scribble mutant (see Wagstaff et al. 2016 or Gradeci et al 2021); however, that is not the case for the Ras cells (which have even slightly higher density at confluency). If I understood properly, the Voronoi model assumes some directional movement of WT cells toward transformed cells, which will actively compact the Ras cells through self-propelled forces (see supplementary methods), but this is never clearly discussed/described in the results section, while potentially being one essential ingredient for observing compaction of transformed cells. In fact, this was already described experimentally in the case of Scribble competition and associated with chemoattractant secretion from the mutant cells promoting directed migration of the WT (https://pubmed.ncbi.nlm.nih.gov/33357449/). It would be essential to show what happens in the absence of directional propelled movement in the model and to validate experimentally whether there is indeed directional movement of the WT toward the transformed cells. Without this, the current data do not really explain the competition process.

      - Some of the data lack information on statistics, especially for the stress microscopy and traction forces, where we do not really know how representative the stress patterns are (how many experiments? Are they averages of several movies? Integrated over which temporal window?).

    7. Reviewer #2 (Public review):

      The work by Gupta et al. addresses the role of tissue compressibility as a driver of cell competition. The authors use a planar epithelial monolayer system to study cell competition between wild type and transformed epithelial cells expressing HRasV12. They combine imaging and traction force measurements, from which the authors propose that wild type cells generate compressive forces on transformed epithelial cells. The authors further present a novel setup to directly measure the compressibility of adherent epithelial tissues. These measurements suggest a higher compressibility of transformed epithelial cells, which is causally linked to a reduction in cell-cell adhesion in transformed cells. The authors support their conclusions by theoretical modelling using a self-propelled Voronoi model, which supports the idea that differences in tissue compressibility can lead to compression of the softer tissue type.

      The experimental framework to measure tissue compressibility of adherent epithelial monolayers establishes a novel tool; however, additional controls for this measurement appear to be required. Moreover, the experimental support of this study is mostly based on single representative images and would greatly benefit from additional data and their quantitative analysis to support the authors' conclusions. Specific comments are listed in the following:

      Major points:

      It is not evident in Fig2A that traction forces increase along the interface between wild type and transformed populations and stresses in Fig2C also seem to be similar at the interface and surrounding cell layer. Only representative examples are provided and a quantification of sigma_m needs to be provided.

      In Figures 1-3, only panels 2G and 2H provide a quantitative analysis, but it is not clear how many regions of interest and clusters of transformed cells were quantified.

      Several statements appear to be not sufficiently justified and supported by data.<br /> For example the statement on pg 3. line 38 seems to lack supportive data 'This comparison revealed that the thickness of HRasV12-expressing cells was reduced by more than 1.7-fold when they were surrounded by wild type cells. These observations pointed towards a selective, competition-dependent compaction of HRasV12-expressing transformed cells but not control cells, in the intestinal villi of mice.'<br /> Similarly, the statement about a cell area change of 2.7 fold (pg 3 line 47) lacks support by measurements.

      What is the rationale for setting 𝐾p = 1 in the model assumptions if clear differences in junctional membranes of transformed versus wild type cells occur, including dynamic ruffling? This assumption does not seem to be in line with biological observations.

      The novel approach to measure tissue compressibility is based on pH dependent hydrogels. As the pH responsive hydrogel pillar is placed into a culture medium with different conditions, an important control would be if the insertion of this hydrogel itself would change the pH or conditions of the culture assays and whether this alters tissue compressibility or cell adhesion. The authors could for example insert a hydrogel pillar of a smaller diameter that would not lead to compression or culture cells in a larger ring to assess the influence of the pillar itself.

      The authors focus on the study of cell compaction of the transformed cells, but how does this ultimately lead to a competitive benefit of wild type cells? Is a higher rate of extrusion observed and associated with the compaction of transformed cells or is their cell death rate increased? While transformed cells seem to maintain a proliferative advantage it is not clear which consequences of tissue compression ultimately drive cell competition between wild type and transformed cells.

      The argument that softer tissues would be more easily compressed is plausible. However, which mechanism do the authors suggest is generating the actual compressive stress to drive the compaction of transformed cells? They exclude a proliferative advantage of wild type cells; which other mechanisms would generate the compressive forces exerted by wild type cells?

    1. eLife Assessment

      This manuscript presents a valuable study utilizing an in vitro organoid system to recapitulate the developmental process of the olfactory epithelium. The authors provided solid evidence indicating that a combination of niche factors can induce organoid development and give rise to multiple cell types. However, the calcium imaging part of the study could be seen as a limitation.

    2. Reviewer #1 (Public review):

      Summary:

      Olfaction is fundamental to the survival and reproduction of animals, as they rely on olfactory sensory neurons (OSNs) in the olfactory epithelium (OE) to detect volatile chemical cues in their environment. Most mature OSNs adhere to the 'one neuron one receptor' rule, wherein each neuron selects a single receptor for expression from a large repertoire of olfactory receptor genes. The precise regulation of olfactory receptor expression is critical for accurate odorant recognition. Since the seminal discovery of olfactory receptors by Linda Buck and Richard Axel in 1991, substantial efforts have been made to elucidate the mechanisms underlying OSN differentiation and receptor expression. However, these processes remain incompletely understood. The development of in vitro olfactory epithelium organoids offers a promising platform to address these fundamental questions. The in vivo OE is composed of a complex array of cell types, which has posed a significant challenge for recapitulating its structure and function in vitro. Previous attempts to generate olfactory organoids from adult human or mouse OE cells yielded tissue containing OSNs, but these constructs were structurally distinct from the in vivo OE and lacked the characteristic pseudostratified epithelium.

      In this study, Kazuya et al. successfully established olfactory epithelium organoids from E13.5 mouse embryonic OE stem cells, which developed into a pseudostratified structure closely resembling the native OE. They further examined the influence of different culture conditions on OE differentiation, confirming the pivotal role of niche factors in promoting OSN development. Through immunofluorescence staining and single-cell RNA sequencing, they demonstrated that the organoids encompass a diverse range of cell types analogous to those present in the in vivo OE. Notably, calcium imaging revealed that the organoids were functionally responsive to odorants, and single-cell transcriptomic analysis showed that the majority of mature OSNs conformed to the 'one neuron one receptor' rule. Using these organoids, the authors performed a preliminary investigation into the developmental trajectories of OSNs, developed a tool to predict subpopulations of mature OSNs, and identified novel markers associated with OSN maturation. Collectively, the data provide compelling evidence for the reliability and utility of this olfactory organoid model. Further in-depth analyses may enable readers to better assess and utilize this tool to advance the study of olfactory biology.

      Strengths:

      The authors developed and established olfactory epithelium organoids, with immunofluorescence imaging confirming the presence of a pseudostratified structure similar to that of the in vivo olfactory epithelium, representing a significant advancement. Single-cell sequencing and calcium imaging further demonstrated the utility of these organoids, as they contain multiple cell types analogous to the in vivo olfactory epithelium. Importantly, they are physiologically functional, capable of responding to odor stimuli.

      Weakness:

      Although the authors have made significant progress in the technique, there are some gaps in understanding its underlying principles. First, it remains unclear what specific characteristics of E13.5 embryonic olfactory stem cells enable them to generate organoids in vitro that more closely resemble the in vivo olfactory epithelium, compared to adult mouse olfactory stem cells. Second, it is not clearly defined which specific cell type(s) from the embryonic olfactory epithelium give rise to these organoids, and the efficiency of organoid formation from the isolated cells also warrants further clarification.

    3. Reviewer #2 (Public review):

      Summary:

      Suzuki and colleagues aim to develop an in vitro organoid system to recapitulate the developmental process of the olfactory epithelium. The authors have succeeded in using a combination of niche factors to induce organoid development, which gives rise to multiple cell types including those with characteristics of mature olfactory sensory neurons. By comparing different culture media in inducing lineage specification in the organoids, the authors show that the niche factors play an important role in the neuronal lineage whereas serum promotes the development of the respiratory epithelium. The authors further utilized single-cell RNASeq and trajectory analysis to demonstrate that the organoids recapitulate the developmental process of the olfactory epithelium and that some of the olfactory sensory neurons express only one receptor type per cell. Using these analyses, the authors proposed that a specific set of guidance modules are associated with individual receptor types to enable the formation of the olfactory map.

      Strengths:

      The strength of the paper is that the authors have demonstrated that olfactory epithelium organoids can develop from dissociated cells from embryonic OE tissue. This provides a valuable tool for studying the developmental processes of the olfactory epithelium in vitro. Defining various factors in the media that influence the developmental trajectories of various cell types also provides valuable information to guide further development of the method. Single-cell RNA-Seq experiments provide information about the developmental processes of the olfactory system.

      Weaknesses:

      The manuscript is also marked by a number of weaknesses. The premise of the studies is not well argued. The authors set out to use organoid culture to study the developmental process in order to unravel the mechanisms of single receptor choice and its role in setting up the olfactory map. However, the paper has mostly focused on characterizing the organization rather than providing insights into the problem. The statement that the organoids can develop from single cells is misleading, because it is most likely that the dissociated cells form aggregates before developing into organoids. It is not known whether coarsely separated tissue chunks can develop into organoids with the same characteristics. Re-aggregation of the cells to form organoids is in and of itself interesting. Unfortunately, the heterogeneity of the cells and how they contribute to the development of the organoid is not explored. There is also a missed opportunity to compare single-cell RNASeq data from this study with existing ones. The in vitro system is likely to be different from embryonic development. It is critical to compare and determine how much the organoid recapitulates the development of the OSNs in vivo. There are a number of comprehensive datasets from the OE in addition to that presented in the Fletcher paper. Finally, the quality of the functional assay (calcium imaging) of olfactory sensory neurons is poor. Experiments of higher quality are needed to verify the results.

      Major points:

      (1) Adding FBS to organoid culture medium has been shown to negatively affect organoid formation and growth. Previous OE organoid culture methods did not use FBS. Also, day 10 is an odd choice for comparing the two conditions after showing that day 20 of NF+ culture shows a better differentiation state. It is not known whether and how the differentiation may be different on day 20. Moreover, comparing Figure 2R to 2S, FBS treatment alone appears to yield not only more Foxj1+ cells but also more Tuj1+ cells than NFs/FBS. This is inconsistent with the model. The authors should provide statistics for Tuj1+ cells as well.

      (2) Contrary to the statement in the manuscript, Plxnb2 has been shown to be expressed by OSNs (McIntyre et al. 2010; JNR), specifically in immature OSNs. It would be important to mention that Plxnb2 is expressed in OMP+ OSNs in the OE organoid system, and to discuss potential reasons, so that readers can better judge how well the system mimics in vivo OSNs. Similarly, OSN expression of Cdh2 has been shown by Akins and colleagues. As Plxnb2 showed an expression pattern (immunofluorescence) along an anterior-posterior axis while Cdh2 did not, it would be informative to show the odorant receptor types in relation to the expression pattern of Plxnb2 (versus that of Cdh2) using the single-cell RNAseq data.

      (3) There is no real layering of the organoids, although some cells show biases toward one side or the other in some regions of the organoid. The authors should not make a sweeping claim that the organoids establish layered structures.

      (4) In Figure 2P, it is unclear whether OMP is present in the cell bodies. The signal is not very convincing. Even the DAPI signal does not seem to be on a comparable scale compared to Figures 2N and 2O.

      (5) Annotation of the cell types in the different single-cell RNA-Seq analyses. The iOSN is only marked in Figure 3A. In the marker expression panel, it appears that those marked as mOSN have high GAP43, which is an iOSN marker. These discrepancies are neither detailed nor discussed.

      (6) The authors should merge the single-cell datasets from day 10 organoids cultured in NF-medium and FBS-medium to compare their differences.

      (7) The quality of the calcium imaging experiment is poor. Labeling and experimental details are not provided. The concentration of IVA, the manner of its delivery, and the delivery duration are not provided. How many ROIs have been imaged, and what percentage of them responded to IVA? Do they respond to more than one odor? Do they respond to repeated delivery? There is no control for solution osmolarity. Cell body response was not recorded. Given that only a small number of cells express a given receptor, it seems extraordinary that these axons respond to IVA. The authors should also determine whether IVA receptor genes are found in their dataset.

    4. Reviewer #3 (Public review):

      Summary:

      The present work by Suzuki et al seeks to develop a new embryonic olfactory epithelium organoid culture model, to study OR gene expression and mechanisms involved in epithelium-to-bulb targeting. They characterize an organoid culture derived from E13 mouse olfactory tissue, using RT-qPCR, immunostaining, limited calcium imaging, and single-cell RNA-seq. Main findings show that the cultures produce major olfactory cell types; many olfactory neurons express a single OR; scSeq analysis identifies transcriptional programs associated with specific OR class expressions that may help define mechanisms involved in projection to specific bulb sites (glomeruli).

      Strengths:

      The organoid model is generally well-characterized and may be a useful approach for studying this question and other problems, such as basal cell lineage choice or damage and repair mechanisms. Overall, the paper is well-written, and the figures are of high quality.

      The cultures, produced from E13 mice, appear to produce HBCs, GBCs, neurons, and non-neural cells, providing an important tool. I think a really interesting question is: when do HBCs first appear in these cultures? Developmentally, in rodents, HBCs do not arise until near the end of gestation, and the OE cell populations are instead made from a more GBC-like cell (keratin negative, p63 negative) that proliferates as an apical or basal progenitor. The cell type and architectural descriptions used here repeatedly are really descriptions of the adult OE, yet the cultures are made from E13 mouse olfactory epithelium. Perhaps an important question could be addressed by this model - how this specific adult reserve epithelial stem cell (the HBC) is generated remains unclear. HBCs are a reserve multipotential cell that reconstitutes the entire olfactory epithelium in adults following severe injury, yet is not present during embryonic development until after the epithelium has been largely generated.

      Weaknesses:

      The paper should discuss the transcriptional programs identified here that correlate with OR class expression in the context of findings from Tsukahara et al, Cell 2021. Tsukahara identified from in vivo olfactory neuron scSeq fixed gene expression programs defining olfactory neuron position in AP or DV axes correlating highly with OR expression.

      While the current findings do define the expression of putative targeting, guidance or adhesion molecules in specific OR-expressing neurons in culture, the current results do not provide any experimental evidence that glomerulus targeting is actually mediated by these factors. Further discussion of this limitation may be helpful, along with a discussion of additional approaches to explore these questions.

      Calcium imaging: it is not clear why isovaleric acid was chosen as a stimulus for Ca imaging. Is its known receptor expressed widely in these cultures? Why not use a cocktail of odorants, to activate a broader range of ORs, as has been widely used in in vitro calcium imaging studies of olfactory neurons? Can you show positive control activation (i.e. high potassium)?

      How many unique ORs are identified as expressed in the cultures? Figure 5 indicates only 78 genes. Since mice express about 1200 ORs, is this a limitation? How many replicates (individual cells) are found to express each of the ORs? Again, Figure 5 suggests only 202 cells are OR+? Is this enough to define the gene expression programs reliably associated with a given OR or OR class? More detail on this analysis would be helpful.

    1. eLife Assessment

      This manuscript reports on a FLIM-based calcium biosensor, G-CaFLITS. It represents an important contribution to the field of genetically-encoded fluorescent biosensors, and will serve as a practical tool for the FLIM imaging community. The paper provides convincing evidence of G-CaFLITS's photophysical properties and its advantages over previous biosensors such as Tq-Ca-FLITS. Although the benefits of G-Ca-FLITS over Tq-Ca-FLITS are limited by the relatively small wavelength shift, it presents some advantages in terms of compatibility with available instrumentation and brightness consistency.

    2. Reviewer #1 (Public review):

      Summary:

      van der Linden et al. report on the development of a new green-fluorescent sensor for calcium, following a novel rational design strategy based on the modification of the cyan-emissive sensor mTq2-CaFLITS. Through a mutational strategy similar to the one used to convert EGFP into EYFP, coupled with optimization of strategic amino acids located in proximity of the chromophore, they identify a novel sensor, G-CaFLITS. Through a careful characterization of the photophysical properties in vitro and the expression level in cell cultures, the authors demonstrate that G-CaFLITS combines a large lifetime response with a good brightness in both the bound and unbound states. This relative independence of the brightness on calcium binding, compared with existing sensors that often feature at least one very dim form, is an interesting feature of this new type of sensors, which allows for a more robust usage in fluorescence lifetime imaging. Furthermore, the authors evaluate the performance of G-CaFLITS in different subcellular compartments and under two-photon excitation in Drosophila. While the data appears robust and the characterization thorough, the interpretation of the results in some cases appears less solid, and alternative explanations cannot be excluded.

      Strengths:

      - The approach is innovative and extends the excellent photophysical properties of the mTq2-based sensors to more red-shifted variants. While the spectral shift might appear relatively minor, as the authors correctly point out, it has interesting practical implications, such as the possibility to perform FLIM imaging of calcium using widely available laser wavelengths, or to reduce background autofluorescence, which can be a significant problem in FLIM.<br /> - The screening was simple and rationally guided, demonstrating that, at least for this class of sensors, a careful choice of screening positions is an excellent strategy to obtain variants with large FLIM responses without the need for high-throughput screening.<br /> - The description of the methodologies is very complete and accurate, greatly facilitating the reproduction of the results by others, or the adoption of similar methods. This is particularly true for the description of the experimental conditions for optimal screening of sensor variants in lysed bacterial cultures.<br /> - The photophysical characterization is very thorough and complete, and the vast amount of data reported in the supporting information is a valuable reference for other researchers willing to attempt a similar sensor development strategy. Particularly well done is the characterization of the brightness in cells, and the comparison on multiple parameters with existing sensors.<br /> - Overall, G-CaFLITS displays excellent properties for a FLIM sensor: very large lifetime change, bright emission in both forms and independence from pH in the physiological range.

      Weaknesses:

      - The paper demonstrates the application of G-CaFLITS in various cellular sub-compartments without providing direct evidence that the sensor's response is not affected by the targeting. Showing at least that the lifetime values in the saturated state are similar in all compartments would improve the robustness of the claims.<br /> - In some cases, the interpretation of the results is not fully convincing, leaving alternative hypotheses as a possibility. This is particularly the case for the claim of the origin of the strongly reduced brightness of G-CaFLITS in Drosophila. The explanation of the intensity changes of G-CaFLITS also shows some inconsistency with the basic photophysical characterization.<br /> - While the claims generally appear robust, in some cases they are conveyed with a lack of precision. Several sentences in the introduction and discussion could be improved in this regard. Furthermore, the use of the signal-to-noise ratio as a means of comparison between sensors appears to be imprecise, since it is dependent on experimental conditions.

    3. Reviewer #2 (Public review):

      Summary:

      Van der Linden et al. describe the addition of the T203Y mutation to their previously described fluorescence lifetime calcium sensor Tq-Ca-FLITS to shift the fluorescence to green emission. This mutation was previously described to similarly red-shift the emission of green and cyan FPs. Tq-Ca-FLITS_T203Y behaves as a green calcium sensor with opposite polarity compared with the original (lifetime goes down upon calcium binding instead of up). They then screen a library of variants at two linker positions and identify a variant with slightly improved lifetime contrast (Tq-Ca-FLITS_T203Y_V27A_N271D, named G-Ca-FLITS). The authors then characterize the performance of G-Ca-FLITS relative to Tq-Ca-FLITS in purified protein samples, in cultured cells, and in the brains of fruit flies.

      Strengths:

      This work is interesting as it extends their prior work generating a calcium indicator scaffold for fluorescent protein-based lifetime sensors with large contrast at a single wavelength, which is already being adopted by the community for production of other FLIM biosensors. This work effectively extends that from cyan to green fluorescence. While the cyan and green sensors are not spectrally distinct enough (~20-30nm shift) to easily multiplex together, it at least shifts the spectra to wavelengths that are more commonly available on commercial microscopes.

      The observations of organellar calcium concentrations were interesting and could potentially lead to new biological insight if followed up.

      Weaknesses:

      The new G-Ca-FLITS sensor doesn't appear to be significantly improved in performance over the original Tq-Ca-FLITS; no specific benefits are demonstrated.

      Although it was admirable to attempt in vivo demonstration in Drosophila with these sensors, depolarizing the whole brain with high potassium is not a terribly interesting or physiological stimulus and doesn't really highlight any advantages of their sensors; G-Ca-FLITS appears to be quite dim in the flies.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a variant of a previously described fluorescence lifetime sensor for calcium. Much of the manuscript describes the process of developing appropriate assays for screening sensor variants, and thorough characterization of those variants (inherent fluorescence characteristics, response to calcium and pH, comparisons to other calcium sensors). The final two figures show how the sensor performs in cultured cells and in vivo in Drosophila brains.

      Strengths:

      The work is presented clearly and the conclusion (this is a new calcium sensor that could be useful in some circumstances) is supported by the data.

      Weaknesses:

      There are probably few circumstances where this sensor would facilitate experiments (calcium measurements) for which other sensors would prove insufficient.

    1. eLife Assessment

      This paper presents useful findings that misfolded proteins in the nucleus can impair proteasomal degradation and activate p53. The results supporting the findings are largely solid, but incomplete. The manuscript could be strengthened by including more quantitative data analyses and additional experimentation/discussions on the mechanism of p53 activation by misfolded nuclear proteins. The work will be interesting primarily to scientists studying protein homeostasis.

    2. Joint Public Review:

      Summary of the work:

      This manuscript defines the differential stress response signaling induced by nuclear and cytoplasmic protein misfolding. To accomplish this, the authors used superfolder GFP fused to a destabilized FKBP protein bearing a targeting signal for cytosolic or nuclear localization. When cells were grown in the presence of the ligand Shield-1, this protein was stable, allowing fluorescence of the GFP protein. Upon removal of Shield-1, the FKBP protein is unfolded, targeting the entire fusion protein for proteasomal degradation. Using this approach, they performed RNAseq to probe similarities and differences in transcriptional responses to the accumulation of unfolded proteins in the cytosol or nucleus. As expected, many of the pathways upregulated in both datasets involved protein homeostasis pathways such as the proteasome and cytosolic chaperones. The increase in proteasome subunits correlated with the stabilization of Nrf1 under these conditions, suggesting that protein misfolding might induce proteasome subunits through an Nrf1-dependent mechanism, but this was not explicitly tested. In contrast, the authors report that the p53-dependent transcriptional response was selectively induced by protein misfolding stress in the nucleus, but not the cytosol. Deletion of p53 blocked this increase, indicating that this response is attributable to p53 stabilization. The increased p53 transcriptional activity corresponded with the stabilization of p53 and its target p21 in cells subjected to nuclear but not cytosolic protein misfolding stress. Using a reporter of nuclear proteasome activity, they show that nuclear proteasome activity is reduced in cells following protein misfolding stress in the nucleus, indicating that the stabilization of p53 (and other transcription factors such as NRF1) might be attributed to reduced proteasomal degradation. Additionally, the authors showed that nuclear misfolding stress also induces cell cycle arrest. However, this effect was not dependent on p53, indicating that it is mediated by other unknown mechanisms.

      Major strengths and weaknesses of the methods and results:

      The findings reported here define specific transcriptional outputs induced by targeted protein misfolding stress in the nucleus and cytosol, revealing new insights into the organelle-specific stress signaling. The approach is interesting and effective at revealing cellular responses induced by compartment-specific protein misfolding stress.

      One major weakness of the study is the lack of mechanistic follow-up for the transcriptional study. For example, what is the mechanistic basis for p53 stabilization by nuclear-destabilized domain (Nuc DD)? Is this entirely caused by diminished nuclear degradation activity as shown in Figure 6 or are there additional factors to be considered? If limited proteasome degradation capacity is the main reason for p53 upregulation, wouldn't the authors also see stabilization of other short-lived transcription factors? The fact that Nrf1 and Nrf2 are also stabilized by Nuc DD is consistent with the authors' hypothesis. On the other hand, if Nuc DD also affects other short-lived transcription factors such as c-fos or c-myc via proteasome inhibition, why did the gene expression analysis only pick up the p53 pathway as the one differentially regulated by Nuc DD? Would this imply that only p53 is specifically targeted by the nuclear proteasome, whereas other short-lived transcription factors are degraded either by the cytosolic proteasome or by both nuclear and cytosolic proteasome like Nrf1? Is there any evidence in the literature that supports this speculation? Additionally, how does Nuc DD affect the UPS system in the nucleus? Does it clog the proteasome directly or affect other assisting factors like chaperones or ubiquitinating enzymes? Lastly, it isn't clear what the functional implications of p53 stabilization would be for cells subjected to nuclear protein misfolding stress, particularly as the small effect on cell cycle arrest is not dependent on p53. In the end, the lack of mechanistic and/or functional follow-up reduces the overall importance of this manuscript. While the reviewers do not expect the authors to answer all these questions by experiments, additional work/clarifications/discussions along these lines would significantly improve the paper (see the recommendations).

      Another major weakness is the lack of statistical analysis (SA) to better support their conclusions. In fact, no SA was provided for many figures even though the authors tried to make many comparisons.

      The failure of the DD reporter to mount a significant heat shock response was puzzling. The presence of non-native proteins is the primary trigger for the heat shock response, but the authors acknowledge that inducible chaperones such as Hspa1a/b and Hsp90aa1 were not significantly changed in their system (page 8). Could this suggest a problem with the approach? What exactly is the nature of the stress mounted by Nuc DD?

      The cell cycle data presented in Figure 5 is less robust, particularly as the p53 data in panels C and D was collected only once.

      The Western blot data shown in Figure 6 does not have quantification to show how representative the blot is and how robust the changes in protein levels are over time. Western blots are known to be variable with different replicates and therefore the authors need to mention the number of biological repeats represented by the blot.

    1. eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

    2. Reviewer #1 (Public review):

      This manuscript discusses, from a theoretical point of view, the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

    3. Reviewer #2 (Public review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript "Cluster size determines morphology of transcription factories in human cells".

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well prepared, and the text is mostly well written, save for a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the experiments suggested in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level at which crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal.

      Major points

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend instead discussing only their size/extent. By the same argument, the title is, in my opinion, misleading as to the content of this study.

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a "small" transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point iii are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units' activation etc.
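
      The "simple analysis" suggested above can be sketched as follows. This is a minimal illustration only, not the SPRITE pipeline: the input format (each association as a list of element identifiers), the `element_to_tf` mapping, and the "dominant-TF fraction" score are all assumptions introduced here for illustration. The sketch groups associations by size k and reports, per k, the average fraction of elements sharing the most common TF label (1.0 = fully demixed; lower values = more mixed).

```python
from collections import Counter, defaultdict


def dominant_fraction(tf_labels):
    """Fraction of elements in one association that carry the most common TF label.

    1.0 means fully demixed (specialized); lower values mean more mixing.
    """
    counts = Counter(tf_labels)
    return max(counts.values()) / len(tf_labels)


def mixing_by_size(clusters, element_to_tf):
    """Group associations by size k and average the dominant-TF fraction per k.

    `clusters` is a list of associations, each a list of element identifiers
    (hypothetical format). `element_to_tf` maps an element to its main TF label
    (hypothetical annotation). Elements without a label are ignored.
    """
    by_k = defaultdict(list)
    for elements in clusters:
        labels = [element_to_tf[e] for e in elements if e in element_to_tf]
        if len(labels) < 2:
            continue  # a single labelled element cannot be mixed or demixed
        by_k[len(labels)].append(dominant_fraction(labels))
    return {k: sum(v) / len(v) for k, v in sorted(by_k.items())}


# Toy data, invented purely for illustration.
element_to_tf = {
    "e1": "red", "e2": "red", "e3": "green",
    "e4": "green", "e5": "red", "e6": "green",
}
clusters = [
    ["e1", "e2"],              # small, single TF type
    ["e3", "e4"],              # small, single TF type
    ["e1", "e2", "e3", "e4"],  # larger, two TF types
    ["e1", "e3", "e5", "e6"],  # larger, two TF types
]
result = mixing_by_size(clusters, element_to_tf)
print(result)
```

      In real data, a score that decreases with k would be consistent with the predicted mixing of larger clusters; other measures of mixing (for example, an entropy of the label distribution) could be substituted for the dominant-TF fraction.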

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo's group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these contacts, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and these data sets are not even acknowledged. I also strongly suggest analyzing these data or, if they are already sufficiently analyzed, discussing them in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors' predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors' key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to its full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      RNA promotes the formation of spatial compartments in the nucleus<br /> https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      Complex multi-enhancer contacts captured by genome architecture mapping<br /> https://www.nature.com/articles/nature21411

      Cell-type specialization is encoded by specific chromatin topologies<br /> https://www.nature.com/articles/s41586-021-04081-2

      Super-enhancer interactomes from single cells link clustering and transcription<br /> https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, the groups of Ben Sabari and Mustafa Mir in particular have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and "stressed cells". This distinction was initially established by Ibrahim Cisse's lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      Mediator and RNA polymerase II clusters associate in transcription-dependent condensates<br /> https://www.science.org/doi/10.1126/science.aar4199

      Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering<br /> https://www.science.org/doi/10.1126/sciadv.aay6515

      RNA polymerase II clusters form in line with surface condensation on regulatory chromatin<br /> https://www.embopress.org/doi/full/10.15252/msb.202110272

      If "morphology" should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper:

      Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin<br /> https://www.science.org/doi/10.1126/science.ade5308

      (5) The statement "scripts are available upon request" is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

    4. Reviewer #3 (Public review):

      Summary:<br /> In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90 kb). Each of these colored beads is considered a transcription unit (TU). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) "gene expression" patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths:<br /> This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses:<br /> The model has many assumptions, some of which are a bit too simplistic. Concerns about the work are detailed below:

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      The authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn't they? What is specific about transcription factors or transcription in this model?<br /> What is the logic of considering 3 kb of chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, the DNA paint experimental measurement of 5 kb of chromatin is greater than 100 nm (see work by Boettiger et al.).

    1. eLife Assessment

      The role of ACVR2A is potentially of importance to both the biology of trophoblast cells and to the pathogenesis of preeclampsia. In this manuscript, the authors have taken a useful first step towards better understanding this protein using a loss of function model in trophoblast cell lines and then examining invasion, proliferation, and transcription in these cells. At present, the results of this study are only based on the observation of in vitro phenotypes, and the strength of the invasion data is somewhat weak, given the confounding effect on proliferation. The study is currently incomplete as there is a lack of direct evidence on how target factors participate in the occurrence of placental structural disorders and diseases through potential downstream pathways.

    2. Reviewer #1 (Public review):

      Summary:

      This study has preliminarily revealed the role of ACVR2A in trophoblast cell function, including its effects on migration, invasion, proliferation, and clonal formation, as well as its downstream signaling pathways.

      Strengths:

      The use of multiple experimental techniques, such as CRISPR/Cas9-mediated gene knockout, RNA-seq, and functional assays (e.g., Transwell, colony formation, and scratch assays), is commendable and demonstrates the authors' effort to elucidate the molecular mechanisms underlying ACVR2A's regulation of trophoblast function. The RNA-seq analysis and subsequent GSEA findings offer valuable insights into the pathways affected by ACVR2A knockout, particularly the Wnt and TCF7/c-JUN signaling pathways.

      Weaknesses:

      The molecular mechanisms underlying this study require further exploration through additional experiments. While the current findings provide valuable insights into the role of ACVR2A in trophoblast cell function and its involvement in the regulation of migration, invasion, and proliferation, further validation in both in vitro and in vivo models is needed. Additionally, more experiments are required to establish the functional relevance of the TCF7/c-JUN pathway and its clinical significance, particularly in relation to pre-eclampsia. Additional techniques, such as animal models and more advanced clinical sample analyses, would help strengthen the conclusions and provide a more comprehensive understanding of the molecular pathways involved.

    3. Reviewer #2 (Public review):

      Summary:

      ACVR2A is one of a handful of genes for which significant correlations between associated SNPs and the incidence of preeclampsia have been found in multiple populations. It is one of the TGFB family receptors, and multiple ligands of ACVR2A, as well as its coreceptors and related inhibitors, have been implicated in placental development, trophoblast invasion, and embryo implantation. This useful study builds on this knowledge by showing that ACVR2A knockout in trophoblast-related cell lines reduces trophoblast invasion, which could tie together many of these observations. Support for this finding is incomplete, as reduced proliferation may be influencing the invasion results. The implication of cross-talk between the WNT and ACVR2A/SMAD2 pathways is an important contribution to the understanding of the regulation of trophoblast function.

      Strengths:

      (1) ACVR2A is one of very few genes implicated in preeclampsia in multiple human populations, yet its role in pathogenesis is not very well studied and this study begins to address that hole in our knowledge.

      (2) ACVR2A is also indirectly implicated in trophoblast invasion and trophoblast development via its connections to many ligands, inhibitors, and coreceptors, suggesting its potential importance.

      (3) The authors have used multiple cell lines to verify their most important observations.

      Weaknesses:

      (1) There are a number of claims made in the introduction without attribution. For example, there are no citations for the claims that family history is a significant risk factor for PE, that inadequate trophoblast invasion of spiral arteries is a key factor, and that immune responses, and renin-angiotensin activity are involved.

      (2) The introduction states "As a receptor for activin A, ACVR2A..." It's important to acknowledge that ACVR2A is also the receptor for other TGFB family members, with varying affinities and coreceptors. Several TGFB family members are known to regulate trophoblast differentiation and invasion. For example, BMP2 likely stimulates trophoblast invasion at least in part via ACVR2A (PMID 29846546).

      (3) An alternative hypothesis for the potential role of ACVR2A in preeclampsia is its functions in the endometrium. In the mouse ACVR2A knockout in the uterus (and other progesterone receptor-expressing cells) leads to embryo implantation failure.

      (4) In the description of the patient population for placental sample collections, preeclampsia is defined only by hypertension, and this is described as being in accordance with ACOG guidelines. ACOG requires a finding of hypertension in combination with either proteinuria or one of the following: thrombocytopenia, elevated creatinine, elevated liver enzymes, pulmonary edema, or new-onset unresponsive headache.

      (5) I believe that Figures 1a and 1b are data from a previously published RNAseq dataset, though it is not entirely clear in the text. The methods section does not include a description of the analysis of these data undertaken here. It would be helpful to include at least a brief description of the study these data are taken from - how many samples, how were the PE/control groups defined, gestational age range, where is it from, etc. For the heatmap presented in B, what is the significance of the other genes/ why are they being shown? If the purpose of these two panels is to show differential expression specifically of ACVR2A in this dataset, that could be shown more directly.

      (6) More information is needed in the methods section to understand how the immunohistochemistry was quantified. "Quantitation was performed" is all that is provided. Was staining quantified across the whole image or only in anchoring villous areas? How were HRP & hematoxylin signals distinguished in ImageJ? How was the overall level of HRP/DAB development kept constant between the NC and PE groups?

      (7) In Figure 1E it is not immediately obvious to many readers where the EVT are. It is probably worth circling or putting an arrow to the little region of ACVR2A+ EVT that is shown in the higher magnification image in Figure 1E. These are actually easier to see in the pictures provided in the supplement Figure 1. Of note, the STB is also staining positive. This is worth pointing out in the results text.

      (8) It is not possible to judge whether the IF images in 1F actually depict anchoring villi. The DAPI is really faint, and it's high magnification, so there isn't a lot of context. Would it be possible to include a lower magnification image that shows where these cells are located within a placental section? It is also somewhat surprising that this receptor is expressed in the cytoplasm rather than at the cell surface. How do the authors explain this?

      (9) The results text makes it sound like the data in Figure 2A are from NCBI & Protein atlas, but the legend says it is qPCR from this lab. The methods do not detail how these various cell lines were grown; only HTR-SVNeo cell culture is described. Similarly, JAR cells are used for several experiments and their culture is not described.

      (10) Under RT-qPCR methods, the phrase "cDNA reverse transcription cell RNA was isolated..." does not make any sense.

      (11) The paragraph beginning "Consequently, a potential association..." is quite confusing. It mentions analyzing ACVR2A expression in placentas, but then doesn't point to any results of this kind and repeats describing the results in Figure 2a, from various cell lines.

      (12) The authors should acknowledge that the effect of the ACVR2A knockout on proliferation makes it difficult to draw any conclusions from the trophoblast invasion assays. That is, there might be fewer migrating or invading cells in the knockout lines because there are fewer cells, not because the cells that are there are less invasive. Since this is a central conclusion of the study, it is a major drawback.

      (13) The legend and the methods section do not agree on how many fields were selected for counting in the transwell invasion assays in Figure 3C. The methods section and the graph do not match the number of replicate experiments in Figure 3D (the number of replicate experiments isn't described for 3C).

      (14) Discussion says "Transcriptome sequencing analysis revealed low ACVR2A expression in placental samples from PE patients, consistent with GWAS results across diverse populations." The authors should explain this briefly. Why would SNPs in ACVR2A necessarily affect levels of the transcript?

      (15) "The expression levels of ACVR2A mRNA were comparable to those of tumor cells such as A549. This discovery suggested a potential pivotal role of ACVR2A in the biological functions of trophoblast cells, especially in the nurturing layer." Alternatively, ACVR2A expression resembles that of tumors because the cell lines used here are tumor cells (JAR) or immortalized cells (HTR8). These lines are widely used to study trophoblast properties, but the discussion should at least acknowledge the possibility that the behavior of these cells does not always resemble normal trophoblasts.

      (16) The authors should discuss some of what is known about the relationship between the TCF7/c-JUN pathway and the major signaling pathway activated by ACVR2A, Smad 2/3/4. The Wnt and TGFB family cross-talk is quite complex and it has been studied in other systems.

    1. eLife Assessment

      This important study introduces a biologically constrained model of a telencephalic area of adult zebrafish to highlight the significance of precisely balanced memory networks in olfactory processing. The authors provide compelling evidence that their model performs better in multiple respects (e.g., in terms of network stability and in shaping the geometry of representations) than traditional attractor networks and persistent activity. The work supports recent studies reporting functional E/I subnetworks in several sensory cortices, and will be of interest to both theoretical and experimental neuroscientists studying network dynamics based on structured excitatory and inhibitory interactions.

    2. Reviewer #1 (Public review):

      Summary:

      Meissner-Bernard et al. present a biologically constrained model of a telencephalic area of adult zebrafish, an area homologous to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, such as the head direction circuits of flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibition subnetworks.

      Strengths:

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models.

      (1) The authors have done a good job of gathering the current experimental information to inform a biologically constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.<br /> (2) Multiple quantification metrics and controls are used to support the main conclusions, and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.<br /> (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      In the revised manuscript, the authors have also:<br /> (a) made a good effort to provide a mechanistic explanation of their results (especially on the mechanism underlying medium amplification in specific E/I network models);<br /> (b) performed a systematic analysis of the parameter space by changing different parameters of E and I neurons (specifically showing that different time constants of E and I neurons do not change the results and therefore the main effects result from connectivity);<br /> (c) added further analysis and discussion on the potential functional and computational significance of balanced specific E-I subnetworks.

      These additions substantially strengthen the study, presenting compelling evidence for how networks with specific E-I structure can underpin olfactory processing and memory representations. The findings have potential implications that extend beyond the olfactory system and may be applicable to other neural systems and species.

    3. Reviewer #2 (Public review):

      Summary:

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks.

      Strengths:

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. In particular, the result regarding the projection of input patterns onto local manifolds and the continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses:

      Intuitively, classification (decodability) in discrete attractor networks is much better than in networks with continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. Thus, tuned E-I networks cannot always perform better than any other network.

    4. Reviewer #3 (Public review):

      Summary:

      This work investigates computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli.

      Strengths:

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies exploring the functional advantage of these locally defined manifolds, and how other network properties shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor-stimulus-triggered spiking activity are within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions.

      Weaknesses:

      All my previous points have been addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      The revised manuscript contains new results and additional text. Major revisions:

      (1) Additional simulations and analyses of networks with different biophysical parameters and with identical time constants for E and I neurons (Methods, Supplementary Fig. 5).

      (2) Additional simulations and analyses of networks with modifications of connectivity parameters to further analyze effects of E/I assemblies on manifold geometry (Supplementary Fig. 6).

      (3) Analysis of synaptic current components (Figure 3 D-F; to analyze mechanism of modest amplification in Tuned networks). 

      (4) More detailed explanation of pattern completion analysis (Results).

      (5) Analysis of classification performance of Scaled networks (Supplementary Fig.8).

      (6) Additional analysis (Figure 5D-F) and discussion (particularly section “Computational functions of networks with E/I assemblies”) of functional benefits of continuous representations in networks with E-I assemblies. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Meissner-Bernard et al. present a biologically constrained model of a telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, such as the head direction circuits in flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitatory-inhibitory subnetworks.

      Strengths: 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biologically constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses: 

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), a more analytical account of why specific networks show the reported results is expected. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? A mechanistic account and analysis of these results are missing in the current version of the paper.

      We agree that further mechanistic insights would be of interest and addressed this issue at different levels:

      (1) Biophysical parameters: to determine whether network behavior depends on specific choices of biophysical parameters in E and I neurons we equalized biophysical parameters across neuron types. The main observations are unchanged, suggesting that the observed effects depend primarily on network connectivity (see also response to comment [2]).

      (2) Mechanism of modest amplification in E/I assemblies: analyzing the different components of the synaptic currents demonstrates that the modest amplification of activity in Tuned networks results from an “imperfect” balance of recurrent excitation and inhibition within assemblies (see new Figures 3D-F and text p.7). Hence, E/I co-tuning substantially reduces the net amplification in Tuned networks as compared to Scaled networks, thus preventing discrete attractor dynamics and stabilizing network activity, but a modest amplification still occurs, consistent with biological observations.

      (3) Representational geometry: to obtain insights into the network mechanisms underlying effects of E/I assemblies on the geometry of population activity we tested the hypothesis that geometrical changes depend, at least in part, on the modest amplification of activity within E/I assemblies (see Supplementary Figure 6). We changed model parameters to either prevent the modest amplification in Tuned networks (increasing I-to-E connectivity within assemblies) or introduce a modest amplification in subsets of neurons by other mechanisms (concentration-dependent increase in the excitability of pseudo-assembly neurons; Scaled I networks with reduced connectivity within assemblies). Manipulations that introduced a modest, input-dependent amplification in neuronal subsets had geometrical effects similar to those observed in Tuned networks, whereas manipulations that prevented a modest amplification abolished these effects (Supplementary Figure 6). Note however that these manipulations generated different firing rate distributions. These results provide a starting point for more detailed analyses of the relationship between network connectivity and representational geometry (see p.12).

      In summary, our additional analyses indicate that effects of E/I assemblies on representational geometry depend primarily on network connectivity, rather than specific biophysical parameters, and that the resulting modest amplification of activity within assemblies makes an important contribution. Further analyses may reveal more specific relationships between E/I assemblies and representational geometry, but such analyses are beyond the scope of this study.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We thank the reviewer for raising this point. We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. Nevertheless, to assess whether network behavior depends on specific choices of biophysical parameters in E and I neurons, we have performed additional simulations with equal synaptic time constants and equal biophysical parameters for all neurons. Each neuron also received the same number of inputs from each population (see revised Methods). Results were similar to those observed previously (Supplementary Fig.5 and p.9 of main text). We therefore conclude that the main effects observed in Tuned networks cannot be explained by differences in biophysical parameters between E and I neurons but are primarily a consequence of network connectivity.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function. 

      In the previous manuscript, the analysis of potential computational benefits other than pattern classification was limited and the discussion of this issue was condensed into a single itemized paragraph to avoid excessive speculation. Although a thorough analysis of potential computational benefits exceeds the scope of a single paper, we agree with the reviewer that this issue is of interest and therefore added additional analyses and discussion.

      In the initial manuscript we analyzed pattern classification primarily to investigate whether Tuned networks can support this function at all, given that they do not exhibit discrete attractor states. We found this to be the case, which we consider a first important result.

      Furthermore, we found that precise balance of E/I assemblies can protect networks against catastrophic firing rate instabilities when assemblies are added sequentially, as in continual learning. Results from these simulations are now described and discussed in more detail (see Results p.11 and Discussion p.13).

      In the revised manuscript, we now also examine additional potential benefits of Tuned networks and discuss them in more detail (see new Figure 5D-F and text p.11). One hypothesis is that continuous representations provide a distance metric between a given input and relevant (learned) stimuli. To address this hypothesis, we (1) performed regression analysis and (2) trained support vector machines (SVMs) to predict the concentration of a given odor in a mixture based on population activity. In both cases, Tuned E+I networks outperformed Scaled and rand networks in predicting the concentration of learned odors across a wide range of mixtures (Figure 5D-F). E/I assemblies therefore support the quantification of learned odors within mixtures or, more generally, assessments of how strongly a (potentially complex) input is related to relevant odors stored in memory. Such a metric assessment of stimulus quality is not well supported by discrete attractor networks because inputs are mapped onto discrete network states.
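
      The idea of reading out the concentration of a learned odor from population activity can be illustrated with a minimal sketch. This is not the authors' regression or SVM pipeline: the firing-rate vectors, the linear mixing of responses, and the helper names (`mixture_response`, `fit_line`, `predict_concentration`) are all assumptions chosen so that the toy example is exact. Real population responses to mixtures are not linear in general.

```python
# Toy sketch (hypothetical data and helpers, not the authors' analysis):
# estimate the fraction of a learned odor in a mixture from population activity.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Assumed firing-rate patterns (arbitrary units) for a learned and another odor.
learned = [4.0, 3.0, 0.5, 0.2]
other = [0.3, 0.6, 3.5, 4.0]

def mixture_response(c):
    """Assumed linear response to a mixture with fraction c of the learned odor."""
    return [c * a + (1 - c) * b for a, b in zip(learned, other)]

# Training set: projection of activity onto the learned pattern vs. concentration.
train_c = [0.0, 0.25, 0.5, 0.75, 1.0]
train_x = [dot(mixture_response(c), learned) for c in train_c]
slope, intercept = fit_line(train_x, train_c)

def predict_concentration(activity):
    """Read out the learned-odor fraction from a population activity vector."""
    return slope * dot(activity, learned) + intercept

estimate = predict_concentration(mixture_response(0.3))  # close to 0.3
```

      In a discrete attractor regime, mixtures would tend to collapse onto the same network state, so a readout of this kind would lose the graded information that it relies on here.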

      The observation that Tuned networks do not map inputs onto discrete outputs indicates that such networks do not classify inputs as distinct items. Nonetheless, the observed geometrical modifications of continuous representations support the classification of learned inputs or the assessment of metric relationships by hypothetical readout neurons. Geometrical modifications of odor representations may therefore serve as one of multiple steps in multi-layer computations for pattern classification (and/or other computations). In this scenario, the transformation of odor representations in Dp may be seen as related to transformations of representations between different layers in artificial networks, which collectively perform a given task (notwithstanding obvious structural and mechanistic differences between artificial and biological networks). In other words, geometrical transformations of representations in Tuned networks may overrepresent learned (relevant) information at the expense of other information and thereby support further learning processes in other brain areas. An obvious corollary of this scenario is that Dp does not perform odor classification per se based on inputs from the olfactory bulb but reformats representations of odor space based on experience to support computational tasks as part of a larger system. This scenario is now explicitly discussed (p.14).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks. 

      Strengths: 

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. In particular, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses: 

      Intuitively, classification (decodability) in discrete attractor networks is much better than in networks that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks.

      We have now quantified classification by networks with discrete attractor dynamics (Scaled) along with other networks. However, because the neuronal covariance matrix for such networks is low rank and not invertible, pattern classification cannot be analyzed by QDA as in Figure 5B. We therefore classified patterns from the odor subspace by template matching, assigning test patterns to one of the four classes based on correlations (see Supplementary Figure 8). As expected, Scaled networks performed well, but they did not outperform Tuned networks. Moreover, the performance of Scaled networks, but not Tuned networks, depended on the order in which odors were presented to the network. This hysteresis effect is a direct consequence of persistent attractor states and decreased the general classification performance of Scaled networks (see Supplementary Figure 8 for details). These results confirm the prediction that networks with discrete attractor states can efficiently classify inputs, but also reveal disadvantages arising from attractor dynamics. Moreover, the results indicate that the classification performance of Tuned networks is also high under the given task conditions, which simulate a biologically realistic scenario.
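
      Template matching by correlation, as described above, can be sketched in a few lines. The template vectors, class labels, and function names below are hypothetical and serve only to illustrate the principle; they are not taken from the manuscript or its code.

```python
# Toy sketch (hypothetical data): assign a test pattern to the class whose
# template has the highest Pearson correlation with it.
import math

def pearson(x, y):
    """Pearson correlation between two equal-length activity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify_by_template(pattern, templates):
    """Return the label of the template most correlated with `pattern`."""
    return max(templates, key=lambda label: pearson(pattern, templates[label]))

# Assumed firing-rate templates, one per odor class (arbitrary units).
templates = {
    "odor_A": [5.0, 1.0, 0.5, 0.2],
    "odor_B": [0.3, 4.0, 1.0, 0.1],
    "odor_C": [0.2, 0.5, 6.0, 1.0],
    "odor_D": [1.0, 0.2, 0.4, 5.0],
}

test_pattern = [0.4, 3.5, 1.2, 0.3]  # a perturbed version of odor_B
predicted = classify_by_template(test_pattern, templates)  # "odor_B"
```

      Unlike QDA, this procedure never inverts a covariance matrix, which is why it remains applicable when the covariance matrix is low rank.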

      We would also like to emphasize that classification may not be the only task, and perhaps not even a main task, of Dp/piriform cortex or other memory networks with E/I assemblies. Conceivably, other computations could include metric assessments of inputs relative to learned inputs or additional learning-related computations. Please see our response to comment (3) of reviewer 1 for a further discussion of this issue. 

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).  

      We thank the reviewer for this comment. We now explicitly discuss multistability (see p. 12) and refer to additional references in the statements addressing network stability.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B). 

      We thank the reviewer for this suggestion. We have added raster plots of responses to both familiar and novel inputs in the revised manuscript (Figure 2D and Supplementary Figure 4A).

      Reviewer #3 (Public Review): 

      Summary: 

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli. 

      Strengths: 

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies exploring the functional advantage of these locally defined manifolds, and how other network properties shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor-stimulus-triggered spiking activity are within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions.

      Weaknesses: 

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. To me, this sounds less like pattern completion and more like a nonlinear increase in the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion.

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. We added a definition of pattern completion in the text: “…the retrieval of the whole memory from noisy or corrupted versions of the learned input.” Pattern completion may be assessed using different procedures. In computational studies, it is often analyzed by delivering input to a subset of the assembly neurons which store a given memory (partial activation). Under these conditions, we find recruitment of the entire assembly in all structured networks, as demonstrated in Supplementary Figure 3. However, these conditions are unlikely to occur during odor presentation because the majority of neurons do not receive any input.

      Another more biologically motivated approach to assess pattern completion is to gradually modify a realistic odor input into a learned input, thereby gradually increasing the overlap between the two inputs. This approach has been used previously in experimental studies (references added to the text p.6). In the presence of assemblies, recurrent connectivity is expected to recruit assembly neurons (and thus retrieve the stored pattern) more efficiently as the learned pattern is approached. This should result in a nonlinear increase in the similarity between the evoked and the learned activity pattern. This signature was prominent in Scaled networks but not in Tuned or rand networks. Obviously, the underlying procedure is different from the partial activation of the assembly described above because input patterns target many neurons (including neurons outside assemblies) and exhibit a biologically realistic distribution of activity. However, this approach has also been referred to as “pattern completion” in the neuroscience literature, which may be the source of semantic confusion here. To clarify the difference between these approaches we have now revised the text and explicitly described each procedure in more detail (see p.6).

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states. 

      We agree that a more rigorous comparison of Tuned and Scaled networks would be of interest. We have added the variance analysis (Fig 4G) and angle distributions (Fig. 4H) for both Tuned I and Scaled networks. However, the Mahalanobis distances and Quadratic Discriminant Analysis cannot be applied to Scaled networks because their neuronal covariance matrix is low rank and not invertible. To nevertheless compare these networks, we performed template matching by assigning test patterns to one of the four odor classes based on correlations to template patterns (Supplementary Figure 8; see also response to the first comment of reviewer 2). Interestingly, Scaled networks performed well at classification but did not outperform Tuned networks, and exhibited disadvantages arising from attractor dynamics (Supplementary Figure 8; see also response to the first comment of reviewer 2). Furthermore, in further analyses we found that continuous representational manifolds support metric assessments of inputs relative to learned odors, which cannot be achieved by discrete representations. These results are now shown in Figure 5D-E and discussed explicitly in the text on p.11 (see also response to comment 3 of reviewer 1).

      We preferred not to add a sentence in the Discussion about benefits of networks having both sources of inhibition, as we find this a bit too speculative.
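
      Why a low-rank covariance matrix rules out Mahalanobis distances and QDA can be seen in a two-neuron toy case. Both analyses require the inverse of the covariance matrix, which does not exist when its determinant is zero. The trial data and helper names below are hypothetical, and the "attractor-like" trials are an idealization in which all variability lies along a single direction.

```python
# Toy sketch (hypothetical data): a covariance matrix estimated from activity
# confined to one direction is singular, so it cannot be inverted.

def covariance_matrix(trials):
    """Sample covariance (neurons x neurons) from a list of trial vectors."""
    n_trials, n_neurons = len(trials), len(trials[0])
    means = [sum(t[i] for t in trials) / n_trials for i in range(n_neurons)]
    return [
        [
            sum((t[i] - means[i]) * (t[j] - means[j]) for t in trials)
            / (n_trials - 1)
            for j in range(n_neurons)
        ]
        for i in range(n_neurons)
    ]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Idealized attractor-like case: every trial is a scaled copy of one pattern.
attractor_trials = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
# Continuous-like case: variability along two independent directions.
continuous_trials = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]

det_attractor = det2(covariance_matrix(attractor_trials))    # close to 0
det_continuous = det2(covariance_matrix(continuous_trials))  # clearly positive
```

      With many neurons the same problem arises whenever activity occupies fewer independent directions than there are neurons, which motivates a correlation-based comparison that avoids matrix inversion.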

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.  

      Thank you for this comment. We have provided additional data figures to support the following statements:

      “d<sub>M</sub> was again increased upon learning, particularly between learned odors and reference classes representing other odors (Supplementary Figure 9)”

      “decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (Supplementary Figure 6 B)”  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Meissner-Bernard et al. present a biologically constrained model of a telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, such as the head direction circuits in flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitatory-inhibitory subnetworks.

      The paper is generally well-written, the figures are informative and of good quality, and multiple approaches and metrics have been used to test and support the main results of the paper. 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biologically constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), a more analytical account of why specific networks show the reported results is expected. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? A mechanistic account and analysis of these results are missing in the current version of the paper.

      Precise balancing of excitation and inhibition in subnetworks would lead to the cancellation of specific dynamical modes responsible for the amplification of responses (hence, deviating from the attractor dynamics with an unstable specific mode). What is the key difference in the specific E/I networks here (tuned I and/or tuned E+I) which makes them stand between random and attractor networks? Excitatory and inhibitory neurons have different parameters in the model (Table 1). Time constants of inhibitory and excitatory synapses are also different (P. 13). Are these parameters causing networks to be effectively more excitation dominated (hence deviating from a random spectrum which would be expected from a precisely balanced E/I network, with exactly the same parameters of E and I neurons)? It is necessary to analyse the network models, describe the key mechanism for their amplification, and pinpoint the key differences between E and I neurons which are crucial for this.

      To address these comments we performed additional simulations and analyses at different levels. Please see our reply to comment (1) of the public review (reviewer 1) for a detailed description. We thank the reviewer for these constructive comments.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all of them. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We thank the reviewer for this comment. We have now carried out additional simulations with equal time constants for all neurons. Please see our reply to the public review for more details (comment 2 of reviewer 1).

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      Please see our reply to the public review (comment 3 of reviewer 1).

      Specific comments: 

      Abstract: "resulting in continuous representations that reflected both relatedness of inputs and *an individual's experience*" 

      It didn't become apparent from the text or the model where the role of "individual's experience" component (or "internal representations" - in the next line) was introduced or shown (apart from a couple of lines in the Discussion) 

      We consider the scenario that assemblies are the outcome of an experience-dependent plasticity process. To clarify this, we have now made a small addition to the text: “Biological memory networks are thought to store information by experience-dependent changes in the synaptic connectivity between assemblies of neurons.”

      P. 2: "The resulting state of "precise" synaptic balance stabilizes firing rates because inhomogeneities or fluctuations in excitation are tracked by correlated inhibition" 

      It is not clear what the "inhomogeneities" specifically refers to - they can be temporal, or they can refer to the quenched noise of connectivity, for instance. Please clarify what you mean. 

      The statement has been modified to be more precise: “…“precise” synaptic balance stabilizes firing rates because inhomogeneities in excitation across the population or temporal variations in excitation are tracked by correlated inhibition…”.

      P. 3 (and Methods): When odour stimulus is simulated in the OB, the activity of a fraction of mitral cells is increased (10% to 15 Hz) - but also a fraction of mitral cells is suppressed (5% to 2 Hz). What is the biological motivation or reference for this? It is not provided. Is it needed for the results? Also, it is not explained how the suppressed 5% are chosen (e.g. randomly, without any relation to the increased cells?). 

      We thank the reviewer for this comment. These changes in activity directly reflect experimental observations. We apologize that we forgot to include the references reporting these observations (Friedrich and Laurent, 2001 and 2004); this is now fixed.

      In our simulation, OB neurons do not interact with each other, and the suppressed 5% were indeed randomly selected. We changed the text in Methods accordingly to read: “An additional 75 randomly selected mitral cells were inhibited” 

      P. 4, L. 1-2: "... sparsely connected integrate-and-fire neurons with conductance-based synapses (connection probability {less than or equal to}5%)." 

      Specify the connection probability of specific subtypes (EE, EI, IE, II).  

      We now refer to the Methods section, where this information can be found. 

      “... conductance-based synapses (connection probability ≤5%, Methods)”  

      P. 4, L. 6-7: "Population activity was odor-specific and activity patterns evoked by uncorrelated OB inputs remained uncorrelated in Dp (Figure 1H)" 

      What would happen to correlated OB inputs (e.g. as a result of mixture of two overlapping odours) in this baseline state of the network (before memories being introduced to it)? It would be good to know this, as it sheds light on the initial operating regime of the network in terms of E/I balance and decorrelation of inputs.  

      This information was present in the original manuscript (Figure 3), but we improved the writing to further clarify this issue: “ (…) we morphed a novel odor into a learned odor (Figure 3A), or a learned odor into another learned odor (Supplementary Figure 3B), and quantified the similarity between morphed and learned odors by the Pearson correlation of the OB activity patterns (input correlation). We then compared input correlations to the corresponding pattern correlations among E neurons in Dp (output correlation). In rand networks, output correlations increased linearly with input correlations but did not exceed them (Figure 3B and Supplementary Figure 3B)”

      P. 4, L. 12-13: "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, .."   Where is this shown? 

      (There are other occasions too in the paper where references to the supporting figures are missing). 

      We now provide the statistics: “Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20”

      P. 4: "In each network, we created 15 assemblies representing uncorrelated odors. As a consequence, ~30% of E neurons were part of an assembly ..." 

      15 x 100 / 4000 = 37.5% - so it's closer to 40% than 30%. Unless there is some overlap? 

      Yes: despite odors being uncorrelated and connectivity being random, some neurons (6 % of E neurons) belong to more than one assembly.
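
As a rough consistency check on these numbers, the sketch below assumes (this is an illustrative assumption, not the authors' assembly-construction procedure) that each of the 15 assemblies is an independent, uniformly random draw of 100 out of 4000 E neurons. The number of assemblies a given neuron belongs to is then binomially distributed.

```python
from math import comb

N_E = 4000            # E neurons (from the reviewer's calculation above)
ASSEMBLY_SIZE = 100   # neurons per assembly
N_ASSEMBLIES = 15

# Assumption: assemblies are independent uniform draws, so a given neuron
# joins any single assembly with probability p.
p = ASSEMBLY_SIZE / N_E


def prob_in_k(k: int) -> float:
    """Binomial probability that a neuron belongs to exactly k assemblies."""
    return comb(N_ASSEMBLIES, k) * p**k * (1 - p) ** (N_ASSEMBLIES - k)


frac_any = 1 - prob_in_k(0)            # member of at least one assembly
frac_multi = frac_any - prob_in_k(1)   # member of two or more assemblies

print(f"in at least one assembly: {frac_any:.1%}")
print(f"in more than one assembly: {frac_multi:.1%}")
```

Under this independence assumption, about 32% of E neurons fall into at least one assembly and about 5% into more than one, which is of the same order as the ~30% and 6% quoted above.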

      P. 4: "When a reached a critical value of ~6, networks became unstable and generated runaway activity (Figure 2B)." 

      Can this transition point be calculated or estimated from the network parameters, and linked to the underlying mechanisms causing it? 

      We thank the reviewer for this interesting question. The instability arises when inhibition fails to efficiently counterbalance the increased recurrent excitation within Dp. The transition point is difficult to estimate, as it can depend on several parameters, including the probability of E-to-E connections, their strength, assembly size, and others. We have therefore not attempted to estimate it analytically.

      P. 4: "Hence, non-specific scaling of inhibition resulted in a divergence of firing rates that exhausted the dynamic range of individual neurons in the population, implying that homeostatic   global inhibition is insufficient to maintain a stable firing rate distribution." 

      I don't think this is justified based on the results and figures presented here (Fig. 2E) - the interpretation is a bit strong and biased towards the conclusions the authors want to draw. 

      To more clearly illustrate the finding that in Scaled networks, assembly neurons are highly active (close to maximal realistic firing rates) whereas non-assembly neurons are nearly silent, we have now added Supplementary Fig. 2B. Moreover, we have toned down the text: “Hence, non-specific scaling of inhibition resulted in a large and biologically unrealistic divergence of firing rates (Supplementary Figure 2B) that nearly exhausted the dynamic range of individual neurons in the population, indicating that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution”

      P. 5, third paragraph: Description of Figure 2I, inset is needed, either in the text or caption. 

      The inset is now referred to in the text: ”we projected synaptic conductances of each neuron onto a line representing the E/I ratio expected in a balanced network (“balanced axis”) and onto an orthogonal line (“counter-balanced axis”; Figure 2I inset, Methods).”
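
The projection described in this reply can be sketched as follows. This is a minimal illustration only: the function name, the `balanced_ratio` argument, and the sign convention of the orthogonal axis are assumptions made for the example, and the actual definition of the balanced axis is given in the paper's Methods.

```python
from math import hypot


def project_conductances(g_e, g_i, balanced_ratio):
    """Return (balanced, counter_balanced) components of the point (g_e, g_i).

    balanced_ratio is the g_i / g_e slope of the balanced axis. It is a
    hypothetical input here; the paper derives the axis from the network.
    """
    norm = hypot(1.0, balanced_ratio)
    # Unit vector along the balanced axis.
    ux, uy = 1.0 / norm, balanced_ratio / norm
    # Orthogonal unit vector; with this (assumed) sign, positive values mean
    # more excitation than expected from the balanced ratio.
    vx, vy = uy, -ux
    balanced = g_e * ux + g_i * uy
    counter_balanced = g_e * vx + g_i * vy
    return balanced, counter_balanced


# A neuron exactly on the balanced axis has no counter-balanced component.
on_axis = project_conductances(2.0, 4.0, balanced_ratio=2.0)
# A neuron with extra excitation is displaced along the counter-balanced axis.
excess_e = project_conductances(3.0, 4.0, balanced_ratio=2.0)
```

In this picture, variation along the first component changes excitation and inhibition together, whereas variation along the second component reflects a departure from the balanced E/I ratio.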

      P. 5, last paragraph: another example of writing about results without showing/referring to the corresponding figures: 

      "In rand networks, firing rates increased after stimulus onset and rapidly returned to a low baseline after stimulus offset. Correlations between activity patterns evoked by the same odor at different time points and in different trials were positive but substantially lower than unity, indicating high variability ..." 

      And the continuation with similar lack of references on P. 6: 

      "Scaled networks responded to learned odors with persistent firing of assembly neurons and high pattern correlations across trials and time, implying attractor dynamics (Hopfield, 1982; Khona and Fiete, 2022), whereas Tuned networks exhibited transient responses and modest pattern correlations similar to rand networks." 

      Please go through the Results and fix the references to the corresponding figures on all instances. 

      We thank the reviewer for pointing out these overlooked figure references, which are now fixed.

      P. 8: "These observations further support the conclusion that E/I assemblies locally constrain neuronal dynamics onto manifolds." 

      As discussed in the general major points, mechanistic explanation in terms of how the interaction of E/I dynamics leads to this is missing. 

      As discussed in the reply to the public review (comment 3 of reviewer 1), we have now provided more mechanistic analyses of our observations.

      P. 9: "Hence, E/I assemblies enhanced the classification of inputs related to learned patterns."   The effect seems to be very small. Also, any explanation for why for low test-target correlation the effect is negative (random doing better than tuned E/I)? 

      The size of the effect (p<sub>learned</sub> – p<sub>novel</sub> = 0.074; difference of means; Figure 5C) may appear small in terms of absolute probability, but it is substantial relative to the maximum possible increase (1 – p<sub>novel</sub> = 0.133; Figure 5C). The fact that for low test-target correlations the effect is negative is a direct consequence of the positive effect for high test-target correlations and the presence of 2 learned odors in the 4-way forced choice task.
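
The relative size of this effect follows directly from the two values quoted in the reply; the short calculation below only restates that arithmetic, with variable names chosen for illustration.

```python
p_diff = 0.074     # p_learned - p_novel, difference of means (Figure 5C)
headroom = 0.133   # 1 - p_novel, the maximum possible increase (Figure 5C)

# Fraction of the available headroom that is actually gained.
relative_gain = p_diff / headroom
print(f"{relative_gain:.0%} of the maximum possible increase")
```

In other words, the observed improvement corresponds to roughly 56% of the largest increase that was possible in this task.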

      P. 9: "In Scaled I networks, creating two additional memories resulted in a substantial increase   in firing rates, particularly in response to the learned and related odors"   Where is this shown? Please refer to the figure. 

      We thank the reviewer again for pointing this out. We forgot to include a reference to the relevant figure which has now been added in the revised manuscript (Figure 6C).

      P. 10: "The resulting Tuned networks reproduced additional experimental observations that were not used as constraints including irregular firing patterns, lower output than input correlations, and the absence of persistent activity" 

      It is difficult to present these as "additional experimental observations", as all of them are negative, and can exist in random networks too - hence cannot be used as biological evidence in favour of specific E/I networks when compared to random networks. 

      We agree with the reviewer that these additional experimental observations cannot be used as biological evidence favouring Tuned E+I networks over random networks. Here we only wanted to point out that additional observations, which we did not take into account when fitting the model, do not invalidate the existence of E-I assemblies in biological networks. As assemblies tend to result in persistent activity in other types of networks, we feel that this observation is worth pointing out.

      Methods: 

      P. 13: Describe the parameters of Eq. 2 after the equation. 

      Done.

      P. 13: "The time constants of inhibitory and excitatory synapses were 10 ms and 30 ms, respectively." 

      What is the (biological) justification for the choice of these parameters? 

      How would varying them affect the main results (e.g. local manifolds)? 

      We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. We have now also simulated networks with equal time constants for excitatory and inhibitory synapses and equal biophysical parameters for excitatory and inhibitory neurons, which did not affect the main results (see also reply to the public review: comment 2 of reviewer 1).

      P. 14: "Care was also taken to ensure that the variation in the number of output connections was low across neurons"   How exactly?

      More detailed explanations have now been added in the Methods section: “connections of a presynaptic neuron y to postsynaptic neurons x were randomly deleted when their total number exceeded the average number of output connections by ≥5%, or added when they were lower by ≥5%.“
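
The rule quoted above can be sketched as follows. The quoted sentence only specifies the ≥5% trigger, so the end point used here (trimming or padding an out-of-band neuron to the rounded mean out-degree) is an assumption, as are the function name and the data layout; constraints such as excluding self-connections are ignored.

```python
import random


def equalize_out_degree(targets, n_post, tol=0.05, rng=None):
    """Reduce the spread of out-degrees across presynaptic neurons.

    targets[y] is the set of postsynaptic indices contacted by presynaptic
    neuron y; n_post is the number of candidate postsynaptic neurons.
    Assumption: out-of-band neurons are reset to the rounded mean out-degree.
    """
    rng = rng or random.Random()
    mean_deg = sum(len(t) for t in targets) / len(targets)
    goal = round(mean_deg)
    result = []
    for t in targets:
        t = set(t)
        if len(t) >= (1 + tol) * mean_deg:
            # Too many outputs: keep a random subset of the connections.
            t = set(rng.sample(sorted(t), goal))
        elif len(t) <= (1 - tol) * mean_deg:
            # Too few outputs: add randomly chosen new targets.
            free = sorted(set(range(n_post)) - t)
            t |= set(rng.sample(free, goal - len(t)))
        result.append(t)
    return result


# Usage with a small random network (sizes are arbitrary placeholders).
rng = random.Random(0)
targets = [set(rng.sample(range(200), rng.randint(5, 35))) for _ in range(50)]
balanced = equalize_out_degree(targets, n_post=200, rng=rng)
```

After one pass, every neuron either already lay within 5% of the original mean or has exactly the rounded mean number of output connections.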

      Reviewer #2 (Recommendations For The Authors): 

      Congratulations on the great and interesting work! The results were nicely presented and the idea of continuous encoding on manifolds is very interesting. To improve the quality of the paper, in addition to the major points raised in the public review, here are some more detailed comments for the paper: 

      (1) Generally, citations have to improve. Spiking networks with excitatory assemblies and different architectures of inhibitory populations have been studied before, and the claim about improved network stability in co-tuned E-I networks has been made in the following papers that need to be correctly cited: 

      • Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W. 2011. Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science 334:1-7. doi:10.1126/science.1212991 (mentions that emerging precise balance on the synaptic weights can result in the overall network stability) 

      • Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 (among other things, contrasts stability and competition which arises from multistable networks with global inhibition and reciprocal inhibition)

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7 (compares different architectures of inhibition and their effects on network dynamics)

      • Lagzi F, Fairhall A. 2022. Tuned inhibitory firing rate and connection weights as emergent network properties. bioRxiv 2022.04.12.488114. doi:10.1101/2022.04.12.488114 (here, see the eigenvalue and UMAP analysis for a network with global inhibition and E/I assemblies) 

      Additionally, there is a lot of pioneering work on the tracking of excitatory synaptic inputs by inhibitory populations that is missing from the references. Also, experimental work that shows the existence of cell assemblies in the brain is largely missing. On the other hand, some references that do not fit the focus of the statements have been incorrectly cited.

      The authors may consider referencing the following more pertinent studies on spiking networks to support the statement regarding attractor dynamics in the first paragraph in the Introduction (the current citations of Hopfield and Kohonen are for rate-based networks): 

      • Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4), 1314-1328. https://doi.org/10.1523/JNEUROSCI.3733-05.2006 

      • Wang, X.-J. (2008). Decision making in recurrent neuronal circuits. Neuron, 60(2), 215-234. https://doi.org/10.1016/j.neuron.2008.09.034  

      • F. Lagzi, & S. Rotter. (2015). Dynamics of competition between subnetworks of spiking neuronal networks in the balanced state. PloS One. 

      • Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3), 477-485. 

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7. 

      • Amit DJ, Tsodyks M (1991) Quantitative study of attractor neural network retrieving at low spike rates: I. substrate-spikes, rates and neuronal gain. Network 2:259-273. 

      • Mazzucato, L., Fontanini, A., & La Camera, G. (2015). Dynamics of Multistable States during Ongoing and Evoked Cortical Activity. Journal of Neuroscience, 35(21), 8214-8231. 

      We thank the reviewer for the reference suggestions. We have carefully reviewed the reference list and made the following changes, which we hope address the reviewer’s concerns:

      (1) We adjusted the references about network stability in co-tuned E-I networks.

      (2) We added the Lagzi & Rotter (2015), Amit and Tsodyks (1991), Mazzucato et al. (2015) and Goldman-Rakic (1995) papers in the Introduction as studies on attractor dynamics in spiking neural networks. We preferred to omit the two X.-J. Wang papers, as they describe attractors in decision making rather than memory processes.

      (3) We added the Ko et al. 2011 paper as experimental evidence for assemblies in the brain. In our view, there are few experimental studies showing the existence of cell assemblies in the brain, which we distinguish from cell ensembles, i.e., groups of coactive neurons.

      (4) We also included Hennequin 2018, Brunel 2000, Lagzi et al. 2021 and Eckmann et al. 2024, which we had not cited in the initial manuscript.

      (5) We removed the Wiechert et al. 2010 reference as it does not support the statement about geometry-preserving transformation by random networks.

      (2) The gist of the paper is about how the architecture of inhibition (reciprocal vs. global in this case) can determine network stability and salient responses (related to multistable attractors and variations) for classification purposes. It would improve the narrative of the paper if this point is raised in the Introduction and Discussion section. Also see a relevant paper that addresses this point here: 

      Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 

      Classification has long been proposed to be a function of piriform cortex and autoassociative memory networks in general, and we consider it important. However, the computational function of Dp or piriform cortex is still poorly understood, and we do not focus only on odor classification as a possibility. In fact, continuous representational manifolds also support other functions such as the quantification of distance relationships of an input to previously memorized stimuli, or multi-layer network computations (including classification). In the revised manuscript, we have performed additional analyses to explore these notions in more detail, as explained above (response to public reviews, comment 3 of reviewer 1). Furthermore, we have now expanded the discussion of potential computational functions of Tuned networks and explicitly discuss classification but also other potential functions. 

      (3) A plot for the values of the inhibitory conductances in Figure 1 would complete the analysis for that section. 

      In Figure 1, we decided to only show the conductances that we use to fit our model, namely the afferent and total synaptic conductances. As the values of the inhibitory conductances can be derived from panel E, we refrained from plotting them separately for the sake of simplicity. 

      (4) How did the authors calculate correlations between activity patterns as a function of time in Figure 2E, bottom row? Does the color represent correlation coefficient (which should not be time dependent) or is it a correlation function? This should be explained in the Methods section. 

      The color represents the Pearson correlation coefficient between activity patterns within a narrow time window (100 ms). We updated the Figure legend to clarify this: “Mean correlation between activity patterns evoked by a learned odor at different time points during odor presentation. Correlation coefficients were calculated between pairs of activity vectors composed of the mean firing rates of E neurons in 100 ms time bins. Activity vectors were taken from the same or different trials, except for the diagonal, where only patterns from different trials were considered.”
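
The calculation described in this legend can be sketched as below. This is a minimal illustration with a hypothetical data layout (`rates[trial][time_bin]` holding one mean firing rate per E neuron for each 100 ms bin); it assumes every activity vector has non-zero variance and is not the authors' analysis code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def time_correlation_matrix(rates):
    """Mean pattern correlation for every pair of time bins.

    rates[trial][time_bin] is a list of mean firing rates, one per E neuron.
    """
    n_trials, n_bins = len(rates), len(rates[0])
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for t1 in range(n_bins):
        for t2 in range(n_bins):
            values = [
                pearson(rates[a][t1], rates[b][t2])
                for a in range(n_trials)
                for b in range(n_trials)
                # On the diagonal, compare patterns from different trials only.
                if not (t1 == t2 and a == b)
            ]
            matrix[t1][t2] = mean(values)
    return matrix


# Toy input: 2 trials, 2 time bins, 4 neurons.
rates = [
    [[1, 2, 3, 4], [4, 3, 2, 1]],
    [[1, 2, 3, 5], [5, 3, 2, 1]],
]
corr = time_correlation_matrix(rates)
```

Excluding same-trial pairs on the diagonal prevents each pattern from being correlated with itself, which would otherwise force the diagonal to 1.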

      (5) Figure 3 needs more clarification (both in the main text and the figure caption). It is not clear what the axes are exactly, and why the network responses for familiar and novel inputs are different. The gray shaded area in panel B needs more explanation as well.  

      We thank the reviewer for the comment. We have improved Figure 3A, the figure caption, as well as the text (see p.6). We hope that the figure is now clearer.

      (6) The "scaled I" network, known for representing input patterns in discrete attractors, should exhibit clear separation between network responses in the 2D PC space in the PCA plots. However, Figure 4D and Figure 6D do not reflect this, as all network responses are overlapped. Can the authors explain the overlap in Figure 4D? 

      In Figure 4D, activity of Scaled networks is distributed between three subregions in state space that are separated by the first 2 PCs. Two of them indeed correspond to attractor states representing the two learned odors while the third represents inputs that are not associated with these attractor states. To clarify this, please see also the density plot in Figure 4E. The few datapoints between these three subregions are likely outliers generated by the sequential change in inputs, as described in Supplementary Figure 8C.

      (7) The reason for writing about the ISN networks is not clear. Co-tuned E-I assemblies do not necessarily have to operate in this regime. Also, the results of the paper do not rely on any of the properties of ISNs, but they are more general. Authors should either show the paradoxical effect associated with ISN (i.e., if increasing input to I neurons decreases their responses) or show ISN properties using stability analysis (See computational research conducted at the Allen Institute, namely Millman et al. 2020, eLife ). Currently, the paper reads as if being in the ISN regime is a necessary requirement, which is not true. Also, the arguments do not connect with the rest of the paper and never show up again. Since we know it is not a requirement, there is no need to have those few sentences in the Results section. Also, the choice of alpha=5.0 is extreme, and therefore, it would help to judge the biological realism if the raster plots for Figs 2-6 are shown.

      We have toned down the part on ISN and reduced it to one sentence for readers who might be interested in knowing whether activity is inhibition-stabilized or not. We have also added the reference to the Tsodyks et al. 1997 paper from which we derive our stability analysis. The text now reads “Hence, pDp<sub>sim</sub> entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b, Tsodyks et al., 1997).”  

      We have now also added the raster plots as suggested by the reviewer (see Figure 2D, Supplementary Figure 1 G, Supplementary Figure 4). We thank the reviewer for this comment.

      (8) In the abstract, authors mention "fast pattern classification" and "continual learning," but in the paper, those issues have not been addressed. The study does not include any synaptic plasticity. 

      Concerning “continual learning” we agree that we do not simulate the learning process itself. However, Figure 6 shows results of a simulation where two additional patterns were stored in a network that already contained assemblies representing other odors. We consider this a crude way of exploring the end result of a “continual learning” process. “Fast pattern classification” is mentioned because activity in balanced networks can follow fluctuating inputs with high temporal resolution, while networks with stable attractor states tend to be slow. This is likely to account for the occurrence of hysteresis effects in Scaled but not Tuned networks as shown in Supplementary Fig. 8.

      (9) In the Introduction, the first sentence in the second paragraph reads: "... when neurons receive strong excitatory and inhibitory synaptic input ...". The word strong should be changed to "weak".

      Also, see the pioneering work of Brunel 2000. 

      In classical balanced networks, strong excitatory inputs are counterbalanced by strong inhibitory inputs, leading to a fluctuation-driven regime. We have added Brunel 2000.

      (10) In the second paragraph of the introduction, the authors refer to studies about structural co-tuning (e.g., where "precise" synaptic balance is mentioned, and Vogels et al. 2011 should be cited there) and functional co-tuning (which is, in fact, different than tracking of excitation by inhibition, but the authors refer to that as co-tuning). It makes it easier to understand which studies talk about structural co-tuning and which ones are about functional co-tuning. The paper by Znamenski 2018, which showed both structural and functional tuning in experiments, is missing here. 

      We added the citation to the now published paper by Znamenskiy et al. (2024).

      (11) The third paragraph in the Introduction misses some references that address network dynamics that are shaped by the inhibitory architecture in E/I assemblies in spiking networks, like Rost et al 2018 and Lagzi et al 2021. 

      These references have been added.

      (12) The last sentence of the fourth paragraph in the Introduction implies that functional co-tuning is due to structural co-tuning, which is not necessarily true. While structural co-tuning results in functional co-tuning, functional co-tuning does not require structural co-tuning because it could arise from shared correlated input or heterogeneity in synaptic connections from E to I cells.  

      We generally agree with the reviewer, but we are uncertain which sentence the reviewer refers to.

      We assume the reviewer refers to the last sentence of the second (rather than the fourth) paragraph, which explicitly mentions the “…structural basis of E/I co-tuning…”. If so, we consider this sentence still correct because the “structural basis” refers not specifically to E/I assemblies, but also includes any other connectivity that may produce co-tuning, including the connectivity underlying the alternative possibilities mentioned by the reviewer (shared correlated input or heterogeneity of synaptic connections).

      (13) In order to ensure that the comparison between network dynamics is legit, authors should mention up front that for all networks, the average firing rates for the excitatory cells were kept at 1 Hz, and the background input was identical for all E and I cells across different networks.

      We slightly revised the text to make this clearer: “We (…) uniformly scaled I-to-E connection weights by a factor of χ until E population firing rates in response to learned odors matched the corresponding firing rates in rand networks, i.e., 1 Hz”

      (14) In the last paragraph on page 5, my understanding was that an individual odor could target different cells within an assembly in different trials to generate trial-to-trial variability. If this is correct, this needs to be mentioned clearly.

      This is not correct: an odor consists of 150 activated mitral cells with defined firing rates. As now mentioned in the Methods, “Spikes were then generated from a Poisson distribution, and this process was repeated to create trial-to-trial variability.”
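
The distinction made in this reply (a fixed set of cells and rates per odor, with variability arising only from the spike generation) can be sketched as follows. The 15 Hz rate of activated cells and the 2 s presentation come from the text of this review; the remaining cells, their 6 Hz rate, and all function names are placeholders for illustration, not values or code from the paper.

```python
import random


def poisson_spike_train(rate_hz, duration_s, rng):
    """Spike times of a homogeneous Poisson process.

    Inter-spike intervals are drawn from an exponential distribution.
    """
    if rate_hz <= 0:
        return []
    spikes = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes


def odor_trial(rates_hz, duration_s, rng):
    """One trial: an independent spike train for every mitral cell."""
    return [poisson_spike_train(r, duration_s, rng) for r in rates_hz]


rng = random.Random(1)
# The odor itself is a fixed rate vector: 150 activated cells at 15 Hz.
# The 50 extra cells at 6 Hz are placeholders for non-activated cells.
rates = [15.0] * 150 + [6.0] * 50
trial_a = odor_trial(rates, duration_s=2.0, rng=rng)
trial_b = odor_trial(rates, duration_s=2.0, rng=rng)
```

Both trials use exactly the same cells and rates, so any difference between `trial_a` and `trial_b` comes from the stochastic spike generation alone.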

      (15) The last paragraph on page 6 mentions that the four OB activity patterns were uncorrelated but if they were designed as in Figure 4A, due to the existing overlap between the patterns, they cannot be uncorrelated.

      This appears to be a misunderstanding. We mention in the text (and show in Figure 4B) that the four odors which “… were assigned to the corners of a square…” are uncorrelated.  The intermediate odors are of course not uncorrelated. We slightly modified the corresponding paragraph (now on page 7) to clarify this: “The subspace consisted of a set of OB activity patterns representing four uncorrelated pure odors and mixtures of these pure odors. Pure odors were assigned to the corners of a square and mixtures were generated by selecting active mitral cells from each of the pure odors with probabilities depending on the relative distances from the corners (Figure 4A, Methods).”

      (16) The notion of "learned" and "novel" odors may be misleading as there was no plasticity in the network to acquire an input representation. It would be beneficial for the authors to clarify that by "learned," they imply the presence of the corresponding E assembly for the odor in the network, with the input solely impacting that assembly. Conversely, for "novel" inputs, the input does not target a predefined assembly. In Figure 2 and Figure 4, it would be especially helpful to have the spiking raster plots of some sample E and I cells.  

      As suggested by the reviewer, we have modified the existing spiking raster plots in Figure 2, such that they include examples of responses to both learned and novel odors. We added spiking raster plots showing responses of I neurons to the same odors in Supplementary Figure 1F, as well as spiking raster plots of E neurons in Supplementary Figure 4A. To clarify the usage of “learned” and “novel”, we have added a sentence in the Results section: “We thus refer to an odor as “learned” when a network contains a corresponding assembly, and as “novel” when no such assembly is present.”.

      (17) In the last paragraph of page 8, can the authors explain where the asymmetry comes from? 

      As mentioned in the text, the asymmetry comes from the difference in the covariance structure of different classes. To clarify, we have rephrased the sentence defining the Mahalanobis distance: 

      “This measure quantifies the distance between the pattern and the class center, taking into account covariation of neuronal activity within the class. In bidirectional comparisons between patterns from different classes, the mean dM may be asymmetric if neural covariance differs between classes.”
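
      The asymmetry described above can be illustrated with a minimal sketch. It assumes the standard definition of the Mahalanobis distance, d_M(v, Q) = sqrt((v - mean_Q)^T S_Q^{-1} (v - mean_Q)), and uses two hypothetical 2-D classes A and B whose means and covariances are invented for illustration only; they are not taken from the simulations in the manuscript.

```python
import math

def mahalanobis_2d(v, mean, cov):
    """Mahalanobis distance of a 2-D point v from a class with the given mean and covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical classes: A is tight (unit variance), B is broad (variance 9).
mean_a, cov_a = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
mean_b, cov_b = (3.0, 0.0), ((9.0, 0.0), (0.0, 9.0))

d_b_to_a = mahalanobis_2d(mean_b, mean_a, cov_a)  # center of B measured against class A
d_a_to_b = mahalanobis_2d(mean_a, mean_b, cov_b)  # center of A measured against class B
print(d_b_to_a, d_a_to_b)
```

      In this toy case the Euclidean distance between the two centers is the same in both directions, but the distance measured against the tight class A is three times larger than the distance measured against the broad class B.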

      (18) The first paragraph of page 9: random networks are not expected to perform pattern classification, but just pattern representation. It would have been better if the authors compared Scaled I network with E/I co-tuned network. Regardless of the expected poorer performance of the E/I co-tuned networks, the result would have been interesting. 

      Please see our reply to the public review (reviewer 2).

      (19) Second paragraph on page 9, the authors should provide statistical significance test analysis for the statement "... was significantly higher ...". 

      We have performed a Wilcoxon signed-rank test, and reported the p-value in the revised manuscript (p < 0.01). 

      (20) The last sentence in the first paragraph on page 11 is not clear. What do the authors mean by "linearize input-output functions", and how does it support their claim? 

      We have now amended this sentence to clarify what we mean: “…linearize the relationship between the mean input and output firing rates of neuronal populations…”.

      (21) In the first sentence of the last paragraph on page 11, the authors mentioned “high variability”, but it is not clear compared with which of the other 3 networks they observed high variability.

      Structurally co-tuned E/I networks are expected to diminish network-level variability. 

      “High variability” refers to the variability of spike trains, which is now mentioned explicitly in the text. We hope this more precise statement clarifies this point.

      (22) Methods section, page 14: "firing rates decreased with a time constant of 1, 2 or 4 s". How did they decrease? Was it an implementation algorithm? The time scale of input presentation is 2 s and it overlaps with the decay time constant (particularly with the one with 4 s decrease).  

      Firing rates decreased exponentially. We have added this information in the Methods section.
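
      To make the overlap between stimulus duration and decay time constant concrete, the following sketch evaluates a plain exponential decay, exp(-t / tau), for the three time constants mentioned by the reviewer. It assumes decay toward zero; the function name and the absence of a baseline rate are illustrative assumptions, not details of the model.

```python
import math

def remaining_fraction(t, tau):
    """Fraction of the initial rate remaining after t seconds of exponential decay."""
    return math.exp(-t / tau)

stimulus_duration = 2.0  # seconds, the input presentation time quoted by the reviewer

for tau in (1.0, 2.0, 4.0):
    frac = remaining_fraction(stimulus_duration, tau)
    print(f"tau = {tau:.0f} s: fraction remaining after {stimulus_duration:.0f} s = {frac:.3f}")
```

      Under these assumptions, roughly 14%, 37% and 61% of the initial rate remain at the end of a 2 s presentation for time constants of 1, 2 and 4 s, respectively.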

      Reviewer #3 (Recommendations For The Authors): 

      In the following, I suggest minor corrections to each section which I believe can improve the manuscript. 

      - There was no github link to the code in the manuscript. The code should be made available with a link to github in the final manuscript. 

      The code can be found here: https://github.com/clairemb90/pDp-model. The link has been added in the Methods section.

      Figure 1: 

      - Fig. 1A: call it pDp not Dp. Please check if this name is consistent in every figure and the text. 

      Thank you for catching this. Now corrected in Figure 1, Figure 2 and in the text.

      - The authors write: "Hence, pDpsim entered an inhibition-stabilized balanced state (Sadeh and Clopath, 2020b) during odor stimulation (Figure 1D, E)." and then later "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, demonstrating that activity was indeed inhibition-stabilized. These results were robust against parameter variations (Methods)." I would suggest moving the second sentence before the first sentence, because the fact that the network is in the ISN regime follows from the shuffled spike timing result. 

      Also, I'd suggest showing this as a supplementary figure. 

      We thank the reviewer for this comment. We have removed “inhibition-stabilized” from the first sentence, as there is no strong evidence of this in Rupprecht and Friedrich, 2018, and removed “indeed” from the second sentence. We also provide more detailed statistics. The text now reads “Hence, pDpsim entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b).”

      Figure 2: 

      - "... Scaled I networks (Figure 2H." Missing ) 

      Corrected.

      - The authors write "Unlike in Scaled I networks, mean firing rates evoked by novel odors were indistinguishable from those evoked by learned odors and from mean firing rates in rand networks (Figure 2F)." 

      Why is this something you want to see? Isn't it that novel stimuli usually lead to high responses? Eg in the paper Schulz et al., 2021 (eLife) which is also cited by the authors it is shown that novel responses have high onset firing rates. I suggest clarifying this (same in the context of Fig. 3C). 

      In Dp and piriform cortex, firing rates evoked by learned odors are not substantially different from firing rates evoked by novel odors. While small differences between responses to learned versus novel odors cannot be excluded, substantial learning-related differences in firing rates, as observed in other brain areas, have not been described in Dp or piriform cortex. We added references in the last paragraph of p.5. Note that the paper by Schulz et al. (2021) models a different type of circuit.  

      - Fig. 2B: Indicate in figure caption that this is the case "Scaled I" 

      This is not exactly the “Scaled I” case, as the parameter 𝝌 (increased I to E strength) is set to 1.

      - Suppl Fig. 2I: Is E&F ever used in the manuscript? I couldn't find a reference. I suggest removing it if not needed. 

      Suppl. Fig 2I E&F is now Suppl Fig.1G&H. We now refer to it in the text: “Activity of networks with E assemblies could not be stabilized around 1 Hz by increasing connectivity from subsets of I neurons receiving dense feed-forward input from activated mitral cells (Supplementary Figure 1GH; Sadeh and Clopath, 2020).”

      Figure 3: 

      - As mentioned in my comment in the public review section, I find the arguments about pattern completion a little bit confusing. For me it's not clear why an increase of output correlations over input correlations is considered "pattern completion" (this is not to say that I don't find the nonlinear increase of output correlations interesting). For me, to test pattern completion with second-order statistics one would need to do a similar separation as in Suppl Fig. 3, ie measuring the pairwise correlation at cells in the assembly L that get direct input from L OB with cells in the assembly L that do not get direct input from OB. If the pairwise correlations of assembly cells which do not get direct input from OB increase in correlations, I would consider this as pattern completion (similar to the argument that increase in firing rate in cells which are not directly driven by OB are considered a sign of pattern completion). 

      Also, it now seems to me that there are contradictory results: in Fig. 3 only Scaled I can lead to pattern completion, while in the context of Suppl. Fig. 3 the authors write "We found that assemblies were recruited by partial inputs in all structured pDpsim networks (Scaled and Tuned) without a significant increase in the overall population activity (Supplementary Figure 3A)." I suggest clarifying what exactly the authors mean by pattern completion, why the increase of output correlations above input correlations can be considered pattern completion, and why the results differ when looking at firing rates versus correlations. 

      Please see our reply to the public review (reviewer 3).

      - I actually would suggest adding Suppl. Fig. 3 to the main figure. It shows a more intuitive form of pattern completion and in the text there is a lot of back and forth between Fig. 3 and Suppl. Fig. 3 

      We feel that the additional explanations and panels in Fig.3 should clarify this issue and therefore prefer to keep Supplementary Figure 3 as part of the Supplementary Figures for simplicity.  

      - In the whole section "We next explored effects of assemblies ... prevented strong recurrent amplification within E/I assemblies." the authors could provide a link to the respective panel in Fig. 2 after each statement. This would help the reader follow your arguments. 

      We thank the reviewer for pointing this out. The references to the appropriate panels have been added. 

      - Fig. 3A: I guess the x-axis has been shifted upwards? Should be at zero. 

      We have modified the x-axis to make it consistent with panels B and C.  

      - Fig. 3B: In the figure caption, the dotted line is described as the novel odor but it is actually the unit line. The dashed lines represent the reference to the novel odor. 

      Fixed.

      - Fig. 3C: The " is missing for Pseudo-Assembly N

      Fixed.

      - "...or a learned odor into another learned odor." Have here a ref to the Supplementary Figure 3B.

      Added.

      Figure 4:   

      - "This geometry was largely maintained in the output of rand networks, consistent with the notion that random networks tend to preserve similarity relationships between input patterns (Babadi and Sompolinsky, 2014; Marr, 1969; Schaffer et al., 2018; Wiechert et al., 2010)." I suggest adding here reference to Fig. 4D (left). 

      Added.

      - Please add a definition of E/I assemblies. How do the authors define E/I assemblies? I think they consider both, Tuned I and Tuned E+I as E/I assemblies? In Suppl. Fig. 2I E it looks like tuned feedforward input is defined as E/I assemblies. 

      We thank the reviewer for pointing this out. E/I assemblies are groups of E and I neurons with enhanced connectivity. In other words, in E/I assemblies, connectivity is enhanced not only between subsets of E neurons, but also between these E neurons and a subset of I neurons. This is now clarified in the text: “We first selected the 25 I neurons that received the largest number of connections from the 100 E neurons of an assembly. To generate E/I assemblies, the connectivity between these two sets of neurons was then enhanced by two procedures.”. We removed “E/I assemblies” in Suppl. Fig.2, where the term was not used correctly, and apologize for the confusion.
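
      The selection step quoted above (choosing the 25 I neurons that receive the most connections from the 100 E neurons of an assembly) can be sketched as follows. The connectivity representation, the toy network sizes and the function names are hypothetical, and the two procedures used to enhance connectivity afterwards are not shown.

```python
import random

def count_inputs(conn, assembly_e):
    """Count, for each I neuron, how many assembly E neurons project to it.

    conn[e] is the set of I-neuron indices targeted by E neuron e.
    """
    counts = {}
    for e in assembly_e:
        for i in conn[e]:
            counts[i] = counts.get(i, 0) + 1
    return counts

def select_top_i_neurons(conn, assembly_e, n_select=25):
    """Return the n_select I neurons receiving the most inputs from the assembly."""
    counts = count_inputs(conn, assembly_e)
    # Sort by descending input count; ties are broken by neuron index.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return ranked[:n_select]

# Toy random E-to-I connectivity (sizes chosen for illustration only).
rng = random.Random(0)
n_e, n_i, out_degree = 400, 100, 10
conn = [set(rng.sample(range(n_i), out_degree)) for _ in range(n_e)]

assembly_e = list(range(100))  # the 100 E neurons of one assembly
selected_i = select_top_i_neurons(conn, assembly_e)
print(len(selected_i))
```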

      - Suppl. Fig. 4: Could the authors please define what they mean by "Loadings" 

      The loadings indicate the contribution of each neuron to each principal component, see adjusted legend of Suppl. Fig. 4: “G. Loading plot: contribution of neurons to the first two PCs of a rand and a Tuned E+I network (Figure 4D).”

      - Fig. 4F: The authors might want to normalize the participation ratio by the number of neurons (see e.g. Dahmen et al., 2023 bioRxiv, "relative PR"), so the PR is bound between 0 and 1 and the dependence on N is removed. 

      We thank the reviewer for the suggestion, but we prefer to use the non-normalized PR as we find it more easily interpretable (e.g. number of attractor states in Scaled networks).
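
      For readers comparing the two options, the sketch below computes both quantities from a list of covariance eigenvalues. It assumes the usual definition of the participation ratio, PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), and takes the "relative PR" suggested by the reviewer to be PR divided by the number of dimensions; the example spectra are invented.

```python
def participation_ratio(eigenvalues):
    """Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    total = sum(eigenvalues)
    return total * total / sum(ev * ev for ev in eigenvalues)

def relative_participation_ratio(eigenvalues):
    """Participation ratio normalized by the number of dimensions (between 1/N and 1)."""
    return participation_ratio(eigenvalues) / len(eigenvalues)

flat_spectrum = [1.0] * 10            # variance spread evenly over 10 dimensions
single_mode = [1.0] + [0.0] * 9       # all variance in one dimension

for name, spectrum in (("flat", flat_spectrum), ("single mode", single_mode)):
    print(name, participation_ratio(spectrum), relative_participation_ratio(spectrum))
```

      The non-normalized value reads directly as an effective number of dimensions, whereas the normalized value removes the dependence on N.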

      - Fig. 4G&H: as mentioned in the public review, I'd add the case of Scaled I to be able to compare it to the Tuned E+I case. 

      As already mentioned in the public review, we thank the reviewer for this suggestion, which we have implemented.

      - Figure caption Fig. 4H "Similar results were obtained in the full-dimensional space." I suggest showing this as a supplemental panel. 

      Since this only adds little information, we have chosen not to include it as a supplemental panel to avoid overloading the paper with figures.

      Figure 5: 

      - As mentioned in the public review, I suggest that the authors add the Scaled I case to Fig. 5 (it's shown in all figures and also in Fig. 6 again). I guess for Scaled I the separation between L and M will be very good? 

      Please see our reply to the public review (reviewer 3).

      - Fig. 5A&B: I am a bit confused about which neurons are drawn to calculate the Mahalanobis distance. In Fig. 5A, the schematic indicates that the vector B from which the neurons are drawn is distinct from the distribution Q. For the example of odor L, the distribution Q consists of pure odor L with odors that have little mixtures with the other odors. But the vector v for odor L seems to be drawn only from odors that have slightly higher mixtures (as shown in the schematic in Fig. 5A). Is there a reason to choose the vector v from different odors than the distribution Q? 

      The distribution Q and the vector v consist of activity patterns across the same neurons in response to different odors. The reason to choose a different odor for v was to avoid having this test datapoint being included in the distribution Q. We also wanted Q to be the same for all test datapoints. 

      What does "drawn from whole population" mean? Does this mean that the vectors are drawn from any neuron in pDp? If yes, then I don't understand how the authors can distinguish between different odors (L,M,O,N) on the y-axis. Or does "whole population" mean that the vector is drawn across all assemblies as shown in the schematic in Fig. 5A and the case "neurons drawn from (pseudo-) assembly" means that the authors choose only one specific assembly? In any case, the description here is a bit confusing, I think it would help the reader to clarify those terms better.  

      Yes, “drawn from whole population” means that we randomly draw 80 neurons from the 4000 E neurons in pDp. The y-axis means that we use the activity patterns of these neurons evoked by one of the 4 odors (L, M, N, O) as reference. We have modified the Figure legend to clarify this: “d<sub>M</sub> was computed based on the activity patterns of 80 E neurons drawn from the four (pseudo-) assemblies (top) or from the whole population of 4000 E neurons (bottom). Average of 50 draws.”

      - Suppl Fig. 5A: In the schematic the distance is called d_E(\bar{Q},\bar{V}) while the colorbar has d_E(\bar{Q},\bar{Q}) with the Qs in different color. The green Q should be a V. 

      We thank the reviewer for spotting this mistake, it is now fixed.

      - Fig. 5: Could the authors comment on the fact that a random network seems to be very good at classifying patterns on its own? Maybe in the Discussion. 

      The task shown in Figure 5 is a relatively easy one, a forced-choice between four classes which are uncorrelated. In Supplementary Figure 9, we now show classification for correlated classes, which is already much harder.

      Figure 6: 

      - Is the correlation induced by creating mixtures like in the other Figures? Please clarify how the correlations were induced. 

      We clarified this point in the Methods section: “The pixel at each vertex corresponded to one pure odor with 150 activated and 75 inhibited mitral cells (…) and the remaining pixels corresponded to mixtures. In the case of correlated pure odors (Figure 6), adjacent pure odors shared half of their activated and half of their inhibited cells.”. An explicit reference to the Methods section has also been added to the figure legend.

      - Fig. 6C (right): why don't we see the clear separation in PC space as shown in Fig. 4? Is this related to the existence of correlations? Please clarify. 

      Yes. The assemblies corresponding to the correlated odors X and Y overlap significantly, and therefore responses to these odors cannot be well separated, especially for Scaled networks. We added the overlap quantification in the Results section to make this clear. “These two additional assemblies had on average 16% of neurons in common due to the similarity of the odors.”

      - "Furthermore, in this regime of higher pattern similarity, dM was again increased upon learning, particularly between learned odors and reference classes representing other odors (not shown)." Please show this (maybe as a supplemental figure). 

      We now show the data in Supplementary Figure 9.

      Discussion: 

      - The authors write: "We found that transformations became more discrete map-like when amplification within assemblies was increased and precision of synaptic balance was reduced. Likewise, decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (not shown)." 

      Where do I see the first point? I guess when I compare in Fig. 4D the case of Scaled I vs Tuned E+I, but the sentence above sounds like the authors showed this in a more step-wise way eg by changing the strength of \alpha or \beta (as defined in Fig. 1). 

      Also I think if the authors want to make the point that decreasing amplification in assemblies changes transformation with a different rate distribution in scaled vs tuned networks, the authors should show it (eg adding a supplemental figure). 

      The first point is indeed supported by data from different figures. Please note that the revised manuscript now contains further simulations that reinforce this statement, particularly those shown in Supplementary Figure 6, and that this point is now discussed more extensively in the Discussion. We hope that these revisions clarify this general point.

      Data showing the effects of decreasing amplification in assemblies are now presented in Supplementary Figure 6 (Scaled[adjust]).

      - I suggest adding the citation Znamenskiy et al., 2024 (Neuron; https://doi.org/10.1016/j.neuron.2023.12.013), which shows that excitatory and inhibitory (PV) neurons with functional similarities are indeed strongly connected in mouse V1, suggesting the existence of E/I assembly structure also in mammals.

      Done.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup is essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

      Weaknesses:

      The authors stress that it can be done in BBB models, but I would argue that it is much more broadly applicable. This is not necessarily a weakness of the study but more an opportunity to strengthen the method. So I would encourage the authors to rewrite some parts and make it more broadly applicable.

      We thank the Reviewer for this suggestion. The barrier properties of the BBB influence the dynamic behavior of T cells during their multi-step extravasation cascade. For example, the crawling of CD4 T cells against the direction of blood flow is a unique behavior of T cells on the BBB that is also observed in vivo (1-3). Nevertheless, we fully agree that UFMTrack can in principle be used to study immune cell interactions with endothelial monolayers under physiological flow in general. We have thus added a statement to the abstract and expanded the discussion to highlight the availability of the framework and the adaptations that may be required when using UFMTrack to analyze different experimental setups. Please also note that UFMTrack has been established as a basic framework using the example of brain endothelial monolayers and one flow chamber device while studying different immune cell subsets. The purpose of the publication is to make UFMTrack available to the community to address their specific questions.

      (1) Kawakami, N., Bartholomäus, I., Pesic, M. & Kyratsous, N. I. Intravital Imaging of Autoreactive T Cells in Living Animals. Methods Cell Biol. 113, 149–168 (2013).

      (2) Schläger, C., Litke, T., Flügel, A. & Odoardi, F. In Vivo Visualization of (Auto)Immune Processes in the Central Nervous System of Rodents. in 117–129 (Humana Press, New York, NY, 2014). doi:10.1007/7651_2014_150

      (3) Haghayegh Jahromi, N. et al. Intercellular Adhesion Molecule-1 (ICAM-1) and ICAM-2 Differentially Contribute to Peripheral Activation and CNS Entry of Autoaggressive Th1 and Th17 Cells in Experimental Autoimmune Encephalomyelitis. Front. Immunol. 10, 3056 (2020).

    2. eLife Assessment

      This work is important because it elucidates how immune cells migrate across the blood brain barrier. In the revised version of this study, the authors present a convincing framework to visualize, recognize and track the movement of different immune cells across primary human and mouse brain microvascular endothelial cells without the need for fluorescence-based imaging using microfluidic devices. This work will be of broad interest to the cancer biology, immunology and medical therapeutics fields.

    3. Reviewer #1 (Public review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup is essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

    4. Reviewer #2 (Public review):

      Summary:

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications.

      Strengths:

      The algorithm is almost as accurate as manual tracking and importantly saves time for researchers. The authors have discussed how their tool compares to other tracking methods.

      Weaknesses:

      Applicability can be questioned because the device used is 2D and physiological biology is in 3D. However, the authors have addressed this point in their manuscript.

    1. eLife Assessment

      This study presents a valuable conceptual approach that cell lineage can be determined using methylation data. However, the evidence supporting the claims of the author remains incomplete after revision. If clarified further as described in the reviews, this approach could be of broad interest to neuroscientists and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shibata describes a method to assess rapidly fluctuating CpG sites (fCpGs) from single-cell methylation sequencing (sc-MeSeq) data. Assuming that fCpGs are largely consistent over time with changes induced by inheritable events during replication, the author infers lineage relationships in available brain-derived sc-MeSeq. Supplementing current lineage tracing through genomic and mitochondrial mosaic variants is an interesting concept that could supplement current work or allow additional lineage analysis in existing data.

      However, the author failed to convincingly show the power of fCpG analysis to determine lineages in the human brain. While the correlation with cellular division and distinction of cell types appears plausible and strong, the application to detect specific lineages is less convincing. Aspects of this might be due to a lack of clarity in presentation and erroneous use of developmental concepts. However, without addressing these problems it is challenging for a reader to come to the same conclusions as the author.

      On the flip side, this novel application of fCpGs will allow the re-use of existing sc-MeSeq to infer additional features that were previously unavailable, once the biological relevance has been further elucidated.

      Strengths:

      • Novel re-analysis application of methylation data to infer the status of fCpGs and the use as a lineage marker
      • Application of this method to an innovative existing data set to benchmark this framework against existing developmental knowledge

      Weaknesses:

      • Inconsistent or erroneous use of neurodevelopmental concepts which hinders appropriate interpretation of the results.
      • Somewhat confusing presentation at times which makes it hard to judge the value of this novel approach.

    3. Reviewer #3 (Public review):

      Summary:

      Cell lineage tracing necessitates continuous visible tracking or permanent molecular markers that daughter cells inherit from their progenitors. To successfully trace cell lineages, it is essential to generate and detect sufficient new markers during each cell division. Thus, molecular cell lineages have been predominantly studied with stably inherited genetic markers in animal models and somatic DNA mutations in the human brain. DNA methylation is unstable across cell divisions and differentiation, and can hardly be called a barcode. The use of "Human Brain Barcodes" in the title and across the whole paper lacks convincing evidence - it is questionable that CpG methylation is always stably inherited by daughter cells.

      Strengths:

      Analysis of DNA methylation.

      Weaknesses:

      The unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. To establish DNA methylation as a means for lineage tracing, it is necessary to test whether the DNA methylation patterns can faithfully track cell lineages with in vitro differentiated & visibly tracked cell lineages.

      The unreliable CpG methylation status also raises the question of what the "Barcodes" refer to in the title and across this study. Barcodes should be stable in principle and not dynamic across cell generations, as defined in the Reference #1. The CRISPR/Cas9 mutable barcodes or the somatic mutations may be considered barcodes, but the reviewer is not convinced that the "dynamic" CpG methylation fits the "barcodes" terminology. This problem is even more concerning in the last section of the results, where CpG status fluctuates in post-mitotic cells.

      The manuscript frequently states assumptions in a tone of conclusions and interprets results without rejecting alternative hypotheses. For example, the title "Human Brain Barcodes" should be backed with solid supporting evidence. For another example, the author assumed that the early-formed brain stem would resemble progenitors better and have a higher average methylation level than the forebrain - however, this difference in DNA methylation status could well reflect cell-type-specific gene expression instead of cell lineage progression.

      Other points:

      (1) The conclusion that excitatory neurons undergo tangential migration is unclear - how far away did the author mean for the tangential direction? Lateral dispersion is known, but it is hard to believe that the excitatory neurons travel across different brain regions. More importantly, how would the author interpret shared or divergent methylation for the same cell type across different brain regions?

      (2) The sparsity and resolution of the single-cell DNA methylation data. The methylation status is detected in only a small fraction (~500/31,000 = 1.6%) of fCpGs per cell, with only 48 common sites identified between cell pairs. Given that the human genome contains over 28 million CpG sites, it is important to evaluate whether these fCpGs are truly representative.

      (3) While focusing on the X-chromosome may simplify the identification of polymorphic fCpGs, the confidence in determining its methylation status (0 or 1) is questionable when a CpG site is covered by only one read.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Developing a reliable method to record ancestry and distinguish between human somatic cells presents significant challenges. I fully acknowledge that my current evidence supporting the claim of lineage tracing with fCpG barcodes is inadequate. I agree with Reviewer 1 that fCpG barcodes are essentially a cellular division clock that diverges over time. A division clock could potentially document when cells cease to divide during development, with immediate daughter cells likely exhibiting more similar barcodes than those that are less related. Although it remains uncertain whether the current fCpG barcodes capture useful biological information, refinement of this type of tool could complement other approaches that reconstruct human brain function, development, and aging.

      Due to my lack of clarity, the fCpG barcode was perceived to be a new type of cell classifier. However, it is fundamentally different. fCpG sites are selected based on their differences between cells of the same type, while traditional cell classifiers focus on sites with consistent methylation patterns in cells of the same type. Despite these opposing criteria, fCpG barcodes and traditional cell classifiers may align because neuron subtypes often share common progenitors. As a result, cells of the same phenotype are also closely related by ancestry, and ex post facto, have similar fCpG barcodes. fCpG barcodes are complementary to cell type classifiers, and potentially provide insights into aspects such as mitotic ages, diversity within a clade, and migration of immediate daughters---information which is otherwise difficult to obtain. The title has been modified to “Human Brain Ancestral Barcodes” to better reflect the function of the fCpG barcodes. The manuscript is edited to correct errors, and a new Supplement is added to further explain fCpG barcode mechanics and present new supporting data.

      Reviewer #1 (Public review):

      I thank Reviewer 1 for his constructive comments. Major noted weaknesses were 1) insufficient clarity and brevity of the methodology, 2) inconsistent or erroneous use of neurodevelopmental concepts, and 3) lack of consideration for alternative explanations.

      (1) The methodology is now outlined in detail in a new Supplement, including simulations that indicate that the error rate consistent with the experimental data is about 0.01 changes in methylation per fCpG site per division.

      (2) Conceptual and terminology errors noted by the Reviewers are corrected in the manuscript.

      (3) I agree completely with the alternative explanation of Reviewer 1 that fCpGs are “a cellular division clock that diverges over 'time'”. Differences between more traditional cell type classifiers and fCpG barcodes are more fully outlined in the new Supplement.  Ancestry recorded by fCpGs and cell type classifiers are confounded because cells of the same phenotype typically have common progenitors---cells within a clade have similar fCpG barcodes because they are closely related. fCpG barcodes can complement cell type classifiers with additional information such as mitotic ages, ancestry within a clade, and daughter cell migration.
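
      The division-clock idea in points (1) and (3) can be illustrated with a minimal sketch. It is not the simulation from the Supplement: it assumes that each fCpG site is binary (0 or 1), that a site flips independently with probability p = 0.01 per division (the rate quoted in point (1)), and that gains and losses of methylation are equally likely. Function names and the number of sites are hypothetical.

```python
import random

def expected_mismatch(p, divisions):
    """Expected fraction of sites differing from the ancestor after a number of divisions.

    A site differs if it flipped an odd number of times: (1 - (1 - 2p)^d) / 2.
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** divisions)

def simulate_mismatch(p, divisions, n_sites, rng):
    """Simulate independent binary sites and return the fraction that differ from the ancestor."""
    mismatches = 0
    for _site in range(n_sites):
        state = 0  # 0 = same as ancestor, 1 = different
        for _division in range(divisions):
            if rng.random() < p:
                state ^= 1
        mismatches += state
    return mismatches / n_sites

rng = random.Random(1)
p = 0.01  # changes per fCpG site per division, as quoted in point (1)
for d in (1, 20, 100):
    print(d, round(expected_mismatch(p, d), 3), round(simulate_mismatch(p, d, 5000, rng), 3))
```

      Under these assumptions the mismatch grows with the number of divisions and saturates at 0.5, which is why barcodes of closely related cells would be expected to be more similar than those of distant relatives. For two cells that each lie d divisions from a common ancestor, the same formula applies with 2d divisions.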

      Reviewer #1 (Recommendations for the authors):

      (1) A lot of the interpretations suffer from an extremely loose/erroneous use of developmental concepts and a lack of transparency. For instance:

      a) The thalamus is not part of the brain stem

      Corrected.

      b) The pons contains cells other than inhibitory neurons in the data; the same is true for the hippocampus which contains multiple cell types

      Corrected to refer to the specific cell types in these regions.

      c) The author talks about the rostral-caudal timing a lot which is not really discussed to this degree in the cited references. Thus, it is also unclear how interneurons fit in this model as they are distinguished by a ventral-dorsal difference from excitatory neurons. Also, it is unclear whether the timing is really as distinct as claimed. For instance, inhibitory neurons and excitatory neurons significantly overlap in their birth timing. Finally, conceptually, it does not make sense to go by developmental timing as the author proposes that it is the number of divisions that is relevant. While they are somewhat correlated there are potentially stark differences.

      The manuscript attempts to describe what might be broadly expected when barcodes are sampled from different cell types and locations. As a proposed mitotic clock, the fCpG barcode methylation level could time when each neuron ceased division and differentiated. The wide ranges of fCpG barcode methylation of each cell type (Fig 2A) would be consistent with significant overlap between cell types. The manuscript is edited to emphasize overlapping rather than distinct sequential differentiation of the cell types.

      d) Neocortical astrocytes and some oligodendrocytes share a lineage, whereas a subset of oligodendrocytes in the cortex shares an origin with interneurons. This could confound results but is never discussed.

      The manuscript does not assess glial lineages in detail because neurons were preferentially included in the sampling whereas glial cells were non-systematically excluded. This sampling information is now included in the section “fCpG barcode identification”.

      e) Neocortical interneurons should be more closely related in terms of lineage-to-excitatory neurons than other inhibitory neurons of, for instance, the pons. This is not clearly discussed and delineated.

      This is not discussed. It may not be possible to analyze these details with the current data. The ancestral tree reconstructions indicate that excitatory neurons that appear earlier in development (and are more methylated) are more often more closely related to inhibitory neurons.

      f) While there is some spread of excitatory neurons tangentially, there is no tangential migration at the scale of interneurons as (somewhat) suggested/implied here.

      The abstract and results have been modified to indicate greater inhibitory than excitatory neuron tangential migration, but that the extent of excitatory neuron tangential migration cannot be determined because of the sparse sampling and that barcodes may be similar by chance.

      g) The nature of the NN cells is quite important as cells not derived from the neocortical anlage are unlikely to share a developmental origin (e.g., microglia, endothelial cells). This should be clarified and clearly stated.

      The manuscript is modified to indicate that NN cells are microglial and endothelial cells. These cells have different developmental origins, and their data are present in Fig 2A, but are not further used for ancestral analysis.  

      (2) The presentation is often somewhat confusing to me and lacks detail. For instance:

      a) The methods are extremely short and I was unable to find a reference for a full pipeline, so other researchers can replicate the work and learn how to use the pipeline.

      The pipeline, including Python code, is outlined in the new Supplement.

      b) Often numbers are given as ~XX when the actual number with some indication of confidence or spread would be more appropriate.

      Data ranges are often indicated with the violin plots.

      c) Many figure legends are exceedingly short and do not provide an appropriate level of detail.

      Figure legends have been modified to include more detail.

      d) Not defining groups in the figure legends or a table is quite unacceptable to me. I do not think that referring to a prior publication (that does not consistently use these groups anyway) is sufficient.

      The cell groups are based on the annotations provided with each single cell in the public databases.

      e) The used data should be better defined and introduced (number of cells, different subtypes across areas, which cells were excluded; I assume the latter as pons and hippocampus are only mentioned for one type of neuronal cells, see also above).

      The data used are present in Supplemental File 2 under the tab “cell summary H01, H02, H04”.

      f) Why were different upper bounds used for filtering for H01 and H02, and H04 is not mentioned? Why are inhibitory and excitatory neurons specifically mentioned (Lines 61-66)?

      The filtering is used to eliminate, as much as possible, cell type specific methylation, or CpG sites with skewed neuron methylation. The filtering eliminates CpG sites with high or low methylation within each of the three brains, and within the two major neuron subtypes. The goal is to enrich for CpG sites with polymorphic but not cell type specific methylation. This process is ad hoc as success criteria are currently uncertain. The extent of filtering is balanced by the need to retain sufficient numbers of fCpGs to allow comparisons between the neurons.

      g) What 'progenitor' does the author refer to? The Zygote? If yes, can the methylation status be tested directly from a zygote? There is no single progenitor for these cells other than the zygote. Does the assumption hold true when taking this into account? See, for instance, PMID 33737485 for some estimation of lineage bottlenecks.

      A brain progenitor cell can be defined as the common ancestor of all adult neurons, and is the first cell where each of its immediate daughter cell lineages yield adult neurons. The zygote is a progenitor cell to all adult cells, and barcode methylation at the start of conception, from the oocyte to the ICM, was analyzed in the new Supplement. The proposed brain progenitor cell with a fully methylated barcode was not yet evident even in the ICM.

      (3) I am generally not convinced that the fCpGs represent anything but a molecular clock of cell divisions and that many of the similarities are a function of lower division numbers where the state might be more homogenous. This mainly derives from the issues cited above, the lack of convincing evidence to the contrary, and the sparsity of the assessed data.

      Agree that the fCpG barcode is a mitotic clock that becomes polymorphic with divisions. As outlined in the new Supplement, ancestry and cell type are confounded because cells of the same type typically have a common progenitor.

      a) There appears little consideration or modeling of what the ability to switch back does to the lineage reconstruction.

      fCpG methylation flipping is further analyzed and discussed in the new Supplement.

      b) None of the data convinced me that the observations cannot be explained by the aforementioned molecular clock and systematic methylation similarities of cell types due to their cell state.

      See above

      (4) Uncategorized minor issues:

      a) The author should explain concepts like 'molecular clock hypothesis' (line 27) or 'radial unit hypothesis' (line 154), as they are somewhat complex and might not be intuitive to readers.

      The molecular clock hypothesis is deleted and the radial unit hypothesis is explained in more detail in the manuscript.

      b) Line 32: '[...] replication errors are much higher compared to base replication [...]'. I think this is central to the method and should be better explained and referenced. Maybe even through a schematic, as this is a central concept for the entire manuscript.

      The fCpG barcode mechanics are better explained in the new Supplement. With simulations, the fCpG flip rate is about 0.01 per division per fCpG.

      c) Line 41: 'neonatal'. Does the author mean to say prenatal? Most of the cells discussed are postmitotic before birth.

      Corrected to prenatal.

      d) Line 96: what does 'flip' mean in this context? Please also see the comment on Figure 2C.

      Edited to “change”.

      e) Lines 134-135: I am not sure whether the author claims to provide evidence for this question, and I would be careful with claims that this work does resolve the question here.

      Have toned down claims as evidence for my analysis is currently inadequate.

      f) Lines 192-193: I disagree as the fCpGs can switch back and the current data does not convince me that this is an improvement upon mosaic mutation analysis. In my mind, the main advantage is the re-analysis of existing data and the parallel functional insights that can be obtained.

      Lineage analysis is more straightforward with DNA sequencing, but with an error rate of ~10⁻⁹ per base per division, one needs to sequence a billion base pairs to distinguish between immediate daughter cells. By contrast, with an inferred error rate of ~10⁻² per fCpG per division, much less sequencing (about a million-fold less) is needed to find differences between daughter cells.
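
      The order-of-magnitude argument can be written out as a short calculation. This is a sketch under simplifying assumptions: one division separates the two cells, sites change independently, and each site is read perfectly. The function names are hypothetical.

```python
def sites_for_one_expected_change(rate_per_site_per_division):
    """Sites that must be compared to expect one change after one division."""
    return 1.0 / rate_per_site_per_division


def prob_at_least_one_change(rate_per_site_per_division, n_sites):
    """Chance that one division leaves at least one detectable difference."""
    return 1.0 - (1.0 - rate_per_site_per_division) ** n_sites


if __name__ == "__main__":
    base_rate = 1e-9   # per base per division, as quoted above
    fcpg_rate = 1e-2   # per fCpG per division, as quoted above
    bases_needed = sites_for_one_expected_change(base_rate)
    fcpgs_needed = sites_for_one_expected_change(fcpg_rate)
    print("bases needed:", bases_needed)
    print("fCpGs needed:", fcpgs_needed)
    print("ratio of site counts:", bases_needed / fcpgs_needed)
    print("P(>=1 change in 100 fCpGs):", prob_at_least_one_change(fcpg_rate, 100))
```

      With these round rates the two site counts differ by about seven orders of magnitude. The fold-reduction in actual sequencing depends on read layout and coverage, which the sketch ignores.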

      g) Lines 208-209: I would be careful with claims of complexity resolution given many of the limitations and inherent systematic similarities, as well as the potential of fCpGs to change back to an ancestral state later in the lineage.

      Have modified the manuscript to indicate the analysis would be more challenging due to back changes.

      h) There seem to be few figures that assess phenomena across the three brains. Even when they exist there is no attempt to provide any statistical analyses to support the conclusions or permutations to assess outlier status relative to expectations.

      The analysis could be more extensive, but with only three brains, any results, like this study itself, would be rightly judged inadequate.

      i) Figure 2B: there appears to be a higher number of '0s' for, for instance, inhibitory neurons compared to excitatory neurons. Is that correct and worth mentioning? The changing axes scales also make it hard to assess.

      Inhibitory neurons do appear to have more unmethylated fCpGs compared to excitatory neurons, but in general, most inhibitory fCpGs are methylated with a skew to fully methylated fCpGs, consistent with the barcode starting predominately methylated and inhibitory neurons generally appearing earlier in development relative to excitatory neurons.

      j) Figure 2C: I have several issues with this. A minor one is the use of 'Glial' which, I believe, does not appear anywhere else before this, so I am unclear what this curve represents. Generally, however, I am not sure what the y-axis represents, as it is not described in the methods or figure legend. I initially thought it was the cumulative frequency, but I do not think that this squares with the data shown in B. I appreciate the overall idea of having 'earlier'/samples with fewer divisions being shifted to the left, but it is very confusing to me when I try to understand the details of the plot.

      This graph is now better described in the legend. “Glial” cells are defined as oligodendrocytes and astrocytes. Other non-neuronal cells (such as microglial cells) have now been removed from the graph.

      This graph attempts to illustrate how it may be possible to reconstruct brain development from adult neurons, assuming barcodes are mitotic clocks that become polymorphic with cell division. The X axis is “time”, and the Y axis indicates when different cell types reach their adult levels. The cartoon indicates what is visually present along the X axis during development--- brainstem, then ganglionic eminences with a thin cortex, and finally the mature brain with a robust cortex. Time for the X axis is barcode methylation and starts at 100% and ends at 50% or greater methylation. The fCpG barcode methylation of each cell places it on this timeline and indicates when it ceased dividing and differentiated.

      The Y axis indicates the progressive accumulation of the final adult contents of each cell type during this timeline. Early in development, the brain is rudimentary and adult cells are absent. At 90% methylation, only the inhibitory neurons in the pons are present. At 80% methylation, some excitatory neurons are beginning to appear. Inhibitory neurons in the pons have reached their final adult levels and many other inhibitory neuron types are reaching adult levels. By 70% methylation, most inhibitory neurons have reached their adult levels, and more adult excitatory neurons (mainly low cortical neurons, L4-6) and glial cells are beginning to appear. By 60% methylation, inhibitory neurogenesis has largely finished. Adult excitatory neurons and glial cells are more abundant and reach their adult levels by 50% or greater cell barcode methylation levels.

      The graph illustrates a rough alignment between mitotic ages inferred by barcode methylation levels and the physical appearances of different neuronal types during development. Many neurons die during development, and this graph, if valid, indicates when neurons that survive to adulthood appear during development.
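
      The construction of such a curve can be sketched as follows. The per-cell methylation values are invented for illustration and are not taken from the data. The function name is hypothetical, and the sketch assumes the Y axis is the fraction of a cell type's adult cells whose barcode methylation is at or above the value on the X axis.

```python
def cumulative_fraction(cell_methylation, thresholds):
    """Fraction of cells whose barcode methylation is >= each threshold.

    A cell with higher barcode methylation is read as having stopped
    dividing earlier, so the curve rises as the threshold falls.
    """
    n_cells = len(cell_methylation)
    return [sum(m >= t for m in cell_methylation) / n_cells for t in thresholds]


if __name__ == "__main__":
    # "Time" axis: barcode methylation running from 100% down to 50%.
    thresholds = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
    # Hypothetical toy values, chosen only to mimic the described ordering.
    toy_cells = {
        "pons inhibitory": [0.95, 0.92, 0.88, 0.85],
        "cortical excitatory": [0.78, 0.66, 0.58, 0.52],
    }
    for cell_type, values in toy_cells.items():
        print(cell_type, cumulative_fraction(values, thresholds))
```

      In the toy data, the pons inhibitory curve reaches 1.0 by 80% methylation, while the excitatory curve only reaches 1.0 at 50%, which is the left-to-right ordering the graph is meant to convey.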

      k) Figure 4Bff: it is confusing to me that the text jumps to these panels after introducing Figure 5. This makes it very hard to read this section of the text.

      The Figures appear in the order they are first referred to in the text.

      l) Figure 5A: could any of this difference be explained by the shared lineage of excitatory neurons and dorsal neocortical glia?

      Not sure

      m) Figure 5B: after stating that interneurons have a higher lineage fidelity, the figure legend here states the opposite and I am somewhat confused by this statement.

      The legend and text have been clarified. Fig 5A restricts fidelity to within inhibitory cell types. Fig 5B compares between neuron subtypes, and illustrates more apparent inhibitory subtype switching, albeit there are more interneuron subtypes than excitatory subtypes.

      n) Figure 5E: generally, the use of tSNE for large pairwise distance analysis is often frowned upon (e.g., PMID 37590228), and I would reconsider this argument.

      This analysis was an attempt to illustrate that cells of the same phenotype based on their tSNE metrics can be either closely or more distantly related. Although the tSNE comparisons were restricted to subtypes (and not to the entire tSNE graph), tSNE is not designed for such comparisons. This graph and discussion are deleted.

      Reviewer #2 (Public review):

      The manuscript by Shibata proposed a potentially interesting idea that variation in methylcytosine across cells can inform cellular lineage in a way similar to single nucleotide variants (SNVs). The work builds on the hypothesis that the "replication" of methylcytosine, presumably by DNMT1, is inaccurate and produces stochastic methylation variants that are inherited in a cellular lineage. Although this notion can be correct to some extent, it does not account for other mechanisms that modulate methylcytosines, such as active gain of methylation mediated by DNMT3A/B activity and active demethylation mediated by TET activity. In some cases, it is known that the modulation of methylation is targeted by sequence-specific transcription factors. In other words, inaccurate DNMT1 activity is only one of the many potential ways that can lead to methylation variants, which fundamentally weakens the hypothesis that methylation variants can serve as a reliable lineage marker. With that being said (being skeptical of the fundamental hypothesis), I want to be as open-minded as possible and try to propose some specific analyses that might better convince me that the author is correct. However, I suspect that the concept of methylation-based lineage tracing cannot be validated without some kind of lineage tracing experiment, which has been successfully demonstrated for scRNA-seq profiling but not yet for methylation profiling (one example is Delgado et al., Nature, 2022).

      I thank Reviewer 2 for the careful evaluation. The validation experiment example (Delgado et al.) introduced sequence barcodes in mice, which is not generally feasible for human studies.

      (1) The manuscript reported that fCpG sites are predominantly intergenic. The author should also score the overlap between fCpG sites and putative regulatory elements and report p-values. If fCpG sites commonly overlap with regulatory elements, that would increase the possibility that these sites are actively regulated by enhancer mechanisms other than maintenance methyltransferase activity.

      As mentioned for Reviewer 1, fCpGs are filtered to eliminate cell type specific methylation.

      (2) The overlap between fCpG and regulatory sequence is a major alternative explanation for many of the observations regarding the effectiveness of using fCpG sites to classify cell types correctly. One would expect the methylation level of thousands of enhancers to be quite effective in distinguishing cell types based on the published single-cell brain methylome works.

      As mentioned above, the manuscript did not clearly indicate that the fCpG barcode is not a cell type classifier. The distinctions between fCpG barcodes and cell type classifiers are better explained in the new Supplement.

      (3) The methylation level of fCpG sites is higher in hindbrain structures and lower in forebrain regions. This observation was interpreted as the hindbrain being the "root" of the methylation barcodes and, through "progressive demethylation" produced the methylation states in the forebrain. This interpretation does not match what is known about methylation dynamics in mammalian brains, in particular, there is no data supporting the process of "progressive demethylation". In fact, it is known that with the activation of DNMT3A during early postnatal development in mice or humans (Lister et al., 2013. Science), there is a global gain of methylation in both CH and CG contexts. This is part of the broader issue I see in this manuscript, which is that the model might be correct if "inaccurate mC replication" is the only force that drives methylation dynamics. But in reality, active enzymatic processes such as the activation of DNMT3A have a global impact on the methylome, and it is unclear if any signature for "inaccurate mC replication" survives the de novo methylation wave caused by DNMT3A activity.

      Reviewer 2 highlights a critical potential flaw in that any ancestral signal recorded by random replication errors could be overwritten by other active methylation processes. I cannot present data that indicates fCpG replication errors are never overwritten, but new data indicate barcode reproducibility and stability with aging.

      New data are also present where barcodes are compared between daughter cells (zygote to ICM) in the setting of active and passive demethylation, when germline methylation is erased. This new analysis shows that daughter cells in 2 to 8 cell embryos have more related barcodes than morula or ICM cells. The subsequent active remethylation by a wave of DNMT3A activity may underlie the observation that the barcode appears to start predominately methylated in brain progenitors.

      (3) Perhaps one way the author could address comment 3 is to analyze methylome data across several developmental stages in the same brain region, to first establish that the signal of "inaccurate mC replication" is robust and does not get erased during early postnatal development when DNMT3A deposits a large amount of de novo methylation.

      See above

      (4) The hypothesis that methylation barcodes are homogeneous among progenitor cells and more polymorphic in derived cells is an interesting one. However, in this study, the observation was likely an artifact caused by the more granular cell types in the brain stem, intermediate granularity in inhibitory cells, and highly continuous cell types in cortical excitatory cells. So, in other words, single-cell studies typically classify hindbrain cell types that are more homogenous, and cortical excitatory cells that are much more heterogeneous. The difference in cell type granularity across brain structures is documented in several whole-brain atlas papers such as Yao et al. 2023 Nature part of the BICCN paper package.

      As noted above, fCpG barcode polymorphisms and cell type differentiation are confounded because cells of the same phenotype tend to have common progenitors. The fCpG barcode is not a cell type classifier but more a cell division clock that becomes polymorphic with time. Although fCpG barcodes could be more polymorphic in cortical excitatory cells because there are many more types, fCpG barcodes would inherently become more polymorphic in excitatory cells because they appear later in development.

      (5) As discussed in comment 2, the author needs to assess whether the successful classification of cell types (brain lineage) using fCpG was, in fact, driven by fCpG sites overlapping with cell-type specific regulatory elements.

      Although unclear in the manuscript, the fCpG is not a cell classifier and the barcode is polymorphic between cells of the same type. fCpG barcodes can appear to be cell classifiers because cell types appear at different times during development, and therefore different cell types have characteristic average barcode methylation levels.

      (6) In Figure 5E, the author tried to address the question of whether methylation barcodes inform lineage or post-mitotic methylation remodeling. The Y-axis corresponds to distances in tSNE. However, tSNE involves non-linear scaling, and the distances cannot be interpreted as biological distances. PCA distances or other types of distances computed from high-dimensional data would be more appropriate.

      The Figure and discussion are deleted (similar comment by Reviewer 1)

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Human Brain Barcodes", the author sought to use single-cell CpG methylation information to trace cell lineages in the human brain.

      Strengths:

      Tracing cell lineages in the human brain is important but technically challenging. Lineage tracing with single-cell CpG methylation would be interesting if convincing evidence exists.

      Weaknesses:

      As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. The unreliable CpG methylation status also raises the question of what the "Barcodes" refer to in the title and across this study. Barcodes should be stable in principle and not dynamic across cell generations, as defined in Reference#1. It is not convincing that the "dynamic" CpG methylation fits the "barcodes" terminology. This problem is even more concerning in the last section of results, where CpG would fluctuate in post-mitotic cells.

      I thank Reviewer 3 for his thoughtful and careful evaluation. I think the “barcode” terminology is appropriate. Dynamic engineered barcodes such as CRISPR/Cas9 mutable barcodes are used in biology to record changes over time. The fCpG barcode appears to start with a single state in a progenitor cell and changes with cell division to become polymorphic in adult cells. Therefore, I think the description of a dynamic fCpG barcode is appropriate.

      Reviewer #3 (Recommendations for the authors):

      (1) As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. To establish DNA methylation as a means for lineage tracing, one control experiment would be testing whether the DNA methylation patterns can faithfully track cell lineages for in vitro differentiated & visibly tracked cell lineages. Has this kind of experiment been done in the field?

      These types of experiments have not been performed to my knowledge and an appropriate tissue culture model is uncertain. New single cell WGBS data from the zygote to ICM indicate that more immediate daughter cells have more related barcodes even in the setting of active DNA demethylation.

      (2) The study includes assumptions that should be backed with solid rationale, supporting evidence, or reference. Here are a couple of examples:

      a) the author discarded stable CpG sites with <0.2 or >0.8 average methylation without a clear rationale in H02, and then used <0.3 and >0.7 for a specific sample H01.

      The filtering was ad hoc and was used to remove, as much as possible, CpG sites with cell type specific or patient specific methylation. CpG sites with skewed methylation are more likely cell type specific, whereas X chromosome CpG sites with methylation closer to 0.5 in male cells are more likely to be unstable. The ad hoc filtering attempted to remove cell specific CpGs sites while still retaining enough CpG sites to allow comparisons between cells.
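
      The filtering logic described above can be sketched in a few lines. This is an illustration, not the pipeline in the Supplement. The data layout (a dictionary of per-group 0/1 calls with `None` for uncovered cells), the site labels, and the function names are all hypothetical. The 0.2 and 0.8 bounds are the ones quoted by the Reviewer for H02.

```python
def mean_methylation(calls):
    """Average of the 0/1 calls, ignoring cells without coverage (None)."""
    observed = [c for c in calls if c is not None]
    return sum(observed) / len(observed) if observed else None


def select_fcpg(site_calls, lower, upper):
    """Keep sites whose average methylation is within bounds in every group."""
    kept = []
    for site, groups in site_calls.items():
        means = [mean_methylation(calls) for calls in groups.values()]
        if all(m is not None and lower <= m <= upper for m in means):
            kept.append(site)
    return kept


if __name__ == "__main__":
    # Hypothetical toy calls for three X-chromosome sites in two neuron groups.
    toy = {
        "siteA": {"excitatory": [1, 0, 1, 0, None], "inhibitory": [1, 1, 0, 0]},
        "siteB": {"excitatory": [1, 1, 1, 1], "inhibitory": [1, 1, 1, 0]},
        "siteC": {"excitatory": [0, 1, 0, None], "inhibitory": [None, None]},
    }
    print(select_fcpg(toy, lower=0.2, upper=0.8))
```

      Here `siteB` is dropped because its excitatory average is skewed to 1.0, and `siteC` is dropped because it has no inhibitory coverage. Tightening the bounds to 0.3 and 0.7 removes more skewed sites at the cost of a shorter barcode.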

      b) The author assumed that the early-formed brain stem would resemble progenitors better and have a higher average methylation level than the forebrain. However, this difference in DNA methylation status could reflect developmental timing or cell type-specific gene expression changes.

      This observation that brain stem neurons that appear early in development have highly methylated fCpG barcodes in all 3 brains supports the idea that the fCpG barcode starts predominately methylated. Alternative explanations are possible.

      (3) The conclusion that excitatory neurons undergo tangential migration is unclear - how far away did the author mean for the tangential direction? Lateral dispersion is known, but it would be striking that the excitatory neurons travel across different brain regions. The question is, how would the author interpret shared or divergent methylation for the same cell type across different brain regions?

      As noted with Reviewer 1, this analysis is modified to indicate that evidence of tangential migration is greater for inhibitory than excitatory neurons, but the extent of excitatory neuron migration is uncertain because of sparse sampling, and because fCpG barcodes can be similar by chance.

      (4) The sparsity and resolution of the single-cell DNA methylation data. The methylation status is detected in only a small fraction (~500/31,000 = 1.6%) of fCpGs per cell, with only 48 common sites identified between cell pairs. Given that the human genome contains over 28 million CpG sites, it is important to evaluate whether these fCpGs are truly representative. How many of these sites were considered "barcodes"?

      fCpG barcodes are distinct from traditional cell type classifiers, and how fCpGs are identified are better outlined in the new Supplement.

      (5) While focusing on the X-chromosome may simplify the identification of polymorphic fCpGs, the confidence in determining its methylation status (0 or 1) is questionable when a CpG site is covered by only one read. Did the author consider the read number of detected fCpGs in each cell when calculating methylation levels? Certain CpG sites on autosomes may also have sufficient coverage and high variability across cells, meeting the selection criteria applied to X-chromosome CpGs.

      In most cases, an fCpG site was covered by only a single read.

      (6) The overall writing in the Title, the Main text, Figure legends, and Methods sections is overly simplified, making it difficult to follow. For instance, how did the author perform PWD analysis? How did they handle missing values when constructing lineage trees?

      There is not much introduction to lineage tracing in the human brain or the use of DNA methylation to trace cell lineage.

      These shortcomings are improved in the manuscript and with the new Supplement. The analysis pipeline, including the Python programs, is outlined and included as new Supplemental materials. IQ-TREE can handle the binary fCpG barcode data and skips missing values with its standard settings.
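
      One common way to compute a pairwise distance (PWD) between sparse binary barcodes is sketched below. The exact definition used in the manuscript is not restated in this response, so the sketch rests on assumptions: the distance is the fraction of discordant calls among sites covered in both cells, and a pair with too few shared sites yields no distance. The function name and the `min_shared` parameter are hypothetical.

```python
def pairwise_distance(barcode_a, barcode_b, min_shared=1):
    """Fraction of discordant calls among sites covered in both cells.

    Barcodes are equal-length lists of 0, 1, or None (site not covered).
    Returns None when fewer than `min_shared` sites are covered in both.
    """
    shared = [
        (a, b)
        for a, b in zip(barcode_a, barcode_b)
        if a is not None and b is not None
    ]
    if len(shared) < min_shared:
        return None
    return sum(a != b for a, b in shared) / len(shared)


if __name__ == "__main__":
    cell_1 = [1, 0, None, 1, 1]
    cell_2 = [1, 1, 0, None, 0]
    # Three sites are covered in both cells; two of them disagree.
    print(pairwise_distance(cell_1, cell_2))
    # Requiring more shared sites than exist gives no distance.
    print(pairwise_distance(cell_1, cell_2, min_shared=4))
```

      Because most sites are missing in any one cell, the number of shared sites per pair sets how noisy each distance is, which is why a minimum-overlap threshold is exposed as a parameter.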

      Line 80: it is unclear: "Brain patterns were similar"

      Clarified

      Line 98: The meaning is unclear here: "Outer excitatory and glial progenitor cells are present". What are these glial progenitor cells, and when/how do they stop dividing?

      The glial cells are the oligodendrocytes and astrocytes. The main take away point is that these glial cells have low barcode methylation, consistent with their appearances later in development.

      Line 104: It is unclear if this is a conclusion or assumption -- "A progenitor cell barcode should become increasingly polymorphic with subsequent divisions." The "polymorphic" happens within the progenitors, their progenies, or their progenies at different time points.

      The statement is now clarified as an assumption in the manuscript.

      Similarly line 134 "Barcodes would record neuronal differentiation and migration." Is this a conclusion from this study or a citation? How is the migration part supported?

      The reasoning is better explained in the manuscript.  Migration can be documented if immediate daughter cells with similar barcodes are found in different parts of the adult brain, albeit analysis is confounded by sparse sampling and because barcodes may be similar by chance.

      Line 148 and 150: "Nearest neighbor ... neuron pairs" in DNA methylation status would conceivably reflect their cell type-specific gene expression, how did the author distinguish this from cell lineage?

      As noted above, because cells with similar phenotypes usually arise from common progenitors, cells within a clade are also usually related. However, the barcodes are still polymorphic within a clade and potentially add complementary information on mitotic ages, ancestry within a clade, and possible cell migration.

      Figure 3C: "Cells that emerge early in development" Where are they on the figure?

      Hindbrain neurons differentiate early in development and their barcodes are more methylated. The figure has been modified to label some of the values with their neuron types. Also, the older figure mistakenly included data from all 3 brains and now the data are only from brain H01.

      Figures 4D and 4E, distinguishing cell subtypes is challenging, as the same color palette is used for both excitatory and inhibitory neurons.

      This is an unfortunate limitation due to the complexity of the data and the limited color palette.

      Figures 4 and 5, what are these abbreviations?

      The abbreviations are presented in Figure 1 and maintained in subsequent figures.

    1. eLife Assessment

      This study presents a valuable finding on the mechanism of self-prioritization by revealing the influence of self-associations on early attentional selection. The evidence supporting the claims of the authors is solid, although inclusion of a discussion about the generalization and limitation would have strengthened the study. The work will be of interest to researchers in psychology, cognitive science, and neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Comments on revisions:

      The authors clearly showed the relationship between attention and self-prioritization.

    3. Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other-associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Theory of Visual Attention (TVA), estimating TVA parameters using a hierarchical Bayesian model from the field of attention, and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were: (1) self-association enhances both attentional weights and processing capacity, (2) self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and (3) social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attentional selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity, and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Weaknesses:

      The results are still not convincing enough to definitively support their conclusions. The generalization of the findings needs further examination. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. More empirical data are needed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors held too strong a prior hypothesis that self-prioritization is associated with attention. They used prior entry to consciousness (awareness) as an index of attention, which is not appropriate. Processes other than attention (e.g. high arousal, high sensitivity) may also give a stimulus prior entry to consciousness. A self-related or self-associated stimulus may engage such processes, rather than attention, and thereby be detected more easily. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al., 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clear over threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature suggesting that prior entry indexes attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a possibility that social and perceptual (physical properties of the stimulus) salience engaged the same attentional process through different routes?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Theory of Visual Attention (TVA), estimating TVA parameters using a hierarchical Bayesian model from the field of attention, and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the reported effect sizes allow future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and tasks (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in longer reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.
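
      The relationship between processing capacity, attentional weights, and processing rates described under point b can be illustrated with a minimal sketch. It assumes the common two-stimulus formulation in which the relative attentional weights sum to one, so each rate is the capacity C multiplied by that stimulus's weight. The function names and all numerical values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def processing_rates(capacity_hz, weight_a):
    """Return TVA-style processing rates (Hz) for two competing stimuli.

    capacity_hz: overall processing capacity C (condition dependent).
    weight_a: relative attentional weight of stimulus A, in [0, 1];
              stimulus B receives the remainder, 1 - weight_a.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie between 0 and 1")
    rate_a = capacity_hz * weight_a
    rate_b = capacity_hz * (1.0 - weight_a)
    return rate_a, rate_b


def relative_advantage(rate_a, rate_b):
    """Share of the total processing rate that goes to stimulus A."""
    return rate_a / (rate_a + rate_b)


# Hypothetical numbers, chosen only for illustration (not study estimates).
baseline = processing_rates(capacity_hz=60.0, weight_a=0.55)
raised_c = processing_rates(capacity_hz=62.0, weight_a=0.55)

# Raising C scales both rates, so the relative advantage stays at weight_a.
print(relative_advantage(*baseline))
print(relative_advantage(*raised_c))
```

      Under these assumptions, a change in C alone scales both rates by the same factor and leaves the relative advantage untouched, which is why conclusions about relative salience rest on the attentional weights rather than on the overall capacity.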

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically. Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a; or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but not an aim of the present study. Instead, the present study directly compares the mechanisms and effects of social and perceptual salience, specifically in Experiment 2. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows us to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). Indeed, the possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that the self needs to be task-relevant to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). As the absence of a contribution from decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The research questions should be revised to better align with the conclusions. For example, Q2 is phrased as "Does self-relatedness bias attentional selection at the level of the perceptual feature representation (shape) or at the level of the associated identity (social association)," which is unclear in its reference to "levels." A more appropriate phrasing would be whether the self-association bias occurs automatically or whether it depends on explicit social decoding.

      Thank you for this suggestion – we have revised the phrasing accordingly: “Does self-relatedness bias attentional selection automatically or does it require explicit social decoding?”

      (2) After presenting the data, it would be helpful to include one or two sentences summarizing the conclusions drawn from the data and how they relate to the research questions. Currently, readers are left to guess whether the results are consistent with the hypotheses.

      Thank you for this suggestion, which we think will enhance the clarity of the manuscript – we have added summary sentences when presenting the results:

      “This cross-experimental parameter inspection revealed that participants exhibited an attentional selection bias towards socially associated information. Interestingly, enhanced processing speed was observed for other-associated rather than self-associated information, a pattern that diverged from our prediction.”

      (1) “Results from experiment 2 demonstrated a faster, more automatic attentional selection for self-associated information when the decision did not require explicit social decoding. When the social identity had to be judged, processing speed for self-associated information decreased. Contrary to the hypothesis that social decoding is necessary for self-prioritization to emerge, these findings suggest that attentional selection can operate automatically to prioritize self-associated information.”

      (2) “Taken together, as also confirmed in the cross-experimental analysis, attentional selection favoured the other-related information when social identity had to be judged. In contrast, perceptual salience, as predicted, led to increased processing speed for the more salient stimulus.”

      (3) The identity of the "other" used in the experiments is unclear, making it uncertain whether the results are self-specific. It would be beneficial to compare the self condition with a control condition, such as a close friend vs. an unfamiliar other. Alternatively, the results may reflect attentional bias for familiar vs. unfamiliar individuals rather than self-specific bias.

      Thank you for this comment. Firstly, we would like to clarify that we have provided participants with a description of who the “other” is (see methods: “At the beginning of this task, participants were told that one of the two geometric shapes that was used in the TOJ task has been assigned to them, and the other shape has been assigned to another participant in the experiment – someone they did not know, but who was of similar age and gender”). We aimed to make the ‘other’ as concrete as possible, while maintaining a ‘stranger’ identity.

      Secondly, this specification is in line with the vast majority of the literature, which typically measures the effects of self-prioritization relative to the association with an unfamiliar other (stranger), or an unfamiliar and familiar other (e.g. friend, family member). These studies find that processing advantages that affect friend-related stimuli (friend-stimuli being processed faster than stranger-associated stimuli) are likely mediated by self-extension, that is, an association of the friend with the self. As such, SPEs measured relative to familiar others are typically smaller in size (see, e.g., Sui et al., 2012). They are also less stable and more variable than the self-prioritization effects measured relative to a stranger (see Scheller & Sui, 2022 JEP:HPP). Importantly, this is driven by the variability of the friend-associated stimulus, rather than the self- or other-associated stimulus (see Figure 4 in main text and S5 in supplementary material in Scheller & Sui, 2022: https://durham-repository.worktribe.com/output/1210478/the-power-of-the-self-anchoring-information-processing-across-contexts). Effectively, this would suggest that choosing a familiar other as a reference would not only (a) lead to a smaller effect size, but also (b) yield a less stable effect, which likely depends on the association the individual has to the other familiar person. In contrast, by associating the other shape with another participant in this experiment, we not only provide participants with a concrete representation of a stranger, but also maximise our ability to detect true effects, as these are likely to be larger and more stable.

      (4) The key aspects of the procedure (e.g., the order of different conditions) and its rationale need to be clearly explained before or during the presentation of the results. Currently, readers are left to infer certain details.

      Thank you for pointing this out. The methods that provide these details are outlined at the end of the document; however, we agree it would be useful to bring some of these details forward. We have therefore revised the methods figure (Figure 3) to include an outline of the task type, order, and trial numbers. Task boxes are colour coded by the conditions that are listed in the results figures of the manuscript. We also added these details to the caption of Figure 3.

      “Task structures of Experiments 1 and 2. Both experiments started with a TOJ baseline task. In Experiment 1, only non-salient targets were presented, while in Experiment 2, perceptually salient and non-salient trials were included. These were presented in randomly intermixed order. Next, targets were associated with social identities. Associations were practiced using the matching task. Following association learning, which attaches social salience to the shapes, participants completed the same TOJ task as before. In Experiment 1, they completed one block using a social decision dimension, and one block using a perceptual decision dimension. The order of these blocks was counterbalanced across participants to reduce the influence of order effects in the results. In Experiment 2, perceptually salient and non-salient stimuli were presented in an intermixed fashion, and participants responded within the social decision dimension. Each task block was preceded by 8 (matching) to 14 (TOJ) practice trials.”

      (5) Certain imprecise terms used to describe the results, such as "slightly," "roughly," and "loosely," create confusion for the readers. The authors should take a clearer stance on the results and provide an explanation for why the data only "slightly," "roughly," or "loosely" support the findings.

      Thank you for highlighting this. We have provided more concrete wording and details throughout (e.g., “‘target shapes’ were 30% bigger than the ‘background shapes’”).

      Lastly, we have updated the formatting of the manuscript to provide higher fidelity figures, which were previously compromised by file conversion.

    1. eLife Assessment

      This study describes a valuable new model for in vivo manipulation of microglia, exploring how mutations in the Adar1 gene within microglia contribute to Aicardi-Goutières Syndrome. The methodology is validated with solid data, supporting the authors' conclusions. The paper underscores both the advantages and limitations of using transplanted cells as a surrogate for microglia, making it a resource that is of value for biologists studying macrophages and microglia.

    2. Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Weaknesses:

      The robust data showing the quality of this model at the transcriptomic level can be strengthened with confirmation at protein and functional levels. The authors were unable to investigate the effects of Adar1-KO using ER-Hoxb8 cells and instead had to rely on a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H). Additionally, ER-Hoxb8-derived microglia do not express Sall1, a key marker of microglia, which limits their fidelity as a full microglial replacement, as has been rightfully pointed out in the discussion.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. While Adar1-KO macrophages do not engraft well, the success of the TLR4-KO line highlights the model's potential for investigating other genes. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

    3. Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

    1. eLife Assessment

      This provocative manuscript presents important comparisons of the morphologies of Archaean bacterial microfossils to those of microbes transformed under environmental conditions that mimic those present on Earth during the same Eon. The evidence in support of the conclusions is solid. The authors' environmental condition selection for their experiment is justified.

    2. Joint Public Review:

      Summary:

      Microfossils from the Paleoarchean Eon represent the oldest evidence of life, but their nature has been strongly debated among scientists. To resolve this, the authors reconstructed the lifecycles of Archaean organisms by transforming a Gram-positive bacterium into a primitive lipid vesicle-like state and simulating early Earth conditions. They successfully replicated all morphologies and life cycles of Archaean microfossils and studied cell degradation processes over several years, finding that encrustation with minerals like salt preserved these cells as fossilized organic carbon. Their findings suggest that microfossils from 3.8 to 2.5 billion years ago were likely liposome-like protocells with energy conservation pathways but without regulated morphology.

      Strengths:

      The authors have crafted a compelling narrative about the morphological similarities between microfossils from various sites and proliferating wall-deficient bacterial cells, providing detailed comparisons that have never been demonstrated in this detail before. The extensive number of supporting figures is impressive, highlighting numerous similarities. While conclusively proving that these microfossils are proliferating protocells morphologically akin to those studied here is challenging, we applaud this effort as the first detailed comparison between microfossils and morphologically primitive cells.

      Summary of reviewer comments on this revision:

      Each of the original reviewers evaluated the revised manuscript and were complimentary about how the authors addressed their original concerns. One reviewer added: "It is a thought-provoking manuscript that will be well received." We encourage readers of this version of the paper to consider the original reviewer comments and the authors' responses: https://elifesciences.org/reviewed-preprints/98637/reviews

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This provocative manuscript presents valuable comparisons of the morphologies of Archaean bacterial microfossils to those of microbes transformed under environmental conditions that mimic those present on Earth during the same Eon, although the evidence in support of the conclusions is currently incomplete. The reasons include that taphonomy is not presently considered, and a greater diversity of experimental environmental conditions is not evaluated -- which is important because we ultimately do not know much about Earth's early environments. The authors may want to reframe their conclusions to reflect this work as a first step towards an interpretation of some microfossils as 'proto-cells,' and less so as providing strong support for this hypothesis.

      Regarding the taphonomic alterations: The editor and reviewers are correct in pointing out this issue. Taphonomic alteration of the microfossils attains special significance in the case of microorganisms, as they lack rigid structures and are prone to morphological alterations during or after their fossilization. We are acutely aware of this issue and have conducted long-term experiments (lasting two years) to observe how cells die, decay, and get preserved. A large section of the manuscript (pages 11 to 20) and a substantial portion of the supplementary information is dedicated to understanding the taphonomic alterations. To the best of our knowledge, these are among the longest experiments done to understand the taphonomic alterations of the cells within laboratory conditions. 

      Recent reports by Orange et al. (1,2) showed that under favorable environmental conditions, cells could be fossilized rather rapidly with little morphological modification. We observed a similar phenomenon in this work. Cells in our study underwent rapid encrustation with cations from the growth media. We analyzed the morphological changes over a period of 18 months. After 18 months, the softer biofilms became entirely encrusted in salt and turned solid (Fig. ). Despite this transformation, morphologically intact cells could still be observed within these structures. This suggests that the cells inhabiting Archaean coastal marine environments could undergo rather rapid encrustation, and their morphological features could be preserved in the geological record with little taphonomic alteration.

      Regarding the environmental conditions: We are in total agreement with the reviewers that much is unknown about Archaean geology and its environmental conditions. Like the present-day Earth, Archaean Earth certainly had regions that differed greatly in their environmental conditions: volcanic freshwater ponds, brines, mildly halophilic coastal marine environments, and geothermal and hydrothermal vents, to name a few. Our experimental design focuses on the one environment of which we have a relatively good understanding rather than the rest of the planet, of which we know little. Below, we list our reasons for restricting our study to coastal marine environments and studying cells under mildly halophilic experimental conditions.

      (1) Very little continental crust from the Hadean and early Archaean Eons exists on the present-day Earth. Much of our geochemical understanding of this time period is a result of studying the Pilbara Iron Formations and the Barberton Greenstone Belt. Geological investigations suggest that these sites were coastal marine environments. The salinity of coastal marine environments is higher than that of open oceans due to the greater water evaporation within these environments. Moreover, brines were discovered within pillow basalts within the Barberton Greenstone Belt, suggesting that the salinity within these sites was higher than or similar to that of marine environments.

      (2) We are not certain about the environmental conditions that could have supported the origin of life. However, all currently known Archaean microfossils were reported from coastal marine environments (3.8-2.4 Ga). This suggests that proto-life likely flourished in mildly halophilic environments, similar to the experimental conditions employed in our study.

      (3) The chemical analysis of Archaean microfossils also suggests that they lived in salt-rich environments, as most, if not all, microfossils are closely associated with, and often encrusted in, a thin layer of salt.

      However, we concur with the reviewers that our interpretations should be reassessed if Archaean microfossils that greatly differ from the currently known microfossils are to be discovered or if new microfossils are to be reported from environments other than coastal marine sites.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Microfossils from the Paleoarchean Eon represent the oldest evidence of life, but their nature has been strongly debated among scientists. To resolve this, the authors reconstructed the lifecycles of Archaean organisms by transforming a Gram-positive bacterium into a primitive lipid vesicle-like state and simulating early Earth conditions. They successfully replicated all morphologies and life cycles of Archaean microfossils and studied cell degradation processes over several years, finding that encrustation with minerals like salt preserved these cells as fossilized organic carbon. Their findings suggest that microfossils from 3.8 to 2.5 billion years ago were likely liposome-like protocells with energy conservation pathways but without regulated morphology. 

      Strengths: 

      The authors have crafted a compelling narrative about the morphological similarities between microfossils from various sites and proliferating wall-deficient bacterial cells, providing detailed comparisons that have never been demonstrated in this detail before. The extensive number of supporting figures is impressive, highlighting numerous similarities. While conclusively proving that these microfossils are proliferating protocells morphologically akin to those studied here is challenging, we applaud this effort as the first detailed comparison between microfossils and morphologically primitive cells. 

      Weaknesses: 

      Although the species used in this study closely resembles the fossils morphologically, it would be beneficial to provide a clearer explanation for its selection. The literature indicates that many bacteria, if not all, can be rendered cell wall-deficient, making the rationale for choosing this specific species somewhat unclear. While this manuscript includes clear morphological comparisons, we believe the authors do not adequately address the limitations of using modern bacterial species in their study. All contemporary bacteria have undergone extensive evolutionary changes, developing complex and intertwined genetic pathways unlike those of early life forms. Consequently, comparing existing bacteria with fossilized life forms is largely hypothetical, a point that should be more thoroughly emphasized in the discussion. 

      Another weak aspect of the study is the absence of any quantitative data. While we understand that obtaining such data for microfossils may be challenging, it would be helpful to present the frequencies of different proliferative events observed in the bacterium used. Additionally, reflecting on the chemical factors in early life that might cause these distinct proliferation modes would provide valuable context. 

      Regarding our choice of using modern organisms or this particular bacterial species: 

      Based on current scientific knowledge, it is logical to infer that cellular life originated as protocells; nevertheless, there has been no direct geological evidence for the existence of such cells on early Earth. Hence, protocells remain an entirely theoretical concept. Moreover, protocells are considered to have been far more primitive than present-day cells. Surprisingly, this lack of sophistication was the biggest challenge in understanding protocells. Designing experiments in which cells are primitive (but not as primitive as non-living lipid vesicles) and still retain a functional resemblance to a living cell does pose some practical challenges. Laboratory experiments with substitute (proxy) protocells almost always come with some limitations. Although not a perfect proxy, we believe protocells and protoplasts share certain characteristics. Having said that, we would like to reemphasize that protoplasts are not protocells. Our reasons for using protoplasts as model organisms and working with this bacterial species (Exiguobacterium Strain-Molly) are based on several scientific and practical criteria listed below.

      (1) Irrespective of cell physiology and intracellular complexity, we believe that protoplasts and protocells share certain similarities in the biophysical properties of their cytoplasm. We explained our reasoning in the manuscript introduction and in our previous manuscripts (Kanaparthi et al., 2024 & Kanaparthi et al., 2023). In short, to be classified as a cell, even a protocell should possess minimal biosynthetic pathways, a physiological mechanism of harvesting free energy from the surroundings (energy-yielding pathways), and a means of replicating its genetic material and transferring it to the daughter cells. These minimal physiological processes could incorporate considerable cytoplasmic complexity. Hence, the biophysical properties of the protocell cytoplasm could have resembled those of the cytoplasm of protoplasts, irrespective of the genomic complexity.

      (2) Irrespective of their physiology, protoplasts exhibit several key similarities to protocells, such as their inherent inability to regulate their morphology or reproduction. This similarity was pointed out in previous studies (3). Despite possessing all the necessary genetic information, protoplasts undergo reproduction through simple physicochemical processes independent of canonical molecular biological processes. This method of reproduction is considered to be erratic and rather primitive, akin to the theoretical propositions on protocells. Although protoplasts are fully evolved cells with considerable physiological complexity, the above-mentioned biophysical similarities suggest that the protoplast life cycle could morphologically resemble that of protocells (in no other aspect except for their morphology and reproduction).

      (3) Physiologically or genomically different species of Gram-positive protoplasts are shown to exhibit similar morphologies. This suggests that when Gram-positive bacteria lose their cell wall and turn into a protoplast,  they reproduce in a similar manner independent of physiological or genome-based differences. As morphology and only morphology is key to our study, at least from the scope of this study, intracellular complexity is not a key consideration. 

      (4) This specific strain was isolated from submerged freshwater springs in the Dead Sea. This isolate and members of this bacterial genus are known to have been well acclimatized to growing in a wide range of salt concentrations and in different salt species. This is important for our study (this and previous manuscript), in which cells must be grown not only at high salt concentrations (1-15%) but in different salts like NaCl, MgCl<sub>2</sub>, and KCl. 

      (5) Our initial interest in this isolate was due to its ability to reduce iron at high salt concentrations. Given that most spherical microfossils are found in Archaean banded iron formations covered in pyrite, this suggests that these microfossils could have been reducing oxidized iron species like Fe(III). Nevertheless, over the course of our study, we realized the complexities of live cell staining and imaging under anoxic conditions. Given that the scope of the manuscript is restricted only to comparing the morphologies, not the physiology, we abandoned the idea of growing cells under anoxic conditions.

      Based on these observations, cell physiology may not be a key consideration, at least within the scope of studying microfossil morphology. However, we want to emphasize again that “We do not claim present-day protoplasts are protocells.”  

      Regarding the absence of quantitative data:

      We are unsure what the reviewer meant by the absence of quantitative data. Is it from the cell size/reproductive pathways perspective or from a microfossil/ecological perspective? At the risk of being portrayed in a bad light, we admit that we did not present quantitative data from either of these perspectives. In our defense, this was not due to our lack of effort but due to the practical limitations imposed by our model organism. 

      If the reviewer means the quantitative data regarding cell sizes and morphology: In our previous work, we studied the relationship between protoplast morphology, growth rate, and environmental conditions. In that study, we proposed that the growth rate is one factor that regulates protoplast morphology. Nevertheless, we did not observe uniformity in the sizes of the cells. This lack of uniformity was not just between the replicates but even among the cells grown within the same culture flask or the cells within the same microscopic field. Moreover, cells are often observed to be reproducing by forming either internal or external daughter cells, or by both these processes at the same time. The size and morphological differences among cells within a growth stage could be explained by the physiological and growth rate heterogeneity among cells.

      Bacterial growth curves and their partition into different stages (lag, log & stationary), in general, represent the growth dynamics of an entire bacterial population. Nevertheless, averaging the data obscures the behavior of individual cells (4,5). It is known that genetically identical cells within a single bacterial population could exhibit considerable cell-to-cell variation in gene expression (6,7) and growth rates (8). The reason for such stochastic behavior among monoclonal cells is not well understood. In the case of normal cells, morphological manifestation of these variations is restricted by a rigid cell wall. Given the absence of a cell wall in protoplasts, we assume such cell-to-cell variations in growth rate are manifested in cell morphology. This makes it challenging to quantitatively determine variations in cell sizes or the size increase in a statistically robust manner, even in monoclonal cells.

      This lack of uniformity in cell sizes should not be perceived as a limitation, as the same behavior is consistently observed among microfossils. Spherical microfossils of similar morphology but different sizes were reported from different microfossil sites (9,10). In this regard, both protoplasts and microfossils are very similar.

      If the reviewer means the quantitative data from an ecological perspective: 

      Based on the elemental composition and the isotopic signatures of the organic carbon, we can deduce whether these structures are of biological origin. However, any further interpretation of this data to assign these microfossils to a particular physiological group is fraught with errors. Hence, we refrain from making any inferences about the physiology and ecological function of these microfossils. This lack of clarity on the physiology of microfossils reduces the chance of quantitative studies on their ecological functions. Moreover, we would like to re-emphasize that the scope of this work is restricted to morphological comparison and is not targeted at understanding the ecological function of these microfossils. This narrow objective also limits the nature of the quantitative data we could present.

      Moreover, developing a quantitative understanding of some phenomena could be technically challenging. Many theories on the origin of life, like chemical evolution, started with the qualitative observation that lightning could mediate the synthesis of biologically relevant organic carbon. Our quantitative understanding of this process is still being explored and debated even to this day.     

      Reviewer #2 (Public Review): 

      Summary: 

      In summary, the manuscript describes life-cycle-related morphologies of primitive vesicle-like states (Em-P) produced in the laboratory from the Gram-positive bacterium (Exiguobacterium Strain-Molly) under assumed Archean environmental conditions. Em-P morphologies (life cycles) are controlled by the "native environment". In order to mimic Archean environmental conditions, soy broth supplemented with Dead Sea salt was used to cultivate Em-Ps. The manuscript compares Archean microfossils and biofilms from selected photos with those laboratory morphologies. The photos derive from publications on various stratigraphic sections of Paleo- to Neoarchean ages. Based on the similarity of morphologies of microfossils and Em-Ps, the manuscript concludes that all Archean microfossils are in fact not prokaryotes, but merely "sacks of cytoplasm".

      Strengths: 

      The approach of the authors to recognize the possibility that "real" cells were not around in the Archean time is appealing. The manuscript reflects the very hard work by the authors composing the Em-Ps used for comparison and selecting the appropriate photo material of fossils. 

      Weaknesses: 

      While the basic idea is very interesting, the manuscript includes flaws and falls short in presenting supportive data. The manuscript makes too simplistic assumptions on the "Archean paleoenvironment". First, like in our modern world, the environmental conditions during the Archean time were not globally the same. Second, we do not know much about the Archean paleoenvironment due to the immense lack of rock records. More so, the Archean stratigraphic sections from where the fossil material derived record different paleoenvironments: shelf to tidal flat and lacustrine settings, so differences must have been significant. Finally, the Archean spanned 2.5 billion years and it is unlikely that environmental conditions remained the same. Diurnal or seasonal variations are not considered. Sediment types are not considered. Due to these reasons, the laboratory model of an Archean paleoenvironment and the life therein is too simplistic. Another aspect is that eukaryote cells are described from Archean rocks, so it seems unlikely that prokaryotes were not around at the same time. Considering other fossil evidence preserved in Archean rocks except for microfossils, the many early Archean microbialites that show baffling and trapping cannot be explained without the presence of "real cells". With respect to lithology: chert is a rock predominantly composed of silica, not salt. The formation of Em-Ps in the "salty" laboratory set-up seems therefore not a good fit to evaluate chert fossils. Formation of structures in sediment is one step. The second step is their preservation. However, the second aspect of taphonomy is largely excluded in the manuscript, and the role of fossilization (lithification) of Em-Ps is not discussed. This is important because Archean rock successions are known for their tectonic and hydrothermal overprint, as well as recrystallization over time.

      Some of the comparisons of laboratory morphologies with microfossils and biofilms are incorrect because scales differ by magnitudes. In general, one has to recognize that prokaryote cell morphologies do not offer many variations. It is possible to arrive at the morphologies described in various ways including abiotic ones.

      Regarding the simplistic presumptions on the Archaean Eon environmental conditions, we provided a detailed explanation of this issue in our response to the eLife evaluation. In short, we agree with the reviewer that little is known about the Archaean Eon environmental conditions at a planetary scale. Hence, we restricted our study to one particular environment of which we had a comparatively good understanding. The Archaean Eon spanned 2.5 billion years. However, most of the microfossil sites we discussed in the manuscript are older than 3 billion years, with one exception (the 2.4-billion-year-old Turee Creek microfossils). We presume that conditions within this niche (coastal marine) environment could not have changed greatly until 2 Ga, after which there were major changes in the ocean salt composition and salinities.

      In the manuscript, we discussed extensively the reasons for restricting our study to these particular environmental conditions. Further explanations of these choices are presented in our response to the eLife evaluation (also see our previous manuscript). In short, the fact that all known microfossils are restricted to coastal marine environments justifies the experimental conditions employed in our study. Nevertheless, we agree with the reviewer that all lab-based studies involve some extent of simplification. This gap/mismatch is even wider when it comes to studies involving the origin of life or early life on Earth.

      We are not arguing that prokaryotes were not around at this time. The key message of the manuscript is that they were present, but they had not developed intracellular mechanisms to regulate their morphology and remained primitive in this aspect.

      The sizes of the microfossils and cells from our study were similar in most cases. However, we agree with the reviewer that they deviated considerably in some cases, for example, S70, S73, and S83. These size variations are limited to sedimentary structures like laminations rather than cells. These differences should be expected as we try to replicate, in a 2 ml chamber slide, the real-life morphologies of biofilms that could have extended over large swaths of natural environments. More specifically, in Fig. S70, there is a considerable size mismatch. But, in Fig. S73, the sizes were comparable between A & C (of course, the size of our reproduction did not match B). In the case of Fig. S83, we do not see a huge size mismatch.

      Reviewer #1 (Recommendations For The Authors): 

      We would like to provide several suggestions for changes in text and additions to data analysis. 

      39-41: It has been stated that reconstructing the lifecycle is the only way of understanding the nature of these microfossils. First of all, I would rephrase this to 'the most promising way', as there are always multiple approaches to comparing phenomena. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 41). 

      125: Please rephrase "under the environmental condition of early Earth" to "under experimental conditions possibly resembling the conditions of the Paleoarchean Eon". Now it sounds like the exact environmental conditions have been produced, which has already been debated in the discussion. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 127). 

      125: Please mention the fold change in size, the original size in numbers, and whether this change is statistically significant. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      128: Have you found a difference in the relative percentages of modes of reproduction? In other words, is there a difference in percentage between forming internal daughter cells or a string of external daughter cells? 

      We explained our reservations about presenting the exact number above. But this has been extensively discussed in our accompanying manuscript. We want to reemphasize that the scope of this manuscript is restricted to comparing morphologies rather than providing a mechanistic explanation of the reproduction process.

      151: A similar model for endocytosis has already been described in proliferating wall-less cells (Kapteijn et al., 2023). In the discussion, please compare your results with the observations made in that paper. 

      This is an oversight on our part. The manuscript suggested by the reviewer has now been added (line 154 & 155).  

      163: Please use another word for uncanny. We suggest using 'strong resemblance'. 

      We changed this according to the reviewers' suggestion (line 168). 

      433: Please elaborate on why the results are not shown. This sounds like a statement that should be substantiated further. 

      To observe growth and simultaneously image the cells, we conducted these experiments in chamber slides (2ml volume). Over time, we observed cells growing and breaking out of the salt crust (Fig. S86, S87 & Movie 22) and a gradual increase in the turbidity of the media. Although not quantitative, this is a qualitative indication of growth. We did not take precise measurements for several reasons. This sample is precious; it took us almost two years to solidify the biofilm completely, as shown in Fig. S84A. Hence, it was in limited supply, which prevented us from inoculating these salt crusts into large volumes of fresh media. Given a long period of starvation, these cells often exhibited a long lag phase (several days), and there wasn't enough volume to do OD measurements over time. 

      We also crushed the solidified biofilm with a sterile spatula before transferring it into the chamber slide with growth media. This resulted in debris in the form of small solid particles, which interfered with our OD measurements. These practical considerations made it challenging to determine the growth precisely. Despite these challenges, we measured an OD of 4 in some chamber slides after two weeks of incubation. Given that these measurements were done haphazardly, we chose not to present this data. 

      456: Could you please double-check whether the description is correct for the figure? 8C and 8D are part of Figure 8B, but this is stated otherwise in the description. 

      We thank the reviewer for pointing it out. It has now been rectified (line 461-472).

      Reviewer #2 (Recommendations For The Authors): 

      We thank Reviewer #2 for carefully reading the manuscript and for such an elaborate list of questions. The revisions suggested have definitely improved the quality of the manuscript. Here, we would like to address some of the questions that come up repeatedly below. One frequently asked question concerns the letters denoting the individual figures within the images. For comparison purposes, we often reproduced previously published images. To maintain a consistent figure style, we often had to block the previous denotations with an opaque square and give a new letter.

      The second question that appeared repeatedly below is the missing scale bars in some of the images within a figure. We often did not include a scale bar in the images when this image is an enlarged section of another image within the same figure.     

      Title: Please consider being more precise in the title. Microfossils are only one fossil group of "oldest life". Perhaps better: "On the nature of some microfossils in Archean rocks". (see also Line 37).  

      Authors’ response: The title conveys a broader message without quantitative insinuations. If our manuscript had been titled "On the nature of all known Archaean microfossils,” we should have agreed with the reviewer's suggestion and changed it to "On the nature of some microfossils in Archean rocks". As it is not, we respectfully decline to make this modification.     

      Abstract:  

      Line 41: "one way", not "the only way" 

      We agree with the reviewer’s comment, and necessary changes have been made (line 41).  

      Introduction: 

      Line 58f: "oldest sedimentary rock successions", not "oldest known rock formations". There are rocks of much older ages, but those are not well preserved due to metamorphic overprint, or the rocks are igneous to begin with. Minor issue: please note that "formations" are used as stratigraphic units, not so much to describe a rock succession in the field. 

      We agree with the reviewer’s comment and have made necessary changes (line 58).

      Line 67: Microfossils are widely accepted as evidence of life. Please rephrase. 

      We agree with the reviewer’s comment, and necessary changes have been made.

      Line 71 - 74: perhaps add a sentence of information here.

      We agree with the reviewer’s comment, and necessary changes have been made (line 71).

      Line 76: which "chemical and mineralogical considerations"? 

      This has been rephrased to “Apart from the chemical and δ<sup>13</sup>C-biomass composition” (line 76).

      Line 84ff: This is a somewhat sweeping statement. Please remember that there are microbialites in such rocks that require already a high level of biofilm organization. The existence of cyanobacteria-type microbes in the Archean is also increasingly considered. 

      We are aware of literature that labeled the clusters of Archaean microfossils as biofilms and layered structures as microbialites or stromatolite-like structures. However, the use of these terms is increasingly being discouraged. A more recent consensus among researchers suggests annotating these structures simply as sedimentary structures, as microbially induced sedimentary structures (MISS). 

      We respectfully disagree with the reviewer’s comment that Archaean microfossils exhibit a high level of biofilm organization. We are not aware of any studies that have conducted such comprehensive research on the architecture of Archaean biofilms. We are not even certain if these clusters of Archaean cells could even be labeled as biofilms in the true sense of the term. We presently lack an exact definition of a biofilm. In our study, we do see sedimentation and bacteria and their encapsulation in cell debris. From a broader perspective, any such aggregation of cells enclosed in cell debris could be annotated as a biofilm. However, more in-depth studies show that biofilm is not a random but a highly organized structure. Different bacterial species have different biofilm architectures and chemical composition. The multispecies biofilms in natural environments are even more complex. We do agree with the reviewer that these structures could broadly be labeled as biofilms, but we presently lack a good, if any, understanding of the Archaean biofilm architecture. 

      Regarding the annotation of microfossils as cyanobacteria, we respectfully disagree with the reviewer. This is not a new concept. Many of the Archaean microfossils were annotated as cyanobacteria at the time of their discovery. This annotation is not without controversy. With the advent of genome-based studies, researchers are increasingly moving away from this school of thought.  

      Line 101ff: The conditions on early Earth are unknown - there are many varying opinions. Perhaps simply state that this laboratory model simulates an Archean Earth environment of these conditions outlined. 

      This is a good idea. We thank the reviewer for this suggestion, and we made appropriate changes. 

      Line 112: manuscript to be replaced by "paper"? 

      This change has been made (line 114).

      Line 116: "spanned years" - how many years? 

      We have now added the number of years in brackets (line 118).

      Results: 

      Line 125: see comment for 101ff. 

      We made appropriate changes.

      Figure 1: Caption: Please write out ICV the first time this abbreviation is used. Images: Note that some lettering appears to not fit their white labels underneath. (G, H, I, J0, and M). 

      We apologize; this was an oversight on our part. We now spell out ICV in full the first time the abbreviation is used.

      We took these images from previously published work (references in the figure legend), so we must block out the previous figure captions. This is necessary to maintain a uniform style throughout the manuscript. 

      Line 152ff.: here would be a great opportunity to show in a graph the size variations of modern ICVs and to compare the variations with those in the fossil material. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      Line 159f.: Fig.1K - what is to see here? Maybe a close-up or - better - a small sketch would help? 

      Fig. 1K shows the surface depressions formed during vesicle formation. The surface characteristics of EM-P and the microfossils are very similar.

      Line 161f.: reference?  

      The paragraph spanning lines 159 to 172 discusses the morphological similarities between EM-P and SPF microfossils. We rechecked the reference no 35 (Delarue 2019). This is the correct reference. We do not see a mistake if the reviewer meant the reference to the figures.    

      Line 164ff.: A question may be asked, how many fossils of the Strelley Pool population would look similar to the "modeled" ones. Questions may rise in which way the environmental conditions control such morphology variations. Perhaps more details? 

      This relationship between the environmental conditions and the morphology is discussed extensively in our previous work (11).  

      Line 193: what is meant by "similar discontinuous distribution of organic carbon"?

      This statement highlights similarities between EM-P and microfossils. The distribution of cytoplasm within the cells is not uniform. There are regions with cytoplasm and regions devoid of it, which is quite unusual for bacteria. Some previous studies argued that this could indicate that these organic structures are of abiotic origin. Here, we show that EM-P-like cells could exhibit such a patchy distribution of cytoplasm within the cell.

      Line 218 - 291: The observations are very nice, however, the figures of fossil material in Figures 3 A, B, and C appear not to conform. Perhaps use D, E and I to K. Also, S48 does not show features as described here (see below).  

      We did not completely understand the reviewer’s question. As mentioned in the figure legend, both the microfossils and the cells exhibit strings with spherical daughter cells within them. Moreover, there are other similarities, such as the presence of hollow spherical structures devoid of organic carbon. We also found several mistakes in the Fig. S48 legend. We have rectified them, and we thank the reviewer for pointing them out.

      Line 293f: Title with "." at end?

      This change has been made.

      Line 298: predominantly in chert. In clastic material preservation of cells and pores is unlikely due to the common lack of in situ entombment by silica. 

      We rephrased this entire paragraph to better convey our message. Either way, we are not arguing that hollow pore spaces exist. As the reviewer mentioned, they will, of course, be filled up with silica. In this entire paragraph, we did not refer to hollow spaces. So, we are not entirely sure what the question was.     

      Line 324, 328-349: Please see below comments on the supplementary figures 51-62. Some of the interpretations of morphologies may be incorrect. 

      Please find our response to the reviewer’s comments on individual figures below.  

      Figure 5 A to D look interesting, however E to J appear to be unconvincing. What is the grey frame in D (not the white insert). 

      The grey color is just the background that was added during the 3D rendering process.  

      Figure 6 does not appear to be convincing. - Erase? 

      We did not understand the reviewer’s reservations regarding this figure. Images A-F within the figure show the gradual transformation of cells into honeycomb-like structures, and images G-J show such structures from the Archaean that are closely associated with microfossils. Moreover, we did not come up with this terminology (honeycomb-like). Previous manuscripts proposed it.  

      Line 379ff: S66 and 69, please see my comments below. Microfossils "were often discovered" in layers of organic carbon. 

      Please see our response below.   

      Line 393-403: Laminae? There are many ways to arrive at C-rich laminae, especially, if the material was compressed during burial. Basically, any type of biofilm would appear as laminae, if compressed. The appearance of thin layers is a mere coincidence. Note that the scale difference in S70, S73, as well as S83, is way too high (cm versus μm!) to allow any such sweeping conclusions. What are α- and β- laminations, the one described by Tice et al.? The arguments are not convincing.

      We propose that cells be compressed to form laminae. We answered this question above about the differences in the scale bars. Yes, we are referring to α- and β- laminations described by Tice et al.       

      Figure 7: This is an interesting figure, but what are the arguments for B and C, the fossil material, being a membrane? Debris cannot be distinguished with certainty at this scale in the insert of C. B could also be a shriveled-up set of trichomes.  

      We agree with the reviewer that debris cannot be definitely differentiated. Traditionally, annotations given to microfossil structures such as biofilm, intact cells, or laminations were all based on morphological similarities with existing structures observed in microorganisms. Given that the structures observed in our study are very similar to the microfossil structures, it is logical to make such inferences. Scales in A & B match perfectly well. The structure in C is much larger, but, as we mentioned in reply to one of the reviewer’s earlier questions, some of the structures from natural environments could not be reproduced at scale in lab experiments. Working in 2 ml chamber slides does impose some restrictions.

      Figure 8: The figure does not show any honeycomb patterns. The "gaps" in the Moodies laminae are known as lenticular particles in biofilms. They form by desiccated and shriveled-up biofilm that mineralizes in situ. Sometimes also entrapped gases induce precipitation. Note also that the modelled material shows a kind of skin around the blobs that are not present in the Moodies material.

      We agree that entrapped gas bubbles could have formed lenticular gaps. In the manuscript, we did not discount this possibility. However, if that is the case, one should explain why we often find clumps of organic carbon within these gaps. As we presented a step-by-step transformation of parallel layers of cells into laminations, which also had similar lenticular gaps, we believe this is a more plausible way such structures could have formed. In the end, there could have been more than one way such structures could have been formed. 

      We do see the honeycomb pattern in the hollow gaps. Often, the 3D rendering of the STED images obscures some details. Hence, in the figure legend, we referred to the supplementary figures, which also show the sequence of steps involved in the formation of such a pattern.

      Line 405-417: During deposition of clastic sediment any hollow space would be compressed during burial and settling. It is rare that additional pore space (except between the grain-grain contacts) remains visible, especially after consolidation. The exception would be if very early silicification took place filling in any pore space. What about EPS being replaced by mineralic substance? The arguments are not convincing.

      We are suggesting that EPS or cell debris is rapidly encrusted by cations from the surrounding environment and gets solidified into rigid structures. This makes it possible for the structures to be preserved in the fossil record. We believe that hollow structures like the lenticular gaps will be filled up with silica. 

      We do not agree with the reviewer’s comment that all biological structures will be compressed. If this is true, there should be no intact microfossils in the Archaean sedimentary structures, which is definitely not the case.      

      Line 419-430: Lithification takes place within the sediment and therefore is commonly controlled by the chemistry of pore water and chemical compounds that derive from the dissolution of minerals close by. Another aspect to consider is whether "desiccation cracks" on that small scale may be artefacts related to sample preparation (?).  

      We agree that desiccation cracks could have formed during the sample preparation for SEM imaging, as this involves drying the biofilms. However, we observed that the sample we used for SEM is a completely solidified biofilm (Fig. S84), so we expect little change in its morphology during drying. Moreover, visible cracks and pointy edges were also observed in wet samples, as shown in Fig. S87.        

      Line 432 - 439: Please see comments on the supplementary material below.

      Please find our response to the reviewer’s comments on individual figures below.  

      Discussion:  

      Line 477f: "all known microfossil morphologies" - is this a correct statement? Also, would the Archean world provide only one kind of "EM-P type"? Morphologies of prokaryote cells (spherical, rod-shaped, filamentous) in general are very simple, and any researcher of Precambrian material will appreciate the difficulties in concluding on taxonomy. There are papers that investigate putative microfossils in chert as features related to life cycles. Microfossil-papers commonly appear not to be controversial give and take some specific cases.  

      We made a mistake in using the term “all known microfossil morphologies.” We have now changed it to “all known spherical microfossils” (line 483). However, we do not agree with the statement that microfossil manuscripts tend not to be controversial. Assigning taxonomy to microfossils is anything but uncontroversial. This has been intensely debated among the scientific community.

      Line 494-496: This statement should be in the Introduction.

      We agree with the reviewer’s comment. In an earlier version of the manuscript, this statement was in the introduction. To put this statement in its proper context, it needs to be associated with a discussion about the importance of morphology in the identification of microfossils. The present version of the manuscript does not permit moving an entire paragraph into the introduction. Hence, we think making this statement in the discussion section is appropriate.

      Line 484ff. The discussion on biogenicity of microfossils is long-standing (e.g., biogenicity criteria by Buick 1990 and other papers), and nothing new. In paleontology, modern prokaryotes may serve as models but everyone working on Archean microfossils will agree that these cannot correspond to modern groups. An example is fossil "cyanobacteria" that is thought to have been around already in the early Archean. While morphologically very similar to modern cyanobacteria, their genetic information certainly differed - how much will perhaps remain undisclosed by material of that high age.  

      Yes, we agree with the reviewer that there has been a longstanding conflict on the topic of biogenicity of microfossils. However, we have never come across manuscripts suggesting that modern microorganisms should only be used as models. If at all, there have been numerous manuscripts suggesting that these microfossils represent cyanobacteria, streptomycetes, and methanotrophs. Regarding the annotation of microfossils as cyanobacteria, we addressed this issue in one of the previous questions raised by the reviewer.    

      Line 498ff: Can the variation of morphology and sizes of the EM-Ps be demonstrated statistically? Line 505ff are very speculative statements. Relabeling of what could be vesicles as "microfossils" appears inappropriate. Contrary to what is stated in the manuscript, the morphologies of the Dresser Formation vesicles do not resemble the S3 to S14 spheroids from the Strelley Pool, the Waterfall, and Mt Goldsworthy sites listed in the manuscript. The spindle-shaped vesicles in Wacey et al are not addressed by this manuscript. What roles in mineral and element composition would have played diagenetic alteration and the extreme hydrothermal overprint and weathering typical for Dresser material? S59, S60 do not show what is stated, and the material derives from the Barberton Greenstone Belt, not the Pilbara.

      Please see the comments below regarding the supplementary images. 

      We did not observe huge variations in the cell morphology. Morphologies, in most cases, were restricted to spherical cells with intracellular vesicles or filamentous extensions. Regarding the sizes of the cells, we see some variations. However, we are reluctant to provide exact numbers. We have presented our reasons above.

      We respectfully disagree with the reviewer’s comments. We see considerable similarities between the Dresser Formation microfossils and our cells. Beyond the similarities, we have shown the step-by-step transformation of cells that resulted in these morphologies. We fail to see what exactly the speculation is here. The argument that they should be classified as abiotic structures is based on the opinion that cells do not form such structures. We clearly show here that they can, and these biological structures resemble Dresser Formation microfossils more closely than the abiotic structures do.

      Regarding the figures S3-S14. We think they are morphologically very similar. Often, it's not just comparing both images or making exact reproductions (which is not possible). We should focus on reproducing the distinctive morphological features of these microfossils.  

      We agree with the reviewer that we did not reproduce all the structures reported by Wacey’s original manuscript, such as spherical structures. We are currently preparing another manuscript to address the filamentous microfossils. These spindle-like structures will be addressed in this subsequent work. 

      We agree with the reviewer that we often have difficulties differentiating between cells and vesicles. This is not a problem in the early stages of growth. During the log phase, a significant volume of the cell consists of the cytoplasm, with hollow vesicles constituting only a minor volume (Fig. 1B or S1A). During the later growth stages (Fig. 1E & F or S11), cells were almost hollow, with numerous daughter cells within them. These cells often resemble hollow vesicles rather than cells. However, these are biologically formed structures, and one could argue that these vesicles are still alive, as there is still a minimal amount of cytoplasm (Fig. S27). Hence, we should consider them as cells until they break apart to release daughter cells.

      Regarding Figures S59 and S60, we did not claim either of these microfossils is from Pilbara Iron Formations. The legend of Figure S59 clearly states that these structures are from Buck Reef Chert, originally reported by Tice et al., 2006 (Figure 16 in the original manuscript). The legend of Figure S60 says these structures were originally reported by Barlow et al., 2018, from the Turee Creek Formation. 

      Line 546f and 552: The sites including microfossils in the Archean represent different paleoenvironments ranging from marine to terrestrial to lacustrine. References 6 and 66 are well-developed studies focusing on specific stratigraphic successions, but cannot include information covering other Archean worlds of the over 2.5 Ga years Archean time.  

      All the Archaean microfossils reported to date are from volcanic coastal marine environments. We are aware that there are rocky terrestrial environments, but no microfossils have been reported from these sites. We are unaware of any Archaean microfossils reported from freshwater environments. 

      Line 570ff: The statements may represent a hypothesis, but the data presented are too preliminary to substantiate the assumptions.

      We believe this is a correct inference from an evolutionary, genomic, and now from a morphological perspective. 

      Figures:  

      Please check all text and supplementary figures, whether scale bars are of different styles within the figure (minor quibble). 

      S3 (no scale in C, D); S4, S5: Note that scale bars are of different styles. 

      We believe we addressed this issue above. 

      S6 D: depressions here are well visible - perhaps exchange with a photo in the main text? Note that scale bars are of different styles.  

      We agree that depressions are well visible in E. The same image of EM-P cell in E is also present in Fig. 1D in the main text.   

      S7: Scale bars should all be of the same style, if anyhow possible. Scale in D? 

      We believe we addressed this issue above. 

      S9: F appears to be distorted. Is the fossil like this? The figure would need additional indicators (arrows) pointing toward what the reader needs to see - not clear in this version. More explanation in the figure caption could be offered. 

      We rechecked the figure from the original publication to check if by mistake the figure was distorted during the assembly of this image. We can assure you that this is not the case. We are not sure what further could be said in the figure legend.     

      S13: What is shown in the inserts of D and E that is also visible in A and B? Here a sketch of the steps would help. 

      We did not understand the question.  

      S14: Scale in A, B? 

      We believe we addressed this issue above. 

      S15: Scales in A, E, C, D 

      We believe we addressed this issue above. 

      S16: scales in D, E, G, H, I, J?  

      We believe we addressed this issue above. 

      S17: "I" appears squeezed, is that so? If morphology is an important message, perhaps reduce the entire figure so it fits the layout. Note that labels A, B, C, and D are displaced. 

      As shown in several subsequent figures, the hollow spherical vesicles are compressed first into honeycomb-like structures, and they often undergo further compression to form lamination-like structures. Such images often give the impression that the entire figure is squashed, but this is not the case. If one examines the figure closely, one can see perfectly spherical vesicles together with laterally squeezed structures. Regarding the figure labels, we addressed this issue above.

      S18: The filamentous feature in C could also be the grain boundaries of the crystals. Can this be excluded as an interpretation? Are there microfossils with the cell membranes? That would be an excellent contribution to this figure. Note that scale bars are of different styles.

      If this were a one-off observation, we might have arrived at the reviewer's opinion. But spherical cells in a “string of beads” configuration have been reported too frequently, and from several sites, to be discounted as mere interpretation.

      S19: The morphologies in A - insert appear to be similar to E - insert in the lower left corner. The chain of cells in A may look similar to the morphologies in E - insert upper right of the image. B - what is to see here? D - the inclusions do not appear spherical (?). Does C look similar to the cluster with the arrow in the lower part of image E? Note that scale bars are of different styles (minor quibble). A, B, C, and D appear compressed. Perhaps reduce the size of the overall image?  

      The structures highlighted (yellow box) in C are similar to the highlighted regions in E: the agglomeration of hollow vesicles. It is hard to appreciate this similarity from one figure. The similarities are apparent when one sees Movie 4 and Fig. S12, which clearly show the spherical daughter cells within the hollow vesicle. We have now added the movie reference to the figure legend.

      S20: A appears not to contribute much. The lineations in B appear to be diagenetic. However, C is suitable. Perhaps use only C, D, E? 

      We believe too many unrecognizable structures are being labeled as diagenetic. Nevertheless, we do not subscribe to the notion that these are too lenient interpretations. These interpretations are justified as such structures have not been reported from live cells. This is the first study to report that cells could form such structures. As we now reproduced these structures, an alternate interpretation that these are organic structures derived from microfossils should be entertained. 

      S 21: Note that scale bars are of different styles.  

      We believe we addressed this issue above. 

      S22: Perhaps add an arrow in F, where the cell opened, and add "see arrow" in the caption? Is this the same situation as shown in C (white arrow)? What is shown by the white arrow in A? Note that scale bars are of different styles.

      We made the necessary changes.

      S23: In the caption and main text, please replace "&" with "and" (please check also the other figure captions, e.g. S24). Note that scale bars are of different styles. What is shown in F? A, D - what is shown here?

      We replaced “&” with “and.”  

      S24: Note that scale bars are of different styles. Note that Wacey et al. describe the vesicles as abiotic not as "microfossils"; please correct in figure caption [same also S26; 25; 28].

      We are aware of Prof. Dr. Wacey’s interpretations. We discuss them at length in the discussion section of our manuscript. Based on the similarities between the Dresser Formation structures and structures formed by EM-P, we contest that these are abiotic structures.

      S25: Appears compressed; note different scale bars. 

      We believe we addressed this issue above. 

      S28: The label in B is still in the upper right corner; scale in D? What is to see in rectangles (blue and red) in A, B? In fossil material, this could be anything. 

      These figures are taken from a previous manuscript cited in the figure legend. We could not erase or modify these figures.  

      S33: "L"ewis; G appears a bit too diffuse - erase? Note that scale bars are of different styles.

      We believe we addressed this issue above. 

      S34: This figure appears unconvincing. Erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.    

      S35: It would be more convincing to show only the morphological similarities between the cell clusters. B and C are too blurry to distinguish much. Scales in D to F and in sketches? A appears compressed (?). 

      We rechecked the original manuscript to see if image A was distorted while making this figure, but this is not the case. Regarding B & C, cells in this image are faint as they are hollow vesicles and, by nature, do not generate too much contrast when imaged with a phase-contrast microscope. There are some limitations on how much we can improve the contrast. We now added scale bars for D-I. Similarly, faint hollow vesicles can be seen in Fig. S21 C & D, and Fig. 3H.  

      S36: Very nice; in B no purple arrow is visible. Note that scale bars are of different styles. S37 and S36 are very much the same - fuse, perhaps?  

      We are sorry for the confusion. There are purple arrows in Fig. S37B-D. 

      S38: this is a more unconvincing figure - erase? 

      Unconvincing in what sense? There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.

      S39: white rectangle in A? Arrow in A? Note that scale bars are of different styles.

      These are some of the unavoidable remnants from the image from the original publication. 

      S40: in F: CM, V = ?; Note that scale bars are of different style. 

      It was an oversight on our part. We have now added the definitions to the figure legend. We thank the reviewer for pointing it out.

      S41: Rectangles in D, E, F, G can be deleted? Scales and labels missing in photos lower right. 

      Those rectangles are added by the image processing software to the 3D-rendered images. Regarding the missing scale bars in H & I, these panels are magnified regions of F. The scale bar is already present in F.

      S42: appears compressed. G could be trimmed. Labels too small; scale in G? 

      This is a curled-up folded membrane. We needed to lower the resolution of some images to keep the size of the supplement within the journal's limits. It is not possible to present 85 figures in high resolution. But we assure you that the image is not laterally compressed in any manner.

      S43: This figure appears to be unconvincing. Reducing to pairing B, C, D with L, K? Spherical inclusions in B? Scales in E to G? Similar in S44: A, B, E only? Note that scale bars are of different styles. 

      Figures I to K are important. They show not just the morphological similarities but also the sequence of steps through which such structures are formed. We addressed the issue of the scale bars above.  

      S45: A, B, and C appear to show live or subrecent material. How was this isolated of a rock? Note that scale bars are of different styles.  

      It is common to treat rocks with acids to dissolve them and then retrieve organic structures within them. This technique is becoming increasingly common. The procedure is quite extensively discussed in the original manuscript. We don’t see much difference in the scale bars of microfossils and EM-P cells; they are quite similar.

      S46: A: what is to see here? Note that scale bars are of different styles. 

      There are considerable similarities between the folded, fabric-like organic structures with spherical inclusions and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.

      S47: Perhaps enlarge B and erase A. Note that scale bars are of different styles. 

      S48: Image B appears to show the fossil material - is the figure caption inconsistent? There are no aggregations visible in the boxes in A. H is described in the figure caption but missing in the figure. Overall, F and G do not appear to mirror anything in A to E (which may be fossil material?). 

      S51; S52 B, C, E; S53: these figures appear unconvincing - erase? 

      Unconvincing in what sense? The structures from our study are very similar to the microfossils.   

      S54: North "Pole; scale bars in A to C =? 

      These figures were borrowed from an earlier publication referenced in the figure legend. That is the reason for the differences in the styles of scale bars.  

      S55: D and E appear not to contribute anything. Perhaps add arrow(s) and more explanation? Check the spelling in the caption, please. 

      D & E show morphological similarities between cells from our study and microfossils (A).   

      S56: Hexagonal morphologies may also be a consequence of diagenesis. Overall, perhaps erase this figure?  

      I certainly agree that could be one of the reasons for the hexagonal morphologies. Such geometric polygonal morphologies have not been observed in living organisms. Nevertheless, as you can see from the figure, such morphologies could also be formed by living organisms. Hence, this alternate interpretation should not be discounted.   

      S57: The figure caption needs improvement. Please add more description. What show arrows in A, what are the numbers in A? What is the relation between the image attached to the right side of A? Is this a close-up? Note that scale bars are of different styles. 

      We expanded a bit on our original description of the figure. However, we request the reviewer to keep in mind that the parts of the figure are taken from a previous publication. We are not at liberty to modify them, for example by removing the arrows. This imposes some constraints. 

      S58: There are no honeycomb-shaped features visible. What is to see here? Erase this figure? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P.  

      S59 and S60: What is to see here? - Erase? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P in Fig. S59. Further disintegration of these honeycomb-shaped mats into filamentous structures with spherical cells attached to them can be seen in both Archaean organic structures and structures formed by EM-P.   

      S61: This figure appears to be unconvincing. B and F may be a good pairing. Note that scale bars are of different styles.  

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S62: This figure appears to be unconvincing - erase?

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S66: This figure is unconvincing - erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.    

      S68: Scale in B, D, and E? 

      Image B is just a magnified image of a small portion of image A. Hence, there is no need for an additional scale bar. The same is true for images D and E. 

      S69: This figure appears to be unconvincing, at least the fossil part. Filamentous features are visible in fossil material as well, but nothing else. 

      We are not sure what filamentous features the reviewer is referring to. Both the figures show morphologically similar spherical cells covered in membrane debris.    

      S70 [as well as S82]: Good thinking here, but scales differ by magnitudes (cm to μm). Erase this figure? Very similar to Figure S73: Insert in C has which scale in comparison to B? Note that scale bars are of different styles.  

      We realize the scale bars are of different sizes. In our defense, our experiments are conducted in 1 ml volume chamber slides. We don’t have the luxury of doing these experiments on a scale similar to the natural environments. The size differences are to be expected. 

      S71: Scale in E? 

      Image E is just a magnified image of a small portion of image D. Hence, we believe a scale bar is unnecessary. 

      S72: Scale in insert?  

      The insert is just a magnified region of A & C.

      S75: This figure appears to be unconvincing. This is clastic sediment, not chert. Lenticular gaps would collapse during burial by subsequent sediment. - Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both microfossils, and structures formed by EM-P.

      S76: A, C, D do not look similar to B - erase? Similar to S79, also with respect to the differences in scale. Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both microfossils, and structures formed by EM-P. We believe we addressed the issue of scale bars above. 

      S80: A appears to be diagenetic, not primary. Erase? 

      These two structures share too many resemblances to be ignored or dismissed as merely diagenetic structures: raised filamentous structures originate from parallel layers of organic carbon (laminations), with spherical cells within this filamentous organic carbon.  

      S85: What role would diagenesis play here? This figure appears unconvincing. Erase?

      We do believe that diagenesis plays a major role in microfossil preservation. However, we do not subscribe to the notion that all microfossil features should be attributed to diagenesis by default. Our study shows that there could be an alternate explanation for some of the observations.  

      S86 and S87: These appear unconvincing. What is to see here? Erase? 

      These figures show the morphological similarities between the two structures: stellar-shaped organic structures with strings of spherical daughter cells growing out of them.  

      S88: Does this image suggest the preservation of "salt" in organic material once preserved in chert?  

      That is one inference we draw from this observation. Crystalline NaCl was previously reported from within the microfossil cells.    

      S89: What is to see here? Spherical phenomena in different materials? 

      At present, the presence of honeycomb-like structures is often considered to have been an indication of volcanic pumice. We meant to show that biofilms of living organisms could result in honeycomb-shaped patterns similar to volcanic pumice.

      References 

      Please check the spelling in the references. 

      We found a few references that required correction. We have now rectified them. 

      References  

      (1) Orange F, Westall F, Disnar JR, Prieur D, Bienvenu N, Le Romancer M, et al. Experimental silicification of the extremophilic archaea Pyrococcus abyssi and Methanocaldococcus jannaschii: Applications in the search for evidence of life in early Earth and extraterrestrial rocks. Geobiology. 2009;7(4). 

      (2) Orange F, Disnar JR, Westall F, Prieur D, Baillif P. Metal cation binding by the hyperthermophilic microorganism, Archaea Methanocaldococcus jannaschii, and its effects on silicification. Palaeontology. 2011;54(5). 

      (3) Errington J. L-form bacteria, cell walls and the origins of life. Open Biol. 2013;3(1):120143. 

      (4) Cooper S. Distinguishing between linear and exponential cell growth during the division cycle: Single-cell studies, cell-culture studies, and the object of cell-cycle research. Theor Biol Med Model. 2006; 

      (5) Mitchison JM. Single cell studies of the cell cycle and some models. Theor Biol Med Model. 2005; 

      (6) Kærn M, Elston TC, Blake WJ, Collins JJ. Stochasticity in gene expression: From theories to phenotypes. Nat Rev Genet. 2005; 

      (7) Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002; 

      (8) Strovas TJ, Sauter LM, Guo X, Lidstrom ME. Cell-to-cell heterogeneity in growth rate and gene expression in Methylobacterium extorquens AM1. J Bacteriol. 2007; 

      (9) Knoll AH, Barghoorn ES. Archean microfossils showing cell division from the Swaziland System of South Africa. Science. 1977;198(4315):396–8. 

      (10) Sugitani K, Grey K, Allwood A, Nagaoka T, Mimura K, Minami M, et al. Diverse microstructures from Archaean chert from the Mount Goldsworthy–Mount Grant area, Pilbara Craton, Western Australia: microfossils, dubiofossils, or pseudofossils? Precambrian Res. 2007;158(3–4):228–62. 

      (11) Kanaparthi D, Lampe M, Krohn JH, Zhu B, Hildebrand F, Boesen T, et al. The reproduction process of Gram-positive protocells. Sci Rep. 2024 Mar 25;14(1):7075.

    1. eLife Assessment

      Previous studies in mammals and other vertebrates have shown that the latency of stimulus-frequency otoacoustic emissions provides a reasonable, noninvasive estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and are obtained by assuming that the tuning ratio estimated in chicken also applies to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on line 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.
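The tuning-ratio approach summarized above can be sketched in a few lines. This is only an illustration of the general relation used in the otoacoustic literature, Q_ERB ≈ r × N_SFOAE, where N_SFOAE is the SFOAE phase-gradient delay expressed in periods of the stimulus frequency and r is a tuning ratio borrowed from another species. The function names and every numeric value below are hypothetical and are not taken from the manuscript under review.

```python
# Minimal sketch, assuming Q_ERB ~= r * N_SFOAE (tuning-ratio approach).
# All names and numbers are illustrative assumptions, not the authors' values.

def sfoae_delay_in_periods(freq_hz, delay_s):
    """Express an SFOAE phase-gradient delay in periods of the stimulus frequency."""
    return freq_hz * delay_s


def predict_q_erb(freq_hz, delay_s, tuning_ratio):
    """Predict the sharpness of tuning (Q_ERB) from an SFOAE delay and a tuning ratio."""
    return tuning_ratio * sfoae_delay_in_periods(freq_hz, delay_s)


def erb_hz(freq_hz, q_erb):
    """Convert a quality factor back to an equivalent rectangular bandwidth in Hz."""
    return freq_hz / q_erb


# Hypothetical example: a 2 kHz probe with a 1.5 ms delay and an assumed ratio of 1.2.
example_q = predict_q_erb(2000.0, 0.0015, 1.2)
example_erb = erb_hz(2000.0, example_q)
```

Because the prediction scales linearly with r, any species difference in the tuning ratio (or in where an apical-basal transition falls) carries over directly into the predicted Q_ERB.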

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAE estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

    4. Author response:

      We genuinely appreciate the reviewer critiques of our submitted paper, “Otoacoustic emissions but not behavioral measurements predict cochlear-nerve frequency tuning in an avian vocal-communication specialist.” We are planning a number of changes based on the reviewers’ helpful comments that we feel will substantially improve the manuscript and clarify its implications.

      We will add more support for the claim that budgerigars show unusual patterns of behavioral frequency tuning compared to other species. The original manuscript relied on previously published studies of budgerigar critical bands and psychophysical tuning curves to make this point (e.g., Fig. 1). Critical bands and psychophysical tuning curves have unfortunately not been studied in many bird species. Consequently, it was somewhat unclear (based on the information originally presented) whether the “unusual” behavioral tuning results shown in Fig. 1 reflect a hearing specialization in budgerigars or perhaps simply a general avian pattern attributable to declining audibility above 3-4 kHz (a point raised by both reviewers). Fortunately, behavioral critical-ratio results are available from a broader range of species. Although the critical ratio is a less direct correlate of tuning, the results clearly highlight the unique hearing abilities of budgerigars in relation to other bird species, as elaborated upon below.

      The critical ratio is the threshold signal-to-noise ratio for tone detection in wideband noise and partly depends on peripheral tuning bandwidth. Critical ratios have been studied in over a dozen bird species, the vast majority of which show similar thresholds to one another and monotonically increasing critical ratios for higher frequencies (by 2-3 dB/octave, similar to most mammals; reviewed by Dooling et al., 2000). By contrast, budgerigar critical ratios diverge markedly from other species at mid-to-high frequencies, with ~8 dB lower (more sensitive) thresholds from 3-4 kHz (Dooling & Saunders, 1975; Okanoya & Dooling, 1987; Farabaugh 1988; see Figs 5 & 6 in Okanoya & Dooling, 1987). The unusual critical-ratio function in budgerigars is not attributable to the audiogram and was hypothesized by Okanoya and Dooling (1987) to reflect specialized cochlear tuning or perhaps central processing mechanisms. A brief discussion of these studies will be added to the introduction, along with a new figure panel (for Fig. 1) illustrating these intriguing species differences in critical ratios.
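The definition of the critical ratio given above can be made concrete with a short sketch. It assumes the classic equal-power (Fletcher) simplification, under which the critical ratio in dB maps to an implied bandwidth of 10^(CR/10) Hz; that mapping is a textbook idealization and not a claim about how the cited studies analyzed their data. Function names and all levels are hypothetical.

```python
import math

# Minimal sketch of the critical ratio (CR), assuming the equal-power simplification.
# All names and sound levels below are illustrative assumptions.

def noise_spectrum_level_db(overall_level_db, bandwidth_hz):
    """Per-hertz level of a flat noise, given its overall level and bandwidth."""
    return overall_level_db - 10.0 * math.log10(bandwidth_hz)


def critical_ratio_db(tone_threshold_db, spectrum_level_db):
    """CR: masked tone threshold minus the noise spectrum level, in dB."""
    return tone_threshold_db - spectrum_level_db


def implied_bandwidth_hz(cr_db):
    """Bandwidth implied by a CR if tone power equals in-band noise power at threshold."""
    return 10.0 ** (cr_db / 10.0)


# Hypothetical example: 60 dB SPL noise spread over 10 kHz, tone threshold 44 dB SPL.
spectrum_level = noise_spectrum_level_db(60.0, 10000.0)
cr = critical_ratio_db(44.0, spectrum_level)
bandwidth = implied_bandwidth_hz(cr)
```

Under this simplification alone, a critical ratio that is 8 dB lower corresponds to an implied bandwidth narrower by a factor of 10^0.8, roughly 6.3, which is why critical ratios are treated as an indirect correlate of tuning rather than a direct bandwidth measurement.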

      Another question was raised as to whether the simultaneous-masking paradigms and classic methods used to estimate behavioral tuning in budgerigars should be considered as valid, given newer forward-masking and notched-noise alternatives. We will expand the discussion of this issue in the revised manuscript. First, many of the methods from the classic budgerigar studies remain widely used in animal behavioral research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. While forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial, with one study showing a closer match of forward-masking results to auditory-nerve tuning (ferret: Sumner et al., 2018), but several others showing a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no real consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as essentially valid while also discussing limitations and remaining open to alternative interpretations.

      We will add clarification throughout the revision as to the specific behavioral measures used to quantify tuning in budgerigars (i.e., critical bands, psychophysical tuning curves, and critical ratios). This avoids potentially disparaging alternative behavioral methods that have not been tested. That the budgerigar behavioral data are “old” seems not particularly relevant considering that the methods are still used in animal behavioral research, as noted previously. Rather, it seems important to clarify the specific behavioral techniques used to estimate budgerigars’ frequency tuning in the revised paper.

      Finally, we plan to add discussion of the apical-basal transition from the mammalian otoacoustic-emission literature, as suggested by reviewer 1, including how this concept might apply in budgerigars and other birds.

      References not already cited in the preprint:

      Burton JA, Dylla ME, Ramachandran R. Frequency selectivity in macaque monkeys measured using a notched-noise method. Hear Res. 2018 Jan;357:73-80. doi: 10.1016/j.heares.2017.11.012.

      King J, Insanally M, Jin M, Martins AR, D'amour JA, Froemke RC. Rodent auditory perception: Critical band limitations and plasticity. Neuroscience. 2015 Jun 18;296:55-65. doi: 10.1016/j.neuroscience.2015.03.053.

      Yost WA, Shofner WP. Critical bands and critical ratios in animal psychoacoustics: an example using chinchilla data. J Acoust Soc Am. 2009 Jan;125(1):315-23. doi: 10.1121/1.3037232. PMID: 19173418; PMCID: PMC2719489.

    1. eLife Assessment

      This is an important study using a combination of optogenetics and calcium imaging to provide insight into the function of the cholinergic input to the prelimbic cortex in probabilistic spatial learning as it relates to threat. These data are timely in contributing to an ongoing discussion in the field about the role of phasic cholinergic signaling to the cortex, about which relatively little is known. The strength of the evidence is incomplete and could be improved by changes in task design and analyses, cross-validation of the conditions in calcium imaging, as well as the incorporation of control experiments to more definitively show it is indeed acetylcholine working in this circuit.

    2. Reviewer #1 (Public review):

      Tu, Wen, et al. investigated the activity of mPFC putative glutamatergic neurons during a probabilistic threat discrimination and avoidance learning task using miniaturized GRIN lens implantation and single-photon calcium imaging in freely moving mice. In conjunction with this cellular recording, they employed channelrhodopsin-mediated optogenetic excitation of terminals from basal forebrain cholinergic projection neurons coupled to the delivery of an air puff on either of two maze paths with differential threat probability. The authors found that the optogenetic manipulation altered mPFC encoding of outcomes and disrupted animals' behavioral adaptation. Over the course of multiple learning sessions, optogenetically stimulated mice lagged behind control animals in resolving the differential threat probabilities on the two paths and making adaptive choices. In particular, the animals with optogenetic stimulation of cholinergic terminals were significantly more likely to switch to the path with higher threat probability after having just gotten a rare air puff on the generally "safer" path. Combined with data from a deterministic version of the task showing that optogenetically stimulated mice could behaviorally discriminate between the paths appropriately under such circumstances, these results suggest an impairment in the experimental animals' ability to make use of threat history over multiple trials. This comparison of probabilistic and deterministic versions of the same task is a highlight of this paper, representing a thoughtfulness about what information can be gleaned from such variations in the design of behavioral experiments that is all too often lacking. These data are timely in contributing to an ongoing discussion in the field about the role of phasic cholinergic signaling to the cortex, about which relatively little is known.

      While the ensemble recording of mPFC neurons during the task appears to be reliable and well-designed and the behavioral effects of the optogenetic stimulation are convincing, some major weaknesses of the paper limit its usefulness to others in the field:

      (1) Optogenetic excitation of presynaptic terminals can lead to antidromic action potentials that alter the firing properties of the target cell (see the excellent review on challenges of and strategies for presynaptic optogenetic experiments by Rost et al., Nat Neurosci 2022). To their credit, the authors explicitly acknowledge this fact, but they believe that the only alternative possibility is that their intervention could lead to increased acetylcholine release at collateral projections in other prefrontal subregions. In fact, we do not know that the mechanism mediating the behavioral changes observed involves acetylcholine at all, as many ChAT+ basal forebrain neurons co-transmit using GABA (Saunders et al., Nature, 2015; Saunders et al., eLife, 2015; Granger et al., Neuropharmacology, 2016). A very useful internal control, which is recommended by Rost et al. for such presynaptic excitation experiments, would be to locally infuse nicotinic or muscarinic cholinergic antagonists into the mPFC in an attempt to reverse the optogenetically induced deficit; this would resolve whether the effect is indeed mediated by cholinergic neurotransmission and if it is specific to the mPFC.

      (2) In a similar vein, the fact that LED illumination in the no-opsin control group appears to increase activity in prefrontal neurons (Figure 2C) and, moreover, has a functional effect in disrupting location-selective cellular activity to a similar extent as in the ChrimsonR group (Figure S3) is inadequately explained and cause for concern. Although the authors argue that the degree or "robustness" of puff-evoked activity was significantly greater in the ChrimsonR group as compared to fluorophore-only controls, their statistical test for demonstrating this is the Kolmogorov-Smirnov test (Figure 2D), thus showing that the two samples likely are drawn from different distributions but little else.

      (3) Throughout the paper, the authors rely heavily on the Kolmogorov-Smirnov and binomial tests (Figures 2D, 3, 4D, S3, S4) to compare distributions in this manner, but it is unclear to me why these would be the most appropriate statistical tests for what they seek to demonstrate. Given the holistic nature of these tests in comparing the shape and spread of distributions, I am concerned that they might be inflating the significance of the differences between groups. Even if the authors were seeking a nonparametric statistical test, which most likely would be quite appropriate, there are nonparametric versions of ANOVA that they could use (e.g. Kruskal-Wallis, Friedman). Indeed, in much of this data set a repeated measures statistical analysis would seem to be called for, whereas the Kolmogorov-Smirnov test assumes that the two samples must be independent of each other. The most notable example of this premise being violated is in Figure 3, where data from the same cell populations in the same animals are being compared between experimental days and across various trial types.
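To illustrate the kind of repeated-measures nonparametric analysis suggested in point (3), here is a minimal sketch of the Friedman rank statistic, which ranks the conditions within each cell rather than treating samples as independent. Two assumptions are worth flagging loudly: the p-value is obtained by permuting values within each row, which stands in for the usual chi-square approximation, and the statistic is not corrected for ties. The function names and the toy data are hypothetical, and the sketch ignores the nesting of cells within animals.

```python
import random

# Minimal sketch of a Friedman test for repeated measures (one row per cell,
# one column per condition). Names and data are illustrative assumptions.

def rank_row(values):
    """Return 1-based ranks for one row; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(data):
    """Friedman chi-square statistic (no tie correction) for n rows by k conditions."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def friedman_permutation_p(data, n_perm=2000, seed=0):
    """Permutation p-value: shuffle condition labels independently within each row."""
    rng = random.Random(seed)
    observed = friedman_statistic(data)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in data]
        if friedman_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical responses of six cells measured under three conditions.
toy = [[0.1, 0.4, 0.9], [0.2, 0.5, 0.7], [0.0, 0.3, 0.8],
       [0.1, 0.2, 0.6], [0.3, 0.6, 1.0], [0.2, 0.4, 0.5]]
stat, p_value = friedman_permutation_p(toy)
```

Shuffling within rows keeps each cell's own set of values intact, so the null distribution respects the pairing that a two-sample Kolmogorov-Smirnov test would ignore.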

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested:

      (1) Whether mice learn that they are more/less likely to receive an aversive air puff outcome at different corners of a square-shaped open field apparatus, under 75%/25% probabilistic contingencies;

      (2) Whether stimulating basal forebrain cholinergic neurons and terminals in the prefrontal cortex affects learning in this context; and

      (3) Whether stimulating cholinergic neurons affects prefrontal cortical single neuron calcium signaling about outcome expectations during learning and contingency changes. They found that mice that received cholinergic stimulation approached high and low aversive outcome probability sites at similar velocities, while control mice approached high probability sites slower, suggesting that cholinergic stimulation impaired learning. Cholinergic stimulation reduced cortical neuron calcium activity during trials on the high-probability corner when the outcome was not delivered. The authors provide additional characterization of cellular responses during delivery/omission trials in high/low probability corners, using running speed as a proxy for low versus high expectations. The study will likely be of interest to those who are interested in prediction and error signaling in the cortex; however, the task and analyses do not permit very easy or clear dissociation of prediction versus prediction error signaling and place field versus place field-expectation multiplexing. The study has several strengths but some weaknesses, which are discussed below.

      Strengths:

      It is clear the authors were very careful and did a great job with their image processing and segmentation procedures. The details in the methods are appreciated, as are the supplemental descriptive statistics on cell counts.

      There are careful experimental controls - for example, the authors showed that the effects of cholinergic stimulation with air puff present are greater than without it, thus ruling out effects of stimulation on cellular physiology that were independent of learning or the task.

      The addition of a channelrhodopsin stimulation group is helpful to show that the effects are robust and not wavelength/opsin-specific.

      The prefrontal cortex cholinergic terminal stimulation experiment is a great addition. It shows that the behavioral effects of cell body stimulation, which was used in the imaging experiments, are similar to cortical terminal stimulation, where the imaging was performed.

      Weaknesses:

      The analyses were a bit difficult to follow and therefore it is difficult to determine whether the cells are signaling predictions versus prediction errors - a very important distinction.

      The task does not fully dissociate place field coding, since learning about the different probabilities necessarily took place at different areas in the apparatus. Some additional analyses could help address this.

    4. Reviewer #3 (Public review):

      Summary:

      Using a combination of optogenetic tools and single-photon calcium imaging, the authors collected a set of high-quality data and conducted thorough analyses to demonstrate the importance of cholinergic input to the prelimbic cortex in probabilistic spatial learning, particularly pertaining to threat.

      Strengths:

      Given the importance of the findings, this paper will appeal to a broad audience in the systems, behavioural, and cognitive neuroscience community.

      Weaknesses:

      I have only a few concerns that I consider need to be addressed.

      (1) Can the authors describe the basic effect of cholinergic stimulation on PL neurons' activity, during pretraining, probabilistic, and random stages? From the plot, it seems that some neurons had an increase and others had a decrease in activity. What are the percentages for significant changes in activities, given the intensity of stimulation? Were these changes correlated with the neurons' selectivity for the location? If they happen to have the data, a dose-response plot would be very helpful too.

      (2) Figure 2B: The current sorting does not show the effects of puff and LED well. Perhaps it's best to sort based on the 'puff with no stim' condition in the middle, by the total activity in 2s following the puff, and then by the timing in the rise/drop of activity (from early to late). This way perhaps the optogenetic stimulation would appear more striking. Figure 3Aa and Ba have the same issue: by the current sorting, the effects are not very visible at all. Perhaps they want to consider not showing the cells that did not show the effect of puff and/or LED.

      Also, I would recommend that the authors use ABCD to refer to figure panels, instead of Aa, Ab, etc. This is very hard to follow.

      (3) The authors mentioned the laminar distribution of ACh receptors in discussion. Can they show the presence/absence of topographic distribution of neurons responding to puff and/or LED?

      (4) Figure 2C seems to show only neurons with increased activity to an air puff. It's also important to know how neurons with an inhibitory response to air-puff behaved, especially given that in tdTomato animals, the proportion of these neurons was the same as excitatory responders.

      (5) Page 5, lines 107 and 110: Following 2-way ANOVA, the authors used a 'follow-up 1-way rmANOVA' and 'follow-up t-test' instead of post hoc tests (e.g. Tukey's). This doesn't seem right. Please use post hoc tests instead to avoid the problem of multiple comparisons.

      (6) Figure 1H: in the running speed analysis, were all trials included, both LED+ and LED-? This doesn't affect the previous panels in Figure 1 but it could affect 1H. Did stimulation affect how the running speed recovers?

      On a related note, does a surprising puff/omission affect the running speed on the subsequent trial?

      (7) On Page 7, line 143, it says "In the absence of LED stimulation, the magnitude of their puff-evoked activity was reduced in ChrimsonR-expressing mice...", but then on line 147 it says "This group difference was not detected without the LED stimulation". I don't follow what is meant by the latter statement, it seems to be conflicting with line 143. The red curves in the left vs right panels do not seem different. The effect of air puff seems to differ, but is this due to a higher gray curve ('no puff' condition) in the ChrimsonR group?

      (8) Did the neural activity correlate with running speed? Since the main finding was the absence of difference in running speed modulation by probability in ChrimsonR mice, one would expect to see PL cells showing parallel differences.

    5. Author response:

      (1) We do not know that the mechanism mediating the behavioral changes observed involves acetylcholine at all. (Reviewer 1)

      The reviewer rightly pointed out the co-release of acetylcholine (ACh) and GABA from cholinergic terminals. We believe that the detected behavioral changes result from the augmentation of this innate mixed chemical signal. We agree that identifying the receptor specificity is an essential next step; however, addressing this point requires a currently unavailable research tool to block cholinergic receptors for a few hundred milliseconds. This temporal specificity is vital because acetylcholine is released in the medial prefrontal cortex (mPFC) on two distinct timescales: the slow release over tens of minutes from the task onset and the fast release time-locked to salient stimuli (Teles-Grilo Ruivo et al., 2017). Moreover, the former slow signal is far more robust than the latter phasic signal. The pharmacological experiments suggested by the reviewer will suppress both the tonic and phasic signals, making it difficult to interpret the results. Given the rapid technological advancement in this field, we hope to investigate the underlying mechanisms in detail in the future.

      (2) It is unclear whether mPFC cells are signaling predictions versus prediction errors. (Reviewer 2)

      As the reviewer pointed out, mPFC cells signal the prediction of imminent outcomes (Baeg et al., 2001; Mulder et al., 2003; Takehara-Nishiuchi and McNaughton, 2008; Kyriazi et al., 2020).

      However, the key difference between prediction signals and prediction error signals is their time course. The prediction signals begin to arise before the actual outcome occurs, whereas the prediction error signals are emitted after subjects experience the presence or absence of the expected outcome. In all our analyses, cell activity was normalized by the activity during the 1-second window before the threat site entry (i.e., the reveal of actual outcome; Lines 655-659). Also, all the statistical comparisons were made on the normalized activity during the 500-msec window, starting from the threat site entry (Lines 669-670). Because this approach isolated the change in cell activity after the actual outcome, we interpret the data in Figure 4C as prediction error signals.
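      The windowing logic described above can be sketched as follows. This is only an illustration, not the authors' pipeline: the response does not say how the normalization was computed, so the sketch assumes a simple subtraction of the mean pre-entry activity, and the function name, the 20 Hz sampling rate, and the trace values are all hypothetical.

```python
from statistics import mean


def post_entry_response(trace, entry_idx, fs_hz, baseline_s=1.0, response_s=0.5):
    """Mean activity in the window after site entry, relative to the pre-entry baseline.

    Assumption: "normalized" is taken to mean subtracting the mean of the
    1-second window before entry; the original analysis may differ.
    """
    n_base = int(round(baseline_s * fs_hz))   # samples in the baseline window
    n_resp = int(round(response_s * fs_hz))   # samples in the response window
    if entry_idx < n_base or entry_idx + n_resp > len(trace):
        raise ValueError("trace too short for the requested windows")
    baseline = mean(trace[entry_idx - n_base:entry_idx])
    response = mean(trace[entry_idx:entry_idx + n_resp])
    return response - baseline


# Hypothetical trace sampled at 20 Hz: flat before entry, a step increase after it.
trace = [1.0] * 20 + [3.0] * 10 + [1.0] * 10
print(post_entry_response(trace, entry_idx=20, fs_hz=20))  # 2.0
```

      Because the baseline ends at site entry, any activity that ramps up before the outcome is revealed is absorbed into the baseline, and only the change that follows the outcome remains in the measured response.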

      (3) The task does not fully dissociate place field coding. (Reviewer 2)

      The present analysis included several strategies to dissociate outcome selectivity from location selectivity (Figure 4). First, we collapsed cell activity on two threat sites to suppress the difference in cell activity between the sites. Second, our analysis compared how cell activity at the same location differed depending on whether outcomes were expected or surprising (Figure 4C). Nevertheless, we can use the present data to investigate the spatial tuning of mPFC cells. Indeed, an earlier version of this manuscript included some characterizations of spatial tuning. However, these data were deemed irrelevant and distracting when this manuscript was reviewed for publication in a different journal. As such, these data were removed from the current version. We are in the process of publishing another paper focusing on the spatial tuning of mPFC cells and their learning-dependent changes. 

      (4) The basic effects of cholinergic terminal stimulation on mPFC cell activity are unclear. (Reviewers 1, 3)

      We acknowledge the lack of characterization of the optogenetic manipulation of cholinergic terminals on mPFC cell activity outside the task context. As outlined in the discussion section (Lines 309-321), cholinergic modulation of mPFC cell activity is highly complex and most likely varies depending on behavioral states. In addition, because we intended to augment naturally occurring threat-evoked cholinergic terminal responses (Tu et al., 2022), our optogenetic stimulation parameters were 3-5 times weaker than those used to evoke behavioral changes solely by the optogenetic stimulation of cholinergic terminals (Gritton et al., 2016). Based on these points, we validated the optogenetic stimulation based on its effects on air-puff-evoked cell activity during the task (Figure 2C, 2D).

      (5) Some choices of statistical analyses are questionable (Reviewers 1, 3)

      We used the Kolmogorov-Smirnov (KS) test to investigate whether the distribution of cell responses differed between the two groups (Figure 2D) or changed with learning (Figure 3Ac, 3Bc). As seen in Figure 3Aa, some mPFC cells increased calcium activity in response to air-puffs, while others decreased. We expected that the manipulation or learning would alter these responses. If they are strengthened, the increased responses will become more positive, while the decreased responses will become more negative. If they are weakened, both responses will become closer to 0. Under such conditions, the shape of the distribution of cell responses will change but not the median. The KS test can detect this, but not other tests sensitive to the difference in medians, such as Wilcoxon rank-sum tests. In Figure 2D, KS tests were applied to the independently sampled data from the control and ChrimsonR-expressing mice. In Figure 3Ac and 3Bc, we used all cells imaged in the first and fifth sessions. Considering that ~50% of them were longitudinally registered on both days, we acknowledge the violation of the assumption of independent sampling. In Figure 1D, we detected a significant interaction between the group and sessions. Several approaches are appropriate to demonstrate the source of this interaction. We chose to conduct one-way ANOVA separately in each group to demonstrate the significant change in % adaptive choice across the sessions in the control group but not the ChrimsonR group. The cutoff for significance was adjusted with the Bonferroni correction in follow-up paired t-tests used in Figure 1F.
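      The argument for the KS test can be illustrated with a small sketch. It is not the authors' analysis: the two response sets below are hypothetical, built to be symmetric around zero so that they share a median while differing in spread, and both statistics are written out in plain Python (no p-values) to show what each one measures.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in a_sorted + b_sorted
    )


def mann_whitney_u(a, b):
    """Mann-Whitney U (the rank-sum statistic): pairs with a > b, ties count half."""
    greater = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return greater + 0.5 * ties


# Hypothetical cell responses: weak responses lie in [-1, 1]; strengthened
# responses are the same values scaled by 3, so increases grow and decreases deepen.
weak = [i / 50 for i in range(-50, 51)]
strong = [3 * x for x in weak]

print(median(weak), median(strong))                       # identical medians
print(mann_whitney_u(weak, strong), len(weak) * len(strong) / 2)
print(ks_statistic(weak, strong))
```

      In this constructed case the U statistic equals its null expectation (half the number of pairs), so a rank-sum test sees no shift, whereas the KS statistic is clearly above zero because the two cumulative distributions separate in the tails.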

    1. eLife Assessment

      This important study leverages the power of Drosophila genetics and sparsely-labeled neurons to propose a new model for neuronal injury signaling. The authors present convincing evidence to support that the somatic response to axonal injury is suppressed if the injury is not complete, suggesting the presence of an integration of axonal injury-related signaling. While the underlying mechanism of this fascinating observation is unknown, the phenomenon itself will be of broad significance in the field.

    2. Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

    5. Author response:

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      We wish to further clarify a striking aspect of puc-lacZ induction following injury: it is bimodal. It is either induced (in various injuries that remove all synaptic boutons), or not induced, including in injuries that spared only 1-2 remaining boutons. This was particularly evident for injuries that spared the NMJ on muscle 29, which comprises only a few boutons. In some instances, only a single bouton was evident on muscle 29. While our injuries varied enormously in the number of branches and boutons that were lost, we did not see a comparable variability in puc-lacZ induction. In the revision we will include additional images to better demonstrate this observation.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Regarding the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to determine whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation.

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium, or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      References Cited:

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, DiAntonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. eLife Assessment

      This important study describes a neural circuit contributing to two behavioral processes affecting pathogen avoidance in the nematode C. elegans. The method used to identify specific contributing neurons is innovative and the experimental evidence supporting the major claims is solid. This study will be of interest to neuroscientists studying behavior, in particular in C. elegans.

    2. Reviewer #1 (Public review):

      This study identifies two behavioral processes that underlie learned pathogen avoidance behavior in C. elegans: exiting and re-entry of pathogenic bacterial lawns. Long-term behavioral tracking indicates that animals increase the prevalence of both behaviors over long-term exposure to the pathogen Pseudomonas aeruginosa. Using an optogenetic silencing screen, the authors identify groups of neurons, whose activity regulates lawn occupancy. Surprisingly, they find that optogenetic inhibition of neurons during only the first two hours of pathogen exposure can establish subsequent long-term changes in pathogen aversion. By leveraging a compressed sensing approach, the authors define a set of neurons involved in either lawn exit or lawn re-entry behavior using a constrained set of transgenic lines that drive Arch-3 expression in overlapping groups of neurons. They then measure the calcium activity of the candidate neurons involved in lawn re-entry in freely moving animals using GCaMP, and observe a reduction in their neural activity after exposure to pathogen. Optogenetic inhibition of AIY and SIA neurons during acute pathogen exposure in naïve animals delays lawn entry whereas activating these neurons in animals previously exposed to pathogen enhances lawn entry, albeit transiently.

      This work is missing experiments and analyses that are necessary to substantiate their claims. Although the authors convincingly show that neuronal inhibition experiments during pathogen exposure reveal separable groups of neurons controlling pathogenic lawn exiting and re-entry, their methods to validate these results at single neuron cell-type resolution lack rigor.

      In Figure 4, the authors claim that the reduction in calcium activity in cells of interest following pathogen exposure encodes pathogen experience. However, they make no effort to correlate the observed decreased activity with concomitant shifts in increased immobility (decreased forward locomotion) or the increased age of the worms since pathogen exposure began (24 hours have elapsed), either of which could easily explain these results. A better comparison would be between age-matched naive animals and animals exposed to pathogen. More to the point, we are interested in the involvement of these neurons' activity patterns with the behavioral motifs associated with lawn exits and re-entries, so examining these activity patterns in the absence of any pathogen before or after long-term pathogen exposure yields little insight into their relevant signaling roles. To substantiate the authors' claims, a better experiment would measure these neurons' calcium activity during lawn exits and re-entries in naive and post-exposed age-matched worms.

      In Figure 5, the authors attempt to show that manipulating AIY and SIA/SIB neuronal activity controls pathogenic lawn re-entry behavior. Although they show that inhibiting these neurons in naive animals increases latency to enter pathogenic lawns, they never test the effect of neuronal inhibition in post-exposed animals. Instead they activate these neurons using channelrhodopsin, whereby they observe an increase in lawn entry and exit behavior, indicative of high forward locomotion speed. Although suggestive, neither of these experiments proves these neurons' involvement in pathogenic lawn re-entry behavior following pathogen exposure. To rigorously test the hypothesis that AIY and SIA/SIB neurons are required to sustain higher latency to lawn re-entry following pathogen exposure, the authors should perform neuronal inhibition experiments in post-pathogen-exposed animals as well and compare the results. The interpretation of this figure is further complicated by the fact that Npr-4::ChR2 animals express ChR2 in AIY in addition to SIA/SIB neurons: experiments that calculated lawn re-entry rates in Npr-4::ChR2 activation in post-exposed animals may include the known effect of stimulating AIY alone (Fig. 5J) since no discernible attempt at structured illumination to limit excitation to SIA/SIB neurons was made in these animals (Fig. 5K, L).

      This work raises the interesting possibility that different sets of neurons control lawn exit and lawn re-entry behaviors following pathogen exposure. However, the authors never directly test this claim. To rigorously show this, the authors would need to show that lawn-exit promoting neurons (CEPs, HSNs, RIAs, RIDs, SIAs) are dispensable for lawn re-entry behavior and that lawn re-entry promoting neurons (AVK, SIA, AIY, MI) are dispensable for lawn exit behavior in pathogen-exposed animals. The authors identify AVK neurons as important for modulating lawn re-entry behavior by brief inhibition at the start of pathogen exposure but fail to find that these neurons are required for increased latency to re-entry in naïve animals (Fig. 5D). Recent work from Marquina-Solis et al (2024) shows that chronic silencing of these neurons delays pathogen lawn leaving, due to impaired release of flp-1 neuropeptide. Authors may wish to connect their work more closely with the existing literature by investigating the behavioral process by which AVK contributes to lawn evacuation.

    3. Reviewer #2 (Public review):

      In this manuscript, Hallacy et al. used a compressed sensing-based optogenetic screening method to investigate the crucial neurons that regulate pathogenic avoidance behavior in C. elegans. They further substantiate their findings using complementary optogenetic activation and imaging techniques to confirm the roles of the key neurons identified through extensive screening efforts. Notably, they identified AIY and SIA as pivotal neurons in the dynamic process of pathogenic avoidance. Their significant discovery is the delayed or stalled reentry process, which drives avoidance behavior; to my knowledge, this dynamic has not been previously documented. Additionally, the successful integration of quantitative optogenetic tools and compressed sensing algorithms is noteworthy, demonstrating the potential for obtaining highly quantitative data from the C. elegans nervous system. This approach is quite rare in this field, yet it represents a promising direction for studying this simple nervous system.

      However, the paper's main weakness lies in its lack of a detailed mechanism explaining how the delayed reentry process directly influences the actual locomotor output that results in avoidance. The term 'delayed reentry' is used as a dynamic metric for quantifying the screening, yet the causal link between this metric and the mechanistic output remains unclear. Despite this, the study is well-structured, with comprehensive control experiments, and is very well constructed.

      Comments on revisions:

      The authors have addressed all my concerns and suggestions. They particularly further clarified the AIY's role in navigation by providing a new figure. They also provided supplementary videos representing the re-entry process.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank the reviewer for their comments and suggestions. We have made several edits to the paper to address these comments, including the addition of several new control experiments, corrections to mislabeled figures in Fig 2, and other additions to improve the clarity of several figures.

      This work is missing several controls that are necessary to substantiate their claims. My most important concern is that the optogenetic screen for neurons that alter pathogenic lawn occupancy does not have an accompanying control on non-pathogenic OP50 bacteria. Hence, it remains unclear whether these neuronal inhibition experiments lead to pathogen-specific or generalized lawn-leaving alterations. For strains that show statistical differences between - and + ATR conditions, the authors should perform follow-up validation experiments on non-pathogenic OP50 lawns to ensure that the observed effect is PA14-specific. Similarly, neuronal inhibition experiments in Figures 5E and H are only performed with naïve animals on PA14 - we need to see the latency to re-entry on OP50 as well, to make general conclusions about these neurons' role in pathogen-specific avoidance.

      We have added data from new control experiments to Fig. S1 (subfigures B, C) for both exit and re-entry dynamics on OP50. We find that inhibition of neurons produces different effects on both lawn entry and exit on PA14 compared to OP50. We observed that inhibition of neurons failed to change the re-entry dynamics on OP50 for any of the lines that showed delayed latency to re-entry on PA14. Our results suggest that the neural control of re-entry dynamics we see is PA14-specific.

      My second major concern is regarding the calcium imaging experiments of candidate neurons involved in lawn re-entry behavior. Although the data shows that AIY, AVK, and SIA/SIB neurons all show reduced activity following pathogen exposure, the authors do not relate these activity changes to changes in behavior. Given the well-established links between these cells and forward locomotion, it is essential to not only report differences in activity but also in the relationship between this activity and locomotory behavior. If animals are paused outside of the pathogen lawn, these neurons may show low activity simply because the animals are not moving forward. Other forward-modulated neurons may also show this pattern of reduced activity if the animals remain paused. Given that the authors have recorded neural activity before and after contact with pathogenic bacteria in freely moving animals, they should also provide an analysis of the relationship between proximity to the lawn and the activity of these neurons.

      In response, we added an additional supplementary figure S7 to illustrate the role of each neuron in navigational control and added text to the discussion to better explain the role of each neuron type in the regulation of re-entry, in light of our previously published work on SIA in speed control.

      This work is missing methodological descriptions that are necessary for the correct interpretation of the results shown here. Figure 2 suggests that the determination of statistical significance across the optogenetic inhibition screen will be found in the Methods, but this information is not to be found there. At various points in the text, authors refer to "exit rate", "rate constant", and "entry rate". These metrics seem derived from an averaged measurement across many individual animals in one lawn evacuation assay plate. However "latency to re-entry" is only defined on a per-animal basis in the lawn re-exposure assay. These differences should be clearly stated in the methods section to avoid confusion and to ensure that statistics are computed correctly.

      Additional details have been added to the methods section to provide more in-depth information on the statistical analysis performed. In brief, the latency to re-entry is calculated in the same way across all assays: re-entry events across replicate experiments for a given experimental condition are aggregated together and used to calculate relevant statistics.
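
      The pooling step described above can be illustrated with a minimal sketch. It assumes that each replicate contributes a list of per-event latencies in seconds and that the summary statistics of interest are the event count, median, and mean; the function name `pooled_latency_summary`, the units, and the choice of statistics are illustrative assumptions, not the authors' analysis code.

```python
from statistics import median


def pooled_latency_summary(replicates):
    """Pool re-entry latencies across replicates of one condition, then summarize.

    `replicates` is a list of lists; each inner list holds the latencies
    (hypothetically in seconds) of the re-entry events seen in one replicate.
    """
    # Aggregate events from all replicates into a single pool.
    pooled = [latency for replicate in replicates for latency in replicate]
    if not pooled:
        raise ValueError("no re-entry events to summarize")
    return {
        "n_events": len(pooled),
        "median_s": median(pooled),
        "mean_s": sum(pooled) / len(pooled),
    }


# Hypothetical latencies from three replicate plates of one condition.
summary = pooled_latency_summary([[30, 50], [40], [60, 20]])
print(summary)
```

      Because events rather than plates are the unit of analysis in this sketch, a replicate with many re-entry events weighs more heavily than one with few; whether that is appropriate depends on the assay design.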

      This work also contains mislabeled graphs and incorrect correspondence with the text, which make it difficult to follow the authors' claims. The text suggests that Pdop-2::Arch3 and Pmpz-1::Arch3 show increased exit rates, whereas Figure 2 shows that Pflp-4::Arch3 but not Pmpz-1::Arch3 has increased exit rate. The authors should also make a greater effort to correctly and clearly label which type of behavioral experiment is used to generate each figure and describe the differences in experimental design in the main text, figure legends, and methods. Figure 2E depicts trajectories of animals leaving a lawn over a 2.5-minute interval but it is unclear when this time window occurs within the 18-hour lawn leaving assay. Likewise, Figure 2H depicts a 30-minute time window which has an unclear relationship to the overall time course of lawn leaving. This figure legend is also mislabeled as "Infected/Healthy", whereas it should be labeled "-/+ ATR".

      In Figures 2C and F, the x-axis labels are in a different order, making it difficult to compare between the 2 plots. Promoter names should be italicized. What does the red ring mean in Figure 2A? Figure 2 legend incorrectly states that four lines showed statistically significant changes for the Exit rate constant - only 2 lines are significant according to the figure.

      We thank the reviewer for identifying this embarrassing error. Figures 2C and F were flipped; we have corrected this and apologize for the error. Promoter names have been italicized, and we have added text to the captions explaining that the red ring is a ring light for background illumination of the worms. In addition, we have corrected the error in the figure legends from “Infected/Healthy” to “+/- ATR”.

      Lines in Figures 2C and 2F are ordered by significance rather than kept in the same order in both. Majority feedback from colleagues suggested that this ordering was preferred.

      This work raises the interesting possibility that different sets of neurons control lawn exit and lawn re-entry behaviors following pathogen exposure. However, the authors never directly test this claim. To rigorously show this, the authors would need to show that lawn-exit-promoting neurons (CEPs, HSNs, RIAs, RIDs, SIAs) are dispensable for lawn re-entry behavior and that lawn re-entry promoting neurons (AVK, SIA, AIY, MI) are dispensable for lawn exit behavior in pathogen-exposed animals.

      We agree with the reviewer’s comments that there is insufficient evidence to show a complete decoupling of lawn exit and lawn re-entry. However, we note that our screen results show that only 1 line (dop-2) shows changes in both exit and re-entry dynamics upon neural inhibition (Fig. 2). This seems to suggest that at least some degree of neural control of re-entry is decoupled from exit.

      Please label graph axes with units in Figure 1 - instead of "Exit Rate" make it #exits per worm per hour, and make it more clear that Figures 1C and E have a different kind of assay than Figures 1A, B and D. There should be more consistency between the meaning of "pre/post" and "naive/infected/healthy" - and how many hours constitutes post.

      We have edited Figure 1 and added to its caption to make both points clearer. We have also standardized our language in subsequent figures (such as Figure 5) to reduce ambiguity in pre/post and naïve/infected/healthy.

      Figure 5 - it should be made more clear when the stimulation/inhibition occurred in these experiments and how long they were recorded/analyzed.

      We have added additional details to the figure captions to make it clearer when the data was collected.

      Workspaces and code have been added under a data availability section in the manuscript.

      Reviewer 2:

      However, the paper's main weakness lies in its lack of a detailed mechanism explaining how the delayed reentry process directly influences the actual locomotor output that results in avoidance. The term 'delayed reentry' is used as a dynamic metric for quantifying the screening, yet the causal link between this metric and the mechanistic output remains unclear. Despite this, the study is well-structured, with comprehensive control experiments, and is very well constructed.

      We thank the reviewer for their comments and suggestions. We have added additional data and details to our work to cover these weaknesses, as can be seen in our responses to the suggestions below.

      (1) A key issue in the manuscript is the mechanistic link between the delayed process and locomotor output. AIY is identified as a crucial neuron in this process, but the specifics of how AIY influences this delay are not clear. For instance, does AIY decrease the reversal rate, causing animals to get into long-range search when they leave the bacterial lawn? Is there any relationship between pdf-2 expression and reversal rates? Given that AIY typically promotes long-range motion when activated, the suppression of this function and its implications on motion warrants further clarification.

      We have included additional data to explain how AIY might be able to regulate lawn entry behaviors and have added more to the discussion to explain how neural suppression might lead to changes in the behavior (new figure S7). Both AIY and SIA dynamics have been linked to worm navigation. In previous work (Lee 2019), we have demonstrated that SIA can control locomotory speed. Inhibition of SIA decreases locomotory speed, and as a result may serve to drive the increased latency of re-entry.

      AIY’s role in navigation has been previously established (Zhaoyu 2014), but we have added an additional supplementary figure and edited our discussion to further illustrate this point. As can be seen in the new figure S7, AIY neural activity undergoes a transition after removal from a bacterial lawn, going from low activity to high activity. This activity increase is correlated with a transition from a high-reversal-rate local search state to a long-range search state characterized by longer runs. Inhibition of AIY during this long-range search state increased the reversal rate, resulting in a higher rate of re-orientations. This might serve as part of the mechanistic explanation for AIY’s role in preventing lawn re-entry, as inhibition dramatically increased the rate of re-orientation, preventing worms from making directed runs into the bacterial lawn. However, there is an additional effect of the inhibition of AIY, not seen during food search. Inhibition of AIY in the context of a pathogenic bacterial lawn leads to stalling at the edge. Therefore, AIY could have an additional role in governing the animal’s movement during re-entry, post exposure, upon contact with a pathogenic lawn.

      (2) I recommend including supplementary videos to visually demonstrate the process. These videos might help others identify aspects of the mechanism that are currently missing or unclear in the text.

      (4) The authors mention that the worms "left the lawn," but the images suggest that the worms do not stray far and remain around the perimeter. Providing videos could help clarify this observation and strengthen the argument by visually connecting these points

      Additional supplementary videos (1-3) taken at several stages of lawn evacuation have been added to visually demonstrate the process.

      (3) Regarding the control experiments (Figure 1E-G), the manuscript describes testing animals picked from a PA14-seeded plate and retesting them on different plates. It's crucial to clarify the differences between these plates. Specifically, the region just outside the lawn should be considered, as it is not empty and worms can spread bacteria around. Testing animals on a new plate with a pristine proximity region might introduce variables that affect their behavior.

      We have reworded the paper to make it clearer that these new conditions on a fresh PA14 lawn represent a different type of assay from the lawn evacuation assay. Fresh PA14 plates will indeed have a pristine proximity region compared to plates where the worms have spread the bacteria.

      These experiments were done to test if the evacuation effect is purely due to aversive signals left on the lawn or attractive signals left outside of the lawn. Given that worms are known to be able to leave compounds such as ascarosides to communicate with each other, we wanted to test that this lawn re-entry defect was not simply the result of deposited pheromones. Without any other method to remove such compounds, we relied on using fresh PA14 lawns instead to test this. We have updated the manuscript to clarify this point.

      (5) The manuscript notes that the PA14 strain was grown without shaking. Typically, growing this strain without agitation leads to biofilm formation. Clarifying whether there is a link between biofilm formation and avoidance behavior would add depth to the understanding of the experimental conditions and their impact on the observed behaviors.

      As the reviewer has noted, growth of PA14 without shaking might indeed lead to biofilm formation. This does represent a legitimate concern, as evidence from previous work has suggested that biofilm formation could be linked to pathogen avoidance as worms make use of mechanosensation to avoid pathogenic bacteria (Chang et al. 2011). However, we do not observe substantial biofilm formation in our cultured bacteria, likely because our growth time is too short for appreciable biofilm to form. We also note that our evacuation dynamics appear to be of similar timescale to results reported in previous work that used different growth conditions. As such, we believe that our growth conditions are comparable to those historically used in the lawn evacuation literature.

      Reviewer 3:

      Weaknesses:

      My only concern is that the authors should be more careful about describing their "compressed sensing-based approach". Authors often cite their previous Nature Methods paper, but should explain more because this method is critical for this manuscript. Also, this analysis is based on the hypothesis that only a small number of neurons are responsible for a given behavior. Authors should explain more about how to determine sparsity parameters, for example.

      We have added more details to our paper outlining some of the details involved in our compressed sensing approach. We go into more detail about how we chose sparsity parameters and note that our discovered neurons for re-entry appear to be robust over choice of sparsity parameters. These additional details can be found in both the paper body and the methods section.

      Line 45: This paragraph tries to mention that there should be "small sets of neurons" that can play key roles in integrating previous information to influence subsequent behavior. Is it valid as an assumption in the nervous systems?

      We want to clarify that what is important is not that there are ‘small sets of neurons’, but rather that these key neurons make up a small fraction of the total number of neurons in the nervous system. More precisely, the compressed sensing approach identifies information bottlenecks in the neural circuits, and the assumption is that the number of neurons in these bottlenecks is small. This is the underlying sparsity assumption that allows us to utilize a compressed sensing based approach to identify these neurons. We have reworded this section to make it clear that what is important is not that the total number of neurons is small, but that they must be a small fraction of the total number of neurons in the nervous system.
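
      To make the sparsity assumption concrete, the following toy sketch shows how a few key units can be recovered from fewer measurements than candidates when each measurement perturbs an overlapping group. It is an illustration under stated assumptions, not the authors' pipeline: the expression matrix, the effect sizes, and the function names are hypothetical, the behavioral effect of a line is assumed to be the sum of the effects of the neurons it covers, and an exhaustive search for the smallest exactly fitting support stands in for whatever sparse solver the actual method uses.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly (Gauss-Jordan); None if singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def sparsest_support(expression, effects, max_size):
    """Return the smallest neuron set (and weights) that reproduces `effects` exactly.

    expression[i][j] is 1 if hypothetical line i covers neuron j, else 0.
    effects[i] is the measured behavioral effect of silencing line i.
    """
    n_lines, n_neurons = len(expression), len(expression[0])
    for size in range(1, max_size + 1):
        for support in combinations(range(n_neurons), size):
            cols = [[expression[i][j] for j in support] for i in range(n_lines)]
            # Least-squares weights via the normal equations (C^T C) w = C^T y.
            gram = [[sum(row[a] * row[b] for row in cols) for b in range(size)]
                    for a in range(size)]
            proj = [sum(row[a] * y for row, y in zip(cols, effects))
                    for a in range(size)]
            weights = solve_exact(gram, proj)
            if weights is None:
                continue
            fitted = [sum(c * w for c, w in zip(row, weights)) for row in cols]
            if fitted == list(effects):
                return support, weights
    return None, None


# Four hypothetical lines covering six hypothetical neurons in overlapping groups.
expression = [
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
]
effects = [2, 3, 1, 0]  # generated by neuron 0 (weight 2) and neuron 1 (weight 1)
support, weights = sparsest_support(expression, effects, max_size=2)
print(support, [float(w) for w in weights])  # → (0, 1) [2.0, 1.0]
```

      With only four measurements and six unknowns the system is underdetermined, and it is the restriction to small supports that singles out one answer. Exhaustive search is feasible only for toy sizes, which is why practical sparse recovery relies on convex or greedy solvers instead.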

      Line 125: "These approaches…" Authors repeatedly mentioned this statement to emphasize that their compressed sensing-based approach is the best choice. Are you really sure?

      We agree that there are several approaches that might allow for faster screening of the nervous system. For example, many studies approach the problem by looking at neurons with synapses onto a neuron already known to be implicated in the behavior or find neurons that express a key gene known to regulate the behavior of interest. These approaches utilize prior information to greatly reduce the pool of candidate neurons needed to be screened.

      In the absence of such prior information, we believe that our compressed sensing based approach allows a rapid way to perform an unbiased interrogation of the entire nervous system to identify key neurons at bottlenecks of neural circuits. Once these key neurons are identified, neurons upstream and downstream of these key neurons can be investigated in the future. This approach gives us the added advantage of being able to identify neurons that do not connect to neurons that are already implicated in the behavior, or that don’t have clear genetic signatures in the behavior of interest. Our approach further allows for screening of neurons with no clear single genetic marker without the need to utilize intersectional genetic strategies. We agree that we should not use the phrase “best choice”, which might not be justified. We have reworded these statements, and we believe that compressed sensing based methods provide a complementary approach to those in the literature.

      Line 42: If authors refer to mushroom bodies and human hippocampus in relation to the significance of their work, authors should go back to these references in the Discussion and explain how their work is important.

      We thank the reviewer for this feedback, and we have added to our discussion to expand upon these points.

      Line 151: "the accelerated pathogen avoidance" Accelerated pathogen avoidance does not necessarily indicate the existence of the neural mechanism that inhibits the association of pathogenicity with microbe-specific cues (during early stages: first two hours).

      We agree with the reviewer’s statements that these results alone do not indicate the presence of an early avoidance mechanism. Other evidence for early avoidance mechanisms exists as seen in two choice assay experiments (Zhang 2005), and our results do seem to support this. However, we agree that early neural inhibition is insufficient evidence towards such a mechanism. We have thus removed this statement for accuracy.

    1. eLife Assessment

      This study presents important analyses of the impacts of microexon deletions and loss-of-function in microexon regulators on zebrafish neurite outgrowth and gene expression, as well as adult and larval behavior. While microexons have been mapped in many genes several years ago, information on their functions - in particular with regard to individual gene isoforms - is limited. The authors provide convincing evidence that individual microexon deletions, only in a few cases, produce subtle cellular and behavioural phenotypes, while transcriptomic analysis reveals gene expression alterations that are suggestive of compensatory mechanisms that buffer against microexon disruption.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses / Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

    4. Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      We thank the Reviewer for their positive comments on our manuscript. In the revised version, we will try to improve readability, reduce jargon and avoid overstatements. 

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses / Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      Thank you for bringing up this point. For the tests shown in Figure 5, we indeed decided to match the srrm3 pairs by fish size since we thought this would be more comparable to the other lines both biologically and methodologically (in terms of video tracking, etc.). However, we are confident the results would be very similar if matched by age, since the differences in social interactions between the srrm3 homozygous mutants and their control siblings are very dramatic at any age. For example, this can be appreciated, in line with the Reviewer's suggestion, in Videos S2 and S3, which show groups of five 5 mpf fish that are either srrm3 mutants or controls. It can be observed that the behavior of 5 mpf control fish is very similar to that of 1 mpf fish pairs, with very small interindividual distances. We nonetheless agree that this decision on the experimental design should be clearly stated in the text and figure legend and we will do so in the revised version.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      We thought it made the interpretation of plots easier, but we will change this in the revised version.

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      Yes, they are interchangeable. We will double check the wording to avoid confusion.

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      Apologies for this omission. We will briefly describe the methods; however, please note that the data was obtained from a previous publication (Torres-Mendez et al, 2019), where the detailed methodology is reported.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

      We will add further details to the revised version. With regard to the specific question, we would like to mention that this has not been a very common problem for the time points used in the manuscript (10 hap and 24 hap). At those stages, it was nearly always evident how to track each individual neurite. Dubious cases were simply discarded. Of course, such cases become much more common at later time points (48 and 72 hap), which were not used in this study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

      The Reviewer is absolutely right and we apologize for this omission, since it was not our intention to imply that these social defects should be interpreted simply as autistic-like. It is indeed very likely that the social alterations displayed by the srrm3 mutants are mainly due to their impaired vision. We will add this discussion explicitly in the revised version.

      Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

      Thank you very much for your positive comments on our manuscript.

    1. eLife Assessment

      This important study provides new insights into the plasticity mechanisms underlying the formation of spatial maps in the hippocampus. Supported by a large and comprehensive dataset, the evidence is solid. However, certain aspects of the statistical analysis and data presentation may seem incomplete and warrant improvement. This study will be of interest to neuroscientists focusing on spatial navigation, learning, and memory.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the cellular mechanisms underlying place field formation (PFF) in hippocampal CA1 pyramidal cells by performing in vivo two-photon calcium imaging in head-restrained mice navigating a virtual environment. Specifically, they sought to determine whether BTSP-like (behavioral time scale synaptic plasticity) events, characterized by large calcium transients, are the primary mechanism driving PFFs or if other mechanisms also play a significant role. Through their extensive imaging dataset, the authors found that while BTSP-like events are prevalent, a substantial fraction of new place fields are formed via non-BTSP-like mechanisms. They further observed that large calcium transients, often associated with BTSP-like events, are not sufficient to induce new place fields, indicating the presence of additional regulatory factors (possibly local dendritic spikes).

      Strengths

      The study makes use of a robust and extensive dataset collected from 163 imaging sessions across 45 mice, providing a comprehensive examination of CA1 place-cell activity during navigation in both familiar and novel virtual environments. The use of two-photon calcium imaging allows the authors to observe the detailed dynamics of neuronal activity and calcium transients, offering insights into the differences between BTSP-like and non-BTSP-like PFF events. The study's ability to distinguish between these two mechanisms and analyze their prevalence under different conditions is a key strength, as it provides a nuanced understanding of how place fields are formed and maintained. The paper supports the idea that BTSP is not the only driving force behind PFF, and other mechanisms are likely sufficient to drive PFF, and BTSP events may also be insufficient to drive PFF in some cases. The longer-than-usual virtual track used in the experiment allowed place cells to express multiple place fields, adding a valuable dimension to the dataset that is typically lacking in similar studies. Additionally, the authors took a conservative approach in classifying PFF events, ensuring that their findings were not confounded by noise or ambiguous activity.

      Weaknesses

      Despite the impressive dataset, there are several methodological and interpretational concerns that limit the impact of the findings. Firstly, the virtual environment appears to be poorly enriched, relying mainly on wall patterns for visual cues, which raises questions about the generalizability of the results to more enriched environments. Prior studies have shown that environmental enrichment can significantly influence spatial coding, and it would be important to determine how a more immersive VR environment might alter the observed PFF dynamics. Secondly, the study relies on deconvolution methods in some cases to infer spiking activity from calcium signals without in vivo ground truth validation. This introduces potential inaccuracies, as deconvolution is an estimate rather than a direct measure of spiking, and any conclusions drawn from these inferred signals should be interpreted with caution. Thirdly, the figures would benefit from clearer statistical annotations and visual enhancements. For example, several plots lack indicators of statistical significance, making it difficult for readers to assess the robustness of the findings. Furthermore, the use of bar plots without displaying underlying data distributions obscures variability, which could be better visualized with violin plots or individual data points. The manuscript would also benefit from a more explicit breakdown of the proportion of place fields categorized as BTSP-like versus non-BTSP-like, along with clearer references to figures throughout the results section. Lastly, the authors' interpretation of their data, particularly regarding the sufficiency of large calcium transients for PFF induction, needs to be more cautious. Without direct confirmation that these transients correspond to actual BTSP events (including associated complex spikes and calcium plateau potentials), concluding that BTSP is not necessary or sufficient for PFF formation is speculative.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript aim to investigate the formation of place fields (PFs) in hippocampal CA1 pyramidal cells. They focus on the role of behavioral time scale synaptic plasticity (BTSP), a mechanism proposed to be crucial for the formation of new PFs. Using in vivo two-photon calcium imaging in head-restrained mice navigating virtual environments, they employed a classification method based on calcium activity to categorize place field formation events as BTSP-like or non-BTSP-like, and investigated their properties.

      Strengths:

      A new method to use calcium imaging to separate BTSP and non-BTSP place field formation. This work offers new methods and factual evidence for other researchers in the field.

      The method enabled the authors to reveal that while many PFs are formed by BTSP-like events, a significant number of PFs emerge with calcium dynamics that do not match BTSP characteristics, suggesting a diversity of mechanisms underlying PF formation. The characteristics of place fields under the first two categories are comprehensively described, including aspects such as formation timing, quantity, and width.

      Weaknesses:

      There are some issues about data and statistics that need to be addressed before these research findings can be considered as rigorous conclusions.

      While the authors mentioned 3 features of PF generated by BTSP during calcium imaging in the Introduction, the classification method used features 1 and 2. The confirmation by feature 3 in its current form is important but not strong enough.

      Some key data is missing, such as the excluded PFs, the BTSP/non-BTSP of each animal, etc.

      Impact:

      This work is likely to provide the field with a new method to classify BTSP and non-BTSP place field formation using calcium imaging.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Sumegi et al. use calcium imaging in head-fixed mice to test whether new place fields tend to emerge due to events that resemble behavioral time scale plasticity (BTSP) or other mechanisms. An impressive dataset was amassed (163 sessions from 45 mice with 500-1000 neurons per sample) to study the spontaneous emergence of new place fields in area CA1 that had the signature of BTSP. The authors observed that place fields could emerge due to BTSP and non-BTSP-like mechanisms. Interestingly, when non-BTSP mechanisms seemed to generate a place field, this tended to occur on a trial with a spontaneous reset in neural coding (a remapping event). Novelty seemed to upregulate non-BTSP events relative to BTSP events. Finally, large calcium transients (presumed plateau potentials) were not sufficient to generate a place field.

      Strengths:

      I found this manuscript to be exceptionally well-written, well-powered, and timely given the outstanding debate and confusion surrounding whether all place fields must arise from BTSP events. Working at the same institute, Albert Lee (e.g. Epsztein et al., 2011 - which should be cited) and Jeff Magee (e.g. Bittner et al., 2017) showed contradictory results for how place fields arise. These accounts have not fully been put toe-to-toe and reconciled in the literature. This manuscript addresses this gap and shows that both accounts are correct - place fields can emerge due to a pre-existing map and due to BTSP.

      Weaknesses:

      I find only three significant areas for improvement in the present study:

      First, can it be concluded that non-BTSP events occur exclusively due to a global remapping event, as stated in the manuscript "these PFF surges included a high fraction of both non-BTSP- and BTSP-like PFF events, and were associated with global remapping of the CA1 representation"? Global remapping has a precise definition that involves quantifying the stability of all place fields recorded. Without a color scale bar in Figure 3D (which should be added), we cannot know whether the overall representations were independent before and after the spontaneous reset. It would be good to know if some neurons are able to maintain place coding (more often than expected by chance), suggestive of a partial-remapping phenomenon.

      Second, BTSP has a flip side that involves the weakening of existing place fields when a novel field emerges. Was this observed in the present study? Presumably place fields can disappear due to this bidirectional BTSP or due to global remapping. For a full comparison of the two phenomena, the disappearance of place fields must also be assessed.

      Finally, it would be good to know if place fields differ according to how they are born. For example, are there differences in reliability, width, peak rate, out-of-field firing, etc for those that arise due to BTSP vs non-BTSP.

    1. eLife Assessment

      This cleverly-designed and potentially important work supports our understanding regarding how and whether social behaviours promoting egalitarianism can be learned, even when implementing these norms entails a cost for oneself. However, the evidence supporting the major claims is currently incomplete, with major limitations being the statistical approach, the modelling, and over-interpretation. With a strengthening of the supporting evidence, this work will be of interest to a wide range of fields, including cognitive psychology/neuroscience, neuroeconomics, and social psychology, as well as policy making.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether advantageous and disadvantageous inequality aversion can be vicariously learned and generalized. Using an adapted version of the ultimatum game (UG), in three phases, participants first gave their own preference (baseline phase), then interacted with a "teacher" to learn their preference (learning phase), and finally were tested again on their own (transfer phase). The key measure is whether participants exhibited similar choice preferences (i.e., rejection rate and fairness rating) influenced by the learning phase, by contrasting their transfer phase and baseline phase. Through a series of statistical modeling and computational modeling, the authors reported that both advantageous and disadvantageous inequality aversion can indeed be learned (Study 1), and even be generalised (Study 2).

      Strengths:

      This study is very interesting: it directly adapted the lab's previous work on the observational learning effect on disadvantageous inequality aversion, to test both advantageous and disadvantageous inequality aversion in the current study. Social transmission of action, emotion, and attitude has started to be looked at recently, hence this research is timely. The use of computational modeling is mostly appropriate and motivated. Study 2, which examined the vicarious inequality aversion in conditions where feedback was never provided, is interesting and important to strengthen the reported effects. Both studies have proper justifications to determine the sample size.

      Weaknesses:

      Despite the strengths, a few conceptual aspects and analytical decisions have to be explained, justified, or clarified.

      INTRODUCTION/CONCEPTUALIZATION

      (1) Two terms are used interchangeably in this work, although they should not be: vicarious/observational learning vs preference learning. For vicarious learning, individuals observe others' actions (and optionally also the corresponding consequence resulting directly from their own actions), whereas, for preference learning, individuals predict, or act on behalf of, the others' actions, and then receive feedback if that prediction is correct or not. For the current work, it seems that the experiment is more about preference learning and prediction, and less so about vicarious learning. The intro and setup are heavily framed around vicarious learning, and later the use of vicarious learning and preference learning is rather mixed in the text. I suggest either toning down the focus on vicarious learning, or discussing how the two differ. Some of the references here may be helpful: Charpentier et al., Neuron, 2020; Olsson et al., Nature Reviews Neuroscience, 2020; Zhang & Glascher, Science Advances, 2020

      EXPERIMENTAL DESIGN

      (2) For each offer type, the experiment "added a uniformly distributed noise in the range of (-10, 10)". I wonder what this looks like? With only integers such as 25:75, or even with decimal points? More importantly, is it possible to have either 70:30 or 90:10 option, after adding the noise, to have generated an 80:20 split shown to the participants? If so, for the analyses later, when participants saw the 80:20 split, which condition did this trial belong to? 70:30 or 90:10? And is such noise added only to the learning phase, or also to the baseline/transfer phases? This requires some clarification.

      (3) For the offer conditions (90:10, 70:30, 50:50, 30:70, 10:90) - are they randomized? If so, how is it done? Is it randomized within each participant, and/or also across participants (such that each participant experienced different trial sequences)? This is important, as the order especially for the learning phase can largely impact the preference learning of the participants.

      STATISTICAL ANALYSIS & COMPUTATIONAL MODELING

      (4) In Study 1 DI offer types (90:10, 70:30), the rejection rate for DI-AI averse looks consistently higher than that for DI averse (ie, the blue line is above the yellow line). Is this significant? If so, how come? Since this is a between-subject design, I would not anticipate such a result (especially for the baseline). Also, for the LME results (eg, Table S3), only interactions were reported but not the main results.

      (5) I do not particularly find this analysis appealing: "we examined whether participants' changes in rejection rates between Transfer and Baseline, could be explained by the degree to which they vicariously learned, defined as the change in punishment rates between the first and last 5 trials of the Learning phase." Naturally, the participants' behavior in the first 5 trials in the learning phase will be similar to those in the baseline; and their behavior in the last 5 trials in the learning phase would echo those at the transfer phase. I think it would be stronger to link the preference learning results to the change between the baseline and transfer phase, eg, by looking at the difference between alpha (beta) at the end of the learning phase and the initial alpha (beta).

      (6) I wonder if data from the baseline and transfer phases can also be modeled, using a simple Fehr-Schmidt model. This way, the change in alpha/beta can also be examined between the baseline and transfer phase.

      (7) I quite liked Study 2, which tests the generalization effect, and I expected to see an adapted computational model that directly reflects this idea. Indeed, the authors wrote, "[...] given that this model [...] assumes the sort of generalization of preferences between offer types [...]". But where exactly did the preference learning model assume the generalization? In the methods, the modeling seems to be only about Study 1; did the authors adapt their model to accommodate Study 2? The authors also ran simulations for the learning phase in Study 2 (Figure 6); how did the preferences update (if at all) for offers (90:10 and 10:90) where feedback was not given? Extending/Unpacking the computational modeling results for Study 2 will be very helpful for the paper.
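
      As a rough sketch of what point (6) could involve, the code below implements the standard Fehr-Schmidt inequity-aversion utility for a responder, together with a logistic (softmax) choice rule for rejection. Here alpha weights disadvantageous inequity (envy) and beta weights advantageous inequity (guilt); rejection is assumed to leave both players with nothing. The function names and the inverse-temperature parameter are illustrative assumptions, not the authors' implementation.

```python
import math


def fehr_schmidt_utility(own, other, alpha, beta):
    """Responder's utility for accepting a split (own payoff, other's payoff)."""
    envy = max(other - own, 0.0)   # disadvantageous inequity
    guilt = max(own - other, 0.0)  # advantageous inequity
    return own - alpha * envy - beta * guilt


def rejection_probability(own, other, alpha, beta, inverse_temperature=0.1):
    """Softmax probability of rejecting, assuming rejection yields utility 0."""
    u_accept = fehr_schmidt_utility(own, other, alpha, beta)
    return 1.0 / (1.0 + math.exp(inverse_temperature * u_accept))


# Fair split: utility equals the payoff, so rejection is unlikely.
print(fehr_schmidt_utility(50, 50, alpha=0.5, beta=0.5))  # 50.0
# Disadvantageous split (responder gets 10 of 100): 10 - 0.5 * 80 = -30.
print(fehr_schmidt_utility(10, 90, alpha=0.5, beta=0.5))  # -30.0
```

      Fitting alpha and beta separately to the baseline and transfer choices (for example, by maximum likelihood) would then give the between-phase change in each parameter.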

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates whether individuals can learn to adopt egalitarian norms that incur a personal monetary cost, such as rejecting offers that benefit them more than the giver (advantageous inequitable offers). While these behaviors are uncommon, two experiments demonstrate that individuals can learn to reject such offers through vicarious learning - by observing and acting in line with a "teacher" who follows these norms. The authors use computational modelling to argue that learners adopt these norms through a sophisticated process, inferring the latent structure of the teacher's preferences, akin to theory of mind.

      Strengths:

      This paper is well-written and tackles a critical topic relevant to social norms, morality, and justice. The findings, which show that individuals can adopt just and fair norms even at a personal cost, are promising. The study is well-situated in the literature, with clever experimental design and a computational approach that may offer insights into latent cognitive processes. Findings have potential implications for policymakers.

      Weaknesses:

      Note: in the text below, the "teacher" will refer to the agent from which a participant presumably receives feedback during the learning phase.

      (1) Focus on Disadvantageous Inequity (DI): A significant portion of the paper focuses on responses to Disadvantageous Inequitable (DI) offers, which is confusing given the study's primary aim is to examine learning in response to Advantageous Inequitable (AI) offers. The inclusion of DI offers is not well-justified and distracts from the main focus. Furthermore, the experimental design seems, in principle, inadequate to test for the learning effects of DI offers. Because both teaching regimes considered were identical for DI offers the paradigm lacks a control condition to test for learning effects related to these offers. I can't see how an increase in rejection of DI offers (e.g., between baseline and generalization) can be interpreted as speaking to learning. There are various other potential reasons for an increase in rejection of DI offers even if individuals learn nothing from learning (e.g. if envy builds up during the experiment as one encounters more instances of disadvantageous fairness).

      (2) Statistical Analysis: The analysis of the learning effects of AI offers is not fully convincing. The authors analyse changes in rejection rates within each learning condition rather than directly comparing the two. Finding a significant effect in one condition but not the other does not demonstrate that the learning regime is driving the effect. A direct comparison between conditions is necessary for establishing that there is a causal role for the learning regime.

      (3) Correlation Between Learning and Contagion Effects: The authors argue that correlations between learning effects (changes in rejection rates during the learning phase) and contagion effects (changes between the generalization and baseline phases) support the idea that individuals who better align their preferences with the teacher also give more consideration to the teacher's preferences later, during the generalization phase. This interpretation is not convincing. Such correlations could emerge even in the absence of learning, driven by temporal trends like increasing guilt or envy (or even by slow temporal fluctuations in these processes) on behalf of self or others. The reason is that the baseline phase is temporally closer to the beginning of the learning phase whereas the generalization phase is temporally closer to the end of the learning phase. Additionally, the interpretation of these effects seems flawed, as changes in rejection rates do not necessarily indicate closer alignment with the teacher's preferences. For example, if the teacher rejects an offer 75% of the time, then a positive 5% learning effect may imply better matching the teacher if it reflects an increase in rejection rate from 65% to 70%, but it implies divergence from the teacher if it reflects an increase from 85% to 90%. For similar reasons, it is not clear that the contagion effects reflect how much a teacher's preferences are taken into account during generalization.

      (4) Modeling Efforts: The modelling approach is underdeveloped. The identification of the "best model" lacks transparency, as no model-recovery results are provided, and fits for the losing models are not shown, leaving readers in the dark about where these models fail. Moreover, the reinforcement learning (RL) models used are overly simplistic, treating actions as independent when they are likely inversely related (for example, the feedback that the teacher would have rejected an offer provides feedback that rejection is "correct" but also that acceptance is "an error", and the latter is not incorporated into the modelling). It is unclear if and to what extent this limits current RL formulations. There are also potentially important missing details about the models. Can the authors justify/explain the reasoning behind including the variants they consider? What are the initial Q-values? If these are not free parameters, what are their values?

      (5) Conceptual Leap in Modeling Interpretation: The distinction between simple RL models and preference-inference models seems to hinge on the ability to generalize learning from one offer to another. Whereas in the RL models learning occurs independently for each offer (hence no cross-offer generalization), preference inference allows for generalization between different offers. However, the paper does not explore RL models that allow generalization based on the similarity of features of the offers (e.g., payment for the receiver, payment for the offer-giver, who benefits more). Such models are more parsimonious and could explain the results without invoking a theory of mind or any modelling of the teacher. In such model versions, a learner learns a functional form that allows it to predict the teacher's feedback based on said offer features (e.g., linear or quadratic form). Because feedback for an offer modulates the parameters of this function (feature weights), generalization occurs without necessarily evoking any sophisticated model of the other person. This leaves open the possibility that RL models could perform just as well or even show superiority over the preference learning model, casting doubt on the authors' conclusions. Of note: even the behaviourists knew that as Little Albert was taught to fear rats, this fear generalized to rabbits. This could occur simply because rabbits are somewhat similar to rats. But this doesn't mean Little Albert had a sophisticated model of animals he used to infer how they behave.

      (6) Limitations of the Preference-Inference Model: The preference-inference model struggles to capture key aspects of the data, such as the increase in rejection rates for 70:30 DI offers during the learning phase (e.g. Figure 3A, AI+DI blue group). This is puzzling.

      Thinking about this, I realized the model makes quite strong, unintuitive predictions that are not examined. For example, if a subject begins the learning phase rejecting the 70:30 offer more than 50% of the time (meaning the starting guilt parameter is higher than 1.5), then over learning the tendency to reject will decrease to below 50% (the guilt parameter will be pulled down below 1.5). This is despite the fact that the teacher rejects 75% of the offers. In other words, as learning continues, learners will diverge from the teacher. On the other hand, if a participant begins learning with a tendency to accept this offer (guilt < 1.5), then during learning they can increase their rejection rate but never above 50%. Thus one can never fully converge on the teacher. I think this relates to the model's failure in accounting for the pattern mentioned above. I wonder if individuals actually abide by these strict predictions. In any case, these issues raise questions about the validity of the model as a representation of how individuals learn to align with a teacher's preferences (given that the model doesn't really allow for such an alignment).
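
      The distinction drawn in point (3), and again in the predictions discussed under point (6), is between a change in rejection rate and a change in distance from the teacher. A minimal sketch, using the illustrative numbers from point (3) (a teacher who rejects 75% of the time); the function name is hypothetical:

```python
def alignment_gain(rate_before, rate_after, teacher_rate):
    """Reduction in absolute distance to the teacher's rejection rate.

    Positive: the learner moved toward the teacher; negative: moved away.
    """
    return abs(rate_before - teacher_rate) - abs(rate_after - teacher_rate)


teacher = 0.75
# Same +5-point "learning effect", opposite implications for alignment.
toward = alignment_gain(0.65, 0.70, teacher)  # about +0.05
away = alignment_gain(0.85, 0.90, teacher)    # about -0.05
print(toward > 0, away < 0)  # True True
```

      A measure of this kind, rather than the raw change in rejection rate, is what would be needed to speak of alignment with the teacher.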

    1. eLife Assessment

      This paper shows convincingly that the human visual system can recalibrate itself to compensate for phase alterations in an image induced by optical blur. This phenomenon is studied using state-of-the-art adaptive optics approaches that allow the manipulation of the eye's optics while making concurrent psychophysical measurements. The findings are broadly important because they highlight a neural mechanism by which flawed information is used to create seemingly accurate perceptions of the visual environment.

    2. Reviewer #1 (Public review):

      Summary:

      Optical blur is characterized by contrast losses and phase shifts that alter the local relationship between the component spatial frequencies in the image. The eye experiences optical blur on several occasions - for instance, physiologically, when the focus state of the eye does not match the optical vergence demand and, in cases of pathologies like keratoconus where the cornea gets progressively distorted leading to degraded retinal image quality. Recalibration of the visual system to suprathreshold contrast losses arising from the optical blur and the mechanisms that may underlie such a recalibration have been well-researched. This study by Barbot et al presents convincing evidence that the visual system could also recalibrate itself to the phase distortions experienced with optical blur. This was demonstrated, in principle, on a small number of participants with normal vision but with induced blur (?? experienced psychophysical observers) and in a few keratoconic patients using their state-of-the-art adaptive optics apparatus. In the former cohort, known magnitudes of radially asymmetric blur from a vertical coma were induced while participants judged the position of a compound grating target that shifted in predictable ways with the induction of blur. Immediate exposure to images blurred with such higher-order aberrations resulted in position shifts that were consistent with optical theory, but prolonged exposure to such blur resulted in the position shift returning to veridical perception (albeit, not completely). When the blur was removed after the adaptation phase, after effects of the position offset were noticed. In the keratoconic cohort, such position offsets were observed even when the eye was completely corrected for optical degradation. These results are discussed in the context of the perception of real-world targets, the underlying neurophysiology, and what it means to space perception in disease conditions like keratoconus.

      Strengths:

      A clear hypothesis, a parameterized experimental space, rigor of optical correction and psychophysical judgements, and clarity in the explanation of results are the major strengths of the paper. Additional strengths include the control experiments to address confounders and the additional analyses shown in the supplementary section to rule out analytical inconsistencies in explaining the results.

      Weaknesses:

      The small sample size (especially in the keratoconic cohort) may be a limitation of the study. While the experiments conducted in this study are meant to demonstrate a basic visual phenomenon, that only 6 keratoconic patients were included in the study precludes the results from being extrapolated to the heterogeneity of disease presentation. It must, however, be noted that these are difficult experiments to conduct, and getting multiple participants to agree to such an experiment is not an easy task.

      Second, the analysis shown in Figure 6C relating the magnitude of habitual higher-order RMS to the absolute PSE shift is not convincing. The PSEs were both positive and negative in the KC patients. The direction of the phase shift experienced by the patient (i.e., positive or negative shift in the PSE) should also be determined by the pattern of HOAs in their eyes. Simply comparing the absolute magnitudes does not make sense. Would it be possible to convolve the compound grating with the PSF obtained from each patient and predict in which direction the PSE should shift? This prediction can then be compared with the observed shift in the PSEs.
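
      One way to sketch the suggested prediction: by the convolution theorem, convolving a grating with a PSF shifts each sinusoidal component by an amount set by the phase of the PSF's Fourier transform at that component's spatial frequency. The example below computes this for a one-dimensional, discretely sampled PSF and reports the displacement of the 3f component relative to the f component, whose sign gives the predicted direction. The PSF values, sampling positions, and function names are made-up placeholders; a real analysis would use each patient's measured two-dimensional PSF.

```python
import cmath
import math


def component_shift(psf_weights, positions, frequency):
    """Spatial shift imposed on a grating of the given frequency by the PSF.

    positions are in degrees and frequency in cycles/degree; a positive value
    means the component is displaced toward increasing position.
    """
    transfer = sum(w * cmath.exp(-2j * math.pi * frequency * x)
                   for w, x in zip(psf_weights, positions))
    return -cmath.phase(transfer) / (2 * math.pi * frequency)


def relative_shift(psf_weights, positions, fundamental):
    """Displacement of the 3f component relative to the f component."""
    return (component_shift(psf_weights, positions, 3 * fundamental)
            - component_shift(psf_weights, positions, fundamental))


positions = [-0.10, -0.05, 0.0, 0.05, 0.10]   # degrees (hypothetical sampling)
symmetric_psf = [0.05, 0.20, 0.50, 0.20, 0.05]
comet_psf = [0.0, 0.0, 0.60, 0.30, 0.10]      # one-sided tail, coma-like

print(abs(relative_shift(symmetric_psf, positions, 1.0)) < 1e-9)  # True
print(relative_shift(comet_psf, positions, 1.0))  # non-zero; sign = direction
```

      A symmetric PSF leaves the relative phase untouched, whereas the one-sided PSF displaces the two components by different amounts; the sign of that difference is the quantity that could be compared with each patient's observed PSE.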

      A third weakness of the study may be the assumption that the phase recalibration in the keratoconic cohort may be eye-specific. That is, if the participant has dissimilar severities of keratoconus, the probed eye's aberration profile may determine the phase profile that the eye is calibrated to. I am not sure to what extent this assumption is valid. Further, under natural viewing, the pupil size will change with light intensity and accommodative state and this will, in turn, determine the optical quality of the eye. Given this, it is not clear what the visual system will recalibrate itself to when the phase shifts in the retinal image keep changing with the underlying blur profile on the retina. Also, if the disease is progressive in nature (in their cohort, the authors indicate that the disease did not progress), the calibration state should also constantly change. What the time scale of such a calibration is, and whether there could be multiple states of such adaptation, remains to be explored. This, of course, is not a weakness of the present study, but an open question for the future.

      Finally, one additional experiment could have been performed (this is good-to-have information and certainly not a necessity). What if the wavefront profiles of a few keratoconic patients who participated in the study were used as the adaptation profile in the 2nd experiment (as opposed to a fixed level of coma)? Would a 60-min paradigm produce adapted states with phase shifts matching what is experienced by keratoconic eyes (see Marella et al., Vis Res, 2024 for a similar induced experiment for studying the impact of phase shifts on visual and stereoacuities)?

    3. Reviewer #2 (Public review):

      Summary:

      The authors examine the ability of the human visual system to adapt to optically induced phase shifts. The study shows clear adaptation to the relative phase created by exposure to vertical coma. The study assesses the impact of adaptation to the coma on the perceived relative phase of f and 3f compound gratings. It is observed that during the first couple of minutes of a 1-hour exposure to induced vertical coma, the apparent relative locations of the f and 3f shifted in the opposite direction to that induced by the coma, a classic adaptation effect. This result highlights a neural mechanism by which flawed information is used to create seemingly accurate perceptions of the visual environment.

      Strengths:

      Sophisticated and rigorous optical and psychophysical methods, and a clear research question. The manuscript is well-written and the data quality is very high. The authors are to be congratulated on this challenging and complex optics and psychophysics study.

      Weaknesses:

      Some more details on the phase and amplitude consequences of the induced coma would add value to the reader.

    1. eLife Assessment

      The study presents some useful findings on Mendelian randomization-phenome-wide association, with BMI associated with health outcomes, and there is a focus on sex differences. Although there are some solid phenotype and genotype data, some of the data are incomplete and could be better presented, perhaps benefiting from more rigorous approaches. Confirmation and further assessment of the observed sex differences will add further value.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses information from the UK Biobank and aims to investigate the role of BMI on various health outcomes, with a focus on differences by sex. They confirm the relevance of many of the well-known associations between BMI and health outcomes for males and females and suggest that associations for some endpoints may differ by sex. Overall their conclusions appear supported by the data. The significance of the observed sex variations will require confirmation and further assessment.

      Strengths:

      This is one of the first systematic evaluations of sex differences between BMI and health outcomes.

      The hypothesis that BMI may be associated with health differentially based on sex is relevant and even expected. As muscle is heavier than adipose tissue, and as men typically have more muscle than women, as a body composition measure BMI is sometimes prone to classifying even normal weight/muscular men as obese, while this measure is more lenient when used in women.

      Confirmation of the many well-known associations is as expected and attests to the validity of their approach.

      Demonstration of the possible sex differences is interesting, with this work raising the need for further study.

      Weaknesses:

      Many of the statistical decisions appeared to target power at the expense of quality/accuracy. For example, they chose to use self-reported information rather than doctor diagnoses for disease outcomes for which both types of data were available.

      Despite the known problems and bias arising from a one-sample approach, they chose to use instruments from the UK Biobank instead of those available from the independent GIANT GWAS, even though the sample size is only marginally greater for UKB in this context. With the way the data is presented, it is difficult to assess the extent to which results are compatible across approaches.

      The approach to multiple testing correction appears very lenient, although the lack of accuracy in the reporting makes it difficult to know what was done exactly. The way it reads, FDR correction was done separately for men, and then for women (assuming that the duplication in tests following stratification does not affect the number of tests). In the second stage, they compared differences by sex using Z-test, apparently without accounting for multiple testing.
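
      To make the concern concrete, the sketch below shows one stricter alternative: a two-sided Z-test for the difference between the male and female estimates of each outcome, followed by a Benjamini-Hochberg adjustment across all outcomes tested. This is a generic illustration, not a reconstruction of the authors' pipeline; the function names and the example numbers are hypothetical.

```python
import math


def sex_difference_p_value(beta_m, se_m, beta_f, se_f):
    """Two-sided Z-test p-value for beta_m - beta_f (independent samples)."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (beta_m, se_m, beta_f, se_f) values for three outcomes.
estimates = [(0.30, 0.05, 0.10, 0.05),
             (0.20, 0.04, 0.19, 0.04),
             (0.05, 0.03, 0.12, 0.03)]
raw = [sex_difference_p_value(*e) for e in estimates]
print(benjamini_hochberg(raw))
```

      Reporting the adjusted values for the sex-difference tests alongside the stratified results would make it clear which differences survive correction.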

      Presentation lacks accuracy in a few places, hence assessment of the accuracy of the statements made by the authors is difficult.

      Conclusion "These findings highlight the importance of retaining a healthy BMI" is rather uninformative, especially as they claim that for some attributes the effects of BMI may be opposite depending on sex/gender.

    3. Reviewer #2 (Public review):

      Summary:

      In this present Mendelian randomization-phenome-wide association study, the authors found BMI to be positively associated with many health-related conditions, such as heart disease, heart failure, and hypertensive heart disease. They also found sex differences in some traits such as cancer, psychological disorders, and ApoB.

      Strengths:

      The use of the UK-biobank study with detailed phenotype and genotype information.

      Weaknesses:

      Previous studies have performed this analysis using the same cohort, with in-depth analysis. See this paper: Searching for the causal effects of body mass index in over 300,000 participants in UK Biobank, using Mendelian randomization. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007951

      I believe that the authors' claim, "To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes," is not well supported. They have not cited a relevant paper that conducted both overall and sex-stratified PheWAS using UK Biobank data with a detailed analysis. Given the prior study linked above, I am uncertain about the additional contributions of the present research.

    1. eLife Assessment

      This important study explores the power of computational methods to predict lifespan-extending small molecules, demonstrating that while these methods significantly increase hit rates, experimental validation remains essential. The study uses all-trans retinoic acid in Caenorhabditis elegans as a model, providing genetic and transcriptomic insights into its longevity effects. The data are compelling in describing a robust, computationally informed screening process for discovering compounds that extend lifespan in this species.

    2. Reviewer #1 (Public review):

      Summary:

      This study highlights the strengths of using predictive computational models to inform C. elegans screening studies of compounds' effects on aging and lifespan. The authors primarily focus on all-trans retinoic acid (atRA), one of the 5 compounds (out of 16 tested) that extended C. elegans lifespan in their experiments. They show that atRA has positive effects on C. elegans lifespan and age-related health, while it has more modest and inconsistent effects (i.e., some detrimental impacts) for C. briggsae and C. tropicalis. In genetic experiments designed to evaluate contributing mediators of lifespan extension with atRA exposure, it was found that 150 µM of atRA did not significantly extend lifespan in akt-1 or akt-2 loss-of-function mutants, nor in animals with loss of function of aak-2, or skn-1 (in which atRA had toxic effects); these genes appear to be required for atRA-mediated lifespan extension. hsf-1 and daf-16 loss-of-function mutants both had a modest but statistically significant lifespan extension with 150 µM of atRA, suggesting that these transcription factors may contribute towards mediating atRA lifespan extension, but that they are not individually required for some lifespan extension. RNAseq assessment of transcriptional changes in day 4 atRA-treated adult wild-type worms revealed some interesting observations. Consistent with the study's genetic mutant lifespan observations, many of the atRA-regulated genes with the greatest fold-change differences are known regulated targets of daf-2 and/or skn-1 signaling pathways in C. elegans. hsf-1 loss-of-function mutants show a shifted atRA transcriptional response, revealing a dependence on hsf-1 for ~60% of the atRA-downregulated genes. On the other hand, RNAseq analysis in aak-2 loss-of-function mutants revealed that aak-2 is only required for less than a quarter of the atRA transcriptional response. All together, this study is proof of the concept that computational models can help optimize C. elegans screening approaches that test compounds' effects on lifespan, and provide comprehensive transcriptomic and genetic insights into the lifespan-extending effects of all-trans retinoic acid (atRA).

      Strengths:

      (1) A clearly described and well-justified account describes the approach used to prioritize and select compounds for screening, based on using the top candidates from a published list of computationally ranked compounds (Fuentealba et al., 2019) that were cross-referenced with other bioinformatics publications to predict anti-aging compounds, after de-selecting compounds previously evaluated in C. elegans as per the DrugAge database. 16 compounds were tested at 4-5 different concentrations to evaluate effects on C. elegans lifespan.

      (2) Robust experimental design was undertaken evaluating the lifespan effects of atRA, as it was tested on three strains each of C. elegans, C. briggsae, and C. tropicalis, with trial replication performed at three distinct laboratories. These observations extended beyond lifespan to include evaluations of health metrics related to swimming performance.

      (3) In-depth analyses of the RNAseq data of whole-worm transcriptional responses to atRA revealed interesting insights into regulator pathways and novel groups of genes that may be involved in mediating lifespan-extension effects (e.g., atRA-induced upregulation of sphingolipid metabolism genes, atRA-upregulation of genes in a poorly-characterized family of C. elegans paralogs predicted to have kinase-like activity, and disproportionate downregulation of collagen genes with atRA).

      Weaknesses:

      (1) The authors' computational-based compound screening approach led to a ~30% prediction success rate for compounds that could extend the median lifespan of C. elegans. However, follow-up experiments on the top compounds highlighted the fact that some of these observed "successes" could be driven by indirect, confounding effects of these compounds on the bacterial food source, rather than direct beneficial effects on C. elegans physiology and lifespan. For instance, this appeared to be the case for the "top" hit of propranolol; other compounds were not tested with metabolically inert or killed bacteria. In addition, there are no comparative metrics provided to compare this study's ~30% success rate to screening approaches that do not use computational predictions.

      (2) Transcriptomic analyses of atRA effects were extensive in this study, but evaluations and discussions of non-transcriptional effects of key proposed regulators (such as AMPK) were limited. For instance, non-transcriptional effects of aak-2/AMPK might account for its requirement for mediating lifespan extension effects, since aak-2 was not required for a major proportion of atRA transcriptional responses.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Banse et al. experimentally validate the power of computational approaches that predict anti-aging molecules using the multi-species approach of the Caenorhabditis Intervention Testing Program (CITP). Filtering candidate molecules based on transcriptional profiles, ML models, literature searches, and the DrugAge database, they selected 16 compounds for testing. Of those, eight did not affect C. elegans lifespan, three shortened it, and five extended C. elegans lifespan, resulting in a hit rate of over 30%. Of those five, they then focused on all-trans-retinoic acid (atRA), a compound that has previously resulted in contradictory effects. The lifespan-extending effect of atRA was consistent in all C. elegans strains tested, was absent in C. briggsae, and a small effect was observed in some C. tropicalis strains. Similar results were obtained for measures of healthspan. The authors then investigated the mechanism of action of atRA and showed that it was only partially dependent on daf-16 but required akt-1, akt-2, skn-1, hsf-1, and, to some degree, pmk-1. The authors further investigate the downstream effects of atRA exposure by conducting RNAseq experiments in both wild-type and mutant animals to show that some, but surprisingly few, of the gene expression changes that are observed in wild-type animals are lost in the hsf-1 and aak-2 mutants.
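
      For reference, the quoted hit rate follows directly from the counts given above (5 lifespan-extending compounds out of 16 tested). The sketch below also attaches a 95% Wilson score interval, which is an addition for illustration only and not an analysis reported in the manuscript; it simply shows how wide the uncertainty is with 16 compounds.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2))
    return centre - half_width, centre + half_width


hits, tested = 5, 16
print(f"hit rate = {hits / tested:.2%}")  # hit rate = 31.25%
low, high = wilson_interval(hits, tested)
print(f"95% Wilson interval: {low:.2f} to {high:.2f}")
```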

      Strengths:

      Overall, this study is well conceived and executed as it investigates the effect of atRA across different concentrations, strains, and species, including life and health span. Revealing the variability between sites, assays, and the method used is a powerful aspect of this study. It will do a lot to dispel the nonsensical illusion that we can determine a percent increase in lifespan to the precision of two floating point numbers.

      An interesting and potentially important implication arises from this study. The computational selection of compounds was agnostic regarding strain or species differences and was predominantly based on observations made in mammalian systems. The hit rate calculated is based on the results in C. elegans and not on the molecules' effectiveness in C. briggsae or C. tropicalis. If it were, the hit rate would be much lower. How is that? It would suggest that ML models and transcriptional data obtained from mammals have a higher predictive value for C. elegans than for the other two species. This selectivity for C. elegans over C. tropicalis and C. briggsae seems both puzzling and unexpected. The predictions for longevity were based on the transcriptional data in cell lines. Would it be feasible to compare the mammalian data to the transcriptional data in Figure 5 and see how well they match? While this is clearly beyond the focus of this study, an implied prediction is that running RNAseq for all these strains exposed to atRA would reveal that the transcriptional changes observed in the strains where it extends lifespan the most should match the mammalian data best. Otherwise, how could the mammalian datasets be more predictive for one species (C. elegans) than for the others (C. briggsae or C. tropicalis)? There are a lot of IFs in this prediction, but such an experiment would reconsider and validate the basis on which the original predictions were made.

      Weaknesses:

      Many of the most upregulated genes, such as cyps and pgps, are xenobiotic response genes upregulated in many transcriptional datasets from C. elegans drug studies. Their expression might be necessary to deal with atRA breakdown metabolites to prevent toxicity rather than to confer longevity. Because atRA is very light-sensitive, the toxicity of its breakdown metabolites may explain some of the differences observed between the Lifespan Machine results and standard assay practices.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Banse et al., demonstrate that combining computer prediction with genetic analysis in distinct Caenorhabditis species can streamline the discovery of aging interventions by taking advantage of the diverse pool of compounds that are currently available. They demonstrate that through careful prioritization of candidate compounds, they are able to accomplish a 30% positive hit rate for interventions that produce significant lifespan extensions. Within the positive hits, they focus on all-trans retinoic acid (atRA) and discover that it modulates lifespan through conserved longevity pathways such as AKT-1 and AKT-2 (and other conserved Akt-targets such as Nrf2/SKN-1 and HSF1/HSF-1) as well as through AAK-2, a conserved catalytic subunit of AMPK. To better understand the genetic mechanisms behind lifespan extension upon atRA treatment, the authors perform RNAseq experiments using a variety of genetic backgrounds for cross-comparison and validation. Using this current state-of-the-art approach for studying gene expression, the authors determine that atRA treatment produces gene expression changes across a broad set of stress-response and longevity-related pathways. Overall, this study is important since it highlights the potential of combining traditional genetic analysis in the genetically tractable organism C. elegans with computational methods that will become even more powerful with the swift advancements being made in artificial intelligence. The study possesses both theoretical and practical implications not only in the field of aging but also in related fields such as health and disease. Most of the claims in this study are supported by solid evidence, but the conclusions can be refined with a small set of additional experiments or re-analysis of data.

      Strengths:

      (1) The criteria for prioritizing compounds for screening are well-defined and easy to replicate (Figure 1), even for scientists with limited experience in computational biology. The approach is also adaptable to other systems or model organisms.

      (2) I commend the researchers for doing follow-up experiments with the compound propranolol to verify its effect on lifespan (Figure 2 Supplement 2), given the observation that it affected the growth of OP50. To prevent false hits in the future, the reviewer recommends the use of inactivated OP50 for future experiments to remove this confounding variable.

      (3) The sources of variation (Figure 3, Figure Supplement 2) are taken into account and demonstrate the need for advancing our understanding of the lifespan phenotype due to inter-individual variation.

      (4) Including the C. elegans swim test alongside the lifespan assays provides further evidence of atRA-induced improvement in longevity.

      (5) The RNAseq approach was performed in a variety of genetic backgrounds, which allowed the authors to determine the relationship between AAK-2 and HSF-1 regulation of the retinoic acid pathway in C. elegans, specifically, that the former functions downstream of the latter.

      Weaknesses:

      (1) The filtering of compounds for testing using the DrugAge database requires that the database is consistently updated. In this particular case, even though atRA does not appear in the database, the authors themselves cite literature that has already demonstrated atRA-induced lifespan extension, which should have precluded this compound from the analysis in the first place.

      (2) The threshold for determining positive hits is arbitrary, and in this case, a 30% positive hit rate was observed when the threshold is set to a lifespan extension of around 5% based on Figure 1B (the authors fail to explicitly state the cut-off for what is considered a positive hit).

      (3) The authors demonstrate that atRA extends lifespan in a species-specific manner (Figure 3). Specifically, this extension occurs only in C. elegans, yet the title implies that atRA-induced lifespan extension occurs in different Caenorhabditis species, which is clearly not the case. While the authors state that failure to observe phenotypes in C. briggsae and C. tropicalis is a common feature of CITP tests, they do not speculate as to why this phenomenon occurs.

      (4) There are discrepancies between the lifespan curves by hand (Figure 3 Figure Supplement 1) and using the automated lifespan machine (Figure 3 Supplement 3). Specifically, in the automated lifespan assays, there are drastic changes in the slope of the survival curve which do not occur in the manual assays. This may be due to improper filtering of non-worm objects, improper annotation of death times, or improper distribution of plates in each scanner.

      (5) The authors miss an opportunity to determine whether the lifespan extension phenotype attributed to the retinoic acid pathway is mostly transcriptional in nature or whether some of it is post-transcriptional. The authors even state "that while aak-2 is absolutely required for the longevity effects of atRA, aak-2 is required only for a small proportion (~1/4) of the transcriptional response", suggesting that some of the effects are post-transcriptional. Further information could have been obtained had the authors also performed RNAseq analysis on the tol-1 mutant which exhibited an enhanced response to atRA compared to wild-type animals, and comparing the magnitude of gene expression changes between the tol-1 mutant and all other genetic backgrounds for which RNAseq was performed.

    1. eLife Assessment

      This study reveals that female moths utilize ultrasonic sounds emitted by dehydrated plants to inform their oviposition decisions, highlighting sound as a potential sensory cue for optimal host plant selection. By investigating this novel acoustic interaction, the research adds an important piece to our understanding of plant-insect interactions. While the authors employed an overall solid experimental approach, weaknesses include the lack of raw data and individual data point visualization, inconsistencies in moth responses to sound cues with and without plants, and the use of a click frequency higher than what plants typically produce, which may limit the ecological applicability and broader generalization of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced, making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena. Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males, which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour of the moths in the arena without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      Strengths:

      This manuscript is very interesting to read and the possibility that female moths might use sound as an additional sensory modality during host-searching is exciting and very relevant to the field of insect-plant interactions.

      Weaknesses:

      The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is, as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations, e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the number of eggs and the place where the eggs are laid by a female moth surely depend on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest analysing the sum of eggs laid over the different consecutive nights rather than taking the eggs laid in each night as an independent data point.
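      The aggregation suggested above can be sketched as follows. This is only a minimal illustration of summing nightly counts so that each group of moths contributes one data point per arena side; the group names, the side labels and all egg counts are made-up placeholders, not data from the study, and the helper name `totals_per_group` is hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (group_id, night, arena_side, eggs_laid).
# These values are placeholders for illustration only, not study data.
records = [
    ("group_A", 1, "sound", 3), ("group_A", 1, "silent", 1),
    ("group_A", 2, "sound", 2), ("group_A", 2, "silent", 2),
    ("group_B", 1, "sound", 0), ("group_B", 1, "silent", 4),
    ("group_B", 2, "sound", 1), ("group_B", 2, "silent", 1),
]


def totals_per_group(rows):
    """Sum eggs over all nights so that each group yields one total per side."""
    totals = defaultdict(lambda: defaultdict(int))
    for group, _night, side, eggs in rows:
        totals[group][side] += eggs
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(sides) for group, sides in totals.items()}


per_group = totals_per_group(records)
for group, sides in sorted(per_group.items()):
    print(group, sides)
```

      With this layout the number of replicates equals the number of moth groups rather than the number of group-nights, which is the distinction the comment above is concerned with.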

      Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a range of 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB-levels.

      Finally, I was wondering how transferable the findings are to insects and Lepidoptera in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivores like Spodoptera might be more inclined to use these clicks than specialists, which rely heavily on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that their study investigated only a single moth species and that the results might therefore hold true only for S. littoralis and closely related species, but not necessarily for other moth species such as Sphingidae or even butterflies.

    3. Reviewer #2 (Public review):

      This paper presents an interesting and fresh approach as it investigates whether female moths utilize plant-emitted ultrasounds, particularly those associated with dehydration stress, in their egg-laying decision-making process.

      Female moths showed a preference for moist, fresh plants over dehydrated ones in experiments using actual plants. Additionally, when both plants were fresh but ultrasonic sounds specific to dehydrated plants were presented from one side, the moths chose the silent plant. However, in experiments without plants, contrary to the hypothesis derived from the above results, the moths preferred to oviposit near ultrasonic playback mimicking the sounds of dehydrated plants. 

      The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.

      Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials.

      There were also areas where more detailed explanations of the experimental methods would be beneficial.

    1. eLife Assessment

      This descriptive manuscript builds on prior research showing that the elimination of Origin Recognition Complex (ORC) subunits does not halt DNA replication. The authors use various methods to genetically remove one or two ORC subunits from specific tissues and observe continued replication, though it may be incomplete. The replication appears to be primarily endoreduplication, indicating that ORC-independent replication may promote genome reduplication without mitosis. Despite similar findings in previous studies, the paper provides convincing genetic evidence in mice that liver cells can replicate and undergo endoreduplication even with severely depleted ORC levels. While the mechanism behind this ORC-independent replication remains unclear, the study lays the groundwork for future research to explore how cells compensate for the absence of ORC and to develop functional approaches to investigate this process. The reviewers agree that this valuable paper would be strengthened significantly if the authors could delve a bit deeper into the nature of replication initiation, potentially using an origin mapping experiment. Such an exciting contribution would help explain the nature of the proposed new type of Mcm loading, thereby increasing the impact of this study for the field at large.

    2. Reviewer #1 (Public review):

      The origin recognition complex (ORC) is an essential loading factor for the replicative Mcm2-7 helicase complex. Despite ORC's critical role in DNA replication, there have been instances where the loss of specific ORC subunits has still seemingly supported DNA replication in cancer cells, endocycling hepatocytes, and Drosophila polyploid cells. Critically, all tested ORC subunits are essential for development and proliferation in normal cells. This presents a challenge, as conditional knockouts need to be generated, and a skeptic can always claim that there were limiting but sufficient ORC levels for helicase loading and replication in polyploid or transformed cells. That being said, the authors have consistently pushed the system to demonstrate replication in the absence or extreme depletion of ORC subunits.

      Here, the authors generate conditional ORC2 mutants to counter a potential argument with prior conditional ORC1 mutants that Cdc6 may substitute for ORC1 function based on homology. They also generate a double ORC1 and ORC2 mutant, which is still capable of DNA replication in polyploid hepatocytes. While this manuscript provides significantly more support for the ability of select cells to replicate in the absence or near absence of select ORC subunits, it does not shed light on a potential mechanism.

      The strengths of this manuscript are the mouse genetics and the generation of conditional alleles of ORC2 and the rigorous assessment of phenotypes resulting from limiting amounts of specific ORC subunits. It also builds on prior work with ORC1 to rule out Cdc6 complementing the loss of ORC1.

      The weakness is that it is a very hard task to resolve the fundamental question of how much ORC is enough for replication in cancer cells or hepatocytes. Clearly, there is a marked reduction in specific ORC subunits that is sufficient to impact replication during development and in fibroblasts, but the devil's advocate can always claim minimal levels of ORC remaining in these specialized cells.

      The significance of the work is that the authors keep improving their conditional alleles (and combining them), thus making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC. This work lays the foundation for future functional screens to identify other factors that may modulate the response to the loss of ORC subunits.

      This work will be of interest to the DNA replication, polyploidy, and genome stability communities.

    3. Reviewer #2 (Public review):

      This manuscript proposes that primary hepatocytes can replicate their DNA without the six-subunit ORC. This follows previous studies that examined mice that did not express ORC1 in the liver. In this study, the authors suppressed expression of ORC2 or ORC1 plus ORC2 in the liver.

      Comments:

      (1) I find the conclusion of the authors somewhat hard to accept. Biochemically, ORC without the ORC1 or ORC2 subunits cannot load the MCM helicase on DNA. The question arises whether the deletion in the ORC1 and ORC2 genes by Cre is not very tight, allowing some cells to replicate their DNA and allow the liver to develop, or whether the replication of DNA proceeds via non-canonical mechanisms, such as break-induced replication. The increase in the number of polyploid cells in the mice expressing Cre supports the first mechanism, because it is consistent with few cells retaining the capacity to replicate their DNA, at least for some time during development.

      (2) Fig 1H shows that 5 days post infection, there is no visible expression of ORC2 in MEFs with the ORC2 flox allele. However, at 15 days post infection, some ORC2 is visible. The authors suggest that a small number of cells that retained expression of ORC2 were selected over the cells not expressing ORC2. Could a similar scenario also happen in vivo?

      (3) Figs 2E-G shows decreased body weight, decreased liver weight and decreased liver to body weight in mice with recombination of the ORC2 flox allele. This means that DNA replication is compromised in the ALB-ORC2f/f mice.

      (4) Figs 2I-K do not report the number of hepatocytes, but the percent of hepatocytes with different nuclear sizes. I suspect that the number of hepatocytes is lower in the ALB-ORC2f/f mice than in the ORC2f/f mice. Can the authors report the actual numbers?

      (5) Figs 3B-G do not report the number of nuclei, but percentages, which are plotted separately for the ORC2-f/f and ALB-ORC2-f/f mice. Can the authors report the actual numbers?

      (6) Fig 5 shows the response of ORC2f/f and ALB-ORC2f/f mice after partial hepatectomy. The percent of EdU+ nuclei in the ORC2-f/f (aka ALB-CRE-/-) mice in Fig 5H seems low. Based on other publications in the field it should be about 20-30%. Why is it so low here? The very low nuclear density in the ALB-ORC2-f/f mice (Fig 5F) and the large nuclei (Fig 5I) could indicate that cells fire too few origins, proceed through S phase very slowly and fail to divide.

      (7) Fig 6F shows that ALB-ORC1f/f-ORC2f/f mice have very severe phenotypes in terms of body weight and liver weight (about one third of wild-type!!). For Fig 6H and 6I, the actual numbers should be presented, not percentages. The fact that there are EYFP-negative cells implies that CRE was not expressed in all hepatocytes.

      (8) Comparing the EdU+ cells in Fig 7G versus 5G shows very different numbers of EdU+ cells in the control animals. This means that one of these images is not representative. The higher fraction of EdU+ cells in the double-knockout could mean that the hepatocytes in the double-knockout take longer to complete DNA replication than the control hepatocytes. The control hepatocytes may have already completed DNA replication, which can explain why the fraction of EdU+ cells is so low in the controls. The authors may need to study mice at earlier time points after partial hepatectomy, i.e. sacrifice the mice at 30-32 hours, instead of 40-52 hours.

      (9) Regarding the calculation of the number of cell divisions during development: the authors assume that all the hepatocytes in the adult liver are derived from hepatoblasts that express Alb. Is it possible to exclude the possibility that pre-hepatoblast cells that do not express Alb give rise to hepatocytes? For example the cells that give rise to hepatoblasts may proliferate more times than normal giving rise to a higher number of hepatoblasts than in wild-type mice.

      (10) My interpretation of the data is that not all hepatocytes have the ORC1 and ORC2 genes deleted (eg EYFP-negative cells) and that these cells allow some proliferation in the livers of these mice.

    4. Reviewer #3 (Public review):

      Summary:

      The authors address the role of ORC in DNA replication and argue that this protein complex is not essential for DNA replication in hepatocytes. They provide evidence that ORC subunit levels are substantially reduced in cells that have been induced to delete multiple exons of the corresponding ORC gene(s) in hepatocytes. They evaluate replication both in purified isolated hepatocytes and in mice after hepatectomy. In both cases, there is clear evidence that DNA replication does not decrease at a level that corresponds with the decrease in detectable ORC subunit and that endoreduplication is the primary type of replication observed. It remains possible that small amounts of residual ORC are responsible for the replication observed, although the authors provide arguments against this possibility. The mechanisms responsible for DNA replication in the absence of ORC are not examined.

      Strengths:

      The authors clearly show that there are dramatic reductions in the amount of the targeted ORC subunits in the cells that have been targeted for deletion. They also provide clear evidence that there is replication in a subset of these cells and that it is likely due to endoreduplication. Although there is no replication in MEFs derived from cells with the deletion, there is clearly DNA replication occurring in hepatocytes (both isolated in culture and in the context of the liver). Interestingly, the cells undergoing replication exhibit enlarged cell sizes and elevated ploidy indicating endoreduplication of the genome. These findings raise the interesting possibility that endoreduplication does not require ORC while normal replication does.

      Weaknesses:

      There are two significant weaknesses in this manuscript. The first is that although there is clearly robust reduction of the targeted ORC subunit, the authors cannot confirm that it is deleted in all cells. For example, the analysis in Fig. 4B would suggest that a substantial number of cells have not lost the targeted region of ORC2. Although the western blots show stronger effects, this type of analysis is notorious for non-linear response curves and no standards are provided. The second weakness is that there is no evaluation of the molecular nature of the replication observed. Are there changes in the amount or location of Mcm2-7 loading that is usually mediated by ORC? Does an associated change in Mcm2-7 loading lead to the endoreduplication observed? After numerous papers from this lab and others claiming that ORC is not required for eukaryotic DNA replication in a subset of cells, we still have no information about an alternative pathway that could explain this observation.

      The authors frequently use the presence of a Cre-dependent eYFP expression as evidence that the ORC1 or ORC2 genes have been deleted. Although likely the best visual marker for this, it is not demonstrated that the presence of eYFP ensures that ORC2 has been targeted by Cre. For example, based on the data in Fig. 4B, there seems to be a substantial percentage of ORC2 genes that have not been targeted while the authors report that 100% of the cells express eYFP.

    5. Author response:

      eLife Assessment

      This descriptive manuscript builds on prior research showing that the elimination of Origin Recognition Complex (ORC) subunits does not halt DNA replication. The authors use various methods to genetically remove one or two ORC subunits from specific tissues and observe continued replication, though it may be incomplete. The replication appears to be primarily endoreduplication, indicating that ORC-independent replication may promote genome reduplication without mitosis. Despite similar findings in previous studies, the paper provides convincing genetic evidence in mice that liver cells can replicate and undergo endoreduplication even with severely depleted ORC levels. While the mechanism behind this ORC-independent replication remains unclear, the study lays the groundwork for future research to explore how cells compensate for the absence of ORC and to develop functional approaches to investigate this process. The reviewers agree that this valuable paper would be strengthened significantly if the authors could delve a bit deeper into the nature of replication initiation, potentially using an origin mapping experiment. Such an exciting contribution would help explain the nature of the proposed new type of Mcm loading, thereby increasing the impact of this study for the field at large.

      We appreciate the reviewers’ suggestion. To date we know of only one paper that mapped origins of replication in regenerating mouse liver, and that was published two months ago in Cell (PMID: 39293447). We want to adopt this method, but we do not need it to answer the question asked. We have mapped origins of replication in ORC-deleted cancer cell lines and compared them to wild-type cells in Shibata et al., bioRxiv (PMID: 39554186) (it is under review). We report the following: Mapping of origins in cancer cell lines that are wild type or engineered to delete one of three subunits, ORC1, ORC2 or ORC5, shows that specific origins are still used and are mostly at the same sites in the genome as in wild-type cells. Of the 30,197 origins in wild-type cells (with ORC), only 2,466 (8%) are not used in any of the three ORC-deleted cells and 18,319 (60%) are common between the four cell types. Despite the lack of ORC, excess MCM2-7 is still loaded at comparable rates in G1 phase to license reserve origins and is also repeatedly loaded in the same S phase to permit re-replication.

      Citation: Specific origin selection and excess functional MCM2-7 loading in ORC-deficient cells. Yoshiyuki Shibata, Mihaela Peycheva, Etsuko Shibata, Daniel Malzl, Rushad Pavri, Anindya Dutta. bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095 (PMID: 39554186)
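      The origin counts quoted in the response above can be checked with a few lines of arithmetic. The sketch below uses only the three numbers stated in the text; the variable names are our own, and the interpretation of the remainder (wild-type origins used in some but not all of the ORC-deleted lines) is an inference from how the two categories are defined, not a figure reported by the authors.

```python
# Counts as quoted in the author response.
TOTAL_WT_ORIGINS = 30_197      # origins mapped in wild-type cells
UNUSED_IN_ALL_DELETED = 2_466  # not used in any of the three ORC-deleted lines
SHARED_BY_ALL_FOUR = 18_319    # common to wild type and all three deleted lines


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


unused_pct = percent(UNUSED_IN_ALL_DELETED, TOTAL_WT_ORIGINS)
shared_pct = percent(SHARED_BY_ALL_FOUR, TOTAL_WT_ORIGINS)

# Inferred remainder: wild-type origins used in one or two of the deleted lines.
partial = TOTAL_WT_ORIGINS - UNUSED_IN_ALL_DELETED - SHARED_BY_ALL_FOUR
partial_pct = percent(partial, TOTAL_WT_ORIGINS)

print(f"unused in all ORC-deleted lines: {unused_pct:.1f}%")
print(f"shared by all four cell types:   {shared_pct:.1f}%")
print(f"used in some deleted lines only: {partial} ({partial_pct:.1f}%)")
```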

      Public Reviews:

      Reviewer #1 (Public review):

      The origin recognition complex (ORC) is an essential loading factor for the replicative Mcm2-7 helicase complex. Despite ORC's critical role in DNA replication, there have been instances where the loss of specific ORC subunits has still seemingly supported DNA replication in cancer cells, endocycling hepatocytes, and Drosophila polyploid cells. Critically, all tested ORC subunits are essential for development and proliferation in normal cells. This presents a challenge, as conditional knockouts need to be generated, and a skeptic can always claim that there were limiting but sufficient ORC levels for helicase loading and replication in polyploid or transformed cells. That being said, the authors have consistently pushed the system to demonstrate replication in the absence or extreme depletion of ORC subunits.

      Here, the authors generate conditional ORC2 mutants to counter a potential argument with prior conditional ORC1 mutants that Cdc6 may substitute for ORC1 function based on homology. They also generate a double ORC1 and ORC2 mutant, which is still capable of DNA replication in polyploid hepatocytes. While this manuscript provides significantly more support for the ability of select cells to replicate in the absence or near absence of select ORC subunits, it does not shed light on a potential mechanism.

      The strengths of this manuscript are the mouse genetics and the generation of conditional alleles of ORC2 and the rigorous assessment of phenotypes resulting from limiting amounts of specific ORC subunits. It also builds on prior work with ORC1 to rule out Cdc6 complementing the loss of ORC1.

      The weakness is that it is a very hard task to resolve the fundamental question of how much ORC is enough for replication in cancer cells or hepatocytes. Clearly, there is a marked reduction in specific ORC subunits that is sufficient to impact replication during development and in fibroblasts, but the devil's advocate can always claim minimal levels of ORC remaining in these specialized cells.

      The significance of the work is that the authors keep improving their conditional alleles (and combining them), thus making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC. This work lays the foundation for future functional screens to identify other factors that may modulate the response to the loss of ORC subunits.

      This work will be of interest to the DNA replication, polyploidy, and genome stability communities.

      Thank you.

      Reviewer #2 (Public review):

      This manuscript proposes that primary hepatocytes can replicate their DNA without the six-subunit ORC. This follows previous studies that examined mice that did not express ORC1 in the liver. In this study, the authors suppressed expression of ORC2 or ORC1 plus ORC2 in the liver.

      Comments:

      (1) I find the conclusion of the authors somewhat hard to accept. Biochemically, ORC without the ORC1 or ORC2 subunits cannot load the MCM helicase on DNA. The question arises whether the deletion in the ORC1 and ORC2 genes by Cre is not very tight, allowing some cells to replicate their DNA and allow the liver to develop, or whether the replication of DNA proceeds via non-canonical mechanisms, such as break-induced replication. The increase in the number of polyploid cells in the mice expressing Cre supports the first mechanism, because it is consistent with few cells retaining the capacity to replicate their DNA, at least for some time during development.

      In our study, we used EYFP as a marker for Cre recombinase activity. ~98% of the hepatocytes in tissue sections and cells in culture express EYFP, suggesting that the majority of hepatocytes successfully expressed the Cre protein to delete the ORC1 or ORC2 genes. To assess deletion efficiency, we employed sensitive genotyping and Western blotting techniques to confirm the deletion of ORC1 and ORC2 in hepatocytes isolated from Alb-Cre mice. Results in Fig. 2C and Fig. 6D demonstrate the near-complete absence of ORC2 and ORC1 proteins, respectively, in these hepatocytes.

      The mutant hepatocytes underwent at least 15–18 divisions during development. The inherited ORC1 or ORC2 protein present during the initial cell divisions would be diluted to less than 1.5% of wild-type levels within six divisions, making it highly unlikely to support DNA replication, and yet we observe hepatocyte numbers that suggest there was robust cell division even after that point.
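      The dilution argument above can be illustrated with a simple halving model. The sketch assumes that inherited protein is split equally between daughter cells at every division, with no new synthesis and no degradation; these are simplifying assumptions of ours, and the function name `inherited_fraction` is hypothetical. Under pure halving the level reaches 1/64 (about 1.6%) of the starting amount after six divisions, and any protein turnover would lower it further.

```python
def inherited_fraction(divisions: int) -> float:
    """Fraction of the starting protein left per cell after `divisions` halvings.

    Assumes equal partitioning at each division, no new synthesis and
    no degradation (a deliberately simple model).
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return 0.5 ** divisions


# Six divisions is the point discussed in the text; 15 and 18 bracket the
# estimated number of divisions during development.
for n in (6, 7, 15, 18):
    print(f"{n:2d} divisions: {inherited_fraction(n) * 100:.5f}% of starting level")
```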

      Furthermore, the EdU incorporation data confirm DNA synthesis in the absence of ORC1 and ORC2. Specifically, immunofluorescence showed that both in vitro and in vivo, EYFP-positive hepatocytes (indicating successful ORC1 and ORC2 deletion) incorporated EdU, demonstrating that DNA synthesis can occur without ORC1 and ORC2.

      Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2).  If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.

      (2) Fig 1H shows that 5 days post infection, there is no visible expression of ORC2 in MEFs with the ORC2 flox allele. However, at 15 days post infection, some ORC2 is visible. The authors suggest that a small number of cells that retained expression of ORC2 were selected over the cells not expressing ORC2. Could a similar scenario also happen in vivo?

      This would not explain the significant incorporation of EdU in hepatocytes that do not have detectable ORC by Western blots and that are EYFP positive.  Also note that for MEFs we are delivering the Cre by AAV infection in vitro, so there is a finite probability that a cell will not receive Cre and will not delete ORC2.  However, in vivo, the Alb-Cre will be expressed in every cell that turns on albumin.  There is no escaping the expression of Cre.

      (3) Figs 2E-G shows decreased body weight, decreased liver weight and decreased liver to body weight in mice with recombination of the ORC2 flox allele. This means that DNA replication is compromised in the ALB-ORC2f/f mice.

      It is possible that DNA replication is partially compromised or may slow down in the absence of ORC2. However, it is important to emphasize that livers with ORC2 deletion remain capable of DNA replication, so much so that liver function and life span are near normal. Therefore, some kind of DNA replication has to serve as a compensatory mechanism in the absence of ORC2 to maintain liver function and support regeneration.

      (4) Figs 2I-K do not report the number of hepatocytes, but the percent of hepatocytes with different nuclear sizes. I suspect that the number of hepatocytes is lower in the ALB-ORC2f/f mice than in the ORC2f/f mice. Can the authors report the actual numbers?

      We show in Table 2 that the Alb-Orc2f/f mice have about 25-37.5% of the hepatocytes compared to the WT mice.

      (5) Figs 3B-G do not report the number of nuclei, but percentages, which are plotted separately for the ORC2-f/f and ALB-ORC2-f/f mice. Can the authors report the actual numbers?

      In all the FACS experiments in Fig. 3B-G we collect data for a total of 10,000 nuclei (or cells).  For Fig. 3E-G we divide the 10,000 nuclei into the bottom 40% on the EYFP axis (EYFP low, which is mostly EYFP negative) as the control group, and the top 20% on the EYFP axis (EYFP high) as the test group.  We will mention this in the revision and label EYFP negative and positive as EYFP low and high.
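
      The gating scheme described here can be sketched in a few lines. The example assumes a simple rank-based split of the events by EYFP intensity. The function `gate_by_eyfp` and the simulated intensities are hypothetical illustrations, not the analysis software or data used in the study.

```python
import random


def gate_by_eyfp(intensities, low_frac=0.40, high_frac=0.20):
    """Return (EYFP-low, EYFP-high) events: the bottom `low_frac` and the
    top `high_frac` of events when ranked by EYFP intensity."""
    ranked = sorted(intensities)
    n = len(ranked)
    n_low = round(n * low_frac)
    n_high = round(n * high_frac)
    low = ranked[:n_low]
    high = ranked[n - n_high:] if n_high else []
    return low, high


random.seed(0)
# Simulated EYFP intensities for 10,000 events (hypothetical values only).
events = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
low, high = gate_by_eyfp(events)
print(len(low), "EYFP-low events;", len(high), "EYFP-high events")
```

      The middle 40% of events falls into neither gate, which keeps the control and test groups well separated on the EYFP axis.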

      (6) Fig 5 shows the response of ORC2f/f and ALB-ORC2f/f mice after partial hepatectomy. The percent of EdU+ nuclei in the ORC2-f/f (aka ALB-CRE-/-) mice in Fig 5H seems low. Based on other publications in the field it should be about 20-30%. Why is it so low here? The very low nuclear density in the ALB-ORC2-f/f mice (Fig 5F) and the large nuclei (Fig 5I) could indicate that cells fire too few origins, proceed through S phase very slowly and fail to divide.

      The percentage of EdU+ nuclei in the ORC2f/f mice without Alb-Cre is 8%, while in PMID 10623657, 10% of wild-type nuclei incorporate EdU at 42 hr post partial hepatectomy (the mid-point of the 36-48 hr post-hepatectomy window used in our study).  The important result here is that in the ORC2f/f mice with Alb-Cre (+/-) we are seeing significant EdU incorporation. We will also correct the X-axis labels in 5F, 5I, 7E and 7F to reflect that those measurements were not made at 36 hr post-resection but later (as was indicated in the schematic in Fig. 5A).

      (7) Fig 6F shows that ALB-ORC1f/f-ORC2f/f mice have very severe phenotypes in terms of body weight and liver weight (about one third of wild-type!!). Fig 6H and 6I, the actual numbers should be presented, not percentages. The fact that there are EYFP negative cells, implies that CRE was not expressed in all hepatocytes.

      The liver to body weight ratio is what one has to look at, and it is 70% of the WT.  In females the liver and body weight are low (although in proportion to each other), which may be what the reviewer is referring to.  However, the fact that liver weight and body weight are not as low in males suggests that this is a gender (hormone?) specific effect and not a DNA replication defect.  We have another paper, also on bioRxiv (Su et al.), that suggests that ORC subunits have a significant effect on gene expression, so it is possible that this is what leads to the sexual dimorphism in phenotype.

      The bottom 40% of nuclei on the EYFP axis in the FACS profiles (what was labeled EYFP negative but will now be called EYFP low) contains mostly non-hepatocytes that are genuinely EYFP negative.   Non-hepatocytes (bile duct cells, endothelial cells, Kupffer cells, blood cells) are a significant part of the cells in the dissociated liver (as can be seen in the single cell sequencing results in PMID: 32690901).  Their presence does not mean that hepatocytes are not expressing Cre.  Hepatocytes are mostly EYFP positive, as can be seen in the tissue sections (where the hepatocytes take up most of the visual field) and in cells in culture.  Also, if there are EYFP negative hepatocyte nuclei in the FACS, that still does not rule out EYFP presence in the cytoplasm.  The important point from the FACS is that the EYFP high nuclei (which have expressed Cre for the longest period) are polyploid relative to the EYFP low nuclei.

      (8) Comparing the EdU+ cells in Fig 7G versus 5G shows very different number of EdU+ cells in the control animals. This means that one of these images is not representative. The higher fraction of EdU+ cells in the double-knockout could mean that the hepatocytes in the double-knockout take longer to complete DNA replication than the control hepatocytes. The control hepatocytes may have already completed DNA replication, which can explain why the fraction of EdU+ cells is so low in the controls. The authors may need to study mice at earlier time points after partial hepatectomy, i.e. sacrifice the mice at 30-32 hours, instead of 40-52 hours.

      The apparent difference that the reviewer comments on stems from differences in nuclear density in the images in Fig. 7G and 5G (also quantitated in Fig. 7F and 5F).  The quantitation in Fig. 7H and 5H shows that the percentages of EdU+ cells are comparable (5-8%).

      (9) Regarding the calculation of the number of cell divisions during development: the authors assume that all the hepatocytes in the adult liver are derived from hepatoblasts that express Alb. Is it possible to exclude the possibility that pre-hepatoblast cells that do not express Alb give rise to hepatocytes? For example the cells that give rise to hepatoblasts may proliferate more times than normal giving rise to a higher number of hepatoblasts than in wild-type mice.

      Single cell sequencing of mouse liver at e11 shows hepatoblasts expressing hepatocyte specific markers (PMID: 32690901).  All the cells annotated from the single-cell seq analysis are differentiated cells arguing against the possibility that undifferentiated endodermal cells (what the reviewer probably means by pre-hepatoblasts) exist at e11.  The following review (https://www.ncbi.nlm.nih.gov/books/NBK27068/) says: “The differentiation of bi-potential hepatoblasts into hepatocytes or BECs begins around e13 of mouse development. Initially hepatoblasts express genes associated with both adult hepatocytes (Hnf4α, Albumin) ...”  Thus, we can be certain that undifferentiated endodermal cells are unlikely to persist on e11 and that hepatoblasts at e11 express albumin.  Our calculation of number of cell divisions in Table 2 begins from e12.

      The reviewer may be suggesting that ORC deletion leads to the immediate demise of hepatoblasts (despite having inherited ORC protein from the endodermal cells), causing undifferentiated endodermal cells to persist and proliferate much longer than in normal development.  We consider it unlikely, but if true it would be amazing new biology, both by suggesting that deletion of ORC immediately leads to the death of the hepatoblasts (despite a healthy reserve of inherited ORC protein) and by suggesting that there is a novel feedback mechanism from the death/depletion of hepatoblasts leading to the persistence and proliferation of undifferentiated endodermal cells.

      (10) My interpretation of the data is that not all hepatocytes have the ORC1 and ORC2 genes deleted (eg EYFP-negative cells) and that these cells allow some proliferation in the livers of these mice.

      Please see the reply to question #1.  Particularly relevant: “Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2).  If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.”

      Reviewer #3 (Public review):

      Summary:

      The authors address the role of ORC in DNA replication and argue that this protein complex is not essential for DNA replication in hepatocytes. They provide evidence that ORC subunit levels are substantially reduced in cells that have been induced to delete multiple exons of the corresponding ORC gene(s) in hepatocytes. They evaluate replication both in purified isolated hepatocytes and in mice after hepatectomy. In both cases, there is clear evidence that DNA replication does not decrease at a level that corresponds with the decrease in detectable ORC subunit and that endoreduplication is the primary type of replication observed. It remains possible that small amounts of residual ORC are responsible for the replication observed, although the authors provide arguments against this possibility. The mechanisms responsible for DNA replication in the absence of ORC are not examined.

      Strengths:

      The authors clearly show that there are dramatic reductions in the amount of the targeted ORC subunits in the cells that have been targeted for deletion. They also provide clear evidence that there is replication in a subset of these cells and that it is likely due to endoreduplication. Although there is no replication in MEFs derived from cells with the deletion, there is clearly DNA replication occurring in hepatocytes (both isolated in culture and in the context of the liver). Interestingly, the cells undergoing replication exhibit enlarged cell sizes and elevated ploidy indicating endoreduplication of the genome. These findings raise the interesting possibility that endoreduplication does not require ORC while normal replication does.

      Weaknesses:

      There are two significant weaknesses in this manuscript. The first is that although there is clearly robust reduction of the targeted ORC subunit, the authors cannot confirm that it is deleted in all cells. For example, the analysis in Fig. 4B would suggest that a substantial number of cells have not lost the targeted region of ORC2. Although the western blots show stronger effects, this type of analysis is notorious for non-linear response curves and no standards are provided. The second weakness is that there is no evaluation of the molecular nature of the replication observed. Are there changes in the amount or location of Mcm2-7 loading that is usually mediated by ORC? Does an associated change in Mcm2-7 loading lead to the endoreduplication observed? After numerous papers from this lab and others claiming that ORC is not required for eukaryotic DNA replication in a subset of cells, we still have no information about an alternative pathway that could explain this observation.

      We do not see a significant deficit in MCM2-7 loading (amount and rate) in cancer cell lines where we have deleted ORC1, ORC2 or ORC5 genes separately in Shibata et al. bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095 (PMID: 39554186)

      The authors frequently use the presence of a Cre-dependent eYFP expression as evidence that the ORC1 or ORC2 genes have been deleted. Although likely the best visual marker for this, it is not demonstrated that the presence of eYFP ensures that ORC2 has been targeted by Cre. For example, based on the data in Fig. 4B, there seems to be a substantial percentage of ORC2 genes that have not been targeted while the authors report that 100% of the cells express eYFP.

      The PCR reactions in Fig. 4B are still contaminated by DNA from non-hepatocyte cells: bile duct cells, endothelial cells, Kupffer cells and blood cells.  Under the microscope, in culture, we can recognize the hepatocytes unequivocally from their morphology. <2% of the hepatocytes in culture in Fig. 4C are EYFP-.

    1. eLife Assessment

      In this useful manuscript, the authors performed scRNA-seq on a diverse cohort of 15 early-stage cervical cancer patients. Correlative data is provided to support the possible establishment of an immunosuppressive microenvironment near SLC26A3+ cells, and an association of these cells with upstaging at time of surgery. However, without more extensive validation, the evidence supporting the conclusions remains incomplete. Overall, this paper will provide a potentially helpful dataset for researchers studying cervical cancer.

    2. Reviewer #1 (Public review):

      Summary:

      The authors in this manuscript performed scRNA-seq on a cohort of 15 early-stage cervical cancer patients with a mixture of adeno- and squamous cell carcinoma, HPV status, and several samples that were upstaged at the time of surgery. From their analyses they identified differential cell populations in both immune and tumour subsets related to stage, HPV status, and whether a sample was adenocarcinoma or squamous cell. Putative microenvironmental signaling was explored as a potential explanation for their differential cell populations. Through these analyses the authors also identified SLC26A3 as a potential biomarker for later stage/lymph node metastasis which was verified by IHC and IF. The dataset is likely useful for the community. The accuracy and clarity have been improved from the previous version, and additional immunofluorescence supporting the existence of their proposed cluster is now present. That said, there remain some issues with the strength of some claims (particularly in the abstract and results sections) and some of the cell type definitions.

      Strengths

      The dataset could be useful for the community.

      SLC26A3 could potentially be a useful marker to predict lymph node metastasis with further study.

      Weaknesses

      Causal language is used in the abstract around immunosuppressive microenvironment and signal cross-talk between Epi_10_CYSTM1 cluster and Tregs. The data show localization that supports a possible interaction and probable cytokines, but functional experiments would be needed to establish causality.

      In the description of the single cell data processing there is no mention of batch effect correction. Given that many patients were analyzed, and no mention was made of pooling or deconvolution, it must be assumed these were run separately which invariably leads to batch effects. Given the good overlays across patients some batch correction must have been performed. How was batch effect correction performed?

      While statistics were added to the clinical correlates, it would appear that single variables are being assessed one at a time by chi-squared analysis. This ignores the higher order structure of the data and the correlations between some variables resulting in potentially spurious findings. This is compounded as some categories had below 5 observations violating the assumptions of a chi-squared test.
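
      To illustrate the small-count concern: a common rule of thumb treats the chi-squared approximation as unreliable when expected cell counts fall below 5, and Fisher's exact test is a frequently used alternative for 2x2 tables. The sketch below is a minimal pure-Python illustration under that assumption. The table values are hypothetical, not taken from the manuscript, and the example does not address the separate point about correlated variables, which would need a multivariable model.

```python
from math import comb


def expected_counts(table):
    """Expected counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table: the sum of the
    hypergeometric probabilities no larger than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical 2x2 table (e.g. marker positive/negative by stage group).
table = [[3, 1], [1, 3]]
small_cells = any(e < 5 for row in expected_counts(table) for e in row)
print("expected count below 5:", small_cells)
print("Fisher exact p-value:", round(fisher_exact_two_sided(table), 4))
```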

      The description of all analytical steps remains quite truncated. While the inclusion of annotated code is useful, a full description of which tools were used, with which settings, and why each were chosen, is a minimum needed to properly interpret the results. This is as important in a mainly analytical paper as the experimental parameters.

      Validation of the clustering results remains a problem. The only details provided are that FindClusters was used. This depends on a manual choice of multiple parameters including the k-nearest neighbours included, whether Louvain or Leiden clustering is used, the resolution parameter, and others (how many variable genes/PCs etc...). Why were these parameters selected, and how do you know that you're not over- or under-clustering?
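
      One generic way to probe sensitivity to such parameter choices is to cluster the same cells under two settings and measure how well the resulting labels agree, for example with the adjusted Rand index (ARI). The sketch below is a self-contained Python illustration of that idea. It is not the authors' pipeline and not part of Seurat, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells
    (1.0 = identical partitions, about 0 = chance-level agreement)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of cells that share a cluster in both, in A only, and in B only.
    both = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    in_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = in_a * in_b / comb(n, 2)
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        return 1.0
    return (both - expected) / (max_index - expected)


# Hypothetical cluster labels for the same eight cells at two resolutions.
low_res = [0, 0, 0, 0, 1, 1, 1, 1]
high_res = [0, 0, 1, 1, 2, 2, 2, 2]
print(round(adjusted_rand_index(low_res, high_res), 3))
```

      Labels that stay largely concordant across a range of settings give some reassurance that a cluster is not an artifact of one particular choice, whereas a cluster that appears at only one setting deserves more scrutiny.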

      The cluster Epi_10_CYSTM1 remains somewhat problematic. None of the additional data supports its existence outside of the single patient who has cells from that population. Additionally, it falls well outside of any of the other Epithelial cells to the point that drawing it as part of a differentiation order doesn't even make sense. Indeed, most of the upregulated pathways in this cluster appear to be related to class II antigen presentation which would fit better with a dendritic cell/macrophage than an epithelial cell. While the IF at the end does support the existence of the cluster, numbers are still very limited, and this doesn't have data on the antigen presenting function. At the least a strong disclaimer should be included in the text that this population is essentially exclusive to one sample in the scRNA data.

      The linkage between the cluster types and IHC for prediction of lymph node metastasis is tenuous. Most of the strongly cluster associated markers were not predictive despite their clusters being theoretically enriched. This inability to recognize the clusters in additional samples using alternative methods does not give confidence that these clusters are robust. SLC26A3 being associated with upstaging may very well be a useful marker, however, given the lack of association of the other markers, it may be premature to say this is due to the same Epi_10_CYSTM1 cluster.

      There are multiple issues in the classification of T cells and neutrophils. In the analysis of T cell subset, all CD4+ T cells are currently scored as Tregs, what happened to the T-helper cells? Additionally, Activated T and Cytotoxic T both seem to contain CD8+ cells, but all their populations have equivalent expression of the activation marker CD69. Moreover, the "Cytotoxic" ones also express TIGIT, HAVCR2 and LAG3 which are generally exhaustion markers. For neutrophils, several obviously different clusters have been grouped together (Neu_1 containing two diametrically opposite cell clouds being an obvious example).

      Again in the CellChat section of the results causal language is being repeatedly used. These are just possible interactions, not validated ones. While the co-localization in the provided IF images certainly supports the co-localization, this still is only correlative and doesn't prove causality.

      Minor Issues

      The sentence "However, due to the low morbidity of ADC, in-depth investigations are insufficient" could be misinterpreted. Morbidity generally refers to the severity or health burden rather than the frequency of cases, though it's true in some studies prevalence is used for the overall impact of the disease on a population and referred to as morbidity. In this instance though, "incidence" or "prevalence" would be clearer word choices.

      The previous rebuttal states that clusters/cell type calls were refined to eliminate issues such as epithelial cells creeping into the T cell cluster, however, the cell %s have not been altered according to the change tracking. Shouldn't all the %s have been altered even if only slightly?

    3. Reviewer #2 (Public review):

      Summary:

      Peng et al. present a study using scRNA-seq to examine phenotypic properties of cervical cancer, contrasting features of both adenocarcinomas (ADC) and squamous cell carcinoma (SCC), and HPV-positive and negative tumours. They propose several key findings: unique malignant phenotypes in ADC with elevated stemness and aggressive features, interactions of these populations with immune cells to promote an immunosuppressive TME, and SLC26A3 as a biomarker for metastatic (>=Stage III ) tumours.

      Strengths:

      This study provides a valuable resource of scRNA-seq data from a well-curated collection of patient samples. The analysis provides a high-level view of the cellular composition of cervical cancers. The authors introduce some mechanistic explanations of immunosuppression and the involvement of regulatory T cells that is intriguing.

      Weaknesses:

      I believe many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. I believe there may also be some artifacts in the data that may not reflect true biology, e.g. the presentation of KRT+ neutrophils, which may reflect doublets with cancer cells. In some cases there is mention of quality control steps to remove contaminant cell clusters, but there is no method or supplemental figure to describe and/or justify these steps.

      The key limitation is related to the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper. This population only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample. Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts are only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort. The manuscript does not address why this marker is so rare in the scRNA-seq data, but abundant in the IHC.

      While I understand it may be out of the scope of this individual study, many of the conclusions are inferred from the data analysis with little follow-up in experimental models or orthogonal assays.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      We have rewritten the introduction of the manuscript and clearly stated the study questions we were aiming for:

      In paragraph 1, we have stated clearly that we need to study why the ADC type of cervical cancer is more aggressive. (Line 58 - 77)

      In paragraph 2, we have stated clearly that we need to find valuable biomarkers to help diagnose lymph node metastasis, which may compensate for the shortcomings of radiological imaging tools and reduce the rate of misdiagnosis. (Line 78 - 100)

      In paragraph 3, we have stated clearly that HPV-negative cases are a special group of cervical cancer and that we aim to study their cellular features. (Line 101 - 108)

      In paragraph 4, we have stated clearly that we need to decode the mode of cell-to-cell interaction in the tumor immune microenvironment of ADC using scRNA-seq. (Line 109 - 123)

      (2) For the sequencing, which kit was used on the Novaseq6000?

      For sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry CG000183) on the Novaseq6000. We apologize for omitting this important detail and have added the information to the Methods section. (Line 196 - 197)

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      We apologize for the inadequate description of the data analysis process. We have provided a new “data processing” part with more details in the Methods section (Line 202 - 221). In addition, we have provided annotated copies of the scripts as Supplementary Data 1.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      We have already added the list of marker genes for cell type annotation in the revised manuscript as Supplementary Table 3.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      We apologize for the lack of statistics in the comparison analyses. In the revision, we have used statistical approaches to analyze the differences in each group comparison, and the corresponding figures have been revised accordingly.

      For example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, and Fig. 6D have been re-analyzed to compare ADC/SCC; Supplementary Fig. 1A, Supplementary Fig. 2A, Supplementary Fig. 4A, Supplementary Fig. 5A, and Supplementary Fig. 6A have been re-analyzed to compare HPV+/HPV-; Supplementary Fig. 1B, Supplementary Fig. 2B, Supplementary Fig. 4B, Supplementary Fig. 5B, and Supplementary Fig. 6B have been re-analyzed to compare Early/Late stage. All P values are listed in the figure legends.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      We apologize for the imprecise presentation of the histograms in Fig. 2D, and we have also revised other figures with similar mistakes, such as Fig. 1F and Fig. 5D. The differing bar widths were due to the output style of the data processing; we have corrected all similar mistakes throughout the manuscript, for example in Fig. 2D and Supplementary Fig. 2A-B.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Thank you for your insightful comments. As you noted, several conclusions were initially based on bioinformatics predictions. Thus in the revised manuscript, we have rewritten all relevant descriptions in a more softened way, particularly in the paragraph of “epithelial cells” in Results section, as well as the conclusions derived from bioinformatics predictions in other paragraphs throughout the manuscript. We hope our revised descriptions will enhance the precision of our work.

      For example, in the paragraph “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features (from Line 353)”, many overly affirmative descriptions have been rewritten in Lines 353, 362, 371, 375, 379, 383, 390, 392. From Line 395 to 399, the conclusion has been revised to “The observation of cluster Epi_10_CYSTM1 and its possible specificity to ADC makes us question whether or not it may be related to the aggressiveness of ADC” from the previous “This observation may partially indicate that high stemness cluster Epi_10_CYSTM1 is essential for ADC to present more aggressive features”. From Line 400 to 408, conclusions from GO analyses have also been rewritten.

      In the paragraph “ADC-specific epithelial cluster-derived gene SLC26A3 is a potential prognostic marker for lymph node metastasis (from Line 422)”, many conclusions based on predictions have been revised, such as Line 424 - 428, Line 439 - 441, Line 451 - 453, Line 455 - 457, Line 458 - 459, Line 471 - 473, Line 478 - 481, Line 484 - 486, Line 489, etc.

      In the paragraph “Tumor associated neutrophils (TANs) surrounding ADC tumor area may contribute to the formation of a malignant microenvironment (from Line 536)”, we have changed the descriptions based on bioinformatic predictions, such as Line 560, Line 561, Line 565, Line 566, Line 572, Line 576 - 577, etc.

      In the paragraph “Crosstalk among tumor cells, Tregs and neutrophils establishes the immunosuppressive TIME in ADC (from Line 601)”, we have corrected all the affirmative descriptions, such as Line 604, Line 612, Line 614, Line 626, Line 628 - 629, Line 641, Line 654 – 655, etc.

      All the changes have also been listed in Revision Notes in detail.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      We appreciate this suggestion. We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we have detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC patients and highlights its potential as a diagnostic marker for clinical staging. To further investigate whether this cluster is generally existing in ADC patients, we have identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on a total of 56 tissue samples, and the results showed positive expressions of these markers in the majority of stage IIIC tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section (From Line 801).

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Thank you for your insightful comment. From the data of TCGA survival analysis for Epi_10, we found a not-so-slight trend of difference between groups (with a small P value). As a result, we presented this data and hoped to add more strength to the clinical significance of this cluster. However, this indeed caused controversy because the P value is non-significant. As a result, we have already deleted this data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      We are thankful for this question. The conclusion that “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too strongly given the sample distribution. We apologize for this and have corrected the description to “As one of stage IIIC-specific cell clusters, the cluster of Epi_10_CYSTM1, with its representative marker gene SLC26A3, presents potential diagnostic value to predict lymph node metastasis” (Line 478 - 481).

      However, based on our results, we do think this cluster is a potential diagnostic marker and the hypothesis is right. As for SLC26A3, we have specifically added a new paragraph (from Line 801 - 822) in Discussion section to discuss the rationality and necessity of selecting this gene as our central focus, and the reasons why SLC26A3 should be the representative of cluster Epi_10_CYSTM1. As you noted, SLC26A3 appears to be broadly expressed in later tumors rather than restricted to a minor subset in the images. We apologize for any misunderstanding caused. When presenting the IHC data, we only showed the strongly positive areas of each slide to emphasize the differences. In our revision, we have included whole slide scanning images of the IHC samples, clearly showing that SLC26A3 is restricted to a part of the tumors (Supplementary Fig.9).

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      We apologize for using data without noticing the contamination of T cells with a few epithelial cells. We have re-performed quality control to exclude contamination and re-analyzed all T cell data. In the revised manuscript, we have therefore replaced the T cell data in both Fig. 4 and Supplementary Fig. 4 with the re-analyzed data.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Our initial purpose was to use GO analysis as support for our conclusions. However, we recognize that these are only predictions and not evidence, which is also the problem with our writing raised in question (7). Therefore, in our revised manuscript, we have deleted the GO data and descriptions in the paragraphs of “T cell (Fig.4)” (from Line 495) and “B/plasma cell (Fig.6)” (from Line 579), because the predictions are largely irrelevant to our conclusions.

      However, in the sections of “epithelial cell (Fig.2)” (from Line 352) and “neutrophils (Fig.5)” (from Line 536), we retained the GO data and rewrote the conclusions, because these analyses have provided us with valuable information regarding the role of specific cell clusters in ADC progression. Furthermore, our subsequent analyses, such as CellChat, have further validated the accuracy of the findings from the GO analysis. We do think this logically supports the whole storyline of the study.

      Reviewer #2 (public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      We are sorry that many of the conclusions were written in an over-affirmative way but lacked sufficient supporting evidence. In our revision, we have revised the wording and rewritten all conclusions or descriptions that rest only on bioinformatic predictions. Moreover, we have performed statistical re-analyses on all data and rearranged the related figures.

      For example, in Line 352, we have changed the sub-title “The sub-clusters of epithelial cells exhibit elevated stem-like features to promote the aggressiveness of ADC” into “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features”. In this paragraph, many over-affirmative descriptions such as “exclusively”, “significant”, “overwhelmingly”, “remarkably” have been deleted. From Line 486-493, the conclusion of “Moreover, SLC26A3 could be employed as a marker for the Epi_10_CYSTM1 cluster, aiding in the diagnosis of lymph node metastasis to prevent post-surgical upstaging in ADC patients in the future” has been changed into “our results propose that SLC26A3 might be considered as a diagnostic marker to predict lymph node metastasis in ADC patients”. Similar over-affirmative descriptions and conclusions have also been rewritten in the other paragraphs, as described in our response to question (7) above.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      We sincerely feel grateful for this question. This is a quite important question as it is also pointed out by reviewer#1 in question (8) above. In the revised manuscript, we have already optimized our descriptions and have added detailed explanation for the importance of SLC26A3 in the Discussion section  (from Line 802 - 823). We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. To further investigate whether this cluster is generally present in ADC patients, we identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on 56 cases of tissue samples, and the results showed positive expression of these markers in the majority of stage III tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Thank you for your insightful comment. This important point is also raised by reviewer #1 above. In the revised manuscript, we have reanalyzed our scRNA-seq data and listed the canonical marker genes for cell type annotation. Most importantly, for T cells and their sub-clustering, we have performed quality control and re-analyzed all T cell data, with contamination excluded. In the revised manuscript, we have added the re-analyzed data for T cells in both Fig. 4 and Supplementary Fig. 4.

      Recommendations for the authors:

      Reviewer #1 (recommendations for the authors):

      The text would substantially benefit from an editorial revision of language usage.

      We sincerely feel grateful for this suggestion. In our revision, we have conducted language editing and carefully rewritten our manuscript. The changes have been clearly marked in the tracked version of the revised manuscript.

      Reviewer #2 (recommendations for the authors):

      (1) Use statistical approaches to claim enrichment/specificity of populations to given groups (ADC, HPV, etc). Analysis packages like Milo for differential abundance testing would be very helpful.

      We feel grateful for this suggestion. In our revision, we have performed statistical analyses for all groups of comparison data. Meanwhile, we have rearranged the figures based on these statistical results, for example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, Fig. 6D, Supplementary Fig. 1A-B, Supplementary Fig. 2A-B, Supplementary Fig. 4A-B, Supplementary Fig. 5A-B, Supplementary Fig. 6A-B.
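
      The kind of sample-level test the reviewer asks for can be illustrated with a minimal sketch. This is not Milo (an R package that tests differential abundance on k-nearest-neighbour neighbourhoods) and not the authors' analysis; it is a much simpler exact permutation test on per-sample cluster fractions, swapped in only to show the idea. The function name and all numbers below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    # Enumerate every way of relabelling the samples into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that exact ties count as "at least as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-sample fractions of cells falling in one cluster.
# One ADC sample dominates the cluster; the other samples contribute almost nothing.
adc = [0.00, 0.00, 0.31, 0.00, 0.01]
scc = [0.00, 0.01, 0.00, 0.00, 0.00]
p_value = permutation_pvalue(adc, scc)
print(f"p = {p_value:.3f}")
```

      Because the test treats each sample, not each cell, as the unit of replication, a cluster driven by a single sample does not come out as enriched, which is the concern raised in the public review.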

      (2) In the subclustering, consider a round of quality control to ensure that all cells are of the cell type they are claimed to be. Contaminant clusters/cells could be filtered out or reassigned. This could be supplemented with an automated annotation approach using cell-type references.

      We feel thankful for this suggestion. As a result, we have provided copies of scripts in the supplementary data to ensure the quality control of cell type annotation.

      (3) An explanation for why SLC26A3 is so rare in the scRNA-seq data, but seemingly common in the IHC staining would be helpful. I am concerned about the specificity of the stain.

      We apologize for the lack of an adequate explanation of SLC26A3 and cluster Epi_10_CYSTM1. This is a crucial question, as it was also raised above in question (8) of reviewer #1 and question (2) of reviewer #2 (public review section). In the revised manuscript, we have added an intensive discussion of this question in the seventh paragraph of the Discussion section (from Line 801 - 822). In fact, because of the heterogeneity among different individuals and among different tumor regions even within one sample, Epi_10_CYSTM1 seemed to be derived from only one sample. However, the relatively high number of cells in this cluster from one late-stage (stage IIIC) patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. Furthermore, we have identified SLC26A3, ORM1 and ORM2 as specific markers of this cluster and performed IHC staining. The positive expression of these markers provides indirect support for the existence of this cluster (as shown in Fig. 3B).

    1. Author response:

      The following is the authors’ response to the current reviews.

      The authors agree with the reviewers that future studies are needed to dissect the mechanisms of eIF3 binding to 3'UTRs and their impact on translation, and the impact of this binding on cellular fate.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reveals extensive binding of eukaryotic translation initiation factor 3 (eIF3) to the 3' untranslated regions (UTRs) of efficiently translated mRNAs in human pluripotent stem cell-derived neuronal progenitor cells. The authors provide solid evidence to support their conclusions, although this study may be enhanced by addressing potential biases of techniques employed to study eIF3:mRNA binding and providing additional mechanistic detail. This work will be of significant interest to researchers exploring post-transcriptional regulation of gene expression, including cellular, molecular, and developmental biologists, as well as biochemists.

      We thank the reviewers for their positive views of the results we present, along with the constructive feedback regarding the strengths and weaknesses of our manuscript, with which we generally agree. We acknowledge our results will require a deeper exploration of the molecular mechanisms behind eIF3 interactions with 3'-UTR termini and experiments to identify the molecular partners involved. Additionally, given that NPC differentiation toward mature neurons is a process that takes around 3 weeks, we recognize the importance of examining eIF3-mRNA interactions in NPCs that have undergone differentiation over longer periods than the 2-hr time point selected in this study. Finally, considering the molecular complexity of the 13-subunit human eIF3, we agree that a direct comparison between Quick-irCLIP and PAR-CLIP will be highly beneficial and will determine whether different UV crosslinking wavelengths report on different eIF3 molecular interactions. Additional comments are given below to the identified weaknesses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors perform irCLIP of neuronal progenitor cells to profile eIF3-RNA interactions upon short-term neuronal differentiation. The data shows that eIF3 mostly interacts with 3'-UTRs - specifically, the poly-A signal. There appears to be a general correlation between eIF3 binding to 3'-UTRs and ribosome occupancy, which might suggest that eIF3 binding promotes protein

      Strengths:

      The study provides a wealth of new data on eIF3-mRNA interactions and points to the potential new concept that eIF3-mRNA interactions are polyadenylation-dependent and correlate with ribosome occupancy.

      Weaknesses:

      (1) A main limitation is the correlative nature of the study. Whereas the evidence that eIF3 interacts with 3'-UTRs is solid, the biological role of the interactions remains entirely unknown. Similarly, the claim that eIF3 interactions with 3'-UTR termini require polyadenylation but are independent of poly(A) binding proteins lacks support as it solely relies on the absence of observable eIF3 binding to poly-A (-) histone mRNAs and a seeming failure to detect PABP binding to eIF3 by co-immunoprecipitation and Western blotting. In contrast, LC-MS data in Supplementary File 1 show ready co-purification of eIF3 with PABP.

      We agree the molecular mechanisms underlying the crosslinking between eIF3 and the end of mRNA 3’-UTRs remain to be determined. We also agree that the lack of interaction seen between eIF3 and PABP in Westerns, even from HEK293T cells, is a puzzle. The low sequence coverage in the LC-MS data gave us pause about making a strong statement that these represent direct eIF3 interactions, given the similar background levels of some ribosomal proteins.

      (2) Another question concerns the relevance of the cellular model studied. irCLIP is performed on neuronal progenitor cells subjected to neuronal induction for 2 hours. This short-term induction leads to a very modest - perhaps 10% - and very transient 1-hour-long increase in translation, although this is not carefully quantified. The cellular phenotype also does not appear to change and calling the cells treated with differentiation media for 2 hours "differentiated NPCs" seems a bit misleading. Perhaps unsurprisingly, the minor "burst" of translation coincides with minor effects on eIF3-mRNA interactions most of which seem to be driven by mRNA levels. Based on the ~15-fold increase in ID2 mRNA coinciding with a ~5-fold increase in ribosome occupancy (RPF), ID2 TE actually goes down upon neuronal induction.

      We agree that it will be interesting to look at eIF3-mRNA interactions at longer time points after induction of NPC differentiation. However, the pattern of eIF3 crosslinking to the end of 3’-UTRs occurs in both time points reported here, which is likely to be the more general finding in what we present.
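
      The reviewer's point about ID2 in comment (2) follows from simple arithmetic if translational efficiency (TE) is taken as the ratio of ribosome occupancy (RPF) to mRNA abundance. The sketch below uses only the approximate fold changes quoted by the reviewer; the function name is hypothetical and the TE definition is the common convention, assumed here rather than taken from the manuscript's methods.

```python
import math


def te_fold_change(rpf_fold: float, mrna_fold: float) -> float:
    """Fold change in TE, assuming TE = RPF / mRNA."""
    return rpf_fold / mrna_fold


# Approximate values quoted in the review: ~5-fold RPF, ~15-fold mRNA.
id2_te = te_fold_change(rpf_fold=5.0, mrna_fold=15.0)
print(f"TE fold change: {id2_te:.2f}")
print(f"log2 TE change: {math.log2(id2_te):.2f}")
```

      Under this assumption, TE for ID2 falls to roughly one third of its starting value even though both mRNA and RPF rise, which is why the reviewer describes the effect as driven by mRNA levels.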

      (3) The overlap in eIF3-mRNA interactions identified here and in the authors' previous reports is minimal. Some of the discrepancies may be related to the not well-justified approach for filtering data prior to assessing overlap. Still, the fundamentally different binding patterns - eIF3 mostly interacting with 5'-UTRs in the authors' previous report and other studies versus the strong preference for 3'-UTRs shown here - are striking. In the Discussion, it is speculated that the different methods used - PAR-CLIP versus irCLIP - lead to these fundamental differences. Unfortunately, this is not supported by any data, even though it would be very important for the translation field to learn whether different CLIP methodologies assess very different aspects of eIF3-mRNA interactions.

      We agree the more interesting aspect of what we observe is the difference in location of eIF3 crosslinking, i.e. the end of 3’-UTRs rather than 5’-UTRs or the pan-mRNA pattern we observed in T cells. The reviewer is right that it will be important in the future to compare PAR-CLIP and Quick-irCLIP side-by-side to begin to unravel the differences we observe with the two approaches.

      Reviewer #2 (Public review):

      Summary:

      The paper documents the role of eIF3 in translational control during neural progenitor cell (NPC) differentiation. eIF3 predominantly binds to the 3' UTR termini of mRNAs during NPC differentiation, adjacent to the poly(A) tails, and is associated with efficiently translated mRNAs, indicating a role for eIF3 in promoting translation.

      Strengths:

      The manuscript is strong in addressing molecular mechanisms by using a combination of next-generation sequencing and crosslinking techniques, thus providing a comprehensive dataset that supports the authors' claims. The manuscript is methodologically sound, with clear experimental designs.

      Weaknesses:

      (1) The study could benefit from further exploration into the molecular mechanisms by which eIF3 interacts with 3' UTR termini. While the correlation between eIF3 binding and high translation levels is established, the functionality of these interactions needs validation. The authors should consider including experiments that test whether eIF3 binding sites are necessary for increased translation efficiency using reporter constructs.

      We agree with the reviewer that the molecular mechanism by which eIF3 interacts with the 3’UTR termini remains unclear, along with its biological significance, i.e. how it contributes to translation levels. We think it could be useful to try reporters in, perhaps, HEK293T cells in the future to probe the mechanism in more detail.

      (2) The authors mention that the eIF3 3' UTR termini crosslinking pattern observed in their study was not reported in previous PAR-CLIP studies performed in HEK293T cells (Lee et al., 2015) and Jurkat cells (De Silva et al., 2021). They attribute this difference to the different UV wavelengths used in Quick-irCLIP (254 nm) and PAR-CLIP (365 nm with 4-thiouridine). While the explanation is plausible, it remains a caveat that different UV crosslinking methods may capture different eIF3 modules or binding sites, depending on the chemical propensities of the amino acid-nucleotide crosslinks at each wavelength. Without addressing this caveat in more detail, the authors cannot generalize their findings, and thus, the title of the paper, which suggests a broad role for eIF3, may be misleading. Previous studies have pointed to an enrichment of eIF3 binding at the 5' UTRs, and the divergence in results between studies needs to be more explicitly acknowledged.

      We agree with the reviewer that the two methods of crosslinking will require a more detailed head-to-head comparison in the future. However, we do think the title is justified by the fact that we see crosslinking to the termini of 3’-UTRs across thousands of transcripts in each condition. Furthermore, the 3’-UTR crosslinking is enriched on mRNAs with higher ribosome protected fragment counts (RPF) in differentiated cells, Figure 3F.

      (3) While the manuscript concludes that eIF3's interaction with 3' UTR termini is independent of poly(A)-binding proteins, transient or indirect interactions should be tested using assays such as PLA (Proximity Ligation Assay), which could provide more insights.

      This is a good idea, but would require a substantial effort better suited to a future publication. We think our observations are interesting enough to the field to stimulate future experimentation that we may or may not be most capable of doing in our lab.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Mestre-Fos and colleagues, authors have analyzed the involvement of eIF3 binding to mRNA during differentiation of neural progenitor cells (NPC). The authors bring a lot of interesting observations leading to a novel function for eIF3 at the 3'UTR.

      During the translational burst that occurs during NPC differentiation, analysis of eIF3-associated mRNA by Quick-irCLIP reveals the unexpected binding of this initiation factor at the 3'UTR of most mRNA. Further analysis of alternative polyadenylation by APAseq highlights the close proximity of the eIF3-crosslinking position and the poly(A) tail. Furthermore, this interaction is not detected in Poly(A)-less transcripts. Using Riboseq, the authors then attempted to correlate eIF3 binding with the translation efficacy of mRNA, which would suggest a common mechanism of translational control in these cells. These observations indicate that eIF3-binding at the 3'UTR of mRNA, near the poly(A) tail, may participate to the closed-loop model of mRNA translation, bridging 5' and 3', and allowing ribosomes recycling. However, authors failed to detect interactions of eIF3, with either PABP or Paip1 or 40S subunit proteins, which is quite unexpected.

      Strength:

      The well-written manuscript presents an attractive concept regarding the mechanism of eIF3 function at the 3'UTR. Most mRNA in NPC seems to have eIF3 binding at the 3'UTR and only a few at the 5'end where it's commonly thought to bind. In a previous study from the Cate lab, eIF3 was reported to bind to a small region of the 3'UTR of the TCRA and TCRB mRNA, which was responsible for their specific translational stimulation, during T cell activation. Surprisingly in this study, the eIF3 association with mRNA occurs near polyadenylation signals in NPC, independently of cell differentiation status. This compelling evidence suggests a general mechanism of translation control by eIF3 in NPC. This observation brings back the old concept of mRNA circularization with new arguments, independent of PABP and eIF4G interaction. Finally, the discussion adequately describes the potential technical limitations of the present study compared to previous ones by the same group, due to the use of Quick-irCLIP as opposed to the PAR-CLIP/thiouridine.  

      Weaknesses:

      (1) These data were obtained from an unusual cell type, limiting the generalizability of the model.

      We agree that unraveling the mechanism employed by eIF3 at the mRNA 3’-UTR termini might be better studied in a stable cell line rather than in primary cells.

      (2) This study lacks a clear explanation for the increased translation associated with NPC differentiation, as eIF3 binding is observed in both differentiated and undifferentiated NPC. For example, I find a kind of inconsistency between changes in Riboseq density (Figure 3B) and changes in protein synthesis (Figure 1D). Thus, the title overstates a modest correlation between eIF3 binding and important changes in protein synthesis.

      We thank the reviewer for this question. Riboseq data and RNASeq data are not on absolute scales when comparing across cell conditions. They are normalized internally, so increases in, for example, RPF in Figure 3B are relative to the bulk RPF in a given condition. By contrast, the changes in protein synthesis measured in Figure 1D are closer to an absolute measure of protein synthesis.
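
      The distinction between internally normalized and absolute measurements can be shown with a toy sketch. It assumes a counts-per-million style normalization, which is one common choice and not necessarily the one used in the manuscript; the gene names and counts are made up.

```python
def counts_per_million(counts: dict) -> dict:
    """Scale raw counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical RPF counts in one condition.
undifferentiated = {"geneA": 100, "geneB": 300, "geneC": 600}
# Hypothetical second condition: every gene is translated twice as much.
differentiated = {gene: 2 * n for gene, n in undifferentiated.items()}

cpm_undiff = counts_per_million(undifferentiated)
cpm_diff = counts_per_million(differentiated)

for gene in undifferentiated:
    print(gene, cpm_undiff[gene], cpm_diff[gene])
```

      In this toy case the normalized values are identical in the two conditions even though total output has doubled, so a uniform global change is invisible after internal normalization and only gene-specific shifts relative to the bulk remain.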

      (3) This is illustrated by the candidate selection that supports this demonstration. Looking at Figure 3B, ID2, and SNAT2 mRNA are not part of the High TE transcripts (in red). In contrast, the increase in mRNA abundance could explain a proportionally increased association with eIF3 as well as with ribosomes. The example of increased protein abundance of these best candidates is overall weak and uncertain.

      We agree that using TE as the criterion for defining increased eIF3 association would not be correct. By “highly translated” we only mean to convey the extent of protein synthesis, i.e. increases in ribosome protected fragments (RPF), rather than the translational efficiency.

      (4) Despite several attempts (chemical and UV cross-linking) to identify eIF3 partners in NPC such as PABP, PAIP1, or proteins from the 40S, the authors could not provide any evidence for such a mechanism consistent with the closed-loop model. Overall, this rather descriptive study lacks mechanistic insight (eIF3 binding partners).

      We agree that it will be important to identify the molecular mechanism used by eIF3 to engage the termini of mRNA 3’-UTRs. Nevertheless, the identification of eIF3 crosslinking to that location in mRNAs is new, and we think will stimulate new experiments in the field.

      (5) Finally, the authors suspect a potential impact of technical improvement provided by Quick-irCLIP, that could have been addressed rather than discussed.

      We agree a side-by-side comparison of eIF3 crosslinks captured by PAR-CLIP versus Quick-irCLIP will be an important experiment to do. However, NPCs or other primary cells may not be the best system for the comparison. We think using an established cell line might be more informative, to control for effects such as 4-thiouridine toxicity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The Western blot signals for SLC38A2 and ID2 are close to the membrane background and not very convincing. Size markers are missing.

      We agree these antibodies are not great. They are the best we could find, unfortunately. We have included originals of all western blots and gels as supplementary information. It’s important to note that the Riboseq data for ID2 and SLC38A2 are consistent with the western blots. See Figure 3C and Figure 3–figure supplement 3B.

      (2) Figure 1 - Figure Supplement 1 appears to present data from a single experiment. This is far less than ideal considering the minor differences measured.

      Thanks for the comment. This is a representative experiment showing the early time course. We have added a second experiment with two different treatments that show the same pattern in the puromycin assay, in Figure 1–figure supplement 1.

      (3) Figure 3F: One wonders what this would look like if TE was plotted instead of RPF. Figure 3 - Figure Supplement 4 seems to show something along those lines. However, the data are not mentioned in the main results section and are quite unclear. Why are data separated into TE high and low? Doesn't TE high in differentiated cells equal TE low in undifferentiated cells?

      This is an interesting question. Note that in Figure 3B, n=6300 genes show no change in TE upon differentiation, compared to a total of n=2127 that show a change in TE, with most of those changes not very large. We have now replotted Figure 3F comparing irCLIP read counts in 3’-UTRs to RPF read counts, which shows a significant positive correlation, regardless of whether we look at undifferentiated or differentiated NPCs (see Figure 3F and a new Figure 3–figure supplement 4A). We also compare irCLIP reads in 3’-UTRs to TE values, which show no correlation (see Figure 3G and Figure 3–figure supplement 4B).

      Figure 3-figure supplement 4 was actually a response to a previous round of review (at PLOS Biology) to a rather technical question from a reviewer. We think this figure and associated text should be removed. Instead, we now include supplementary tables with the processed RPF and TE values, for reference (Supplemental files 4-6). We omitted these in the original submission when they should have been included. We also abandoned comparing undifferentiated and differentiated NPCs, and instead look directly at irCLIP reads vs. RPFs or TE, regardless of NPC state, as noted above (Figure 3F, G, and Figure 3–figure supplement 4).

      (4) Figure 3C: The data should be plotted on the same y-axis scale. This would make a visual assessment of the differences in mRNA and RPF levels more intuitive.

      Thanks for this suggestion. We have rescaled the plots as requested.

      Reviewer #2 (Recommendations for the authors):

      (1) The quality of the Western blots in several figures is quite poor. Notably, Figure 1C seems to be a composite gel, as each blot appears to come from a different gel. Additionally, in Supplementary Figure 1A, there is only a single data point, yet the authors indicate that this image is representative of multiple assays. The lack of error bars in this figure raises a question vis-a-vis the reproducibility of the experiments.

      Thanks for the comments. We now include all the original gels as supplementary information. As noted above, the antibodies for ID2 and SLC38A2 are not great, we agree. And as we noted above, the Riboseq data for ID2 and SLC38A2 are consistent with the western blots.

      (2) For the top 500 targets of undifferentiated and differentiated NPCs in the Quick-irCLIP assay, the manuscript does not clarify how many targets are common and how many are unique to each condition. This information is important for understanding the extent of overlap and differentiation-specific interactions of eIF3 with mRNAs. Providing this data would strengthen the interpretation of the results.

      There are 449 of the top 500 hits in common between undifferentiated and differentiated NPCs. We have now added this information to the text, to add clarity. 
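
      The numbers the reviewer asks for follow directly from this figure. The sketch below works them out and shows, on made-up placeholder lists, how such an overlap could be computed from two ranked hit lists; the function name and gene identifiers are hypothetical and are not taken from the authors' pipeline.

```python
def hit_overlap(hits_a, hits_b):
    """Split two hit lists into shared and condition-specific targets."""
    a, b = set(hits_a), set(hits_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Figures stated in the response: 449 shared out of the top 500 in each condition.
top_n, shared = 500, 449
unique_per_condition = top_n - shared
percent_shared = 100 * shared / top_n
print(unique_per_condition, percent_shared)

# Toy illustration with placeholder identifiers.
undiff_hits = ["gene1", "gene2", "gene3", "gene4"]
diff_hits = ["gene2", "gene3", "gene4", "gene5"]
result = hit_overlap(undiff_hits, diff_hits)
print(sorted(result["shared"]), sorted(result["only_a"]), sorted(result["only_b"]))
```

      With 449 shared hits, 51 of the top 500 targets are unique to each condition, so about 90% of the top targets are common to undifferentiated and differentiated NPCs.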

      (3) The manuscript does not provide detailed percentages or numbers regarding the overlap between iCLIP and APA-Seq peaks. Clarifying this overlap, particularly in terms of how many of the APA sites are also targets of eIF3, would bolster the understanding of how these two datasets converge to support the authors' conclusions.

      This is a difficult calculation to make, because APA-Seq reads are generally much longer than the Quick-irCLIP reads. This is why we focused instead on quantifying the percentage of Quick-irCLIP peaks (which are narrower) that overlap with predicted polyadenylation sequences, in Figure 2-figure supplement 1.

      Reviewer #3 (Recommendations for the authors):

      (1) Perform Quick-irCLIP in HEK293 cells to infer technical limitations and/or to generalize the model. The authors will then compare again eIF3 binding site in Jurkat, HEK293, and NPC.

      This is an experiment we plan to do for a future publication, given that we would want to repeat both Quick-irCLIP and PAR-CLIP at the same time.

      (2) Select mRNA candidates with high or low TE changes and analyze eIF3 binding and RPF density and protein abundance along NPC differentiation to support the role of eIF3 binding in stimulating translation.

      We agree looking at time courses in more depth would be interesting. However, this would require substantial experimentation, which is better suited to a future study. Furthermore, now that we have moved away from comparing undifferentiated NPCs and differentiated NPCs when examining TE and RPF values (Figure 3 and Figure 3–figure supplement 4), we think the results now support a more general mechanism of translation reflected in the irCLIP 3’-UTR vs. RPF correlation, independent of NPC state.

      (3) Analyze the interaction of eIF3 with eIF4G and other known partners. This will really provide an improvement to the manuscript. The lack of interaction between eIF3 and the 40S is quite surprising.

      We agree more work needs to be done on the mechanistic side. These are experiments we think would be best to carry out in a stable cell line in the future, rather than primary cells.

      (4) Perform Oligo-dT pulldown (or cap column if possible) and analyze the relative association of PABP, eIF3, and eIF4F on mRNA in NPC versus HEK293. This will clarify whether this mechanism of mRNA translation is specific to NPC or not.

      Thanks for this suggestion. We are uncertain how it would be possible to deconvolute all the possible ways to interpret results from such an experiment. We agree thinking about ways to study the mechanism will keep us occupied for a while.

      (5) Citations in the text indicate the first author, whereas the references are numbered! 

      Our apologies for this oversight. This was a carryover from previous formatting, and has been fixed.

    2. eLife Assessment

      This valuable study shows previously unappreciated binding of the eukaryotic translation initiation factor 3 (eIF3) to the poly(A) tail proximal portion of 3' untranslated regions (UTRs) of mRNAs that are efficiently translated in neuronal progenitors. The authors' conclusions are supported by solid experimental evidence which is based on several orthogonal systems biology approaches. This article is of considerable interest to the broad spectrum of biomedical researchers interested in studying post-transcriptional regulation of gene expression.

    3. Joint Public Review:

      Reviewers thought that the authors addressed some, but not all, of the concerns raised in the previous round of review.

      Strengths: The authors employed a battery of next-generation sequencing and crosslinking techniques (e.g., Quick-irCLIP, APA-Seq, and Ribo-Seq) to describe a previously unappreciated binding of eIF3 to the 3'UTRs of the mRNAs. It is also shown that eIF3:3'UTR binding occurs in the vicinity of poly(A) tail of mRNAs that are actively translated in neuronal progenitor cells derived from human pluripotent stem cells. Collectively, these findings provide evidence for the role of eIF3 in regulating translation from the 3'UTR end of the mRNA.

      Weaknesses: In addition to these clear strengths of the article, some weaknesses were observed pertinent to the lack of mechanistic data. It was therefore thought that the experiments aiming to dissect the mechanisms of eIF3 binding to 3'UTRs and their impact on translation warrant future studies. Finally, establishing the impact of the proposed eIF3:3'UTR binding mechanism of translational regulation on cellular fate is required to further support the biological importance of the observed phenomena. It was found that this should also be addressed in the follow up studies.

    1. eLife Assessment

      Yu and colleagues used two-sample MR to test the effect of PUFA on cerebral aneurysms. They found that genetically predicted omega-3 and DHA decreased the risk for Intracranial Aneurysm and Subarachnoid Haemorrhage. This work is useful and the revised version provides solid evidence to support the claims.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors performed two-sample MR combined with sensitivity analyses and colocalization to test the effect of PUFA on cerebral aneurysms. They found that genetically predicted omega-3 and DHA decreased the risk for intracranial aneurysm (IA) and subarachnoid haemorrhage (SAH) but not for unruptured IA (uIA).

      Strengths:

      PUFA on the risk of cerebral aneurysms is of clinical importance; the authors performed multiple sensitivity analyses to ensure MR fulfils its assumptions.

    3. Reviewer #2 (Public Review):

      Summary:

      In the manuscript, Yu et al reported a two-sample Mendelian randomization study to evaluate the causation between polyunsaturated fatty acids (PUFA) and cerebral aneurysm, based on summary statistics from published genome-wide association studies. The authors identified that omega-3 fatty acids and Docosahexaenoic acid decreased the risk for intracranial aneurysm (IA) and aneurysmal subarachnoid haemorrhage (aSAH). COLOC analysis suggested that the acids and IA, aSAH likely share causal variants in gene fatty acid desaturase 2.

      Strengths:

      The methodology is sound, with appropriate sensitivity analysis.

      Weaknesses:

      The results did not provide significant novel findings.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) In my opinion, the major weakness is the selection of IVs, the same IVs should be used for each exposure, especially when the outcomes (IA, SAH, and uIA) are closely related. The removal of IVs was inconsistent, for example, why was LPA rs10455872 removed for SAH but not for uIA? (significantly more IVs were used for uIA). The authors should provide more details for the justification of the removal of IVs other than only indicating "confounder" in supplementary tables. The authors should also perform additional analyses including all IVs and IVs from other PUFA GWAS.

      We apologize for our negligence. We repeated the two-sample MR analysis after removing rs10455872 from the uIA instruments, which yielded unaltered ORs and 95% confidence intervals. The P-value again was not statistically significant. These results demonstrate the robustness of our MR analyses and indicate that this SNP does not influence the overall results. (See Figure 4)

      For SNP selection, we adhered rigorously to the established Mendelian randomization process for screening instrumental variables. "Confounder" means a known factor that is explicitly associated with the outcome variable. After removing such confounding SNPs, we repeated the heterogeneity and pleiotropy analyses several times using radial MR, MR-PRESSO, IVW-radial and Egger-radial, removing the corresponding outlier SNPs at each iteration until all pleiotropy and heterogeneity had been eliminated. Therefore, the final set of SNPs is not identical across groups.

      (2) In addition, it seems that the SNPs in the FADS locus were driving the MR association, while FADS is a very pleiotropic locus associated with many lipid traits, removing FADS could attenuate the MR effect. The authors should perform a sensitivity analysis to remove this locus.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we repeated the MR analysis of the positive results after removing FADS2 and all SNPs within 500 kb of the FADS2 locus. This analysis demonstrated that there was no significant causal association between PUFA and IA or aSAH. This result confirmed that SNP rs174564 was a significant factor driving the causal association between PUFAs and CA. (See page 6, line 155-157 and Figure 8)
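
      The locus-exclusion step described in this response can be illustrated with a minimal sketch. It assumes each instrument is a record with a chromosome and a base-pair position; the function name `exclude_near_locus`, the record layout, and all coordinates below are hypothetical and are not taken from the manuscript.

```python
def exclude_near_locus(snps, chrom, locus_start, locus_end, window=500_000):
    """Drop SNPs lying within `window` bp of a locus on the given chromosome.

    snps: list of dicts with keys "id", "chrom" and "pos" (hypothetical layout).
    Returns the SNPs that fall outside the exclusion window.
    """
    kept = []
    for snp in snps:
        same_chrom = snp["chrom"] == chrom
        in_window = locus_start - window <= snp["pos"] <= locus_end + window
        if not (same_chrom and in_window):
            kept.append(snp)
    return kept


# Invented toy instruments and an invented locus; none of these are real coordinates.
toy_snps = [
    {"id": "snp_a", "chrom": "1", "pos": 1_200_000},   # inside locus +/- 500 kb
    {"id": "snp_b", "chrom": "1", "pos": 5_000_000},   # same chromosome, far away
    {"id": "snp_c", "chrom": "2", "pos": 1_200_000},   # different chromosome
]
remaining = exclude_near_locus(toy_snps, chrom="1",
                               locus_start=1_000_000, locus_end=1_050_000)
remaining_ids = [snp["id"] for snp in remaining]
```

      The MR estimate would then be recomputed on `remaining` only, so that any attenuation of the effect can be attributed to the excluded locus.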

      (3) Instead of removing multiple "confounder" IVs which I think may bias the MR results due to very closely related lipid traits, the authors should perform multivariable MR to identify independent effects of PUFAs to IA, conditioning on other PUFAs and/or other lipids.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we employed MVMR, adjusting for HDL cholesterol, LDL cholesterol, total cholesterol and triglycerides, to remove bias from closely related lipid traits. The application of MVMR analysis serves to reinforce the robustness of our conclusions. (See page 6, line 151-153 and Figures 5-7)

      (4) Colocalization was not well described, the authors should include the colocalization results for each locus in a supplementary table. They also mentioned "a large PP for H4 (PP.H4 above 0.75) strongly supports shared causal variants affecting both gene expression and phenotype". The authors should make sure that the colocalization was performed using the expression data of each gene or using the GWAS summary of each PUFA locus.

      I apologize for our negligence. We have added the detailed results of the COLOC for each locus in the supplementary table. (See supplementary table 6)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) I suggest the authors consult Borges et al., 2022 (doi: 10.1186/s12916-022-02399-w) for PUFA IV selection, and perform sensitivity analysis based on Borges et al., 2022 IVs and another PUFA GWAS (such as J Kettunen et al., 2016, doi: 10.1038/ncomms11122).

      Thanks for the reviewer’s suggestion. In order to provide further evidence of the robustness of the results of our analyses, we conducted MVMR and a sensitivity analysis after excluding SNPs within 500 kb of the FADS2 locus, as recommended by Borges et al. (2022). (See page 6, line151-157 and Figure 5-8)

      In regard to the article by J. Kettunen et al. (2016), we found that the validation dataset from which the article was sourced was insufficient in terms of sample size and lacked the requisite statistical efficacy to be used for validation purposes.

      (2) The authors justified that colocalization is to determine if "PUFAs are mediators in the hereditary causative route of cerebral aneurysm", which I don't think is the case.

      Colocalization is to determine whether an MR estimate is not confounded by LD.

      I apologize for our incorrect description. We have made careful modification in our revised manuscript, as follows: “There is consistent evidence that PUFAs have a beneficial causal effect on cerebral aneurysm. In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 7-8, line 215-217)

      (3) Supplementary tables 2-4 were a bit confusing to me, I suggest the authors provide one supplementary table for each exposure.

      Thanks for the reviewer’s suggestion. Supplementary tables 2_1-2_5 show the exposure data for the five PUFAs associated with IA, supplementary tables 3_1-3_5 show the exposure data for the five PUFAs associated with aSAH and supplementary tables 4_1-4_5 show the exposure data for the five PUFAs associated with uIA. Each exposure is represented by a distinct table.

      (4) Figure 1 legend: I can't find multivariable MR in the figure/method.

      I apologize for our negligence. In our revised manuscript, we have added the MVMR methodology. We also have modified Figure 1 and Figure 1 legend. (See Figure 1, Figure 1 legend and page 6, line 151-153)

      (5) LOO analysis was mentioned in methods and results but I could not find the results for LOO.

      I apologize for our negligence. In our revised manuscript, we have described the results of the LOO, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm.” (See page 7, line 209-210)
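
      For readers unfamiliar with the leave-one-out (LOO) analysis discussed here, the following is a minimal sketch using a fixed-effect inverse-variance-weighted (IVW) estimator computed from per-SNP summary statistics. It is an illustration under stated assumptions, not the authors' pipeline: the function names, SNP labels and effect sizes are invented.

```python
import numpy as np


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate: sum(bx*by/se^2) / sum(bx^2/se^2)."""
    beta_exp = np.asarray(beta_exp, dtype=float)
    beta_out = np.asarray(beta_out, dtype=float)
    se_out = np.asarray(se_out, dtype=float)
    numerator = np.sum(beta_exp * beta_out / se_out**2)
    denominator = np.sum(beta_exp**2 / se_out**2)
    return float(numerator / denominator)


def leave_one_out(snp_ids, beta_exp, beta_out, se_out):
    """Recompute the IVW estimate with each SNP removed in turn."""
    beta_exp = np.asarray(beta_exp, dtype=float)
    beta_out = np.asarray(beta_out, dtype=float)
    se_out = np.asarray(se_out, dtype=float)
    estimates = {}
    for i, snp in enumerate(snp_ids):
        keep = np.arange(len(snp_ids)) != i  # boolean mask excluding SNP i
        estimates[snp] = ivw_estimate(beta_exp[keep], beta_out[keep], se_out[keep])
    return estimates


# Invented toy data: snp_c has a much larger exposure effect and a different ratio.
ids = ["snp_a", "snp_b", "snp_c"]
bx = [0.10, 0.10, 0.50]
by = [0.00, 0.00, -0.25]
se = [0.01, 0.01, 0.01]
full = ivw_estimate(bx, by, se)
loo = leave_one_out(ids, bx, by, se)
```

      In this toy example the overall estimate is negative, but it falls to zero once `snp_c` is left out, which is the pattern a LOO plot reveals when a single variant drives the association.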

      (6) Finally, the authors should proofread their manuscript as many sentences are difficult to read, such as:

      Line 183: "...MR methods revealed consistency", "However, there was no any causal relationship..."

      Line 200: "For achieve that..."

      I apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “The results demonstrated consistency in the outcomes and directionality of the various MR methods employed” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 7, line 187-188 and line 215-217).

      Reviewer #2 (Recommendations For The Authors):

      (1) Are there any previous epidemiological studies on the association between PUFA and cerebral aneurysm? It will be helpful to introduce this background.

      Thanks for the reviewer’s suggestion. The epidemiology of PUFA in relation to aneurysms at other sites, such as the abdominal aorta, is described in the Introduction section. Although there is a paucity of large-scale multicenter clinical epidemiological studies examining the relationship between PUFAs and cerebral aneurysms, we are endeavoring to infer a prior association between PUFAs and cerebral aneurysms with the aid of Mendelian randomization analysis.

      (2) The authors performed a leave-one-out analysis but did not explain much about the results. The leave-one-out analysis seems to provide some evidence that some SNP is driving the results, like rs174564 in Supplementary Figure 5-1.

      I apologize for our negligence. In our revised manuscript, we have described the results of the leave-one-out analysis, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm”. (See page 7, line 209-214)

      (3) In the discussion (line 211), the authors mentioned omega-6 fatty acids increased the risk of IA and aSAH, omega-3 fatty acids decreased the risk for IA and aSAH, but omega-6 by omega-3 decreased the risk of IA and aSAH. This seems to be different from the figures.

      I apologize for our incorrect description. We have modified this description in our revised manuscript, as follows: “We demonstrated that the omega-3 fatty acids, DHA and, omega-3-pct causally decreased the risk for IA and aSAH. And omega-6 by omega-3 causally increased the risk of IA and aSAH”. (See page 8, line228-230)

      Minor:

      (4) Some grammar errors need to be checked, such as:

      In line 200, "For achieve that, we tested for shared causative SNPs between PUFAs and cerebral aneurysm using COLOC".

      In line 123, "Fourth, to eliminate unclear, palindromic and associated with known confounding factors (body mass index (McDowell et al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018)) SNPs."

      I apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “Fourth, remove SNPs that are obscure, palindromic, and linked to recognized confounding variables (body mass index (McDowell et al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018))” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 5, line 124-127 and page 7 line215-217)

    1. eLife Assessment

      This work provides important findings characterizing potential synaptic mechanisms supporting the role of midline thalamus-hippocampal projections in fear memory extinction in mice. The methods and approaches were considered solid, though some evidence is incomplete as there are some concerns with the analytical approaches used for some aspects of the study. This work will be of interest to those in the field of thalamic regulation and fear memory.

    2. Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal CA1 of the hippocampus plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a new finding. Yet, the potential novelty and excitement comes from the authors' identification of structural alterations from RE projecting neurons to the specific stratum lacunosum moleculare subregion of CA1 after learning. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. The significance of these findings was previously hampered by several technical shortcomings in the experimental design and interpretation. The authors adequately addressed some of the design concerns raised in the previous round, along with the interpretive critique that they couldn't localize the timing of effects to consolidation as originally claimed. Nevertheless, the authors provided an inadequate response to the concern regarding their misapplication of Ns and missing controls in one experiment.

      In the previous review, a major methodological weakness in the experimental design involved the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway used N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7). In each instance it was recommended that N = animal number should be used. Reasons for this are as follows: this is standard practice in neuroanatomical research; using N = branch/ dendrite/ bouton/ spine number artificially inflates the statistical power and this incorrectly assumes independence of observations; using N = number of sections, etc., doesn't account for imbalances in the number of observations that vary from animal to animal that may skew group results.
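
      The recommendation to use N = animal number can be illustrated with a minimal sketch that collapses repeated measurements (for example, one value per dendrite) into one value per animal before any group statistics are computed. The function name, animal labels and values are invented for illustration and do not come from the study.

```python
from statistics import mean


def per_animal_means(observations):
    """Collapse repeated measurements to one value per animal.

    observations: iterable of (animal_id, value) pairs, e.g. one pair per dendrite.
    Returns a dict mapping animal_id to the mean of that animal's values.
    """
    by_animal = {}
    for animal_id, value in observations:
        by_animal.setdefault(animal_id, []).append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


# Invented toy data with unequal numbers of observations per animal.
toy = [("m1", 1.0), ("m1", 1.2), ("m1", 1.1),
       ("m2", 2.0),
       ("m3", 1.5), ("m3", 1.7)]
animal_means = per_animal_means(toy)
n_for_statistics = len(animal_means)  # 3 animals, not 6 observations
```

      Group comparisons would then be run on the per-animal values, so that animals contributing many dendrites do not dominate the result and the degrees of freedom reflect the number of independent subjects.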

      In the authors' response, they generally concurred, but then they followed up with the defense that the number of items was too few in some cases, or absent in others, to permit using N = animal number. While they changed some of their data to N = animal numbers, other aspects of their data remained as-is. The description of the statistics in the figure legend is also dense and difficult to follow in places. Ns should be checked in the legend and figure to make sure they're correct, as at least one error was noted (e.g., see Fig. 2C). Overall, the authors' response falls short of the standard of rigor that helps to reinforce scientific findings from reliability and reproducibility concerns when generating more data to increase Ns (i.e., the number of animals) would have been the better choice.

      Another persistent concern from the previous review is that, in the electron microscopic analyses of dendritic spines (Fig. 5), the authors only compared fear acquisition versus extinction training. One critique was that the lack of inclusion of a naïve control group made it difficult to understand how these structural synaptic changes are occurring relative to baseline. It was also noted that the authors appropriately included naïve controls in other experiments in the paper. In the revised submission the authors simply added in naïve control data to their previous histogram. It is not considered good practice to collect, process, or analyze data one group at a time, as this would be prone to cohort effects or experimental bias. These data should be discarded and the experiment should be run correctly with randomized cases in each group, or instead these data should be eliminated from the report since there is a key control group missing. Again, the nature of the authors' response perpetuates the aforementioned concern that data collection and analysis in this report may fall short of an acceptable standard of rigor.

    3. Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms at the basis of the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid and methodology appropriate. The strength of this work is that it nicely builds up on existing literature and provides new molecular insight on a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The results illustrated in this manuscript show nice incremental evidence about the neural mechanisms contributing to the RE-CA1 modulation of fear extinction. The novelty of this manuscript is therefore not exceptional, but still highly relevant for the field.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We do agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed a very low number of entries. For example, in the 5US group we found RE+/+ spines only in 3 out of 6 analyzed animals. We believe that this observation is not due to technical problems as mCherry virus transduction required to find RE+/+ spines is similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the calculation of density of RE+/+ spines per animal it generates no entries for spine area and PSD95 mean gray value if N = animal number. Hence, we decided to use N=animals to calculate spines and boutons densities, and N=dendritic spines/boutons to calculate other spine/bouton parameters. 

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: “For assessment of learning we used percent of time spent by animals freezing (% freezing). Freezing behavior was defined as complete lack of movement, except respiration. To assess within-session learning (working memory) we compared pre- and post-US freezing frequency (the first 148 sec vs last 30 sec) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within session contextual fear extinction we ran 2-way ANOVA to assess the effect of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory we compared the first 5 minutes of the Extinction session (day 2) and Test session (day 3).”
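
      The binned freezing measure described in this passage can be sketched as follows. The sketch assumes that freezing has already been scored as a 0/1 value per sample at a fixed sampling rate; the function name, the sampling rate and the toy session are assumptions for illustration, not details of the authors' analysis.

```python
import numpy as np


def percent_freezing_by_bin(freezing, samples_per_second=1, bin_minutes=5):
    """Return % freezing for each complete bin of `bin_minutes` minutes.

    freezing: sequence of 0/1 values, one per sample (1 = freezing).
    Samples in an incomplete trailing bin are dropped.
    """
    freezing = np.asarray(freezing, dtype=float)
    bin_size = int(bin_minutes * 60 * samples_per_second)
    n_bins = len(freezing) // bin_size
    trimmed = freezing[: n_bins * bin_size].reshape(n_bins, bin_size)
    return 100.0 * trimmed.mean(axis=1)


# Invented 10-minute session sampled once per second:
# constant freezing in the first 5 minutes, alternating in the last 5.
session = np.concatenate([np.ones(300), np.tile([1, 0], 150)])
bins = percent_freezing_by_bin(session)
```

      For this toy session the first bin is 100% freezing and the second is 50%; a 30-minute Extinction session would yield six such values per animal for the two-way ANOVA.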

      As suggested by the reviewer, we also added data for all six 5-minute bins of the Extinction sessions.

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H).

      The chemogenetic experiment shown on Figure 7 has all 4 experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data shows a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these rats showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript. We just conclude that RE→CA1 participates in the CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms at the basis of the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid and methodology appropriate. The strength of this work is that it nicely builds up on existing literature and provides new molecular insight on a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text:

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Fig 1 and 2). In these cases the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy of PSDs in CA1 (Figure 3, 4 and 7). In these cases we assessed the integrated mean gray value of PSD-95 signal per dendritic spine (Figure 3, 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during boutons analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (a synaptic scaffold used as a marker of excitatory synapses located both on excitatory and inhibitory neurons) (Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer the Figure 2C panel was moved to Figure 3 (now it is Fig 3A).

      (2) in the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 uM seems pretty low in comparison with previous studies. CNO source is missing.

      This section has been rewritten to be more clear. The concentration of CNO was chosen based on the previous studies (Stachniak et al., 2014).

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      (3) More details of what software/algorithm was used to score freezing should be included.

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: “Next, sections were incubated in 4°C overnight with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS and incubated in room temperature for 90 minutes with a secondary antibody bound with Alexa Fluor 647 (1:500, Invitrogen, A31571).”

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, the authors show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 hours or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced the levels of PSD-95 protein in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We do agree with the reviewer that such an experiment would be interesting. However, it could also be confusing, as we could not distinguish whether the possible behavioral effects are related to the state-dependent aspects of CFE or to impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see the effects of CNO on CFE in the control groups (Figure 6D and H). So we believe that it is unlikely that CNO results in state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups for drawing conclusions about synaptic plasticity related to a particular behavior are those that differ by just one factor or experience. For CFE, these would be mice sacrificed immediately before and after the CFE session (Figures 2 & 3); on the other hand, to draw conclusions about the effects of re-exposure to the neutral context, mice sacrificed before and after the second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in either case. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown in Figures 2 & 3 to check whether CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions.

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other/ Minor:

      In the beginning of the second paragraph on p. 21 of the Results section, it states that "RE-dCA1 has no effect on working memory," although it was not clear what data the authors were referring to support this conclusion.

      We refer there to the changes of freezing behavior within the CFE session. This is explained now.

      Reviewer #2 (Recommendations for the authors):

      No statement about code and data availability is present.

      The statements are added.

      ln 785: “Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).”

    1. eLife Assessment

      This important study established a large-scale objective and integrated multiple optical microscopy systems to demonstrate their potential for long-term imaging of the developmental process. The convincing imaging data cover a wide range of biological applications, such as organoids, mouse brains, and quail embryos, but improving image quality could further enhance the method's effectiveness. This work will appeal to biologists and imaging technologists focused on long-term imaging of large fields.

    2. Reviewer #1 (Public review):

      Summary:

      The authors are trying to develop a microscopy system that generates data output exceeding the previous systems based on huge objectives.

      Strengths:

      They have accomplished building such a system, with a field of view of 1.5 × 1.0 cm² and a resolution of up to 1.2 µm. They have also demonstrated their system performance on samples such as organoids, brain sections, and embryos.

      Weaknesses:

      To be used as a volumetric imaging technique, the authors only showcase the implementation of multi-focal confocal sectioning. On the other hand, most of the real biological samples were acquired under wide-field illumination, and processed with so-called computational sectioning. Despite the claim that it improves the contrast, sometimes I felt that the images were oversharpened and the quantitative nature of these fluorescence images may be perturbed.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript introduced a volumetric trans-scale imaging system with an ultra-large field-of-view (FOV) that enables simultaneous observation of millions of cellular dynamics in centimeter-wide 3D tissues and embryos. In terms of technique, this paper is just a minor improvement of the authors' previous work, which is a fluorescence imaging system working in the visible wavelength region (https://www.nature.com/articles/s41598-021-95930-7).

      Strengths:

      In this study, the authors enhanced the system's resolution and sensitivity by increasing the numerical aperture (NA) of the lens. Furthermore, they achieved volumetric imaging by integrating optical sectioning and computational sectioning. This study encompasses a broad range of biological applications, including imaging and analysis of organoids, mouse brains, and quail embryos, respectively. Overall, this method is useful and versatile.

      Weaknesses:

      The unique application that can only be done by this high-throughput system remains vague. Meanwhile, there are also several outstanding issues in this paper, such as the lack of technical advances, unclear method details and non-standardized figures.

      Comments on revisions:

      The revised manuscript has significantly improved in response to the initial review comments, particularly with the detailed additions regarding the objective lens and confocal imaging modes, which enhance the clarity and comprehensibility of the paper. While the structure and arguments are much clearer overall, there are still key issues that need to be addressed, specifically regarding algorithm validation, computational sectioning presentation, and volume imaging rate.

      Algorithm Validation: The validation of the algorithm's accuracy is not sufficiently robust. Reviewer 1's comment is entirely reasonable, and the authors should validate the algorithm's accuracy using well-established methods as ground truth. In the revised version, the authors attempt to demonstrate the fidelity of the algorithm by employing deep learning methods for high-accuracy cell recognition. However, this validation relies solely on comparisons between deep learning results and manual annotation results. The problem lies in the fact that both manual annotations and deep learning outcomes are derived from algorithm-processed data, which fails to prove the authenticity or validity of the data itself. To strengthen the validation, the authors should incorporate independent, gold-standard methods for comparison.

      Computational Sectioning: In the revised manuscript, the authors effectively demonstrate the ability of optical sectioning to improve axial resolution using fluorescent beads, as shown in Fig. S3, which is a strong point. However, the manuscript lacks a direct comparison for computational sectioning and does not provide a clear evaluation of axial resolution before and after applying computational sectioning. While some related information is included in Figs. 5C and D, the details are insufficient, and intensity profiles are absent. I recommend that the authors include more direct visual demonstrations of computational sectioning, along with comparisons of axial resolution before and after applying computational sectioning. This would better showcase the method's effectiveness.

      Volume Imaging Rate: The manuscript currently omits critical details about the method's volume imaging rate. In the description of the quail embryo imaging experiment, key parameters such as exposure time and imaging speed are missing. Additionally, the manuscript does not discuss the maximum imaging rate supported by the system in confocal mode. The volume imaging rate is an essential factor for biological researchers to evaluate the applicability of the technique. Therefore, this information should be included, ideally in the abstract and introduction. Furthermore, the authors could describe how the volume imaging rate performs under different conditions and discuss its potential applications across various biological research contexts. Including such details would significantly enhance the paper's utility and appeal to the broader research community.

      These adjustments will further strengthen the manuscript and address the reviewers' concerns effectively.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors are trying to develop a microscopy system that generates data output exceeding the previous systems based on huge objectives. 

      Strengths: 

      They have accomplished building such a system, with a field of view of 1.5 × 1.0 cm² and a resolution of up to 1.2 µm. They have also demonstrated their system performance on samples such as organoids, brain sections, and embryos.

      Weaknesses: 

      To be used as a volumetric imaging technique, the authors only showcase the implementation of multi-focal confocal sectioning. On the other hand, most of the real biological samples were acquired under wide-field illumination, and processed with so-called computational sectioning. Despite the claim that it improves the contrast, sometimes I felt that the images were oversharpened and the quantitative nature of these fluorescence images may be perturbed. 

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript introduced a volumetric trans-scale imaging system with an ultra-large field-of-view (FOV) that enables simultaneous observation of millions of cellular dynamics in centimeter-wide 3D tissues and embryos. In terms of technique, this paper is just a minor improvement of the authors' previous work, which is a fluorescence imaging system working in the visible wavelength region (https://www.nature.com/articles/s41598-021-95930-7).

      Strengths: 

      In this study, the authors enhanced the system's resolution and sensitivity by increasing the numerical aperture (NA) of the lens. Furthermore, they achieved volumetric imaging by integrating optical sectioning and computational sectioning. This study encompasses a broad range of biological applications, including imaging and analysis of organoids, mouse brains, and quail embryos, respectively. Overall, this method is useful and versatile. 

      Weaknesses: 

      The unique application that can only be done by this high-throughput system remains vague. Meanwhile, there are also several outstanding issues in this paper, such as the lack of technical advances, unclear method details, and nonstandardized figures.

      Here, we address the first part of the Weaknesses concerning the unique application, and will respond to the latter part in the Reply to the Recommendations.

      We are developing a 'large field of view with cellular resolution' imaging technique, aiming to apply it to the observation of multicellular systems consisting of a large number of cells. Our proposed optical system has achieved optical performance that enables simultaneous observation of more than one million cells in a single field of view. In this paper, we have succeeded in adding three-dimensional imaging capability while maintaining the size of this two-dimensional field of view. By simultaneously observing the dynamics of a large number of cells, we can reveal spatio-temporal sequences in state transitions (pattern formation, pathogenesis, embryogenesis, etc.) in multicellular systems and discover cells that serve as a starting point. These points were mentioned in the 1st and 2nd paragraphs of the Introduction section (Line 48-, 58-) and discussed in the 4th paragraph of the Discussion section (Line 398-) of the main text. While our previous work on two-dimensional specimens has shown its validity, the present work demonstrates that temporal changes of multicellular systems in three-dimensional specimens can be observed at the single-cell level.

      Ideally, we aim to achieve the same level of depth observation capability as the FOV size in the lateral direction. However, at present, the penetration depth for living specimens is limited to a few hundred micrometers due to non-transparency, while the lateral FOV size exceeds 1 cm. The current optical performance is well-suited for systems where development occurs within a thin volume but a large area, such as the quail embryo presented in this paper (Fig. 6 in the revised manuscript). In addition to quail embryos, this technique can also be applied to the developmental systems of highly transparent model organisms, such as zebrafish. Furthermore, chemically cleared specimens, even those thicker than 1.5 mm, can be observed, as shown in this paper (Fig. 5 in the revised manuscript). Beyond the brain, the technique could also be applied to other organs and to imaging entire living organisms. However, for observation depths up to 10 mm, such as in the whole mouse brain, a mechanism to compensate for spherical aberration is required, which we consider the next step in our technological development.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) I suggest that authors shall re-examine the quantitative nature of their image processing algorithm. Also, I wonder whether there are parameters that could be adjusted, as images in Figure 3D and 4E seem to be oversharpened with potential loss of information. 

      As the reviewer pointed out, we recognized that there was an insufficient explanation of the image processing.

      Therefore, descriptions on the quantitative nature and parameter adjustments have been added to the text (Materials and Methods, Line 552) and the Supplementary File (Fig. S4-5, Note 2), and these have been referenced in the main text. A summary is given below.

      The adjustable parameters in our method include the cutoff frequency of the smoothing filter used in the background light estimation. If the cutoff frequency is too high, the focal plane component will be included in the “background”; if it is too low, background light will remain in the focal plane. The cutoff frequency needs to be optimized within this range. In this optimization, neither the size of the cell itself nor the performance of the optical system was considered; instead, we utilized the concept of independent component analysis (ICA). This approach is taken because the size and structure of cells vary from sample to sample, and the optical properties also vary with wavelength and location, making it impractical to consider each factor for every case. ICA employs a blind separation method, which is based on the principle that individual signals deviate from the normal (Gaussian) distribution, while the superimposition of signals tends to bring the distribution closer to the Gaussian distribution. Several indices have been proposed to quantify the non-Gaussian nature of the distribution, including kurtosis, skewness, negentropy, and mutual information. Among these measures, we empirically found skewness to be the most suitable and robust, and therefore adopted it for our algorithm. The optimal parameters were selected using a subset of the data, and the determined values were then applied to the entire dataset.
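
      The selection principle described above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: a simple moving average stands in for the smoothing filter, the synthetic signal and the candidate widths are invented for illustration, and the function names (`skewness`, `moving_average`, `select_half_width`) are hypothetical. It only shows the idea of scoring each candidate cutoff by the skewness of the background-subtracted signal and keeping the highest-scoring one.

```python
import math
import random


def skewness(values):
    """Third standardized moment of a list of numbers (0.0 if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def moving_average(signal, half_width):
    """Crude low-pass filter; a larger half_width corresponds to a lower cutoff frequency."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def select_half_width(signal, candidates):
    """Return the candidate whose background-subtracted residual has the largest skewness."""
    def score(half_width):
        background = moving_average(signal, half_width)
        residual = [s - b for s, b in zip(signal, background)]
        return skewness(residual)
    return max(candidates, key=score)


# Synthetic line profile (assumed): slow background + sparse narrow peaks + Gaussian noise.
random.seed(0)
n = 400
signal = [
    5.0 + 2.0 * math.sin(2 * math.pi * i / n)   # out-of-focus background
    + (10.0 if i % 40 == 20 else 0.0)            # in-focus structures
    + random.gauss(0.0, 0.2)                     # detection noise
    for i in range(n)
]
candidates = [1, 3, 10, 30, 100]
best = select_half_width(signal, candidates)
print("selected half-width:", best)
```

      In practice the search would run on a representative subset of the image data, as the response describes, and the selected value would then be reused for the full dataset.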

      Regarding the oversharpening, we believe that it rarely occurs in the image data shown in the manuscript. In a case where low-frequency structures and high-frequency structures are mixed in the focal plane, an oversharpening-like effect can occur because of the disappearance of low-frequency structures, which is discussed in the Supplementary File (Note 2, Fig. S5D). However, in the case of a sample with nearly uniform spatial frequency, such as the nuclei observed in this study, oversharpening is unlikely to occur when appropriate parameters are set as described above. If some images in the figures appear oversharpened, this is due to the contrast of the image.

      (2) On the other hand, I am curious how a wide-field fluorescence system may reliably extract information from a densely labeled sample within an axial volume of 200 um, as they showed in the mouse brain in Figure 4. Thus I am skeptical regarding the fidelity and completeness of the signals and cells recorded there. It would be ideal if the authors could benchmark their system performance with a two-photon microscope system, which serves as the ground truth. 

      The reviewer's suggestion is reasonable; however, we are unfortunately unable to observe the same sample using a two-photon microscope. Instead, we will explain these differences from a theoretical perspective. Two-photon microscopes used for brain imaging typically employ objective lenses with a numerical aperture (NA) of at least 0.5, allowing for 3D imaging with depth resolution ranging from several micrometers down to sub-micrometer levels. In contrast, our method uses a lens system with an NA of 0.25, and the optical configuration (focusing NA, pinhole size) is not optimized for resolution (Note 2 in Supplementary File); thus the longitudinal resolution (FWHM) is about 14 microns (Fig. 3E in the revised manuscript). This difference is significant in brain imaging, where our method may not fully separate all cells in close proximity along the depth axis, as shown in the bottom panels (xz-plane) of Fig. 5F of the revised manuscript. Nevertheless, we believe that cell nuclei can be accurately detected in this 3D image using appropriate cell detection methods based on deep learning. To support this claim, we conducted cell detection using the state-of-the-art cell detection platform ELEPHANT and incorporated the results into Fig. 5 (Fig. 5G-I). This figure demonstrates that even with the current spatial resolution, accurate detection of cell nuclei is achievable.
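
      As a rough order-of-magnitude check on the NA comparison above, the following sketch evaluates the common wide-field diffraction estimate of axial extent, 2nλ/NA². This is a textbook approximation rather than a formula taken from the manuscript; the emission wavelength (0.5 µm), the refractive index (1.0), and the function name `axial_extent_um` are assumptions chosen for illustration. Confocal and two-photon systems follow different expressions, so only the 1/NA² scaling should be read from it.

```python
def axial_extent_um(wavelength_um, numerical_aperture, refractive_index=1.0):
    """Wide-field diffraction estimate of axial extent: 2 * n * lambda / NA**2."""
    return 2.0 * refractive_index * wavelength_um / numerical_aperture ** 2


wavelength_um = 0.5  # assumed emission wavelength, not a value from the manuscript
for na in (0.25, 0.5):
    print(f"NA = {na}: ~{axial_extent_um(wavelength_um, na):.1f} um")
# NA = 0.25: ~16.0 um
# NA = 0.5: ~4.0 um
```

      Under these assumptions the estimate for NA 0.25 is of the same order as the roughly 14-micron FWHM quoted in the response, and doubling the NA shortens the axial extent fourfold.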

      We accordingly added one paragraph (Line 285) in the main text to explain the cell detection method and discuss the results. We also added one section into Materials and Methods for more detail of the cell detection (Line 650).

      In conjunction with the revision, the developer of ELEPHANT (K. Sugawara) has been included as a co-author.

      Reviewer #2 (Recommendations For The Authors): 

      In my opinion, the following concerns need to be addressed. 

      Major comments: 

      (1) The proposed system's crucial element involves the development of a giant lens system with a numerical aperture (NA) of 0.25. However, a comprehensive introduction and explanation of this significant giant lens system are missing from the manuscript. I strongly suggest that the authors supplement the relevant content to provide a clearer understanding of this integral component. 

      A detailed description of the giant lens system has been added to the main text (Optical Configuration and Performance, Line 83) and the Materials and Methods section (Wide-field imaging system (AMATERAS-2w) configuration, Line 446). A diagram of the lens configuration has also been included in Fig. 1A. In conjunction with these additions, two engineers from SIGMAKOKI CO. LTD., who made significant contributions to the design and manufacturing of the lens system, have been included as co-authors.

      (2) The manuscript introduces a computational sectioning technique based on iterative filtering. However, the accuracy of this algorithm is not sufficiently validated. 

      It is challenging to discuss the accuracy of the processing results compared to the ground truth, because the ground truth is unknown for any of the experiments. Instead, in the Supplementary File (Note 2, Figures S4-5), we show how the processing results for the measured and simulated data vary with the parameter (cutoff frequency), illustrating the characteristics of our method. The results suggest that by optimally pre-selecting the parameter, it is possible to successfully separate the in-focus and out-of-focus components. This discussion is related to our response to the first recommendation made by Reviewer #1. Please review our response to Reviewer #1 regarding parameter optimization and oversharpening. Here, as an addition, we discuss the conditions that must be met in order to perform the calculation correctly (also included in Note 2, Limitation of the computational sectioning).

      To apply this method, certain requirements must be met regarding cutoff spatial frequency and intensity. Regarding cutoff spatial frequency, the algorithm utilizes a low-pass filter with a single cutoff frequency, which can make it challenging to accurately extract structures in the focal plane when structures of varying sizes and shapes are mixed within the sample. This is illustrated by the simulation shown in Fig. S5 and described in Note 2. Conversely, regarding intensity, if the structure’s intensity in the focal plane is weak compared to the Gaussian fluctuations in the background intensity, it becomes difficult to extract the structure. However, intensity fluctuations can be reduced by applying a 3x3 moving average filter to the entire image as a pre-processing step before applying the baseline estimation algorithm. 
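
      The 3x3 moving average pre-processing step mentioned above can be sketched as follows. This is an illustrative stand-in, not the authors' code: the function name `moving_average_3x3` is hypothetical, the image is a plain list of rows, and the edge handling (averaging only the neighbors that exist) is an assumption, since the response does not say how borders are treated.

```python
def moving_average_3x3(image):
    """Replace each pixel by the mean of its 3x3 neighborhood.

    Pixels on the border are averaged over the neighbors that exist (assumed behavior).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


# A single bright pixel is spread over its neighborhood, lowering pixel-to-pixel fluctuation.
noisy = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
smoothed = moving_average_3x3(noisy)
print(smoothed[1][1])  # 9 / 9 = 1.0
```

      Averaging over nine pixels reduces uncorrelated intensity fluctuations at the cost of a slight loss of lateral detail, which is the trade-off implied by using it before baseline estimation.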

      In the experimental data presented in this paper (Figs. 4-6 in the revised manuscript), the spatial frequency issue was not significant because the target structures, which are stained nuclei, appear to be of nearly uniform size in the focal plane. The second issue, related to intensity, is also addressed in Fig. 4, as the signal intensity from the focal plane is sufficient to overcome background light in almost all regions. In the mouse brain example, the use of confocal imaging suppresses background light, allowing the structures in the focal plane to be accurately extracted.

      (3) I didn't see a detailed description of the confocal imaging in the manuscript. If it adheres to conventional confocal technology, then the question arises: what truly constitutes the novel aspect of this technique? 

      The principle of confocal imaging and optics is based on the use of a pinhole array, a system also employed commercially by CrestOptics (X-Light, Italy). Prior to the 1990s, when the configuration utilizing Yokogawa Electric's pinhole array and microlens array pairs became popular, pinhole array-only setups were the norm, and are now considered somewhat traditional. We do not claim novelty in the optical configuration itself, but rather in the design of a confocal optical system tailored for our original large-field (low-magnification) imaging system with a relatively high NA. The pinhole array disk we designed features significantly smaller pinholes and correspondingly tighter pinhole spacing than those used for high-magnification observation purposes. We believe that this unique size and arrangement provides sufficient novelty.

      We have revised the manuscript to clearly emphasize what we believe constitutes the novelty of this technique (paragraphs starting from Line 166 and Line 183). We have also added a discussion on our confocal optical configuration and its spatial resolution in the Supplementary File (Note 1, Fig. S2-3).

      (4) Light-sheet and light-field microscopy, two emerging 3D microscopy techniques that have theoretically higher throughput than confocal, are not sufficiently introduced in this manuscript. 

      In the previous version, we briefly mentioned light-sheet and light-field microscopy, but we recognized that more detailed explanations were necessary and should be included in the manuscript. We have added several sentences to the main text (Line 159-165). A summary is provided below. 

      Light-sheet microscopy requires the illumination light to propagate over long distances within the specimen, and many applications necessitate the use of transparency-enhanced tissue. Even when the sample is highly transparent, no existing technique can form thin optical sections as long as 1 cm. Therefore, light-sheet microscopy is not an effective method for the thin, wide, three-dimensional objects that are the focus of this project. Light-field microscopy features a trade-off in which the number of pixels in the two-dimensional plane is reduced in exchange for the ability to record three-dimensional fluorescence distribution information in a single shot. In our imaging system, the pixel spacing is set close to the Nyquist sampling interval in order to observe as many cells as possible, meaning that no further pixels can be sacrificed. Therefore, the light-field microscopy technique is not suitable for our imaging system.

      (5) The fluorescence images of cardiomyocytes derived from human induced pluripotent stem cells (hiPSCs) stained with Rhodamine phalloidin, as presented in Figure 1(E), exhibit suboptimal quality. This may hinder the effective use of the image for biological research. It is imperative that the authors address and explain this aspect, shedding light on the limitations and potential implications of the research findings. 

      We acknowledge the reviewer’s concern regarding the suboptimal quality of the fluorescence image. Upon further examination, we recognized that the resolution and clarity of the image could potentially limit its utility for detailed biological analysis. To address this, we have re-examined the image size and quality to enhance its presentation in Fig. 2C-E in the revised manuscript, which allows for finer structures to be recognized within the large image size.

      Regarding the effective use of the image for biological research, the results shown in the images indicated the capability of observing subcellular structures, such as myofibrils, in cell sheets with a large area, such as myocardial sheets. This would enable us to simultaneously investigate micro-level structures (orientation and density of myofibrils) and macro-level multicellular dynamics (performance of myocardial sheet). We added the above explanation in the manuscript (Line 146). We hope this revision clarifies the quality and utility of the presented image.

      (6) The imaging quality difference between the two techniques shown in Figure 1F, G is relatively small, and the signal distribution difference shown in Figure H is significant, unlike the effects expected from an improvement in resolution. 

      We acknowledge the reviewer's concern regarding the minimal apparent difference in imaging quality between the two images. Upon re-evaluation, we recognized that the original presentation may not have clearly demonstrated the improvements intended by the different techniques. Figure 1H, which showed the line profile of Figs. 1F and G, may have been impacted by the resolution and compression settings of the image file, leading to a less pronounced distinction between the two techniques. To address this, we have enlarged Figs. 1F and 1G (renumbered as Fig. 2D and 2E in the revised manuscript) and carefully reviewed the resolution and compression ratio to ensure that the differences are more clearly visible.

      (7) The chart in Figure 2(C) lacks axis titles and numerical labels, making it challenging for readers to comprehend. To enhance reader convenience, it is recommended that the authors incorporate axis titles and numerical labels, providing a clearer context for interpreting the chart. 

      We appreciate the reviewer’s observation regarding the lack of axis titles and numerical labels in the figure. The vertical axis represents fluorescence intensity, which we initially omitted, assuming it was self-evident. However, as the reviewer correctly pointed out, it is crucial to ensure that figures are clear and accessible to readers from diverse backgrounds. In response, we have added the vertical axis title to Fig. 2C (renumbered as Fig. 3C in the revised manuscript) to enhance clarity, while the numerical labels remain omitted as the unit is arbitrary (a.u.). We have also reviewed all other figures in the manuscript to ensure that no similar errors are present.

      (8) In Figures 2(D) and (E), where the authors present the point spread function for quantifying the lateral and axial resolution of the system, I would recommend increasing the number of fluorescent microspheres to more than 10 for statistical averaging. This adjustment would strengthen the persuasiveness of the data and contribute to a more robust analysis. 

      We appreciate the reviewer’s recommendation to increase the number of fluorescent microspheres for statistical averaging in Figs. 2D and E (renumbered as Fig. 3D-E in the revised manuscript). In response, we have revised the graphs to present the point spread function with the statistical mean and standard deviation (SD) of fluorescent images obtained from a large sample size (N = 100), and accordingly revised the main text to mention the statistics (Line 118, Line 132). We also recognized that a similar adjustment was necessary for Figs 1C and D (renumbered as Fig. 2A-B in the revised manuscript), and have accordingly made the same modifications to those figures as well. We believe these changes enhance the robustness and persuasiveness of our data.

      (9) Figure 4(C) visually represents the characteristic 3D structures of several regions. However, discerning the 3D structural information in the images poses a challenge. To address this issue, I recommend that the authors optimize the 3D visualization to improve clarity and facilitate a more effective interpretation of the depicted structures. 

      We appreciate the reviewer’s suggestion regarding the challenges in discerning the 3D structural information in Fig. 4C. To address this, we have added representative images from the xy-plane and xz-plane of the cortex, medial habenula, and choroid plexus (Fig. 5G-I) in the revised manuscript. These additions provide a clearer visualization of the 3D distribution in each region, making it easier for readers to interpret the structures. Additionally, we have overlaid the results of deep-learning based cell detection on these images, further enhancing the visibility of the cells. This adjustment also aligns with our response to Reviewer #1's second comment.

      Minor comments: 

      (1) The labelling of ROI is missing in Figure 1(e). 

      We appreciate the reviewer’s observation regarding the missing labeling of the ROI in Fig. 1E. Upon review, we confirmed that the ROI was indeed labeled with a white square in the previous manuscript; however, it was difficult to discern due to its small size and the black-and-white contrast. To improve visibility, we have recolored the square in magenta, ensuring that it stands out more clearly in the figure (Fig. 2C in the revised manuscript).

      (2) The subfigure order and labeling in Fig. 1 and Fig. 2 are not consistent.

      We appreciate the reviewer’s attention to the subfigure order and labeling in Fig. 1 and 2 (Fig. 1-3 in the revised manuscript). To accommodate subfigures of varying sizes without leaving gaps, we arranged the subfigures in a non-sequential order. However, we have ensured that the text refers to the figures in the correct order. We acknowledge the importance of consistency and will work with the editorial team to explore the best way to present the figures while maintaining clarity and alignment with standard practices.

      (3) Figure 1B reappears in Figure 2.  

      We appreciate the reviewer’s observation regarding the repetition of Figure 1B in Figure 2. While the central part of the optical system (custom lens system) is common to both figures, the illumination system, pinhole array disk, and detection optics for the confocal set up differ. To provide a complete understanding of the optical system, we opted to include the full diagram in Fig. 2B (renumbered as Fig. 3B in the revised manuscript). We considered highlighting only the different components, but we felt that doing so might complicate the reader’s comprehension of the overall system. Therefore, we chose to include the common elements twice to ensure clarity.

    1. eLife Assessment

      Wang et al. presented visual (dot) motion and/or the sound of a walking person and found solid evidence that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm, with some demonstration that the gait rhythm is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition). The valuable findings will be of wide interest to those examining biological motion perception and oscillatory processes more broadly. Some of the theoretical interpretations concerning entrainment must remain speculative when the authors cannot dissociate evoked responses from entrained oscillatory effects.

    2. Reviewer #1 (Public review):

      Summary:

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      Weaknesses:

      There is a concern of double-dipping in one of the tests (Experiment 2, Figure 3: interaction of Upright/Inverted X Congruent/Incongruent). I raised this concern on the original submission, and it has not been resolved properly. The follow-up statistical test (after channel selection using the interaction contrast permutation test) still is geared towards that same contrast, even though the latter is now being tested differently. (Perhaps not explicitly testing the interaction, but in essence still testing the same.) A very simple solution would be to remove the post-hoc statistical tests and simply acknowledge that you're comparing simple means, while the statistical assessment was already taken care of using the permutation test. (In other words: the data appear compelling because of the cluster test, but NOT because of the subsequent t-tests.)

    3. Reviewer #2 (Public review):

      Summary:

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      Weaknesses:

      In the revised version of the paper, the manuscript better relays the results and anticipates analyses, and this version adequately resolves some concerns I had about analysis details. Still, it is my view that the findings of the study are basic neural correlate results that do not provide insights into neural mechanisms or the causal relevance of neural effects towards behavior and cognition. The presence of an inversion effect suggests that the supra-additivity is related to cognition, but that leaves open whether any detected neural pattern is actually consequential for multi-sensory integration (i.e., correlation is not causation). In other words, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion, and it is possible that such neural correlates relate to behaviorally relevant neural mechanisms, but based on the current task and associated analyses this has not been shown.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      We are grateful to the reviewer’s thoughtful comments.

      Weaknesses:

      I only have one potential worry. The analysis for gait tracking (1 Hz) in Experiment 2 (Figures 3a/b) starts by computing a congruency effect (A/V stimulation congruent (same frequency) versus A/V incongruent (V at 1 Hz, A at either 0.6 or 1.4 Hz)), separately for the Upright and Inverted conditions. Then, this congruency effect is contrasted between Upright and Inverted, in essence computing an interaction score (Congruent/Incongruent X Upright/Inverted). Then, the channels in which this interaction score is significant (by cluster-based permutation test; Figure 3a) are subselected for further analysis. This further analysis is shown in Figure 3b and described in lines 195-202. Critically, the further analysis exactly mirrors the selection criteria, i.e. it is aimed at testing the effect of Congruent/Incongruent and Upright/Inverted. This is colloquially known as "double dipping": the same contrast is used for selection (of channels, in this case) as for later statistical testing. This should be avoided, since in this case even random noise might result in a significant effect. To strengthen the evidence, either the authors could use a selection contrast that is orthogonal to the subsequent statistical test, or they could skip either the preselection step or the subsequent test. (It could be argued that the test in Figure 3b and related text is not needed to make the point - that same point is already made by the cluster-based permutation test.)

      Thanks for the helpful suggestions. In Experiment 2, to investigate whether the multisensory integration effect was specialized for biological motion perception, we contrasted the congruency effect between the upright and inverted conditions to search for clusters showing a significant interaction effect. We performed further analyses based on neural responses from this cluster to examine whether the congruency effect was significant in the upright and the inverted conditions, respectively, following the logic of post hoc comparisons after identifying an interaction effect. However, we agree with the reviewer that comparing the congruency effects between the upright and inverted conditions again based on data from this cluster was redundant and resulted in double-dipping. Therefore, we have removed this comparison from the main text and optimized the way our results are presented in the revised Fig. 3.
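
      The circularity concern can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses pure noise, a hypothetical layout of 24 participants and 64 channels, and a simple "pick the five channels with the largest t-value" rule in place of a cluster-based permutation test. It compares the false-positive rate when channels are chosen by the same contrast that is then tested with the rate when channels are fixed in advance.

```python
import random
import statistics


def t_stat(values):
    """One-sample t statistic against zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / n ** 0.5)


def simulate(n_sims=200, n_subj=24, n_chan=64, n_pick=5, seed=0):
    """Return (circular, fixed) false-positive rates on pure-noise data."""
    rng = random.Random(seed)
    t_crit = 2.069  # two-sided 5% critical value for df = 23 (n_subj = 24)
    circular_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        # data[s][c]: noise-only "interaction score" for subject s, channel c
        data = [[rng.gauss(0, 1) for _ in range(n_chan)] for _ in range(n_subj)]
        chan_t = [t_stat([row[c] for row in data]) for c in range(n_chan)]
        # Circular analysis: select channels by the contrast, then test it again
        picked = sorted(range(n_chan), key=lambda c: chan_t[c], reverse=True)[:n_pick]
        circ = [statistics.mean([row[c] for c in picked]) for row in data]
        # Independent analysis: channels chosen before looking at the data
        fixed = [statistics.mean(row[:n_pick]) for row in data]
        circular_hits += abs(t_stat(circ)) > t_crit
        fixed_hits += abs(t_stat(fixed)) > t_crit
    return circular_hits / n_sims, fixed_hits / n_sims


circular_rate, fixed_rate = simulate()
print(f"circular selection: {circular_rate:.2f}, fixed channels: {fixed_rate:.2f}")
```

      With no true effect in the data, the pre-specified channels should be flagged at roughly the nominal 5% rate, whereas the circularly selected channels are flagged far more often. This is why the follow-up test on the selected cluster adds no independent evidence.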

      Related to the above: the test for the three-way interaction (lines 211-216) is reported as "marginally significant", with a p-value of 0.087. This is not very strong evidence.

      As shown in Fig. 3b & e, the magnitude of amplitude differs between the gait-cycle frequency (mean = 0.008, SD = 0.038) and the step-cycle frequency (mean = 0.052, SD = 0.056), which might influence the statistical results of the interaction effect. To reduce such influence, we converted the amplitude data at each frequency condition into Z-scores, separately. The repeated-measures ANOVA analysis on these normalized amplitude data revealed a significant three-way interaction (F (1,23) = 7.501, p = 0.012, partial η² = 0.246). We have updated the results in the revised manuscript (lines 218-225).
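
      For readers unfamiliar with this normalization, a minimal sketch follows. It assumes that all amplitude values belonging to one frequency are pooled and standardized together. The response does not state exactly which cells were pooled, and the amplitude values below are made up for illustration.

```python
import statistics


def zscore(values):
    """Standardize values to mean 0 and sample SD 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # requires at least two non-identical values
    return [(v - m) / s for v in values]


# Hypothetical amplitudes at the two frequencies (arbitrary units)
gait_amp = [0.01, -0.02, 0.05, 0.00, 0.03]
step_amp = [0.05, 0.10, 0.02, 0.08, 0.00]

# Each frequency is standardized separately, so both end up on the same scale
gait_z = zscore(gait_amp)
step_z = zscore(step_amp)
```

      After this step, the two frequencies contribute on a common scale, so a difference in raw magnitude between them no longer dominates the interaction term.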

      Reviewer #1 (Recommendations For The Authors):

      -  Which variable caused one data point to be classified as outlier? (line 221).

      The outlier is a participant whose audiovisual congruency effect (Upright – Inverted) in neural responses at the frequency of interest exceeds 3 SD from the group mean. It is marked by a red diamond in Author response 2. Before removing the data, the correlation between the AQ score and the congruency effect is r = -0.396, p = 0.055. For comparison, the results after removing the outlier are shown in Fig. 3c of the revised manuscript. We have added more information about the variable causing the outlier in the revised manuscript (lines 231-232).

      Author response image 1.

      The correlation between AQ score and congruency effect
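
      The 3 SD criterion described above can be sketched as follows. The data are hypothetical (23 small values plus one extreme value), and the use of the sample SD computed with the outlier included is an assumption, since the response does not specify this detail.

```python
import statistics


def flag_outliers(values, n_sd=3.0):
    """Flag values lying more than n_sd sample SDs from the group mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [abs(v - m) > n_sd * s for v in values]


# Hypothetical congruency effects (Upright minus Inverted) for 24 participants
effects = [0.01 * ((i % 5) - 2) for i in range(23)] + [0.5]
flags = flag_outliers(effects)
kept = [v for v, is_out in zip(effects, flags) if not is_out]
```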

      -  The authors cite Maris & Oostenveld (2007) in line 415 as the main reference for the FieldTrip toolbox, but the correct reference here is different, see https://www.fieldtriptoolbox.org/faq/how_should_i_refer_to_fieldtrip_in_my_publication/

      Thank you for pointing out this issue. Citation corrected.

      -  The authors could consider giving some more background on the additive vs superadditive distinction in the Introduction, which may increase the impact; as it stands the reader might not know why this is particularly interesting. Summarize some of the takeaways of the Stevenson et al. (2014) review in this respect.

      Thanks for the suggestion and we have added the following relevant information in the Introduction (lines 80-90):

      “Moreover, we adopted an additive model to classify multisensory integration based on the AV vs A+V comparison. This model assumes independence between inputs from each sensory modality and distinguishes among sub-additive (AV < A+V), additive (AV = A+V), and super-additive (AV > A+V) response modes (see a review by Stevenson et al., 2014). The additive mode represents a linear combination between two modalities. In contrast, the super-additive and sub-additive modes indicate non-linear interaction processing, either with potentiated neural activation to facilitate the perception or detection of near-threshold signals (super-additive) or a deactivation mechanism to minimize the processing of redundant information cross-modally (sub-additive) (Laurienti et al., 2005; Metzger et al., 2020; Stanford et al., 2005; Wright et al., 2003).”
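
      The three response modes in the quoted passage reduce to a comparison of the audiovisual response with the sum of the unisensory responses. The sketch below shows that comparison for single amplitude values. The function name and the tolerance parameter are illustrative assumptions. In practice the comparison would be made with a statistical test across participants, not with a fixed tolerance.

```python
def integration_mode(av, a, v, tol=1e-6):
    """Classify a response as sub-additive, additive, or super-additive.

    av: response to the audiovisual condition
    a, v: responses to the auditory-only and visual-only conditions
    tol: hypothetical tolerance within which AV counts as equal to A + V
    """
    additive_sum = a + v
    if av > additive_sum + tol:
        return "super-additive"
    if av < additive_sum - tol:
        return "sub-additive"
    return "additive"


# Hypothetical amplitudes (arbitrary units)
mode_gait = integration_mode(av=0.9, a=0.3, v=0.4)
mode_step = integration_mode(av=0.7, a=0.3, v=0.4)
```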

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      We thank the reviewer for the valuable feedback.

      Weaknesses:

      The manuscript interprets the neural findings using mechanistic and cognitive claims that are not justified by the presented analyses and results.

      First, entrainment and cortical tracking are both invoked in this manuscript, sometimes interchangeably so, but it is becoming the standard of the field to recognize their separate evidential requirements. Namely, step and gait cycles are striking perceptual or cognitive events that are expected to produce event-related potentials (ERPs). The regular presentation of these events in the paradigm will naturally evoke a series of ERPs that leave a trace in the power spectrum at stimulation rates even if no oscillations are at play. Thus, the findings should not be interpreted from an entrainment framework except if it is contextualized as speculation, or if additional analyses or experiments are carried out to support the assumption that oscillations are present. Even if oscillations are shown to be present, it is then a further question whether the oscillations are causally relevant toward the integration of biological motion and for the orchestration of cognitive processes.

      Second, if only a cortical tracking account is adopted, it is not clear why the demonstration of supra-additivity in spectral amplitude is cognitively or behaviorally relevant. Namely, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion, and it is possible that such neural correlates relate to behaviorally relevant neural mechanisms, but based on the current task and associated analyses this has not been shown.

      Thanks for raising the important concerns regarding the interpretation of our results within the entrainment or the cortical tracking framework. A strict neural entrainment account emphasizes the alignment of endogenous neural oscillations with external rhythms, rather than a mere regular repetition of stimulus-evoked responses. However, it is challenging to fully dissociate these components, given that rhythmic stimulation can shape intrinsic neural oscillations, resulting in an intricate interplay between endogenous neural oscillations and stimulus-evoked responses (Duecker et al., 2024; Herrmann et al., 2016; Hosseinian et al., 2021). Therefore, some research, including the current study, uses the term “entrainment” to refer to the alignment of brain activity to rhythmic stimulation in a broader context, without isolating the intrinsic oscillations and evoked responses (e.g., Ding et al., 2016; Nozaradan et al., 2012; Obleser & Kayser, 2019). Nevertheless, we agree with the reviewer that since the current results did not examine or provide direct evidence for endogenous oscillations, it is better to contextualize the oscillation view as speculation. Hence, we have replaced most of the expressions about “entrainment” with a more general term “tracking” in the revised manuscript (as well as in the title of the manuscript). We only briefly mentioned the entrainment account in the Discussion to facilitate comparison with the literature (lines 307-312).

      Regarding the relevance between neural findings and cognition or behavioral performance, the first supporting evidence comes from the inversion effect in Experiment 2. For the neural responses at gait-cycle frequency, we observed a significantly enhanced audiovisual congruency effect in the upright condition compared with the inverted condition. Inversion disrupts the distinctive kinematic features of biological motion (e.g., gravity-compatible ballistic movements) and significantly impairs biological motion processing, but it does not change the basic visual properties of the stimuli, including the rhythmic signals generated by low-level motion cues. Therefore, the inversion effect has long been regarded as an indicator of the specificity of biological motion processing in numerous behavioral and neuroimaging studies (Bardi et al., 2014; Grossman & Blake, 2001; Shen, Lu, Yuan, et al., 2023; Simion et al., 2008; Troje & Westhoff, 2006; Vallortigara & Regolin, 2006; Wang et al., 2014; Wang & Jiang, 2012; Wang et al., 2022). Here, our finding of the cortical tracking of higher-order rhythmic structures (gait cycles) present in the upright but not in the inverted condition suggests that this cortical tracking effect cannot be explained by ERPs evoked by regular onsets of rhythmic events. Rather, it is closely linked with the specialized cognitive processing of biological motion. Furthermore, we found that the BM-specific cortical tracking effect at gait-cycle frequency (rather than the non-selective tracking effect at step-cycle frequency) correlates with observers’ autistic traits, indicating its functional relevance to social cognition. These findings convergingly suggest that the cortical tracking effect that we currently observed engages cognitively relevant neural mechanisms.

      In addition, our recent behavioral study showed that listening to frequency-congruent footstep sounds, compared with incongruent sounds, enhanced the visual search for human walkers but not for non-biological motion stimuli containing the same rhythmic signals (Shen, Lu, Wang, et al., 2023). These results suggest that audiovisual correspondence specifically enhances the perceptual and attentional processing of biological motion. Future research could examine whether the cortical tracking of rhythmic structures plays a functional role in this process, which may shed more light on the behavioral relevance of the cortical tracking effect to biological motion perception. We have incorporated the above information into the Discussion (lines 268-293).

      Reviewer #2 (Recommendations For The Authors):

      In Figure 1c, it could be helpful to add the word "static" in the illustration for the auditory condition so that readers understand without reading the subtext that it is a static image without biological motion.

      Suggestion taken.

      In the Discussion, I believe it is important to justify an oscillation and entrainment account, or if it cannot be justified based on the current results and analyses (which is my opinion), it could be helpful to explicitly frame it as speculation.

      We agree with the reviewer. For more clarification, please refer to our response to the public review.

      L335, I did not understand this sentence - a reformulation would be helpful.

      The point-light stimuli were created by capturing the motion of a walking actor (Vanrie & Verfaillie, 2004). The global motion of the walking sequences was eliminated so that the point-light walker looks like walking on a treadmill without translational motion. We have reformulated the sentence as follows: “The point-light walker was presented at the center of the screen without translational motion.”

      The results in Figure 2a and 2d are derived by performing a t-test between the amplitude at the frequency of gait and step cycles and zero. Comparison against amplitude of zero is too liberal; the possibility for a Type-I error is inflated because even EEG data with only noise will not have amplitudes of zero at all frequencies. A better baseline (H0) is either the 1/frequency trend in the power spectrum derived using methods like FOOOF (https://fooof-tools.github.io/fooof/) or by performing non-parametric shuffling based methods (https://doi.org/10.1016/j.jneumeth.2007.03.024).

      In our data analysis, instead of performing the t-test between the raw amplitude and zero, we compared the normalized amplitude at each frequency bin (obtained by subtracting the average amplitude measured at the neighboring frequency bins from the original amplitude data) against zero. Such analysis is equivalent to contrasting the raw amplitude with that of its neighboring frequency bins, allowing us to test whether the neural response in each frequency bin showed a significant enhancement compared with its neighbors. The multiple comparisons across frequency bins were controlled by false discovery rate (FDR) correction, reducing the Type-I error. Such analysis procedures help reduce (though not totally remove) the influence of the 1/f trend and have been widely used in this field (Cirelli et al., 2016; Henry & Obleser, 2012; Lenc et al., 2018; Nozaradan et al., 2012; Peter et al., 2023).
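
      The two steps described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the number of neighboring bins on each side (`n_side`), the number of immediately adjacent bins skipped (`gap`), and the choice of the Benjamini-Hochberg procedure for FDR control are not given in the response and are chosen here only for demonstration.

```python
def neighbor_corrected(amps, n_side=2, gap=1):
    """Subtract from each bin the mean amplitude of its surrounding bins.

    n_side: neighboring bins used on each side (assumed value)
    gap: immediately adjacent bins skipped on each side (assumed value)
    """
    out = []
    for i in range(len(amps)):
        left = amps[max(0, i - gap - n_side): max(0, i - gap)]
        right = amps[i + gap + 1: i + gap + 1 + n_side]
        neighbors = left + right
        if neighbors:
            out.append(amps[i] - sum(neighbors) / len(neighbors))
        else:
            out.append(0.0)
    return out


def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns one boolean per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Flat hypothetical spectrum with a single peak at bin 5
amps = [1.0] * 11
amps[5] = 3.0
corrected = neighbor_corrected(amps)
```

      Because a smooth 1/f background is similar in a bin and its neighbors, the subtraction removes most of it, while a narrow peak at the stimulation frequency survives. This is also why the correction reduces the 1/f influence without removing it entirely.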

      To further verify our findings, we adopted the reviewer’s suggestion and created a baseline by performing a non-parametric shuffling-based analysis. More specifically, to establish the statistical significance of amplitude peaks, we carried out a surrogate analysis on each condition. For each participant, a single control surrogate dataset was derived from their actual dataset by jittering the onset of each step-cycle relative to the actual original onset by a randomly selected integer value ranging from −490 to 490 ms. This procedure removed the consistent relationship between the EEG signal and the stimuli while preserving each epoch’s general timing within the exposure period. Then, epochs were extracted based on surrogate stimuli onset, and amplitude was computed across frequencies through FFT under a null model of non-entrainment (Moreau et al., 2022). This entire procedure was performed 100 times, producing a surrogate amplitude distribution of 100 group-averaged values for each condition. If the observed amplitude values at the frequency of interest exceeded the value corresponding to the 95th percentile of the surrogate distribution (p < .05) within a given condition (e.g., AV), the amplitude peak was considered significant (Batterink, 2020). As shown in Author response image 2, the statistical results from these analyses are similar to those reported in the manuscript, confirming the significant amplitude peaks at the frequencies of interest.

      Author response image 2.

      Non-parametric analysis for spectral peak. The dotted lines represent the random data based on shuffling analysis. The solid lines represent the observed data in measured EEG signals. All conditions induced significant peaks at step-cycle frequency and its harmonic, while only the AV condition induced a significant peak at gait-cycle frequency.
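
      The logic of the onset-jitter surrogate test can be sketched on synthetic data. Everything below other than the ±490 ms jitter range, the 100 repetitions, and the 95th-percentile criterion is an assumption made for illustration: a single simulated "participant", a 50 Hz sampling rate, a 2 Hz stimulus rate, 2 s epochs, time-domain epoch averaging, and a single-bin discrete Fourier transform in place of a full FFT.

```python
import cmath
import math
import random

FS = 50          # sampling rate in Hz (assumed)
F_STEP = 2.0     # stimulus rate in Hz (assumed)
EPOCH_S = 2.0    # epoch length in seconds (assumed)
JITTER_MS = 490  # jitter range taken from the response: -490 to 490 ms


def amplitude_at(signal, onsets, freq):
    """Average the epochs starting at each onset (in samples), then return
    the single-bin DFT amplitude of the averaged epoch at `freq`."""
    n = int(EPOCH_S * FS)
    avg = [0.0] * n
    for onset in onsets:
        for k in range(n):
            avg[k] += signal[onset + k]
    avg = [value / len(onsets) for value in avg]
    coef = sum(v * cmath.exp(-2j * math.pi * freq * k / FS)
               for k, v in enumerate(avg))
    return 2 * abs(coef) / n


rng = random.Random(1)
# Synthetic signal: a stimulus-locked 2 Hz component plus Gaussian noise
signal = [math.sin(2 * math.pi * F_STEP * t / FS) + rng.gauss(0, 1)
          for t in range(70 * FS)]
period = int(FS / F_STEP)                       # samples between onsets
onsets = list(range(2 * FS, 62 * FS, period))   # true stimulus onsets

observed = amplitude_at(signal, onsets, F_STEP)

max_jitter = int(JITTER_MS / 1000 * FS)         # jitter range in samples
surrogate = []
for _ in range(100):
    jittered = [o + rng.randint(-max_jitter, max_jitter) for o in onsets]
    surrogate.append(amplitude_at(signal, jittered, F_STEP))

surrogate.sort()
threshold = surrogate[94]        # 95th of the 100 sorted surrogate values
significant = observed > threshold
```

      Jittering each onset by up to almost one full stimulus period scrambles the phase relationship between the epochs and the rhythmic component, so the averaged surrogate epochs retain little amplitude at the stimulus rate. An observed amplitude above the 95th percentile of these surrogates therefore indicates a response that is time-locked to the stimulus and unlikely to arise from the background spectrum alone.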

      Reviewer #3 (Public Review):

      Strengths:

      The main strengths of the paper relate to the conceptualization of BM and the way it is operationalized in the experimental design and analyses. The use of entrainment, and the tracking of different, nested aspects of BM result in seemingly clean data that demonstrate the basic pattern. The first experiments essentially provide the basic utility of the methodological innovation and the second experiment further hones in on the relevant interpretation of the findings by the inclusion of better control stimuli sets.

      Another strength of the work is that it includes at a conceptual level two replications.

      We appreciate the reviewer for the comprehensive review and positive comments.

      Weaknesses:

      The statistical analysis is misleading and inadequate at times. The inclusion of the autism trait is not foreshadowed and adequately motivated and is likely underpowered. Finally, a broader discussion over other nested frequencies that might reside in the point-light walker stimuli would also be important to fully interpret the different peaks in the spectra.

      (1) Regarding the nested frequency peaks in the spectra, we did observe multiple significant amplitude peaks at 1f (1/0.83 Hz), 2f (2/1.67 Hz), and 4f (4/3.33 Hz) relative to the gait-cycle frequency (Fig. 2 a&d). To further test the functional roles of the neural activity at different frequencies, we analyzed the audiovisual integration modes at each frequency. Note that we collapsed the data from Experiments 1a & 1b in the analysis as they yielded similar results. Overall, results show a similar additive audiovisual integration mode at 2f and 4f and a super-additive integration mode only at 1f (Figure S1), suggesting that the cortical tracking effects at 2f and 4f may be functionally linked but independent of that at 1f. We have reported the detailed results in the Supplementary Information.

      (2) For the reviewer’s other concerns about statistical analysis and autism traits, please refer to our responses below to the Recommendations for the authors.

      Reviewer #3 (Recommendations For The Authors):

      The description of the analyses performed for experiment 2 comes across as double dipping. Congruency effects for BM and non-BM motion (inverted) were compared using cluster-based statistics. Then identified clusters informed an averaging of signals which then were subjected to a paired comparison. At this point, it is no surprise that these paired comparisons are highly significant seeing that the channels were selected based on a cluster analysis of the same exact contrast. This approach should be avoided.

      In the analysis of the repeated measures ANOVA reporting a trend as marginally significant is misleading. Reporting the statistical results whilst indicating that those do not reach significance is the appropriate way to communicate this finding. Other statistics can be used in order to provide the likelihood of those findings supporting H1 or H0 if the authors would like to state something more precise (Bayesian).

      Thanks for the comments. We have addressed these two points in our response to the public review of Reviewer #1.

      The authors perform a correlation along "autistic trait" scores in an individual differences approach. Individual differences are typically investigated in larger samples (>n=40). In addition, the range of AQ scores seems limited to mostly average or lower-than-average AQs (barring a couple). These points make the conclusions on the possible role of BM in the autistic phenotype very tentative. I would recommend acknowledging this.

      An alternative analysis approach that might better suit the smaller sample size is a comparison between high and low AQ participants, defined based on a median split.

      Many thanks for the suggestion. We agree with the reviewer that the sample size (n = 24) in the current study is not large for exploring the correlation between BM and autistic traits. The narrow range of AQ scores arose because all participants were drawn from a non-clinical population and we did not pre-select participants by AQ score. To further confirm our findings, we adopted your suggestion and compared the BM-specific cortical tracking effect (i.e., the audiovisual congruency effect, Upright – Inverted) between high and low AQ participants, split by the median AQ score (20) of this sample. As in the correlation analysis, one outlier, whose audiovisual congruency effect (Upright – Inverted) in neural responses at 1 Hz exceeded 3 SD from the group mean, was removed from the following analysis. As shown in Figure S3, at 1 Hz, participants with low AQ showed a greater cortical tracking effect than high AQ participants (t(21) = 2.127, p = 0.045). At 2 Hz, low and high AQ participants showed comparable neural responses (t(22) = 0.946, p = 0.354). These results are in line with the correlation analysis, providing further support for the functional relevance between social cognition and cortical tracking of biological motion as well as its dissociation at the two temporal scales. We have added these results to the main text (lines 238-244) and the supplementary information.
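
      The median-split comparison described above can be sketched roughly as follows. This is not the code used in the study: the function names, the assignment of scores equal to the median to the low group, the order of outlier removal and splitting, and the example data are all assumptions. The sketch uses a pooled-variance two-sample t-test, whose degrees of freedom (n1 + n2 - 2) match the reported t(21) for 23 participants and t(22) for 24.

```python
import math
import statistics


def indices_within_sd(values, n_sd=3.0):
    """Return indices of values lying within n_sd sample SDs of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, x in enumerate(values) if abs(x - mean) <= n_sd * sd]


def median_split(scores, effects):
    """Split effects into (low, high) groups by the median of scores.

    Assumption: scores equal to the median are placed in the low group.
    """
    cutoff = statistics.median(scores)
    low = [e for s, e in zip(scores, effects) if s <= cutoff]
    high = [e for s, e in zip(scores, effects) if s > cutoff]
    return low, high


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic and degrees of freedom (equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, df


if __name__ == "__main__":
    # Made-up AQ scores and congruency effects, for illustration only.
    aq = [10, 12, 14, 16, 18, 20, 22, 24]
    effect = [0.9, 0.8, 1.0, 0.7, 0.3, 0.2, 0.4, 0.1]
    keep = indices_within_sd(effect)          # drop participants beyond 3 SD
    aq = [aq[i] for i in keep]
    effect = [effect[i] for i in keep]
    low, high = median_split(aq, effect)
    print(pooled_t(low, high))
```

      The p-value would then be read from a t distribution with the returned degrees of freedom, for example with a statistics package.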

      Writing

      The narrative could be better unfolded and studies better motivated. The transition from basic science research on BM to possibly delineating a mechanistic understanding of autism was a surprise at the end of the intro. Once the authors consider the suggestions and comments above it would be good to have this detail and motivation more obviously foreshadowed in the text.

      Thanks for the great suggestion. In the first paragraph of the revised manuscript (lines 46-56), we now introduce how audiovisual BM processing is linked with social cognition and ASD. In particular, integrating multisensory BM cues is foundational for perceiving and attending to other people and developing further social interaction. However, this ability is usually compromised in people with social deficits, such as individuals with autism spectrum disorder (ASD) (Feldman et al., 2018), and even in non-clinical populations with high autistic traits (Ujiie et al., 2015). These behavioral findings underline the close relationship between multisensory BM processing and one’s social cognitive capability, motivating us to further explore this issue at the neural level in the current study. We have also modified the relevant content in the last paragraph of the Introduction (lines 100-108), briefly mentioning the methods that we used to investigate this issue.

      The use of terminology related to neural oscillations which are entraining to the BM seems to suggest that the rhythmic tracking inevitably stems from the shaping of existing intrinsic dynamics of the brain. I am not sure this is necessarily the case. I would therefore adopt a more concrete jargon for the description of the entrainment seen in this study. If a discussion over internal dynamics shaped by external stimuli should be invoked, it should be done explicitly with appropriate references (but in my opinion, it isn't quite required).

      Please refer to our response to a similar point raised in the public review of Reviewer #2.

      References

      Bardi, L., Regolin, L., & Simion, F. (2014). The First Time Ever I Saw Your Feet: Inversion Effect in Newborns’ Sensitivity to Biological Motion. Developmental Psychology, 50. https://doi.org/10.1037/a0034678

      Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31(1), 5–17. https://doi.org/10.1023/a:1005653411471

      Batterink, L. (2020). Syllables in Sync Form a Link: Neural Phase-locking Reflects Word Knowledge during Language Learning. Journal of Cognitive Neuroscience, 32(9), 1735–1748. https://doi.org/10.1162/jocn_a_01581

      Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring Neural Entrainment to Beat and Meter in Infants: Effects of Music Background. Frontiers in Neuroscience, 10. https://doi.org/10.3389/fnins.2016.00229

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B. J., Sivarao, D. V., & Zoefel, B. (2024). Challenges and approaches in the study of neural entrainment. Journal of Neuroscience, 44(40). https://doi.org/10.1523/JNEUROSCI.1234-24.2024

      Falck-Ytter, T., Nyström, P., Gredebäck, G., Gliga, T., Bölte, S., & the EASE team. (2018). Reduced orienting to audiovisual synchrony in infancy predicts autism diagnosis at 3 years of age. Journal of Child Psychology and Psychiatry, 59(8), 872–880. https://doi.org/10.1111/jcpp.12863

      Feldman, J. I., Dunham, K., Cassidy, M., Wallace, M. T., Liu, Y., & Woynaroski, T. G. (2018). Audiovisual multisensory integration in individuals with autism spectrum disorder: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 95, 220–234. https://doi.org/10.1016/j.neubiorev.2018.09.020

      Grossman, E. D., & Blake, R. (2001). Brain activity evoked by inverted and imagined biological motion. Vision Research, 41(10), 1475–1482. https://doi.org/10.1016/S0042-6989(00)00317-5

      Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences, 109(49), 20095–20100. https://doi.org/10.1073/pnas.1213390109

      Herrmann, C. S., Murray, M. M., Ionta, S., Hutt, A., & Lefebvre, J. (2016). Shaping Intrinsic Neural Oscillations with Periodic Stimulation. Journal of Neuroscience, 36(19), 5328–5337. https://doi.org/10.1523/JNEUROSCI.0236-16.2016

      Hosseinian, T., Yavari, F., Biagi, M. C., Kuo, M.-F., Ruffini, G., Nitsche, M. A., & Jamil, A. (2021). External induction and stabilization of brain oscillations in the human. Brain Stimulation, 14(3), 579–587. https://doi.org/10.1016/j.brs.2021.03.011

      Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. https://doi.org/10.1038/nature07868

      Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166(3), 289–297. https://doi.org/10.1007/s00221-005-2370-2

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Metzger, B. A., Magnotti, J. F., Wang, Z., Nesbitt, E., Karas, P. J., Yoshor, D., & Beauchamp, M. S. (2020). Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 40(36), 6938–6948. https://doi.org/10.1523/JNEUROSCI.0279-20.2020

      Moreau, C. N., Joanisse, M. F., Mulgrew, J., & Batterink, L. J. (2022). No statistical learning advantage in children over adults: Evidence from behaviour and neural entrainment. Developmental Cognitive Neuroscience, 57, 101154. https://doi.org/10.1016/j.dcn.2022.101154

      Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective Neuronal Entrainment to the Beat and Meter Embedded in a Musical Rhythm. Journal of Neuroscience, 32(49), 17572–17581. https://doi.org/10.1523/JNEUROSCI.3203-12.2012

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Peter, V., Goswami, U., Burnham, D., & Kalashnikova, M. (2023). Impaired neural entrainment to low frequency amplitude modulations in English-speaking children with dyslexia or dyslexia and DLD. Brain and Language, 236, 105217. https://doi.org/10.1016/j.bandl.2022.105217

      Shen, L., Lu, X., Wang, Y., & Jiang, Y. (2023). Audiovisual correspondence facilitates the visual search for biological motion. Psychonomic Bulletin & Review, 30(6), 2272–2281. https://doi.org/10.3758/s13423-023-02308-z

      Shen, L., Lu, X., Yuan, X., Hu, R., Wang, Y., & Jiang, Y. (2023). Cortical encoding of rhythmic kinematic structures in biological motion. NeuroImage, 268, 119893. https://doi.org/10.1016/j.neuroimage.2023.119893

      Simion, F., Regolin, L., & Bulf, H. (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813. https://doi.org/10.1073/pnas.0707021105

      Stanford, T. R., Quessy, S., & Stein, B. E. (2005). Evaluating the Operations Underlying Multisensory Integration in the Cat Superior Colliculus. Journal of Neuroscience, 25(28), 6499–6508. https://doi.org/10.1523/JNEUROSCI.5095-04.2005

      Stevenson, R. A., Ghose, D., Fister, J. K., Sarko, D. K., Altieri, N. A., Nidiffer, A. R., Kurela, L. R., Siemann, J. K., James, T. W., & Wallace, M. T. (2014). Identifying and Quantifying Multisensory Integration: A Tutorial Review. Brain Topography, 27(6), 707–730. https://doi.org/10.1007/s10548-014-0365-7

      Troje, N. F., & Westhoff, C. (2006). The Inversion Effect in Biological Motion Perception: Evidence for a “Life Detector”? Current Biology, 16(8), 821–824. https://doi.org/10.1016/j.cub.2006.03.022

      Ujiie, Y., Asai, T., & Wakabayashi, A. (2015). The relationship between level of autistic traits and local bias in the context of the McGurk effect. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00891

      Vallortigara, G., & Regolin, L. (2006). Gravity bias in the interpretation of biological motion by inexperienced chicks. Current Biology, 16(8), R279–R280. https://doi.org/10.1016/j.cub.2006.03.052

      Vanrie, J., & Verfaillie, K. (2004). Perception of biological motion: A stimulus set of human point-light actions. Behavior Research Methods, Instruments, & Computers, 36(4), 625–629. https://doi.org/10.3758/BF03206542

      Wang, L., & Jiang, Y. (2012). Life motion signals lengthen perceived temporal duration. Proceedings of the National Academy of Sciences of the United States of America, 109(11), E673–677. https://doi.org/10.1073/pnas.1115515109

      Wang, L., Yang, X., Shi, J., & Jiang, Y. (2014). The feet have it: Local biological motion cues trigger reflexive attentional orienting in the brain. NeuroImage, 84, 217–224. https://doi.org/10.1016/j.neuroimage.2013.08.041

      Wang, Y., Zhang, X., Wang, C., Huang, W., Xu, Q., Liu, D., Zhou, W., Chen, S., & Jiang, Y. (2022). Modulation of biological motion perception in humans by gravity. Nature Communications, 13(1), Article 1. https://doi.org/10.1038/s41467-022-30347-y

      Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech. Cerebral Cortex, 13(10), 1034–1043. https://doi.org/10.1093/cercor/13.10.1034

    1. eLife Assessment

      In this manuscript, Griesius et al analyze the dendritic integration properties of NDNF and OLM interneurons, and suggest that the supralinear NMDA receptor-dependent synaptic integration may be associated with dendritic calcium transients only in NDNF interneurons. These findings are important because they suggest there might be functional heterogeneities in the mechanisms underlying synaptic integration in different classes of interneurons of the mouse neocortex and hippocampus. The revised work remains incomplete due to remaining concerns about experimental methodology, cell health, and lack of dendritic Na-spikes which have been recorded in previous works.

    2. Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Comments on revisions:

      The authors have addressed the reviewers' comments, but haven't resolved most of the key issues.

      Specifically, performing only a single uncaging experiment at a single dendritic location per cell prevents a detailed biophysical analysis of NDNF and OLM cell integration properties. A more extended exploration would have potentially addressed several of the reviewers' questions. It is particularly worrying that the authors cite cell health, dendritic blebbing, and changes in input resistance as the reason for terminating experiments after a single uncaging event. This suggests that the uncaging laser may be damaging the dendrite, potentially affecting the membrane potential directly, and overall cell health, beyond simply uncaging glutamate.

      While the authors' qualitative conclusions about supra-linear integration and NMDA receptor dependency seem plausible, the limited data and potential methodological issues weaken any quantitative interpretations and comparisons between the two cell types.

      Similarly, the absence of dendritic Na-spikes remains unexplained, despite reports of strong dendritic Na-currents in these cells.

    3. Reviewer #3 (Public review):

      Summary:

      The authors study temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. The data indicate non-linear summation, which is larger in dendrites of NDNF-expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Synaptic integration in dendrites of cortical GABAergic interneurons is important and still poorly investigated. Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments. 2P calcium imaging is a powerful method to potentially disentangle dendritic signal processing in interneuron dendrites.

      Weaknesses:

      Due to several experimental limitations of the study including a relatively low number of recorded dendrites, lack of voltage-clamp recordings, lack of NMDA-dependent calcium signals in OLM cells and lack of wash-out during pharmacological experiments (AP5-application), the mechanistic insights are limited.

      (1) NMDA-receptor signalling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors report that the non-linearity is not significantly dependent on EPSP amplitude (Fig. S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data showing NMDA and AMPA currents, potentially providing a mechanistic explanation for the non-linear summation.

      (2) Recovery of drug effect. Pharmacological application of AP5 is the only argument for the involvement of NMDA receptors. However, as long-lasting experiments were apparently difficult to obtain, no washout data are presented, only drug effect versus baseline. For all the other drugs (TTX, Nimodipine, CPA), recordings were even shorter and lacked a baseline recording. Thus, it remains open to what extent the AP5 effect might be affected by rundown of receptors or channels during whole-cell recordings or by the onset of phototoxicity.

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The diameter of proximal dendrites in OLM cells is larger than the diameter in NDNF cells. However, there is probably also an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, it would have been helpful to see experiments quantifying synapse density (counting spines, PSD95-puncta, ...) and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      (4) NMDA in OLM-IN. Similar to the NDNF cells, the authors argue for an involvement of NMDA receptors in OLM cells, based on bath-application of AP5 (Fig. 8). Again, there seems to be no significant dependence on EPSP amplitude (Fig. S3). Even more remarkable, the authors claim that there is no dendritic calcium increase after activation of NMDA receptors without showing data. Therefore, it remains unclear whether the calcium signals are just below detection threshold, or whether the non-linearity depends on other calcium-impermeable channels and receptors. To understand this phenomenon different calcium sensors, different Ca2+/Mg2+ concentrations or voltage-clamp data would have helped.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).

      In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.

      The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the calcium transients associated with the supralinear behavior but had no or only a weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker, NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.

      While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, results and conclusions are of interest to the broader neuroscience community.

      The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.

      I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).

      We thank the reviewer for the evaluation and address the detailed points below.

      Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Strengths:

      (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.

      (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.

      Weaknesses:

      It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.

      We are grateful to the reviewer for the critique. Indeed, the degree to which clustered inputs belonging to a functional neuronal assembly occur on interneuron dendrites is an open question. However, Chen et al (2013, Nature 499:295-300) reported that dendritic domains of PV-positive interneurons in visual cortex, unlike their somata, exhibit calcium transients in vivo which are highly tuned to stimulus orientation. This suggests that clustered inputs to dendritic segments may well belong to functional assemblies, much as in principal cells (e.g. Wilson et al, 2016, Nature Neuroscience 19:1003–1009; Iacaruso et al, 2017, Nature 547:449–452). In our earlier work reporting NMDAR-dependent supralinear summation of glutamate uncaging-evoked responses at a subset of dendrites on PV-positive interneurons, we demonstrated how this arrangement in an oscillating feedback circuit could be exploited to stabilise neuronal assemblies.

      Reviewer #3 (Public review):

      Summary:

      The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.

      Weaknesses:

      (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. Such data would help to see whether the NMDA-receptor contribution varies from synapse to synapse to generate the observed variability. Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would contribute to our understanding of interneuron function.

      We thank the reviewer for the helpful comments.

      We did not actually claim that EPSP amplitude has no role in determining the magnitude of non-linearity: “Among possible sources of variability for voltage supralinearity, we did not observe a systematic dependence on the average amplitude of individual uEPSPs […] (Fig. S2)”. Whilst we fully agree that, at first sight, a positive dependence of supralinearity on uEPSP amplitude might be expected simply from the voltage-dependent kinetics of NMDARs, there are two main reasons why this could have been obscured. First, the expected relationship is non-monotonic, because with large local depolarizations the driving force collapses, as seen in the overall sigmoid shape of the average relationship between the scaled observed response and arithmetic sum (e.g. Figs 2a & c; 4c & e). Therefore, we would arguably expect a parabolic relationship rather than a simple positive slope relating the degree of supralinearity to the average amplitude of individual uEPSPs. Second, given that the uncaging distance varied substantially, the average amplitudes of the individual uEPSPs recorded at the soma would have undergone different degrees of electrotonic attenuation and further distortion by active conductances before they were measured. Ultimately, the plots in Fig. S2 show too much scatter to be able to exclude a positive or parabolic relationship of nonlinearity to uEPSP amplitude. To avoid misunderstanding, we have changed the sentence in the Results that refers to Fig. S2 to: “Among possible sources of variability for voltage supralinearity, we did not observe a significant monotonic dependence on the average amplitude of individual uEPSPs, distance from the uncaging location along the dendrite to the soma, [or] the dendrite order (Fig. S2)”.
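
      The non-monotonic dependence invoked above can be illustrated with a minimal sketch of NMDAR current as a function of local membrane potential. It is not a model from the manuscript: it assumes the widely used Jahr and Stevens (1990) expression for voltage-dependent Mg2+ block, a reversal potential of 0 mV, 1 mM extracellular Mg2+, and an arbitrary unit conductance, none of which were fitted to these interneurons.

```python
import math


def mg_unblocked_fraction(v_mv, mg_mm=1.0):
    """Fraction of NMDAR conductance free of Mg2+ block (Jahr & Stevens form)."""
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))


def nmdar_current(v_mv, g_max=1.0, e_rev_mv=0.0, mg_mm=1.0):
    """NMDAR current in arbitrary units; negative values are inward.

    The unblocked fraction grows with depolarization while the driving
    force (v_mv - e_rev_mv) shrinks, so the product is non-monotonic.
    """
    return g_max * mg_unblocked_fraction(v_mv, mg_mm) * (v_mv - e_rev_mv)


def peak_inward_voltage(v_min_mv=-80, v_max_mv=0):
    """Integer voltage (mV) in the range at which the inward current is largest."""
    return min(range(v_min_mv, v_max_mv + 1), key=nmdar_current)


if __name__ == "__main__":
    for v in (-70, -50, -30, -10, 0):
        print(v, round(nmdar_current(v), 3))
    print("largest inward current near", peak_inward_voltage(), "mV")
```

      Under these assumptions the inward current grows as the dendrite depolarizes from rest, reaches a maximum at intermediate potentials, and then falls toward zero as the driving force collapses, which is the qualitative shape expected to obscure a simple positive slope.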

      As for the relative contributions of NMDARs and AMPARs, voltage clamp recordings from both neurogliaform and OLM interneurons have already been reported, with the conclusion that neurogliaform cells exhibit relatively larger NMDAR-mediated currents (e.g. Chittajallu et al. 2017; Booker et al. 2021; Mercier et al. 2022), entirely in keeping with the conclusions of our study. Repeating these measurements would add little to the study. Furthermore, because the mean baseline uEPSP amplitude was <0.5 mV (Fig S2), it would be difficult to obtain reliable measurements of isolated NMDAR-mediated uEPSCs.

      Turning to the high variability of supralinearity, indeed, the 95% confidence interval for the data in Fig. 2d is 73% to 213%. This degree of variability is consistent with the wide range of NMDAR/AMPAR ratios reported by Chittajallu et al. 2017 (their Fig. 1g), compounded by the expected non-monotonic relationship alluded to above.

      (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.

      We have plotted the degree of nonlinearity against dendritic diameter for neurogliaform cells (under baseline conditions and in D-AP5) in Fig S2h-k. We did not observe any significant linear correlations, other than between amplitude nonlinearity and dendrite diameter post D-AP5. This does not negate the possibility that the significant difference in average dendritic diameters between neurogliaform and OLM cells contributes to differences in impedance (which we have rephrased as “Among possible explanations is that the local dendritic impedance is greater in neurogliaform cells, lowering the threshold for recruitment of regenerative currents”).

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      The difference in average dendritic diameters between OLM and neurogliaform cells is highly significant (Fig. 8q, P<0.001). We do not claim that dendritic diameter (and by implication local impedance) is the only determinant of the degree of non-linearity. The suggestion that a gradient of glutamate receptor density contributes is interesting. However, the results of uncaging experiments targeting more distal OLM dendrites of similar diameter to neurogliaform dendrites would be subject to numerous confounds, not least the very different electrotonic attenuation, likely differences in various active conductances, and the presence of spines in OLM dendrites (which are generally sparse and were not reliably imaged in our experiments). Moreover, the cell would have to remain patched for longer in order for the fluorescent dyes to invade the distal dendrites. This alone could potentially result in systematic biases among groups. We now cite Guirado et al (2014) and Gramuntell et al (2021) to highlight that factors other than dendritic diameter per se, such as inhomogeneity in spine and NMDA receptor density, may also contribute to the heterogeneity of nonlinear summation in OLM cells.

      (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, no calcium imaging experiments were even shown. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.

      The same response to point 1) above applies to OLM cells. As with neurogliaform cells, mean OLM baseline asynchronous (separate response) amplitudes were <0.5 mV, making it very difficult to record an isolated NMDAR-mediated uEPSC. Having said that, NMDARs do contribute to EPSCs elicited by stimulation of multiple afferents (e.g. Booker et al, 2021). We do not claim that dendritic calcium transients cannot be elicited following activation of NMDARs in OLM cells. We simply reported that the evoked uEPSPs, designed to approximate individual synaptic signals, were sub-threshold for detectable dendritic calcium signals under conditions that were suprathreshold in neurogliaform cells. The statement has been amended to specify that there were no detectable signals under our recording conditions. There is no evidence presented in the manuscript to suggest that OLM NMDARs are calcium impermeable and indeed no such claim was made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There is a large variability in the observed dendritic nonlinearity, in NDNF IN dendrites e.g. the uEPSP amplitude nonlinearity measure varies from as low as 10-20% to over 200%. As only single dendrites were recorded from each IN, it is unclear if this variability is among the cells or between individual dendrites. While the authors analyzed some potential factors, such as distance along the dendrites, branch order, or response magnitude (amplitude and integral), they did not find any substantial correlation. It remains open if different dendrites of NDNF INs, located in the str. moleculare vs. those in or projecting towards str. radiatum, have divergent properties. Similarly, for SOM INs an important question is if axon-carrying dendrites show distinct properties.

      In this context, it would be interesting to see not only values for the mean nonlinearity but also the maximal nonlinearity and its distribution.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in Fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify the analysis method. We did not address specifically whether dendrites projecting in different directions behaved differently. This is an interesting question beyond the scope of this study. Nor did we compare axon-carrying OLM dendrites to other dendrites.
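
      The cumulative measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it takes the per-location nonlinearity to be the percent deviation of the near-synchronous response from the arithmetic sum of the separate responses, which may differ from the exact formula in the manuscript's Methods. The function name and arguments are hypothetical.

```python
from typing import Sequence


def cumulative_nonlinearity(near_sync: Sequence[float],
                            arithmetic_sum: Sequence[float]) -> float:
    """Sum per-location nonlinearities for one dendritic segment.

    ASSUMPTION: the nonlinearity at each number of uncaging locations is the
    percent deviation of the near-synchronous response from the arithmetic
    sum of the separate (asynchronous) responses. The manuscript's exact
    definition is not reproduced here.
    """
    if len(near_sync) != len(arithmetic_sum):
        raise ValueError("both sequences must cover the same uncaging locations")

    total = 0.0
    for measured, expected in zip(near_sync, arithmetic_sum):
        if expected == 0:
            raise ValueError("arithmetic sum must be non-zero at every location")
        # Positive values indicate supralinear summation, negative sublinear.
        total += (measured - expected) / expected * 100.0
    return total
```

      Under this assumed definition, a segment that sums linearly at every location yields 0, and each supralinear location adds its own percent deviation to the per-segment value.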

      Figures:

      Figure 1: The gray line in plots g and h is not explained. While it looks like an identity line, the legend in plot i ("asynchronous") interferes.

      In plots g and h the gray line is the line of identity. In plot i it is an estimate of the linear summation. In plot i it is not the line of identity as it does not start at the origin with a slope of 1. The figure legend has been amended to clarify.

      In the same panels (Figure 1g,h, and subsequent figures) consider changing the title from "soma (voltage)" to uEPSP.

      The titles have been amended.

      In panel Figure 1i note the missing "(" in the title.

      Title amended.

      In panel Figure 1h: Shouldn't the X-axis label and legend text read "Arithmetic sum of (EPSP) integrals" instead of "Integral of arithmetic sum").

      The wording more accurately reflects the analytical operations. The asynchronous (separate) responses were summed arithmetically first, and then the integral was taken of each cumulative sum. We have therefore left the axis title and legend unchanged.

      Figure 2a,c: Could you please describe how the scaling was performed for the two axes?

      Method section amended.

      In the same panels (Figure 2a,c, and subsequent figures), the legend seems to be misleading: the plot is NS Amplitude/Integral vs Arithmetic sum, and the black line is the identity line (or scaled interpolation of the arithmetic sum, which is essentially the same).

      The scaled arithmetic sums (uEPSP amplitude, integral) represent linear summation and so overlap with the line of identity. The interpolation estimate of the asynchronous (separate) calcium transient response does not overlap with the line of identity as this estimate does not start at the origin with a slope of 1. The legends throughout the manuscript have been amended to clarify this.

      Figure 2b,d,f (and subsequent figures) slope plots: Please indicate that this is the average amplitude supralinearity for the individual recorded dendrites. Note here that the Results text mentions only the average amplitude supralinearity, but not the slope plots, paired mean difference, or Gardner-Altman estimation, illustrated in the figures.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify.

      Fig 2e: The legend (both text and figure, also in the following figures) is confusing, as the gray line and diamonds are defined as separate 12(?) responses, but it seems to represent a linear interpolation of the scaled arithmetic sums (ultimately nothing else but an identity line).

      The grey line shows the linear interpolation output between the calcium transient measurements at 1 uncaging location and at 12 uncaging locations. The 12th uncaging location is indicated in the key as “separate 12”. The linear interpolation in these plots does represent linear summation but is not the line of identity as it does not begin at the origin and does not have a slope of 1.
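
      To illustrate why such an interpolated line represents linear summation without being the line of identity, here is a minimal sketch. It assumes the estimate is a straight line drawn between the calcium response at 1 uncaging location and the response at 12 locations, evaluated at each intermediate number of locations; the function name and the numbers in the usage note are hypothetical and not taken from the manuscript.

```python
from typing import List


def linear_summation_estimate(resp_1: float, resp_12: float,
                              n_locations: int = 12) -> List[float]:
    """Linearly interpolate between the response at 1 and at n_locations.

    ASSUMPTION: the expected (linear) response is evaluated at each integer
    number of uncaging locations from 1 to n_locations.
    """
    if n_locations < 2:
        raise ValueError("need at least two locations to interpolate")
    step = (resp_12 - resp_1) / (n_locations - 1)
    # Index k corresponds to k + 1 uncaging locations.
    return [resp_1 + step * k for k in range(n_locations)]
```

      With illustrative values of 1.0 at one location and 3.2 at twelve, the slope is 0.2 per location and the line extrapolates to 0.8 at zero locations, so it neither starts at the origin nor has a slope of 1.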

      Reviewer #2 (Recommendations for the authors):

      This study is well-developed and technically executed. I only have minor comments for the authors:

      (1) To target NDNF+ neurons, the authors use the NDNF-Cre mouse line and a Cre-dependent AAV using the mDLX promoter. Why the mDLX promoter? Would it have been sufficient to use any Cre-dependent fluorophore?

      Pilot experiments revealed leaky expression when a virus driving flexed ChR2 under a non-specific promoter (EF1a) was injected in the neocortex of Ndnf-Cre mice (Author response image 1). In our hands, and in line with Dimidschtein et al (2016), the use of the mDLX enhancer reduced off-target expression.

      Author response image 1.

      A. AAV2/5-EF1a-DIO-hChR2(H134R)-mCherry injected into superficial neocortex of Ndnf-Cre mice led to expression in a few pyramidal neurons in addition to layer 1 neurogliaform cells. B. Patch-clamp recording from a non-labelled pyramidal cell showed that an optogenetically evoked glutamatergic current remained after blockade of GABAA and GABAB receptors, further confirming limited specificity of expression of ChR2. (Data from M Muller, M Mercier and V Magloire, Kullmann lab.)

      (2) The distance of the uncaging sites from the soma plays a key role. The authors should indicate the mean distance of the cluster and its variance.

      Uncaging distance from soma is indicated for both NGF and OLM interneurons in the supplementary figures S2 and S3 respectively.

      (3) Martina et al., in Science 2000, showed high levels of Na+ channels in the dendrites of OLM cells and hinted that spikes could occur in them. The authors should discuss this possible discrepancy.

      Discussion amended.

      (4) Looking at Figure 1d, the EPSPs look exceptionally long-lasting, longer than those observed by stimulating axonal inputs. Could this indicate spill-over excitation? If so, how could this affect the outcome of this study?

      The asynchronous (separate responses) decay to baseline within 100 ms, similar to the neurogliaform EPSPs evoked by electrical stimulation of axons in the SLM in Mercier et al. 2022. We observed clear plateau potentials in a minority of cells (e.g. Fig. S1b). Such plateau potentials can be generated by dendritic calcium channels and we do not consider that glutamate spillover needs to be invoked to account for them.

      (5) In the legend of Figure 2: "n=11 dendrites in 11 cells from 9 animals". Why do the authors only study 11 dendrites from 11 cells? Isn't it possible to repeatedly stimulate clusters of synaptic inputs onto the same cells? In principle, could one test many dendrites of the same cell at different distances from the soma? It is also remarkable that there were very few cells per animal.

      The goal was always to record from as many dendrites as possible from the same cells whilst maintaining high standards of cell health. When cell health indicators such as blebbing, input resistance change or resting voltage change were detected, no further dendritic location could be tested with reasonable confidence. In a given 400 µm slice there would be relatively few healthy candidate cells at a suitable depth to attempt to patch-clamp.

    1. eLife assessment

      This manuscript provides valuable information about the genesis of CPSF6 condensates due to HIV-1 infection. However, the evidence is incomplete as it is missing more functional assays. Furthermore, some data on the fusion between CPSF6 aggregates and SC35 speckles are not novel.

    2. Reviewer #1 (Public review):

      In recent years, our understanding of the nuclear steps of the HIV-1 life cycle has made significant advances. It has emerged that HIV-1 completes reverse transcription in the nucleus and that the host factor CPSF6 forms condensates around the viral capsid. The precise function of these CPSF6 condensates is under investigation, but it is clear that the HIV-1 capsid protein is required for their formation. This study by Tomasini et al. investigates the genesis of the CPSF6 condensates induced by HIV-1 capsid, what other co-factors may be required, and their relationship with nuclear speckles (NS). The authors show that disruption of the condensates by the drug PF74, added post-nuclear entry, blocks HIV-1 infection, which supports their functional role. They generated CPSF6 KO THP-1 cell lines, in which they expressed exogenous CPSF6 constructs to map, by microscopy and pull-down assays, the regions critical for the formation of condensates. This approach revealed that the LCR region of CPSF6 is required for capsid binding but not for condensates whereas the FG region is essential for both. Using SON and SRRM2 as markers of NS, the authors show that CPSF6 condensates precede their merging with NS but that depletion of SRRM2, or SRRM2 lacking the IDR domain, delays the genesis of condensates, which are also smaller.

      The study is interesting and well conducted and defines some characteristics of the CPSF6-HIV-1 condensates. Their results on the NS are valuable. The data presented are convincing.

      I have two main concerns. Firstly, the functional outcome of the various protein mutants and KOs is not evaluated. Although Figure 1 shows that disruption of the CPSF6 puncta by PF74 impairs HIV-1 infection, it is not clear if HIV-1 infection is at all affected by expression of the mutant CPSF6 forms (and SRRM2 mutants) or KO/KD of the various host factors. The cell lines are available, so it should be possible to measure HIV-1 infection and reverse transcription. Secondly, the authors have not assessed if the effects observed on the NS impact HIV-1 gene expression, which would be interesting to know given that NS are sites of highly active gene transcription. With the reagents at hand, it should be possible to investigate this too.

    3. Reviewer #2 (Public review):

      Summary:

      HIV-1 infection induces CPSF6 aggregates in the nucleus that contain the viral protein CA. The study of the functions and composition of these nuclear aggregates has raised considerable interest in the field, and they have emerged as sites in which reverse transcription is completed and in the proximity of which viral DNA becomes integrated. In this work, the authors have mutated several regions of the CPSF6 protein to identify the domains important for nuclear aggregation, in addition to the already-known FG region; they have characterized the kinetics of fusion between CPSF6 aggregates and SC35 nuclear speckles and have determined the role of two nuclear speckle components in this process (SRRM2, SUN2).

      Strengths:

      The work examines systematically the domains of CPSF6 of importance for nuclear aggregate formation in an elegant manner in which these mutants complement an otherwise CPSF6-KO cell line. In addition, this work evidences a novel role for the protein SRRM2 in HIV-induced aggregate formation, overall advancing our comprehension of the components required for their formation and regulation.

      Weaknesses:

      Some of the results presented in this manuscript, in particular the kinetics of fusion between CPSF6-aggregates and SC35 speckles have been published before (PMID: 32665593; 32997983).

      The observations of the different effects of CPSF6 mutants, as well as SRRM2/SUN2 silencing experiments are not complemented by infection data which would have linked morphological changes in nuclear aggregates to function during viral infection. More importantly, these functional data could have helped stratify otherwise similar morphological appearances in CPSF6 aggregates.

      Overall, the results could be presented in a more concise and ordered manner to help focus the attention of the reader on the most important issues. Most of the figures extend to 3-4 different pages and some information could be clearly either aggregated or moved to supplementary data.

    4. Reviewer #3 (Public review):

      In this study, the authors investigate the requirements for the formation of CPSF6 puncta induced by HIV-1 under a high multiplicity of infection conditions. Not surprisingly, they observe that mutation of the Phe-Gly (FG) repeat responsible for CPSF6 binding to the incoming HIV-1 capsid abrogates CPSF6 punctum formation. Perhaps more interestingly, they show that the removal of other domains of CPSF6, including the mixed-charge domain (MCD), does not affect the formation of HIV-1-induced CPSF6 puncta. The authors also present data suggesting that CPSF6 puncta form individually before fusing with nuclear speckles (NSs) and that the fusion of CPSF6 puncta to NSs requires the intrinsically disordered region (IDR) of the NS component SRRM2. While the study presents some interesting findings, there are some technical issues that need to be addressed and the amount of new information is somewhat limited. Also, the authors' finding that deletion of the CPSF6 MCD does not affect the formation of HIV-1-induced CPSF6 puncta contradicts recent findings of Jang et al. (https://doi.org/10.1093/nar/gkae769).

    5. Author response:

      We would like to extend our sincere thanks to you and reviewers at eLife for their thoughtful handling of our manuscript and their valuable feedback, which will greatly improve our study.

      We are committed to performing the additional experiments as recommended by the reviewers. However, we would like to clarify our study's focus. 

      The novelty of our study lies in the highlights of our manuscript:

      • The formation of HIV-induced CPSF6 puncta is critical for restoring HIV-1 nuclear reverse transcription (RT).

      • CPSF6 protein lacking the FG peptide cannot bind to the viral core, thereby failing to form HIV-induced CPSF6 puncta.

      • The FG peptide, rather than low-complexity regions (LCRs) or the mixed charge domains (MCDs) of the CPSF6 protein, drives the formation of HIV-induced CPSF6 puncta.

      • HIV-induced CPSF6 puncta form individually and later fuse with nuclear speckles (NS) via the intrinsically disordered region (IDR) of SRRM2.

      By focusing on these processes, we believe our study provides a critical perspective on the molecular interactions that mediate the formation of HIV-induced CPSF6 puncta and broadens the understanding of how HIV manipulates host nuclear architecture.

      Public Reviews: 

      Reviewer #1 (Public review): 

      In recent years, our understanding of the nuclear steps of the HIV-1 life cycle has made significant advances. It has emerged that HIV-1 completes reverse transcription in the nucleus and that the host factor CPSF6 forms condensates around the viral capsid. The precise function of these CPSF6 condensates is under investigation, but it is clear that the HIV-1 capsid protein is required for their formation. This study by Tomasini et al. investigates the genesis of the CPSF6 condensates induced by HIV-1 capsid, what other co-factors may be required, and their relationship with nuclear speckles (NS). The authors show that disruption of the condensates by the drug PF74, added post-nuclear entry, blocks HIV-1 infection, which supports their functional role. They generated CPSF6 KO THP-1 cell lines, in which they expressed exogenous CPSF6 constructs to map, by microscopy and pull-down assays, the regions critical for the formation of condensates. This approach revealed that the LCR region of CPSF6 is required for capsid binding but not for condensates whereas the FG region is essential for both. Using SON and SRRM2 as markers of NS, the authors show that CPSF6 condensates precede their merging with NS but that depletion of SRRM2, or SRRM2 lacking the IDR domain, delays the genesis of condensates, which are also smaller.

      The study is interesting and well conducted and defines some characteristics of the CPSF6-HIV-1 condensates. Their results on the NS are valuable. The data presented are convincing. 

      I have two main concerns. Firstly, the functional outcome of the various protein mutants and KOs is not evaluated. Although Figure 1 shows that disruption of the CPSF6 puncta by PF74 impairs HIV-1 infection, it is not clear if HIV-1 infection is at all affected by expression of the mutant CPSF6 forms (and SRRM2 mutants) or KO/KD of the various host factors. The cell lines are available, so it should be possible to measure HIV-1 infection and reverse transcription. Secondly, the authors have not assessed if the effects observed on the NS impact HIV-1 gene expression, which would be interesting to know given that NS are sites of highly active gene transcription. With the reagents at hand, it should be possible to investigate this too. 

      We thank the reviewer for her/his valuable feedback on our manuscript. We are pleased to see her/his appreciation of our results, and we will do our utmost to address the highlighted points to further improve our work.

      Reviewer #2 (Public review): 

      Summary: 

      HIV-1 infection induces CPSF6 aggregates in the nucleus that contain the viral protein CA. The study of the functions and composition of these nuclear aggregates has raised considerable interest in the field, and they have emerged as sites in which reverse transcription is completed and in the proximity of which viral DNA becomes integrated. In this work, the authors have mutated several regions of the CPSF6 protein to identify the domains important for nuclear aggregation, in addition to the already-known FG region; they have characterized the kinetics of fusion between CPSF6 aggregates and SC35 nuclear speckles and have determined the role of two nuclear speckle components in this process (SRRM2, SUN2).

      Strengths: 

      The work examines systematically the domains of CPSF6 of importance for nuclear aggregate formation in an elegant manner in which these mutants complement an otherwise CPSF6-KO cell line. In addition, this work evidences a novel role for the protein SRRM2 in HIV-induced aggregate formation, overall advancing our comprehension of the components required for their formation and regulation. 

      Weaknesses: 

      Some of the results presented in this manuscript, in particular the kinetics of fusion between CPSF6-aggregates and SC35 speckles have been published before (PMID: 32665593; 32997983).

      The observations of the different effects of CPSF6 mutants, as well as SRRM2/SUN2 silencing experiments are not complemented by infection data which would have linked morphological changes in nuclear aggregates to function during viral infection. More importantly, these functional data could have helped stratify otherwise similar morphological appearances in CPSF6 aggregates. 

      Overall, the results could be presented in a more concise and ordered manner to help focus the attention of the reader on the most important issues. Most of the figures extend to 3-4 different pages and some information could be clearly either aggregated or moved to supplementary data. 

      First, we thank the reviewer for her/his appreciation of our study and for giving us the opportunity to better explain our results and improve our manuscript. We will do our best to address her/his concerns. In the meantime, we would like to clarify the focus of our study. Our research does not aim to demonstrate an association between CPSF6 condensates and SC35 nuclear speckles; this association has already been established in previous studies, as noted in the manuscript. We use the term "condensates" rather than "aggregates" because aggregates are generally non-dynamic (Alberti & Hyman, 2021; Banani et al., 2017), and our work specifically examines the dynamic behavior of CPSF6 during infection, as shown in Scoca et al., JMCB 2022.

      About the point highlighted by the reviewer: "Kinetics of fusion between CPSF6-aggregates and SC35 speckles have been published before (PMID: 32665593; 32997983)."

      Our study differs from prior work (PMID: 32665593) because we use a full-length HIV genome and did not follow integrase (IN) fluorescence in trans and its association with CPSF6. Instead, we specifically assess whether CPSF6 clusters form in the nucleus independently of NS factors and subsequently fuse with them. In the current study we evaluated the dynamics of formation of CPSF6/NS puncta, which have not been explored before. Given this focus, we believe that our work offers a novel perspective on the molecular interactions that facilitate HIV / CPSF6-NS fusion.

      For better clarity, we would like to specify that our study focuses on the role of SON, a scaffold factor of nuclear speckles, rather than SUN2 (SUN domain-containing protein 2), which is a component of the LINC (Linker of Nucleoskeleton and Cytoskeleton) complex.

      As suggested by the reviewer, we will keep key information in the main figure and move additional details to the supplementary material.

      Reviewer #3 (Public review): 

      In this study, the authors investigate the requirements for the formation of CPSF6 puncta induced by HIV-1 under a high multiplicity of infection conditions. Not surprisingly, they observe that mutation of the Phe-Gly (FG) repeat responsible for CPSF6 binding to the incoming HIV-1 capsid abrogates CPSF6 punctum formation. Perhaps more interestingly, they show that the removal of other domains of CPSF6, including the mixed-charge domain (MCD), does not affect the formation of HIV-1-induced CPSF6 puncta. The authors also present data suggesting that CPSF6 puncta form individually before fusing with nuclear speckles (NSs) and that the fusion of CPSF6 puncta to NSs requires the intrinsically disordered region (IDR) of the NS component SRRM2. While the study presents some interesting findings, there are some technical issues that need to be addressed and the amount of new information is somewhat limited. Also, the authors' finding that deletion of the CPSF6 MCD does not affect the formation of HIV-1-induced CPSF6 puncta contradicts recent findings of Jang et al. (doi.org/10.1093/nar/gkae769).

      We thank the reviewer for her/his thoughtful feedback and the opportunity to elaborate on why our findings provide a distinct perspective compared to those of Jang et al. (doi.org/10.1093/nar/gkae769), while aligning with the results of Rohlfes et al. (doi.org/10.1101/2024.06.20.599834).

      One potential reason for the differences between our findings and those of Jang et al. could be the choice of experimental systems. Jang et al. conducted their study in HEK293T cells with CPSF6 knockouts, as described in Sowd et al., 2016 (doi.org/10.1073/pnas.1524213113). In contrast, our work focused on macrophage-like THP-1 cells, which share closer characteristics with HIV-1’s natural target cells. 

      Our approach utilized a complete CPSF6 knockout in THP-1 cells, enabling us to reintroduce untagged versions of CPSF6, such as wild-type and deletion mutants, to avoid potential artifacts from tagging. Jang et al. employed HA-tagged CPSF6 constructs, which may lead to subtle differences in experimental outcomes due to the presence of the tag.

      Finally, our investigation into the IDR of SRRM2 relied on CRISPR-PAINT to generate targeted deletions directly in the endogenous gene (Lester et al., 2021, DOI: 10.1016/j.neuron.2021.03.026). This approach provided a native context for studying SRRM2’s role.

      We will incorporate these clarifications into the discussion section of the revised manuscript.

    1. eLife Assessment

      This fundamental work demonstrates that compartmentalized cellular metabolism is a dominant input into cell size control in a variety of mammalian cell types and in Drosophila. The authors show that increased pyruvate import into the mitochondria in liver-like cells and in primary hepatocytes drives gluconeogenesis but reduces cellular amino acid production, suppressing protein synthesis. The evidence supporting the conclusions is compelling, with a variety of genetic and pharmacologic assays rigorously testing each step of the proposed mechanism. This work will be of interest to cell biologists, physiologists, and researchers interested in cell metabolism, and is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

    4. Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

    5. Author response:

      Reviewer #1 (Public review):

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observation that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find that this occurs through metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thereby reducing amino acid production and, in turn, protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNA-seq analyses in whole fat body tissues expressing MPC. These methods revealed hyperactivation of mTORC1 and Myc signaling in fat body cells expressing MPC in Drosophila, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses inappropriate reductions in cell size and activates signaling pathways to promote cell growth. The best characterized “sizer” pathway for mammalian cells is the CycD/CDK4 complex which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTOR pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those over-expressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc are epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. TAG abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We will carefully revise the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation.

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree. Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impacts nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author response image 1A). However, the reduction in nuclear size became significant only after 36 hours of MPC expression (Author response image 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author response image 1C and D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This image highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR with protein content to account for variations in cellular size and (probably) mitochondrial mass.

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      We appreciate this valuable suggestion. In the revised manuscript, we will quantify trehalose abundance in circulation and within fat bodies. As described in the Methods section, following the approach outlined in Ugrankar-Banerjee et al., 2023, we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. We will apply this methodology to include the trehalose measurements as part of our updated analysis.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we will quantify the expression of Got2 manipulations mentioned in Figure 6j using RT-qPCR and validate the corresponding data in Supplementary Figure 8f through western blot analysis.

      Additionally, we will assess the efficiency of pcb, pdha, dlat, pepck2, and Got2 manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

    1. eLife Assessment

      This work is of fundamental significance and has a compelling level of evidence for the role of mutant p53 in regulation of tumorigenesis using an in vivo mouse model. The study is well-conducted and will be of interest to a broad audience including those interested in p53, transcription factors and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Toledo and colleagues describes the generation and characterization of Y220C mice (Y217C in the mouse allele). The authors make notable findings: Y217C mice that have been backcrossed to C57Bl/6 for five generations show decreased female pup births due to exencephaly, a known defect in p53 -/- mice, and they show a correlation with decreased Xist expression, as well as increased female neonatal death. They also noted similar tumor formation in Y217C/+ and p53 +/- mice, suggesting that Y217C may not function as a dominant negative. Notably, the authors find that homozygous Y217C mice die faster than p53 -/- mice and that the lymphomas in the Y217C mice were more aggressive and invasive. The authors then perform RNA seq on thymi of Y217C homozygotes compared to p53 -/-, and they suggest that these differentially expressed genes may explain the increased tumorigenesis in Y217C mice.

      Strengths:

      Overall, the study is well controlled and quite well done and will be of interest to a broad audience, particularly given the high frequency of the Y220C mutation in cancer (1% of all cancers, 4% of ovarian cancer).

      Weaknesses:

      No weaknesses were noted by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Jaber et al. describe the generation and characterization of a knock-in mouse strain expressing the p53 Y217C hot-spot mutation. While the homozygous mutant cells and mice reflect the typical loss-of-p53 functions, as expected, the Y217C mutation also appears to display gain-of-function (GOF) properties, exemplified by elevated metastasis in the homozygous context (as noted with several hot-spot mutations). Interestingly, this mutation does not appear to exhibit any dominant-negative effects associated with most hot-spot p53 mutations, as determined by the absence of differences in overall survival and tumor predisposition of the heterozygous mice, as well as target gene activation upon nutlin treatment.

      In addition, the authors noted a severe reduction in the female 217/217 homozygous progeny, significantly more than that observed with the p53 null mice, due to exencephaly, leading them to conclude that the Y217C mutation also has additional, non-cancer-related GOFs. Though this property has been well described and attributed to p53 functional impairment, the authors conclude that the Y217C has additional properties in accelerating the phenotype.

      Transcriptomic analyses of thymi found additional gene signature differences between the p53 null and the Y217C strains, indicative of novel target gene activation, associated with inflammation.

      Strengths:

      Overall, the characterisation of the mice highlights the expected typical outcomes associated with most hot-spot p53 mutations published earlier. The work presented is of good quality, and the conclusions are reasonably well justified.

      Weaknesses:

      The manuscript would benefit from the provision of additional data to strengthen the claims made, as follows:

      (1) Oncogenic GOF - the main data shown for GOF are the survival curve and enhanced metastasis. Often, GOF is exemplified at the cellular level as enhanced migration and invasion, which are standard assays to support the GOF. As such, the authors should perform these assays using either tumor cells derived from the mice or transformed fibroblasts from these mice. This will provide important and confirmatory evidence for GOF for Y217C.

      (2) Novel target gene activation - while a set of novel targets appears to be increased in the Y217C cells compared to the p53 null cells, it is unclear how they are induced. The authors should examine whether mutant p53 can bind to their promoters through ChIP assays, and whether these targets are specific to Y217C and not the other hot-spot mutations. This will strengthen the validity of the Y217C's ability to promote GOF.

      (3) Dominant negative effect - the authors' claim of lack of DN effect needs to be strengthened further, as most p53 hot-spot mutations do exhibit DN effect. At the minimum, the authors should perform additional treatment with nutlin and gamma irradiation (or cytotoxic/damaging agents) and examine a set of canonical p53 target genes by qRT-PCR to strengthen their claim.

    1. eLife Assessment

      This manuscript provides potentially important findings on changes in DNA repair genes and programs in response to hypoxia, examined in 2D and 3D models of MYC liver cancer cells. The authors use convincing methodology in most cases, but there is some concern that the analysis is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this report, the authors made use of a murine cell line derived from a MYC-driven liver cancer to investigate the gene expression changes that accompany the switch from normoxic to hypoxic conditions during 2D growth and the switch from 2D monolayer to 3D organoid growth under normoxic conditions. They find a significant (ca. 40-50%) overlap among the genes that are dysregulated in response to hypoxia in 2D cultures and in response to spheroid formation. Unsurprisingly, hypoxia-related genes were among the most prominently deregulated under both sets of conditions. Many other pathways pertaining to metabolism, splicing, mitochondrial electron transport chain structure and function, DNA damage recognition/repair, and lipid biosynthesis were also identified.

      Major comments:

      (1) Lines 239-240: The authors state that genes involved in DNA repair were identified as being necessary to maintain survival of both 2D and 3D cultures (Figure S6A). Hypoxia is a strong inducer of ROS. Thus, the ROS-specific DNA damage/recognition/repair pathways might be particularly important. The authors should look more carefully at the various subgroups of the many genes that are involved in DNA repair. They should also obtain at least a qualitative assessment of ROS and ROS-mediated DNA damage by staining for total and mitochondrial-specific ROS using dyes such as CM-H2-DCFDA and MitoSox. Actual direct oxidative damage could be assessed by immunostaining for 8-oxo-dG and related to the sub-types of DNA damage-repair genes that are induced. The centrality of DNA damage genes also raises the question as to whether the previously noted prominence of the TP53 pathway (see point 5 below) might represent a response to ROS-induced DNA damage.

      (2) Because most of the pathway differences that distinguish the various cell states from one another are described only in terms of their transcriptome variations, it is not always possible to understand what the functional consequences of these changes actually are. For example, the authors report that hypoxia alters the expression of genes involved in PDH regulation but this is quite vague and not backed up with any functional or empirical analyses. PDH activity is complex and regulated primarily via phosphorylation/dephosphorylation (usually mediated by PDK1 and PDP2, respectively), which in turn are regulated by prevailing levels of ATP and ADP. Functionally, one might expect that hypoxia would lead to the down-regulation of PDH activity (i.e. increased PDH-pSer392) as respiration changes from oxidative to non-oxidative. This would not be appreciated simply by looking at PDH transcript levels. This notion could be tested by looking at total and phospho-PDH by western blotting and/or by measuring actual PDH activity as it converts pyruvate to AcCoA.

      (3) Line 439: Related to the above point: the authors state: "It is likely that blockade of acetyl-CoA production by PDH knockout may force cells to use alternative energy sources under hypoxic and 3D conditions, averting the Warburg effect and promoting cell survival under limited oxygen and nutrient availability in 3D spheroids." This could easily be tested by determining whether exogenous fatty acids are more readily oxidized by hypoxic 2D cultures or spheroids than occurs in normoxic 2D cultures.

      (4) Line 472: "Hypoxia induces high expression of Acaca and Fasn in NEJF10 cells indicating that hypoxia promotes saturated fatty acid synthesis...The beneficial effect of Fasn and Acaca KO to NEJF10 under hypoxia is probably due to reduction of saturated fatty acid synthesis, and this hypothesis needs to be tested in the future." As with the preceding comment, this supposition could readily be supported directly by, for example, performing western blots for these enzymes and by showing that hypoxic 2D cells or spheroids convert more AcCoA into lipid.

      (5) In Supplementary Figure 2B&C, the central hub of the 2D normoxic cultures is Myc (as it should well be) whereas, in the normoxic 3D, the central hub is TP53 and Myc is not even present. The authors should comment on this. One would assume that Myc levels should still be quite high given that Myc is driven by an exogenous promoter. Does the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?

      (6) In the Materials and Methods section (lines 711-720), the description of how spheroid formation was achieved is unclear. Why were the cells first plated into non-adherent 96 well plates and then into non-adherent T75 flasks? Did the authors actually utilize and expand the cells from 144 T75 flasks and did the cells continue to proliferate after forming spheroids? Many cancer cell types will initially form monolayers when plated onto non-adherent surfaces such as plastic Petri dishes and will form spheroid-like structures only after several days. Other cells will only aggregate on the "non-adherent" surface and form spheroid-like structures but will not actually detach from the plate's surface. Have the authors actually documented the formation of true, non-adherent spheroids at 2 days and did they retain uniform size and shape throughout the collection period? The single photo in Supplementary Figure 1 does not explain when this was taken. The authors include a schematic in Figure 2A of the various conditions that were studied. A similar cartoon should be included to better explain precisely how the spheroids were generated and clarify the rationale for 96 well plating. Overall, a clearer and more concise description of how spheroids were actually generated and their appearance at different stages of formation needs to be provided.

      (7) The authors maintained 2D cultures in either normoxic or hypoxic (1% O2) states during the course of their experiments. On the other hand, 3D cultures were maintained under normoxic conditions, with the assumption that the interiors of the spheroids resemble the hypoxic interiors of tumors. However, the actual documentation of intra-spheroid hypoxia is never presented. It would be a good idea for the authors to compare the degree of hypoxia achieved by 2D (1% O2) and 3D cultures by staining with a hypoxia-detecting dye such as Image-iT Green. Comparing the fluorescence intensities in 2D cultures at various O2 concentrations might even allow for the construction of a "standard curve" that could serve to approximate the actual internal O2 concentration of spheroids. This would allow the authors to correlate the relative levels of hypoxia between 2D and 3D cultures.

      (8) Related to the previous 2 points, the authors performed RNAseq on spheroids only 48 hours after initiating 3D growth. I am concerned that this might not have been a sufficiently long time for the cells to respond fully to their hypoxic state, especially given my concerns in Point 6. Might the results have been even more robust had the authors waited longer to perform RNA seq? Why was this short time used?

      (9) What happens to the gene expression pattern if spheroids are re-plated into standard tissue culture plates after having been maintained as spheroids? Do they resume 2D growth and does the gene expression pattern change back?

      (10) Overall, the paper is quite descriptive in that it lists many gene sets that are altered in response to hypoxia and the formation of spheroids without really delving into the actual functional implications and/or prioritizing the sets. Some of these genes are shown by CRISPR screening to be essential for maintaining viability although in very few cases are these findings ever translated into functional studies (for example, see points 1-4 above). The list of genes and gene pathways could benefit from a better explanation and prioritization of which gene sets the authors believe to be most important for survival in response to hypoxia and for spheroid formation.

      (11) The authors used a single MYC-driven tumor cell line for their studies. However, in their original paper (Fang, et al. Nat Commun 2023, 14: 4003.) numerous independent cell lines were described. It would help to know whether RNAseq studies performed on several other similar cell lines gave similar results in terms of up & down-regulated transcripts (i.e., how representative NEJF10 cells are of the other cell lines).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Fang et al. provides a tour-de-force study uncovering cancer cells' varied dependencies on several gene programs for their survival under different biological contexts. The authors addressed genomic differences in 2D vs 3D cultures and how hypoxia affects gene expression. They used a Myc-driven murine liver cancer model grown in 2D monolayer culture in normoxia and hypoxia as well as cells grown as 3D spheroids and performed a CRISPR-based genome-wide KO screen to identify genes that play important roles in cell fitness. Some context-specific gene effects were further validated by in-vitro and in-vivo gene KO experiments.

      Strengths:

      The key findings in this manuscript are:

      (1) Close to 50% of differentially expressed genes were common between 2D Hypoxia and 3D spheroids conditions but they had differences in chromatin accessibility.

      (2) VHL-HIF1a pathway had differential cell fitness outcomes under 2D normoxia vs 2D hypoxia and 3D spheroids.

      (3) Individual components of the mitochondrial respiratory chain complex had contrasting effects on cell fitness under hypoxia.

      (4) Knockout of organogenesis or developmental pathway genes led to better cell growth specifically in the context of 3D spheroids and knockout of epigenetic modifiers had varied effects between 2D and 3D conditions.

      (5) Another key program that leads to cell fitness outcomes in normoxia vs hypoxia is the lipid and fatty acid metabolism.

      (6) Prmt5 is a key essential gene under all growth conditions, but in the context of 3D spheroids even partial loss of Prmt5 has a synthetic lethal effect with Mtap deletion and Mtap is epigenetically silenced specifically in the 3D spheroids.

      Issues to address:

      (1) The authors should clarify the link between the findings of the enrichment of TGFb-SMAD signaling REACTOME pathway to the findings that knocking out TGFb-SMAD pathway leads to better cell fitness outcomes for cells in the 3D growth conditions.

      (2) Supplementary Figure 4C has been cited in the text but doesn't exist in the supplementary figures section.

      (3) A small figure explaining this ABC-Myc driven liver cancer model in Supplementary Figure 1 would be helpful to provide context.

      (4) The method for spheroids formation is not found in the method section.

      (5) In Supplementary Figure 1b, the comparisons should be stated the opposite way - 3D vs 2D normoxia and 2D-Hypoxia vs 2D-Normoxia.

      (6) There are typos in the legend for Supplementary Figure 10.

      (7) Consider putting Supplementary Figure 1b into the main Figure 1.

      (8) Please explain why only one timepoint (endpoint) was collected for 3D spheroids in the CRISPR KO screen experiment, while several timepoints were collected for 2D conditions. Was this for technical convenience?

      (9) In line 372, it is indicated that Bcor KO (Fig 5e) had growth advantage - this was observed in only one of the gRNA -- same with Kmt2d KO in the same figure where there was an opposite effect. Please justify the use of only one gRNA.

      (10) Why was shRNA used for the PRMT5 gene rather than a CRISPR-based KO strategy? Note that one of the shRNAs for PRMT5 produced almost no knockdown (PRMT5-shRNA2, Figure 7B) but still showed a phenotype (Figure 7D) - please explain.

      (11) In Figure 7D, which samples (which shRNA group) were being compared to do the t-test?

      (12) In line 240, it is stated that oxphos gene set is essential for NEJF10 cell survival in both normoxia and hypoxia conditions. But shouldn't oxphos be non-essential in hypoxia as cells move away from oxphos and become glycolytic?

      (13) In line 485 it is mentioned that knockout of Pmvk and Mvd, which are involved in cholesterol synthesis, had a positive effect on cell growth in 3D conditions. Since cholesterol synthesis is essential for cell growth, please explain why it appears not to matter in the 3D context.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Fang et al. systematically investigate the effects of culture conditions on gene expression, genome architecture, and gene dependency. To do this, they cultivate the murine HCC line NEJF10 under standard culture conditions (2D), then under similar conditions but under hypoxia (1% oxygen, 2D hypoxia) and under normoxia as spheroids (3D). NEJF10 was isolated from a murine HCC model that relies exclusively on MYC as a driver oncogene. In principle, (1) RNA-seq, (2) ATAC-seq and (3) genetic screens were then performed in this isogenic system and the results were systematically compared in the three cultivation methods. In particular, genome-wide screens with the CRISPR library Brie were performed very carefully. For example, in the 2D conditions, many different time points were harvested to control the selection process kinetically. The authors note differential dependencies for metabolic processes (not surprisingly, hypoxia signaling is affected) such as the regulation and activity of mitochondria, but also organogenesis signaling and epigenetic regulation.

      Strengths:

      The topic is interesting and relevant and the experimental set-up is carefully chosen and meaningful. The paper is well written. While the study does not reveal any major surprises, the results represent an important resource for the scientific community.

      Weaknesses:

      However, this presupposes that the statistical analysis and processing are carried out very carefully, and this is where my main suggestions for revision begin. Firstly, I cannot find any information on the number of replicates in RNA- and ATAC-seq. This should be clearly stated in the results section and figure legends and cut-offs, statistical procedures, p-values, etc. should be mentioned as well. In principle, all NGS experiments (here ATAC- and RNA-seq) should be performed in replicates (at least duplicates, better triplicates) or the results should be validated by RT-PCR in independent biological triplicates. Secondly, the quantification of the analyses shown in the figures and especially in the legends is not sufficiently careful. Units are often not mentioned. Example Figure 4a: The legend says: 'gRNA reads' but how can the read count be -1? I would guess these are FC, log2FC, or Z-values. All figure legends need careful revision.
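
      The point about Figure 4a can be made concrete: raw read counts cannot be negative, whereas log2 fold changes of depth-normalized counts can. The following is a minimal sketch only, not the authors' pipeline. The guide names, the counts, the counts-per-million scaling, and the pseudocount of 1 are all assumptions chosen for illustration.

```python
import math

def log2_fold_change(start_counts, end_counts, pseudocount=1.0):
    """Return the per-guide log2 fold change between two samples.

    Counts are scaled to counts per million so that a difference in
    sequencing depth is not mistaken for enrichment or depletion.
    """
    start_total = sum(start_counts.values())
    end_total = sum(end_counts.values())
    result = {}
    for guide, start in start_counts.items():
        start_cpm = start / start_total * 1e6 + pseudocount
        end_cpm = end_counts.get(guide, 0) / end_total * 1e6 + pseudocount
        result[guide] = math.log2(end_cpm / start_cpm)
    return result

# Hypothetical read counts for three guides (illustration only).
start = {"guide_A": 500, "guide_B": 500, "guide_C": 1000}
end = {"guide_A": 250, "guide_B": 500, "guide_C": 1250}
lfc = log2_fold_change(start, end)
```

      In this toy input, guide_A is halved and so lands close to -1, a value that is meaningful for a log2 fold change but impossible for a read count, which is why the axis label matters.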

      Furthermore, I would find a comparison of the sgRNA abundances at the earliest harvesting time with the distribution in the library interesting, to see whether and to what extent selection has already taken place before the three culture conditions were established (minor point).

    1. eLife Assessment

      Overall, this fundamental study identified a novel role of NOLC1 in regulating p53 nuclear transcriptional activity and p53-mediated ferroptosis in gastric cancer. The evidence supporting the conclusions is solid, although some new evidence is needed to make it more robust. The work will be of broad interest to cancer biologists and oncologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors identified that NOLC1 was upregulated in gastric cancer samples, which promoted cancer progression and cisplatin resistance. They further found that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis. There are several major concerns regarding the conclusions.

      Strengths:

      This study identified that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis in gastric cancer.

      Weaknesses:

      The major conclusions were not sufficiently supported by the results. The experiments were not conducted in a comprehensive manner.

    3. Reviewer #2 (Public review):

      Summary:

      Shengsheng Zhao et al. investigated the role of nucleolar and coiled-body phosphoprotein 1 (NOLC1) in regulating gastric cancer (GC) development and cisplatin-induced drug resistance in GC. They found a significant correlation between high NOLC1 expression and the poor prognosis of GC. Meanwhile, upregulation of NOLC1 was associated with cis-resistant GC. Experimentally, the authors demonstrate that knocking down NOLC1 increased GC sensitivity to Cis, possibly by regulating ferroptosis. Mechanistically, they found NOLC1 suppressed ferroptosis by blocking the translocation of P53 from the cytoplasm to the nucleus and promoting its degradation. In addition, the authors also evaluated the effect of combinational treatment of anti-PD-1 and cisplatin in NOLC1-knockdown tumor cells, revealing a potential role of NOLC1 in the targeted therapy for GC.

      Strengths:

      Chemoresistance is considered a major reason causing failure of tumor treatment and death of cancer patients. This paper explored the role of NOLC1 in the regulation of Cis-mediated resistance, which involves a regulated cell death named ferroptosis. These findings provide more evidence highlighting the study of regulated cell death to overcome drug resistance in cancer treatment, which could give us more potential strategies or targets for combating cancer.

      Weaknesses:

      More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored. Besides, the experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have put forth a compelling argument that NOLC1 is indispensable for gastric cancer resistance in both in vivo and in vitro models. They have further elucidated that NOLC1 silencing augments cisplatin-induced ferroptosis in gastric cancer cells. The mechanistic underpinning of their findings suggests that NOLC1 modulates the p53 nuclear/plasma ratio by engaging with the p53 DNA Binding Domain, which in turn impedes p53-mediated transcriptional regulation of ferroptosis. Additionally, the authors have shown that NOLC1 knockdown triggers the release of ferroptosis-induced damage-associated molecular patterns (DAMPs), which activate the tumor microenvironment (TME) and enhance the efficacy of the anti-PD-1 and cisplatin combination therapy.

      Strengths:

      The manuscript presents a robust dataset that substantiates the authors' conclusion. They have identified NOLC1 as a potential oncogene that confers resistance to immuno-chemotherapy in gastric cancer through the mediation of ferroptosis and subsequent TME reprogramming. This discovery positions NOLC1 as a promising therapeutic target for gastric cancer treatment. The authors have delineated a novel mechanistic pathway whereby NOLC1 suppresses p53 transcriptional functions by reducing its nuclear/plasma ratio, underscoring the significance of p53 nuclear levels in tumor suppression over total protein levels.

      Weaknesses:

      While the overall findings are commendable, there are specific areas that could benefit from further refinement. The authors have posited that NOLC1 suppresses p53-mediated ferroptosis; however, the mRNA levels of ferroptosis genes regulated by p53 have not been quantified, which is a critical gap in the current study. In Figure 4A, transmission electron microscopy (TEM) results are reported solely for the MGC-803 cell line. It would be beneficial to include TEM data for the MKN-45 cell line to strengthen the findings. The authors have proposed a link between NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance, yet the correlation between this ratio and patient prognosis remains unexplored, which is a significant limitation in the context of clinical relevance.

    1. eLife Assessment

      This important work advances our understanding of how the SARS-CoV-2 Nsp16 protein is regulated by host E3 ligases to promote viral mRNA capping. However, support for the overall claims is incomplete as the authors need to demonstrate Nsp16 ubiquitination and the role of E3 ligases in a more biologically relevant context (ie. infection). This work will be of interest to those working in host-viral interactions and the role of the ubiquitin-proteasome system in viral replication.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      While the conclusions are generally well-supported by the data, additional work is needed to confirm that NSP16 is ubiquitinated in a biologically relevant context and to better define the roles of the reported E3 ligases. Clarifications regarding aspects of data acquisition, data analysis, and text editing could notably strengthen the manuscript and its conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a novel understanding of CoV-host interaction, leading to potential therapeutics for SARS-CoV-2 infection. Tian et al. identified and demonstrated that the two E3 ligases UBR5 and MARCHF7 both interact with and catalyze the ubiquitination of the NSP16 protein of SARS-CoV-2, thereby leading to its degradation by the ubiquitin-proteasome system (UPS) and inhibiting SARS-CoV-2 replication. It is interesting to see that the two E3 ligases perform their functions on the same target independently.

      Strengths:

      Overall, the topic and initial discoveries appear interesting. The experimental designs of this study were rigorous and logical, most of the work has been carefully done, and the conclusions drawn from this study are relatively convincing and reliable.

      Weaknesses:

      The quality of the presentation could be improved with better organization, a more conservative interpretation of the data, and further clarity in the writing.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/stability of NSP16 of SARS-CoV-2 via K48- and K27-linked ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, both E3 ubiquitin ligases, bring about the ubiquitination of NSP16. The concept and the experimental approach to prove the hypothesis look ok. The in vivo data look ok with the controls. Overall, the manuscript is good. However, several major and minor changes/points need to be addressed.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Weaknesses:

      Most of the in vitro experiments (IP, overexpression) lack appropriate controls. The summary figure in actual terms does not show/correlate to the experimental findings.

    1. eLife Assessment

      In the field of early detection of disease and recurrence and monitoring of treatment efficacy by cfDNA analysis, this study presents a useful finding that HPV cfDNA level monitoring provides advantages over serum levels of squamous cell carcinoma antigen (SCC-Ag), specifically for HPV+ cervical cancer. The data were collected and analysed using solid and validated methodology but the sample number was limited.

    2. Reviewer #1 (Public review):

      Summary:

      The study "Monitoring of Cell-free Human Papillomavirus DNA in Metastatic or Recurrent Cervical Cancer: Clinical Significance and Treatment Implications" by Zhuomin Yin and colleagues focuses on the relationship between cell-free HPV (cfHPV) DNA and metastatic or recurrent cervical cancer patients. It expands the application of cfHPV DNA in tracking disease progression and evaluating treatment response in cervical cancer patients. The study is overall well-designed, including appropriate analyses.

      Strengths:

      The findings provide valuable reference points for monitoring drug efficacy and guiding treatment strategies in patients with recurrent and metastatic cervical cancer. The concordance between HPV cfDNA fluctuations and changes in disease status suggests that cfDNA could play a crucial role in precision oncology, allowing for more timely interventions. As with similar studies, the authors used Droplet Digital PCR to measure cfDNA copy numbers, a technique that offers ultrasensitive nucleic acid detection and absolute quantification, lending credibility to the conclusions.

      Weaknesses:

      Despite including 28 clinical cases, only 7 involved recurrent cervical cancer, which may not be sufficient to support some of the authors' conclusions fully. Future studies on larger cohorts could solidify HPV cfDNA's role as a standard in the personalized treatment of recurrent cervical cancer patients.

    3. Reviewer #2 (Public review):

      Summary:

      The authors conducted a study to evaluate the potential of circulating HPV cell-free DNA (cfDNA) as a biomarker for monitoring recurrent or metastatic HPV+ cervical cancer. They analyzed serum samples from 28 patients, measuring HPV cfDNA levels via digital droplet PCR and comparing these to squamous cell carcinoma antigen (SCC-Ag) levels in 26 SCC patients, while also testing the association between HPV cfDNA levels and clinical outcomes. The main hypothesis that the authors set out to test was whether circulating HPV cfDNA levels correlated with metastatic patterns and/or treatment response in HPV+ CC.

      The main claims put forward by the paper are that:

      (1) HPV cfDNA was detected in all 28 CC patients enrolled in the study and levels of HPV cfDNA varied over a median 2-month monitoring period.

      (2) 'Median baseline' HPV cfDNA varied according to 'metastatic pattern' in individual patients.

      (3) Positivity rate for HPV cfDNA was more consistent than SCC-Ag.

      (4) In 20 SCC patients monitored longitudinally, concordance with changes in disease status was 90% for HPV cfDNA.

      This study highlights HPV cfDNA as a promising biomarker with advantages over SCC-Ag, underscoring its potential for real-time disease surveillance and individualized treatment guidance in HPV-associated cervical cancer.

      Strengths:

      This study presents valuable insights into HPV+ cervical cancer with potential translational significance for management and guiding therapeutic strategies. The focus on a non-invasive approach is particularly relevant for women's cancers, and the study exemplifies the promising role of HPV cfDNA as a biomarker that could aid personalized treatment strategies.

      Weaknesses:

      While the authors acknowledge the study's small cohort and variability in sequential sampling protocols as a limitation, several revisions should be made to ensure that (1) the findings are presented in a way that aligns more closely with the data without overstatement and (2) that the statistical support for these findings is made more clear. Specific suggestions are outlined below.

      (1) The authors should provide source data for Figures 2, 3, and 4 as supplementary material.

      (2) Description of results in Figure 2: Figure 2 would benefit from clearer annotations regarding HPV virus subtypes. For example, does the color-coding in Figure 2B imply that all samples in the LR subgroup are of type HPV16? If that is the case, is it possible that detection variations are due to differences in subtype detection efficiency rather than cfDNA levels? The authors should clarify these aspects. Annotation of Figure 2B suggests that the p-value comes from comparing the LR and LN+H+DSM groups. This should be clarified in the legend. If this p-value comes from comparing HPV cfDNA copies for the (LR, LNM, HM) and (LN+HM, LN+HM+DSM) groups, did the authors carry out post-hoc pairwise comparisons? It would be helpful to include acronyms for these groups in the legend also.

      (3) Interpretation of results in Figure 2 and elsewhere: Significant differences detected in Figure 2B could imply potential associations between HPV cfDNA levels (or subtypes) and recurrence/metastasis patterns. Figure 2C shows that there is a difference in cfDNA levels between the groups compared, suggesting an association but this would not necessarily be a direct "correlation". Overall, interpretation of statistical findings would benefit from more precise language throughout the text and overstatement should be avoided.

      (4) The authors state that six patients showed cfDNA elevation with clinically progressive disease, yet only three are represented in Figure 3B1 under "Patients whose disease progressed during treatment." What is the expected baseline variability in cfDNA for patients? If we look at data from patients with early-stage cancer would we see similar fluctuations? And does the degree of variability vary for different HPV subtypes? Without understanding the normal fluctuations in cfDNA levels, interpreting these changes as progression indicators may be premature.

      (5) It would be helpful if where p-values are given, the test used to derive these values was also stated within parentheses e.g. (P < 0.05, permutation test with Benjamini-Hochberg procedure).
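
      The parenthetical example above names a multiple-testing correction. For readers unfamiliar with it, the Benjamini-Hochberg adjustment can be sketched as follows. This is an illustrative sketch with made-up p-values, not the analysis used in the manuscript; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from four comparisons (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      Reporting both the test and the correction alongside each p-value lets a reader judge whether a threshold such as P < 0.05 refers to raw or adjusted values.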

    1. eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time-consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the four specific ones used here.

    2. Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the four specific ones used here.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper claims to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      It does appear that the proposed method works very well on the four investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance cannot be matched with simple image processing.

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods. That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.:

      * comparison to thresholding (with the same post-processing as the proposed method)
      * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
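
      To make the first suggested baseline concrete, thresholding followed by connected component analysis can be sketched as below. This is a hedged, standard-library-only illustration on a toy volume, not the reviewer's or the authors' code. The function names, the 64-bin histogram, the 6-connectivity, and the toy intensities are assumptions, and the post-processing used by the proposed method is deliberately left out.

```python
from collections import deque

def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for i, h in enumerate(hist):
        w_bg += h
        sum_bg += i * h
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    # The threshold sits at the upper edge of the best background bin.
    return lo + (best_bin + 1) * width

def label_components(volume, threshold):
    """Label 6-connected voxels above threshold; return (labels, count)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] <= threshold or labels[z][y][x]:
                    continue
                # New unlabeled foreground voxel: flood-fill its component.
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] > threshold
                                and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count

# Toy 2 x 4 x 6 volume: dim background with two bright, separated blobs.
volume = [[[10] * 6 for _ in range(4)] for _ in range(2)]
for z, y, x in [(0, 0, 0), (0, 0, 1), (1, 0, 0)]:
    volume[z][y][x] = 200
for z, y, x in [(0, 3, 4), (0, 3, 5), (1, 3, 5)]:
    volume[z][y][x] = 220
flat = [v for plane in volume for row in plane for v in row]
threshold = otsu_threshold(flat)
labels, n_cells = label_components(volume, threshold)
```

      On real light-sheet volumes one would more plausibly use array libraries for speed; the sketch is only meant to show how few moving parts such a baseline has, which is what makes it a useful reference point for the learned method.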

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software. We agree we focus on lightsheet microscopy data, therefore to narrow the scope we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to.

      You have selectively dropped the last part of that sentence that is key: “.... 3D volumes, often in cleared neural tissue” – which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we literally make it clear our claims are on MesoSPIM and cleared data.

      The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      This is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning (i.e., no method is at an F1-Score of 1).

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.
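
      For reference, the F1-Score (Dice) mentioned in this reply can be illustrated with a voxel-wise sketch. This is an assumption-laden illustration: the manuscript's F1 may be computed per cell instance rather than per voxel, and the masks and function name below are hypothetical.

```python
def f1_score(predicted, truth):
    """Voxel-wise F1 (equivalently Dice) between two binary masks."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    if tp + fp + fn == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical flattened masks (1 = cell voxel, 0 = background).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(predicted, truth)
```

      Because the same function can score any mask, it applies equally to a learned segmentation and to a thresholding baseline, which is the comparison at issue in this exchange.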

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      *  comparison to thresholding (with the same post-processing as the proposed method)

      *  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      *  comparison to references 8 and 9.

      Refs 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer – SwinUNetR. Moreover, models in the MONAI package can be used. Note, to our knowledge the transformer results also are a new contribution that the Reviewer does not acknowledge.
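
      The thresholding baseline requested in this exchange can be sketched as follows. This is a minimal illustration only, assuming NumPy and SciPy are available; it is not the authors' pipeline and not the plugin's Voronoi-Otsu implementation. The function names `otsu_threshold` and `threshold_baseline` and the `min_size` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(volume, bins=256):
    """Return the intensity that maximizes the between-class variance (Otsu)."""
    counts, edges = np.histogram(volume.ravel(), bins=bins)
    counts = counts.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                 # voxels in bins up to and including i
    w1 = w0[-1] - w0                       # voxels in bins above i
    s0 = np.cumsum(counts * centers)       # intensity sum of the lower class
    mu0 = s0 / np.maximum(w0, 1)           # mean of the lower class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    # Use the upper edge of the best bin so that the split matches the histogram.
    return edges[1:][np.argmax(between)]


def threshold_baseline(volume, min_size=0):
    """Otsu threshold, 3D connected components, optional small-object removal."""
    mask = volume >= otsu_threshold(volume)
    labels, n = ndimage.label(mask)        # default: face-connected neighbors
    if min_size > 0:
        sizes = np.bincount(labels.ravel())
        keep = sizes >= min_size
        keep[0] = False                    # label 0 is background
        labels, n = ndimage.label(keep[labels])
    return labels, n
```

      Applying the same post-processing to such a baseline as to the network output would be one way to separate the contribution of the learned model from that of the post-processing, which is the question the reviewer raises.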

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment, presentation of the results, and reproducibility.

      We address your concerns and misunderstandings below.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small nuclei, much smaller than the pretraining datasets of CellPose and StarDist. I assume that this is one of the main reasons why these well-established methods don't work for this dataset.

      StarDist is not pretrained; we trained it from scratch, as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them; it is that we can match them with a self-supervised method.

      Limiting method comparison to only this dataset may create a misleading impression that CellSeg3D is superior for all kinds of 3D nucleus segmentation tasks, whereas this might only hold for small nuclei.

      The GT dataset we labeled has nuclei that are normal brain-cell sized. Moreover in Figure 2 we show very different samples with both dense and noisy (c-FOS) labeling.

      We also clearly do not claim this is superior for all tasks, from our text: “First, we benchmark our methods against Cellpose and StarDist, two leading supervised cell segmentation packages with user-friendly workflows, and show our methods match or outperform them in 3D instance segmentation on mesoSPIM-acquired volumes" – we explicitly do NOT claim beyond the scope of the benchmark. Moreover we state: "We found that WNet3D could be as good or better than the fully supervised models, especially in the low data regime, on this dataset at semantic and instance segmentation" – again noting on this dataset. Again, we only claimed we can be as good as these methods with an unsupervised approach, and in the low-GT data regime we can excel.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I doubt that the claims hold for larger and or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work and he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be much stronger if a **fair** comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a) this is not a valid experimental setup and amounts to training on your test set. If b) this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3.

      We apologize for this confusion; we have now expanded the methods to clarify the setup is now b; you can see what we exactly did as well in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions.

      For clarity, we additionally link each individual notebook now in the Methods.
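
      The distinction between setups a) and b) discussed above can be illustrated with a short sketch: the post-processing threshold is chosen on validation volumes only, and the metric is then reported on held-out test volumes. This is a hedged illustration assuming NumPy; the helper names `dice`, `select_threshold`, and `evaluate` are hypothetical and do not come from the authors' notebooks.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0                         # both masks empty: perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def select_threshold(val_pairs, candidates):
    """Pick the threshold with the best mean Dice on validation data only."""
    def mean_dice(t):
        return float(np.mean([dice(p >= t, g) for p, g in val_pairs]))
    return max(candidates, key=mean_dice)


def evaluate(test_pairs, threshold):
    """Report mean Dice on held-out data with a threshold fixed beforehand."""
    return float(np.mean([dice(p >= threshold, g) for p, g in test_pairs]))
```

      Here each pair is a (prediction, ground truth) tuple of arrays with the same shape. Keeping the two lists disjoint is what makes the setup correspond to option b); selecting and evaluating on the same pairs would correspond to option a).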

      (3) I cannot reproduce any of the results using the plugin. I tried to reproduce some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite close (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that would be required to reproduce the images (it sounds like you had this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to reproduce the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      We are surprised to hear this; did you follow the notebook that directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly; that is why we retrained Cellpose and found good performance (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics.

      We added a link to the paper you mention as well.
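
      To make the difference between the two metrics in this point concrete, the sketch below computes a semantic Dice score and an instance-level F1 score at an IoU threshold on a toy label array. It is an illustration under stated assumptions, not the implementation used in the paper or in StarDist: the function names are hypothetical, and the matching is a simple greedy one-to-one assignment, whereas established tools may use an optimal assignment.

```python
import numpy as np


def semantic_dice(pred_labels, true_labels):
    """Dice on foreground vs. background, ignoring instance identity."""
    pred, truth = pred_labels > 0, true_labels > 0
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * (pred & truth).sum() / denom


def instance_f1(pred_labels, true_labels, iou_threshold=0.5):
    """F1 over instances: a ground-truth object counts as found if an
    unmatched predicted object overlaps it with IoU >= iou_threshold."""
    pred_ids = [int(i) for i in np.unique(pred_labels) if i != 0]
    true_ids = [int(i) for i in np.unique(true_labels) if i != 0]
    matched = set()                        # predicted ids already assigned
    tp = 0
    for t in true_ids:
        t_mask = true_labels == t
        for p in np.unique(pred_labels[t_mask]):
            p = int(p)
            if p == 0 or p in matched:
                continue
            p_mask = pred_labels == p
            iou = (t_mask & p_mask).sum() / (t_mask | p_mask).sum()
            if iou >= iou_threshold:
                matched.add(p)
                tp += 1
                break
    fp = len(pred_ids) - tp                # predictions without a match
    fn = len(true_ids) - tp                # ground-truth objects missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy case: two touching ground-truth cells are merged into one prediction.
true = np.array([1, 1, 2, 2, 2, 2, 0, 3, 3, 0])
pred = np.array([1, 1, 1, 1, 1, 1, 0, 2, 2, 0])
print(semantic_dice(pred, true), instance_f1(pred, true))
```

      In the toy case the foreground is recovered exactly, so the semantic Dice is perfect, while the merged cells lower the instance F1. The functions operate elementwise and therefore also accept 3D label volumes.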

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth. Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. Thus, while self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited set of regimes in which it can work. This is why we don’t claim it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) One of the listed contributions is "adding the SoftNCuts loss". This is not true, reference 10 already introduced that loss.

      “Our changes include a conversion to a fully 3D architecture and adding the SoftNCuts loss” - we dropped the comma and added the word “AND” to note that we added the 3D version of the SoftNCuts loss TO the 3D architecture, which 10 did not do.

      (2) "Typically, these methods use a multi-step approach" to segment 3D from 2D: this is only true for CellPose, StarDist does real 3D.

      That is why we preface with “typically” which implies not always.

      (3) "see Methods, Figure 1c, c)" is missing an opening parenthesis.

      (4) K is not introduced in equation (1) (presumably the number of classes, which seems to be 2 for all experiments considered).

      k actually was introduced just below equation 1 as the number of classes. We added the note that k was set to 2.

      (5) X is not introduced in equation (2) (presumably the spatial position of a voxel).

      Sorry for this oversight. We add that $X$ is the spatial position of the voxel.

      Reviewer #2 (Recommendations For The Authors):

      To improve the paper the weaknesses mentioned above should be addressed:

      (1) Compare to StarDist and/or CellPose on further datasets, esp. using pre-trained CellPose, to see if the claims of competitive performance with state-of-the-art approaches hold up for the case of different nucleus morphologies. The EmbedSeg datasets from Figure 2 c are well suited for this. In the current form, the claims are too broad and not supported if thorough experiments are performed on a single dataset with a very specific morphology. Note: even if the method is not fully competitive with CellPose / StarDist on these Datasets it holds merit since a segmentation method that works for small nuclei as in the mesoSPIM dataset and works self-supervised is very valuable.

      (2) Clarify how the best instance segmentation hyperparameters are found. If you indeed optimize these on the same part of the dataset used for evaluating metrics then the current experimental set-up is invalid. If this is not the case I would still rethink if this is a good way to report the results since it does not seem to reflect user experience. I found it not possible to find good hyperparameters for either of the two segmentation approaches I tried (see also next point) so I think these numbers are too optimistic.

      (3) Improve the instance segmentation part of the plugin: either provide troubleshooting for how to install pyClesperanto correctly to use the voronoi-based instance segmentation or implement it based on more standard functionality like skimage / scipy. Provide more guidance for finding good hyperparameters for the segmentation task.

      (4) Make sure image resizing is done correctly when using pre-trained CellPose models and report on this.

      (5) Report F1 Scores only (unless there is a compelling reason to also report Dice).

      (6) Address the limitations of the method in more detail.

      On a positive note: all data and code are available and easy to download/install. A minor comment: it would be very helpful to have line numbers for reviewing a revised version.

      All comments are also addressed in the public reviews.

    1. eLife Assessment

      This important study uses optogenetics in combination with single cell recordings to selectively activate sensory input channels within the olfactory bulb, providing direct evidence for activity-dependent and distance-independent enhancement of stimulus-evoked gamma oscillations via lateral interactions between input channels, most likely via granule cells. The article presents solid evidence to support the main conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Dalal and Haddad investigated how neurons in the olfactory bulb are synchronized in oscillatory rhythms at gamma frequency. Temporal coordination of action potentials fired by projection neurons can facilitate information transmission to downstream areas. In a previous paper (Dalal and Haddad 2022, https://doi.org/10.1016/j.celrep.2022.110693), the authors showed that gamma frequency synchronization of mitral/tufted cells (MTCs) in the olfactory bulb enhances the response in the piriform cortex. The present study builds on these findings and takes a closer look at how gamma synchronization is restricted to a specific subset of MTCs in the olfactory bulb. They combined odor and optogenetic stimulations in anesthetized mice with extracellular recordings.

      The main findings are that lateral synchronization of MTCs at gamma frequency is mediated by granule cells (GCs), independent of the spatial distance, and strongest for MTCs with firing rates close to 40 Hz. The authors conclude that this reveals a simple mechanism by which spatially distributed neurons can form a synchronized ensemble. In contrast to lateral synchronization, they found no evidence for the involvement of GCs in lateral inhibition of nearby MTCs.

      Strengths:

      Investigating the mechanisms of rhythmic synchronization in vivo is difficult because of experimental limitations for the readout and manipulation of neuronal populations at fast timescales. Using spatially patterned light stimulation of opsin-expressing neurons in combination with extracellular recordings is an elegant approach. The paper provides evidence for an activity-dependent synchronization of MTCs in gamma frequency that is mediated by GCs.

      Weaknesses:

      The study provides several results showing the firing of MTCs in gamma frequency range, however, direct evidence for the synchronization of MTCs in gamma frequency is missing.

    3. Reviewer #2 (Public review):

      Summary

      This study provides a detailed analysis and dissociation between two effects of activation of lateral inhibitory circuits in the olfactory bulb on ongoing single mitral/tufted cell (MTC) spiking activity, namely enhanced synchronization in the gamma frequency range or lateral inhibition of firing rate.

      The authors use a clever combination of single cell recordings, optogenetics with variable spatial stimulation of MTCs and sensory stimulation in vivo, and established mathematical methods, to describe changes in autocorrelation/synchronization of a single MTC's spiking activity upon activation of other, lateral glomerular MTC ensembles. This assay is rounded off by a gain of function experiment in which the authors enhance granule cell (GC) excitation to establish a causal relation between GC activation and enhanced synchronization of a single MTC's spiking to the gamma rhythm. They had used the same optogenetic manipulation in their previous paper Dalal & Haddad 2022, but use a smaller illumination spot here for spatially restricted activation.

      Strengths

      This study is of high interest for olfactory processing since it shows directly that interactions between only two selected active receptor channels are sufficient to enhance synchronization of single neurons to gamma in one receptor channel and thus by inference most likely in both. Such synchronization across co-active receptor channels in turn would enable upstream neurons in olfactory cortices to read out odour identity.

      The authors find that these interactions are distance-independent over many 100s of µms and thus can allow for non-topographical inhibitory action across the bulb, in contrast to the center-surround lateral inhibition known from other sensory modalities. In my view, analogies between vision and olfaction might have been overemphasized so far, since the combinatorial encoding of olfactory stimuli across the glomerular map might require different mechanisms of lateral interaction versus vision. This result is indicative of such a major difference.

      Such enhanced local synchronization to gamma in one channel was observed in a subset of activated channel pairs; in addition, the authors report another type of lateral interaction that does involve reduction of firing rates, drops off with distance and most likely is caused by a different circuit mediated by PV+ neurons (PVN). The evidence for the latter is more circumstantial since no manipulations of PVNs were performed.

      Weaknesses/Room for improvement

      This study is an impressive proof of concept that however does not yet allow for broad generalization. Thus the framing of results should be slightly more careful IMHO. While the claims in the initial version of this preprint have been toned down quite substantially, the authors do not provide direct hard evidence for synchronization across channels. Admittedly, this would be hard to achieve since it would require paired recordings from MTCs in different locations in vivo. Therefore, the term "lateral synchronization" as it is used in the abstract is still problematic, as well as the title which should rather say "can enable" instead of "enables". That being said, this study definitely provides important evidence regarding the concept of "lateral synchronization".

      The other comments and recommendations have been well taken care of in the new version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides in vivo evidence for the synchronization of projection neurons in the olfactory bulb at gamma frequency in an activity-dependent manner. This study uses optogenetics in combination with single-cell recordings to selectively activate sensory input channels within the olfactory bulb. The data are thoughtfully analyzed and presented; the evidence is solid, although some of the conclusions are only partially supported.

      We deeply thank all the reviewers for their time, effort, and insightful comments. Their revision led to a significant improvement of the paper.

      The reviewers suggested toning down our claim that we found a mechanism that synchronizes all odor-evoked MTC activities, as we do not directly show that. We concur and address this in our revised version to ensure a precise interpretation of our findings. In short, we state that we revealed a synchronization mechanism between two groups of active mitral and tufted cells (MTCs) and show that this synchronization is activity-dependent and distance-independent. This mechanism can enable the synchronization of all odor-activated MTCs.

      Another issue raised is the interpretation of the results obtained under Ketamine anesthesia. Ketamine is an antagonist of NMDA receptors, which play a crucial role in the MTC-GC reciprocal synapse. To address this, we include new analyses demonstrating that optogenetic activation of granule cells (GCs) can inhibit the recorded MTCs during baseline activity but does not substantially affect odor-evoked MTC firing rates. We show that this holds in both Ketamine-anesthetized and awake mice (Dalal & Haddad, 2022). This indicates that GC-MTC connections are functional even under Ketamine anesthesia; however, they do not exert substantial suppression on odor-evoked MTC responses. We added a paragraph to the discussion section on the potential influence of Ketamine anesthesia on GC-MTC synapses and its implications for our findings.

      Finally, we discuss several recent studies that are particularly relevant to our research and expand the discussion on our hypothesis that parvalbumin-positive cells in the olfactory bulb may serve as key mediators of the activity- and distance-dependent lateral inhibition observed in our findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dalal and Haddad investigated how neurons in the olfactory bulb are synchronized in oscillatory rhythms at gamma frequency. Temporal coordination of action potentials fired by projection neurons can facilitate information transmission to downstream areas. In a previous paper (Dalal and Haddad 2022, https://doi.org/10.1016/j.celrep.2022.110693), the authors showed that gamma frequency synchronization of mitral/tufted cells (MTCs) in the olfactory bulb enhances the response in the piriform cortex. The present study builds on these findings and takes a closer look at how gamma synchronization is restricted to a specific subset of MTCs in the olfactory bulb. They combined odor and optogenetic stimulations in anesthetized mice with extracellular recordings.

      The main findings are that lateral synchronization of MTCs at gamma frequency is mediated by granule cells (GCs), independent of the spatial distance, and strongest for MTCs with firing rates close to 40 Hz. The authors conclude that this reveals a simple mechanism by which spatially distributed neurons can form a synchronized ensemble. In contrast to lateral synchronization, they found no evidence for the involvement of GCs in lateral inhibition of nearby MTCs.

      Strengths:

      Investigating the mechanisms of rhythmic synchronization in vivo is difficult because of experimental limitations for the readout and manipulation of neuronal populations at fast timescales. Using spatially patterned light stimulation of opsin-expressing neurons in combination with extracellular recordings is a nice approach. The paper provides evidence for an activity-dependent synchronization of MTCs in gamma frequency that is mediated by GCs.

      Weaknesses:

      An important weakness of the study is the lack of direct evidence for the main conclusion - the synchronization of MTCs in gamma frequency. The data shows that paired optogenetic stimulation of MTCs in different parts of the olfactory bulb increases the rhythmicity of individual MTCs (Figure 1) and that combined odor stimulation and GC stimulation increases rhythmicity and gamma phase locking of individual MTCs (Figure 4). However, a direct comparison of the firing of different MTCs is missing. This could be addressed with extracellular recordings at two different locations in the olfactory bulb. The minimum requirement to support this conclusion would be to show that the MTCs lock to the same phase of the gamma cycle. Also, showing the evoked gamma oscillations would help to interpret the data.

      We agree with the reviewer that direct evidence of mutual synchronization between multiple recorded MTCs has not been shown in our study. Our study only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We based this on previous studies that tested MTC spike synchronization. Specifically, Schoppa 2006 reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range in vitro. Kashiwadani et al., 1999 and Doucette et al., 2011 showed that odor-evoked MTC spike times are synchronized in vivo. Given these studies, we asked what underlying mechanism can support such a synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.

      Another weakness is that all experiments are performed under anesthesia with ketamine/medetomidine. Ketamine is an antagonist of NMDA receptors and NMDA receptors are critically involved in the interactions of MTCs and GCs at the reciprocal synapses (see for example Lage-Rupprecht et al. 2020, https://doi.org/10.7554/eLife.63737; Egger and Kuner 2021, https://doi.org/10.1007/s00441-020-03402-7). This should be considered for the interpretation of the presented data.

      This issue has been raised by reviewers #1 and #2. We think, as also reviewer #2 acknowledged, that this issue does not compromise our results. However, to address this important point we added the below section to the Discussion:

      “Our experiments were performed under Ketamine anesthesia, an NMDA receptor antagonist that affects the reciprocal dendro-dendritic synapses between MTCs and GCs (Egger and Kuner, 2021; Lage-Rupprecht et al., 2020). Consistent with that, recent studies reported lower excitability of GC activity under anesthesia (Cazakoff et al., 2014; Kato et al., 2012). This raises the concern that our result might not be valid in the awake state. We argue that this is unlikely. First, Fukunaga et al. (2014) reported that GCs baseline activity in anesthetized and awake mice is similar, suggesting that MTC-GC synapses are functioning. Second, we show that light activation of GCL neurons strongly inhibits the MTC baseline activity (Figure 5) and increases MTC odor-evoked spike-LFP coupling in the gamma range (Figure 4). These experiments validate that GCL neurons can exert inhibition over MTCs in our experimental setup. Third, we have shown that light-activating all accessible GCL neurons has a minor effect on the MTC odor-evoked firing rates in an awake state (Dalal and Haddad, 2022), corroborating the finding that GCL neurons are unlikely to provide strong suppression to MTCs. Fourth, and most importantly, we showed that optogenetic stimulation of MTCs entrains other MTC spike times, which is achieved via the GCL neurons. This suggests that the lack of lateral suppression following MTC or GCL neuron opto-activation is not due to MTC-GC synapse blockage. That said, we cannot exclude the unlikely possibility that NMDA receptor blockage under anesthesia impairs MTC-to-MTC suppressive interactions but not the MTC-to-MTC mediated spike entrainment.”

      Figure 1A and D from Dalal & Haddad 2022 show the effect of GCL neurons opto-activation during odor stimulation on MTC firing rates in awake and anesthetized mice.

      Furthermore, the direct effect of optogenetic stimulation on GCs activity is not shown. This is particularly important because they use Gad2-cre mice with virus injection in the olfactory bulb and expression might not be restricted to granule cells and might not target all subtypes of granule cells (Wachowiak et al., 2013, https://doi.org/10.1523/JNEUROSCI.4824-12.2013). This should be considered for the interpretation of the data, particularly for the absence of an effect of GC stimulation on lateral inhibition.

      In this study we used Gad2-cre mice and the protocol for viral transfection of GCL neurons reported in Fukunaga et al., 2014. They reported that ‘more than 90% of Cre-expressing neurons in the GCL also expressed fluorescently tagged ArchT’. Consistently, when Fukunaga et al. expressed ChR2 in the GCL using the same viral infection as we used, they reported that “Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)”. This study shows clearly that this infection protocol is robust. Moreover, in new panels we added to the manuscript (Figure 5a-b), we show that optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions but not MTC odor-evoked responses. This is consistent with the reports by Fukunaga et al. and indicates that GCL neurons are functional, as they can suppress MTC baseline activity.

      Finally, since virus injection to the granule cell layer can target other GCL neuron types, we now refer to ‘GCL neurons’ in the text (as was done in Gschwend et al., 2015) instead of ‘GCs’. We replaced the image in Figure 4A to show that the expression of ChR2 is restricted to GCL neurons. That said, it is still possible that our protocol did not infect all GC subtypes. To address this, we added this line to the Discussion: “We also note that our viral transfection protocol in Gad2-Cre mice might not transfect all subtypes of GCs.”

      Several conclusions are only supported by data from example neurons. The paper would benefit from a more detailed description of the analysis and the display of some additional analysis at the population level:

      - What were the criteria based on which the spots for light-activation were chosen from the receptive field map?

      To make this point clearer, we extended the explanation in the Methods on the selection criteria: “Spots were selected either randomly or manually. In the manual selection case, we selected spots that caused either significant or mild but insignificant inhibitory effect on the recorded MTC (e.g., local cold spots in the receptive-field map; see example in Figure 2a of example spots that were selected manually)”. We also added a reference in the text to the Methods: “see Methods for spots selection criteria”.

      - The absence of an effect on firing rate for paired stimulations is only shown for one example (Figure 1c). A quantification of the population level would be interesting.

      - Only one example neuron is shown to support the conclusion that "two different neural circuits mediate suppression and entrainment" in Figure 3. A population analysis would provide more evidence.

      Thank you very much for these comments. We added a population analysis in Figure 3. This analysis shows a dissociation between firing rate suppression and the entrainment groups (Figure 3c-d). This suggests that two different circuits mediate suppression and entrainment.

      - Only one example neuron is shown to illustrate the effect of GC stimulation on gamma rhythmicity of MTCs in Figures 4 f,g.

      In this figure, we show that the activation of subsets of GCL neurons elevated odor-evoked spike synchronization to the gamma rhythm. We thought it would be beneficial to demonstrate the change in spike entrainment following GCL neurons optogenetic activation regardless of the ongoing OB gamma oscillations, using the method presented by Fukunaga et al., 2014. However, this analysis requires that the neuron has a relatively high firing rate. As we describe in the figure legend of this panel, this neuron is probably a tufted cell based on the findings shown in Fukunaga et al., 2014 and Burton & Urban, 2021. Most of our recorded cells had a lower firing rate, which coincides with our typical recording depth, targeting mitral cells rather than tufted cells (~400µm deep). Since this analysis is shown only over a single neuron, we moved it to Supplementary Figure 4.

      - In Figure 5 and the corresponding text, "proximal" and "distal" GC activation are not clearly defined.

      We agree. Initially, we used these terms to refer to GC columns that include the recorded MTC (proximal) and columns that are away from it (distal). We decided that instead of using a coarse division, we would show the whole range of distances. We updated the analysis in Figure 5d to show the effect of GC optogenetic activation on MTC odor-evoked responses as a function of the distance from the recorded MTC.

      Reviewer #2 (Public Review):

      Summary

      This study provides a detailed analysis and dissociation between two effects of activation of lateral inhibitory circuits in the olfactory bulb on ongoing single mitral/tufted cell (MTC) spiking activity, namely enhanced synchronization in the gamma frequency range or lateral inhibition of firing rate.

      The authors use a clever combination of single-cell recordings, optogenetics with variable spatial stimulation of MTCs and sensory stimulation in vivo, and established mathematical methods to describe changes in autocorrelation/synchronization of a single MTC's spiking activity upon activation of lateral glomerular MTC ensembles. This assay is rounded off by a gain-of-function experiment in which the authors enhance granule cell (GC) excitation to establish a causal relation between GC activation and enhanced synchronization to gamma (they had used this manipulation in their previous paper Dalal & Haddad 2022, but use a smaller illumination spot here for spatially restricted activation).

      Strengths

      This study is of high interest for olfactory processing - since it shows directly that interactions between only two selected active receptor channels are sufficient to enhance the synchronization of single neurons to gamma in one channel (and thus by inference most likely in both). These interactions are distance-independent over many 100s of µms and thus can allow for non-topographical inhibitory action across the bulb, in contrast to the center-surround lateral inhibition known from other sensory modalities.

      In my view, parallels between vision and olfaction might have been overemphasized so far, since the combinatorial encoding of olfactory stimuli across the glomerular map might require different mechanisms of lateral interaction versus vision. This result is indicative of such a major difference.

      Such enhanced local synchronization was observed in a subset of activated channel pairs; in addition, the authors report another type of lateral interaction that does involve the reduction of firing rates, drops off with distance and most likely is caused by a different circuit-mediated by PV+ neurons (PVN; the evidence for which is circumstantial).

      Weaknesses/Room for improvement

      Thus this study is an impressive proof of concept that however does not yet allow for broad generalization. Therefore the framing of results should be slightly more careful in my opinion.

      We agree with the reviewer. We copy here our response to reviewer #1, who raised the same issue.

      We agree that direct evidence of mutual synchronization between multiple recorded MTCs has not been shown in our study. Our study only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We relied on previous studies that tested MTC spike synchronization. Specifically, Schoppa 2006 reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range, in-vitro. Kashiwadani et al., 1999 and Doucette et al. showed that odor-evoked MTC spike times are synchronized, in-vivo. Given these studies, we asked what underlying mechanism can support such synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.

      Along this line, the conclusions regarding two different circuits underlying lateral inhibition vs enhanced synchronization are not quite justified by the data, e.g.

      (1) The authors mention that their granule cell stimulation results in a local cold spot (l. 527 ff) - how can they then be said to be not involved in the inhibition of firing rate (bullet point in Highlights)? Please elaborate further. In l.406 they also state that GCs can inhibit MTCs under certain conditions. The argument that this stimulation is not physiological makes sense, but still does not rule out anything. You might want to cite Aghvami et al 2022 on the very small amplitude of GC-mediated IPSPs, also McIntyre and Cleland 2015.

      We apologize for the lack of clarity. We reported that we found a local cold spot in the context of an additional experiment not presented in the manuscript and only described in the Methods section. Following the revision, we decided to add the analysis of this experiment to Figure 5. This experiment validated that optogenetic activation of GCs is potent and can affect the recorded MTC firing rates. This is particularly important as we performed all experiments under Ketamine anesthesia, which is an NMDA receptor antagonist. In this experiment, we recorded the activity of MTCs at baseline conditions (without odor presentation) under optogenetic activation of GCs. We divided the OB surface into a grid and optogenetically activated GC columns in a random order, one light spot in each trial, using light patches of size 330um2. We used the same light intensity as in the optogenetic GC activation during odor stimulation (reported in Figures 4-5). We show that the recorded MTC was strongly inhibited by GC light activation, mostly when activating GCs in its vicinity (within its column, i.e., local cold spot). This experiment validates that in our experimental setup, GCs can exert inhibition over MTCs at baseline conditions.

      (2) Even from the shown data, it appears that laterally increased synchronization might co-occur with lateral suppression (See also comment on Figures 1d,e and Figure S1c)

      We kindly note that the panels you referred to do not quantify the firing rate but the rhythmicity of MTC light-evoked responses. We should have explained these graphs better in the main text and not only in the Methods section. We added a panel to Supplementary Figure 1, which describes our analysis: In each of these examples, we performed a time-frequency Wavelet analysis over the average response of the neurons across trials (computed using a sliding Gaussian with a std of 2 ms). The results of the Wavelet analysis allowed us to visually capture the enhanced spike alignment across trials under paired activation as a function of the stimulus duration (as, for example, in Figure 1c, middle panel). The response amplitude to light stimulation did not change in this example (shown in Figure 1c lower panel), and the spike entrainment increased following paired activation of MTCs.
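
      For readers who want to reproduce this kind of analysis, the two steps described above (Gaussian smoothing of the trial-averaged response, followed by a time-frequency decomposition) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the 1 kHz sampling rate, the complex Morlet wavelet and the choice of 5 cycles per wavelet are all assumptions. Only the 2 ms Gaussian standard deviation comes from the response.

```python
import numpy as np

def smoothed_rate(spike_times_per_trial, duration_s, fs=1000.0, sigma_s=0.002):
    """Trial-averaged firing rate (Hz), smoothed with a Gaussian kernel.

    sigma_s=0.002 matches the 2 ms std quoted in the response; fs is assumed.
    """
    n_bins = int(round(duration_s * fs))
    counts = np.zeros(n_bins)
    for spikes in spike_times_per_trial:
        idx = np.floor(np.asarray(spikes) * fs).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]
        np.add.at(counts, idx, 1.0)          # handles repeated bins correctly
    rate = counts / len(spike_times_per_trial) * fs
    half = int(np.ceil(4 * sigma_s * fs))     # kernel spans about +/- 4 sigma
    t = np.arange(-half, half + 1) / fs
    kernel = np.exp(-0.5 * (t / sigma_s) ** 2)
    kernel /= kernel.sum()                    # unit area keeps the mean rate
    return np.convolve(rate, kernel, mode="same")

def morlet_power(signal, freqs_hz, fs=1000.0, n_cycles=5.0):
    """Wavelet power (frequencies x time) using complex Morlet wavelets.

    The wavelet family and n_cycles are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    power = np.empty((len(freqs_hz), signal.size))
    for i, f in enumerate(freqs_hz):
        sigma_t = n_cycles / (2.0 * np.pi * f)
        half = int(np.ceil(4 * sigma_t * fs))
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-0.5 * (t / sigma_t) ** 2)
        wavelet /= np.abs(wavelet).sum()
        full = np.convolve(signal, wavelet, mode="full")
        # Trim to the centre so the output always has the signal's length.
        power[i] = np.abs(full[half:half + signal.size]) ** 2
    return power
```

      In such a sketch, stronger spike alignment across trials appears as higher power at the alignment frequency in the trial-averaged response, even when the mean firing rate is unchanged.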

      To address the relations between lateral suppression and synchronization at the population level, we added additional analyses to Figure 3c-d.

      (3) There are no manipulations of PVN activity in this study, thus there is no direct evidence for the substrate of the second circuit.

      We completely agree with the reviewer. Using the current data, we can only claim that optogenetic activation of GCL neurons did not affect the MTC odor-evoked response. This finding is consistent with the loss-of-function experiment reported by Fukunaga et al., 2014, where GC suppression did not change odor-evoked responses in both anesthetized and awake mice. Therefore, we speculated that PVN might be a candidate OB interneuron to mediate lateral inhibition between MTCs. This hypothesis is based on their higher likelihood of interconnecting two MTCs compared with GCs (Burton, 2017). We elaborated on this in the discussion and made sure it is clearly stated as a hypothesis.

      (4) The manipulation of GC activity was performed in a transgenic line with viral transfection, which might result in a lower permeation of the population compared to the line used for optogenetic stimulation of MTCs.

      We used a previously validated protocol for optogenetic manipulation of GCs from Fukunaga et al., 2014 in order to minimize this caveat. As we cited previously from their paper, following the expression of ChR2 in the GCL, ‘Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)’. These results are consistent with the additional experiment we added to the manuscript, where optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions (without odor presentation). The high similarity between these two reports, in which, in the case of Fukunaga et al., GC activation was directly measured, suggests that lack of opsin expression or insufficient light intensity is unlikely to explain the lack of GCL neuron activation effect on lateral inhibition. Moreover, GCL neurons' optogenetic activation during odor stimulation increased MTC spike-LFP coupling in the gamma range. Therefore, the dissociation between the effects of GCL neurons on spike entrainment and lateral inhibition suggests that the lack of lateral inhibition following GC activation is unlikely due to low expression rates.
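
      As an illustration of what a spike-LFP coupling measure in the gamma range can look like, the sketch below computes a phase-locking value: the length of the mean unit vector of LFP phases sampled at spike times. It is not the authors' pipeline. The 40-80 Hz band, the FFT-based band-limited analytic signal and the function names are assumptions made for this example.

```python
import numpy as np

def gamma_phase(lfp, fs, band=(40.0, 80.0)):
    """Instantaneous phase of the LFP restricted to `band` (Hz).

    Builds a band-limited analytic signal by keeping only the positive
    frequencies inside the band. The band limits are an assumption.
    """
    lfp = np.asarray(lfp, dtype=float)
    spectrum = np.fft.fft(lfp)
    freqs = np.fft.fftfreq(lfp.size, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0))
    return np.angle(analytic)

def phase_locking_value(spike_times, lfp, fs, band=(40.0, 80.0)):
    """PLV in [0, 1]: 1 means every spike falls on the same gamma phase."""
    phase = gamma_phase(lfp, fs, band)
    idx = np.round(np.asarray(spike_times) * fs).astype(int)
    idx = idx[(idx >= 0) & (idx < phase.size)]
    return np.abs(np.mean(np.exp(1j * phase[idx])))
```

      A measure of this kind is independent of firing rate in expectation only for reasonably large spike counts, so comparisons between conditions usually match or control the number of spikes.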

      In some instances, the authors tend to cite older literature - which was not yet aware of the prominent contribution of EPL neurons including PVN to recurrent and lateral inhibition of MT cells - as if roles that then were ascribed to granule cells for lack of better knowledge can still be unequivocally linked to granule cells now. For example, they should discuss Arevian et al (2006), Galan et al 2006, Giridhar et al., Yokoi et al. 1995, etc in the light of PVN action.

      Therefore it is also not quite justified to state that their result regarding the role of GCs specifically for synchronization, not suppression, is "in contrast to the field" (e.g. l.70 f.,, l.365, l. 400 ff).

      We changed several sentences in the discussion and introduction to explain that previous studies attributed lateral suppression to GC because they were not aware of the prominent contribution of EPL neurons as has been demonstrated by more recent studies (Burton 2024, Huang et al., 2016,  Kato et al., 2013, and more).

      We also toned down the statement that these findings are in contrast to the field. Instead, we state that our findings support the claim that GCs are not involved in affecting MTC odor-evoked firing rate.

      Why did the authors choose to use the term "lateral suppression", often interchangeably with lateral inhibition? If this term is intended to specifically reflect reductions of firing rates, it might be useful to clearly define it at first use (and cite earlier literature on it) and then use it consistently throughout.

      We agree and have changed the manuscript accordingly. We added the following in the introduction: “We use this phrase here to refer to a process that suppresses the firing rate of the post-synaptic neuron.”

      A discussion of anesthesia effects is missing - e.g. GC activity is known to be reportedly stronger in awake mice (Kato et al). This is not a contentious point at all since the authors themselves show that additional excitation of GCs enhances synchrony, but it should be mentioned.

      We completely agree and added a paragraph to the Discussion in this regard. Please see also the response to reviewer #1, who made a similar suggestion.

      Some citations should be added, in particular relevant recent preprints - e.g. Peace et al. BioRxiv 2024, Burton et al. BioRxiv 2024 and the direct evidence for a glutamate-dependent release of GABA from GCs (Lage-Rupprecht et al. 2020).

      We thank the reviewer for pointing out these relevant recent manuscripts. We have now cited Peace et al. when discussing the spatial range of inhibition and gamma synchronization in the OB, Lage-Rupprecht et al. in the context of the involvement of NMDA receptors in the MTC-GC reciprocal synapse, and Burton et al. when discussing the potential function of PV neurons.

      The introduction on the role of gamma oscillations in sensory systems (in particular vision) could be more elaborated.

      In our previous paper (Dalal & Haddad 2022) we included an elaborate introduction on the role of gamma oscillations in sensory processing, since that study focused on the effect of gamma synchronization on information transmission between brain regions. In the current study we looked at gamma rhythms as a mechanism that can facilitate ensemble synchronization.

      Reviewer #3 (Public Review):

      Summary:

      This study by Dalal and Haddad analyzes two facets of cooperative recruitment of M/TCs as discerned through direct, ChR2-mediated spot stimulations:

      (1) mutual inhibition and

      (2) entrainment of action potential timing within the gamma frequency range.

      This investigation is conducted by contrasting the evoked activity elicited by a "central" stimulus spot, which induces an excitatory response alone, with that elicited when paired with stimulations of surrounding areas. Additionally, the effect of Gad2-expressing granule cells is examined.

      Based on the observed distance dependence and the impact of GC stimulations, the authors infer that mutual inhibition and gamma entrainment are mediated by distinct mechanisms.

      Strengths:

      The results presented in this study offer a nice in vivo validation of the significant in vitro findings previously reported by Arevian, Kapoor, and Urban in 2008. Additionally, the distance-dependent analysis provides some mechanistic insights.

      We thank the reviewer for his comments. Indeed, the current study provides in-vivo replication of the results reported in Arevian et al., 2008 in-vitro, and adds further insights by showing that lateral inhibition is distance-dependent. However, this is not the main focus of the current study. Following the findings reported by Dalal & Haddad 2022, the motivation for this study was to test the mechanism that allows co-activated MTCs to entrain their spike timing. By light-activating pairs of MTCs at varying distances, we detected a subset of pairs in which paired light-activation evoked activity-dependent lateral inhibition, as was reported by Arevian et al., 2008. Moreover, we think it is highly important to know that a previous result in an in-vitro study is fully reproducible in-vivo.

      Weaknesses:

      The results largely reproduce previously reported findings, including those from the authors' own work, such as Dalal and Haddad (2022), where a key highlight was "Modulating GC activities dissociates MTCs odor-evoked gamma synchrony from firing rates." Some interpretations, particularly the claim regarding the distance independence of the entrainment effect, may be considered over-interpretations.

      We kindly disagree with the reviewer. We think the current study extends rather than reproduces the findings reported in Dalal & Haddad 2022. The 2022 study mainly focused on the effect of OB gamma synchronization on odor representation in the Piriform cortex. We bidirectionally modulated the level of MTC gamma synchronization and found that it had bidirectional effects on odor representation in one of their downstream targets, the anterior piriform cortex. The current study, however, focuses on the question of how spatially distributed odor-activated MTCs can synchronize their spiking activity. Our current main finding is that paired activation of MTCs can enhance the spike entrainment of the recorded MTC in an activity-dependent and spatially independent manner. We suggest that this mechanism is mediated by GCL neurons.

      The reviewer did not explain why he/she thinks that the distance independence of the entrainment effects is an over-interpretation. However, to make our claim more precise, we added the following sentence to the corresponding results section: “Furthermore, within the distance range that we were able to measure, the increased phase-locking did not significantly correlate with the distance from the MTC.”
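
      A statement that a change in phase-locking "did not significantly correlate" with distance can be checked in several ways; the response does not say which test was used. One hedged possibility is a Pearson correlation with a permutation-based p-value, sketched below. The function name, the number of permutations and the two-sided test are assumptions for illustration only.

```python
import numpy as np

def permutation_correlation(x, y, n_perm=2000, seed=0):
    """Pearson r between x and y with a two-sided permutation p-value.

    x could be the distance of the paired spot from the recorded cell and
    y the change in entrainment; both names are hypothetical here.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Null distribution: shuffle y to break any pairing with x.
    r_null = np.array(
        [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)]
    )
    # The +1 terms keep the estimated p-value strictly above zero.
    p_value = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p_value
```

      A non-significant result from such a test supports only the narrower claim made in the added sentence: no detectable linear dependence within the measured distance range.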

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Line 17f: "This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, ..."

      This sentence implies a direct comparison of the simultaneously recorded firing of MTCs but I could not find evidence for this in this manuscript. I would suggest to change this.

      We thank the reviewer. The sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”.

      (2) Line 43f: A brief description of what glomeruli are could help to avoid confusion for readers less familiar with the OB. The phrasing of "activated glomeruli" and "each glomerulus innervates" are somewhat misleading given that they do not contain the cell bodies of the projection neurons.

      We edited this part of the introduction so it briefly describes what glomeruli are: ‘Olfactory processing starts with the activity of odorant-activated olfactory sensory neurons. The axons of these sensory neurons terminate in one or two anatomical structures called glomeruli located on the surface of the olfactory bulb (OB). Each glomerulus is innervated by several mitral and tufted cells (MTCs), which then project the odor information to several cortical regions.’

      (3) Line 78ff: The text sounds as if glomeruli are activated by the light stimulation but ChR2 is expressed in MTCs, the postsynaptic component of the glomeruli. It would be clearer to refer to the stimulation as light activation of MTCs.

      We corrected this sentence to: ‘We first mapped each recorded cell's receptive field, i.e., the set of MTCs on the dorsal OB that affect its firing rates when they are light-stimulated.’

      (4) Line 90: It would be great to mention somewhere in this paragraph that you are analyzing single-unit data sorted from extracellular recordings with tungsten electrodes.

      We added that to the description of the experimental setup: ‘To investigate how MTCs interact, we expressed the light-gated channel rhodopsin (ChR2) exclusively in MTCs by crossing the Tbet-Cre and Ai32 mouse lines (Grobman et al., 2018; Haddad et al., 2013), and extracellularly recorded the spiking activity of MTCs in anesthetized mice during optogenetic stimulation using tungsten electrodes.’

      (5) Line 97: The term "delta entrainment" could be easily confused with the entrainment of MTCs to respiration in the delta frequency band. Maybe better to use a different term or stick to "change in entrainment" also used in the text.

      We completely agree. The term was changed to “change in entrainment” throughout the manuscript and figures.

      (6) Line 121f: "Light stimulation did not affect ..." . Should this be "Paired light stimulation did not affect ..."?

      Corrected, thank you.

      (7) Supplementary Figure 1a: The example is not very convincing. It looks a bit like a rhythmic bursting neuron mildly depending on the stimulation.

      This panel serves to present our light stimulation method. The potency of the light stimulation protocol can be seen in the receptive field maps.

      (8) Supplementary Figure 1c: Why is there no confidence interval for 'Paired'?

      This panel shows the power spectrum density of the average neuron response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (9) Line 166f: "... across any light intensities". Maybe better "... for the four light intensities tested"?

      We agree and changed the text accordingly.

      (10) Figure 2f: It would be more intuitive to have the x-axis in the same orientation as in 2e.

      Corrected, thank you.

      (11) Figure 4a: The image in this panel is identical to Figure 1a in Dalal and Haddad 2022 in Cell reports just with a different intensity. The reuse of items and data from previous publications should be indicated somewhere but I could not find it.

      We apologize for this replication. We replaced it with a photo showing a larger portion of the OB, demonstrating the restricted viral expression within the GCL.

      (12) Line 408ff: A brief explanation for the hypothesis of EPL parvalbumin interneurons as the ones mediating lateral inhibition would be great.

      We agree. We added the following paragraph to the discussion section: “We speculate that MTC-to-MTC suppression is mediated by EPL neurons, most likely the Parvalbumin neuron (PV). This hypothesis is based on their activity and connectivity properties with MTCs (Burton, 2017; Kato et al., 2013; Miyamichi et al., 2013; Burton, 2024). More studies are required to reveal how PV neurons affect MTC activity.”

      (13) Line 425ff: You show that only activity of high firing rate neurons is suppressed by lateral inhibition, whereas "low and noise MTC responses" are not affected. Wouldn't this rather support the conclusion that lateral inhibition prevents excess activity from the OB?

      We found that lateral inhibition was mainly effective when the postsynaptic neurons fired at ~30-80Hz in response to light stimulation. That is, it affects MTC firing at this “intermediate” rate, and to a lesser extent when the MTC has low or very high firing rates. To prevent excess activity, one would expect a mechanism that affects high firing rates more than medium ones. This was demonstrated in Kato 2013 for PV-MTC inhibition.

      (14) Line 387: "..., only ~20% of the tested MTC pairs exhibited significant lateral inhibition." This is higher than the 16% of neurons you reported to have lateral entrainment (line 100). Why do you consider the lateral inhibition as 'sparse' but the lateral entrainment as relevant?

      We apologize for this unclear statement. The papers we cited in this regard (Fantana et al., 2008; Lehmann et al., 2016; Pressler and Strowbridge, 2017) have tested lateral inhibition when the recorded MTC was not active, which resulted in a sparse MTC-MTC inhibition. We validated and replicated these findings in our setup, by systematically projecting light spots over the dorsal OB without simultaneous activation of the recorded MTC and found similar rates of largely scarce inhibition (data not shown). In this study, using spike-triggered average light stimulation protocol and paired activation of MTCs, we found higher rates of lateral inhibition, consistent with the reports by Isaacson and Strowbridge, 1998, Urban and Sakmann, 2002. We changed this paragraph to the following:

      “We found that only ~20% of the tested MTC pairs exhibited significant lateral suppression. This rate is consistent with previous in-vitro studies that found lateral suppression between 10-20% of heterotypic MTC pairs (Isaacson and Strowbridge, 1998; Urban and Sakmann, 2002), and is higher compared to a case where the recorded MTC is not active (Lehmann et al., 2016).”

      Reviewer #2 (Recommendations For The Authors):

      Figure-by-figure comments:

      (1) Figures 1d,e: both these examples seem to show that the firing rate is decreased in the paired condition? From maxima at 110 to 58 Hz in d and 100 to 48 Hz in e. Please explain (see also comment on Figure S1c).

      Please see the response in the Public Review section, reviewer #2, bullet (2). We also added a panel to Supplementary Figure 1 to better explain this.

      (2) Figure 1 f The means and SEMs are hard to see. Why is the SEM bar plotted horizontally? Since this is a major finding of the paper, will there be a table provided that shows the distribution of ∆ shifts across animals?

      We apologize for the mistake. The horizontal bar was the marking of the mean. Since the SEM is small, we corrected the graph for better visualization of the SEM.

      (3) Figure 1g Showing the running average of data where there is almost none or no data points (beyond 50 Hz) seems not ideal. Is the enhanced entrainment around 40Hz significant? Perhaps the moving average should be replaced by binned data with indicated n?

      We prefer to show all data points instead of binning the data so the reader can see it all. We agree that such a wide range on the x-axis is unnecessary. We shortened this graph to include only the firing rate range spanned by the data points.

      (4) Figure 1h Impressive result!

      Thank you!

      (5) Figure S1a: since the authors show the respiratory pattern here and there obviously was no alignment of light stimulation with inspiration, was there any correlation between the respiratory phase and efficiency of light stimulation with respect to lateral interactions?

      This is an interesting idea. In Haddad et al., 2013, figure 7, the authors performed a similar analysis and showed that optogenetic activation of MTCs had a more pronounced effect on firing rate in the respiration phases in which the neuron fired less. However, we haven’t quantified the impact of lateral interactions with respect to the respiration phase. That being said, the data will be publicly available to test this question.

      (6) Figure S1c: Here the shift towards a lower firing rate seems to be obvious (see comment in Figures 1 d and e). Please also show the plot for Figure 1e.

      This panel shows the power spectrum density of the average neuron's response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (7) Figure 2b: show the same plot also for pair 2? Why is it stated that there is no lateral suppression for lateral stimulation alone, if the MTC did not spike spontaneously in the first place and thus inhibition cannot be demonstrated?

      We use Figure 2b to demonstrate the effect of lateral inhibition, and in Figure 2c we detail the responses under each light intensity for both pairs. We think that showing the mean and SEM for one example is enough to give a sense of the effect, as in Figure 2c we show the average response across time together with a significance assessment for each pair (panels without a p-value have no significant difference between the conditions).

      However, we agree with the comment on this specific example and have therefore deleted this sentence. At the population level, we found no inhibition when activating the lateral spots, regardless of their firing rates (shown in Supplementary Figure 2a).

      (8) Figure 2d: why is there no distance-dependent color coding for the significant data points? Or, alternatively, since the distance plot is shown in 2e, perhaps drop this information altogether? Again, the moving average is problematic.

      Distance-dependent color coding is applied to all data points in this panel. Significant data points are shown in full circles and have distance-dependent color coding, which is mainly restricted to the lower part of the distance scale (cold colors).

      We used a moving average to relate to the similar result reported in Arevian 2008. In Figure 2e, the actual distance for each data point is indicated on the x-axis.

      (9) Figure 2f: the diagonal averaging method seems to neglect a lot of the data in Figure S2b, why not use radial coordinates for averaging?

      Thank you for the great suggestion. We now use radial coordinates for the averaging, and the results are more robust and better summarize the entire data.

      (10) Figure 3: These are interesting observations, but are there cumulative data on such types of pairs? Please describe and show, otherwise this can only be a supplemental observation. Regarding 3b was it always the lower light intensity that resulted in suppression and the higher in sync? Since Burton et al. 2024 have just shown that PVNs require very little input to fire!

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean of firing rate changes of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from the mean of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      (11) Figure 4: This Figure and the corresponding section should be entitled "Additional GC activation... ", otherwise it might be confusing for the reader. A loss of function manipulation (local GC silencing) would be also great to have! You did this in the previous paper, why not here? Raw LFP data are not shown. In Figure 4e the reported odor response firing rate ranges only up to 40Hz, but the example in g shows a much higher frequency. Is the maximum in 4e significant? (same issue as for Figure 1g).

      We changed the phrase to ‘optogenetic GCL neurons activation’. Unfortunately, we haven’t performed experiments where we suppress GC columns. In the previous paper, we suppressed the activity of all accessible GCs, which resulted in reduced spike synchronization to the OB gamma oscillations. Silencing only the GC column is, we think, unlikely to have a substantial effect, especially if the GCs have low activity (but this needs to be tested). Furthermore, we added examples of raw LFP data for odor stimulation and odor combined with GCL column activation (see Supplementary Figure 4a).

      The instantaneous firing rate is high (~80Hz); however, the firing rate values we report in Figure 4e are averages within a window of 2 seconds (the odor duration is 1.5 seconds and we extend the window to account for responses with a late return to baseline). The average firing rate of this example neuron in this window was 28Hz.
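
      The distinction between a peak instantaneous rate and a window-averaged rate can be illustrated with a minimal sketch. The spike train below is hypothetical (an 80 Hz burst followed by sparse firing) and is not the neuron shown in Figure 4g; the function names and the choice of the shortest inter-spike interval as the "instantaneous" measure are assumptions made for illustration.

```python
def mean_rate(spike_times, t_start, t_stop):
    """Average rate (Hz): spike count in [t_start, t_stop) divided by window length."""
    n = sum(1 for t in spike_times if t_start <= t < t_stop)
    return n / (t_stop - t_start)


def peak_instantaneous_rate(spike_times):
    """Peak instantaneous rate (Hz): reciprocal of the shortest inter-spike interval."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return 1.0 / min(isis)


# Hypothetical neuron: 40 spikes at 80 Hz, then 12 spikes at 8 Hz (all within 2 s).
burst = [i / 80.0 for i in range(40)]
tail = [0.5 + i / 8.0 for i in range(12)]
spikes = burst + tail

window_rate = mean_rate(spikes, 0.0, 2.0)      # averaged over the 2 s window
peak_rate = peak_instantaneous_rate(spikes)    # dominated by the burst
```

      In this toy case the peak rate is about 80 Hz while the 2-second average is 26 Hz, so both numbers can describe the same response without contradiction.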

      (12) Fig 5: what does "proximal" mean - does this mean stimulation of the GCs below the recorded MTC, that might actually belong to the same glomerular unit?

      Yes, by “proximal” we mean the activation of the GC in the column of the recorded MTC. However, we decided that instead of coarsely dividing the data into proximal and distal optogenetic activation of GCL neurons, we will show the data continuously to show that GC had no significant effect on MTC odor-evoked firing rates regardless of their location (Figure 5d).

      A comment on the title:

      Please tone it down: "Ensemble synchronization" is a hypothesis at this point, not directly shown in the paper. Also, the paper does not show lateral interactions between odor-activated neurons.

      We agree and have rephrased it to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”

      (1) Figure 1a, 2a scale bar missing.

      Corrected, thank you.

      (2) Figure 1 c is the "rebound" in the lateral stim trace (green) real or not significant?

      The activity during this rebound is not significantly different than the baseline activity before light stimulation.

      (3) Figure 2b legend: "lateral alone" instead of lateral?

      We appreciate the suggestion. For simplicity, we will keep it as “lateral”.

      (4) Figure 2c: some of the data plots seem to be breaking off, e.g. the blue line in the bottom third one.

      This line breaking is due to the lack of spikes in this period. The PSTHs used in all analyses result from the convolution of the spike train with a Gaussian window with a standard deviation of 50ms.
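
      A minimal sketch of the smoothing described above, assuming 1 ms bins and a Gaussian kernel truncated at 4 standard deviations (both are assumptions; only the 50 ms standard deviation comes from the response). Function names are hypothetical. It also shows why a trace can vanish: wherever no spike falls within the kernel's support, the smoothed rate is exactly zero.

```python
import math


def gaussian_kernel(sigma_s, dt_s, n_sigma=4):
    """Discrete Gaussian kernel normalised to sum to 1 (truncation is an assumption)."""
    half = int(round(n_sigma * sigma_s / dt_s))
    k = [math.exp(-0.5 * ((i * dt_s) / sigma_s) ** 2) for i in range(-half, half + 1)]
    total = sum(k)
    return [v / total for v in k]


def smoothed_psth(trials, t_start, t_stop, dt_s=0.001, sigma_s=0.050):
    """Trial-averaged firing rate (Hz) convolved with a Gaussian window."""
    n_bins = int(round((t_stop - t_start) / dt_s))
    counts = [0.0] * n_bins
    for spike_times in trials:
        for t in spike_times:
            idx = math.floor((t - t_start) / dt_s)
            if 0 <= idx < n_bins:
                counts[idx] += 1.0
    # Convert counts per bin to a rate in Hz, averaged over trials.
    rate = [c / (len(trials) * dt_s) for c in counts]
    kernel = gaussian_kernel(sigma_s, dt_s)
    half = len(kernel) // 2
    out = [0.0] * n_bins
    for i in range(n_bins):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < n_bins:   # bins outside the window contribute nothing
                acc += w * rate[k]
        out[i] = acc
    return out


# One hypothetical trial with a single spike in the middle of a 1 s window.
psth = smoothed_psth([[0.5005]], 0.0, 1.0)
```

      Because the kernel is normalised, the area under the smoothed trace equals the mean spike count per trial, and bins farther than 4 standard deviations from any spike stay at zero.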

      (5) Figure 2f: Why is the x axis flipped vs 2d,e?

      This panel was mistakenly plotted that way, and was corrected.

      Comments on the text:

      Abstract - we had indicated suggestions by strike-throughs and color which are lost in the online submission system, please compare with your original text:

      Information in the brain is represented by the activity of neuronal ensembles. These ensembles are adaptive and dynamic, formed and truncated based on the animal`s experience. One mechanism by which spatially distributed neurons form an ensemble is via synchronization of their spiking activity in response to a sensory event. In the olfactory bulb, odor stimulation evokes rhythmic gamma activity in spatially distributed mitral and tufted cells (MTCs). This rhythmic activity is thought to enhance the relay of odor information to the downstream olfactory targets. However, how only specifically the odor-activated MTCs are synchronized is unknown. Here, we demonstrate that light optogenetic activation of activating one set of MTCs can gamma-entrain the spiking activity of another set. This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, facilitating the synchronization of only the odor-activated MTCs. Furthermore, we show that lateral synchronization did not depend on the distance between the MTCs and is mediated by granule cells. In contrast, lateral inhibition between MTCs that reduced their firing rates was spatially restricted to adjacent MTCs and was not mediated by granule cells. Our findings reveal lead us to propose ? a simple yet robust mechanism by which spatially distributed neurons entrain each other's spiking activity to form an ensemble.

      Thank you. We adopted most of the changes and edited the abstract to reflect the reported results better.

      "both MTCs fired at the gamma rhythm"/this is at this point unwarranted since the mutual entrainment is not shown - tone down or present as hypothesis?

      We completely agree. This sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm, facilitating the synchronization of the active MTC”.

      l. 28: distance-independent instead of "spatially independent"?

      Corrected

      l. 46: are there inhibitory neurons in the ONL? Or which 6 layers are you referring to here?

      Corrected to “spanning all OB layers”.

      l. 49: "is mediated" => "likely to be mediated". Schoppa's work is in vitro and did not account for PVNs, see comment in Public Review.

      Corrected. Indeed, Schoppa's work was performed in vitro. We cite it here since it showed that the synchronized firing of two MTC pairs depends on granule cells.

      l.52: "method"? rather "mechanism"? "specifically" instead of "only"?

      Corrected.

      l.52: perhaps more precise: a recent hypothesis is that GCs enable synchronization solely between odor-activated MTCs via an activity-dependent mechanism for GABA-release (Lage Rupprecht et al. 2020 - please cite the experimental paper here). Again. Galan has no direct evidence for GCs vs PVNs, see comment in Public Review.

      Thank you, we updated this sentence here and in the discussion and added the relevant citation.

      l. 66: spike timings instead of spike's timing?

      Corrected to spike timings

      l. 67 -71: this part could be dropped.

      We appreciate the suggestion; however, we think that it is convenient to briefly read the main results before the results section.

      l. 76 mouse instead of mice.

      Corrected.

      l. 77: for clarification: " a single MTC"?

      In some cases, we recorded more than one cell simultaneously.

      l. 89: just use "hotspot".

      Corrected

      l. 97 instead of "change", "positive change" or "increase"?

      We left the word change, since we wanted to report that the change between hotspot alone and paired stimulation was significantly higher than zero.

      l. 104: the postsyn MTC's firing rate.

      Corrected to MTC instead of MTCs

      l.108: "distributed on the OB surface" sounds misleading, perhaps "across the glomerular map"?

      Corrected.

      l. 254: "which the MTCs form with each other"- perhaps "which interconnect MTCs".

      Corrected.

      l. 270 Additional GC activation.

      Corrected to ‘optogenetic activation of GCL neurons’

      l. 284 somewhat unclear - please expand.

      Corrected to ‘This measure minimizes the bias of the neuron's firing rate on the spike-LFP synchrony value’.

      l. 371: no odors in Schoppa et al.

      Corrected to ‘It has been shown that two active MTCs can synchronize their stimulus-evoked and odor-evoked spike timings’

      l. 406 ff. good point - but where is the transition? How does this observation rule out that GCs can mediate lateral suppression?

      It is an important question. We tested two setups of GC optogenetic activation: column activation (in this paper) and activation of all accessible GCs of the dorsal OB (Dalal & Haddad, 2022). Although the latter manipulation results in significant firing rate suppression, the effect of MTC suppression was relatively small in anesthetized mice and even smaller in awake mice. Optogenetically activating GCs at baseline conditions resulted in a strong suppression of only the adjacent MTCs. Taken together, we think that GCs are capable of strongly inhibiting MTCs, but that this is not their main function in natural olfactory sensation.

      l. 422 ff: again, this is a hypothesis, please frame accordingly.

      Corrected to ‘Activity-dependent synchronization can enable the synchronization of odor-activated MTCs that are dispersed across the glomerular map’

      l. 551 typo.

      Corrected.

      l 556 ff: Figure 2 does not show odor responses.

      Corrected.

      l 582: Mix up of above/below and low/high?

      Corrected to ‘The values in the STA map that were above or below these high and low percentile thresholds’

      Reviewer #3 (Recommendations For The Authors):

      Line 76: "Ai39" should be corrected to "Ai32".

      Corrected. Thank you.

      Figure Legends: The legends should describe the results rather than interpret the data. For instance, the legends for Figures 1f, g, and h contain interpretations. The authors should review all legends and revise them accordingly.

      We appreciate the comment. However, we kindly disagree. We don’t see these opening sentences as interpretations but as guidance to the reader. For example, ‘Paired stimulation increases spikes’ temporal precision’ is not an interpretation; instead, it describes the finding presented in this panel. We think that legends that only repeat what can already be deduced from the graph are not helpful and, in many cases, obsolete. Explaining what we think this graph shows is common, and we prefer it as it helps the reader.

      For Figures 1d and e, it may be beneficial to add the spectrograms for the second stimulation alone.

      We show the stimulation of the hotspot alone and when we stimulate both. The spectrogram of the lateral alone does not show anything of importance.

      Figures 1a and 2a: Please add color bars so that readers can understand the meaning of the colors plotted.

      Color bars were added.

      Figure 3: The purpose of this figure is unclear. Why does the baseline firing rate for the paired activation differ? Is this an isolated observation, or is it observed in other units as well?

      This issue was also raised by reviewer #2. Our response to reviewer #2 is reproduced here:

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean of firing rate changes of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from the mean of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      Figures 4 and 5 data seem to come from the same dataset as in Dalal and Haddad (2022) DOI: https://doi.org/10.1016/j.celrep.2022.110693. For example, the fluorescence image looks identical. If this is the case, the authors may want to state that the image and some of the data and analyses are reproduced.

      The recorded data shown in these figures are not reproduced from Dalal & Haddad 2022. We collected this data, using GC-columns activation instead of light activating the entire OB dorsal surface as was done in the 2022 paper.

      However, the histology image is the same and we now replaced it with a new image, which shows that the expression is restricted to the GCL.

      Figure 4d: the authors use the data plotted here to argue that the gamma entrainment is distance-independent. But there is a clear decrease over distance (e.g., delta PPC1 over 0.01 is not seen for distance beyond 1000 µm). The claim of distance independence may be an over-interpretation of the data. Peace et al. (2024) also claimed that coupling via gamma oscillations occurs over a large spatial extent.

      From a statistical point of view, we can’t state that there is a dependency on distance as the correlation is insignificant (P = 0.86). PPC1 values of 0.01 can be found at 0, 500, and 700 microns. Lower values are found at far distances, but this can result from a smaller number of points. The reduced level of synchrony observed at distances above one mm could be the result of the reduced density of lateral interactions at these distances. That said, we rephrased the sentence as a more careful statement. Please see the rephrased sentence in the Public review section.
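
      The kind of test discussed here can be sketched as a correlation between distance and the change in synchrony, with a permutation-based p-value. This is only an illustration: the response does not state which correlation statistic or test was used, so the Pearson coefficient, the permutation scheme, and the function names are all assumptions, and the example data are synthetic.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled data give |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic sanity check: a perfectly linear relation should be highly significant.
distance = [float(i) for i in range(20)]
effect = [2.0 * d + 1.0 for d in distance]
r_value = pearson_r(distance, effect)
p_value = permutation_p_value(distance, effect)
```

      A non-significant correlation of this kind does not by itself establish distance independence, particularly when few data points are available at large distances, which is the concern the reviewer raises.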

    1. eLife Assessment

      The findings of this study are potentially valuable, offering insights into the neural representation of reversal probability in decision-making tasks, with potential implications for understanding flexible behavior in changing environments. The evidence presented is incomplete, with interesting comparisons between neural data and models, but the analyses do not yet provide clear evidence against line attractor dynamics for the accumulation of evidence of a reversal in this probabilistic reversal learning task.

    2. Reviewer #1 (Public review):

      The authors aimed to investigate how the probability of a reversal in a decision-making task is represented in cortical neurons. They analyzed neural activity in the prefrontal cortex of monkeys and units in recurrent neural networks (RNNs) trained on a similar task. Their goal was to understand how the dynamical systems that implement computation perform a probabilistic reversal learning task in RNNs and nonhuman primates.

      Major strengths and weaknesses:

      Strengths:

      (1) Integrative Approach: The study exemplifies a modern approach by combining empirical data from monkey experiments with computational modeling using RNNs. This integration allows for a more comprehensive understanding of the dynamical systems that implement computation in both biological and artificial neural networks.

      (2) The focus on using perturbations to identify causal relationships in dynamical systems is a good goal. This approach aims to go beyond correlational observations.

      Weaknesses:

      (1) The description of the RNN training procedure and task structure lacks detail, making it difficult to fully evaluate the methodology.

      (2) The conclusion that the representation is better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature.

      (3) The use of targeted dimensionality reduction (TDR) to identify the axis determining reversal probability may not necessarily isolate the dimension along which the RNN computes reversal probability.

      Appraisal of aims and conclusions:

      The authors claim that substantial dynamics associated with intervening behaviors provide evidence against a line attractor. The conclusion that this representation is better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The authors found that the state was translated systematically in response to whether outcomes were rewarded, and this translation accumulated across trials. This is consistent with a line attractor, where reward input bumps the state along a line. The observed dynamics could still be consistent with a curved line attractor, with faster timescale dynamics superimposed on this structure.
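
      The picture described above, in which outcome inputs move the state along a line while faster dynamics relax back onto it, can be sketched with a two-variable toy system. Everything here is an assumption for illustration (the variable names, the time constant, the pulse sizes); it is not the authors' RNN or the recorded data.

```python
def simulate_line_attractor(pulses, kicks, n_steps, dt=0.01, tau=0.1):
    """Toy line attractor.

    pulses: {step: shift along the line} (e.g. outcome feedback)
    kicks:  {step: displacement off the line} (e.g. within-trial activity)
    Returns a list of (along, off) states, one per step.
    """
    along, off = 0.0, 0.0
    trajectory = []
    for step in range(n_steps):
        along += pulses.get(step, 0.0)   # input bumps the state along the line
        off += kicks.get(step, 0.0)      # transient perturbation away from the line
        off += dt * (-off / tau)         # off-line component decays with time constant tau
        trajectory.append((along, off))  # the along-line component has no drift: it persists
    return trajectory


# Three hypothetical outcome pulses and two within-trial perturbations.
traj = simulate_line_attractor(
    pulses={0: 1.0, 100: 1.0, 200: 1.0},
    kicks={50: 1.0, 150: 1.0},
    n_steps=300,
)
final_along, final_off = traj[-1]
```

      In this sketch the along-line coordinate accumulates the pulses and holds its value between them, while the off-line coordinate returns to zero after each perturbation. Whether the recorded activity is better described by this structure or by separable dynamic trajectories is the question under discussion.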

      Likely impact and utility:

      This work contributes to our understanding of how probabilistic information is represented in neural circuits and how it influences decision-making. The methods used, particularly the combination of empirical data and RNN modeling, provide a valuable approach for investigating neural computations. However, the impact may be limited by some of the methodological concerns raised.

      The data and methods could be useful to the community, especially if the authors provide more detailed descriptions of their RNN training procedures and task structure. However, reverse engineering of the network dynamics was minimal. Most analyses didn't take advantage of the full access to the RNN's update equations.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors trained an RNN to perform a reversal task also performed by animals while PFC activity is recorded. The authors devised a new method to train the RNN on this type of reversal task, which in principle ensures that the behavior of the RNN matches the behavior of the animal. They then performed some analysis of neural activity, both RNN and PFC recording, focusing on the neural representation of the reversal probability and its evolution across trials. Given the analysis presented, it has been difficult for me to assess to what extent the RNN can reasonably be compared to PFC recordings.

      Strengths:

      Focusing on a reversal task, the authors address a challenge in RNN training, as they do not use a standard supervised learning procedure where the desired output is available for each trial. They propose a new way of doing that.

      They attempt to confront RNN and neural recordings in behaving animals.

      Weaknesses:

      The design of the task for the RNN does not seem well suited to support the claim of the paper: no action is required to be performed by neurons in the RNN, instead, the choice of the animal is determined by applying a non-linearity to the RNN's readout (equation 7), no intervening behavior is thus performed by neurons on which the analysis is performed throughout the paper.

      Instead, it would have been nice to mimic more closely the task structure of the experiments on monkeys, with a fixation period where the read-out is asked to be at a zero value, and then asked to reach a target value (not just taking its sign), depending on the expected choice after a cue presentation period.

      The comparison between RNN and neural data focuses on very specific features of neural activity. It would have been nice to see how individual units in the RNN behave over the course of the trial, do all units show oscillatory behavior like the readout shown in Figure 1B?

      It would be nice to justify why it has been chosen to take a network of inhibitory neurons and to know whether the analysis can also be performed with excitatory neurons.

      All the analysis relies on the dimensionality reduction. It would have been nice to have some other analysis confirming the claim of the absence of a line attractor in the neural data. Or at least to better characterize this dimensionality reduction procedure, e.g. how much of the variance is explained by this analysis?
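
      The variance-explained check requested here can be sketched as follows. This is a deliberately simplified, regression-based stand-in for targeted dimensionality reduction, assuming a single task regressor and no denoising or orthogonalization step; the function names and toy data are hypothetical and do not reproduce the authors' procedure.

```python
import math


def regression_axis(activity, regressor):
    """Unit-norm axis of per-unit regression slopes of activity on one task variable.

    activity: list of samples, each a list of unit activities.
    regressor: one task-variable value per sample.
    """
    n = len(regressor)
    n_units = len(activity[0])
    z_mean = sum(regressor) / n
    z_ss = sum((z - z_mean) ** 2 for z in regressor)
    betas = []
    for u in range(n_units):
        col = [row[u] for row in activity]
        m = sum(col) / n
        cov = sum((x - m) * (z - z_mean) for x, z in zip(col, regressor))
        betas.append(cov / z_ss)
    norm = math.sqrt(sum(b * b for b in betas))
    return [b / norm for b in betas]


def variance_explained(activity, axis):
    """Fraction of total variance captured by the projection onto a unit-norm axis."""
    n = len(activity)
    n_units = len(axis)
    means = [sum(row[u] for row in activity) / n for u in range(n_units)]
    centered = [[row[u] - means[u] for u in range(n_units)] for row in activity]
    proj = [sum(c * a for c, a in zip(row, axis)) for row in centered]
    var_proj = sum(p * p for p in proj) / n
    var_total = sum(sum(c * c for c in row) for row in centered) / n
    return var_proj / var_total


# Toy data: two units scale with the regressor, a third varies independently of it.
z = [0.0, 1.0, 2.0, 3.0]
extra = [1.0, -1.0, -1.0, 1.0]
activity = [[zi, 2.0 * zi, ei] for zi, ei in zip(z, extra)]
axis = regression_axis(activity, z)
fraction = variance_explained(activity, axis)
```

      Reporting such a fraction alongside the projections would indicate how much of the population activity lies outside the identified axis.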

      It is thus difficult to grasp, besides the fact that reversal behavior is similar, to what extent the RNN is comparable to PFC functioning and to what extent we learn anything about the latter.

      Other computational works (e.g. [1,2]) have developed procedures to train RNN on reversal-like tasks, it would have been nice to compare the procedure presented here with these other works.

      [1] H Francis Song & Xiao-Jing Wang. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife doi:10.7554/elife.21492.001.

      [2] Molano-Mazón, M. et al. Recurrent networks endowed with structural priors explain suboptimal animal behavior. Current Biology 33, 622-638.e7 (2023).

    4. Reviewer #3 (Public review):

      Summary:

      Kim et al. present a study of the neural dynamics underlying reversal learning in monkey PFC and neural networks. The concept of studying neural dynamics throughout the task (including intervening behaviour) is interesting, and the data provides some insights into the neural dynamics driving reversal learning. The modelling seems to support the analyses, but both the modelling and analyses also leave several open questions.

      Strengths:

      The paper addresses an interesting topic of the neural dynamics underlying reversal learning in PFC, using a combination of biological and simulated data. Reversal learning has been studied extensively in neuroscience, but this paper takes a step further by analysing neural dynamics throughout the trials instead of focusing on just the evidence integration epoch.

      The authors show some close parallels between the experimental data and RNN simulations, both in terms of behaviour and neural dynamics. The analyses of how rewarded and unrewarded trials differentially affect dynamics throughout the trials in RNNs and PFC were particularly interesting. This work has the potential to provide new insights into the neural underpinnings of reversal learning.

      Weaknesses:

      Conceptual:

      A substantial focus of the paper is on the within-trial dynamics associated with "intervening behaviour", but it is not clear whether that is well-modelled by the RNN. In particular, since there is little description of the experimental task, and the RNN does not have to do any explicit computation during the non-feedback parts of the trial, it is unclear whether the RNN 'non-feedback' dynamics can be expected to reasonably model the experimental data.

      Data analyses:

      While the basic analyses seem mostly sound, it seems like a potential confound that they are all aligned to the inferred reversal trial rather than the true experimental reversal trial. For example, the analyses showing that 'x_rev' decays strongly after the reversal trial, irrespective of the reward outcome, seem like they are true essentially by design. The choice to align to the inferred reversal trial also makes this trial seem 'special' (e.g. in Figure 2, Figure 5A), but it is unclear whether this is a real feature of the data or an artifact of effectively conditioning on a change in behaviour. It would be useful to investigate whether any of these analyses differ when aligned to the true reversal trial. It is also unsurprising that x_rev increases before the reversal and decreases after the reversal (it is hard to imagine a system where this is not the case), yet all of Figure 5 and much of Figure 4 is devoted to this point.

      Most of the analyses focus on the dynamics specifically in the x_rev subspace, but a major point of the paper is to say that biological (and artificial) networks may also have to do other things at different times in the trial. If that is the case, it would be interesting to also ask what happens in other subspaces of neural activity, that are not specifically related to evidence integration or choice - are there other subspaces that explain substantial variance? Do they relate to any meaningful features of the experiment?

      On a related note, descriptions of the task itself, the behaviour of the animal(s?), and the neural recordings are largely absent, making it difficult to know what we should expect from neural dynamics throughout a trial. In fact, we are not even told how many monkeys were used for the paper or which part of PFC the recordings are from.

      Modelling:

      There are a number of surprising and non-standard modelling choices made in this paper. For example, the choice to only use inhibitory neurons is non-conventional and not consistent with prior work. The authors cite van Vreeswijk & Sompolinsky's balanced network paper, but this and most other balanced networks use a combination of excitatory and inhibitory neurons.

      It also seems like the inputs are provided without any learnable input weights (and the form of the inputs is not described in any detail). This makes it hard to interpret the input-driven dynamics during the different phases of a trial, despite these dynamics being a central topic of the paper.

      It is surprising that the RNN is "trained to flip its preferred choice a few trials after the inferred scheduled reversal trial", with the reversal trial inferred by an ideal Bayesian observer. A more natural approach would be to directly train the RNN to solve the task (by predicting the optimal choice) and then investigate the emergent behaviour & dynamics. If the authors prefer their imitation learning approach (which should at least be motivated), it is also surprising that the network is trained to predict the reversal trial inferred using Bayesian smoothing instead of Bayesian filtering.
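
      The difference between smoothing and filtering can be made concrete with a minimal sketch of an ideal observer for the reversal trial. The task model is an assumption: two options, reward probability 0.7 for the better option and 0.3 for the other, exactly one reversal per block, a uniform prior over the reversal trial, and outcomes (not choices) as the only evidence. None of these values are taken from the manuscript, and the function names are hypothetical.

```python
import math


def log_likelihood(trials, r, p_high=0.7):
    """Log-likelihood of outcomes if option 0 is best before trial r and option 1 from r on.

    trials: list of (choice, reward) with choice in {0, 1} and reward in {0, 1}.
    """
    ll = 0.0
    for t, (choice, reward) in enumerate(trials):
        best = 0 if t < r else 1
        p_reward = p_high if choice == best else 1.0 - p_high
        ll += math.log(p_reward if reward else 1.0 - p_reward)
    return ll


def posterior_over_reversal(observed, n_total, p_high=0.7):
    """Posterior over the reversal trial r in 0..n_total-1 given the observed trials."""
    lls = [log_likelihood(observed, r, p_high) for r in range(n_total)]
    m = max(lls)
    weights = [math.exp(l - m) for l in lls]   # subtract max for numerical stability
    s = sum(weights)
    return [w / s for w in weights]


def smoothed_posterior(trials, p_high=0.7):
    """Smoothing: uses every trial in the block, including those after the reversal."""
    return posterior_over_reversal(trials, len(trials), p_high)


def filtered_prob_reversed(trials, t, p_high=0.7):
    """Filtering: P(reversal at or before trial t) using only trials 0..t."""
    post = posterior_over_reversal(trials[: t + 1], len(trials), p_high)
    return sum(post[: t + 1])


# Hypothetical block: option 0 is always chosen; it pays for 10 trials, then stops paying.
block = [(0, 1)] * 10 + [(0, 0)] * 10
smoothed = smoothed_posterior(block)
map_reversal = max(range(len(smoothed)), key=lambda r: smoothed[r])
before = filtered_prob_reversed(block, 9)
after = filtered_prob_reversed(block, 15)
```

      Smoothing yields a single retrospective estimate of the reversal trial, whereas filtering gives a belief that evolves trial by trial using only past evidence, which is the information actually available to an agent during the task.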

    5. Author response:

      We appreciate Reviewer 1’s observation that our findings (i.e., separable dynamic trajectories are systematically translated in response to whether outcomes are rewarded, and this translation is accumulated across trials) are consistent with a line attractor model. We agree with this assessment and, in the revised manuscript, will reframe our findings about the dynamic trajectories to address its consistency with a line attractor.

      However, we would like to emphasize that a line attractor model does not account for the dynamic nature of reversal probability activity observed in the neural data. A line attractor, regardless of whether it is curved or straight, implies that the activity is fixed when no reward information is presented. The focus of our work is to highlight this dynamic nature of reversal probability activity and its incompatibility with the line attractor model.

      This leads to the question of how we could reconcile the line attractor-like properties and the dynamic nature of reversal probability activity. In the revised manuscript, we will provide evidence for an augmented model that has an attractor state at the beginning of each trial, followed by dynamic activity during the trial. Such a model is an example of superposition of initial attractor states with fast within-trial dynamics, as pointed out by Reviewer 1.

      We also thank Reviewer 2 and Reviewer 3 for their comments on how the manuscript could be improved. In the revised manuscript, we will provide detailed explanations to clarify the choice of network model, data analysis methods and experiment and model setups.

      In addition, we would like to take this opportunity to point out potentially misleading statements in the reviews by Reviewer 2 and Reviewer 3. Reviewer 2 stated that “no action is required to be performed by neurons in the RNN, …, no intervening behavior is thus performed by neurons”. Reviewer 3 stated that “the RNN does not have to do any explicit computation during the non-feedback parts of the trial…”. These statements convey the message that the trained RNN does not perform any computation. In fact, the RNN is trained to make a choice during the non-feedback period in response to feedback. This is the (and the only) computation the RNN performs. “Intervening behavior” refers to the choice the RNN makes across trials until reversing its initially preferred choice. We think that this confusion might have happened because the meaning of the term “intervening behavior” was unclear. We will clarify this point in the revised manuscript.

      Again, thank you for the insightful comments. We will provide a more detailed response to the reviews and revise the manuscript accordingly.

    1. eLife Assessment

      This important work examines how microexons contribute to brain activity, structure, and behavior. The authors find that loss of microexon sequences generally has subtle impacts on these metrics in larval zebrafish, with few exceptions. The evidence is still partially incomplete and needs to be strengthened by key experiments or more precise data descriptions. Overall, this work will be of interest to neuroscientists and generate further studies of interest to the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed)?

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disorder. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

    4. Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish, although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well, and if so, how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence of loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It is established that srrm4 mutants have no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022).

      We chose not to generate and characterize the behavior and brain activity of srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by srrm4, making the characterization of srrm4 mutants disconnected from our overarching project goal.

      We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes.

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed)?

      We acknowledge that unexpected changes to the mRNA could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we will clarify our generated mutant alleles by adding a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we will experimentally determine whether the mRNA was modified as expected for a subset of mutants with phenotypes.

      Third, we will further emphasize in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous and ongoing work. We currently show one mutant, tcf7l2, which we consider to have strong neural phenotypes, and we will expand this comparison in the revision. In our study of 132 genes linked to schizophrenia (Thyme et al., 2019), we established a signal cut-off for whether a mutant would be designated as having a neural phenotype, and we classify this set of microexon mutants in this context. Far stronger phenotypes are expected of loss-of-function alleles for microexon-containing genes, as we showed in Figure S1 of this manuscript in addition to our published work.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      We had already included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 of this manuscript. We will make the conclusions and data in this figure more obvious in the main text.

      Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) describes mild behavioral phenotypes for a microexon removal for kif1b, and we already show developmental abnormalities for the kif1b loss-of-function allele (Figure S1).

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We will modify our manuscript to include a discussion of these mutants.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      See response to point 2. We will experimentally determine whether the mRNA was modified as expected for a subset of mutants with phenotypes.

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by srrm4. We will clarify the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as srrm3 (Ciampi et al., 2022). It is well-known that loss of srrm3 has stronger impacts on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators, but instead was to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 2.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to point 3.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disorder. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we will add a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We will experimentally determine whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point 2 response to reviewer 1). We also have already compared the microexon removal to a loss-of-function mutant for two lines (Figure S1), and we will make that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point 3 response to reviewer 1).

      In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexon should have equivalent impacts on the mRNA.

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish, although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well, and if so, how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we will add a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      We will experimentally determine whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point 2 response to reviewer 1).

    1. eLife Assessment

      This study demonstrates the potential role of 17α-estradiol in modulating neuronal gene expression in the aged hypothalamus of male rats, identifying key pathways and neuron subtypes affected by the drug. While the findings are useful and provide a foundation for future research, the strength of supporting evidence is incomplete due to the lack of female comparison, a young male control group, unclear link to 17α-estradiol lifespan extension in rats, and insufficient analysis of glial cells and cellular stress in CRH neurons.

    2. Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      (2) It's not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well described (Fig. 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      A more detailed analysis of glial cell types within the hypothalamus in response to drug should be provided.

      (4) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      Comments on revisions:

      Some of the concerns were addressed in this revised version, and the authors responded and addressed study design limitations in both sexes/ages.

      However, there are still some concerns that were not sufficiently addressed:

      While the term "senescent" was changed to "stressed," some histological/ cellular validation of this phenotype is still needed.

      Some discussion on the sex-specific effects of 17α-estradiol in the hypothalamus is still required. Previous studies in mice demonstrated that 17α-estradiol reduced hypothalamic microgliosis and astrogliosis in male but not female UM-HET3 mice.

      Additionally, the provided analysis on astrocytes and microglia is superficial.

    3. Reviewer #2 (Public review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      • The single nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      • There is a variety of functions used that allowed the differential analysis of a very complex type of data. This led to a better comparison between the different groups in many levels.

      • There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Weaknesses:

      • One main control group is missing from the study, the young males treated with 17α-Estradiol.

      • Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      • Although the authors claim to have several findings, the data fail to support these claims.

      • The study is about improving ageing, but no physiological data from the study support such a claim, with the exception of the testes histology, which was not properly analyzed and was not even significantly different between the groups.

      • Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression is related to metabolic, synaptic or other function.

      Comments on revisions:

      The authors revised part of the manuscript to address some of the reviewers' comments. This improved the language and the text flow to a certain extent. They also added an additional analysis including glial cells. However, they failed to address the main weaknesses brought up by the reviewers and did not add any experimental demonstration of their claims on lifespan extension induced by 17α-estradiol in rats. In addition, they insisted on keeping parts in the discussion that are not directly linked to any of the paper's findings.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension.

      Thank you for the reminder. 17α-estradiol was reported to extend lifespan in male rats similar to male mice (PMID: 33289482). We have added this reference to the Introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields controversial results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet encapsulation (10x Genomics).
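
      As a rough illustration of the kind of counting described here, cell-type proportions can be computed directly from per-nucleus annotations. The sketch below is not the authors' pipeline: the function name, the sample labels, and the toy annotations are all hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return the fraction of nuclei assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical per-nucleus annotations (illustrative only, not data from the study).
aged = ["Neuron"] * 50 + ["Astro"] * 30 + ["Micro"] * 20
aged_treated = ["Neuron"] * 50 + ["Astro"] * 20 + ["Micro"] * 30

for name, labels in [("O", aged), ("O.T", aged_treated)]:
    print(name, cell_type_proportions(labels))
```

      Proportions computed this way are compositional: an increase in one cell type necessarily lowers the fractions of the others, which is worth keeping in mind when comparing groups.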

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We provided more enrichment analysis data of differentially expressed genes between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. In this supplemental data, we found that, unlike neurons, microglia (Micro) displayed lower levels of synapse-related cellular processes in O.T compared to O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We also noticed the inappropriate claim, and we have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in the Abstract and Results.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate the perturbation of the normal aging process by 17α-Estradiol. Including data from young males could potentially obscure the treatment's effects in aged males due to age effects, though similar effects between young and aged animals may exist. Long-term hormone treatment may also exert more developmental effects on young animals than on old ones. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims. You may mean the claim of a senescent phenotype in Crh neurons induced by 17α-estradiol.

      Thanks. We have changed the "senescent phenotype" to "stressed phenotype" or "abnormal phenotype" in the abstract and results to avoid such a claim.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. From the characteristics of the aging hypothalamus, we know that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are indicators of aging in these organs (PMID: 37886966, PMID: 37048056, PMID: 22884327). The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that exploring anti-aging effects is not the main aim of this study. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of cellular processes in the synapses align with findings previously reported regarding 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we employed several bidirectional Mendelian randomization (MR) analyses of various genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17a-estradiol was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, similar to findings in male mice (PMID: 33289482), and we have noted this in the Introduction. We apologize for not conducting further experiments to validate this point.

      Additionally, we have revised the description of the phenotype of senescent CRH neurons to “stressed phenotype” without carrying out further experiments to confirm the senescent phenotype. To provide more clarity on the performance of glial cells during treatment, we have included additional enrichment analysis data of differentially expressed genes among young (Y), old (O), and old treated (O.T) microglia and astrocytes in Figure 2—figure supplement 3. Notably, the behavior of microglia contrasts with that of total neurons concerning synapse-related cellular processes. We apologize for being unable to include female and young controls in this study.

      Reviewer #2 (Recommendations For The Authors)

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The usage of "enhanced" and "extensive" was also revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We have changed the description to “Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors within all the neurons”.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more accurate. "Affected neurons" was also changed to "sensitive neurons". We are sorry that we were not able to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherence junction genes (lines 326-328) should be supported by a reference or own findings.

      The "senescence" here is not appropriate. We have changed it to "stressed phenotype" or "aberrant changes" in abstract and results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      Your suggestion is appreciated; however, the treatment duration for aged rats (O.T) was set at 6 months, while the young rats were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate the perturbation of the aging process by 17α-estradiol, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, though similar endocrine effects may exist in the young animals. Long-term hormone treatment may also exert more developmental effects on young animals than on old ones. Therefore, we decided to exclude the young samples from our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse".

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added the “serum”.

      Line 52 - 54: Any studies from rats?

      Thanks. In rats, 17α-estradiol has also been reported to have metabolic roles similar to those in mice (PMID: 33289482), and we have added this to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, whereas aged males fight and must be housed individually.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young.".

      We have changed accordingly.

      Line 207: replace "were" with "was" .

      We have alternatively changed the "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256: "interfered" please revise.

      We have changed it to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      It’s hard to explain, therefore, we deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling" Isn't this expected?

      Yes, it is expected. We have added it to the sentence "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways"

      Line 363-364: "Unfortunately" is not a scientific expression and show bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      Sorry for that. We think the causal effects derived from the MR results constitute evidence. As such, we have not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17alpha-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17alpha-estradiol exhibits at least 200 times less activity than 17beta-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17alpha-estradiol shows comparable genomic binding and transcriptional activation through estrogen receptor α (Esr1) to that of 17beta-estradiol (PMID: 33289482). Additionally, there is evidence that 17alpha-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17alpha-estradiol likely differs from 17beta-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17beta-estradiol. For instance, neurons are also targets of 17alpha-estradiol, with Esr1 not being the sole target (PMID: 38776045). Nevertheless, the precise effective targets of 17alpha-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it’s a verb.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T., observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17alpha-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438: "Secrete". Please revise.

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      These are not conclusions but rather intriguing topics for discussion. Given the role of 17alpha-estradiol in promoting testosterone and reducing estradiol levels in males, we believe it is worthwhile to explore the potential application of 17alpha-estradiol in increasing testosterone levels in aged males, particularly those with hypogonadism.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. Therefore, we have chosen not to make changes to this section.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. eLife Assessment

      This valuable study suggests that Naa10, an N-α-acetyltransferase with known mutations that disrupt neurodevelopment, acetylates Btbd3, which has been implicated in neurite outgrowth and obsessive-compulsive disorder, in a manner that regulates F-actin dynamics to facilitate neurite outgrowth. While the study provides promising insights and biochemical, co-immunoprecipitation, and proteomic data that enhance our understanding of protein N-acetylation in neuronal development, the evidence supporting larger claims is incomplete. Nonetheless, the implications of these findings are noteworthy, particularly regarding neurodevelopmental and psychiatric conditions tied to altered expression of Naa10 or Btbd3.

    2. Reviewer #1 (Public review):

      The manuscript examines the role of Naa10 in cKO animals, in immortalized neurons, and in primary neurons. Given that Naa10 mutations in humans produce defects in nervous system function, the authors used various strategies to try to find a relevant neuronal phenotype and its potential molecular mechanism.

      This work contains valuable findings that suggest that the depletion of Naa10 from CA1 neurons in mice exacerbates anxiety-like behaviors. Using neuron-derived cell lines, the authors establish a link between N-acetylase activity, Btbd3 binding to CapZb, and F-actin, ultimately impinging on neurite extension. The evidence demonstrating this is in most cases incomplete, either because some key controls are missing or not clearly described, or simply because claims are not supported by the data. The manuscript also contains biochemical, co-immunoprecipitation, and proteomic data that will certainly be of value to our knowledge of the effects of protein N-acetylation in neuronal development and function.

    3. Reviewer #2 (Public review):

      In this study, the authors sought to elucidate the neural mechanisms underlying the role of Naa10 in neurodevelopmental disruptions with a focus on its role in the hippocampus. The authors use an impressive array of techniques to identify a chain of events that occurs in the signaling pathway starting from Naa10 acetylating Btbd3 to regulation of F-actin dynamics that are fundamental to neurite outgrowth. They provide convincing evidence that Naa10 acetylates Btbd3, that Btbd3 facilitates CapZb binding to F-actin in a Naa10 acetylation-dependent manner, and that this CapZb binding to F-actin is key to neurite outgrowth. Besides establishing this signaling pathway, the authors contribute novel lists of Naa10 and Btbd3 interacting partners, which will be useful for future investigations into other mechanisms of action of Naa10 or Btbd3 through alternative cell signaling pathways. The evidence presented for an anxiety-like behavioral phenotype as a result of Naa10 dysfunction is mixed and tenuous, and assays for the primary behaviors known to be altered by Naa10 mutations in humans were not tested. As such, behavioral findings and their translational implications should be interpreted with caution. Finally, while not central to the main cell signaling pathway delineated, the characterization of brain region-specific and cell maturity of Naa10 expression patterns was presented in few to single animals and not quantified, and as such should also be interpreted with caution. On a broader level, these findings have implications for neurodevelopment and potentially, although not tested here, synaptic plasticity in adulthood, which means this novel pathway may be fundamental for brain health.

      Summarized list of minor concerns

      (1) The early claims of the manuscript are supported by very small sample sizes (often 1-3) and/or lack of quantification, particularly in Figures S1 and 1.

      (2) Evidence is insufficient for CA1-specific knockdown of Naa10.

      (3) The relationship between the behaviors measured, which centered around mood, and Ogden syndrome, was not clear, and likely other behavioral measures would be more translationally relevant for this study. Furthermore, the evidence for an anxiety-like phenotype was mixed.

      (4) Btbd3 is characterized by the authors as an OCD risk gene, but its status as such is not well supported by more recent genome-wide association studies that are better powered than the one that originally implicated Btbd3. However, there is evidence that Btbd3 expression, including selectively in the hippocampus, is implicated in OCD-relevant behaviors in mice.

      (5) The reporting of the statistics lacks sufficient detail for the reader to deduce how experimental replicates were defined.

    4. Author response:

      eLife Assessment

      This valuable study suggests that Naa10, an N-α-acetyltransferase with known mutations that disrupt neurodevelopment, acetylates Btbd3, which has been implicated in neurite outgrowth and obsessive-compulsive disorder, in a manner that regulates F-actin dynamics to facilitate neurite outgrowth. While the study provides promising insights and biochemical, co-immunoprecipitation, and proteomic data that enhance our understanding of protein N-acetylation in neuronal development, the evidence supporting larger claims is incomplete. Nonetheless, the implications of these findings are noteworthy, particularly regarding neurodevelopmental and psychiatric conditions tied to altered expression of Naa10 or Btbd3.

      Thank you very much for recognizing our study, carefully reviewing our work, and providing insightful comments and constructive criticism!

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript examines the role of Naa10 in cKO animals, in immortalized neurons, and in primary neurons. Given that Naa10 mutations in humans produce defects in nervous system function, the authors used various strategies to try to find a relevant neuronal phenotype and its potential molecular mechanism.

      This work contains valuable findings that suggest that the depletion of Naa10 from CA1 neurons in mice exacerbates anxiety-like behaviors. Using neuron-derived cell lines, the authors establish a link between N-acetylase activity, Btbd3 binding to CapZb, and F-actin, ultimately impinging on neurite extension. The evidence demonstrating this is in most cases incomplete, either because some key controls are missing or not clearly described, or simply because claims are not supported by the data. The manuscript also contains biochemical, co-immunoprecipitation, and proteomic data that will certainly be of value to our knowledge of the effects of protein N-acetylation in neuronal development and function.

      Thanks! It would be appreciated if the Reviewer could point out in the public review which experiment lacks a control group.

      Reviewer #2 (Public review):

      In this study, the authors sought to elucidate the neural mechanisms underlying the role of Naa10 in neurodevelopmental disruptions with a focus on its role in the hippocampus. The authors use an impressive array of techniques to identify a chain of events that occurs in the signaling pathway starting from Naa10 acetylating Btbd3 to regulation of F-actin dynamics that are fundamental to neurite outgrowth. They provide convincing evidence that Naa10 acetylates Btbd3, that Btbd3 facilitates CapZb binding to F-actin in a Naa10 acetylation-dependent manner, and that this CapZb binding to F-actin is key to neurite outgrowth. Besides establishing this signaling pathway, the authors contribute novel lists of Naa10 and Btbd3 interacting partners, which will be useful for future investigations into other mechanisms of action of Naa10 or Btbd3 through alternative cell signaling pathways.

      Thank you very much for recognizing our study!

      The evidence presented for an anxiety-like behavioral phenotype as a result of Naa10 dysfunction is mixed and tenuous, and assays for the primary behaviors known to be altered by Naa10 mutations in humans were not tested. As such, behavioral findings and their translational implications should be interpreted with caution.

      (1) For the anxiety-like behavioral phenotype, we provided a paragraph titled “Naa10 and stress-induced anxiety” in the Discussion section of the text: “Our investigations revealed that hippocampal CA1-KO of Naa10 did not exhibit significant differences in the open field test (Figure S1K) but led to anxiety-like behavior in mice in the elevated plus maze (EPM) test (Figure 1A). This disparity might be attributed to the specific design of the EPM test, which is tailored to elicit a conflict between an animal's inclination to explore and its fear of open spaces and elevated areas. This distinction implies that Naa10 might play a role in stress responses within the emotional regulation circuitry, particularly in navigating potentially threatening and anxiety-provoking environments.” The open field test offers a less challenging, open environment that primarily promotes exploratory behavior. We agree that additional assays, such as the light-dark box test, would be helpful in clarifying the issue.

      (2) We agree that the behavioral findings and their translational implications should be interpreted with caution. The primary neurological behaviors known to be altered by Naa10 mutations in humans include intellectual disability and autism-like syndrome with defective emotional control. These behaviors are influenced by many factors, including defects in the hippocampal CA1. Thus, we tested hippocampal CA1 Naa10-KO mice using the Y-maze, tail suspension test, open field test, and elevated plus maze (EPM). However, only the EPM results were affected, while the other tests showed no significant changes. It should be noted that our study employed a postnatal, CA1-specific Naa10 conditional knockout (cKO) model driven by Camk2a-Cre, which selectively depletes Naa10 from hippocampal CA1 neurons after birth. In contrast, Naa10 mutations in human patients involve global effects and impact multiple brain regions from the embryonic stage, leading to a broader spectrum of phenotypes. The limited disruption in our model likely explains the absence of learning and memory deficits and the incomplete recapitulation of the full range of patient phenotypes. Furthermore, Naa10 knockout may not produce the same effects as Naa10 mutations. Our current study is primarily intended to explore the physiological function of Naa10 in hippocampal function.

      (3) We will replace all instances of “anxiety behavior” with “anxiety-like behavior.”

      Finally, while not central to the main cell signaling pathway delineated, the characterization of brain region-specific and cell maturity of Naa10 expression patterns was presented in few to single animals and not quantified, and as such should also be interpreted with caution.

      We agree that we should provide additional Naa10 immunostaining data from more than three WT and hippocampal CA1 Naa10-KO mouse brains, as well as quantify data such as the silver staining and Light Sheet Fluorescence Microscopy results presented in Figures 1C and 1D, respectively. Nevertheless, the current report presents consistent results across different mice used for various assays. For example, Figures 1B-D, with three different assays, each demonstrate that Naa10-cKO reduces neurite complexity in vivo.

      On a broader level, these findings have implications for neurodevelopment and potentially, although not tested here, synaptic plasticity in adulthood, which means this novel pathway may be fundamental for brain health.

      Thank you very much again for recognizing our study!

      Summarized list of minor concerns

      (1) The early claims of the manuscript are supported by very small sample sizes (often 1-3) and/or lack of quantification, particularly in Figures S1 and 1.

      We agree that we should provide additional Naa10 immunostaining data from more than three WT and hippocampal CA1 Naa10-KO mouse brains, as well as quantify data such as the silver staining and Light Sheet Fluorescence Microscopy results presented in Figures 1C and 1D, respectively. Nevertheless, the current report presents consistent results across different mice used for various assays. For example, Figures 1B-D, with three different assays, each demonstrate that Naa10-cKO reduces neurite complexity in vivo.

      (2) Evidence is insufficient for CA1-specific knockdown of Naa10.

      The Camk2a-Cre mice used in this study were derived from Dr. Susumu Tonegawa’s laboratory. According to the referenced paper, this strain restricts Cre/loxP recombination to the forebrain, with particularly high efficiency in the hippocampal CA1. Consistently, our data show that Naa10 was almost completely absent in the CA1 but partially depleted in the DG of the Naa10-cKO mice (Figure S1F in the text). Similar results were observed in a different pair of

      (3) The relationship between the behaviors measured, which centered around mood, and Ogden syndrome, was not clear, and likely other behavioral measures would be more translationally relevant for this study. Furthermore, the evidence for an anxiety-like phenotype was mixed.

      (1) For the anxiety-like behavioral phenotype, we provided a paragraph titled “Naa10 and stress-induced anxiety” in the Discussion section of the text: “Our investigations revealed that hippocampal CA1-KO of Naa10 did not exhibit significant differences in the open field test (Figure S1K) but led to anxiety-like behavior in mice in the elevated plus maze (EPM) test (Figure 1A). This disparity might be attributed to the specific design of the EPM test, which is tailored to elicit a conflict between an animal's inclination to explore and its fear of open spaces and elevated areas. This distinction implies that Naa10 might play a role in stress responses within the emotional regulation circuitry, particularly in navigating potentially threatening and anxiety-provoking environments.” The open field test offers a less challenging, open environment that primarily promotes exploratory behavior. We agree that additional assays, such as the light-dark box test, would be helpful in clarifying the issue.

      (2) We agree that the behavioral findings and their translational implications should be interpreted with caution. The primary neurological behaviors known to be altered by Naa10 mutations in humans include intellectual disability and autism-like syndrome with defective emotional control. These behaviors are influenced by many factors, including defects in the hippocampal CA1. Thus, we tested hippocampal CA1 Naa10-KO mice using the Y-maze, tail suspension test, open field test, and elevated plus maze (EPM). However, only the EPM results were affected, while the other tests showed no significant changes. It should be noted that our study employed a postnatal, CA1-specific Naa10 conditional knockout (cKO) model driven by Camk2a-Cre, which selectively depletes Naa10 from hippocampal CA1 neurons after birth. In contrast, Naa10 mutations in human patients involve global effects and impact multiple brain regions from the embryonic stage, leading to a broader spectrum of phenotypes. The limited disruption in our model likely explains the absence of learning and memory deficits and the incomplete recapitulation of the full range of patient phenotypes. Furthermore, Naa10 knockout may not produce the same effects as Naa10 mutations. Our current study is primarily intended to explore the physiological function of Naa10 in hippocampal function.

      (3) We will replace all instances of “anxiety behavior” with “anxiety-like behavior.”

      (4) Btbd3 is characterized by the authors as an OCD risk gene, but its status as such is not well supported by the most recent, better-powered genome-wide association studies than the one that originally implicated Btbd3. However, there is evidence that Btbd3 expression, including selectively in the hippocampus, is implicated in OCD-relevant behaviors in mice.

      Thanks for clarifying the issue!

      (5) The reporting of the statistics lacks sufficient detail for the reader to deduce how experimental replicates were defined.

      We believe we have provided sufficient detail for readers to deduce how experimental replicates were defined in each corresponding figure legend. It would be appreciated if the Reviewer could point out which specific figures lack sufficient details.

    1. eLife Assessment

      This study uncovers and characterizes a role for Pfdn5 in stabilizing axonal microtubules and synaptic morphology in the Drosophila peripheral nervous system. Although the mechanisms remain unresolved, the phenotypic characterization is an important contribution with solid evidence. The work also aims to address a potential interaction between Pfdn5/6 and Tau-mediated mechanisms of neurodegeneration; here, the evidence is partially incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized in depth one of the prefoldin subunits, Pfdn5, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely due to its regulation of proper folding of tubulin monomers. They demonstrate convincingly using both immunohistochemistry in larval motor neurons and microtubule binding assays that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing Frontotemporal dementia with Parkinsonism on chromosome 17-associated mutations of the human Tau gene V377M and R406W. On the strength of the phenotypic evidence and the enhancement of the TauV377M-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, the overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon TauV377M expression.

      Strengths:

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5.

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010) pmid: 20071510, demonstrated that pan-neuronal expression of TauV377M disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner.

      If the mushroom bodies are defective upon TauV377M expression does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      The discovery that Pfdn5 (and 6 most likely) affect tauV377M toxicity is indeed a novel and important discovery for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations, or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights on the mode(s) that Pfdn5/6 affect Tau toxicity, such as some of the suggestions above are aiming at, will likely be helpful towards therapeutic interventions.

      Weaknesses:

      What is unclear however is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes.

      Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      Does Pfdn5 loss exacerbate TauV377M phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion.

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wild-type Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner.

      So, is TauV377M hyperphosphorylated in the larvae? What happens to TauV377M phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation, as predicted by the above?

      Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17), as Cowan et al suggest, has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of TauV377M (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, calls for a distinct mechanistic approach.

      Finally, the authors argue that TauV377M forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers, and does overexpression promote the more benign large aggregates?

    3. Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and evidence is lacking that tau and Pfdn similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

    4. Author response:

      Reviewer #1:

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      In a previous paper, we identified 95 Drosophila chaperones (Raut et al., 2017). We request that “all 64 known Drosophila chaperones” be replaced with “64 out of 95 known Drosophila chaperones” to make it factually correct.

      Strengths:

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010) pmid: 20071510, demonstrated that pan-neuronal expression of TauV377M disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner.

      If the mushroom bodies are defective upon TauV377M expression does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We agree that the disrupted organization of the mushroom body may cause memory deficits upon hTauV337M expression and that expression of Pfdn5 or Pfdn6 could reverse the deficits. One possible mechanism by which overexpression of Pfdn5/6 could rescue the Tau-induced memory deficits is the stabilization of microtubules in the mushroom bodies.

      Proposed revision: We will assess if Tau-induced mushroom body disruption can be rescued with the overexpression of Pfdn5 or Pfdn6.

      Weakness:

      What is unclear however is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      Our data suggest that Pfdn5 stabilizes neuronal microtubules by directly associating with them, and that loss of Pfdn5 exacerbates Tau phenotypes by destabilizing microtubules. However, as the reviewer notes, analysis of direct interaction between Pfdn5 and hTau<sup>V337M</sup> might provide further insights into the mechanism connecting Pfdn5 and Tau aggregation.

      Proposed revision: We will perform colocalization analysis and coimmunoprecipitation to ask if Pfdn5 colocalizes and directly interacts with Tau.

      Does Pfdn5 loss exacerbate TauV377M phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion.

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wild-type Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is TauV377M hyperphosphorylated in the larvae? What happens to TauV377M phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation, as predicted by the above?

      Proposed revisions: We will overexpress Pfdn5 or Pfdn6 with hTau<sup>V337M</sup> and ask if microtubule disruption caused by hTau<sup>V337M</sup> is rescued. Further, we will analyze the phospho-Tau levels in controls and Pfdn5 mutant background.

      Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17), as Cowan et al suggest, has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      Proposed revision: We will incorporate data about the effect of heterozygous mutation of Pfdn5 on the lethality and synaptic phenotypes associated with the hTau<sup>WT</sup> and hTau<sup>V337M</sup> in the revised manuscript.

      Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of TauV377M (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, calls for a distinct mechanistic approach.

      We thank the reviewer for suggesting an alternate model for Pfdn5 function. We will perform Western blot analysis to assess Tau<sup>WT</sup> and Tau<sup>V337M</sup> levels in the absence of Pfdn5 and in animals coexpressing Tau and Pfdn5. We will incorporate these data and conclusions in the revised manuscript.

      Finally, the authors argue that TauV377M forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers, and does overexpression promote the more benign large aggregates?

      We will perform the Tau solubility assay in controls, in the absence of Pfdn5, and in animals coexpressing Tau and Pfdn5. Moreover, we will also ask if the large Tau puncta formed in the absence of Pfdn5 are soluble oligomers or stable aggregates. We have found that the coexpression of Tau and Pfdn5 does not result in the formation of Tau aggregates. We will incorporate these and other relevant data in the revised manuscript.

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      We have consistently found the progression of eye degeneration in the population of animals expressing Tau<sup>V337M</sup>, measured as the number of fused ommatidia/total number of ommatidia, with age. A few other studies have also shown age-dependent progressive degeneration in Drosophila retinal axons or lamina (Iijima-Ando et al., 2012; Sakakibara et al., 2018). We appreciate other studies that have proposed hTau-induced eye degeneration as a developmental defect (Malmanche et al., 2017; Sakakibara et al., 2023).

      Proposed revision: a) We will analyze the age-dependent neurodegeneration in the adult brain to further support our main conclusion that Pfdn5 ameliorates hTauV337M-induced progressive neurodegeneration.

      b) We have used three independent Pfdn5 RNAi lines (the RNAis target different regions of Pfdn5), all of which enhance the Tau phenotypes. Knockdown of Pfdn5 with any of these RNAi lines under GMR-Gal4 does not give detectable eye phenotypes. We will include these data in the revised manuscript.

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and evidence is lacking that tau and Pfdn similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      Proteome analysis of Alzheimer's brain tissue shows that the Pfdn5 level is reduced in patients (Askenazi et al., 2023; Tao et al., 2020). Moreover, the Pfdn5 expression level was found to be reduced in the blood samples from AD patients (Ji et al., 2022). Another study further validates the age-dependent reduction of Pfdn5 in the tauopathy transgenic murine model (Kadoyama et al., 2019). Together, these reports highlight a potential link between Pfdn5 levels and tauopathies. We will revise the manuscript to reflect these findings in more detail.

      References

      Askenazi, M., Kavanagh, T., Pires, G., Ueberheide, B., Wisniewski, T., and Drummond, E. (2023). Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14, 4466. 10.1038/s41467-023-40208-x.

      Iijima-Ando, K., Sekiya, M., Maruko-Otake, A., Ohtake, Y., Suzuki, E., Lu, B., and Iijima, K.M. (2012). Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8, e1002918. 10.1371/journal.pgen.1002918.

      Ji, W., An, K., Wang, C., and Wang, S. (2022). Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159, 38. 10.1186/s41065-022-00252-x.

      Kadoyama, K., Matsuura, K., Takano, M., Maekura, K., Inoue, Y., and Matsuyama, S. (2019). Changes in the expression of prefoldin subunit 5 depending on synaptic plasticity in the mouse hippocampus. Neurosci Lett 712, 134484. 10.1016/j.neulet.2019.134484.

      Malmanche, N., Dourlen, P., Gistelinck, M., Demiautte, F., Link, N., Dupont, C., Vanden Broeck, L., Werkmeister, E., Amouyel, P., Bongiovanni, A., et al. (2017). Developmental Expression of 4-Repeat-Tau Induces Neuronal Aneuploidy in Drosophila Tauopathy Models. Sci Rep 7, 40764. 10.1038/srep40764.

      Raut, S., Mallik, B., Parichha, A., Amrutha, V., Sahi, C., and Kumar, V. (2017). RNAi-Mediated Reverse Genetic Screen Identified Drosophila Chaperones Regulating Eye and Neuromuscular Junction Morphology. G3 (Bethesda) 7, 2023-2038. 10.1534/g3.117.041632.

      Sakakibara, Y., Sekiya, M., Fujisaki, N., Quan, X., and Iijima, K.M. (2018). Knockdown of wfs1, a fly homolog of Wolfram syndrome 1, in the nervous system increases susceptibility to age- and stress-induced neuronal dysfunction and degeneration in Drosophila. PLoS Genet 14, e1007196. 10.1371/journal.pgen.1007196.

      Sakakibara, Y., Yamashiro, R., Chikamatsu, S., Hirota, Y., Tsubokawa, Y., Nishijima, R., Takei, K., Sekiya, M., and Iijima, K.M. (2023). Drosophila Toll-9 is induced by aging and neurodegeneration to modulate stress signaling and its deficiency exacerbates tau-mediated neurodegeneration. iScience 26, 105968. 10.1016/j.isci.2023.105968.

      Tao, Y., Han, Y., Yu, L., Wang, Q., Leng, S.X., and Zhang, H. (2020). The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11, 233. 10.3389/fneur.2020.00233.

    1. eLife Assessment

      This valuable study explores the role of spatial genome organization in oncogenic transformation, addressing an ambitious and significant topic. The authors have assembled comprehensive datasets from various subtypes of localized and lung-metastatic breast cancer cells, as well as from healthy and cancerous lung cells. They identified switching patterns in the 3D genome organization of lung-metastatic breast cancer cells, revealing a reconfiguration of genome architecture that resembles that of lung cells. If validated, this study could be critical for the field; however, at this stage, it is incomplete, as the main claims are only partially substantiated.

    2. Reviewer #1 (Public review):

      Summary:

      This study utilized publicly available Hi-C data to assemble a comprehensive set of breast cancer cell lines (luminal, Her2+, TNBC) with varying metastatic features, asking whether breast cancer cells acquire organ-specific features at the 3D genome level in order to metastasize to that specific organ. The authors focused on lung metastasis and included several controls for comparison, including normal mammary lines, normal lung epithelial lines, and lung cancer cell lines. Because of the low resolution (250 kb bin size), the authors addressed only compartments (A for the active compartment and B for the inactive compartment) and not other levels of 3D genome organization. They started by performing clustering and PCA on compartment identity and discovered that this panel of cell lines could be well separated by Her2 status and epithelial-mesenchymal features. When correlating compartment changes with transcriptomic changes, the authors noticed both concordance and divergence between the two. The authors then switched gears to tackle the core question of metastatic organotropism to the lung. They discovered a set of "lung permissive compartment changes" and concluded that "lung metastatic breast cancer cell lines acquire lung-like genome architecture" and "organotropic 3D genome changes match target organ more than an unrelated organ". To support the latter point, the authors included an additional non-breast cancer cell line (prostate cancer) in the setting of brain metastasis. This is purely computational work without wet-bench experiments.

      Strengths:

      The authors embarked on an ambitious journey to determine whether 3D genome changes predispose cells to metastatic organotropism. The authors succeeded in assembling a comprehensive panel of breast cancer cell lines and aggregating the 3D genome structure data to conduct a hypothesis-driven computational analysis. The authors also succeeded in including proper controls representing normal non-cancerous epithelium and the end organ of interest. The authors did well in citing relevant references on 3D genome organization and EMT.

      Weaknesses:

      (1) The authors should clearly indicate how they determine the patterns of spread of the breast cancer cell lines being utilized in this manuscript. How did the authors arrive at the conclusion that certain cell lines would be determined as "localized spread" and "metastatic tropism to the lung"? This definition is crucial, and I will explain why.

      Todd Golub's team from the Broad Institute of MIT and Harvard published "A metastasis map of human cancer cell lines" to exhaustively create a first-generation metastasis map (MetMap) that reveals organ-specific patterns of metastasis. (This work is not cited in the manuscript.) The MetMap Explorer (https://depmap.org/metmap/vis-app/index.html) is an openly accessible public resource that visualizes, as petal plots, the metastatic potential of each cell line as determined by the in vivo barcoding approach described in the MetMap paper. Five organs were tested in the MetMap paper: brain, lung, liver, kidney, and bone. The authors would discover that some of the organ-specific metastasis patterns defined in the MetMap Explorer differ from their own classification. For example, the authors defined MCF7 as a lung-metastatic line, and rightly so: the MetMap charted a signal towards the lung with low penetrance and low metastatic potential. The authors defined ZR751 as a line with localized spread; however, the MetMap charted a signal towards the kidney with low penetrance and low metastatic potential, with a signal strength similar to that of lung metastasis in MCF7. A similar argument could be made for T47D. The TNBC line MDA-MB-231 is indeed highly metastatic; however, in the MetMap data, its metastasis is not specific to the lung but is directed towards all five organs with high penetrance and metastatic potential. The authors defined the two lung cancer cell lines in this study, A549 and H460, as having localized spread to the lung. However, the MetMap data clearly indicate that A549 and H460 are highly metastatic to all five organs with high penetrance and high metastatic potential.

      Since results will vary among different experimental models testing metastatic organotropism, (intra-cardiac injection was the metastasis model being adopted in the MetMap), the authors should state more clearly which experimental model system served as the basis for their definition of organ-specific metastasis. In my opinion, this is the most crucial first step for this entire study to be sound and solid.

      (2) Figure 1b: The authors found that "MDA-MB-231 cells were grouped with the lung carcinoma cells. This implies that the genome organization of this cell line is closer to that of lung cells than to other breast epithelial cell lines." In fact, another TNBC line, BT549, was also clustered in the same clade, so this clade consisted of normal-like and highly metastatic lines. Therefore, the authors should be mindful of the fact that the compartment features might not directly link to metastasis (or even metastatic organotropism).

      (3) Figure 3: In the text, the authors stated, "To further investigate this result, we examined the transcription status of genes that changed compartment across the EMT spectrum and, conversely, the compartment status of genes that changed transcription (Fig. 3b, c, and d)". However, it was not apparent in the figure that the cell lines were arranged according to an EMT spectrum. Also, the clustering heatmaps did not provide sufficient information regarding the genes with concordant/divergent compartments vs transcription changes. It would be more informative if the authors could spend more effort in annotating these genes/pathways.

      (4) Figure 4: The subheading of this section was "Lung metastatic breast cancer cell lines acquire lung-like genome architecture". Echoing my comments in point 1, I am a bit hesitant to term these lines "lung metastatic" rather than "metastatic" in general, since cell lines such as MDA-MB-231 metastasize to other organs as well. However, I do get the point that the definition of "lung metastasis" is derived from the common metastasis features among the cell lines here (MCF7, T47D, SKBR3, MDA-MB-231).

      There might be another argument about whether the "lung" carcinoma cell lines can be considered "localized", since they are also capable of metastasizing to other organs. In a way, what the authors were probably trying to leverage here is the "tissue" identity of that organ. Having said this, in addition to showing the "lung permissive changes", the authors should show the "breast identity conservation" as well. Because this section deals with the concept of "tissue/lineage identity", the authors should also clarify whether the breast cancer cell lines capable of lung metastasis also preserve their original tissue identity in their compartment features (which would most likely be the case).

      (5) Rest of the sections: The authors go on to claim that the organ-specific, metastasis-permissive compartmental features mimic those of the destination organ. To strengthen this claim, the authors used additional non-breast cancer cell lines (the prostate cancer cell lines LNCaP as localized and DU145 as brain metastatic) in the setting of brain metastasis. (DU145 in MetMap is, again, highly metastatic to lung, brain, and kidney.) However, this makes one wonder whether cell lines that are capable of metastasizing to multiple organ sites (e.g., MDA-MB-231, DU145, A549, H460) acquire the permissive features for all of these organs. This scenario is clinically relevant in Stage 4 patients, who often present not with a single metastatic lesion in one organ but with multiple metastatic lesions in more than one organ (e.g., concomitant liver and lung metastasis). Do the authors think that different clones might have different tropism-permissive 3D genome features, or that there might be an evolutionary trajectory here?

      In my opinion, to further prove this point, the authors might need to consider doing in vivo experiments to collect paired primary and organ-specific metastatic samples to look at the 3D genome changes.

      (6) Technically, the study utilized public Hi-C data without generating new Hi-C data. The bin size for compartment analysis was set at 250 kb, indicating that the Hi-C data were of low resolution and so might not be suitable for addressing other 3D genome architecture changes such as TADs or long-range loops. It is therefore unknown whether there might be permissive TAD/loop changes associated with organotropism, which is a limitation of this study.

      (7) In the final sentence of the discussion the authors stated "Overall, our results suggest that genome spatial compartment changes can help encode a cell state that favors metastasis (EMT)". The "metastasis (EMT)" was in fact not clearly linked inside the manuscript. The authors did not provide a strong link between metastasis and EMT in their result description. It is also unclear whether the EMT-associated compartment identity would also correlate with the organotropic compartment identity.